OpenAI breaks its own record as GPT-5.4 Pro soars to IQ 150 on the Mensa Norway test


OpenAI’s latest model, GPT-5.4 Pro, now posts an IQ score higher than that of 99.96% of humans, giving the market another signal that AI’s growing capability is beginning to outweigh the noise of normal product cycles.

OpenAI’s GPT-5.4 Pro hits 150 on public IQ benchmark as market enters another macroeconomic week

TrackingAI’s public leaderboard assigns OpenAI’s GPT-5.4 Pro an IQ score of 150, a significant jump from the 136 that OpenAI’s o3 recorded on the Mensa Norway test last year.

The jump comes at a moment when markets are focused on Iran, energy, labor weakness, and the next turn in inflation. That raises a further question for the week ahead: how fast is machine intelligence improving, and when will that acceleration start to show up in the economic picture?

Why this matters: Going from 136 to 150 on a widely known benchmark compresses complex capability changes into a simple signal. For businesses, that signal feeds directly into decisions about automation, software budgets, and workforce planning. For the market, it adds another variable alongside interest rates, inflation, and growth expectations.

OpenAI introduced GPT-5.4 as its most capable and efficient frontier model for professional work, with stronger coding, tool use, computer use, and a context window of up to 1 million tokens. In the same release, OpenAI said GPT-5.4 set a new state of the art on GDPval and exceeded human performance on OSWorld-Verified.

Although these benchmarks are separate from public IQ tests, they point in the same direction. Capabilities are rising across independent measurement systems, and the gains are arriving fast enough to affect budgeting, hiring plans, workflow design, and software spending.

A score of 150 on a public IQ-style benchmark compresses a broad range of capabilities into a single portable signal. The number is easy to grasp even before anyone discusses methodology.

The earlier o3 Mensa result established the benchmark and its limits. GPT-4.1’s 1-million-token context window showed OpenAI extending its models’ usefulness across long-context coding and documentation tasks, while analysis of OpenAI’s expanding capital loop linked model progress to hardware expansion, funding loops, and infrastructure demand.

Taken together, these developments place the latest IQ score within a broader commercial and economic context. The rise from 136 to 150 on the public benchmark is notable on its own, but it carries broader implications as OpenAI pushes deeper into tool use, computer use, enterprise productivity, and capital-intensive infrastructure.

Public IQ benchmarks are limited, but the capability curve is still trending upward

Public IQ-style tests remain imperfect instruments for measuring frontier models. TrackingAI runs Mensa-style public benchmarks and also maintains more difficult private offline tests.

IQ-style tests compress narrow ranges of cognitive performance into a single number, obscuring variation in reasoning types, context processing, creativity, and real-world problem solving.

For both AI and humans, scores are a noisy proxy for general ability because they are sensitive to test design, training exposure, and pattern habituation.

An IQ of 150 is at the top of the distribution and is often associated with figures such as Albert Einstein and Richard Feynman. In practical terms, this means very fast abstraction, powerful pattern recognition, and the ability to navigate complex multi-step problems with limited guidance.
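The 99.96% figure follows from the usual convention of scaling IQ to a normal distribution with mean 100 and standard deviation 15. Assuming the Mensa Norway scale follows that convention (an assumption for illustration; the test's own norming may differ), a short Python sketch converts a score into the share of people expected to score lower:

```python
from math import erf, sqrt

# Assumed scale: IQ ~ Normal(mean=100, sd=15), the common convention.
# This is an illustrative assumption, not TrackingAI's published method.
MEAN = 100.0
SD = 15.0


def iq_percentile(iq: float, mean: float = MEAN, sd: float = SD) -> float:
    """Return the fraction of the population expected to score below `iq`."""
    z = (iq - mean) / sd  # distance from the mean in standard deviations
    # Standard normal CDF expressed via the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


if __name__ == "__main__":
    for model, score in [("o3", 136), ("GPT-5.4 Pro", 150)]:
        print(f"{model}: IQ {score} -> above {iq_percentile(score):.2%} of people")
```

Under the same assumption, the function places o3's 136 at roughly the 99.2nd percentile and GPT-5.4 Pro's 150 at about the 99.96th, which is where the headline comparison comes from.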

The platform reports scores as a moving average across recent completions. This methodology raises common questions regarding prompt structure, reproducibility, training set contamination, and format familiarity. These concerns were already noticeable when o3 reached 136 and are still relevant now that GPT-5.4 Pro has reached 150.
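The exact averaging window TrackingAI uses is not stated here. As a minimal sketch, assuming a simple rolling mean over the most recent N test completions (the class name, window size, and scores below are hypothetical), the reported figure would behave like this:

```python
from collections import deque
from statistics import mean


class RollingScore:
    """Hypothetical rolling average of the most recent test-run scores."""

    def __init__(self, window: int = 7) -> None:
        # A deque with maxlen drops the oldest score once the window is full.
        self._scores = deque(maxlen=window)

    def add(self, score: float) -> float:
        """Record a new completion's score and return the updated average."""
        self._scores.append(score)
        return mean(self._scores)


if __name__ == "__main__":
    tracker = RollingScore(window=3)
    for run_score in [145, 152, 153, 150]:  # illustrative scores, not real data
        print(tracker.add(run_score))
```

A rolling mean like this smooths run-to-run noise, so a single unusually strong or weak completion moves the reported number only partway, which is also why prompt structure and reproducibility across runs matter.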


Even with these limitations, the broader pattern is becoming increasingly difficult to ignore. One isolated benchmark result can be explained away as a quirk. A cluster of gains across public IQ-style testing, coding, browser use, desktop navigation, and knowledge-work tasks carries more analytical weight.

TrackingAI’s latest leaderboard places GPT-5.4 Pro at the top of the public IQ board, ahead of every Claude, Gemini, Qwen, and Grok model, giving the broader capability debate an easy-to-read external reference point.

Few people need to understand the benchmark design in detail to see that 150 sits in the rare range. Investors don’t have to accept every assumption behind IQ-style tests to recognize that a jump of this size suggests acceleration rather than drift.

The graph titled “AI IQ Test Results” shows the average Mensa Norway IQ scores of the leading AI models as a bell curve, with OpenAI’s GPT-5.4 variant plotted near the upper end of the range.

And business buyers need not believe that IQ equals general intelligence to understand that systems with stronger pattern recognition, stronger tool use, and better handling of long-horizon tasks move beyond puzzle solving and toward economic usefulness.

That means systems that can search, plan, check their work, navigate interfaces, and produce real-world output across extended contexts. In that setting, an IQ score works less as a novelty and more as a signal of how much reasoning capacity now sits at the frontier.

The leaderboard itself is also competitive. A leading position on public benchmarks strengthens OpenAI’s hand in the race for visible capability leadership, especially at a time when model differentiation is difficult to discern from architecture notes alone.

Benchmark leadership compresses complexity into simple hierarchies. It gives developers a signal, enterprise buyers a narrative handle, and investors another indicator of where the capability frontier currently stands.

OpenAI’s Benchmark Rise Starts to Coincide with a Macro-Heavy Week

Macro still dominates the week ahead. The Bureau of Labor Statistics calendar clearly marks the key releases: the March Consumer Price Index on April 10 and the March Producer Price Index on April 14.

This timeline keeps interest rates, inflation, and growth concerns on the surface, but beneath the surface a second economic trajectory is taking shape, and OpenAI sits near the center of it.

Improving frontier AI capability is increasingly tied to capital allocation. A model that improves on public reasoning tests while also getting better at coding, search, and computer use changes how companies think about redesigning their workflows. It changes what software buyers expect from copilots and agents, and it changes how quickly companies move from experimentation to deployment.

Jack Dorsey recently posted that Block is moving “from hierarchy to intelligence,” using AI to take over coordination tasks once reserved for management as the company reorganizes around individual contributors, direct responsibilities, and player-coaches.

Increased capabilities also change which tasks can be taken out of the labor cost structure and reassigned to software. These impacts first pass through narrower channels such as document workflows, spreadsheet workflows, customer support, research tasks, browser automation, internal operations, code generation, and validation loops.

OpenAI’s commercial direction strengthens that interpretation. In its GPT-5.4 announcement, the company described improved performance on specialized jobs, better tool search, native computer use, and benchmarked knowledge-work gains across occupations that map directly onto the U.S. economy.

AI capability growth therefore sits squarely inside a familiar market question: where does spending go next if these systems keep improving at this pace?

The answer goes beyond subscription revenue to assumptions about cloud demand, chips, data centers, networking, power, software licenses, and labor productivity. OpenAI’s expanding capital loop already reflects some of that structure, and benchmark gains add a simpler public signal on top of it.

This overlap gives the latest result broader relevance during a macro-heavy week. The market already knows the CPI setup. It already knows that oil prices can influence inflation expectations. And it already knows that the Fed minutes will be parsed for policy tone.

But is intelligence growth itself starting to behave like a macro variable? Faster capability gains could shift corporate spending plans, intensify competitive pressure across white-collar sectors, support higher infrastructure spending, and strengthen the case for AI-related capital expenditure even in an environment of slower nominal growth.

When TrackingAI puts GPT-5.4 Pro at 150, the number lands in a market that already views OpenAI as more than a research lab. The company is a platform company, a deployment company, an infrastructure customer, and a signal generator for adjacent sectors.

The next test will play out on two fronts at once. One is methodological: public IQ-style benchmarks will, and should, continue to face scrutiny. The other is economic: the market will gradually decide whether capability gains of this magnitude deserve to be priced alongside labor data, interest rate expectations, and capital spending trends.

OpenAI’s latest benchmark climb brings that decision closer. The score is compact, easy to read, and easy to share. Its deeper relevance comes from the same place as the company’s broader product push. The frontier is still rising, and the economic impact of that rise is becoming increasingly difficult to fit into separate categories.
