Why Your Favorite LLMs Can’t Count to 1 Million
And Why You Shouldn't Ask Them To
You’ve probably seen the videos. Someone asks ChatGPT to count to one million. It fails. The memes are endless, but the failure reveals something fundamental about how AI actually works. To an LLM, counting isn’t math. It’s storytelling.
You might think: “It’s a computer. Surely it knows math.”
Nope. LLMs, or Large Language Models (like ChatGPT, Claude, or Gemini), don’t calculate; they predict. To an LLM, “999,999 + 1” isn’t an equation to be solved. It’s a sentence to be finished. A statistical guess based on patterns it saw during training.
If you ask a model what comes after “2, 4, 6, 8,” it says “10”, not because it did the math, but because “10” usually follows that sequence in text. But asking it to maintain perfect precision while counting to 1 million? That shifts the task from pattern recognition to algorithmic execution.
And that’s not what these tools were designed to do.
The 3 Architectural Walls
Counting to 1 million requires 1,000,000 sequential steps. Each step must be flawless. Here’s why that’s architecturally impossible for a standard LLM.
Wall #1: The “Attention” Trap (Quadratic Scaling)
Every LLM has a “context window”, a limit on how much information it can hold at once. While models advertise windows of 1 million tokens, their ability to focus degrades long before that. Think of it like a conversation at a crowded party. You might hear everyone, but as the room gets louder (more data), your ability to focus on specific details drops. To count to a million, the model needs to maintain perfect focus across a massive sequence. But self-attention compares every token with every other token, so the computational cost grows with the square of the sequence length. As the sequence grows, that cost explodes and the signal degrades into noise.
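To see why “quadratic” matters, here is a back-of-the-envelope sketch of how many pairwise comparisons one full self-attention layer performs at different sequence lengths (a simplification: real models use multiple layers and heads, and some use sparse-attention variants):

```python
def attention_scores(n: int) -> int:
    """Pairwise token comparisons in one full self-attention layer."""
    return n * n

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,} tokens -> {attention_scores(n):>22,} scores per layer")
```

Going from 1,000 tokens to 1,000,000 tokens multiplies the sequence by a thousand, but multiplies the work per layer by a million.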
Wall #2: The Distraction Factor (State-Update Interference)
Researchers call this “State-Update Interference,” but you can think of it as getting distracted by your own history. As the model counts, it looks back at everything it’s already generated. The longer the list, the more “distractors” exist to confuse the next prediction. And there’s a hard ceiling: most models cap output at ~256k characters (64k tokens) per response, far short of a million numbers.
Wall #3: The Compounding Error
This is simple math. Even if an LLM is 99.9% accurate:
- After 100 steps: ~90% chance of success.
- After 1,000 steps: ~37% chance of success.
- After 1,000,000 steps: effectively 0% chance of success.
In a million-step task, you cannot survive without an external error-correction mechanism (a calculator).
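The math above is just a per-step success rate raised to the power of the number of steps, assuming each step fails independently. A minimal sketch:

```python
def success_probability(p: float, n: int) -> float:
    """Probability that all n independent steps succeed,
    given per-step accuracy p."""
    return p ** n

per_step = 0.999  # "99.9% accurate" at every single step

for steps in (100, 1_000, 1_000_000):
    print(f"{steps:>9,} steps -> {success_probability(per_step, steps):.2%}")
# prints roughly 90.48%, 36.77%, and 0.00%
```

At a million steps, the probability is so small (around 1 in 10⁴³⁵) that it underflows ordinary floating-point arithmetic to exactly zero.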
Why This Matters for Finance and Operations
For writing emails, “99% accurate” is a miracle. But finance doesn’t grade on a curve. For Finance and Ops, “99% accurate” is a failure.
Imagine an LLM analyzing a P&L statement:
You: “Calculate the budget: Revenue $47,382, Costs $31,547, Marketing $8,200. What’s the profit?”
LLM: “Your profit is approximately $7,600.”
Reality: The answer is $7,635.
The LLM is close. It “feels” right. But in auditing, “close” is meaningless. You need exact figures, full audit trails, and proof-to-source traceability.
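The contrast is that the exact figure is trivial for deterministic code. Here is a minimal sketch (illustrative only, not any product’s actual implementation) using exact decimal arithmetic on the line items from the example:

```python
from decimal import Decimal

# Line items from the P&L question, kept as exact decimals
# rather than floating-point approximations.
line_items = {
    "revenue":   Decimal("47382"),
    "costs":     Decimal("-31547"),
    "marketing": Decimal("-8200"),
}

profit = sum(line_items.values())
print(profit)  # 7635 -- exact, reproducible, auditable
```

The same inputs always produce the same output, and every figure traces back to a named source line, which is exactly what an audit trail requires.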
The Solution: Don’t Ask a Poet to Do Accounting
This is exactly why we built Maxa the way we did.
We don’t use the LLM to perform the calculation. We use the LLM to understand your question.
Because Maxa sits on top of your ERP data, we unify and harmonize the actual numbers first. When you ask a question, Maxa runs a deterministic calculation against the source data. We get the exact, audit-proof answer. Then we use the LLM to explain the result to you.
The videos of LLMs failing to count aren’t a bug. They’re a feature of a system built for creativity, not calculus.
Use the LLM for the words. Use Maxa for the numbers.
