The company says SubQ tackles one of the biggest hidden problems in modern AI: the cost of making models handle very long prompts. Independent benchmark results lend the claim more weight, but the model is still not widely available. For now, the story is part breakthrough, part waiting game.
The bottleneck behind long AI prompts
Most large language models work by breaking text into small pieces called tokens. A token can be a word, part of a word, or even punctuation, depending on the model.
The key technology behind many of today’s systems is the transformer, introduced in the 2017 paper “Attention Is All You Need” by researchers linked to Google. It helped models compare different parts of a sentence, paragraph, or document so they could understand context better.
The trouble starts when the text gets very long. Standard attention compares every token with every other token, so the amount of work grows with the square of the prompt length: double the text and the work roughly quadruples. Picture every student in a packed gym having to compare notes with every other student before answering one question.
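A rough sketch makes that growth concrete. It assumes a single dense attention layer that scores every token against every token and ignores attention heads, stacked layers, and the rest of the model, so it illustrates the scaling rather than describing any real system. The function name is illustrative, not from any library.

```python
def dense_attention_pairs(num_tokens: int) -> int:
    """Number of token-to-token scores one dense attention layer computes (simplified)."""
    # Each of the n tokens is compared with all n tokens, itself included.
    return num_tokens * num_tokens


for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,} tokens -> {dense_attention_pairs(n):,} scores")
```

Going from a thousand tokens to a million makes the prompt 1,000 times longer but the number of scores a million times larger.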
What SubQ says it changed
Subquadratic says SubQ uses a method called sparse attention. In simple terms, the model does not compare every token with every other token. It tries to focus only on the relationships that matter most.
That sounds obvious, right? But it has been hard to do without making the model worse. Alex Whedon, co-founder and chief technology officer of the startup, has argued that language is too complex for fixed shortcuts, so SubQ chooses important relationships dynamically for each input.
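Subquadratic has not published how SubQ chooses its relationships, so the sketch below shows a generic, well-known variant instead: top-k sparse attention, where each query keeps only its k highest-scoring keys. All names are hypothetical. This toy version still scores every key before discarding most of them, so it does not cut the quadratic scoring cost; a genuinely subquadratic system needs a cheaper way to find the candidates in the first place.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(queries, keys, values, k):
    """Toy top-k sparse attention (a generic technique, not SubQ's method).

    queries and keys are lists of equal-length float vectors; values holds one
    vector per key. Each query attends only to its k highest-scoring keys.
    """
    d = len(queries[0])
    outputs = []
    for q in queries:
        # Scaled dot-product score against every key. This step is still
        # quadratic; a real subquadratic system must avoid scoring them all.
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in keys]
        # Selected per query, so the kept relationships depend on the input.
        top = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)[:k]
        weights = softmax([scores[j] for j in top])
        # Weighted average of only the selected value vectors.
        outputs.append([
            sum(w * values[j][c] for w, j in zip(weights, top))
            for c in range(len(values[0]))
        ])
    return outputs


queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]
print(topk_sparse_attention(queries, keys, values, k=1))  # [[10.0]]
```

With k=1 the query keeps only the key pointing in the same direction, so the output is that key's value. Setting k to the total number of keys turns this back into ordinary dense attention.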
In practical terms, that could matter for jobs that involve huge files. A lawyer might want to search many contracts at once. A developer might want an AI tool to inspect an entire codebase instead of small slices of it.
Benchmarks raised the stakes
The outside evaluation reported strong results on long-context retrieval, which means finding a specific fact buried inside a large amount of text. The evaluator said SubQ 1.1 Small Preview returned exact answers every time at one million and two million tokens, and reached 98 percent exact-match accuracy at six million and 12 million tokens.
On LiveCodeBench, a coding benchmark that collects new programming problems over time to reduce the risk that models have already seen the answers, SubQ scored 89.7 percent on pass@4 across more than 1,000 problems. That metric gives the model credit for a problem when at least one of four attempts solves it.
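With exactly four attempts per problem, pass@4 is simply the share of problems where at least one attempt passed. When more attempts are sampled than k, a commonly used unbiased estimator from OpenAI's Codex paper (Chen et al., 2021) is 1 - C(n - c, k) / C(n, k), where n is the number of attempts and c the number that passed. The article does not say which method the evaluator used, so this is a sketch of the metric, not of Appen's scoring code; the function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n attempts sampled, c of them correct."""
    if n - c < k:
        # Every possible set of k attempts contains at least one correct one.
        return 1.0
    # 1 minus the probability that k attempts drawn from the n are all wrong.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Three hypothetical problems, four attempts each, with 0, 1 and 4 correct.
print(benchmark_pass_at_k([(4, 0), (4, 1), (4, 4)], k=4))
```

Two of the three hypothetical problems count as solved, because a single correct attempt out of four is enough for credit.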
Subquadratic also says SubQ 1.1 Small uses 64.5 times less compute than dense attention at one million tokens and runs 56 times faster than FlashAttention-2 on a single attention layer. Those are large claims, and they are exactly why the AI community is watching closely.

Why this is not settled yet
Benchmarks are useful, but they are not the same as everyday use. The developers of RULER, a long-context testing suite from Nvidia, note that their benchmark is not comprehensive and should not replace realistic tasks.
There is another catch. SubQ is still in limited access, so most outside developers cannot test it on messy real-world work. That means questions remain about reliability, edge cases, and how it behaves when documents are incomplete, duplicated, or full of contradictions.
Subquadratic also says it began with an existing open-weight frontier model, replaced dense attention with its own system, and then continued training on long books, documents, and repository-scale code. That does not erase the achievement, but it makes the exact source of the gains harder to judge: some of the improvement could come from the base model or the extra long-context training rather than from the new attention mechanism alone.
What it could mean for AI costs
If the results hold up in wider testing, the biggest impact may be cost. Long prompts are expensive because they demand more computing power, more memory, and more time. Ultimately, those costs show up in cloud bills and, to some extent, in the electricity bills of the data centers running the models.
Current AI tools often work around the problem by chopping documents into pieces, searching for likely matches, and feeding only those pieces into a model. That can be useful, but it can also miss relationships that sit far apart in a file.
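The chunk-and-search workaround can be sketched in a few lines. To stay self-contained, this version ranks chunks by simple word overlap with the question; production systems usually rank with embedding similarity instead. The function names, chunk size, and sample contract are hypothetical. The key step is the last one: only the top-ranked chunks reach the model, so a related passage that scores poorly is never seen.

```python
import re


def chunk(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve(chunks: list[str], question: str, top_k: int) -> list[str]:
    """Return the top_k chunks sharing the most distinct words with the question."""
    q_words = set(re.findall(r"\w+", question.lower()))

    def overlap(c: str) -> int:
        return len(q_words & set(re.findall(r"\w+", c.lower())))

    return sorted(chunks, key=overlap, reverse=True)[:top_k]


contract = (
    "The Supplier must deliver all goods by March 1. "
    "Late delivery triggers the penalty in section 12. "
    "Section 12: the penalty is waived if the Buyer changes the order."
)
pieces = chunk(contract, 9)
print(retrieve(pieces, "When must the Supplier deliver the goods?", top_k=1))
```

The retriever returns only the first sentence. The penalty and the waiver that changes what late delivery actually costs sit in other chunks, and the fixed-size split even cuts section 12 in half, which is the kind of far-apart relationship this workaround can miss.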
A model that can hold much more context at a lower cost could change how people use AI for research, coding, finance, and legal review. Not overnight. But it could move the field away from clever workarounds and closer to models that read the whole file.
Skepticism is still healthy
Some skepticism is justified because the launch claim was unusually bold. Dan McAteer, an AI engineer, summed up the reaction online by saying SubQ was either the biggest advance since the transformer or “the Theranos of AI.”
That line spread because it captured the mood. AI has seen real breakthroughs, but it has also seen plenty of overpromising. When numbers sound this strong, independent access matters.
Jeanine Sinanan-Singh, Appen’s director of generative AI research, said the results were exciting because they appeared to validate the architecture. Still, the real test will come when many more users can try SubQ on their own workloads.

What happens next
Subquadratic says SubQ is designed for coding and for searching across very large datasets. The company also says broader model releases are planned later in 2026, after work with select design partners.
Can it still perform when the files are ugly, the code is old, and the question is vague? That is where the story gets interesting, because real work is rarely as clean as a benchmark.
For now, the safest reading is this: SubQ has produced evidence that deserves attention, but there has not been enough public testing to close the case.
The main independent benchmark brief has been published by Appen.