SubQ: Why a 12 Million Token Context Window Could Change AI Agents

Subquadratic has emerged from stealth with SubQ, a model the company describes as the first fully subquadratic LLM for long-context work. The headline is easy to understand: up to a 12 million-token context window. If the claim holds in practice, AI agents could read entire codebases, long project histories, contract archives, and months of working memory without first breaking everything into small chunks.
This is not just another model with a bigger number in the spec sheet. SubQ targets one of the most expensive bottlenecks in today’s AI systems: long context often requires RAG, chunking, prompt chains, agent orchestration, and hand-built retrieval logic before the model can even start reasoning.
Source: Subquadratic – Introducing SubQ
What is SubQ?
SubQ is a long-context LLM from Subquadratic. The company says the model is built on Subquadratic Sparse Attention, or SSA. In simple terms, SSA tries to avoid making the model compare every token with every other token when the input becomes enormous.
In classic transformer models, the cost of attention grows roughly quadratically with input length. Double the context, and the attention cost can grow by roughly four times. That is one reason very long prompts become expensive, slow, and sometimes less reliable.
Subquadratic’s claim is that SubQ can scale closer to linearly: the model focuses on relationships that matter and skips large amounts of unnecessary comparison.
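To make the scaling difference concrete, here is a small back-of-the-envelope sketch in Python. It compares idealized dense (quadratic) attention cost with an idealized near-linear cost as context grows. The constants and the fixed per-token budget are arbitrary assumptions for illustration only; Subquadratic has not published SSA's actual cost model.

```python
# Back-of-the-envelope comparison of idealized attention scaling.
# Dense attention compares every token with every other token: ~n^2 work.
# The "linear-ish" scheme below is a stand-in assumption, NOT SSA itself:
# each token attends to a fixed budget of other tokens.

CONTEXT_LENGTHS = [128_000, 1_000_000, 12_000_000]

def dense_attention_ops(n: int) -> int:
    """Token-pair comparisons in classic full attention."""
    return n * n

def idealized_linear_ops(n: int, budget: int = 4_096) -> int:
    """Hypothetical near-linear cost: fixed attention budget per token."""
    return n * budget

for n in CONTEXT_LENGTHS:
    dense = dense_attention_ops(n)
    sparse = idealized_linear_ops(n)
    print(f"{n:>12,} tokens: dense {dense:.2e} ops, "
          f"linear-ish {sparse:.2e} ops, ratio {dense / sparse:,.0f}x")
```

The point of the toy model is the shape of the curves, not the exact numbers: doubling the context quadruples the dense cost but only doubles the linear-ish cost, so the gap widens as prompts get longer.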
Source: Subquadratic – How SSA Makes Long Context Practical
What does SubQ do?
SubQ is designed for tasks where the answer is not inside one short document, but spread across a large body of material.
Examples where 12 million tokens could matter:
- Entire codebases: read source code, tests, history, and dependencies in one context.
- Long project memory: help an agent understand prior decisions, edits, bugs, and customer requirements.
- Contracts and policy: reason over definitions, exceptions, appendices, and cross-references.
- Research and analysis: compare many reports without losing structure and source context.
- Customer service and operations: combine knowledge bases, case history, and internal procedures.
Subquadratic is positioning two core products: SubQ API, an OpenAI-compatible API for long-context calls, and SubQ Code, a layer for coding agents designed to plug into tools such as Claude Code, Codex, and Cursor.
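Because SubQ API is described as OpenAI-compatible, a call to it should look like a standard Chat Completions request pointed at a different base URL. The sketch below uses the standard openai Python client; the base URL, model id, environment variable, and file name are placeholders, since the real values are only available to private-beta users.

```python
import os
from openai import OpenAI

# Placeholder endpoint and credentials; actual values come from
# Subquadratic's private beta and may differ.
client = OpenAI(
    base_url="https://api.subquadratic.example/v1",  # hypothetical URL
    api_key=os.environ["SUBQ_API_KEY"],              # hypothetical env var
)

# Load a large body of material directly into the prompt instead of
# chunking it through a retrieval pipeline first.
with open("entire_repo_dump.txt", encoding="utf-8") as f:
    corpus = f.read()

response = client.chat.completions.create(
    model="subq-1m-preview",  # hypothetical model id
    messages=[
        {"role": "system", "content": "Answer using only the provided material."},
        {"role": "user", "content": f"{corpus}\n\nWhere is the retry logic implemented?"},
    ],
)
print(response.choices[0].message.content)
```

The design point is that the integration surface stays the same as any other OpenAI-compatible provider; what changes is how much material you can push into a single request.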
Source: Subquadratic – Efficiency is Intelligence
Why 12 million tokens is more than a big number
A large context window does not automatically mean the model works well. The important distinction is between a nominal context window and a functional context window.
- Nominal context window: how much text the model can technically accept.
- Functional context window: how much text the model can reliably use when finding, weighing, and reasoning over information.
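One way to probe the gap between the two is a simple "needle in a haystack" style check: bury a known fact at different depths in a long filler prompt and see whether the model still retrieves it. The sketch below only builds the probe prompts; the filler text, needle, and sizes are arbitrary, and sending the prompts depends on whichever API you use.

```python
# Build "needle in a haystack" probes: a known fact buried at varying depths
# in filler text. If answers degrade as the needle moves deeper or the prompt
# grows, the functional window is smaller than the nominal one.

NEEDLE = "The access code for the staging cluster is 7421."
QUESTION = "What is the access code for the staging cluster?"
FILLER_SENTENCE = "Routine log entry with no relevant information. "

def build_probe(total_sentences: int, needle_position: float) -> str:
    """Place the needle at a relative depth (0.0 = start, 1.0 = end)."""
    sentences = [FILLER_SENTENCE] * total_sentences
    index = int(needle_position * (total_sentences - 1))
    sentences[index] = NEEDLE + " "
    return "".join(sentences) + "\n\n" + QUESTION

# Same prompt length, needle at the start, middle, and end. Send each to the
# model under test and check whether "7421" appears in the reply.
for depth in (0.0, 0.5, 1.0):
    prompt = build_probe(total_sentences=50_000, needle_position=depth)
    print(f"depth={depth:.1f}, prompt length={len(prompt):,} characters")
```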
That is why the SubQ announcement is interesting. The company is not only saying “we can fit more text.” It is saying the architecture makes long context cheaper, faster, and more usable for real workflows.
Subquadratic claims include:
- Reasoning over 12 million tokens of context, per the company's research results.
- Throughput of 150 tokens per second, per the company's product page.
- About one-fifth the cost of other leading LLMs, according to the company’s own claims.
- Nearly 1,000x lower attention compute at 12M tokens compared with traditional frontier models.
Those numbers should be treated as vendor claims until more independent testing is available. But if even part of them holds, it could affect how companies build AI systems.
Source: Subquadratic – Efficiency is Intelligence
Why this matters for companies, schools, and small teams
For small businesses, schools, and non-technical teams, RAG and agent orchestration can be more complicated than the AI idea itself. Most teams do not want to start with vector databases, chunking strategies, re-ranking, evals, and error handling. They want AI to understand their material and help.
If long-context models become cheaper and more reliable, more AI projects can start simply:
- Insert the full policy document, not just five chunks.
- Let an agent read the whole curriculum, not only one search result.
- Give a support agent case history and internal procedures at the same time.
- Let a coding agent see relationships across the whole repository before changing anything.
This does not mean RAG disappears. Large organizations still need permissions, version control, traceability, and data governance. But the balance may change: less hand-built infrastructure, more direct reasoning over context.
For Hammer Automation’s audience, the practical question is: which recurring workflows would become simpler if the AI could read the full context from the start? That is where the value sits, not in the record-setting token count itself.
Benchmarks: promising, but not settled
Subquadratic reports results on RULER 128K, MRCR v2, and SWE-Bench Verified. The company’s product page says SubQ 1M-Preview reaches 81.8 percent on SWE-Bench Verified, 95.0 percent on RULER 128K, and 65.9 percent on MRCR v2 at 1M tokens.
Those are strong signals, especially because long-context quality often degrades when a model has to connect information spread across a large input. Still, there are important caveats:
- The full technical report has not yet been published.
- Access is private beta, not broad production proof.
- Some secondary analyses stress that independent reproduction is still missing.
- Benchmarks do not always show how a model behaves in messy real-world processes.
The sober way to read the news is therefore: a potentially large architectural shift, but not yet proven as the new default.
Source: Subquadratic – Efficiency is Intelligence
Source: Fello AI – SubQ Review
Funding and market signal
SiliconANGLE reports that Subquadratic is launching with $29 million in seed funding. That matters because long-context models are not only a research question; they require expensive infrastructure, benchmarking, API product work, developer relations, and enterprise sales.
The market signal is clear: investors believe the next step in AI is not only smarter short answers, but models that can work across whole workspaces.
Source: SiliconANGLE – Subquadratic launches with $29M
What should you watch next?
For organizations that want to use AI practically, it is too early to build everything around SubQ. But it is the right time to follow the development.
Watch especially for:
- Technical report: how SSA actually works and what tradeoffs it makes.
- Independent benchmarks: especially on real codebases, document archives, and multi-hop questions.
- Pricing: whether the cost advantage holds in commercial API usage.
- Data protection and access control: with a very long context, a mis-scoped permission can expose far more data in a single call.
- Agent integrations: whether SubQ Code improves everyday coding-agent workflows.
Bottom line: less search, more context
SubQ is interesting because it points toward a different AI design philosophy. Instead of building everything around search, chunking, and compression, the model tries to carry more of the context itself.
If Subquadratic’s claims hold, long-context LLMs could make AI agents more practical: fewer brittle middle layers, less manual prompt logic, and better understanding of full workflows.
For small businesses and schools, the lesson is simple: start mapping the processes where the information already exists but is spread out. When models can read more at once, those are the processes most likely to benefit first.


