SubQ: Why a 12 Million Token Context Window Could Change AI Agents

Subquadratic has emerged from stealth with SubQ, a model the company describes as the first fully subquadratic LLM for long-context work. The headline is easy to understand: up to a 12 million-token context window. If the claim holds in practice, AI agents could read entire codebases, long project histories, contract archives, and months of working memory without first breaking everything into small chunks.
This is not just another model with a bigger number in the spec sheet. SubQ targets one of the most expensive bottlenecks in today’s AI systems: long context often requires RAG, chunking, prompt chains, agent orchestration, and hand-built retrieval logic before the model can even start reasoning.
Source: Subquadratic – Introducing SubQ
What is SubQ?
SubQ is a long-context LLM from Subquadratic. The company says the model is built on Subquadratic Sparse Attention, or SSA. In simple terms, SSA tries to avoid making the model compare every token with every other token when the input becomes enormous.
In classic transformer models, the cost of attention grows roughly quadratically with input length. Double the context, and the attention cost can grow by roughly four times. That is one reason very long prompts become expensive, slow, and sometimes less reliable.
Subquadratic’s claim is that SubQ can scale closer to linearly: the model focuses on relationships that matter and skips large amounts of unnecessary comparison.
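To make the scaling difference concrete, here is a small back-of-the-envelope sketch in Python. It compares idealized dense (quadratic) attention cost with an idealized near-linear cost as context grows. The constants and the fixed per-token budget are arbitrary assumptions for illustration only; Subquadratic has not published SSA's actual cost model.

```python
# Back-of-the-envelope comparison of idealized attention scaling.
# Dense attention compares every token with every other token: ~n^2 work.
# The "linear-ish" scheme below is a stand-in assumption, NOT SSA itself:
# each token attends to a fixed budget of other tokens.

CONTEXT_LENGTHS = [128_000, 1_000_000, 12_000_000]

def dense_attention_ops(n: int) -> int:
    """Token-pair comparisons in classic full attention."""
    return n * n

def idealized_linear_ops(n: int, budget: int = 4_096) -> int:
    """Hypothetical near-linear cost: fixed attention budget per token."""
    return n * budget

for n in CONTEXT_LENGTHS:
    dense = dense_attention_ops(n)
    sparse = idealized_linear_ops(n)
    print(f"{n:>12,} tokens: dense {dense:.2e} ops, "
          f"linear-ish {sparse:.2e} ops, ratio {dense / sparse:,.0f}x")
```

The point of the toy model is the shape of the curves, not the exact numbers: doubling the context quadruples the dense cost but only doubles the linear-ish cost, so the gap widens as prompts get longer.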
Source: Subquadratic – How SSA Makes Long Context Practical
What does SubQ do?
SubQ is designed for tasks where the answer is not inside one short document, but spread across a large body of material.
Examples where 12 million tokens could matter:
- Entire codebases: read source code, tests, history, and dependencies in one context.
- Long project memory: help an agent understand prior decisions, edits, bugs, and customer requirements.
- Contracts and policy: reason over definitions, exceptions, appendices, and cross-references.
- Research and analysis: compare many reports without losing structure and source context.
- Customer service and operations: combine knowledge bases, case history, and internal procedures.
Subquadratic is positioning two core products: SubQ API, an OpenAI-compatible API for long-context calls, and SubQ Code, a layer for coding agents designed to plug into tools such as Claude Code, Codex, and Cursor.
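Because SubQ API is described as OpenAI-compatible, a call to it should look like a standard Chat Completions request pointed at a different base URL. The sketch below uses the standard openai Python client; the base URL, model id, environment variable, and file name are placeholders, since the real values are only available to private-beta users.

```python
import os
from openai import OpenAI

# Placeholder endpoint and credentials; actual values come from
# Subquadratic's private beta and may differ.
client = OpenAI(
    base_url="https://api.subquadratic.example/v1",  # hypothetical URL
    api_key=os.environ["SUBQ_API_KEY"],              # hypothetical env var
)

# Load a large body of material directly into the prompt instead of
# chunking it through a retrieval pipeline first.
with open("entire_repo_dump.txt", encoding="utf-8") as f:
    corpus = f.read()

response = client.chat.completions.create(
    model="subq-1m-preview",  # hypothetical model id
    messages=[
        {"role": "system", "content": "Answer using only the provided material."},
        {"role": "user", "content": f"{corpus}\n\nWhere is the retry logic implemented?"},
    ],
)
print(response.choices[0].message.content)
```

The design point is that the integration surface stays the same as any other OpenAI-compatible provider; what changes is how much material you can push into a single request.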
Source: Subquadratic – Efficiency is Intelligence
Why 12 million tokens is more than a big number
A large context window does not automatically mean the model works well. The important distinction is between a nominal context window and a functional context window.
- Nominal context window: how much text the model can technically accept.
- Functional context window: how much text the model can reliably use when finding, weighing, and reasoning over information.
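One way to probe the gap between the two is a simple "needle in a haystack" style check: bury a known fact at different depths in a long filler prompt and see whether the model still retrieves it. The sketch below only builds the probe prompts; the filler text, needle, and sizes are arbitrary, and sending the prompts depends on whichever API you use.

```python
# Build "needle in a haystack" probes: a known fact buried at varying depths
# in filler text. If answers degrade as the needle moves deeper or the prompt
# grows, the functional window is smaller than the nominal one.

NEEDLE = "The access code for the staging cluster is 7421."
QUESTION = "What is the access code for the staging cluster?"
FILLER_SENTENCE = "Routine log entry with no relevant information. "

def build_probe(total_sentences: int, needle_position: float) -> str:
    """Place the needle at a relative depth (0.0 = start, 1.0 = end)."""
    sentences = [FILLER_SENTENCE] * total_sentences
    index = int(needle_position * (total_sentences - 1))
    sentences[index] = NEEDLE + " "
    return "".join(sentences) + "\n\n" + QUESTION

# Same prompt length, needle at the start, middle, and end. Send each to the
# model under test and check whether "7421" appears in the reply.
for depth in (0.0, 0.5, 1.0):
    prompt = build_probe(total_sentences=50_000, needle_position=depth)
    print(f"depth={depth:.1f}, prompt length={len(prompt):,} characters")
```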
That is why the SubQ announcement is interesting. The company is not only saying “we can fit more text.” It is saying the architecture makes long context cheaper, faster, and more usable for real workflows.
Subquadratic claims include:
- Reasoning over 12 million tokens of context, per the company's research results.
- Throughput of 150 tokens per second, per the company's product page.
- About one-fifth the cost of other leading LLMs, according to the company’s own claims.
- Nearly 1,000x lower attention compute at 12M tokens compared with traditional frontier models.
Those numbers should be treated as vendor claims until more independent testing is available. But if even part of them holds, it could affect how companies build AI systems.
Source: Subquadratic – Efficiency is Intelligence
Why this matters for companies, schools, and small teams
For small businesses, schools, and non-technical teams, RAG and agent orchestration can be more complicated than the AI idea itself. Most teams do not want to start with vector databases, chunking strategies, re-ranking, evals, and error handling. They want AI to understand their material and help.
If long-context models become cheaper and more reliable, more AI projects can start simply:
- Insert the full policy document, not just five chunks.
- Let an agent read the whole curriculum, not only one search result.
- Give a support agent case history and internal procedures at the same time.
- Let a coding agent see relationships across the whole repository before changing anything.
This does not mean RAG disappears. Large organizations still need permissions, version control, traceability, and data governance. But the balance may change: less hand-built infrastructure, more direct reasoning over context.
For Hammer Automation’s audience, the practical question is: which recurring workflows would become simpler if the AI could read the full context from the start? That is where the value sits, not in the record-setting token count itself.
Benchmarks: promising, but not settled
Subquadratic reports results on RULER 128K, MRCR v2, and SWE-Bench Verified. The company’s product page says SubQ 1M-Preview reaches 81.8 percent on SWE-Bench Verified, 95.0 percent on RULER 128K, and 65.9 percent on MRCR v2 at 1M tokens.
Those are strong signals, especially because long-context quality often degrades when a model has to connect information spread across a large input. Still, there are important caveats:
- The full technical report has not yet been published.
- Access is private beta, not broad production proof.
- Some secondary analyses stress that independent reproduction is still missing.
- Benchmarks do not always show how a model behaves in messy real-world processes.
The sober way to read the news is therefore: a potentially large architectural shift, but not yet proven as the new default.
Source: Subquadratic – Efficiency is Intelligence
Source: Fello AI – SubQ Review
Funding and market signal
SiliconANGLE reports that Subquadratic is launching with $29 million in seed funding. That matters because long-context models are not only a research question; they require expensive infrastructure, benchmarking, API product work, developer relations, and enterprise sales.
The market signal is clear: investors believe the next step in AI is not only smarter short answers, but models that can work across whole workspaces.
Source: SiliconANGLE – Subquadratic launches with $29M
What should you watch next?
For organizations that want to use AI practically, it is too early to build everything around SubQ. But it is the right time to follow the development.
Watch especially for:
- Technical report: how SSA actually works and what tradeoffs it makes.
- Independent benchmarks: especially on real codebases, document archives, and multi-hop questions.
- Pricing: whether the cost advantage holds in commercial API usage.
- Data protection and access control: with a very long context, a mis-scoped permission can expose far more data in a single call.
- Agent integrations: whether SubQ Code improves everyday coding-agent workflows.
Bottom line: less search, more context
SubQ is interesting because it points toward a different AI design philosophy. Instead of building everything around search, chunking, and compression, the model tries to carry more of the context itself.
If Subquadratic’s claims hold, long-context LLMs could make AI agents more practical: fewer brittle middle layers, less manual prompt logic, and better understanding of full workflows.
For small businesses and schools, the lesson is simple: start mapping the processes where the information already exists but is spread out. When models can read more at once, those are the processes most likely to benefit first.


