The AI week that just shook everything: 7 news items you need to know
Summary: Kimi K2.6 rivals proprietary models at roughly 5% of the cost, GPT 5.5 ("Spud") is expected within days, Codex turns into a "super-app" with computer use on Mac, and Grok 4.5 is rumored to reach 1.5 trillion parameters by late May. Meanwhile, Google is preparing at least three announcements for Google I/O, and a robot recently ran a marathon, outpacing humans on some segments.
0:04 – "One of the most intense stretches in AI history"
We've just come off one of the most intense stretches in AI news, and the next week looks even crazier. Today alone brought massive new developments: Moonshot AI dropped Kimi K2.6, an advanced open-source coding model already being compared to Opus 4.5 and 4.6. At the same time, GPT 5.5 (codename "Spud") is reportedly right around the corner, possibly dropping today or Thursday. On Google's side, new Gemini checkpoints are surfacing, and Alibaba's Qwen 3.6 Max has finally been released. Meanwhile, Codex is transforming into something close to a super-app.
0:34 – "Kimi K2.6: open source beats proprietary"
Kimi K2.6 is a new open-source coding model from Moonshot AI delivering state-of-the-art results across benchmarks like SWE-bench and BrowseComp, as well as advanced math and vision tasks. In some cases it's being compared directly to Opus 4.6, which is completely insane for an open-source model.
The big upgrades:
- 12-hour+ coding sessions with 4,000+ tool calls
- 300 parallel agents working together
- Multilingual, multi-file development from a single prompt
- 94% cheaper input and 95% cheaper output compared to Opus 4.6, while still outperforming it on SWE-bench Pro
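To see how the two discounts combine into a single figure, here is a minimal sketch of the arithmetic. The token counts are a made-up workload and the baseline prices are placeholders, not real list prices; only the 94% and 95% discounts come from the list above.

```python
def relative_cost(input_tokens, output_tokens,
                  base_in_price, base_out_price,
                  in_discount=0.94, out_discount=0.95):
    """Return the discounted cost as a fraction of the baseline cost."""
    baseline = input_tokens * base_in_price + output_tokens * base_out_price
    discounted = (input_tokens * base_in_price * (1 - in_discount)
                  + output_tokens * base_out_price * (1 - out_discount))
    return discounted / baseline


# Hypothetical workload and placeholder per-token prices (assumptions).
share = relative_cost(input_tokens=2_000_000, output_tokens=500_000,
                      base_in_price=1.0, base_out_price=5.0)
print(f"Cost relative to baseline: {share:.1%}")
```

Whatever the mix of input and output tokens, the result is a weighted average of 6% and 5%, so it always lands between those two values.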
"The fact that this model is basically on par with or just a little behind these proprietary models is insane."
Real-world example: Quantitative strategies across hundreds of assets
Kimi K2.6 can design and execute complex multi-step workflows end to end, for example building full quantitative trading strategies across hundreds of assets. On front-end work the model is exceptional: it can generate beautiful landing pages with dynamic movement, varied typography, and interactive elements, something proprietary models reportedly struggle to match.
The full one-trillion-parameter vision-language model (VLM) can reportedly run locally on dual M3 Ultra machines using MLX.
2:38 – "GPT 5.5 Spud — the halfway point to GPT 6"
GPT 5.5, codename "Spud," is currently being A/B tested inside ChatGPT. Early demos show incredible speed, token efficiency, and reasoning, with stronger performance on complex tasks.
It stands out particularly in:
- Coding
- SVG generation
- Game creation
- 3D workflows using tools like GS
The model goes beyond the prompt, adding structure, detail, and better design direction on its own.
"The best way to think about it is that it's a halfway point to GPT 6, combining better reasoning, faster performance, and lower cost into one model."
According to Polymarket tipsters, the release is expected today or Thursday at the latest, the two days on which OpenAI typically ships models.
Real-world example: Excel clone in minutes
With GPT 5.5, a complete Excel clone was created that doesn't just look like Excel — it feels like Excel. Full grid behavior, formatting interactions, cell selection. Scarily close to the real thing.
What makes this practically interesting: the model is more token-efficient and more readily accessible than Opus 4.7. For coding tasks, this could become the natural choice.
4:12 – "DeepSeek v4 — 1.6 trillion parameters incoming"
According to Zank, a Princeton PhD researcher and AI lab fellow, DeepSeek V4 may drop as early as this week. The rumored specs are massive:
- 1.6 trillion parameters
- Sparse MQA, fused kernels, and hyperconnections
- MMLU around 99.4%, just 0.6 percentage points from the maximum
- SWE-bench: 83.7%
"Early leaks suggest extreme performance levels, but these numbers are still unverified."
The model is said to compete directly with Opus 4.7 and GPT 5.5. At that scale, only heavily quantized versions would realistically run locally, and even those could require a 512 GB-class machine.
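The local-hardware claim follows from simple arithmetic on weight storage. The sketch below assumes the weights dominate memory and ignores the KV cache, activations, and runtime overhead, so real requirements would be higher. The bit widths are common quantization levels chosen for illustration, not reported details of the model.

```python
def weights_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def max_bits_per_weight(num_params, memory_gb):
    """Largest average bit width whose weights alone fit in memory_gb."""
    return memory_gb * 1e9 * 8 / num_params


PARAMS = 1.6e12  # rumored parameter count, unverified

for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit weights: {weights_gb(PARAMS, bits):,.0f} GB")

budget = max_bits_per_weight(PARAMS, 512)
print(f"Budget for 512 GB: {budget:.2f} bits per weight")
```

Under these assumptions even 4-bit weights come to about 800 GB, so fitting into 512 GB would need an average of roughly 2.5 bits per weight or less.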
6:04 – "Codex becomes a super-app — sees, clicks and thinks"
OpenAI has transformed Codex into something far beyond a coding tool. Codex can now:
- Interact with apps on your Mac — see, click and type using its own cursor
- Run in the background without taking over your system
- Handle front-end iteration, app testing, and workflows without APIs
- Schedule work, pause and resume with full context in the same thread
- Generate images with GPT Image 1.5 (GPT Image 2 may be coming directly inside Codex)
"It is basically turning into a full super app for development and automation."
Real-world example: Automation over apps without APIs
Previously, automating apps without APIs required manual scripting or third-party tools. Now Codex can see the screen, understand what's happening, and interact with elements — like a human user, but without coffee breaks.
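The see, understand, act cycle described above can be pictured as a simple loop. This is a toy sketch, not Codex's implementation: `FakeScreen`, `decide`, and `run_agent` are hypothetical names, the "screen" is an in-memory form instead of real pixels, and the decision rule is hard-coded where a real system would call a model.

```python
from dataclasses import dataclass, field


@dataclass
class FakeScreen:
    """Stand-in for a desktop app: a form with two fields and a submit button."""
    fields: dict = field(default_factory=lambda: {"name": "", "email": ""})
    submitted: bool = False

    def observe(self):
        # A real agent would capture a screenshot; here we return the state.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def type_into(self, name, text):
        self.fields[name] = text

    def click_submit(self):
        # Submission only succeeds once every field is filled in.
        if all(self.fields.values()):
            self.submitted = True


def decide(observation, goal):
    """Hard-coded policy (hypothetical): pick the next UI action."""
    if observation["submitted"]:
        return ("done",)
    for name, value in observation["fields"].items():
        if not value:
            return ("type", name, goal[name])
    return ("click_submit",)


def run_agent(screen, goal, max_steps=10):
    """Observe, decide, act until the goal is reached or steps run out."""
    history = []
    for _ in range(max_steps):
        action = decide(screen.observe(), goal)
        history.append(action)
        if action[0] == "done":
            break
        if action[0] == "type":
            screen.type_into(action[1], action[2])
        elif action[0] == "click_submit":
            screen.click_submit()
    return history


screen = FakeScreen()
steps = run_agent(screen, {"name": "Ada", "email": "ada@example.com"})
print(steps)
```

The point of the design is that the agent never calls an application API: everything it knows comes from `observe`, and everything it does goes through the same typing and clicking actions a person would use.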
New: Chronicle — Codex builds memory from your work
Chronicle is a new research preview inside Codex that lets the model build memories from your day-to-day work on your computer, then uses those memories to become significantly more helpful and context-aware over time.
"People at OpenAI are already saying that it has noticeably changed how they use Codex in daily workflows."
8:38 – "Grok 4.3: underrated but extremely capable"
Grok 4.3 beta is xAI's latest test model, and it is truly underrated. It has approximately 0.5 trillion parameters, an improved architecture, and training data up to December 2025.
Key upgrades:
- Native multimodal with better visual understanding
- Agentic tool use and coding
- Generates documents, slides, PDFs, spreadsheets
- Improved reasoning with fewer hallucinations
Real-world example: CS:GO clone with bazooka
With a single request, Grok 4.3 beta created a complete CS:GO clone, including a functional bazooka. Fully generated code, no template.
Elon Musk's roadmap revealed
According to Musk's own comments:
- Grok 4.4: 1 trillion parameters, early May
- Grok 4.5: 1.5 trillion parameters, late May
- Grok 5: positioned as AGI
"If even partially accurate, that would mean we're looking at two major model releases away from what he's calling AGI."
(Note: we don't know Musk's exact definition of AGI.)
10:11 – "Qwen 3.6 Max — Alibaba's new flagship"
Alibaba has quietly released a preview of Qwen 3.6 Max — the next generation of their flagship model. Focus areas:
- Stronger agentic coding capabilities compared to Qwen 3.6 Plus
- Better instruction following and improved real-world reasoning
- Higher knowledge reliability
"In simple terms: designed to be smarter, more consistent in long-horizon tasks and more capable as an autonomous agent."
11:33 – "Google I/O in 28 days — three things incoming"
With Google I/O roughly 28 days away, rumors are building. Three things stand out:
1. New Gemini checkpoints in AI Studio
Google is testing newer, significantly upgraded models internally. This could be Gemini 3.2 Pro or even Gemini 3.5 Pro, or possibly a lighter Flash variant of Gemini 3.1.
2. Cowork competitor inside Gemini
Google is developing a feature that works much like Cowork: agentic automation for delegating goals, connecting applications, and automating workflows. What makes it especially powerful is deep integration with Google Workspace (Gmail, Sheets, and Drive), all in one place.
3. Expanded AI Studio access for AI subscribers
Google has now expanded access so that AI subscribers get higher coding limits and direct access to Pro models without linking an API key.
13:37 – "Robot marathon — the F1 of AI"
To close: robotics has now reached a point that feels straight out of a sci-fi simulation. A full-fledged robot is now competing in a marathon and actually outperforming humans in certain segments.
What makes it even more surreal is how the operation is engineered. The support crew works like a full F1 pit stop: humans step in quickly to service the robot and cool it down between runs, in some cases using dry ice.
"This is basically turning it into the F1 of robots."
Thoughts on how this affects the future
What is striking about this week is the pace. Open-source models like Kimi K2.6 now perform on par with the best proprietary alternatives at a fraction of the cost. This doesn't just democratize powerful AI; it pushes the entire industry forward.
At the same time, we are seeing a clear convergence: tools like Codex and Grok 4.3 are no longer pure code generators. They are becoming permanent work companions that remember, plan, and execute over time. The line between tools and colleagues is blurring.
For those building with AI daily, this means the choice is no longer "which model" but "which agent architecture." The question is no longer theoretical.


