AI agent workbench: plugins, browser access and safe rollout

The AI agent is becoming a workbench: plugins, browsers and cloud workflows need new rules

It is easy to treat this week’s AI news as another pile of product launches. I think that misses the useful part. The AI agent is turning into a workbench. It gets a browser, plugins, sandboxes, files, research tools and cloud environments where work can continue after you close the laptop.

For Hammer customers, the question is not which agent won the week. The question is more uncomfortable: what may the agent touch, who can inspect what it did, and how do we stop the workflow when it goes wrong?

From chat window to workbench

An AI agent workbench is an environment where the model does more than answer in text. It can use tools: browsers, code execution, plugins, connectors, files, internal systems and sometimes a customer-controlled cloud workspace. That makes the agent more useful. It also makes it feel less like a search box and more like a new colleague with keys.

There is a big gap between "write a proposal" and "open the browser, debug the page, read customer data, create a report and share a link". The second version needs rules before it needs more prompts.

Four signals point in the same direction

OpenAI said on June 11 that it plans to acquire Ona, subject to customary closing conditions and regulatory approval. The point is not that Ona is already inside Codex. It is not. The point is that OpenAI wants Codex to run in secure, persistent, customer-controlled cloud environments for longer agent tasks. The same day, Codex app 26.609 added Developer mode for Browser use with controlled Chrome DevTools Protocol access, faster browser use and more Computer Use controls.

Source: OpenAI on Ona and OpenAI Codex changelog 2026-06-11

xAI launched the Grok Build Plugin Marketplace. A plugin can bundle skills, slash commands, agents, hooks, MCP servers and LSPs into one installable package. That is convenient, but the important detail is operational: xAI says every remote plugin in the catalog is pinned to a specific commit SHA and verified at install time. That is the kind of control you need when agent tools become interchangeable packages.

Source: xAI Grok Build Plugin Marketplace
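To make the pinning idea concrete, here is a minimal sketch of what an install-time check could look like. It is not xAI's implementation: the `PinnedPlugin` record, its field names and the `verify_pin` helper are hypothetical, and the sketch assumes full 40-character Git SHA-1 commit IDs.

```python
import re
from dataclasses import dataclass


# Hypothetical catalog entry; the field names are illustrative, not xAI's schema.
@dataclass(frozen=True)
class PinnedPlugin:
    name: str
    source_url: str
    commit_sha: str  # assumed to be a full 40-character Git SHA-1 commit ID


_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def verify_pin(entry: PinnedPlugin, resolved_sha: str) -> bool:
    """Return True only if the commit fetched at install time matches the pin."""
    pinned = entry.commit_sha.lower()
    resolved = resolved_sha.strip().lower()
    # Branch names, tags and abbreviated SHAs can move or collide,
    # so anything that is not a full commit ID is rejected.
    if not _FULL_SHA.fullmatch(pinned) or not _FULL_SHA.fullmatch(resolved):
        return False
    return pinned == resolved
```

The design choice worth copying is that the check fails closed: a plugin that resolves to "main" or to a different commit is simply not installed until someone approves the new pin.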

Perplexity moved Deep Research into Computer. Research, citations, internal files, connectors and final deliverables such as reports, PDFs, dashboards or decks can now sit in one workflow. Perplexity also describes Search as Code, where the system writes search programs and executes them in a sandbox so that many retrieval steps run in parallel. Impressive, yes, but the value depends on traceability: which sources were used, what became an assumption, and what was approved before anything moved on?

Source: Perplexity: Deep Research, now in Computer

Anthropic and DXC show the same move from another angle. DXC will integrate Claude into mission-critical enterprise systems and train tens of thousands of Claude-certified forward-deployed engineers. DXC also says Claude is the default model inside OASIS, its platform for agentic operations, and that OASIS already serves more than 50 customers. This is less about a new button and more about distribution: AI gets embedded into operational systems through delivery partners, not only through standalone apps.

Source: Anthropic and the DXC alliance

What this means for everyday workflows

If you run training operations, a service office, a consultancy, or an internal support function, this is not far away. Not because you need every tool this week. Because the vendors are building toward the same model: the agent should take a task, gather information, use the right systems, create something and hand it to a human for review.

So "which AI should we buy?" is the wrong first question. Better ones are:

Which workflow is narrow enough for the first test?
Which systems may the agent read, and where must it never write?
Which plugins are approved, by whom and at which version?
Which events must be logged so we can understand the result later?
Who can pause the workflow when a vendor has an incident, changes pricing or ships a new model?
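The read, write and plugin questions above can be written down as data rather than kept in someone's head. The sketch below is one possible shape, not a product feature: `WorkflowPolicy`, its fields and the example system and plugin names are all hypothetical.

```python
from dataclasses import dataclass, field


# Hypothetical policy record for one agent workflow; every name is illustrative.
@dataclass(frozen=True)
class WorkflowPolicy:
    name: str
    read_systems: frozenset[str] = frozenset()
    write_systems: frozenset[str] = frozenset()
    # Maps plugin name to the single version that was approved for this workflow.
    approved_plugins: dict[str, str] = field(default_factory=dict)

    def may_read(self, system: str) -> bool:
        return system in self.read_systems

    def may_write(self, system: str) -> bool:
        # Write access is never implied by read access.
        return system in self.write_systems

    def plugin_allowed(self, plugin: str, version: str) -> bool:
        # Unknown plugins and unapproved versions are both refused.
        return self.approved_plugins.get(plugin) == version


policy = WorkflowPolicy(
    name="research-brief",
    read_systems=frozenset({"knowledge_base"}),
    approved_plugins={"pdf-export": "1.4.2"},
)
```

Because everything defaults to empty, a new workflow starts with no access at all, and each system or plugin has to be added on purpose.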

Dry? Good. The first agent workflows should be a little boring. A recurring research brief. A proposal appendix. An internal knowledge base. A controlled browser-debugging task. A support draft that a human reviews before sending.

The checklist before the first plugin is installed

Start with a workflow where the benefit is clear and the damage is limited.

Purpose: Write one sentence describing the job the agent should do. If the sentence gets long, the workflow is too large.
Access: Separate read access, write access and external sharing. They should not arrive as one bundle.
Environment: Use a test account, sandbox or separate workspace before the agent touches production.
Plugin list: Approve plugins per workflow. Keep the version, source, and reason for each plugin.
Review: Decide where human approval is required: before customer contact, file publication, system changes or payment.
Logs and rollback: Keep sources, prompts, tools, file versions and decisions. Name the person who restores the workflow if something breaks.
Pause rules: Write down when the workflow stops: a vendor status incident, unusual cost, unexpected access, a new plugin version or output that cannot be verified.
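The pause rules in particular are easy to turn into a check that runs before each step. The following is a sketch under assumptions: the `RunSnapshot` fields and the `pause_reasons` helper are hypothetical, and where the numbers come from (cost meter, vendor status page, access log) is left to your own setup.

```python
from dataclasses import dataclass


# Hypothetical snapshot of one workflow run; how each value is collected is up to you.
@dataclass(frozen=True)
class RunSnapshot:
    vendor_incident_open: bool
    cost_so_far: float
    cost_ceiling: float
    systems_touched: frozenset[str]
    systems_allowed: frozenset[str]
    plugin_versions_seen: dict[str, str]
    plugin_versions_approved: dict[str, str]
    unverified_outputs: int


def pause_reasons(run: RunSnapshot) -> list[str]:
    """Return every pause rule that fired; an empty list means the run may continue."""
    reasons: list[str] = []
    if run.vendor_incident_open:
        reasons.append("status incident")
    if run.cost_so_far > run.cost_ceiling:
        reasons.append("unusual cost")
    unexpected = run.systems_touched - run.systems_allowed
    if unexpected:
        reasons.append("unexpected access: " + ", ".join(sorted(unexpected)))
    changed = sorted(
        name
        for name, version in run.plugin_versions_seen.items()
        if run.plugin_versions_approved.get(name) != version
    )
    if changed:
        reasons.append("new plugin version: " + ", ".join(changed))
    if run.unverified_outputs > 0:
        reasons.append("output that cannot be verified")
    return reasons
```

Returning the reasons instead of a bare yes or no means the same list can go straight into the log, so the person who restores the workflow can see why it stopped.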

This is not governance theater. It is ordinary operations discipline moved into AI work.

Hammer’s angle: build the smallest safe workbench

When Hammer helps an organization through Tool Forge, we would rather build a narrow workbench than a giant platform. An agent that reads the right sources, uses two approved tools and produces a reviewable draft can be worth more than a flashy demo that can touch everything.

If this sounds like a workflow you already have, start by naming the step that consumes time but should not make final decisions. That is often the first useful test: research to deliverable, support draft, browser debugging, internal report or scheduled check. Add permissions, logging and stop rules before you add more tools.

FAQ

What is an AI agent workbench?

An environment where AI can use tools such as browsers, plugins, code execution, connectors and files to complete a workflow.

What is the first risk?

That tools get broader access than the workflow needs, or that nobody owns testing, logging and rollback.

How do you start safely?

Begin with a read-only or research workflow, approved plugins, a test account, clear stop conditions and human review before external effects.

When does Tool Forge fit?

When you want to build a narrow agent workflow with the right tools, permissions, logs and human control before it connects to real systems.
