AI agents need a safety belt: five checks before they touch real accounts

Adam Olofsson HammareAdam Olofsson Hammare
AI agents need a safety belt: five checks before they touch real accounts

When an AI agent can read web pages, run code, click inside a business system and use accounts that are already signed in, the question is no longer “can it do the task?”. The question is: what stops it from doing the wrong task with the right permissions?

This week’s signals from Perplexity, OpenAI, Claude, Manus and Google point in the same direction. Serious AI tools are being built with safety belts: isolated environments, short-lived permissions, prompt injection defenses, approval gates and logs. That is not only a topic for large IT departments. It is a practical buying and rollout question for any organization that lets AI work close to real accounts.

Five security terms decision makers should understand

You do not need to become a security architect. But you do need better questions than “is the tool secure?”. Start here:

  • Sandbox: a constrained execution environment where the agent can work without automatically inheriting the same freedom as the user’s normal computer or account.
  • Connector control: rules for which services the agent may connect to, what data it may read and who is allowed to enable the connection.
  • Temporary permissions: access that exists for one task and disappears when the work is done, instead of permanent API keys or shared passwords.
  • Prompt injection defense: filters and routines that stop external web pages, documents, or emails from tricking the agent into ignoring instructions.
  • Audit log: a traceable record of what the agent tried to do, which systems it touched and where a human approved or stopped the action.

This is the difference between a demo and a workflow that survives everyday work.

What Perplexity and OpenAI show with their sandboxes

Perplexity describes how Computer runs each task inside a Firecracker microVM, with its own Linux kernel, isolated file system, private network, short-lived proxy tokens and a safe stop when suspicious content is detected. For a non-specialist, the important point is not the name Firecracker. The important point is the principle: the agent should not work directly inside “the whole company”, but inside a bounded environment where access can be shut down.

Source: Perplexity — How We Built Security Into Computer

OpenAI’s Windows write-up for Codex shows the same lesson from another angle. Codex needed a dedicated Windows sandbox because the agent otherwise runs commands with the user’s own rights. OpenAI describes separate Windows users, file system rules, firewall rules and a clear trade-off between convenience and risk.

Source: OpenAI — Building a safe, effective sandbox to enable Codex on Windows

Translated into daily operations: if an agent is going to help with files, code, finance exports or customer data, it is not enough that it “seems careful”. It needs technical guardrails and a human process around them.

Browser agents make accounts and sessions urgent

Manus Preferred Browser shows why browser automation quickly becomes sensitive. The point of the feature is that the agent can use an authorized Chrome environment that already has the right sign-ins, extensions, network access and internal portals. That makes work smoother, but it also makes the browser part of the security architecture.

Source: Manus — Introducing Preferred Browser

Google also describes Gemini in Chrome and auto browse on Android, where the agent can help with tasks such as bookings and everyday errands but is designed to ask for confirmation before sensitive actions such as purchases or social posting. That detail matters: AI can prepare the action, but a human should approve the consequence.

Source: Google — Gemini in Chrome with auto browse comes to Android

Claude’s post about computer and browser use adds a more technical but useful reminder: even screenshot resolution and coordinate scaling affect whether the agent clicks the right place. Once an agent is given an interface to operate, “it probably understood” is not a safety strategy.

Source: Claude — Best practices for computer and browser use with Claude

A practical checklist before the agent gets real accounts

Use the questions below before connecting an AI agent to email, finance systems, CRM, learning platforms, browsers or internal files.

1. Which account does the agent use?

Avoid private owner accounts, shared passwords and “just borrow my login”. Prefer a bounded account or a clearly authorized session with the least permission possible.

Good first test: let the agent read test data or a copy before it touches a live account.

2. What data may the agent read?

Write down the data sources: folders, projects, customer lists, student information, quotes, invoices, support tickets. If the list feels too sensitive to write down, it is too sensitive to connect casually.

Good first test: start with one source, not “all of Google Drive”.

3. What may the agent change?

Split actions into three levels:

  • Prepare: create drafts, summarize, suggest next steps.
  • Change internally: update a status, create an internal note, add a tag.
  • Affect externally: send email, make purchases, publish, change customer or finance data.

The agent can often do level one early. Level two needs a clear log. Level three should keep human approval until you have measured error risk and recovery.

4. Where is prompt injection stopped?

If the agent reads web pages, PDFs, emails or customer text, it can encounter instructions that try to steer it. Ask the vendor how that is detected and what happens when something looks suspicious.

Good first test: run a scenario where a web page includes obvious “ignore previous instructions” language and see whether the agent stops, warns or continues.

5. Where are the log and rollback plan?

You need to answer three questions afterwards:

  • What did the agent try to do?
  • Which human approved the action?
  • How do we restore the workflow if the result was wrong?

If the answer is “we can search the chat history”, the workflow is not ready for sensitive work.

Start with one workflow where the risk is visible

Do not choose the most impressive use case first. Choose a recurring workflow where it is clear what the agent reads, what it proposes and who approves the result.

Good starting points:

  • a draft invoice reminder approved by the finance owner
  • a support-ticket summary where no customer receives an automatic reply
  • a missing-document list before onboarding
  • an internal project-folder update for a weekly meeting
  • a browser-based check that only flags deviations, not changes data

This is a natural Mindset Forge or Tool Forge engagement: map accounts, data sources, approvals and logging before the automation gets more freedom. If that sounds like your next step, Hammer can help turn it into a simple agent policy and a first safe pilot workflow.