AI Enablement Radar week 20: make AI measurable before giving it more tools

This week’s pattern is pretty clear: AI is moving away from "try a chatbot" and toward "connect it to a real workflow, but measure and fence it first." For a small Nordic team, that does not mean building an AI department. It means picking one narrow workflow, deciding what AI may read and do, and writing down how you will know whether it saves time without creating new risk.
Top signals this week
- ServiceNow and Robinhood show why embedded AI beats sidecars. ServiceNow says AI, data, security, governance, and workflow execution are now included across its product portfolio. Robinhood says ServiceNow AI deflects 70 percent of employee requests before a human needs to step in, saving 2,200 manual hours across 1,300 tickets each month.
Source: ServiceNow press release
- Google Cloud points to agentic workflows, not just assistants. An agentic workflow is a multi-step workflow where AI can plan, retrieve information, and suggest or perform the next step under human control. Google highlights Telus, Suzano, and Danfoss as examples: Telus reports 57,000 employees using AI, Suzano cut query time by 95 percent, and Danfoss reduced order email response time from 42 hours to near real time.
Source: Google Cloud 2026 AI Agent Trends Report summary
- Databricks turns evals and governance into production questions. Evals are repeated tests of AI outputs against expected results, a kind of quality control for a workflow. Databricks reports that organizations using AI governance move over 12 times more AI projects into production, and organizations using evaluation tools move nearly 6 times more AI systems into production.
Source: Databricks Enterprise AI Agent Trends
- Coding agents are becoming workspaces. OpenAI Codex added mobile remote control through a Mac running the Codex app, generally available hooks, access tokens, and more admin documentation. GitHub also released the Copilot app in technical preview, with isolated sessions started from issues, pull requests, and previous work.
Source: OpenAI Codex changelog
Source: GitHub Copilot app technical preview
- The EU and NIST are making risk work more concrete. The EU AI Act page was updated on 11 May with risk levels, banned uses, and support resources. The EU GPAI guidelines explain who is covered by obligations for general-purpose AI models, while NIST continues to extend the AI RMF with a critical infrastructure profile.
Source: European Commission AI Act framework
Source: European Commission GPAI provider guidelines
Source: NIST AI Risk Management Framework
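The Databricks item above describes evals as repeated tests of AI outputs against expected results. A minimal sketch of that idea follows. It is not Databricks tooling: the test cases, the keyword check, and `stub_workflow` (a stand-in for a real AI call) are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical test cases: an input text and keywords the output must contain.
CASES = [
    {"input": "Customer asks for a quote on 40 valves.", "must_include": ["quote"]},
    {"input": "Employee cannot log in to the payroll system.", "must_include": ["login", "payroll"]},
]


def run_evals(workflow: Callable[[str], str], cases: list[dict]) -> float:
    """Run every case through the workflow and return the pass rate (0.0 to 1.0)."""
    if not cases:
        raise ValueError("At least one test case is required.")
    passed = 0
    for case in cases:
        output = workflow(case["input"]).lower()
        # A case passes only if every expected keyword appears in the output.
        if all(keyword in output for keyword in case["must_include"]):
            passed += 1
    return passed / len(cases)


def stub_workflow(text: str) -> str:
    """Stand-in for a real AI call (assumption); it only echoes the input."""
    return f"Summary: {text}"


print(f"Pass rate: {run_evals(stub_workflow, CASES):.0%}")  # prints "Pass rate: 50%"
```

The second case fails because the stub writes "log in" rather than "login". That is the kind of small mismatch a repeated test surfaces before a workflow goes live.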
What organizations are actually doing with AI
The best examples this week are boring in the right way. They are about order emails, internal tickets, database questions, and reports. That is where a smaller organization should start too.
Google Cloud describes Danfoss using AI agents for email-based order processing, with 80 percent of transactional decisions automated and response time down from 42 hours to near real time. For a Swedish wholesaler, consultancy, or school administration, the translation is simple: choose an input flow that is already repetitive, such as quotes, absence cases, or support emails. Let AI sort, summarize, and draft. Let a human approve before anything is sent.
Source: Google Cloud 2026 AI Agent Trends Report summary
The ServiceNow and Robinhood example is worth watching too. The point is not that small teams should buy a large platform tomorrow. The point is that AI starts to matter when it sits inside ticket handling, with the right data, permissions, and logging. If AI sits next to the workflow in a loose chat, you often get faster wording but weaker control.
Source: ServiceNow press release
Deloitte puts broader numbers on the same problem. Its report says worker access to AI rose by 50 percent in 2025, but the AI skills gap remains the biggest barrier to integration. Only one in five organizations has a mature governance model for autonomous AI agents. That sounds like enterprise data, but it may matter even more for small teams: without habits and rules, AI becomes a pile of personal shortcuts nobody else can review.
Source: Deloitte State of AI in the Enterprise 2026
The tooling layer: platforms, agents, and workflows
This week’s tooling signal is that agent tools are starting to look like workrooms. The GitHub Copilot app is built around isolated sessions: a task gets its own branch, files, conversation, and state. That is a healthy direction. When several AI tasks run in parallel, you need to see what is done, what is waiting for a human, and what should never be merged.
Source: GitHub Copilot app technical preview
OpenAI Codex points the same way from a different angle: remote control from mobile, hooks, access tokens, and auto-review documentation. Hooks are small rules or scripts that run when something happens, for example before a change or after a test. For small teams, this is more useful than it sounds. A simple hook can stop AI from touching customer data, run a test before a change moves forward, or remind everyone that a human must approve.
Source: OpenAI Codex changelog
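To make the hook idea concrete, here is a minimal sketch of a check that could run before an AI-proposed change moves forward. It does not use the Codex hooks API. The folder names in `PROTECTED_DIRS` and the functions `pre_change_hook` and `decide` are hypothetical, and how such a script gets wired into a given tool depends on that tool's own documentation.

```python
from pathlib import PurePosixPath

# Hypothetical rule: folders an AI agent must never modify.
PROTECTED_DIRS = ("customers", "exports/personal_data")


def pre_change_hook(changed_paths: list[str]) -> list[str]:
    """Return the changed paths that fall inside a protected folder."""
    blocked = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        for protected in PROTECTED_DIRS:
            # is_relative_to compares whole path components (Python 3.9+),
            # so "customers_old/x.csv" does not match "customers".
            if path.is_relative_to(protected):
                blocked.append(raw)
                break
    return blocked


def decide(changed_paths: list[str]) -> str:
    """Turn the hook result into a message a human can act on."""
    blocked = pre_change_hook(changed_paths)
    if blocked:
        return "BLOCKED: human approval required for " + ", ".join(blocked)
    return "OK"


print(decide(["docs/readme.md", "customers/acme.csv"]))
print(decide(["src/app.py"]))
```

The rule lives in one short list that anyone on the team can read and change, which is most of the value for a small team.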
MCP (Model Context Protocol) is a standard for connecting AI tools to external systems and data sources. GitHub shows both the upside and the risk. Secret scanning through the GitHub MCP Server is now generally available and lets MCP-compatible agents scan code for leaked secrets before a commit or pull request. Good. But the same principle means every connection is a new permission question. What may AI read? What may it change? How is it logged?
Source: GitHub secret scanning with MCP Server
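The three questions above (what may AI read, what may it change, how is it logged) can be written down as a small allowlist with logging. The sketch below is an illustration only. It is not part of MCP or the GitHub MCP Server, and the resource names, action names, and `is_allowed` function are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("ai_access")

# Hypothetical allowlist: the actions an AI tool may perform per connected system.
PERMISSIONS = {
    "ticket_system": {"read"},
    "code_repository": {"read", "scan_secrets"},
    "customer_database": set(),  # connected, but nothing is allowed yet
}


def is_allowed(resource: str, action: str) -> bool:
    """Check the allowlist and log every decision, allowed or not."""
    # Unknown resources fall back to an empty set, so the default is "deny".
    allowed = action in PERMISSIONS.get(resource, set())
    log.info("resource=%s action=%s allowed=%s", resource, action, allowed)
    return allowed


is_allowed("code_repository", "scan_secrets")  # allowed
is_allowed("ticket_system", "write")           # denied: not on the list
is_allowed("customer_database", "read")        # denied: empty permission set
```

Denying by default means a newly connected system grants nothing until someone adds a line to the list, and the log shows what was attempted either way.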
Governance and risk: what needs to be in place before scaling
AI governance means the decisions, roles, and controls that decide how AI may be used. It does not need to start as a heavy policy. For a small team, four things are often enough: approved use cases, data boundaries, human review, and a simple scorecard.
The EU AI Act uses a risk-based model with banned uses, high-risk systems, transparency risks, and minimal risk. For schools, the distinction is especially clear: AI that affects assessment, admission, or student opportunities can sit in a very different risk category than AI helping a teacher draft a first lesson outline.
Source: European Commission AI Act framework
The EU GPAI guidelines and Code of Practice add more clarity around general-purpose AI models. They mainly target model providers, but small organizations should read the signal: documentation, copyright, transparency, and safety practices are becoming procurement questions. Ask vendors what the model may train on, what data is stored, and how they handle systemic risks.
Source: European Commission GPAI provider guidelines
Source: General-Purpose AI Code of Practice
COSO’s guidance on internal control over generative AI is not mainly written for small businesses, but the core point travels well: the risks show up in daily work. Prompt manipulation, opaque reasoning, model drift, and frequent configuration changes can affect reporting, decisions, and compliance. So write down who owns the workflow, who may change the prompt, and how errors are caught.
Source: COSO Achieving Effective Internal Control Over Generative AI
This week's practical Hammer test
This test takes 30 to 45 minutes and needs no new integrations. Pick a real but safe workflow: five old support emails, five old student questions, five incoming quote requests, or five internal tickets. Remove personal data first.
Do this:
- Pick a workflow where the result must always be reviewed by a human.
- Write three rules: what AI may read, what it may suggest, and what it must never do.
- Ask AI to summarize each case and suggest the next step.
- Measure three things: time saved, number of wrong or questionable suggestions, and how often the human had to rewrite the answer.
- Decide after the test: stop, adjust, or build further.
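The three measurements in the list above fit in a small scorecard. The sketch below is one possible way to tally them, with made-up numbers for five cases. The `CaseResult` fields and the `scorecard` function are assumptions for illustration, and the manual time per case is your own estimate.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    minutes_manual: float   # estimated time to handle the case without AI
    minutes_with_ai: float  # time with AI, including the human review
    questionable: bool      # the suggestion was wrong or questionable
    rewritten: bool         # the human had to rewrite the answer


def scorecard(results: list[CaseResult]) -> dict:
    """Summarize time saved, questionable suggestions, and rewrite rate."""
    if not results:
        raise ValueError("At least one case is required.")
    return {
        "cases": len(results),
        "minutes_saved": sum(r.minutes_manual - r.minutes_with_ai for r in results),
        "questionable_suggestions": sum(r.questionable for r in results),
        "rewrite_rate": sum(r.rewritten for r in results) / len(results),
    }


# Made-up example data for five anonymized cases.
results = [
    CaseResult(10, 6, questionable=False, rewritten=False),
    CaseResult(8, 5, questionable=False, rewritten=True),
    CaseResult(12, 12, questionable=True, rewritten=True),
    CaseResult(6, 3, questionable=False, rewritten=False),
    CaseResult(9, 4, questionable=False, rewritten=False),
]

card = scorecard(results)
print(card)
```

With these example numbers the test saves 15 minutes across five cases, with one questionable suggestion and a rewrite rate of 0.4. Whether that means stop, adjust, or build further is still a human decision.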
Copy the prompt:
You are helping us test whether AI can support one narrow workflow without making decisions by itself.
Material: [paste 3-5 anonymized examples]
Rules:
- Use only the material I give you.
- Do not make decisions and do not send anything to a customer, student, or supplier.
- For each example: summarize the situation, suggest the next human action, mark uncertainties, and say what information is missing.
- Finish with a scorecard: what saved time, what became risky, and what should a human always review?
If the test shows real time savings and few risks, it may be the first step toward a Tool Forge style workflow: not a big AI program, just a bounded workflow with measurement and human control. If you want to talk through one of those workflows, get in touch through the contact page.
Companies and tools to watch
- ServiceNow: shows how AI becomes more useful when it is embedded in ticketing and governance rather than sitting in a separate chat.
- Google Cloud and Gemini: a useful source for concrete agent examples in customer service, SQL queries, and order flows.
- Databricks: puts numbers behind why governance and evals are production issues.
- GitHub Copilot and OpenAI Codex: show coding agents becoming workspaces with sessions, hooks, review, and remote control.
- Microsoft Education and OECD: worth following for schools that want to use AI without confusing faster task completion with better learning.
Source: Microsoft Education AI Toolkit
Source: EU Digital Skills & Jobs: OECD Digital Education Outlook 2026