AI Enablement Radar week 25: measure the agent before it gets the keys

The clearest signal this week is not glamorous: AI agents are getting more than better tools. They are getting registries, cost metrics, security checks and clearer rules for when they are allowed to act. For a Nordic small team, a school or an organization without a large IT department, that means less magic and more operations: do not start with the smartest model. Start with the task the agent may touch, how you measure whether it did the job well, and who can stop it.
Top signals this week
- Microsoft framed its enterprise AI message as "Intelligence + Trust": AI should amplify the organization's own knowledge, while giving leaders governance, security, cost control and model flexibility. That is a useful summary of where the market is moving.
Source: Achieving success with AI, Microsoft
- Microsoft's Work IQ APIs became generally available on June 16. Work IQ is designed to give agents context from Microsoft 365. Microsoft cites three figures: an average Work IQ data footprint of more than 600 TB among Fortune 500 customers, ten generic tools offered through MCP, and 80% fewer tokens in internal tests. MCP, Model Context Protocol, is a way for AI clients to connect to tools and data sources without every integration being built from scratch.
Source: Announcing the new Work IQ APIs, Microsoft 365 Blog
- GitHub made agent discovery practical with Agent finder for GitHub Copilot. Instead of loading every MCP server, skill and tool up front, Copilot can find the right resource from a chosen registry. The important part: GitHub says tools are not installed automatically, and enterprises can govern which resources may be discovered.
Source: Agent finder for GitHub Copilot now available, GitHub Changelog
- GitHub also added daily AI credit consumption per user to the Copilot usage metrics API. It sounds dry, but this is exactly the kind of measurement that turns AI adoption into budget and capacity management instead of a feeling.
Source: AI credits consumed per user now in the Copilot usage metrics API, GitHub Changelog
- Workday launched Agent Passport to test, verify and continuously monitor AI agents. Each agent can be linked to tests against standards such as OWASP LLM Top 10, NIST AI RMF and MITRE ATLAS, with Cisco as the first testing partner. This is the passport metaphor made operational: who is the agent, what has it been tested for, and when should access be revoked?
Source: Workday Launches Agent Passport, Workday
- Databricks reports from its customer base that organizations using AI governance tools push more than 12 times more AI projects into production, and organizations using evaluation tools move nearly 6 times more AI systems into production. Evals, or evaluations, are repeatable tests that show whether an AI solution still works when prompts, models, data or tools change.
Source: Enterprise AI Agent Trends, Databricks
What organizations are actually doing with AI
What stands out is not one demo. It is the way large platforms are moving AI out of the chat tab and into existing work surfaces: Microsoft 365, GitHub, Workday, Shopify and data platforms. An agentic workflow here means the AI does not only answer. It plans, retrieves context, uses tools, leaves a trace and sometimes proposes or performs the next step.
In Workday's case, the focus is HR, finance and IT. Agent-Ready Tools should let agents retrieve records, update benefits or trigger approvals, while keeping Workday's security model, delegation model, business process controls and audit trail around the action. The practical lesson for smaller teams is simple: do not give the agent a loose API key and hope. Put actions behind tools that already have access control, logging and approval.
Source: Workday Launches New Tools for Developers to Build, Connect, and Verify AI Agents, Workday
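A minimal sketch of that pattern, assuming hypothetical tool names and an in-memory audit log. It does not reflect Workday's API. The idea is only that every action passes through one gate that checks an allow-list, records the attempt, and holds write actions until a named human approves them.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical policy: tools the agent may call freely, and tools that need approval.
READ_ONLY_TOOLS = {"get_record"}
APPROVAL_TOOLS = {"update_benefits", "trigger_approval"}


@dataclass
class ActionGate:
    audit_log: list = field(default_factory=list)

    def request(self, agent_id: str, tool: str, approved_by: Optional[str] = None) -> str:
        """Return 'allowed', 'needs_approval' or 'denied', and log the decision."""
        if tool in READ_ONLY_TOOLS:
            decision = "allowed"
        elif tool in APPROVAL_TOOLS:
            decision = "allowed" if approved_by else "needs_approval"
        else:
            decision = "denied"  # anything not listed is denied by default
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "agent": agent_id,
            "tool": tool,
            "approved_by": approved_by,
            "decision": decision,
        })
        return decision


gate = ActionGate()
print(gate.request("hr-agent", "get_record"))
print(gate.request("hr-agent", "update_benefits"))
print(gate.request("hr-agent", "update_benefits", approved_by="anna"))
```

Denying by default matters more than the details: a tool nobody listed should fail loudly, not run quietly.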
In commerce, the same shift appears from another angle. Shopify says millions of merchants can sell in AI chats through Agentic Storefronts, with channels such as ChatGPT, Microsoft Copilot, AI Mode in Google Search and the Gemini app managed centrally from Shopify Admin. For a small online store, the point is not to chase every new AI channel. The point is to make product data, prices, delivery rules and customer promises clear enough for an AI assistant to quote correctly.
Source: Millions of merchants can sell in AI chats, Shopify
Shopify and Google have also presented Universal Commerce Protocol, UCP, as an open way for AI agents to connect to commerce systems. Shopify mentions support for REST, MCP, Agent Payments Protocol and Agent2Agent. That is another signal that AI adoption is integration work: product catalog, payment, returns policy and customer data need to fit together.
Source: The agentic commerce platform, Shopify
The tooling layer: platforms, agents, and workflows
The useful pattern for small organizations is that tool vendors are starting to show the same building blocks: registries of approved resources, usage measurement, run traces and instructions close to the work surface.
GitHub's Agent finder is one piece. An internal registry can be as simple as a list of which MCP servers, templates, prompts and data sources are approved for different tasks. When the agent needs booking data, CRM information or project files, it should find the right path instead of guessing.
Source: Agent finder for GitHub Copilot now available, GitHub Changelog
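As a rough illustration of how small such a registry can be, here is a sketch with made-up task names and resource identifiers. It is not how Agent finder works internally. It only shows the "find the approved path or stop" behavior.

```python
# Hypothetical internal registry: task type -> approved resources.
APPROVED_RESOURCES = {
    "booking": ["mcp:booking-server", "template:booking-reply"],
    "crm": ["mcp:crm-readonly"],
    "project_files": ["mcp:files-readonly", "prompt:project-summary"],
}


def find_resources(task: str) -> list:
    """Return the approved resources for a task, or raise instead of guessing."""
    try:
        return list(APPROVED_RESOURCES[task])
    except KeyError:
        raise LookupError(f"No approved resources registered for task {task!r}") from None


print(find_resources("crm"))
```

An unknown task raises an error on purpose, so a missing registry entry becomes a decision for a human and not a guess by the agent.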
GitHub's security validation for third-party agents is another. Claude and OpenAI Codex can create code in repositories, and GitHub says the same automatic validation with CodeQL, GitHub Advisory Database and secret scanning now applies to third-party agents. If an agent introduces a vulnerability or a secret, that should be caught before the pull request is finalized.
Source: Security validation for third-party coding agents, GitHub Changelog
OpenAI points in the same direction in its documentation: use the Responses API when one model call with tools is enough, but use the Agents SDK when your application owns orchestration, tool execution, approvals and state. That boundary is useful. Not everything needs to become a large agent system. But when the agent has to take several steps and use real tools, the run needs to be coded, measurable and pausable.
Source: Agents SDK, OpenAI API docs
OpenAI's guidance on agent evals starts with traces: review the full run, tool calls, handoffs and guardrails before you build repeatable test datasets. That is a good start even without an expensive platform. Save three representative runs, score them against a short checklist, and do not change the prompt without comparing the new results against those saved runs.
Source: Evaluate agent workflows, OpenAI API docs
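The checklist approach can be sketched in a few lines. The run format, the check names and the allowed tool names below are all assumptions for illustration, not part of OpenAI's tooling.

```python
# Hypothetical checklist: each check takes one saved run and returns True or False.
CHECKLIST = {
    "cites_a_source": lambda run: len(run["sources"]) > 0,
    "stays_in_allowed_tools": lambda run: set(run["tools_called"]) <= {"search_docs", "read_crm"},
    "not_sent_without_approval": lambda run: run["approved"] or not run["sent"],
}


def score_run(run: dict) -> float:
    """Fraction of checklist items that a single saved run passes."""
    passed = sum(1 for check in CHECKLIST.values() if check(run))
    return passed / len(CHECKLIST)


def compare_prompts(baseline_runs: list, candidate_runs: list) -> dict:
    """Average score per prompt version, so a change is compared and not assumed."""
    def average(runs):
        return sum(score_run(r) for r in runs) / len(runs)

    base, cand = average(baseline_runs), average(candidate_runs)
    return {"baseline": base, "candidate": cand, "regressed": cand < base}


good = {"sources": ["faq.md"], "tools_called": ["search_docs"], "approved": True, "sent": True}
bad = {"sources": [], "tools_called": ["send_email"], "approved": False, "sent": True}
print(compare_prompts([good, good, good], [good, good, bad]))
```

Three runs will not prove much statistically, but they are enough to notice when a prompt change makes an agent reach for a tool it should not touch.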
Governance and risk: what needs to be in place before scaling
AI governance is not a binder nobody reads. In practice, it is a set of decisions about which AI systems you use, which data they can see, which actions they can take, how logs are kept, who approves exceptions and how you measure quality over time.
The EU AI Act remains the baseline for European organizations. It classifies AI systems by risk and sets stricter requirements for high-risk uses, including risk management, documentation, logging, information to users, human oversight, robustness, cybersecurity and accuracy. For a school, hiring process or public service, it is not enough to ask "does the prompt work?" Also ask whether the use case falls into a risk category that needs stronger control.
Source: AI Act, European Commission
For general-purpose AI models, the EU General-Purpose AI Code of Practice has become the practical bridge between legal text and provider work. The Commission describes the code as a voluntary tool to help providers comply with AI Act rules on transparency, copyright, safety and systemic risks for GPAI models.
Source: Drawing-up a General-Purpose AI Code of Practice, European Commission
The NIST AI Risk Management Framework is still a useful practical map, even for smaller organizations. Its four core functions are Govern, Map, Measure and Manage: set ownership, identify risks in context, measure them and act on them. NIST also notes 2026 work on an AI RMF profile for critical infrastructure, which says something about where the bar is moving: AI controls should connect to real operating environments, not just policy documents.
Source: AI Risk Management Framework, NIST
The OWASP GenAI Security Project is more technical, but useful once agents get tools. Prompt injection, insecure output handling and data leakage are not abstract risks if the agent can read files, write code or send customer replies. That is why environment variables, secret managers, scoped API keys, least-privilege permissions, redaction, approval gates and audit logs need to be part of the workflow from the start.
Source: OWASP Top 10 for Large Language Model Applications
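Two of those controls fit in a short sketch: redacting text before it reaches a prompt or a log, and reading a scoped key from an environment variable instead of hard-coding it. The patterns and the variable name `CRM_READONLY_KEY` are illustrative assumptions. Real redaction needs patterns tuned to your own data.

```python
import os
import re

# Illustrative patterns only; they will miss many real-world formats.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
API_KEY = re.compile(r"\bsk-[A-Za-z0-9]{8,}\b")


def redact(text: str) -> str:
    """Replace e-mail addresses and key-like strings with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return API_KEY.sub("[SECRET]", text)


def load_scoped_key(name: str = "CRM_READONLY_KEY") -> str:
    """Read a key from the environment; the variable name is hypothetical."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; refusing to fall back to a hard-coded key")
    return key


print(redact("Contact anna@example.com, key sk-abcdef123456"))
```

Failing when the variable is missing is deliberate: a default key in the source code is exactly the kind of secret that ends up in a repository.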
ISO/IEC 42001 gives a more organization-level frame: an AI management system. For a small team, the full standard may be too heavy, but the mindset is useful. Decide on policy, ownership, risk assessment, follow-up and an improvement cycle before AI becomes everyone's private experiment.
Source: ISO/IEC 42001:2023, ISO
This week's practical Hammer test
Choose one recurring task where AI is already used informally: customer replies, proposal drafts, lesson planning, code review, meeting follow-up or research. Spend 30-45 minutes making it measurable.
- Write the task as a work card: goal, which sources the AI may use, which tools it may touch, and what it must never do without a human.
- Create three test cases: one easy, one normal and one awkward. Use real formats, but remove anything that should not sit in the prompt.
- Run the AI and save the result, the sources it used, which tools it tried to reach and what a human changed.
- Set two stop points: one for quality and one for access. Example: "do not send a customer reply without approval" or "read from CRM but do not write back".
- Choose one simple measurement for next week: time saved, errors caught, missing sources, cost per run or number of cases needing human correction.
This is a small Tool Forge test: not a large implementation, but a controlled agent run with a registry, log and review. If you want to build it with Hammer, Tool Forge can help shape the first version with the right permissions and stop points.
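The steps in the test can be kept in a spreadsheet, but they also fit in a few lines of code. This is a sketch with assumed field names and example values, not a Tool Forge format: a work card, one record per test run, a check for the access stop point, and a summary for next week's review.

```python
from dataclasses import dataclass


@dataclass
class WorkCard:
    goal: str
    allowed_sources: list
    allowed_tools: list
    never_without_human: list  # actions that always need a person


@dataclass
class RunRecord:
    case: str              # "easy", "normal" or "awkward"
    sources_used: list
    tools_attempted: list
    human_corrected: bool
    cost: float = 0.0      # in whatever unit you track


def access_violations(card: WorkCard, run: RunRecord) -> list:
    """Tools the agent tried to reach that the work card does not allow."""
    return [t for t in run.tools_attempted if t not in card.allowed_tools]


def weekly_summary(card: WorkCard, runs: list) -> dict:
    """A small set of measurements for next week's review."""
    return {
        "runs": len(runs),
        "needing_correction": sum(r.human_corrected for r in runs),
        "access_violations": sum(len(access_violations(card, r)) for r in runs),
        "cost_per_run": sum(r.cost for r in runs) / len(runs),
    }


card = WorkCard("Draft customer replies", ["faq"], ["read_crm"], ["send_reply"])
runs = [
    RunRecord("easy", ["faq"], ["read_crm"], False, 0.02),
    RunRecord("normal", ["faq"], ["read_crm"], True, 0.04),
    RunRecord("awkward", [], ["read_crm", "write_crm"], True, 0.06),
]
print(weekly_summary(card, runs))
```

In this made-up example the awkward case is the one that tries to write back to the CRM, which is the kind of attempt the access stop point is there to surface.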
Companies and tools to watch
- Microsoft Work IQ and Agent 365: show how enterprise agents are moving toward context, governance and cost control inside Microsoft 365.
- GitHub Copilot: agent discovery, AGENTS.md, usage metrics and third-party validation make coding agents more operational.
- Workday Agent Passport: an interesting model for giving every agent a verifiable security and control profile.
- Databricks: its governance and eval numbers show that production is less about demos and more about measurement.
- Shopify Agentic Storefronts and UCP: an important signal for online stores as purchases begin in AI chats instead of on the homepage.
FAQ
What is the main AI signal in week 25 of 2026?
AI agents are moving from standalone chat into measurable, governed workflows with registries, cost metrics, security checks and human stop points.
What does MCP mean for small teams?
MCP, Model Context Protocol, makes it easier to connect AI to tools and data sources. Small teams should use it with approved servers, scoped permissions, logs and clear rules for what the agent may do.
How can we test an AI agent without a large project?
Choose one recurring task, create three test cases, log sources and tool calls, set one quality stop point and one access stop point, then measure the result next week.