Your AI agent is about to do something. Maybe it's calling an API. Maybe it's sending an email. Maybe it's querying a database, modifying a document, or triggering a workflow that affects a real customer, a real patient, or a real financial transaction.
Before any of that happens, something needs to decide whether the action is allowed.
Not after. Not in a log review next Tuesday. Before the action executes — while there's still time to block it, redirect it, or escalate it to a human.
This post covers five policies that every organization deploying AI agents should have in place before those agents start taking actions. These aren't abstract governance principles. They're concrete, enforceable rules — the kind you can implement in a policy engine and evaluate against every agent action in real time.
Policy 1: Restrict Which Tools an Agent Can Use
An AI agent's capabilities are defined by its tools. A customer support agent with access to the ticketing system, the knowledge base, and the CRM can handle support requests. Give that same agent access to the billing system, the HR database, and the production infrastructure, and you've created something very different — an agent that can issue refunds, read employee records, and modify live systems.
Most agent frameworks default to broad access. The developer defines a set of tools, the agent gets access to all of them, and the assumption is that the agent's instructions will keep it within appropriate boundaries. This assumption fails regularly. Prompt injection, hallucinated tool calls, ambiguous instructions, and simple misconfiguration all lead to agents attempting actions outside their intended scope. When the only thing between the agent and a dangerous action is the agent's own judgment about what it should do, you don't have governance. You have hope.
Tool restriction policies define the boundary explicitly. For each agent identity, you specify which tools it's authorized to use. Any tool call outside that boundary is blocked before execution — regardless of why the agent attempted it, regardless of whether the specific parameters look reasonable, regardless of what the agent's instructions say.
What the policy looks like in practice:
Agent: "customer-support-bot"
Allowed tools:
✅ zendesk.tickets.read
✅ zendesk.tickets.reply
✅ zendesk.tickets.update_status
✅ knowledge_base.search
✅ knowledge_base.read_article
Denied tools (everything else, including):
❌ stripe.refunds.create
❌ stripe.subscriptions.cancel
❌ crm.contacts.delete
❌ email.send_external
❌ slack.post_message
This is the AI agent equivalent of the principle of least privilege — the same concept that governs IAM roles in cloud infrastructure, file permissions in operating systems, and access control in databases. The agent gets the minimum set of tools required for its task. Everything else is denied by default.
Why this matters beyond security: Tool restriction also catches drift. As agent systems evolve, developers add new tools to the toolkit — a new API integration, a new database connector, a new communication channel. Without explicit tool policies, every new tool is automatically available to every agent. Tool restriction forces a deliberate decision: does this agent need this capability? If yes, add it to the allow list. If no, the agent never sees it.
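A deny-by-default allow-list check is simple enough to sketch directly. The following is a minimal illustration, not any particular framework's API; the agent identity, tool names, and return shape are all hypothetical:

```python
# Hypothetical allow-list check for agent tool calls (names are illustrative).
ALLOWED_TOOLS = {
    "customer-support-bot": {
        "zendesk.tickets.read",
        "zendesk.tickets.reply",
        "zendesk.tickets.update_status",
        "knowledge_base.search",
        "knowledge_base.read_article",
    },
}

def check_tool_access(agent_id: str, tool_name: str) -> dict:
    """Deny by default: unknown agents and unlisted tools are both blocked."""
    allowed = ALLOWED_TOOLS.get(agent_id, set())
    if tool_name in allowed:
        return {"decision": "ALLOW"}
    return {
        "decision": "BLOCK",
        "reason": f"{tool_name} is not in the approved tool list for {agent_id}",
    }

print(check_tool_access("customer-support-bot", "zendesk.tickets.reply"))
print(check_tool_access("customer-support-bot", "stripe.refunds.create"))
```

Note that an agent identity missing from the map gets an empty allow list, so a new agent deployed without an explicit policy can do nothing — the deliberate-decision property described above.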
The rule to implement:
- id: tool-access-control
  description: "Agent can only call tools in its approved list"
  severity: CRITICAL
  type: deterministic
  conditions:
    field: action.tool_name
    operator: NOT_IN
    values: [approved tool list per agent identity]
  enforcement: BLOCK
Policy 2: Block Sensitive Data from Leaving the Session
Agents access data to do their jobs. A patient intake agent reads medical records. A financial advisory agent retrieves account balances and transaction histories. A legal review agent processes contracts with confidential terms. The data access itself is appropriate — the agent needs this information to complete its task.
The risk isn't the access. It's what happens next.
An agent that reads a patient's medication list and then summarizes it for the treating physician is functioning correctly. The same agent reading the same medication list and then including it in an email to an external address — or logging it in a Slack channel, or passing it to another API, or including it in a response that gets cached by a third-party system — has just created a data breach.
Sensitive data exfiltration policies track what data an agent has accessed during a session and restrict what the agent can do with that data afterward. This is session-aware governance — the policy evaluates the current action in the context of what happened earlier in the same workflow.
What this looks like in practice:
Session: agent-session-7721
Agent: patient-intake-agent
Action 1: ehr.patient.read(patient_id: "P-4521")
→ Session updated: data_tags = ["PHI", "medications", "diagnosis"]
→ Policy check: ALLOW
Action 2: document.draft(content: "Patient summary for Dr. Chen...")
→ Session context: PHI accessed
→ Policy check: ALLOW (internal document creation)
Action 3: email.send(to: "admin@external-vendor.com", body: includes patient data)
→ Session context: PHI accessed earlier in session
→ Policy: "Block external transmission when PHI present in session"
→ Policy check: BLOCK
→ Reason: "Session contains PHI. External communication blocked."
The critical detail: Action 3 is blocked not because of anything wrong with the email itself in isolation, but because of what happened in Action 1. The agent accessed PHI earlier in the session. That context — accumulated across the session — triggers the enforcement.
This policy pattern applies to any category of sensitive data, not just healthcare. Financial account numbers, personally identifiable information, confidential contract terms, proprietary source code, credentials — any data classification that your organization considers sensitive can be tracked at the session level and used to restrict subsequent actions.
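The mechanics of session-aware tracking can be sketched in a few lines. This is an illustrative model only — the `Session` class, tag names, and action types are assumptions, not a real engine's API:

```python
# Hypothetical session-aware exfiltration check: data tags accumulate as the
# agent acts, and external-facing actions are blocked once "PHI" is present.
EXTERNAL_ACTIONS = {"email.send_external", "slack.post", "api.call_external"}

class Session:
    def __init__(self, session_id: str):
        self.session_id = session_id
        self.data_tags: set = set()

    def record_access(self, tags: set) -> None:
        """Accumulate data classifications as the agent reads data."""
        self.data_tags |= tags

    def check_action(self, action_type: str) -> str:
        if action_type in EXTERNAL_ACTIONS and "PHI" in self.data_tags:
            return "BLOCK: session contains PHI; external communication blocked"
        return "ALLOW"

session = Session("agent-session-7721")
session.record_access({"PHI", "medications"})
print(session.check_action("document.draft"))       # internal action: allowed
print(session.check_action("email.send_external"))  # blocked by session context
```

The key property is that the verdict depends on accumulated session state, not on the parameters of the current action alone — exactly the Action 1 / Action 3 relationship in the trace above.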
The rule to implement:
- id: phi-session-exfiltration-block
  description: "Block external communications when PHI has been accessed in session"
  severity: CRITICAL
  type: deterministic
  conditions:
    all:
      - field: session.dataTags
        operator: CONTAINS
        value: "PHI"
      - field: action.type
        operator: IN
        values: ["email.send_external", "slack.post", "api.call_external"]
  enforcement: BLOCK
Policy 3: Require Human Approval for High-Risk Actions
Not every agent action should be autonomous. Some actions carry consequences that are expensive, irreversible, or reputationally damaging enough that a human should review them before they execute — even if the action technically complies with every other policy.
The challenge is defining where to draw the line. Block too many actions and the agent becomes useless — every step requires human approval and you've rebuilt a manual workflow with extra steps. Block too few and the agent takes an action that costs real money, affects a real customer, or triggers a real regulatory incident before anyone knows it happened.
The right approach is risk-based thresholds. Define the categories of actions that require approval and the conditions under which approval is triggered. Everything else runs autonomously.
Common approval triggers:
- Financial transactions above a threshold. The agent can process refunds under $200 automatically. Anything above $200 pauses for human approval.
- External communications to new recipients. The agent can reply within an existing ticket thread. Initiating contact with a new external party requires approval.
- Data deletion or modification. The agent can read and create records. Deleting or modifying existing records pauses for review.
- Actions affecting production systems. The agent can operate freely in staging and development environments. Any action targeting production infrastructure requires approval.
- Actions involving minors or vulnerable populations. In healthcare and education contexts, any agent action involving a minor's records triggers mandatory human review.
What the approval flow looks like:
Agent: financial-advisor-agent
Action: stripe.refunds.create(amount: $1,450, customer: "C-8823")
Policy evaluation:
→ Rule: "Refunds over $500 require human approval"
→ Amount: $1,450 > $500
→ Result: APPROVAL_REQUIRED
What happens next:
1. Action is paused (not blocked, not allowed — paused)
2. Approval request created with full context:
- What: Refund of $1,450 to customer C-8823
- Why: Agent determined refund warranted based on complaint
- Session: Link to full action chain showing agent's reasoning
3. Notification sent to designated approver (Slack, email)
4. Approver reviews context and approves or rejects
5. If approved: action executes, approval recorded in audit trail
6. If rejected: agent receives rejection, adjusts workflow
7. If timeout (no response in configured window): default to reject
The timeout default matters more than most teams realize. Without a timeout, a paused action waits indefinitely — which means the agent's workflow hangs, the customer waits, and the approver might not even realize there's something pending. A sensible default — reject after 30 minutes for financial actions, reject after 4 hours for administrative actions, escalate after 24 hours for low-urgency reviews — keeps the system moving.
What this doesn't mean: Human approval is not a substitute for policy enforcement. If an action violates a hard policy — accessing a prohibited tool, exfiltrating sensitive data, exceeding scope boundaries — it should be blocked outright, not queued for approval. Approval gates are for actions that are policy-compliant but high-risk. The agent is allowed to do this. The question is whether it should do it right now, for this specific case, with these specific parameters.
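The threshold check and the timeout default can both be expressed compactly. The sketch below is illustrative — the threshold, timeout window, and function names are assumptions chosen to match the examples above:

```python
# Hypothetical approval gate: compliant-but-high-risk actions pause for a
# human decision, and an expired approval window defaults to reject.
from typing import Optional

APPROVAL_THRESHOLD = 500          # dollars
APPROVAL_TIMEOUT_S = 30 * 60      # 30 minutes for financial actions

def evaluate_refund(amount: float) -> str:
    """Under the threshold runs autonomously; over it pauses for review."""
    return "APPROVAL_REQUIRED" if amount > APPROVAL_THRESHOLD else "ALLOW"

def resolve_approval(requested_at: float, decision: Optional[str],
                     now: float) -> str:
    """decision is None while the approver has not yet responded."""
    if decision is not None:
        return decision                     # "APPROVED" or "REJECTED"
    if now - requested_at > APPROVAL_TIMEOUT_S:
        return "REJECTED"                   # timeout defaults to reject
    return "PENDING"

print(evaluate_refund(120))                       # ALLOW
print(evaluate_refund(1450))                      # APPROVAL_REQUIRED
print(resolve_approval(0.0, None, 31 * 60.0))     # REJECTED (window expired)
```

Defaulting the timeout to reject rather than allow is the conservative choice: a missed notification costs a delayed refund, not an unreviewed $1,450 payout.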
The rule to implement:
- id: financial-approval-gate
  description: "Financial transactions over $500 require human approval"
  severity: HIGH
  type: deterministic
  conditions:
    all:
      - field: action.type
        operator: IN
        values: ["stripe.refunds.create", "stripe.charges.create", "payment.send"]
      - field: action.params.amount
        operator: GT
        value: 500
  enforcement: APPROVAL_REQUIRED
Policy 4: Enforce Scope Limits
Every agent is built for a purpose. A customer support agent handles support tickets. A code review agent reviews pull requests. A data analysis agent queries databases and generates reports. The purpose defines the scope — the set of tasks the agent is designed and authorized to perform.
Scope drift happens when an agent starts doing things outside its intended purpose. Sometimes this is caused by ambiguous instructions. Sometimes by prompt injection. Sometimes by perfectly reasonable-seeming requests from users who don't realize they're asking the agent to exceed its boundaries. "While you're at it, can you also update the customer's billing address?" sounds harmless. If the support agent wasn't designed to modify billing records, it shouldn't — regardless of how politely the user asked.
Scope enforcement is different from tool restriction (Policy 1). Tool restriction controls which tools the agent can use. Scope enforcement controls what the agent is trying to accomplish. An agent might have access to the CRM as a tool — it needs to read customer records to handle support tickets. But using that CRM access to modify billing information, create new accounts, or export customer lists is outside the agent's scope, even though the tool access technically permits it.
What scope enforcement looks like:
Agent: code-review-agent
Defined scope: "Review pull requests for security, compliance, and
code quality issues. Post review comments. Approve or request changes."
Action: github.issues.create(title: "Refactor auth module", body: "...")
→ Policy: "Agent actions must align with defined scope"
→ Evaluation: Creating issues is not within "review pull requests" scope
→ Result: BLOCK
→ Reason: "Action outside agent scope. code-review-agent is authorized
to review PRs and post comments, not create issues."
Action: github.pulls.create_review(body: "Found hardcoded credential on line 42")
→ Policy: "Agent actions must align with defined scope"
→ Evaluation: Posting PR reviews is within defined scope
→ Result: ALLOW
Scope enforcement typically requires semantic evaluation — a deterministic rule can check tool names, but determining whether "create a GitHub issue" falls within the scope of "review pull requests" requires understanding intent. This is where LLM-powered policy evaluation earns its place. The scope definition is compared against the requested action, and the evaluation determines whether the action aligns with the agent's purpose.
Why scope enforcement matters for multi-agent systems: As organizations deploy multiple agents that interact with each other — an intake agent that triggers a processing agent that triggers a notification agent — scope enforcement prevents cascade failures. If Agent A can ask Agent B to do something outside B's scope, and B complies because it has the tool access, scope enforcement at Agent B catches the violation. Each agent enforces its own boundaries regardless of who made the request.
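The control flow around a semantic check can be sketched even though the evaluation itself is an LLM call. In the sketch below the evaluator is a stub that hard-codes the two verdicts from the code-review trace above; in a real system it would send the assembled prompt to a model and parse a yes/no verdict. Everything here — the scope string, function names, and stub — is illustrative:

```python
# Hypothetical semantic scope check. The evaluator is pluggable so that an
# LLM call can be swapped in; a stub stands in to make the flow visible.
from typing import Callable

SCOPE = ("Review pull requests for security, compliance, and code quality "
         "issues. Post review comments. Approve or request changes.")

def check_scope(action: str, params: dict,
                evaluate: Callable[[str], bool]) -> str:
    prompt = (f"Agent scope definition: {SCOPE}\n"
              f"Requested action: {action} with parameters {params}\n"
              "Does this action fall within the agent's defined scope? "
              "Consider the intent of the action, not just the tool used.")
    return "ALLOW" if evaluate(prompt) else "BLOCK: action outside agent scope"

def stub_llm(prompt: str) -> bool:
    # Stand-in verdict: only PR reviews are in scope for this agent.
    return "github.pulls.create_review" in prompt

print(check_scope("github.issues.create",
                  {"title": "Refactor auth module"}, stub_llm))
print(check_scope("github.pulls.create_review",
                  {"body": "Found hardcoded credential"}, stub_llm))
```

Keeping the evaluator behind a function boundary also makes the semantic layer testable: deterministic stubs in CI, a real model in production.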
The rule to implement:
- id: agent-scope-enforcement
  description: "Agent actions must align with defined operational scope"
  severity: HIGH
  type: semantic
  evaluation:
    prompt: |
      Agent scope definition: {{entity.scope_description}}
      Requested action: {{action.type}} with parameters {{action.params}}
      Does this action fall within the agent's defined scope?
      Consider the intent of the action, not just the tool being used.
      A tool that is available to the agent may still be used in ways
      that exceed the agent's authorized purpose.
  enforcement: BLOCK
Policy 5: Log Every Decision for Full Auditability
The first four policies are about prevention — stopping agents from doing things they shouldn't. The fifth policy is about evidence — proving that governance was applied, consistently, to every action the agent took.
This isn't optional for regulated industries. If your AI agents make decisions that affect patients, financial transactions, legal outcomes, or customer data, regulators will ask how you governed those decisions. "We have a policy" is not sufficient. They want to see that the policy was evaluated, what the result was, and what happened afterward. They want the complete chain of evidence from the agent's first action to its last.
A comprehensive audit policy requires logging at four levels:
Action-level logging. Every action the agent attempts — tool call, API request, data access, communication — is recorded with the action type, target, parameters, timestamp, and the identity of the agent that attempted it.
Evaluation-level logging. Every policy evaluation is recorded with the policies that were checked, the rules that were triggered (or not), the evaluation result (allow, warn, block, approval required), and the explanation for the result. This includes the policy version — so you can prove which rules were active at the time of the evaluation.
Session-level logging. The full action chain for each agent session is preserved as an ordered sequence. This is the decision chain — the complete record of what the agent did, in what order, with what results. Session metadata includes the accumulated context (data tags, tool usage, action count) that informed cross-action policy evaluations.
Resolution-level logging. When a violation occurs, the resolution is recorded: who was assigned, when they acknowledged it, what investigation was performed, what the root cause was, and how it was resolved. This closes the loop — from detection through enforcement through remediation.
What the audit trail looks like for a single session:
Session: session-9f2a-4d1e
Agent: loan-underwriting-agent
Entity: Borrower B-7234
Started: 2026-01-14T14:23:07Z
Status: COMPLETED
Action Chain:
#1 14:23:08 credit_bureau.pull(borrower: "B-7234")
Policies evaluated: [data-access-controls, pii-handling]
Result: ALLOW
#2 14:23:11 document.read(file: "borrower-income-verification.pdf")
Policies evaluated: [data-access-controls, document-handling]
Result: ALLOW
#3 14:23:14 underwriting.evaluate(
credit_score: 682, dti: 41%, ltv: 88%,
loan_type: "conventional", property: "SFR")
Policies evaluated: [fannie-mae-guidelines, risk-thresholds]
Result: WARN — DTI exceeds 43% guideline for conventional loans
#4 14:23:15 underwriting.decision(recommendation: "approve_with_conditions")
Policies evaluated: [approval-authority, risk-thresholds]
Result: APPROVAL_REQUIRED — flagged for human review due to DTI warning
#5 14:41:22 [Human approver reviewed and approved with condition:
"Compensating factor: 12 months reserves documented"]
#6 14:41:23 document.generate(type: "conditional_approval_letter")
Policies evaluated: [document-standards, borrower-communications]
Result: ALLOW
Session summary:
Actions: 6
Violations: 0
Warnings: 1 (DTI threshold)
Approvals requested: 1 (granted with condition)
Duration: 18 minutes (including 17 min human review)
Data accessed: [credit_report, income_verification, property_appraisal]
This single session record answers every question an auditor would ask: What data did the agent access? What decisions did it make? What rules were applied? Where did a human intervene? What was the outcome? The evidence isn't assembled after the fact — it's generated as a natural output of the enforcement pipeline.
The rule to implement:
This isn't a single policy rule — it's an infrastructure requirement. Every evaluation must write to the audit trail. Every session must preserve its action chain. Every violation must track its resolution lifecycle. The logging isn't something you opt into per policy. It's a property of the governance system itself.
That said, you can define policies about the audit trail:
- id: session-logging-required
  description: "All agent sessions must have complete action chain logging"
  severity: CRITICAL
  type: deterministic
  conditions:
    field: session.logging_enabled
    operator: EQUALS
    value: false
  enforcement: BLOCK
  note: "If an agent session cannot be fully logged, no actions are permitted"
Putting It All Together
These five policies aren't independent checkboxes. They form a layered defense where each policy catches risks that the others miss:
Tool restriction (Policy 1) defines the outer boundary — what the agent can touch.
Data exfiltration prevention (Policy 2) governs what happens to sensitive information the agent accesses within that boundary.
Human approval gates (Policy 3) add judgment to high-stakes decisions that are technically within policy but consequential enough to warrant review.
Scope enforcement (Policy 4) ensures the agent stays focused on its intended purpose, even when it has the tools and access to do more.
Audit logging (Policy 5) provides the evidence that all four preceding policies were evaluated, enforced, and documented for every action.
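The layering can be sketched as a short evaluation pipeline: checks run in order, every verdict is logged (Policy 5), and the first non-ALLOW verdict short-circuits. The check functions and action fields below are simplified stand-ins for the five policies, not a real engine:

```python
# Hypothetical layered pipeline: each check is one policy, every evaluation
# is logged, and the first BLOCK or APPROVAL_REQUIRED stops the chain.
audit_log = []

def evaluate(action: dict, checks) -> str:
    for check in checks:
        verdict = check(action)
        audit_log.append((action["type"], check.__name__, verdict))  # Policy 5
        if verdict != "ALLOW":
            return verdict
    return "ALLOW"

def tool_allowed(action):      # Policy 1: allow-list (illustrative set)
    return "ALLOW" if action["type"] in {"ticket.reply", "refund.create"} else "BLOCK"

def no_exfiltration(action):   # Policy 2: session data tags
    tags = action.get("session_tags", ())
    return "BLOCK" if action.get("external") and "PHI" in tags else "ALLOW"

def approval_gate(action):     # Policy 3: financial threshold
    return "APPROVAL_REQUIRED" if action.get("amount", 0) > 500 else "ALLOW"

checks = [tool_allowed, no_exfiltration, approval_gate]
print(evaluate({"type": "ticket.reply"}, checks))                   # ALLOW
print(evaluate({"type": "refund.create", "amount": 1450}, checks))  # APPROVAL_REQUIRED
print(evaluate({"type": "crm.delete"}, checks))                     # BLOCK
```

Ordering matters: hard boundaries (tools, exfiltration, scope) run before approval gates, so a prohibited action is blocked outright rather than queued for a human — the distinction Policy 3 draws above.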
An agent that passes all five policies is operating within its authorized tools, handling sensitive data appropriately, escalating high-risk decisions to humans, staying within its defined scope, and generating a complete audit trail. That's not a perfectly governed agent — perfection doesn't exist in governance any more than it exists in security. But it's an agent that an organization can deploy with confidence, defend to a regulator, and investigate when something goes wrong.
The organizations deploying agents without these policies aren't moving faster. They're accumulating risk that compounds with every autonomous action taken without enforceable rules. The first incident — the wrong email sent, the unauthorized refund processed, the sensitive data exposed — will cost more to remediate than the governance infrastructure would have cost to implement.
Start with these five. Enforce them before the agent takes its next action. Build from there.