AI Agents Are Becoming Your Middleman—But Here’s the 2-Line Web ‘Handshake’ That Determines Whether They Can Buy, Book, or Break Things

A plain-text file at your domain root still decides what many automated systems can reach—just as agents shift from reading pages to taking actions. The catch: it’s a handshake, not a lock.

By TheMurrow Editorial

May 17, 2026

Key Points

1. Understand robots.txt’s “two-line handshake”: `User-agent` plus `Allow`/`Disallow` can broadly shape what compliant agents can even reach.
2. Treat RFC 9309 as a policy baseline, not protection: robots.txt is advisory, so real security still requires authentication and authorization.
3. Separate bot purposes before blocking: training crawlers, search/retrieval crawlers, and user-triggered browsing may use different user agents and rules.

A plain-text file at the root of your website now sits uncomfortably close to the front door of the AI economy.

For decades, `/robots.txt` was a quiet agreement between publishers and crawlers: a simple set of instructions, checked early, that suggested what a bot should and shouldn’t fetch. It wasn’t glamorous. It rarely made headlines. It also helped keep the web legible at scale.

Now the stakes have changed. AI systems are no longer limited to reading pages for indexing and summaries. They are increasingly positioned as intermediaries that browse on a user’s behalf—and, in some cases, take actions. Booking, buying, changing settings, following links: tasks that turn “access” from an abstract question into a practical one.

The irony is that the web’s most familiar “handshake” with machines remains almost comically small: two lines of text that can allow or block entire classes of automated agents. That minimalism is both its power and its problem.

“A two-line file can decide whether an automated agent gets to read your work—or even reach the pages where actions begin.”
— TheMurrow Editorial

The two-line handshake that still runs the web

The most widely recognized handshake between websites and automated agents is still `/robots.txt`, a plain-text file located at the root of a domain. Reputable crawlers typically check it early, often first, before deciding what to access. The behavior is formalized in the Robots Exclusion Protocol, now an IETF standards-track specification: RFC 9309, published in September 2022. That date matters because it marks the moment a long-standing convention became crisp enough to cite as policy, not folklore.

In practice, many site operators rely on an almost minimalist pattern—a “two-line handshake” that grants or denies broad access:

- `User-agent: *`
- `Allow: /` or `Disallow: /`

Those two directives can permit or forbid a class of crawlers from accessing all paths. That’s not a trick or a hack; RFC 9309 describes how a robots.txt file is composed of groups that begin with one or more `User-agent:` lines, followed by `Allow:` and/or `Disallow:` rules.
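The two variants of the handshake can be checked with Python's standard-library `urllib.robotparser`. This is a minimal sketch, not a reference crawler: the `build_parser` helper, the `ExampleBot` token, and the `example.com` URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The two-line handshake in both of its forms.
ALLOW_ALL = "User-agent: *\nAllow: /\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

allow_all = build_parser(ALLOW_ALL)
block_all = build_parser(BLOCK_ALL)

# "ExampleBot" and the URL are hypothetical.
url = "https://example.com/articles/1"
print(allow_all.can_fetch("ExampleBot", url))  # True
print(block_all.can_fetch("ExampleBot", url))  # False
```

The answer only matters to a client that asks the question. Nothing in this check runs on the publisher's server.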

Why two lines carry so much weight

The web is now full of automated readers with competing goals: indexing, training, search retrieval, and user-triggered browsing. When publishers ask “How do I control this?” robots.txt is often the first lever available—because it is widely supported, lightweight, and understood across the ecosystem.

Yet that convenience also produces overreach. A two-line decision can be blunt: it may keep out a training crawler but also reduce discoverability in certain search-like AI experiences, depending on how a provider uses its crawlers.

The critical limitation: advisory, not enforcement

RFC 9309 is explicit about what robots.txt is—and what it is not. Robots.txt is advisory, meaning it depends on bot compliance. It is not authentication. It is not authorization. It is not a security control.

RFC 9309 also notes that anyone who needs true access control should use real application-layer security—for example, authentication. The file can shape good-faith behavior, but it cannot compel it.

“Robots.txt is a handshake, not a lock.”
— TheMurrow Editorial

RFC 9309: when a convention became an internet standard

Before 2022, robots.txt lived in that familiar space between “everyone does it” and “nobody owns it.” RFC 9309 changed that by formally documenting the syntax and behavior that many crawlers already followed—especially around user-agent matching and how allow/disallow rules should be interpreted.

For publishers and developers, the standardization is more than bureaucratic housekeeping. It means disputes can increasingly be framed in reference to a shared text: what a crawler should do, how it should decide which rule applies, and how groups should be evaluated.

How robots.txt matches and resolves rules

RFC 9309 describes a robots.txt file as a set of groups:

- One or more `User-agent:` lines
- Followed by rules such as `Allow:` and `Disallow:`

Matching is based on the crawler’s declared product token (its user-agent token), compared case-insensitively. Rule resolution is formalized as longest match: among the rules that match a path, the most specific one (the longest) applies, and if an `Allow` and a `Disallow` rule are equivalent, the `Allow` rule should be used. That matters because modern sites often combine broad rules (“Disallow all”) with narrow exceptions (“Allow this path”); under RFC 9309, specificity rather than the order of the lines determines the outcome.
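Longest-match resolution is easier to see in code. The sketch below is a simplified illustration, not a complete RFC 9309 implementation: it assumes plain prefix matching and ignores the `*` and `$` special characters and percent-encoding. The `is_allowed` function and the sample paths are hypothetical.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path prefix), e.g. ("disallow", "/")


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Resolve a path against one group's rules using longest match.

    Simplifying assumption: prefix matching only (no "*" or "$" handling).
    """
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # skip empty and non-matching rules
        is_allow = directive.lower() == "allow"
        # The longer (more specific) match wins; on a tie, prefer Allow.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("disallow", "/"), ("allow", "/articles/")]
reordered = list(reversed(rules))

# Same verdicts regardless of the order in which the rules are listed.
for group in (rules, reordered):
    print(is_allowed(group, "/articles/ai-agents"), is_allowed(group, "/checkout"))
```

Both iterations print `True False`: the narrow `Allow` beats the broad `Disallow` because it is the longer match, not because of where it sits in the file.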

Why standardization matters for “agent middlemen”

When AI systems sit between users and websites, every ambiguity becomes a liability. If a provider says a given user agent respects robots.txt, publishers can evaluate that claim against a clear standard. If a provider distinguishes between “crawler activity” and “user-triggered browsing,” publishers can ask whether the same robots.txt logic should apply—or whether an entirely different control surface is needed.

The standard doesn’t solve those policy questions, but it clarifies the technical base layer. That clarity is a necessary precondition for the harder governance debate now arriving.

September 2022

The month RFC 9309 was published, turning robots.txt from convention into a citable internet standard.

When agents shift from reading to doing, access becomes risk

The biggest change is not that bots exist. The change is what they can do once they arrive.

OpenAI has described the security risks that emerge when systems retrieve web content and could be manipulated by crafted pages or links—especially around prompt-injection and URL-based data exfiltration attempts. These are not abstract concerns. They reflect a practical reality: when an agent “browses,” the content it reads can try to influence what it does next.

The “agent as browser” threat model

Traditional crawlers are designed to fetch, parse, and index. They don’t usually carry a user’s intent into a sensitive workflow. Agentic systems do. When an AI tool visits a URL because a user asked it to, the visit can be part of a larger sequence: read, decide, click, submit, purchase.

That creates new risk surfaces, including:

- Prompt injection embedded in web pages that tries to override instructions
- Data exfiltration attempts that entice an agent to reveal or transmit sensitive information
- Confusion between a trusted tool instruction and a malicious web instruction

OpenAI’s link-safety discussion frames these risks as central to agent design: if models retrieve web content, they can be tricked into leaking or mishandling data unless mitigations are in place.

Why robots.txt becomes more consequential—without becoming stronger

Robots.txt didn’t suddenly become enforcement. Its power hasn’t changed. What changed is the value of what lies behind the door.

If an agent is only indexing public pages, a permissive robots policy is a visibility choice. If an agent might follow links into transactional flows, the same permissive policy can become a safety choice. Publishers face a modern dilemma: how to remain discoverable without exposing sensitive surfaces to automated browsing in ways they didn’t anticipate.

“As soon as software can act, ‘who is allowed to fetch’ turns into ‘who is allowed to affect.’”
— TheMurrow Editorial

The bot taxonomy problem: crawler, search, training, or user-triggered fetch?

Readers keep asking the same question in different forms: if you block “AI bots,” do you vanish from AI answers? The frustration comes from a messy taxonomy.

The ecosystem often conflates at least two categories:

1) Crawlers used for indexing, training collection, or search-style retrieval
2) User-triggered agent actions where a model visits a URL because a person asked it to

Those categories can look identical from a server log: both involve automated requests. Yet their purposes—and the policies around them—can differ sharply.
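One rough way to separate these categories in a log is to look for provider-published tokens in the `User-Agent` header. The sketch below rests on several assumptions: GPTBot and OAI-SearchBot are the tokens named in this article, `ChatGPT-User` is an assumed token for user-triggered visits that should be verified against the provider's current documentation, and the sample header strings are invented. The header is self-declared, so a match indicates a claim, not a verified identity.

```python
from collections import Counter

# Token -> purpose label. Verify every token against provider documentation;
# "chatgpt-user" in particular is an assumption, not taken from this article.
KNOWN_TOKENS = {
    "gptbot": "training crawler",
    "oai-searchbot": "search crawler",
    "chatgpt-user": "user-triggered fetch (assumed token)",
}


def classify(user_agent: str) -> str:
    """Label a User-Agent header by the first known token it contains."""
    ua = user_agent.lower()
    for token, purpose in KNOWN_TOKENS.items():
        if token in ua:
            return purpose
    return "unclassified"


# Invented header values for illustration only.
sample_user_agents = [
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)",
    "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0",
]

counts = Counter(classify(ua) for ua in sample_user_agents)
for purpose, total in counts.items():
    print(f"{purpose}: {total}")
```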

Why the distinction matters

If you publish a robots.txt rule targeting a training crawler, you might be making a statement about model training. If you target a search crawler, you might affect whether your pages can be retrieved for answers. If a provider treats user-triggered browsing as separate from crawling, robots.txt may not have the same effect you assumed.

OpenAI’s own documentation underscores the stakes: it documents multiple crawlers and frames robots.txt as the mechanism by which webmasters manage how their sites interact with OpenAI systems.

That sounds simple until you ask: which system are you controlling—training, search, or user browsing?

A real-world governance dilemma (without the drama)

Publishers want predictable outcomes. They want to know:

- Will blocking a crawler stop training collection?
- Will it also reduce answer quality or visibility in search-like experiences?
- Does a user-triggered fetch behave like a crawler or like a browser?

The honest answer is that different providers may draw the lines differently. The web wants one switch labeled “AI,” but it has several electrical circuits—and robots.txt can only label the wires it knows.

Key Insight

From your logs, training crawls, search retrieval, and user-triggered agent browsing can look identical—yet they can have totally different policy implications.

Targetable user agents: what OpenAI publishes, and why it matters

For site operators, the practical question is rarely philosophical. It’s operational: what do I write in robots.txt?

OpenAI documents multiple bots and explicitly positions robots.txt as the way webmasters manage interactions with OpenAI systems. Two names recur in the documentation and public references:

- OAI-SearchBot, a named crawler with a published user-agent string (per OpenAI documentation)
- GPTBot, widely referenced as the training-oriented crawler that can be disallowed via robots.txt

That separation is the beginning of a more granular approach: publishers can choose to allow one function (for example, search retrieval) while restricting another (for example, training collection). It isn’t perfect, but it is more nuanced than a single “block AI” toggle.

Practical example: the minimal rules that shape visibility

A publisher seeking the simplest controls may end up with variants of the two-line handshake:

- Allow broad access:
  - `User-agent: *`
  - `Allow: /`

- Block broad access:
  - `User-agent: *`
  - `Disallow: /`

But granular control comes from targeting specific user agents with distinct groups. RFC 9309 supports group-based structures, and providers that publish distinct tokens make that structure actionable.
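A group-per-agent policy can be sketched and checked the same way. The policy below is one hypothetical choice (restrict GPTBot, allow OAI-SearchBot, allow everyone else), not a recommendation, and `ExampleBot` and the URL are made up. The check uses Python's standard-library `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: one group per published token, plus a default group.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/articles/ai-agents"
decisions = {
    agent: parser.can_fetch(agent, url)
    for agent in ("GPTBot", "OAI-SearchBot", "ExampleBot")
}
print(decisions)
```

Under these assumptions only the GPTBot entry comes back `False`. Whether that produces the intended effect on training, search, or user-triggered browsing still depends on how the provider maps its products to tokens.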

The hard part: clarity for humans, not just machines

The web has long assumed that “User-agent: *” is good enough. The agent era punishes that assumption.

If you manage a site with a public editorial surface, a subscriber-only tier, and a transactional account area, you may want different behavior in each zone. Robots.txt can express some of that with path rules, but it cannot authenticate. It cannot distinguish a paid subscriber from a random bot. It cannot enforce “read but don’t act.”

Which leads to the next point: robots.txt is necessary, but it can’t carry the entire burden alone.

2 lines

A minimalist robots.txt group—`User-agent` plus `Allow`/`Disallow`—can broadly permit or block automated access across an entire site.

Robots.txt isn’t security—so what should publishers actually do?

RFC 9309 makes the boundary clear: robots.txt is not a security mechanism, and anyone who needs access control should use application-layer security.

That sounds obvious, yet the agent era tempts people to treat robots.txt like a firewall: “If I disallow it, I’m safe.” The standard itself rejects that assumption.

What robots.txt is good for

Robots.txt remains valuable for:

- Signaling preferences to compliant crawlers
- Reducing load from unwanted crawling
- Separating different classes of automated access when user agents are distinct and honest

It also creates a public, auditable statement of intent. That matters when norms are still forming.

What robots.txt cannot do

Robots.txt cannot:

- Verify a requester’s identity
- Prevent a non-compliant bot from fetching content
- Protect sensitive information if it’s publicly reachable
- Guarantee an agent won’t take action in an interface that allows it

Publishers who have genuine risk—account pages, internal tools, admin panels, personal data—need authentication, authorization, and conventional security controls. Robots.txt can complement those controls, but it cannot replace them.

A practical posture: “polite exclusion + real locks”

A sensible stance for many organizations is two-layered:

1) Use robots.txt as a policy signal for well-behaved crawlers
2) Use authentication and careful design for anything that should not be accessed or acted upon by unknown clients—human or automated

The agent era doesn’t change the fundamentals of security. It changes how quickly the fundamentals become relevant to ordinary publishing decisions.
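The second layer, the real lock, can be sketched as a server-side check that ignores the `User-Agent` header entirely. Everything here is hypothetical: the path prefixes, the `access_status` function, and the placeholder token stand in for whatever session or authorization system a site actually uses.

```python
import hmac
from typing import Optional

# Hypothetical protected zones and a placeholder secret, for illustration only.
PROTECTED_PREFIXES = ("/account", "/checkout", "/admin")
VALID_SESSION_TOKEN = "example-session-token"


def access_status(path: str, session_token: Optional[str]) -> int:
    """Return an HTTP status code for a request, regardless of who sent it."""
    if not path.startswith(PROTECTED_PREFIXES):
        return 200  # public zone: readable by any client, human or automated
    if session_token is None:
        return 401  # protected zone, no credentials presented
    # Constant-time comparison avoids leaking information through timing.
    if hmac.compare_digest(session_token.encode(), VALID_SESSION_TOKEN.encode()):
        return 200
    return 403  # credentials presented but not valid


print(access_status("/articles/ai-agents", None))
print(access_status("/checkout/payment", None))
print(access_status("/checkout/payment", "example-session-token"))
```

The point of the design is that the decision depends on credentials, not on what the client says it is. A bot that ignores robots.txt gets the same 401 as anyone else.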

“If it matters, lock it. If it’s a preference, publish it.”
— TheMurrow Editorial

What robots.txt can and can’t do

✓ Signal preferences to compliant crawlers
✓ Reduce load from unwanted crawling
✓ Separate automated access classes when tokens are distinct and honest
✗ Verify identity
✗ Prevent non-compliant bots from fetching
✗ Protect sensitive data if it’s publicly reachable

Case study: the “buy, book, or break things” problem

Consider a typical commerce or services site. Public pages—product listings, help documentation, store locations—are meant to be read. The transaction flow—cart, checkout, account changes—exists to be used, but only in the right context and by the right party.

Agentic systems blur that boundary. A user might ask an assistant to “find the best option and book it.” A system might browse listings and then proceed into steps that look like ordinary web navigation, but at machine speed and with different failure modes.

Where robots.txt helps—and where it doesn’t

Robots.txt can discourage compliant bots from crawling sensitive paths. A publisher can disallow paths that should not be fetched by automated systems at scale.

Yet robots.txt cannot ensure the next request isn’t a user-triggered agent fetch. It also cannot prevent a malicious actor from ignoring the file. In a world where browsing can lead into action, publishers need stronger boundaries than “please don’t.”
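Path-level rules for a commerce site might look like the sketch below, again checked with Python's standard-library `urllib.robotparser`. The paths, the `ExampleBot` token, and the `shop.example` host are hypothetical. One caveat from RFC 9309 itself: listing paths in robots.txt exposes them publicly, so the file should not be the only thing standing between those paths and an unknown client.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical transactional paths that compliant crawlers are asked to skip.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /account
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

verdicts = {}
for path in ("/products/widget", "/checkout/payment", "/account/settings"):
    verdicts[path] = parser.can_fetch("ExampleBot", "https://shop.example" + path)
    print(path, "->", "fetch" if verdicts[path] else "skip")
```

A compliant crawler would skip the checkout and account paths. A non-compliant client, or a user-triggered fetch that a provider treats differently, is not bound by the result.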

Link safety and why providers care

OpenAI’s discussion of AI agent link safety highlights why providers are investing in mitigations: web content can include instructions designed to hijack an agent’s behavior or exfiltrate data. Those risks scale with capability. They also scale with access: the more an agent can reach, the more a hostile page can try.

The publisher’s role is not to solve model safety. The publisher’s role is to avoid mistaking a courtesy protocol for an access control system—and to recognize that “publicly reachable” increasingly means “reachable by automation.”

1 file

A single plain-text robots.txt at the domain root can shape early access decisions for many reputable automated agents.

Practical takeaways: how to think like an editor and an operator

Robots.txt has become a policy document as much as a technical file. That’s uncomfortable for publishers, because policy demands clarity and accountability. Yet a small set of disciplined habits can reduce confusion without forcing you into absolutist positions.

A checklist for sane robots decisions

- Decide what you’re optimizing for: discovery, training exclusion, or reduced automated load
- Use specific user agents when possible: providers that publish tokens make targeted rules feasible
- Avoid relying on robots.txt for sensitive areas: put authentication in front of anything that truly matters
- Segment your site by intent: public editorial content vs. transactional flows vs. private dashboards
- Monitor and revisit: as providers change user agents and products, your assumptions can drift

Multiple perspectives worth taking seriously

Publishers have legitimate reasons to restrict training collection. Search-oriented crawling can feel more like referral traffic, while training collection can feel extractive. At the same time, blocking everything can reduce a site’s presence in emerging answer experiences that users increasingly treat as entry points.

Providers argue that clear, standard mechanisms like robots.txt help maintain a functional web: machine-readable rules, respected at scale, reduce friction. Skeptics respond that “respected at scale” is still voluntary—and that the incentives of AI intermediaries do not always align with creators’.

Both perspectives share one truth: the old handshake is now doing new work. Publishers should treat it with the seriousness of any public-facing policy—because, increasingly, it is one.

Publisher posture in the agent era

Robots.txt can signal intent and shape compliant crawler behavior, but it cannot authenticate, authorize, or prevent action. Pair “polite exclusion” with real access controls.

Conclusion: the handshake is still small, but the door is bigger

The robots.txt file was never meant to govern an economy of agents. It was meant to keep crawlers polite.

RFC 9309 gave the protocol a formal spine in September 2022, and the AI boom gave it renewed relevance. The file can still do something important: it can express intent in a shared language that many automated systems recognize early. That is real power, even if it is not enforcement.

Publishers now face a more adult version of the old choice: decide what you welcome, decide what you refuse, and decide what must be protected by actual locks. A two-line handshake can signal your terms. It cannot defend them.

The next phase of the web will not be shaped only by models and product launches. It will be shaped by thousands of quiet decisions made in plain text—at the root of a domain—about who gets to knock, and what happens when the door opens.

1) What is the “two-line handshake” in robots.txt?

The phrase refers to the simplest useful robots.txt structure: a `User-agent:` line followed by an `Allow: /` or `Disallow: /` rule. Those two lines can broadly permit or block a class of crawlers across an entire site. RFC 9309 formalizes how these directives are structured and interpreted, making the minimalist approach both common and standards-based.

2) Is robots.txt legally or technically enforceable?

Robots.txt is advisory, not enforcement. RFC 9309 describes it as a protocol that depends on crawler compliance. A well-behaved bot may check it early and follow it, but a malicious or non-compliant bot can ignore it. Anyone needing real protection should use authentication and application-layer security controls rather than relying on robots.txt.

3) Why did RFC 9309 matter, and when was it published?

RFC 9309 (Robots Exclusion Protocol) was published in September 2022. It matters because it formalized a long-running convention into an internet standard, clarifying mechanics such as group structure, user-agent matching, and rule resolution behavior. That clarity helps publishers and bot operators point to a shared baseline when policies are disputed.

4) If I block AI bots, will I disappear from AI answers?

It depends on which bots you block and how an AI provider sources information. The ecosystem includes different automated agents—training crawlers, search/retrieval crawlers, and user-triggered browsing. OpenAI, for example, documents multiple bots and encourages using robots.txt for management. Blocking one user agent may affect one function more than another.

5) What’s the difference between a crawler and a user-triggered agent visit?

A crawler typically fetches content systematically for indexing, training collection, or search-style retrieval. A user-triggered agent visit happens because a person asked an AI system to open a specific URL as part of a task. From your server’s perspective, both are automated requests, but their purpose and policy treatment can differ—one reason governance is still confusing.

6) Which OpenAI bots can webmasters target in robots.txt?

OpenAI documents distinct user agents and positions robots.txt as the control mechanism. Two widely referenced examples are OAI-SearchBot (a named crawler with a published user-agent string in OpenAI documentation) and GPTBot, widely referenced as the training-oriented crawler that can be disallowed via robots.txt. Targetable tokens enable more granular choices than a blanket allow or deny.

7) What should I do if I’m worried about agents taking actions on my site?

Treat robots.txt as a preference signal, not a safeguard.

RFC 9309

The Robots Exclusion Protocol standard that formalizes robots.txt group syntax, user-agent matching, and allow/disallow rule resolution.

About the Author

TheMurrow Editorial is a writer for TheMurrow covering explainers.
