AI Bot Detection: Turning Unknown AI Traffic Into Verifiable Evidence
Detection alone is not enough. Learn how metadata, model fingerprints, and PEAC receipts transform AI traffic from guesswork into verifiable, enforceable evidence.
“AI detection” is having a moment. But most people mean one of two things:
- Is this content real, or did an AI model generate or alter it?
- Is this visitor a human, or an AI bot quietly crawling my site or API?
Those are different jobs. Both matter. Both are easy to get wrong if you only rely on classifiers and vibes.
Originary takes a different view: every time an AI system touches your data, there should be a clear, verifiable trail of what happened. That trail needs to work for developers, lawyers, auditors, and automated agents at the same time.
That is exactly what PEAC Protocol provides: a neutral proof layer for AI interactions that issues cryptographic receipts for access, usage, and payments using a standard PEAC-Receipt HTTP header.
What “AI bot detection” really covers
People often bundle three separate capabilities under “AI detection”:
Fake vs real (content authenticity)
Classifying whether a text, image, audio, or video file was generated or altered by an AI model, usually with a probability score.
Model fingerprinting (who generated this)
Inferring which model family or vendor likely produced the artifact, or using watermarks and statistical fingerprints to attribute it.
Bot and agent detection (who is calling me)
Detecting that an incoming request is from an AI agent or crawler, not from a person in a browser, and understanding which agent, under what declared purpose.
“You cannot control, license, or monetize AI usage of your data if you cannot see which AI agents are actually accessing it.”
AI bot detection is that missing visibility layer between your content and the growing universe of AI crawlers, copilots, and headless agents.
Why “detection-only” is not enough
There is real value in content-level detection and model fingerprinting. But they have hard limits:
It is an arms race
As models improve, naive classifiers become less reliable. A detector that feels strong this quarter may be unreliable next quarter.
Scores are not proof
A "0.84 likelihood of AI" score is a hint. It is not a signed record that will stand up in an audit, complaint, or legal dispute.
No policy, no economics
Even if you know something is AI-generated, that does not tell you whether the agent respected your usage policy, paid you for access, or is allowed to keep the data.
Enterprises, regulators, and serious publishers need more than yes/no classification. That is where Originary and PEAC push beyond detection-only to detection + policy + receipts.
The four pillars of useful AI bot detection
In practice, AI bot detection becomes powerful when you combine four signal types:
3.1 Metadata: the quiet truth-teller
Metadata is “data about the data.” For AI bot detection, you care about at least three layers:
File and media metadata
- EXIF and container metadata in images, audio, video
- C2PA provenance tags and content credentials
- Timestamps, edit history, device hints
Transport metadata
- HTTP headers, TLS fingerprints, IP ranges, ASNs
- User-Agent strings, model hints, API keys
- Request rate, timing patterns, geo patterns
Contract metadata
- Policy fields: allowed usage, price, retention
- Consent flags and legal basis
- Links to receipts, licenses, dispute records
On its own, metadata can be spoofed. Combined with cryptographic receipts, it becomes a strong integrity check. In PEAC, metadata is not an afterthought: effective AI preference policies (AIPREF) are discovered and snapshotted into every receipt, so audits are self-contained.
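The transport-metadata layer above can be sketched as a simple signal extractor. The user-agent hints, header checks, and rate threshold below are illustrative assumptions, not part of PEAC; real deployments would combine many more signals.

```python
# Minimal sketch: combining transport-metadata signals into a bot hint.
# Signal names and thresholds are illustrative, not part of any spec.

AI_UA_HINTS = ("gptbot", "claudebot", "ccbot", "perplexitybot")

def transport_signals(headers: dict, requests_per_minute: float) -> dict:
    """Extract simple transport-layer hints from an HTTP request."""
    ua = headers.get("User-Agent", "").lower()
    return {
        "declared_ai_agent": any(hint in ua for hint in AI_UA_HINTS),
        "missing_browser_headers": "Accept-Language" not in headers,
        "high_request_rate": requests_per_minute > 60,
    }

def looks_like_ai_bot(signals: dict) -> bool:
    # Any single signal is weak and spoofable; require two or more.
    return sum(signals.values()) >= 2
```

Note the design choice: no single signal decides, which mirrors the point above that metadata alone can be spoofed.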
3.2 Model fingerprints: which model touched this
Model fingerprinting tries to answer: Which model family or vendor produced this artifact?
Knowing the answer matters for three reasons:
- Risk & compliance: some models may be disallowed for regulated data
- Attribution & economics: different pricing for different model types
- Cross-checking claims: detect mismatches between declared and actual models
In Originary’s world, model fingerprints feed into policy and receipts: policies can say “allow research use from approved models, block others.” Receipts include which model was declared at access time.
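A policy rule like "allow research use from approved models, block others" can be sketched as a small gate over the declared model fingerprint. The model IDs and allow-list below are hypothetical, purely for illustration.

```python
# Sketch: a default-deny policy gate keyed on the declared model fingerprint.
# Model IDs and the allow-list are made up for illustration.

APPROVED_FOR_RESEARCH = {"model-a", "model-b"}

def allow_access(model_id: str, purpose: str) -> bool:
    if purpose == "research":
        return model_id in APPROVED_FOR_RESEARCH
    return False  # anything not explicitly allowed is denied
```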
3.3 Access: every AI call as a verifiable event
This is the most undervalued pillar. Traditional logs tell you IP, path, timestamp. That is not enough for AI agents and 402-style paid access.
In a PEAC-aware environment, each AI call becomes a structured, signed event:
- agent_id → which agent or client called you
- agent_type → crawler, copilot, aggregator, training pipeline
- model_id → declared model family in use
- policy_version → which policy applied
- enforcement → e.g. http-402 for payment-gated access
- payment → rail, amount, currency, provider evidence
- aipref → snapshot of AI usage preferences in effect
- issued_at → when the receipt was generated
Instead of “we think an AI scraped our site,” you can say: “Agent X, using model Y, accessed resources A, B, C on these dates, under policy Z, via HTTP 402, and paid this amount. Here is the signed receipt.”
The PEAC kernel signs receipts using Ed25519 and ships them in a PEAC-Receipt header, ready for offline or online verification.
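The issue-and-verify cycle can be sketched end to end. The real kernel signs with Ed25519; the sketch below substitutes an HMAC so it runs with the standard library alone, and the base64url-JSON wire encoding is an assumption, not the PEAC spec. Field names follow the event structure above.

```python
import base64
import hashlib
import hmac
import json

# Sketch: issuing and verifying a receipt for a PEAC-Receipt header.
# ASSUMPTIONS: HMAC-SHA256 stands in for Ed25519, and the
# "base64url(payload).base64url(tag)" encoding is invented for this demo.

SECRET = b"demo-signing-key"  # placeholder for the kernel's private key

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_receipt(fields: dict) -> str:
    payload = json.dumps(fields, sort_keys=True).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return f"{b64url(payload)}.{b64url(tag)}"

def verify_receipt(header_value: str) -> dict:
    payload_b64, tag_b64 = header_value.split(".")
    pad = lambda s: s + "=" * (-len(s) % 4)
    payload = base64.urlsafe_b64decode(pad(payload_b64))
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(pad(tag_b64))):
        raise ValueError("receipt signature mismatch")
    return json.loads(payload)

receipt = issue_receipt({
    "agent_id": "agent-x",
    "model_id": "model-y",
    "policy_version": "z",
    "enforcement": "http-402",
    "issued_at": "2025-01-01T00:00:00Z",
})
```

Because the payload travels inside the signed blob, a verifier can reconstruct exactly what was attested without trusting the server's logs, which is what makes the receipt useful as audit evidence.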
3.4 Artifact repository: cases, not random files
Once you have detection and rich access events, you need somewhere to put them. An artifact repository is:
- A structured library of artifacts: requests, responses, media, forensics, and receipts
- Grouped into cases or projects: incidents, audits, fraud investigations
- Enriched with metadata, fingerprints, and PEAC receipts
This lets banks, insurers, publishers, and regulators reconstruct what happened, show chain-of-custody evidence for disputes, and re-run analyses when policies change. Originary’s goal: your live AI traffic and artifact repository are two views of the same evidence layer.
How Originary + PEAC change AI bot detection in practice
Publish policies that agents can actually read
Every PEAC-aware service exposes a discovery file at /.well-known/peac.txt that advertises protocol version, payment rails, receipt requirements, and verification endpoints.
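An agent consuming that discovery file needs only a small parser. The post does not show the peac.txt grammar, so the sketch below assumes a simple "key: value" format with robots.txt-style comments; the field names in the example are illustrative.

```python
# Sketch: parsing a /.well-known/peac.txt discovery file.
# ASSUMPTION: a "key: value" line format with '#' comments, invented here
# for illustration; the real peac.txt grammar may differ.

def parse_peac_txt(text: str) -> dict:
    policy = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)  # split only on the first colon
        policy[key.strip().lower()] = value.strip()
    return policy

example = """
# Illustrative discovery file
version: 1.0
payment-rails: http-402
receipts: required
verify: https://example.com/peac/verify
"""
```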
AIPREF policies describe how your content may be used. These are snapshotted into every receipt. AI agents can no longer pretend they did not know your terms.
Enforce and measure with HTTP 402 and receipts
When an AI agent hits a protected resource, it receives an HTTP 402 Payment Required response. Once the agent pays or proves entitlement, the PEAC kernel issues a signed receipt binding: what was accessed, who accessed it, which policy applied, and payment details.
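From the agent's side, the 402 flow is a simple retry loop: request, get challenged, pay or prove entitlement, request again, keep the receipt. The sketch below assumes caller-supplied request and payment callbacks; the retry shape is an assumption about how a PEAC-aware client might behave, not a prescribed flow.

```python
# Sketch of an agent-side HTTP 402 flow.
# request_fn(token) -> (status, body, headers); pay_fn() -> entitlement token.
# Both callbacks are hypothetical stand-ins for a real HTTP client and
# whatever payment rail the service advertises.

def fetch_with_402(request_fn, pay_fn):
    status, body, headers = request_fn(token=None)
    if status == 402:
        token = pay_fn()  # pay or prove entitlement
        status, body, headers = request_fn(token=token)
    # Keep the signed receipt: it is the proof of compliant access.
    return status, body, headers.get("PEAC-Receipt")
```

A usage example with a fake server: the first call returns 402, the paid retry returns 200 plus a receipt header, and the agent stores that receipt for downstream audits.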
AI bot detection becomes not just “yes, that looked like a bot” but “yes, that bot paid, under these terms, here is the signed proof.”
Give good agents a way to prove they are good
Most serious AI agents want a clean way to respect content owners. Originary + PEAC give them that path: pre-fetch peac.txt, integrate 402 flows, attach receipts when passing data downstream.
That is AI bot detection as positive infrastructure rather than only defensive heuristics.
Make bad or ambiguous agents stand out
Once good agents follow rules and produce receipts, what remains is easier to handle: crawlers ignoring peac.txt, tools spoofing user-agents, traffic with no receipts. These become clear anomalies. You can throttle, block, or litigate based on evidence rather than suspicion.
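That triage logic can be sketched as a small classifier over what you now know about each request. The category names and rules below are illustrative, mirroring the anomalies listed above rather than any PEAC-defined taxonomy.

```python
# Sketch: triaging traffic once compliant agents produce receipts.
# Categories and rules are illustrative, not part of any spec.

def triage(request: dict) -> str:
    if request.get("receipt"):
        return "compliant"       # signed receipt present: let it through
    if request.get("read_peac_txt"):
        return "anomalous"       # fetched the policy, then sent no receipt
    if request.get("spoofed_user_agent"):
        return "hostile"         # actively disguising itself
    return "unknown"             # no policy fetch, no receipt: inspect
```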
Where this is going next
This post is the high-level overview. We will follow up with a focused series on metadata, access events, fingerprinting, and artifact repositories.
Ready to turn AI traffic into verifiable evidence?
Learn how Originary and PEAC Protocol give you visibility, policy enforcement, and cryptographic receipts for every AI interaction.