AIPREF: A Common Language for AI Usage Preferences
The IETF AIPREF working group is developing a standardized way for publishers to express how their content should be used by automated systems. Here's what it is, how it works, and how to implement it today.
As AI systems increasingly rely on web content for training and operation, publishers need a clear, standardized way to communicate their usage preferences. The IETF AI Preferences (AIPREF) working group addresses this need by defining both a vocabulary for expressing usage preferences and mechanisms for attaching those preferences to content.
Unlike informal conventions or platform-specific controls, AIPREF provides an Internet-scale standard that works across the HTTP ecosystem. It builds on existing infrastructure (robots.txt, HTTP headers) while introducing purpose-specific semantics that robots.txt alone cannot provide.
What is AIPREF?
AIPREF consists of two complementary specifications currently in draft at the IETF:
1. Vocabulary Specification (draft-ietf-aipref-vocab)
Defines a structured vocabulary for expressing preferences about how content should be used by automated systems. The vocabulary includes categories like bots, train-ai, train-genai, and search, with allow (y) or disallow (n) values.
Latest version: draft-ietf-aipref-vocab-03 (September 2025)
2. Attachment Specification (draft-ietf-aipref-attach)
Specifies how to associate preferences with content using HTTP headers and robots.txt. This includes the Content-Usage HTTP header field and updates to RFC 9309 (robots.txt) to support preference directives.
Latest version: draft-ietf-aipref-attach-03 (September 2025)
Usage Categories
The vocabulary defines four primary categories, organized hierarchically:
Automated Processing (bots)
The broadest category covering all automated processing of content. This is the parent category for more specific usage types.
Use case: Blanket permission or restriction for any automated access
AI Training (train-ai)
A subset of automated processing specifically for training machine learning models. This includes both generative and non-generative AI systems.
Use case: Allow search indexing but restrict model training
Generative AI Training (train-genai)
A subset of AI training focused specifically on training models that generate synthetic content (text, images, audio, etc.).
Use case: Allow classification models but restrict generative models
Search (search)
Content indexing and discovery for search applications that direct users to original content locations.
Use case: Maintain search visibility while restricting AI training
Hierarchical Inheritance
Categories inherit from their parents. If you set bots=n but don't specify search, search will inherit the disallow preference. However, explicit values always override inherited ones - bots=n, search=y allows search while disallowing other automated processing.
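The inheritance rule described above can be sketched in a few lines. This is only an illustration based on the hierarchy as this article describes it; the names `PARENT` and `resolvePreference` are hypothetical and do not come from the drafts.

```javascript
// Parent links as described in this article:
// train-genai -> train-ai -> bots, and search -> bots.
const PARENT = new Map([
  ["bots", null],
  ["train-ai", "bots"],
  ["train-genai", "train-ai"],
  ["search", "bots"],
]);

// Hypothetical helper: walk up the hierarchy until an explicit "y" or "n" is found.
// Returns "unknown" when neither the category nor any ancestor has a value.
function resolvePreference(prefs, category) {
  let current = category;
  while (current != null) {
    const value = prefs[current];
    if (value === "y" || value === "n") {
      return value;
    }
    current = PARENT.get(current) ?? null;
  }
  return "unknown";
}

console.log(resolvePreference({ bots: "n" }, "search"));              // "n" (inherited)
console.log(resolvePreference({ bots: "n", search: "y" }, "search")); // "y" (explicit)
```

An explicit value short-circuits the walk, which is why `search=y` survives a `bots=n` parent.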
How to Attach Preferences
AIPREF defines two mechanisms for associating preferences with content:
1. HTTP Content-Usage Header
The most granular method. Add the Content-Usage header to HTTP responses to specify preferences for specific resources:
HTTP/1.1 200 OK
Content-Type: text/html
Content-Usage: train-ai=n

<!DOCTYPE html>
<html>...</html>
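On the consuming side, a crawler has to turn the header value into something it can query. The sketch below handles only the comma-separated `key=value` form used in this article's examples; it is not a full HTTP Structured Fields parser, and the function name `parseContentUsage` is an assumption made for illustration.

```javascript
// Simplified parser for values shaped like "train-ai=n, search=y".
// Assumption: only the key=value form shown in this article is handled;
// parameters, quoting, and other Structured Fields features are ignored.
function parseContentUsage(headerValue) {
  const prefs = {};
  if (typeof headerValue !== "string") return prefs;
  for (const item of headerValue.split(",")) {
    const [rawKey, rawValue] = item.split("=");
    const key = (rawKey ?? "").trim().toLowerCase();
    const value = (rawValue ?? "").trim().toLowerCase();
    // Keep only well-formed y/n entries; anything else is skipped.
    if (key && (value === "y" || value === "n")) {
      prefs[key] = value;
    }
  }
  return prefs;
}

console.log(parseContentUsage("train-ai=n"));
console.log(parseContentUsage("train-ai=y, train-genai=n"));
```

Skipping malformed entries rather than throwing keeps the sketch tolerant of values it does not understand; a production parser would follow the draft's own error-handling rules.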
Implementation Examples
Nginx
location / {
    add_header Content-Usage "train-ai=n" always;
}
Apache (.htaccess)
<IfModule mod_headers.c>
    Header set Content-Usage "train-ai=n"
</IfModule>
Express.js
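const express = require('express');
const app = express();
// Attach the preference to every response.
app.use((req, res, next) => {
  res.setHeader('Content-Usage', 'train-ai=n');
  next();
});
Cloudflare Workers
export default {
  async fetch(request) {
    // Fetch from the origin, then copy the response with the extra header.
    const response = await fetch(request);
    const headers = new Headers(response.headers);
    headers.set('Content-Usage', 'train-ai=n');
    return new Response(response.body, {
      headers,
      status: response.status,
      statusText: response.statusText
    });
  }
};
2. robots.txt Content-Usage Directive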
For path-scoped preferences, add Content-Usage directives to your robots.txt file:
User-Agent: *
Allow: /
Content-Usage: train-ai=n

User-Agent: *
Allow: /public-research/
Content-Usage: /public-research/ train-ai=y, train-genai=n
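To act on these directives, a consumer first has to pull them out of the file. The following is a rough sketch that assumes the line shape shown in the example above (an optional path prefix followed by the preferences). It ignores User-Agent grouping entirely, and `parseContentUsageRules` is a hypothetical name rather than anything defined by the draft.

```javascript
// Extract Content-Usage lines from robots.txt text.
// Assumptions: a value starting with "/" begins with a path prefix;
// a value without a path applies to "/"; User-Agent groups are ignored.
function parseContentUsageRules(robotsTxt) {
  const rules = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const match = /^content-usage\s*:\s*(.*)$/i.exec(line);
    if (!match) continue;
    const rest = match[1].trim();
    let path = "/";
    let prefsText = rest;
    if (rest.startsWith("/")) {
      const space = rest.search(/\s/);
      path = space === -1 ? rest : rest.slice(0, space);
      prefsText = space === -1 ? "" : rest.slice(space + 1);
    }
    rules.push({ path, prefsText: prefsText.trim() });
  }
  return rules;
}

const sample = [
  "User-Agent: *",
  "Allow: /",
  "Content-Usage: train-ai=n",
  "",
  "User-Agent: *",
  "Allow: /public-research/",
  "Content-Usage: /public-research/ train-ai=y, train-genai=n",
].join("\n");

console.log(parseContentUsageRules(sample));
```

A real implementation would also respect which User-Agent group each line belongs to, as RFC 9309 requires for Allow and Disallow rules.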
Path Matching Rules
The robots.txt mechanism uses longest-prefix matching. If a resource path matches multiple Content-Usage directives, the one with the longest matching path prefix applies. This allows you to set site-wide defaults and override them for specific paths.
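Longest-prefix selection, as described above, might look like the sketch below. It assumes rules have already been collected into `{ path, prefs }` objects, with pathless directives represented as `/`. The function name `selectRule` is hypothetical, and the wildcard and percent-encoding handling that RFC 9309 defines is left out.

```javascript
// Pick the rule whose path is the longest prefix of the resource path.
// Assumption: plain string prefixes only; no wildcards or percent-decoding.
function selectRule(rules, resourcePath) {
  let best = null;
  for (const rule of rules) {
    if (!resourcePath.startsWith(rule.path)) continue;
    if (best === null || rule.path.length > best.path.length) {
      best = rule;
    }
  }
  return best; // null when no rule matches
}

const rules = [
  { path: "/", prefs: { "train-ai": "n" } },
  { path: "/public-research/", prefs: { "train-ai": "y", "train-genai": "n" } },
];

console.log(selectRule(rules, "/public-research/paper.html").prefs);
console.log(selectRule(rules, "/blog/post.html").prefs);
```

Here the site-wide `/` rule acts as the default, and the longer `/public-research/` prefix overrides it for resources under that path.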
Preference Resolution Rules
When preferences come from multiple sources or specify overlapping categories, AIPREF defines clear resolution rules:
Explicit Values Win
An explicit y or n for a category takes precedence over inherited values from parent categories.
Specific Overrides General
More specific categories override broader ones. If train-genai isn't specified, it takes its value from train-ai; if train-ai is also unspecified, it takes its value from bots.
Multiple Sources: Disallow Wins
When combining preferences from HTTP headers and robots.txt, if any source indicates n (disallow), the usage is disallowed. Otherwise, if any indicates y (allow), it's allowed.
Unknown is Valid
If a category isn't specified and can't be inherited, the preference is "unknown." This is a valid state - not every publisher needs to express preferences for every category.
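The multiple-source rule and the "unknown" state can be expressed together in one small function. This is a minimal sketch assuming each source has already been resolved to `y`, `n`, or `unknown` for the category in question; `combineSources` is a hypothetical name.

```javascript
// Combine one category's value across several sources
// (for example, an HTTP header and a robots.txt directive).
// Each entry is expected to be "y", "n", or "unknown".
function combineSources(values) {
  if (values.includes("n")) return "n"; // any disallow wins
  if (values.includes("y")) return "y"; // otherwise any allow wins
  return "unknown";                     // nothing was expressed
}

console.log(combineSources(["y", "n"]));             // "n"
console.log(combineSources(["unknown", "y"]));       // "y"
console.log(combineSources(["unknown", "unknown"])); // "unknown"
```

Returning `unknown` instead of picking a default leaves the decision about unexpressed preferences to the consumer.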
Example Resolution
Given: bots=y, train-ai=n, train-genai=y
- bots: Allow (explicit)
- train-ai: Disallow (explicit, overrides parent)
- train-genai: Allow (explicit, overrides parent train-ai)
- search: Allow (inherits from bots)
Practical Considerations
Work in Progress
AIPREF is currently in draft status at the IETF. While the core concepts are stable, details may change before final standardization. Early adopters should track the working group's progress and be prepared to update implementations.
Current draft versions: draft-ietf-aipref-vocab-03 and draft-ietf-aipref-attach-03 (September 2025). Drafts expire March 9, 2026.
No Built-in Enforcement
AIPREF provides a mechanism for expressing preferences, not enforcing them. The specification does not define compliance mechanisms, auditing, or consequences for ignoring preferences. Publishers seeking enforcement should layer AIPREF with contracts, terms of service, or technical access controls.
Legal Context Matters
The specification explicitly notes that preferences do not automatically create legal rights. Recognized priorities (accessibility, security, legal obligations) may override preferences. For example:
- Accessibility tools may ignore bots=n to serve users with disabilities
- Security researchers may process content despite restrictions
- Existing licensing agreements supersede AIPREF preferences
Relationship to Other Signals
AIPREF complements rather than replaces existing mechanisms:
robots.txt (RFC 9309)
Handles crawl access control. AIPREF extends robots.txt with purpose semantics but doesn't replace its core function of controlling crawler access.
ai.txt
An informal convention for AI-specific permissions. AIPREF provides a standardized alternative with formal IETF backing and richer semantics.
C2PA / Content Credentials
Handles content provenance and authenticity. AIPREF expresses usage preferences; C2PA verifies content lineage. They work together - AIPREF states rules, C2PA provides evidence of compliance.
Implementation Roadmap
For organizations looking to adopt AIPREF today:
Phase 1: Express Baseline Preferences
Start with robots.txt directives for site-wide or path-based preferences. This requires minimal infrastructure changes and provides broad coverage.
User-Agent: *
Content-Usage: train-genai=n, search=y
Phase 2: Add HTTP Header Support
Implement Content-Usage headers at your CDN, reverse proxy, or application layer. This enables resource-specific preferences and more granular control.
Phase 3: Document and Communicate
Publish your AIPREF policy in human-readable form. Link to it from your terms of service. Make it clear to AI system operators what your preferences are and why.
Phase 4: Monitor and Enforce
Track which systems respect your preferences. Consider pairing AIPREF with technical access controls (authentication, rate limiting) and legal agreements (licenses, terms of service) for enforcement.
Originary's Position
We support the IETF AIPREF effort and view it as a critical piece of infrastructure for the agentic web. Standardized, machine-readable preference signals reduce friction, improve transparency, and create conditions for responsible AI development at Internet scale.
Originary systems already read AIPREF preferences where publishers expose them and incorporate those preferences into our policy engine. We pair preference signals with cryptographic receipts and provenance tracking, enabling publishers to produce verifiable evidence of how their content was accessed and used.
As the specification matures, we'll continue to track the working group's progress and update our implementations to stay aligned with the final standard.
Further Reading
AIPREF Vocabulary Specification
draft-ietf-aipref-vocab - Official IETF working draft
AIPREF Attachment Specification
draft-ietf-aipref-attach - HTTP and robots.txt integration
IETF AIPREF Working Group
Official working group page and charter
C2PA Content Credentials
Content provenance and authenticity framework