AIPREF: A Common Language for AI Usage Preferences

The IETF AIPREF working group is developing a standardized way for publishers to express how their content should be used by automated systems. Here's what it is, how it works, and how to implement it today.

Jithin Raj & Originary Team

As AI systems increasingly rely on web content for training and operation, publishers need a clear, standardized way to communicate their usage preferences. The IETF AI Preferences (AIPREF) working group addresses this need by defining both a vocabulary for expressing usage preferences and mechanisms for attaching those preferences to content.

Unlike informal conventions or platform-specific controls, AIPREF provides an Internet-scale standard that works across the HTTP ecosystem. It builds on existing infrastructure (robots.txt, HTTP headers) while introducing purpose-specific semantics that robots.txt alone cannot provide.

What is AIPREF?

AIPREF consists of two complementary specifications currently in draft at the IETF:

1. Vocabulary Specification (draft-ietf-aipref-vocab)

Defines a structured vocabulary for expressing preferences about how content should be used by automated systems. The vocabulary includes categories like bots, train-ai, train-genai, and search, with allow (y) or disallow (n) values.

Latest version: draft-ietf-aipref-vocab-03 (September 2025)

2. Attachment Specification (draft-ietf-aipref-attach)

Specifies how to associate preferences with content using HTTP headers and robots.txt. This includes the Content-Usage HTTP header field and updates to RFC 9309 (robots.txt) to support preference directives.

Latest version: draft-ietf-aipref-attach-03 (September 2025)

Usage Categories

The vocabulary defines four primary categories, organized hierarchically:

Automated Processing (`bots`)

The broadest category covering all automated processing of content. This is the parent category for more specific usage types.

Use case: Blanket permission or restriction for any automated access

AI Training (`train-ai`)

A subset of automated processing specifically for training machine learning models. This includes both generative and non-generative AI systems.

Use case: Allow search indexing but restrict model training

Generative AI Training (`train-genai`)

A subset of AI training focused specifically on training models that generate synthetic content (text, images, audio, etc.).

Use case: Allow classification models but restrict generative models

Search (`search`)

Content indexing and discovery for search applications that direct users to original content locations.

Use case: Maintain search visibility while restricting AI training

Hierarchical Inheritance

Categories inherit from their parents. If you set bots=n but don't specify search, search inherits the disallow preference. Explicit values always override inherited ones: with bots=n, search=y, search is allowed while all other automated processing is disallowed.
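The inheritance rule can be sketched as a walk up the category tree. This is a minimal illustration, not code from the drafts: the `PARENT` table and the `resolvePreference` name are assumptions made for this example, and the parent links follow the hierarchy as this article describes it.

```javascript
// Hypothetical parent table, based on the hierarchy described in this article:
// search and train-ai sit under bots, and train-genai sits under train-ai.
const PARENT = {
  bots: null,
  search: 'bots',
  'train-ai': 'bots',
  'train-genai': 'train-ai',
};

// Returns 'y', 'n', or 'unknown' for a category. An explicit value wins;
// otherwise the nearest ancestor with an explicit value is used.
function resolvePreference(category, prefs) {
  let current = category;
  while (current != null) {
    if (Object.hasOwn(prefs, current)) return prefs[current];
    current = PARENT[current];
  }
  return 'unknown';
}

console.log(resolvePreference('search', { bots: 'n' }));              // n
console.log(resolvePreference('search', { bots: 'n', search: 'y' })); // y
```

A category with no explicit value anywhere in its chain resolves to 'unknown' here, which mirrors the "unknown is valid" state discussed later in this article.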

How to Attach Preferences

AIPREF defines two mechanisms for associating preferences with content:

1. HTTP Content-Usage Header

The most granular method. Add the Content-Usage header to HTTP responses to specify preferences for specific resources:

HTTP/1.1 200 OK
Content-Type: text/html
Content-Usage: train-ai=n

<!DOCTYPE html>
<html>...</html>
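On the consuming side, a client has to turn the header value into something it can query. The sketch below is a deliberately simplified reader for values shaped like the ones in this article (comma-separated category=y or category=n pairs). The function name `parseContentUsage` is hypothetical, and the real grammar in the attachment draft may be richer than what this handles.

```javascript
// Simplified reader for values such as "train-ai=n, search=y".
// Assumption: only y and n are meaningful; any other item is skipped.
function parseContentUsage(headerValue) {
  const prefs = {};
  for (const item of headerValue.split(',')) {
    const [rawKey, rawValue = ''] = item.split('=');
    const key = rawKey.trim();
    const value = rawValue.trim();
    if (key && (value === 'y' || value === 'n')) prefs[key] = value;
  }
  return prefs;
}

console.log(parseContentUsage('train-ai=n, search=y'));
```

Skipping unrecognized items rather than rejecting the whole header is a design choice for this sketch; a production parser should follow whatever error handling the final specification requires.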

Implementation Examples

Nginx

location / {
  add_header Content-Usage "train-ai=n" always;
}

Apache (.htaccess)

<IfModule mod_headers.c>
  Header set Content-Usage "train-ai=n"
</IfModule>

Express.js

app.use((req, res, next) => {
  res.setHeader('Content-Usage', 'train-ai=n');
  next();
});

Cloudflare Workers

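export default {
  async fetch(request) {
    // Forward the request to the origin, then copy the headers so they
    // can be modified before the response is returned.
    const response = await fetch(request);
    const headers = new Headers(response.headers);
    headers.set('Content-Usage', 'train-ai=n');
    return new Response(response.body, {
      headers,
      status: response.status,
      statusText: response.statusText
    });
  }
};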

2. robots.txt Content-Usage Directive

For path-scoped preferences, add Content-Usage directives to your robots.txt file:

User-Agent: *
Allow: /
Content-Usage: train-ai=n

User-Agent: *
Allow: /public-research/
Content-Usage: /public-research/ train-ai=y, train-genai=n
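To make the directive format concrete, here is a rough sketch of how a crawler might pull Content-Usage rules out of a robots.txt file. It rests on several assumptions that are not confirmed by this article: it ignores User-Agent grouping entirely, it treats a directive without a path as applying to "/", and the `parseRobotsContentUsage` name is made up for the example.

```javascript
// Collects { path, prefs } rules from Content-Usage lines.
// Assumptions: User-Agent groups are ignored, and a directive with no
// leading path is treated as applying to "/".
function parseRobotsContentUsage(robotsTxt) {
  const rules = [];
  for (const line of robotsTxt.split(/\r?\n/)) {
    const withoutComment = line.split('#')[0];
    const match = /^\s*content-usage\s*:\s*(.*)$/i.exec(withoutComment);
    if (!match) continue;
    let rest = match[1].trim();
    let path = '/';
    if (rest.startsWith('/')) {
      const space = rest.search(/\s/);
      if (space === -1) continue; // a path with no preferences after it
      path = rest.slice(0, space);
      rest = rest.slice(space).trim();
    }
    const prefs = {};
    for (const item of rest.split(',')) {
      const [key, value] = item.split('=').map((s) => s.trim());
      if (key && (value === 'y' || value === 'n')) prefs[key] = value;
    }
    rules.push({ path, prefs });
  }
  return rules;
}

const sample = [
  'User-Agent: *',
  'Allow: /',
  'Content-Usage: train-ai=n',
  'Content-Usage: /public-research/ train-ai=y, train-genai=n',
].join('\n');
console.log(parseRobotsContentUsage(sample));
```

A real implementation would also need to respect which User-Agent group each directive belongs to, as RFC 9309 defines for Allow and Disallow.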

Path Matching Rules

The robots.txt mechanism uses longest-prefix matching. If a resource path matches multiple Content-Usage directives, the one with the longest matching path prefix applies. This allows you to set site-wide defaults and override them for specific paths.
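Longest-prefix matching can be illustrated in a few lines. This sketch assumes rules have already been parsed into { path, prefs } objects (a shape chosen for this example) and compares plain string prefixes only; it does not handle the wildcard or end-of-path patterns that robots.txt paths can contain.

```javascript
// Returns the rule whose path is the longest prefix of resourcePath,
// or null when no rule matches. Plain prefix comparison only.
function matchRule(resourcePath, rules) {
  let best = null;
  for (const rule of rules) {
    if (!resourcePath.startsWith(rule.path)) continue;
    if (best === null || rule.path.length > best.path.length) best = rule;
  }
  return best;
}

// Hypothetical rules: a site-wide default plus a more specific override.
const rules = [
  { path: '/', prefs: { 'train-ai': 'n' } },
  { path: '/public-research/', prefs: { 'train-ai': 'y', 'train-genai': 'n' } },
];

console.log(matchRule('/public-research/paper.html', rules).path); // /public-research/
console.log(matchRule('/blog/post.html', rules).path);             // /
```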

Preference Resolution Rules

When preferences come from multiple sources or specify overlapping categories, AIPREF defines clear resolution rules:

Explicit Values Win

An explicit y or n for a category takes precedence over inherited values from parent categories.

Specific Overrides General

More specific categories override broader ones. If train-genai isn't specified, it inherits from train-ai, which inherits from bots.

Multiple Sources: Disallow Wins

When combining preferences from HTTP headers and robots.txt, if any source indicates n (disallow), the usage is disallowed. Otherwise, if any indicates y (allow), it's allowed.
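As a sketch of the "disallow wins" rule, the function below merges the explicit preferences from any number of sources, category by category. The name `combineSources` is an assumption for this example, and the sketch only covers explicit y and n values; how combination interacts with inheritance is left to the specification.

```javascript
// Merges preference maps from several sources. For each category,
// 'n' from any source wins; otherwise the first value seen is kept.
// Assumption: every value is either 'y' or 'n'.
function combineSources(...sources) {
  const combined = {};
  for (const prefs of sources) {
    for (const [category, value] of Object.entries(prefs)) {
      if (value === 'n' || combined[category] === undefined) {
        combined[category] = value;
      }
    }
  }
  return combined;
}

const fromHeader = { 'train-ai': 'y', search: 'y' };
const fromRobots = { 'train-ai': 'n' };
console.log(combineSources(fromHeader, fromRobots)['train-ai']); // n
console.log(combineSources(fromHeader, fromRobots).search);      // y
```

Categories that appear in no source are simply absent from the result, which corresponds to the "unknown" state.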

Unknown is Valid

If a category isn't specified and can't be inherited, the preference is "unknown." This is a valid state - not every publisher needs to express preferences for every category.

Example Resolution

Given: bots=y, train-ai=n, train-genai=y

bots: Allow (explicit)
train-ai: Disallow (explicit, overrides parent)
train-genai: Allow (explicit, overrides parent train-ai)
search: Allow (inherits from bots)

Practical Considerations

Work in Progress

AIPREF is currently in draft status at the IETF. While the core concepts are stable, details may change before final standardization. Early adopters should track the working group's progress and be prepared to update implementations.

Current draft versions: draft-ietf-aipref-vocab-03 and draft-ietf-aipref-attach-03 (September 2025). Drafts expire March 9, 2026.

No Built-in Enforcement

AIPREF provides a mechanism for expressing preferences, not enforcing them. The specification does not define compliance mechanisms, auditing, or consequences for ignoring preferences. Publishers seeking enforcement should layer AIPREF with contracts, terms of service, or technical access controls.

Legal Context Matters

The specification explicitly notes that preferences do not automatically create legal rights. Recognized priorities (accessibility, security, legal obligations) may override preferences. For example:

Accessibility tools may ignore bots=n to serve users with disabilities
Security researchers may process content despite restrictions
Existing licensing agreements supersede AIPREF preferences

Relationship to Other Signals

AIPREF complements rather than replaces existing mechanisms:

robots.txt (RFC 9309)

Handles crawl access control. AIPREF extends robots.txt with purpose semantics but doesn't replace its core function of controlling crawler access.

ai.txt

An informal convention for AI-specific permissions. AIPREF provides a standardized alternative with formal IETF backing and richer semantics.

C2PA / Content Credentials

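Handles content provenance and authenticity. AIPREF expresses usage preferences; C2PA records content lineage. The two are complementary: AIPREF states how content may be used, while C2PA provenance data can serve as supporting evidence about where content came from. C2PA does not by itself verify that preferences were honored.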

Implementation Roadmap

For organizations looking to adopt AIPREF today:

Phase 1: Express Baseline Preferences

Start with robots.txt directives for site-wide or path-based preferences. This requires minimal infrastructure changes and provides broad coverage.

User-Agent: *
Content-Usage: train-genai=n, search=y

Phase 2: Add HTTP Header Support

Implement Content-Usage headers at your CDN, reverse proxy, or application layer. This enables resource-specific preferences and more granular control.

Phase 3: Document and Communicate

Publish your AIPREF policy in human-readable form. Link to it from your terms of service. Make it clear to AI system operators what your preferences are and why.

Phase 4: Monitor and Enforce

Track which systems respect your preferences. Consider pairing AIPREF with technical access controls (authentication, rate limiting) and legal agreements (licenses, terms of service) for enforcement.

Originary's Position

We support the IETF AIPREF effort and view it as a critical piece of infrastructure for the agentic web. Standardized, machine-readable preference signals reduce friction, improve transparency, and create conditions for responsible AI development at Internet scale.

Originary systems already read AIPREF preferences where publishers expose them and incorporate those preferences into our policy engine. We pair preference signals with cryptographic receipts and provenance tracking, enabling publishers to produce verifiable evidence of how their content was accessed and used.

As the specification matures, we'll continue to track the working group's progress and update our implementations to stay aligned with the final standard.
