Canary Token Injection - User Guide

What is Canary Token Injection?

Canary Token Injection is a security feature that helps detect when data from one user or session is accidentally exposed to another user. Think of it as a tripwire: an early warning system that alerts you to potential data leakage in your AI systems.

[Figure: Canary Detection]

This is a new feature we're experimenting with. So far so good, and feedback would be appreciated!

The Problem It Solves

Modern AI systems can sometimes leak data between users due to:

Prompt injection attacks that extract other users' data
Provider-side caching that mixes conversation contexts
Shared memory or context across sessions
RAG retrieval that surfaces inappropriate documents

Canary tokens act as invisible tracking markers that help detect these leakage scenarios.

How It Works

Injection: When a request is processed, a small unique token is automatically embedded into the system prompt or context
Tracking: The token is registered in the database along with who owns it
Detection: When processing responses, the system looks for tokens that belong to other users
Alerting: If a token from user A appears in user B's response, an alert is generated

User A Request → [Token "abc12345" injected] → LLM Provider
                                                    │
                                                    ▼
User B Request → Response contains "abc12345" → ALERT!
                 (Cross-user leakage detected)
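
The four steps above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the product's implementation: the `CanaryRegistry` class, the `[ref:...]` marker format, and the 8-character hex token are all assumptions made for the example.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class CanaryRegistry:
    """Hypothetical in-memory stand-in for the canary database."""

    owners: dict[str, str] = field(default_factory=dict)  # token -> owning user ID

    def inject(self, user_id: str, system_prompt: str) -> str:
        """Injection + tracking: embed a fresh token and record who owns it."""
        token = secrets.token_hex(4)  # 8 hex characters, same shape as "abc12345"
        self.owners[token] = user_id
        return f"{system_prompt}\n[ref:{token}]"  # assumed marker format

    def detect(self, user_id: str, response: str) -> list[str]:
        """Detection: return the owners of any foreign tokens found in a response."""
        return [
            owner
            for token, owner in self.owners.items()
            if owner != user_id and token in response
        ]
```

A non-empty result from `detect` corresponds to the alerting step: a token registered to one user has surfaced in a response delivered to another. A real deployment would need an indexed lookup rather than a scan over every registered token.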

Understanding Alerts

Alert Types

Type	Severity	What It Means
Cross-User	🔴 Critical	User A's data appearing in User B's response
Cross-Session	🟡 Medium	Data from one session appearing in another session of the same user
Stale Canary	🟠 High	Very old token appearing (possible memorization by provider)
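
One way to read the table is as a classification over who owns the detected token, which session it came from, and how old it is. The sketch below assumes record fields (`user_id`, `session_id`, `created_at`), a 7-day staleness cutoff, and a check order (cross-user first) that this guide does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=7)  # assumed cutoff; the guide does not give one


@dataclass(frozen=True)
class Canary:
    """Hypothetical canary record."""

    user_id: str
    session_id: str
    created_at: datetime


def classify(canary: Canary, user_id: str, session_id: str, now: datetime) -> Optional[str]:
    """Return the alert type for a token seen in (user_id, session_id), or None."""
    if canary.user_id != user_id:
        return "cross_user"  # Critical
    if now - canary.created_at > STALE_AFTER:
        return "stale_canary"  # High
    if canary.session_id != session_id:
        return "cross_session"  # Medium
    return None  # the token came back to the session it was issued for
```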

What to Do When Leakage is Detected

Don't Panic: A single alert may be a false positive
Review Context: Check the leakage details
- Who owned the canary?
- Where was it detected?
- How old was the token?
Investigate Root Cause:
- Was this a shared context scenario?
- Are users related/same organization?
- Is this a known testing scenario?
Take Action:
- For confirmed leakage: Check LLM provider configuration
- For repeated patterns: Consider stricter isolation

Configuration Options

Canary injection is disabled by default.

Basic Settings

Setting	Default	Description
`enabled`	`true`	Turn canary injection on/off
`injection_rate`	`1.0`	Fraction of requests to inject (0.0 to 1.0)
`retention_days`	`30`	How long to keep canary records

[Figure: Canary Configuration]

Example Configurations

Development Environment (full monitoring):

{
  "enabled": true,
  "injection_rate": 1.0,
  "retention_days": 7
}

Production Environment (balanced):

{
  "enabled": true,
  "injection_rate": 0.1,
  "retention_days": 30
}

High-Security Environment:

{
  "enabled": true,
  "injection_rate": 1.0,
  "retention_days": 90,
  "survival_alert_threshold": 0.1
}
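
To illustrate how `injection_rate` behaves as a per-request sampling probability, the sketch below parses the production example and decides, request by request, whether to inject. `load_config` and `should_inject` are hypothetical helper names, and the real service may sample differently.

```python
import json
import random

PRODUCTION = '{"enabled": true, "injection_rate": 0.1, "retention_days": 30}'


def load_config(raw: str) -> dict:
    """Parse a config document and reject an out-of-range injection rate."""
    cfg = json.loads(raw)
    if not 0.0 <= cfg["injection_rate"] <= 1.0:
        raise ValueError("injection_rate must be between 0.0 and 1.0")
    return cfg


def should_inject(cfg: dict, rng: random.Random) -> bool:
    """Sample one request: inject with probability injection_rate."""
    # rng.random() is in [0.0, 1.0), so 1.0 always injects and 0.0 never does.
    return cfg["enabled"] and rng.random() < cfg["injection_rate"]


cfg = load_config(PRODUCTION)
rng = random.Random(0)  # seeded only to make the demonstration repeatable
injected = sum(should_inject(cfg, rng) for _ in range(10_000))
print(f"injected into {injected} of 10000 requests")
```

With a rate of 0.1, roughly one request in ten carries a canary, which is why a lower rate reduces overhead but also reduces the chance of catching any single leak.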

Monitoring Dashboard

Key Metrics to Watch

Injection Count: Total canaries injected over time
Survival Rate: Percentage of canaries that successfully round-trip
- Low survival may indicate provider-side filtering
Leakage Count: Number of cross-boundary detections
Leakage by Type: Breakdown of leakage categories

Health Indicators

Metric	Healthy	Warning	Critical
Survival Rate	>50%	20-50%	<20%
Leakages/Day	0	1-2	>2
Cross-User Leakages	0	-	Any
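
The thresholds in this table translate directly into code. The function names below are illustrative, and the sketch assumes the survival rate is supplied as a fraction (0.0 to 1.0) rather than a percentage.

```python
def survival_status(rate: float) -> str:
    """Classify a survival rate given as a fraction between 0.0 and 1.0."""
    if rate > 0.5:
        return "healthy"
    if rate >= 0.2:
        return "warning"  # the 20-50% band, boundaries included
    return "critical"


def leakage_status(leakages_per_day: int, cross_user: int) -> str:
    """Classify daily leakage counts; any cross-user leakage is critical."""
    if cross_user > 0 or leakages_per_day > 2:
        return "critical"
    if leakages_per_day >= 1:
        return "warning"
    return "healthy"
```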

Testing Your Setup

Verify Injection is Working

Create a test canary:

Simulate Leakage Detection

Test detection by simulating a response containing another user's canary:

[Figure: Canary Simulate]

System Alerts: [Figure: Canary System Alert]

Best Practices

1. Start with Low Injection Rates

Begin with 10% (injection_rate: 0.1) and increase as you verify the system is working correctly.

2. Review Initial Alerts Carefully

The first few alerts may be false positives due to:

Testing scenarios
Shared organizational contexts
Integration testing

3. Set Up Alerting

Configure your monitoring system to alert on:

Any cross-user leakage (immediate investigation required)
Sudden increases in any leakage type
Survival rate drops (may indicate provider issues)
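
What counts as a "sudden increase" is left to your monitoring system. As one possible rule, the sketch below flags a day whose count exceeds a multiple of the trailing average; the function name and the default factor of 3 are assumptions, not values from this guide.

```python
from statistics import mean


def is_sudden_increase(history: list[int], today: int, factor: float = 3.0) -> bool:
    """Flag today's count when it exceeds `factor` times the trailing average.

    `history` holds the previous days' counts for one leakage type. With an
    empty or all-zero history, any non-zero count is treated as a spike.
    """
    baseline = mean(history) if history else 0.0
    if baseline == 0:
        return today > 0
    return today > factor * baseline
```

Running the check once per leakage type keeps a noisy cross-session count from masking a new cross-user signal.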

4. Regular Maintenance

Review leakage reports weekly
Purge old canaries monthly (automatic with retention settings)
Verify injection is still working after provider updates
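
For reference, retention-based purging amounts to dropping every record older than `retention_days`. The sketch below assumes records are held as a mapping from token to creation time, which is a simplification of whatever the database actually stores.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(
    records: dict[str, datetime], retention_days: int, now: datetime
) -> dict[str, datetime]:
    """Return only the canary records created within the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return {token: created for token, created in records.items() if created >= cutoff}


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
records = {
    "abc12345": now - timedelta(days=45),  # older than 30 days: purged
    "def67890": now - timedelta(days=5),   # within the window: kept
}
kept = purge_expired(records, retention_days=30, now=now)
```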

Troubleshooting

"Canaries not being injected"

Check:

Is canary injection enabled?
What is the injection rate? (0 = no injection)
Is the proxy correctly configured with the canary service?

"High false positive rate"

Possible causes:

Users sharing accounts across sessions
Testing with same user IDs
Integration tests not using isolated contexts

Solution:

Use unique user identifiers per actual user
Configure test environments with separate settings

"Low survival rate"

This means injected canaries aren't appearing in responses.

Possible causes:

LLM provider filtering/transforming content
Response truncation
Content safety filters

Solution:

Try different canary formats
Reduce system prompt size
Check provider documentation for filtering

"Unexpected leakage patterns"

Review the context:

Are detected users in the same organization?
Are sessions related (e.g., same user, new browser)?
Is this expected data sharing?

Privacy Considerations

Data Retention: Canary records include user session identifiers. Configure appropriate retention periods.
Access Control: Limit access to leakage reports to security team members.
Log Redaction: Consider redacting response snippets if they may contain sensitive data.
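
As an illustration of log redaction, the sketch below keeps the detected token and masks the characters around it, so a report still shows where the canary appeared without storing the surrounding response text. The function name and the default context width are assumptions.

```python
def redact_snippet(response: str, token: str, context: int = 10) -> str:
    """Return the token with up to `context` masked characters on each side."""
    idx = response.find(token)
    if idx == -1:
        return ""  # nothing to log if the token is not present
    before = min(context, idx)
    after = min(context, len(response) - idx - len(token))
    return "*" * before + token + "*" * after
```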

FAQ

Q: Does this affect response quality?
A: No. Canary tokens are designed to be small (~3 tokens) and appear as system metadata that LLMs typically ignore.

Q: Can users see the canary tokens?
A: Not normally. The tokens are placed in the system prompt, which end users do not see, but they may occasionally appear in raw API responses if not filtered.

Q: What happens if a canary is detected?
A: An alert is logged but the response is still delivered. This is by design: we prioritize availability over blocking on potential false positives.

Q: How long should I keep canary records?
A: 30 days is typical for most use cases. High-security environments may want 90+ days for forensic purposes.

Q: Can this detect all data leakage?
A: No. This is a sampling-based tripwire, not comprehensive DLP. It detects leakage patterns but cannot catch 100% of incidents.
