How I Fell Into the Compliance Rabbit Hole #
AWS re:Invent announced Amazon Bedrock AgentCore Evaluations. Managed LLM-as-judge for your AI agents — sounds great, right?
So I did what any architect does: I read the documentation to understand how to integrate it.
What started as “let me evaluate this new AWS service” turned into a deep dive that revealed a serious GDPR compliance risk hiding in plain sight. Not with Bedrock AgentCore Evaluations itself — but with the entire observability pattern we’ve all been using for AI agents.
And if you’re building AI agents in production, you need to know about this.
To be clear: this isn’t an “AWS is bad” problem. It exists across AWS, Azure, GCP, Datadog, and pretty much every observability stack. These systems were designed for debuggability and performance — not per-user erasure under GDPR.
The Two Integration Paths #
Amazon Bedrock AgentCore Evaluations gives you two options:
Online evaluation: Reads conversations directly from CloudWatch Logs + OpenTelemetry
On-demand evaluation: Requires extra steps on the agent side (at which point, why not just run your own LLM-as-judge?)
I started with online evaluation because it seemed simpler.
The CloudWatch Trap #
Online evaluation means shipping your entire conversation to CloudWatch Logs.
I live in the EU. I architect data systems. I breathe GDPR compliance. And here’s the truth:
Your conversation history WILL contain PII. Always.
Users naturally share:
- Names, locations, email addresses (direct identifiers)
- Job details, family information, health data (indirect identifiers)
- Unique writing styles and behavioral patterns
- Contextual details that combine to identify them
You cannot scrub this data without destroying the functionality of your agent. And even if you think you can anonymize it — GDPR says pseudonymized data is still personal data. You must delete the actual data, not just the user-to-ID mapping.
The killer: GDPR’s right to erasure. Users can request deletion within 1 month. CloudWatch? Partition-based deletion only. You can’t delete a single user’s history.
“But wait,” you might say, “what about Amazon Bedrock AgentCore Observability?”
Yeah. That’s just a wrapper around CloudWatch + X-Ray. Same immutable storage. Same partition-based deletion. Same GDPR problem.
Then Came the Traces #
Reading further, AWS documentation says:
AgentCore Evaluations integrates with popular agent frameworks including Strands and LangGraph with OpenTelemetry and OpenInference instrumentation libraries…
Wait. Traces?
Classic distributed traces don’t contain message content, right? So I dug into Strands documentation (my framework of choice).
The oh-no moment #
Check the example trace output. Messages are part of the trace data.
Then I went down the OpenTelemetry GenAI spec rabbit hole. According to the official examples, traces explicitly include:
- Full user messages
- AI responses
- Retrieved context
- Function arguments
This creates a serious GDPR problem.
Where Your PII Actually Lives #
Here’s what nobody tells you when you set up AI agent observability:
Your PII is now in at minimum three places:
- Your database (hopefully deletable)
- CloudWatch Logs (immutable, partition-based deletion only)
- OpenTelemetry traces → Datadog / New Relic / Honeycomb (also immutable)
Typical pattern:
User → AI Agent → Database (deletable)
↓
CloudWatch / X-Ray (immutable)
↓
OpenTelemetry → Datadog (immutable)
Why this breaks GDPR expectations:
- No granular deletion
- Full message content in immutable storage
- Retention often 30–90+ days
- Multiple processors involved
You can delete records from PostgreSQL instantly.
But the same conversation still lives in CloudWatch and Datadog — and you can’t remove it.
The Third-Party Processor Nightmare #
You typically have two tracing paths:
AWS X-Ray
- Stays within AWS
- Covered by your existing AWS DPA
- Still immutable, partition-based deletion
Third-party (Datadog, New Relic, Honeycomb)
- Additional data processors
- Cross-company data flow
- Often US-based
- No granular deletion
What I see in practice:
Teams start with X-Ray, then export to Datadog anyway. Or they send logs to both CloudWatch and another platform.
Now full conversations exist in multiple immutable systems across multiple processors.
This is often happening by default, just by following “observability quickstart” guides.
The Access Control Blindspot #
Even before deletion, there’s another issue:
Everyone with access to observability tools can read user conversations.
- Developers debugging bugs? See PII.
- SREs investigating latency? See PII.
- Support teams? See PII.
- Contractors? Also see PII.
This violates core GDPR principles:
- Data minimization
- Purpose limitation
- Access control / least privilege
And this is the default behaviour of many AI agent frameworks.
Architectural separation is not optional. It’s the only sane approach.
What You Should Actually Do #
Here’s the pattern that actually works.
1. Architectural Separation (Non-negotiable) #
Deletable storage (PostgreSQL, DynamoDB, etc.)
- Full chat messages
- User profiles
- All PII
- Granular, immediate deletion
- Strict access control
Immutable systems (CloudWatch, X-Ray, Datadog)
- Metadata ONLY
- Token counts
- Latency
- Error codes
- Hashed user IDs (one-way)
- No message content
- 30-day retention max
When there’s no PII, developers can safely access these systems.
2. Fix Your OpenTelemetry Instrumentation #
Default instrumentation is the real danger.
Configure it to send:
✅ Latency
✅ Token usage
✅ Error type
✅ Hashed user ID
❌ Full prompts
❌ Model responses
❌ Retrieved context
❌ Function arguments with PII
I’ve already opened a feature request with Strands to make this the default. AI frameworks should be compliant by design, not compliant by extra effort.
3. The 7–14 Day Rule (If you can’t fix it yet) #
If you absolutely must send message content (legacy reasons, framework limitations, transition period), your retention matters.
- 30 days – common and defensible, but extremely tight
- 90+ days – almost impossible to justify for chat data
- 7–14 days – much safer, still operationally useful
This is mitigation, not a solution.
The real fix is: no content in immutable systems.
Quick Audit Checklist #
If you’re running AI agents in prod, ask yourself:
- Do any logs or traces contain full prompts or responses?
- Can I delete a single user’s data from all systems?
- Who inside the company can query these logs?
- Is retention longer than 30 days?
- Do third-party tools receive this data?
If any of these make you uncomfortable — you’ve found your starting point.
The Bottom Line #
The default observability pattern for AI agents creates major GDPR risk.
It’s not about CloudWatch vs Datadog. It’s about what you send to them.
As long as AI frameworks export full conversations to immutable systems, you have:
- No granular deletion
- Excessive retention
- Over-broad internal access
- Multiple data processors
The fix is architectural and simple:
- Chat content → Deletable storage only
- Observability → Metadata only
- No messages in traces/logs
- Strict access control
What’s Next #
Honestly? I’m still processing this.
What started as “let me evaluate a new AWS service” turned into realizing that the entire AI observability ecosystem has a compliance problem baked into its defaults.
If you’re building AI systems in production:
Audit your pipeline. Fix your retention. Separate your storage.
We’ll figure out the rest as we go.
Building AI in the EU? Send me your war stories. Misery loves company.
— The Pragmatical Architect