Data Storage and Encryption
Relevant controls: SC.03.01, SC.03.02, SC.03.03, SC.03.04
Compliance reference
This document supports ITRA control SC-03 Data Protection.
1. Data Classification
The AI connectors platform processes data on behalf of authenticated Novo Nordisk employees. No business data is persisted by the platform: all Microsoft Graph API response content is held in memory only and is never written to disk or a database. The only platform-owned persistent data stores are the OAuth token cache (DynamoDB), audit logs (S3), and infrastructure secrets (SSM).
| Data type | NN Classification | Storage location | Notes |
|---|---|---|---|
| Microsoft Graph API responses (email, calendar, SharePoint files, Teams messages) | Varies (up to Strictly Confidential) | In-memory only — never persisted | Sensitivity label filtering applied at runtime for SharePoint and Outlook |
| Azure AD OAuth tokens (OBO access tokens) | Confidential | DynamoDB (TTL ~1 hour) | Short-lived; contain user identity claims |
| Audit log records (user OID, UPN, tool call metadata) | Internal | S3 (nn-aiconnectors-audit-{env}) | Retained 365 days; no content payloads |
| Azure AD client secrets | Strictly Confidential | SSM Parameter Store (SecureString, KMS-encrypted) | Never in code, images, or logs |
| Application logs (HTTP access, errors) | Internal | CloudWatch Logs | No personal content; user identity not logged |
| Infrastructure state (Terraform) | Confidential | S3 (aiconnectors-terraform-state-{env}) | Contains resource IDs and ARNs; KMS-encrypted |
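The token cache row above relies on DynamoDB TTL, which expects the expiry as a Unix epoch timestamp in seconds stored in a Number attribute. The following is a minimal sketch of how such an item might be built, assuming a one-hour lifetime; the attribute names `user_oid`, `access_token`, and `expires_at` and both helper functions are hypothetical and not taken from the platform code.

```python
import time
from typing import Optional

# Assumed lifetime; the classification table states a TTL of roughly one hour.
TOKEN_TTL_SECONDS = 3600


def build_cache_item(user_oid: str, access_token: str,
                     now: Optional[float] = None) -> dict:
    """Return a cache item whose TTL attribute is an epoch timestamp in seconds."""
    issued_at = int(time.time() if now is None else now)
    return {
        "user_oid": user_oid,                         # hypothetical key attribute
        "access_token": access_token,                 # hypothetical attribute name
        "expires_at": issued_at + TOKEN_TTL_SECONDS,  # hypothetical TTL attribute
    }


def is_expired(item: dict, now: Optional[float] = None) -> bool:
    """Treat an item as expired once its TTL timestamp has been reached."""
    current = int(time.time() if now is None else now)
    return item["expires_at"] <= current
```

DynamoDB removes expired items in the background rather than at the exact expiry time, so a reader-side check such as `is_expired` is a common safeguard against serving a stale token.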
2. Encryption in Transit
All data in transit between components is encrypted. The platform enforces HTTPS at every boundary.
| Data path | Protocol | TLS version / policy | Notes |
|---|---|---|---|
| Client → ALB | HTTPS | TLS 1.3 preferred; TLS 1.2 minimum (ELBSecurityPolicy-TLS13-1-2-2021-06) | TLS terminated at ALB; ACM wildcard certificate |
| ALB → ECS task | HTTP / 80 | N/A — plaintext | Internal VPC only; ECS tasks in private subnets with no internet exposure; traffic never leaves the VPC |
| ECS → Azure AD token endpoint | HTTPS | TLS 1.2+ (Microsoft-enforced) | login.microsoftonline.com |
| ECS → Microsoft Graph API | HTTPS | TLS 1.2+ (Microsoft-enforced) | graph.microsoft.com |
| ECS → Azure Databricks | HTTPS | TLS 1.2+ (Databricks-enforced) | Per-workspace HTTPS endpoint |
| ECS → DynamoDB | HTTPS | TLS 1.2+ (AWS-enforced) | Via VPC Gateway endpoint |
| ECS → SSM Parameter Store | HTTPS | TLS 1.2+ (AWS-enforced) | Via VPC; at container startup only |
| ECS → Kinesis Firehose | HTTPS | TLS 1.2+ (AWS-enforced) | Audit log delivery per tool call |
| ECS → ECR | HTTPS | TLS 1.2+ (AWS-enforced) | Image pull at task startup, via S3 Gateway VPC endpoint |
The ALB-to-ECS path is HTTP over a private subnet. This is an accepted design choice — the traffic is within a VPC that has no inbound access from the internet (only the ALB is internet-facing), and TLS at this hop would add latency with no meaningful security gain given the network controls in place.
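For the outbound HTTPS paths in the table, the TLS 1.2 floor is enforced by the remote service. A client can additionally pin the same floor on its own side. The sketch below shows one way to do that with the Python standard library; it is illustrative only, and the function name `make_outbound_tls_context` is hypothetical rather than part of the platform.

```python
import ssl


def make_outbound_tls_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.2."""
    # create_default_context() turns on certificate and hostname verification.
    context = ssl.create_default_context()
    # Set the protocol floor explicitly instead of relying on library defaults.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

A context built this way can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=make_outbound_tls_context())`.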
ACM Certificate Management
TLS termination uses AWS Certificate Manager (ACM) wildcard certificates:
| Environment | Domain | Renewal |
|---|---|---|
| Dev | *.dev.connectors.novo-genai.com | Auto-renewed by AWS (DNS validation) |
| Prod | *.connectors.novo-genai.com | Auto-renewed by AWS (DNS validation) |
Certificates are provisioned by Terraform in infra/main/shared/ (acm module), validated via Route 53 DNS CNAME (created automatically by Terraform), and shared across all MCP servers via the existing_acm_certificate_arn parameter. AWS auto-renews approximately 60 days before expiry — no manual action required.
If automatic renewal fails (e.g. DNS validation record is removed), ACM notifies the account contact email. Monitor for ACM Certificate Approaching Expiration events in the AWS Health Dashboard.
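As a complement to the ACM notifications, a simple check can classify a certificate by how close it is to expiry. The sketch below assumes the expiry timestamp has already been obtained elsewhere (for example from the certificate's "not after" field); the function name and the status strings are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# ACM starts automatic renewal roughly 60 days before expiry (see above).
RENEWAL_WINDOW = timedelta(days=60)


def renewal_status(not_after: datetime, now: datetime) -> str:
    """Classify a certificate as 'ok', 'in-renewal-window', or 'expired'."""
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "expired"
    if remaining <= RENEWAL_WINDOW:
        # Renewal should already be in progress; a certificate that stays in
        # this state for long may indicate that DNS validation is failing.
        return "in-renewal-window"
    return "ok"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(renewal_status(now + timedelta(days=30), now))
```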
The platform does not use self-managed or self-signed certificates. All TLS is handled by ACM certificates attached to the ALB.
3. Encryption at Rest
All persistent data stores use encryption at rest.
| Storage | Encrypted | Key type | Key management | Notes |
|---|---|---|---|---|
| DynamoDB (OAuth token cache) | Yes | AWS-managed KMS (aws/dynamodb) | AWS-managed; automatic rotation | Default table encryption; one table per MCP per environment |
| SSM Parameter Store (client secrets) | Yes | AWS KMS (aws/ssm) | AWS-managed; SecureString type | Secrets never stored as String parameters |
| S3 audit logs (nn-aiconnectors-audit-{env}) | Yes | SSE-S3 (AES-256) | AWS S3-managed | Encryption set in bucket configuration (server_side_encryption_configuration) |
| S3 Terraform state (aiconnectors-terraform-state-{env}) | Yes | SSE-S3 (AES-256) | AWS S3-managed | encrypt = true in Terragrunt backend config |
| S3 ALB access logs (aiconnectors-public-alb-logs-{env}) | Yes | SSE-S3 (default) | AWS S3-managed | Default S3 bucket encryption applied by the ai-lab-infra ALB module |
| CloudWatch Logs | Yes | AWS-managed | AWS-managed | CloudWatch Log groups are encrypted by default using AWS-managed keys |
| ECR images | Yes | AES-256 | AWS-managed | Default ECR encryption; images are also scanned for vulnerabilities via Snyk on each PR |
4. Data Minimisation and Retention
The platform is deliberately designed to minimise data persistence.
| Data type | Retained? | Retention period | Deletion method | Rationale |
|---|---|---|---|---|
| Microsoft Graph API responses | No | Zero — in-memory only | Garbage collected after response | Content is returned to the caller; no caching reduces risk surface |
| Azure AD OAuth tokens (DynamoDB) | Yes | ~1 hour (TTL attribute) | Automatic DynamoDB TTL expiry | Required to avoid re-authenticating every tool call; short TTL minimises exposure |
| Audit log records (S3) | Yes | 365 days total; transitioned to Glacier after 90 days | S3 lifecycle: 90d → Glacier; 365d → expire | Required for security review and incident investigation |
| Application logs (CloudWatch) | Yes | 30 days | CloudWatch log group retention policy | Operational troubleshooting; no personal content |
| ALB access logs (S3) | Yes | 30 days | S3 lifecycle (set by ai-lab-infra module) | Network-level access records; contains client IP addresses |
| Terraform state (S3) | Yes | Indefinite (versioned) | Manual; no lifecycle expiry | Infrastructure source of truth; deletion would require full environment rebuild |
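The audit log retention in the table (Glacier at 90 days, expiry at 365 days) can be written out as an S3 lifecycle rule. The dictionary below follows the shape of the S3 lifecycle configuration API and is only an illustration; the rule ID, the empty prefix filter, and the `storage_state` helper are hypothetical, and the real rule may be defined differently in the platform's Terraform.

```python
AUDIT_LIFECYCLE = {
    "Rules": [
        {
            "ID": "audit-log-retention",   # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},      # assumed: rule covers every object
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }
    ]
}


def storage_state(age_days: int, rule: dict) -> str:
    """Approximate where an object of the given age sits under a lifecycle rule."""
    if age_days >= rule["Expiration"]["Days"]:
        return "expired"
    # Check the latest transition first so the most recent one wins.
    for transition in sorted(rule["Transitions"],
                             key=lambda t: t["Days"], reverse=True):
        if age_days >= transition["Days"]:
            return transition["StorageClass"]
    return "STANDARD"
```

For example, `storage_state(120, AUDIT_LIFECYCLE["Rules"][0])` returns `"GLACIER"`. S3 applies lifecycle actions asynchronously, so the helper describes the intended state rather than the exact moment an object moves or is deleted.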
5. PII Handling
The audit logs are the only platform-owned persistent store that contains personal data. Each audit record includes:
- `user_oid`: Azure AD object ID of the calling user
- `user_upn`: user principal name (corporate email address, e.g. user@novonordisk.com)
- `params`: key tool arguments (e.g. search query strings), which may incidentally contain names
Graph API response content (which can contain email bodies, calendar details, SharePoint document content, and Teams messages including names and communications) is never persisted — it is processed in-memory and returned directly to the caller.
No masking, pseudonymisation, or anonymisation is applied to audit records. The user identity in audit records is the primary mechanism for tracing tool calls to individuals, which is intentional for security and accountability purposes.
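To make the shape of an audit record concrete, the sketch below builds one as a JSON string from the fields described in this section. Only `user_oid`, `user_upn`, and `params` come from this document; the `tool` and `timestamp` fields, the function name, and the JSON serialisation are assumptions for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def build_audit_record(user_oid: str, user_upn: str, tool: str,
                       params: Dict[str, Any], at: datetime) -> str:
    """Serialise one tool call as a JSON string.

    The function deliberately has no parameter for Graph API response
    content, so payloads cannot end up in the audit trail by accident.
    """
    record = {
        "timestamp": at.astimezone(timezone.utc).isoformat(),  # hypothetical field
        "user_oid": user_oid,
        "user_upn": user_upn,
        "tool": tool,                                          # hypothetical field
        "params": params,
    }
    return json.dumps(record, sort_keys=True)
```

Because the identity fields are stored unmasked, access to the resulting records needs to be restricted to the security and incident-investigation purposes described above.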
Cross-reference: privacy-gdpr.md for data subject rights and GDPR considerations.
6. Regulatory Considerations
- GDPR: The platform processes personal data (names, email addresses, communication content) on behalf of Novo Nordisk employees. AWS infrastructure is in `eu-central-1` (EU). Microsoft Graph data residency is EU for the NN tenant. Cross-reference: privacy-gdpr.md.
- Data residency: All platform-owned data (audit logs, token cache, secrets, Terraform state) is stored in `eu-central-1`. No data is replicated cross-region.