Data Storage and Encryption

Relevant controls: SC.03.01, SC.03.02, SC.03.03, SC.03.04

Compliance reference

This document supports ITRA control SC-03 Data Protection.


1. Data Classification

The AI connectors platform processes data on behalf of authenticated Novo Nordisk employees. No business data is persisted by the platform: all Microsoft Graph API response content is held in memory only and never written to disk or a database. The only platform-owned persistent data stores are the OAuth token cache (DynamoDB), audit logs (S3), and infrastructure secrets (SSM).

| Data type | NN Classification | Storage location | Notes |
| --- | --- | --- | --- |
| Microsoft Graph API responses (email, calendar, SharePoint files, Teams messages) | Varies (up to Strictly Confidential) | In-memory only; never persisted | Sensitivity label filtering applied at runtime for SharePoint and Outlook |
| Azure AD OAuth tokens (OBO access tokens) | Confidential | DynamoDB (TTL ~1 hour) | Short-lived; contain user identity claims |
| Audit log records (user OID, UPN, tool call metadata) | Internal | S3 (nn-aiconnectors-audit-{env}) | Retained 365 days; no content payloads |
| Azure AD client secrets | Strictly Confidential | SSM Parameter Store (SecureString, KMS-encrypted) | Never in code, images, or logs |
| Application logs (HTTP access, errors) | Internal | CloudWatch Logs | No personal content; user identity not logged |
| Infrastructure state (Terraform) | Confidential | S3 (aiconnectors-terraform-state-{env}) | Contains resource IDs and ARNs; encrypted at rest (see section 3) |

2. Encryption in Transit

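All data crossing the platform boundary is encrypted in transit. The platform enforces HTTPS on every path except the ALB-to-ECS hop, which carries plaintext HTTP inside the private VPC (see the table and the note that follows it).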

| Data path | Protocol | TLS version / policy | Notes |
| --- | --- | --- | --- |
| Client → ALB | HTTPS | TLS 1.3 preferred; TLS 1.2 minimum (ELBSecurityPolicy-TLS13-1-2-2021-06) | TLS terminated at ALB; ACM wildcard certificate |
| ALB → ECS task | HTTP (port 80) | N/A (plaintext) | Internal VPC only; ECS tasks in private subnets with no internet exposure; traffic never leaves the VPC |
| ECS → Azure AD token endpoint | HTTPS | TLS 1.2+ (Microsoft-enforced) | login.microsoftonline.com |
| ECS → Microsoft Graph API | HTTPS | TLS 1.2+ (Microsoft-enforced) | graph.microsoft.com |
| ECS → Azure Databricks | HTTPS | TLS 1.2+ (Databricks-enforced) | Per-workspace HTTPS endpoint |
| ECS → DynamoDB | HTTPS | TLS 1.2+ (AWS-enforced) | Via VPC Gateway endpoint |
| ECS → SSM Parameter Store | HTTPS | TLS 1.2+ (AWS-enforced) | Via VPC; at container startup only |
| ECS → Kinesis Firehose | HTTPS | TLS 1.2+ (AWS-enforced) | Audit log delivery per tool call |
| ECS → ECR | HTTPS | TLS 1.2+ (AWS-enforced) | Image pull at task startup, via S3 Gateway VPC endpoint |

The ALB-to-ECS path is HTTP over a private subnet. This is an accepted design choice — the traffic is within a VPC that has no inbound access from the internet (only the ALB is internet-facing), and TLS at this hop would add latency with no meaningful security gain given the network controls in place.
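
The outbound rows in the table rely on the remote service to enforce TLS 1.2+. A client can also set the same floor on its own side. The following is a minimal sketch using Python's standard `ssl` module; the function name `build_tls12_client_context` is hypothetical, and nothing in this document states which HTTP client the platform actually uses.

```python
import ssl


def build_tls12_client_context() -> ssl.SSLContext:
    """Return a client-side TLS context that refuses anything older than TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Reject TLS 1.0 / 1.1 even if the remote endpoint would offer them.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


context = build_tls12_client_context()
```

Such a context can be passed to standard-library calls that accept one, for example `urllib.request.urlopen(url, context=context)`.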

ACM Certificate Management

TLS termination uses AWS Certificate Manager (ACM) wildcard certificates:

| Environment | Domain | Renewal |
| --- | --- | --- |
| Dev | *.dev.connectors.novo-genai.com | Auto-renewed by AWS (DNS validation) |
| Prod | *.connectors.novo-genai.com | Auto-renewed by AWS (DNS validation) |

Certificates are provisioned by Terraform in infra/main/shared/ (acm module), validated via Route 53 DNS CNAME (created automatically by Terraform), and shared across all MCP servers via the existing_acm_certificate_arn parameter. AWS auto-renews approximately 60 days before expiry — no manual action required.

If automatic renewal fails (e.g. DNS validation record is removed), ACM notifies the account contact email. Monitor for ACM Certificate Approaching Expiration events in the AWS Health Dashboard.

The platform does not use self-managed or self-signed certificates. All TLS is handled by ACM certificates attached to the ALB.
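
To make the renewal timeline concrete, the sketch below classifies a certificate by how close it is to expiry, using the 60-day renewal window mentioned above. The function name and the status labels are hypothetical; `not_after` is assumed to be the certificate's expiry timestamp (ACM's `DescribeCertificate` response exposes this as `NotAfter`).

```python
from datetime import datetime, timedelta, timezone

# From the text above: ACM starts renewing roughly 60 days before expiry.
RENEWAL_WINDOW = timedelta(days=60)


def renewal_status(not_after: datetime, now: datetime) -> str:
    """Classify a certificate as 'ok', 'in-renewal-window', or 'expired'."""
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "expired"
    if remaining <= RENEWAL_WINDOW:
        # ACM should be renewing now; a certificate that stays in this state
        # for long suggests validation is failing and needs investigation.
        return "in-renewal-window"
    return "ok"


reference = datetime(2025, 1, 1, tzinfo=timezone.utc)
status_far = renewal_status(reference + timedelta(days=90), reference)
status_near = renewal_status(reference + timedelta(days=45), reference)
status_past = renewal_status(reference - timedelta(days=1), reference)
```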


3. Encryption at Rest

All persistent data stores use encryption at rest.

| Storage | Encrypted | Key type | Key management | Notes |
| --- | --- | --- | --- | --- |
| DynamoDB (OAuth token cache) | Yes | AWS-managed KMS (aws/dynamodb) | AWS-managed; automatic rotation | Default table encryption; one table per MCP per environment |
| SSM Parameter Store (client secrets) | Yes | AWS KMS (aws/ssm) | AWS-managed; SecureString type | Secrets never stored as String parameters |
| S3: audit logs (nn-aiconnectors-audit-{env}) | Yes | SSE-S3 (AES-256) | AWS S3-managed | Encryption set in bucket configuration (server_side_encryption_configuration) |
| S3: Terraform state (aiconnectors-terraform-state-{env}) | Yes | SSE-S3 (AES-256) | AWS S3-managed | encrypt = true in Terragrunt backend config |
| S3: ALB access logs (aiconnectors-public-alb-logs-{env}) | Yes | SSE-S3 (default) | AWS S3-managed | Default S3 bucket encryption applied by the ai-lab-infra ALB module |
| CloudWatch Logs | Yes | AWS-managed | AWS-managed | CloudWatch Log groups are encrypted by default using AWS-managed keys |
| ECR images | Yes | AES-256 | AWS-managed | Default ECR encryption; images are also scanned for vulnerabilities via Snyk on each PR |
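
The S3 rows above can be spot-checked against a bucket's live configuration. The sketch below inspects a dictionary shaped like the S3 `GetBucketEncryption` response (for example, as returned by boto3's `get_bucket_encryption`) and reports whether every default-encryption rule uses SSE-S3. The helper names and the sample response are illustrative assumptions, not part of the platform's code.

```python
from typing import List


def default_sse_algorithms(encryption_response: dict) -> List[str]:
    """Extract the default SSE algorithm from each rule of a GetBucketEncryption-style response."""
    rules = encryption_response.get("ServerSideEncryptionConfiguration", {}).get("Rules", [])
    return [
        rule["ApplyServerSideEncryptionByDefault"]["SSEAlgorithm"]
        for rule in rules
        if "ApplyServerSideEncryptionByDefault" in rule
    ]


def uses_sse_s3(encryption_response: dict) -> bool:
    """True only if at least one rule exists and all rules use SSE-S3 ('AES256')."""
    algorithms = default_sse_algorithms(encryption_response)
    return bool(algorithms) and all(a == "AES256" for a in algorithms)


# Illustrative response for a bucket with SSE-S3 default encryption.
sample_response = {
    "ServerSideEncryptionConfiguration": {
        "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
    }
}
sse_s3_ok = uses_sse_s3(sample_response)
```

A bucket configured with a KMS key would report `aws:kms` instead, so the same helper can flag drift from the table.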

4. Data Minimisation and Retention

The platform is deliberately designed to minimise data persistence.

| Data type | Retained? | Retention period | Deletion method | Rationale |
| --- | --- | --- | --- | --- |
| Microsoft Graph API responses | No | Zero (in-memory only) | Garbage collected after response | Content is returned to the caller; no caching reduces risk surface |
| Azure AD OAuth tokens (DynamoDB) | Yes | ~1 hour (TTL attribute) | Automatic DynamoDB TTL expiry | Required to avoid re-authenticating every tool call; short TTL minimises exposure |
| Audit log records (S3) | Yes | 365 days in total; transitioned to Glacier after 90 days | S3 lifecycle: 90d → Glacier; 365d → expire | Required for security review and incident investigation |
| Application logs (CloudWatch) | Yes | 30 days | CloudWatch log group retention policy | Operational troubleshooting; no personal content |
| ALB access logs (S3) | Yes | 30 days | S3 lifecycle (set by ai-lab-infra module) | Network-level access records; contains client IP addresses |
| Terraform state (S3) | Yes | Indefinite (versioned) | Manual; no lifecycle expiry | Infrastructure source of truth; deletion would require full environment rebuild |
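
The two automated retention mechanisms in the table can be sketched as follows. DynamoDB TTL expects a numeric attribute holding an epoch timestamp in seconds, and DynamoDB removes expired items in the background rather than at the exact expiry time, so readers commonly also filter on the attribute. The attribute name `expires_at`, the rule ID, and the helper names are assumptions for illustration; the lifecycle dictionary follows the shape accepted by the S3 `PutBucketLifecycleConfiguration` API.

```python
import time
from typing import Optional

TOKEN_TTL_SECONDS = 3600  # "~1 hour" from the retention table


def token_expiry_epoch(issued_at: Optional[float] = None,
                       ttl_seconds: int = TOKEN_TTL_SECONDS) -> int:
    """Epoch-seconds value to store in the (hypothetical) 'expires_at' TTL attribute."""
    base = time.time() if issued_at is None else issued_at
    return int(base) + ttl_seconds


def is_expired(item: dict, now: Optional[float] = None) -> bool:
    """Treat an item as expired even if DynamoDB has not yet deleted it."""
    current = time.time() if now is None else now
    return int(item["expires_at"]) <= current


# Audit bucket lifecycle: move to Glacier after 90 days, delete after 365 days.
AUDIT_LIFECYCLE = {
    "Rules": [
        {
            "ID": "audit-archive-then-expire",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to every object in the bucket
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }
    ]
}

cached_item = {"expires_at": token_expiry_epoch(issued_at=1_700_000_000)}
```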

5. PII Handling

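The audit logs are the only platform-owned persistent store that deliberately records personal data. (The short-lived OAuth token cache also holds user identity claims, and ALB access logs hold client IP addresses; see sections 1 and 4.) Each audit record includes: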

  • user_oid — Azure AD object ID of the calling user
  • user_upn — user principal name (corporate email address, e.g. user@novonordisk.com)
  • params — key tool arguments (e.g. search query strings), which may incidentally contain names

Graph API response content (which can contain email bodies, calendar details, SharePoint document content, and Teams messages including names and communications) is never persisted — it is processed in-memory and returned directly to the caller.

No masking, pseudonymisation, or anonymisation is applied to audit records. The user identity in audit records is the primary mechanism for tracing tool calls to individuals, which is intentional for security and accountability purposes.
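
As an illustration of the record shape described in this section, the sketch below models an audit record that carries the user identity and a small set of key arguments, but no response content. Only `user_oid`, `user_upn`, and `params` come from this document; the `tool` field, the allow-list approach, and all function names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Iterable


@dataclass(frozen=True)
class AuditRecord:
    user_oid: str  # Azure AD object ID of the calling user
    user_upn: str  # user principal name
    tool: str      # hypothetical: name of the tool that was called
    params: Dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialise to a single JSON line, with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)


def build_audit_record(user_oid: str, user_upn: str, tool: str,
                       arguments: Dict[str, str],
                       allowed_keys: Iterable[str]) -> AuditRecord:
    """Keep only allow-listed arguments so content payloads never reach the log."""
    allowed = set(allowed_keys)
    kept = {k: v for k, v in arguments.items() if k in allowed}
    return AuditRecord(user_oid=user_oid, user_upn=user_upn, tool=tool, params=kept)


record = build_audit_record(
    user_oid="00000000-0000-0000-0000-000000000000",
    user_upn="user@novonordisk.com",
    tool="search_mail",  # hypothetical tool name
    arguments={"query": "quarterly report", "body": "full message text"},
    allowed_keys=["query"],
)
```

Filtering at construction time means the decision about what counts as a "key argument" is made in one place, before anything is written to the log.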

Cross-reference: privacy-gdpr.md for data subject rights and GDPR considerations.


6. Regulatory Considerations

  • GDPR: The platform processes personal data (names, email addresses, communication content) on behalf of Novo Nordisk employees. AWS infrastructure is in eu-central-1 (EU). Microsoft Graph data residency is EU for the NN tenant. Cross-reference: privacy-gdpr.md.
  • Data residency: All platform-owned data (audit logs, token cache, secrets, Terraform state) is stored in eu-central-1. No data is replicated cross-region.
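
One way to spot-check the residency statement is to inspect the region field of resource ARNs, such as those recorded in Terraform state. The sketch below is a minimal illustration; the function names and sample ARNs are hypothetical. ARNs follow the form `arn:partition:service:region:account-id:resource`, and some (S3 bucket ARNs, for example) leave the region field empty, so those are skipped here and would need a separate check.

```python
from typing import Iterable, List

ALLOWED_REGION = "eu-central-1"


def arn_region(arn: str) -> str:
    """Return the region field of an ARN (may be an empty string)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return parts[3]


def arns_outside_region(arns: Iterable[str], allowed: str = ALLOWED_REGION) -> List[str]:
    """List ARNs whose region is set and differs from the allowed region."""
    return [a for a in arns if arn_region(a) not in ("", allowed)]


# Illustrative ARNs only; account ID and table names are placeholders.
sample_arns = [
    "arn:aws:dynamodb:eu-central-1:123456789012:table/example-token-cache",
    "arn:aws:s3:::nn-aiconnectors-audit-dev",
    "arn:aws:dynamodb:us-east-1:123456789012:table/example-other",
]
violations = arns_outside_region(sample_arns)
```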