
Logging and Monitoring#

Relevant controls: SC.07.01, SC.07.02, SC.07.04, SC.07.06


1. Log Requirements#

All MCP tool invocations are logged to an immutable audit trail. Application events (startup, errors, auth failures, HTTP requests) are logged to CloudWatch. Infrastructure activity is captured by AWS CloudTrail.

| Event type | Logged? | Source | Notes |
|---|---|---|---|
| Successful MCP tool calls | Yes | Audit Firehose → S3 | Every tool call, regardless of outcome |
| Failed / errored tool calls | Yes | Audit Firehose → S3 | outcome=error with error message |
| Redacted tool calls (sensitivity filter) | Yes | Audit Firehose → S3 | outcome=redacted |
| OBO token exchange failures | Yes | CloudWatch (ERROR log) | Metric filter + alarm |
| HTTP 4xx responses | Yes | CloudWatch (uvicorn.access) | Metric filter + alarm |
| HTTP 5xx responses | Yes | CloudWatch (uvicorn.access) | Metric filter + alarm |
| User identity (oid, upn) per tool call | Yes | Audit Firehose → S3 | Extracted from Azure AD JWT |
| Container startup / shutdown | Yes | CloudWatch (INFO log) | |
| AWS API calls (infra, secrets access) | Yes | AWS CloudTrail | All AWS API activity |
| SSM secret reads (client ID, secret) | Yes | AWS CloudTrail | At container startup |
| Changes to IAM, ECS, ECR, SSM | Yes | AWS CloudTrail | |
| Azure AD sign-in events | Yes | Azure AD sign-in logs | Managed by Microsoft / NN IT |

2. Audit Log Format and Content#

Audit logs are the primary security record. Every MCP tool call produces exactly one NDJSON record written synchronously to Kinesis Firehose before the response is returned to the caller.

2.1 Record Structure#

```json
{
  "event":       "tool_call",
  "ts":          "2026-04-29T10:32:01.123456Z",
  "mcp":         "sharepoint",
  "env":         "prod",
  "tool":        "search_documents",
  "user_oid":    "a1b2c3d4-e5f6-...",
  "user_upn":    "user@novonordisk.com",
  "params":      { "query": "project plan", "page_size": 25 },
  "outcome":     "success",
  "duration_ms": 312,
  "error":       null
}
```
| Field | Type | Description |
|---|---|---|
| event | string | Always "tool_call" |
| ts | string | ISO 8601 UTC timestamp |
| mcp | string | MCP server name (sharepoint, outlook, teams, databricks) |
| env | string | Deployment environment (dev, prod) |
| tool | string | Tool function name as registered with FastMCP |
| user_oid | string | Azure AD object ID of the calling user |
| user_upn | string | User principal name (email address) |
| params | object | Key tool arguments (not large content payloads) |
| outcome | string | success \| error \| redacted |
| duration_ms | integer | Wall-clock execution time in milliseconds |
| error | string \| null | Error message when outcome=error; otherwise null |

2.2 Outcome Values#

| Outcome | Meaning |
|---|---|
| success | Tool executed and returned data to the caller |
| error | Tool raised an exception; the error field contains the message |
| redacted | Tool completed but content was withheld due to sensitivity label filtering |

2.3 Implementation#

Audit logging is implemented in connectors/libs/nn-mcp-core/src/nn_mcp_core/audit.py and applied to every tool via the @audit_tool decorator:

```python
@mcp.tool()
@audit_tool(enable_oauth=settings.enable_oauth, check_redacted=True)
async def search_documents(query: str, page_size: int = 25) -> list[dict]:
    ...
```

The decorator always emits an audit record in a finally block — even if the tool raises an exception. Failures to deliver to Firehose are logged as warnings but never surface to the caller.

User identity (oid, upn) is extracted from the Azure AD JWT Bearer token on every request via get_user_claims() in nn_mcp_core/audit.py.
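As a rough illustration of the decorator pattern described above — not the actual nn_mcp_core implementation — the structure could look like the sketch below. The injected `firehose` client, the `stream` parameter, and the record assembly are illustrative assumptions; the real decorator also attaches user_oid/user_upn from the JWT claims.

```python
import asyncio
import functools
import json
import time
from datetime import datetime, timezone


def audit_tool(mcp: str, env: str, firehose=None, stream: str = "audit-stream"):
    """Illustrative sketch: emit exactly one audit record per call, even on error."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            start = time.monotonic()
            outcome, error = "success", None
            try:
                return await fn(*args, **kwargs)
            except Exception as exc:
                outcome, error = "error", str(exc)
                raise
            finally:
                # The finally block guarantees a record even when the tool raises.
                record = {
                    "event": "tool_call",
                    "ts": datetime.now(timezone.utc).isoformat(),
                    "mcp": mcp,
                    "env": env,
                    "tool": fn.__name__,
                    "params": kwargs,
                    "outcome": outcome,
                    "duration_ms": int((time.monotonic() - start) * 1000),
                    "error": error,
                }
                try:
                    if firehose is not None:
                        firehose.put_record(
                            DeliveryStreamName=stream,
                            Record={"Data": (json.dumps(record) + "\n").encode()},
                        )
                except Exception:
                    # Delivery failures must never surface to the caller;
                    # the real implementation logs them as WARNING instead.
                    pass
        return wrapper
    return decorator
```

With a real boto3 `firehose` client, `put_record` delivers the NDJSON line to the audit stream; in tests a stub client can capture the record instead.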


3. Application Log Format and Content#

Application logs are written as JSON to stdout, captured by the ECS awslogs driver, and sent to CloudWatch Logs.

3.1 Log Record Structure#

```json
{
  "timestamp": "2026-04-29T10:32:01.123456+00:00",
  "level":     "INFO",
  "logger":    "sharepoint_mcp.server",
  "message":   "OBO token exchange failed",
  "mcp":       "sharepoint",
  "env":       "prod"
}
```

HTTP access log records additionally include:

```json
{
  "status_code": 401,
  "method":      "POST",
  "path":        "/mcp"
}
```

Error records include a full traceback in an exception field.
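A formatter producing records of this shape can be sketched as follows. This is an assumed implementation, not the platform's actual one: the `JsonFormatter` class name and the use of `extra=` for the HTTP fields are illustrative.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Illustrative formatter producing records shaped like the examples above."""

    def __init__(self, mcp: str, env: str):
        super().__init__()
        self.mcp, self.env = mcp, env

    def format(self, record: logging.LogRecord) -> str:
        out = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "mcp": self.mcp,
            "env": self.env,
        }
        # HTTP access records carry extra attributes passed via `extra=...`.
        for key in ("status_code", "method", "path"):
            if hasattr(record, key):
                out[key] = getattr(record, key)
        # Error records include a full traceback in an "exception" field.
        if record.exc_info:
            out["exception"] = self.formatException(record.exc_info)
        return json.dumps(out)
```

An access-log line would then be emitted with, e.g., `log.info("request", extra={"status_code": 401, "method": "POST", "path": "/mcp"})`.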

3.2 Log Levels by Environment#

| Environment | Log level | Effect |
|---|---|---|
| Dev | DEBUG | Full request/response detail, token exchange steps |
| Prod | INFO | Startup events, HTTP requests, errors |

Third-party library loggers (httpx, boto3, azure, msal) are capped at WARNING in all environments to reduce noise.


4. Log Sources and Destinations#

| Source | Log type | Destination | Format |
|---|---|---|---|
| MCP tool calls (all MCPs) | Audit | Kinesis Firehose → S3 | NDJSON |
| ECS container stdout/stderr | Application | CloudWatch Logs | JSON |
| AWS API activity | Infrastructure | AWS CloudTrail | JSON |
| ALB | Access logs | S3 (aiconnectors-public-alb-logs-{env}) | ALB format |
| Azure AD | Sign-in / token events | Azure AD logs | Managed by NN IT |

4.1 CloudWatch Log Groups#

Log groups follow the pattern /ecs/{cluster}/{service}/{appname}, e.g.: /ecs/aiconnectors-{env}-aiconnectors/mcp-{name}-main-svc/mcp-{name}

Log driver: awslogs, region: eu-central-1.

4.2 Audit S3 Bucket Layout#

```
nn-aiconnectors-audit-{env}/
└── dt=yyyy-MM-dd/
    └── aiconnectors-audit-{env}-1-yyyy-MM-dd-HH-mm-ss-{uuid}
```

Each object contains multiple NDJSON lines (buffered up to 5 MB or 60 seconds by Firehose). Objects are partitioned by date, enabling efficient querying by date range using the /check-audit-logs skill.
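Because objects are partitioned under dt= prefixes, a date-range query only needs to enumerate one prefix per day. A minimal sketch of building those prefixes (the function name is illustrative):

```python
from datetime import date, timedelta


def audit_prefixes(start: date, end: date) -> list[str]:
    """Date-partitioned S3 prefixes (dt=yyyy-MM-dd/) for an inclusive query window."""
    prefixes = []
    d = start
    while d <= end:
        prefixes.append(f"dt={d.isoformat()}/")
        d += timedelta(days=1)
    return prefixes
```

Each prefix would then be passed to `list_objects_v2(Bucket="nn-aiconnectors-audit-{env}", Prefix=prefix)` to enumerate that day's Firehose objects without scanning the whole bucket.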

4.3 ALB Access Log Bucket#

ALB access logs are written by AWS to aiconnectors-public-alb-logs-{env} (S3, eu-central-1), created by the ai-lab-infra alb module.

Bucket configuration:

| Control | Value |
|---|---|
| Retention / lifecycle | 30 days → expire (S3 lifecycle rule log_retention) |
| ACL | log-delivery-write (AWS ELB delivery principal) |
| Insecure transport | s3:PutObject denied over HTTP (policy enforced by module) |
| Public access | Blocked (default S3 bucket public access block) |
| force_destroy | true — bucket can be destroyed via Terraform when empty |

Note: ALB access logs contain client IP addresses (personal data under GDPR). The 30-day retention is set by the shared ai-lab-infra module default; this is shorter than the audit log retention (365 days). ALB logs are supplementary to the CloudWatch HTTP access logs and the Kinesis Firehose audit trail, which carry the primary security record.


5. Log Retention#

| Log type | Retention | Location | Policy |
|---|---|---|---|
| Audit logs (tool calls) | 90 days Standard, then Glacier; 365 days total | S3 nn-aiconnectors-audit-{env} | S3 lifecycle: 90d → Glacier; 365d → expire |
| CloudWatch application logs | 30 days | CloudWatch Logs | Retention policy set in ECS task definition |
| Firehose delivery errors | 30 days | CloudWatch /aws/kinesisfirehose/aiconnectors-audit-{env} | Hardcoded in audit.tf |
| AWS CloudTrail | 90 days (default) | CloudTrail console | Event history retained by AWS; no separate S3 trail configured |
| ALB access logs | 30 days | S3 aiconnectors-public-alb-logs-{env} | S3 lifecycle: 30d → expire (set by ai-lab-infra ALB module) |

6. Log Integrity and Tamper Protection#

Audit S3 Bucket#

The audit S3 bucket (nn-aiconnectors-audit-{env}) has the following tamper protections:

| Control | Implementation |
|---|---|
| Deletion prevention | Bucket policy explicitly denies s3:DeleteObject and s3:DeleteObjectVersion for all principals (including account root) |
| Public access blocked | All four S3 public access block flags set to true |
| Versioning | Enabled — overwrites create new versions, originals are preserved |
| Encryption at rest | SSE-S3 (AES256) |
| force_destroy | false — bucket cannot be destroyed via Terraform while objects exist |

No IAM role in the platform has s3:DeleteObject permission on the audit bucket — not the ECS task role, not the GitHub Actions deploy role, and not the Firehose IAM role (which has s3:PutObject only).
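The deny-delete control described above corresponds to a bucket policy statement along these lines. The statement Sid and the bucket name binding are illustrative assumptions, not the actual Terraform resource:

```python
import json

AUDIT_BUCKET = "nn-aiconnectors-audit-prod"  # illustrative; per-environment in practice

# Illustrative deny statement: object deletion is denied for every principal,
# including the account root, regardless of what IAM permissions a role holds.
deny_delete_statement = {
    "Sid": "DenyAuditObjectDeletion",
    "Effect": "Deny",
    "Principal": "*",
    "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
    "Resource": f"arn:aws:s3:::{AUDIT_BUCKET}/*",
}

policy = {"Version": "2012-10-17", "Statement": [deny_delete_statement]}
# Applied via e.g. s3.put_bucket_policy(Bucket=AUDIT_BUCKET, Policy=json.dumps(policy))
```

Because an explicit Deny in S3 overrides any Allow, this holds even for principals that are otherwise granted broad S3 access.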

CloudWatch Logs#

ECS task roles do not have logs:DeleteLogGroup or logs:DeleteLogStream permissions. Log groups are created by the ECS service at startup and can only be written to, not deleted, by the application.

Audit Record Integrity#

Audit records are written after tool execution in a finally block, meaning they cannot be suppressed by application-level errors. Failures to write to Firehose are logged as WARNING in CloudWatch but do not affect the tool response — this ensures audit delivery failures are themselves visible and alertable.


7. Monitoring and Alerting#

Four CloudWatch alarms are provisioned per MCP, each publishing to the shared SNS topic aiconnectors-alarms-{env} (email subscribers: AI connectors team).

| Alarm | Metric filter | Threshold | Period | Action |
|---|---|---|---|---|
| Auth failures | { $.message = "OBO token exchange failed" } | ≥ 5 occurrences | 5 min | SNS email |
| Error rate | { $.level = "ERROR" } | ≥ 20 occurrences | 5 min | SNS email |
| HTTP 4xx spike | { $.status_code >= 400 && $.status_code < 500 } | ≥ 50 occurrences | 5 min | SNS email |
| HTTP 5xx spike | { $.status_code >= 500 && $.status_code < 600 } | ≥ 10 occurrences | 5 min | SNS email |

All alarms use treat_missing_data = notBreaching — silence is not treated as a problem. Alarms operate on CloudWatch Logs metric filters in the MCP/Security namespace.
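As a sketch of how one of these alarms maps onto the CloudWatch APIs — assuming the log group naming from section 4.1 and an illustrative SNS topic ARN (the account ID is a placeholder) — the auth-failure filter and alarm could be expressed as:

```python
def auth_failure_alarm_config(mcp: str, env: str) -> dict:
    """Illustrative boto3 kwargs for the auth-failure metric filter and alarm."""
    log_group = f"/ecs/aiconnectors-{env}-aiconnectors/mcp-{mcp}-main-svc/mcp-{mcp}"
    metric_name = f"{mcp}-auth-failures"
    return {
        # logs.put_metric_filter(**cfg["metric_filter"])
        "metric_filter": {
            "logGroupName": log_group,
            "filterName": f"{metric_name}-filter",
            "filterPattern": '{ $.message = "OBO token exchange failed" }',
            "metricTransformations": [{
                "metricName": metric_name,
                "metricNamespace": "MCP/Security",
                "metricValue": "1",
            }],
        },
        # cloudwatch.put_metric_alarm(**cfg["alarm"])
        "alarm": {
            "AlarmName": f"{metric_name}-alarm",
            "MetricName": metric_name,
            "Namespace": "MCP/Security",
            "Statistic": "Sum",
            "Period": 300,  # 5 minutes
            "EvaluationPeriods": 1,
            "Threshold": 5,
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
            "TreatMissingData": "notBreaching",  # silence is not a problem
            # Placeholder account ID — the real topic ARN is environment-specific.
            "AlarmActions": [
                f"arn:aws:sns:eu-central-1:123456789012:aiconnectors-alarms-{env}"
            ],
        },
    }
```

The other three alarms differ only in filter pattern, metric name, and threshold.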

Additional Monitoring#

| What | How |
|---|---|
| Audit log delivery failures | Firehose CloudWatch log group /aws/kinesisfirehose/aiconnectors-audit-{env} |
| ECS service health | ALB target group health checks (GET /health, 15s interval) |
| Container resource utilisation | ECS CloudWatch metrics (CPU, memory) — available but no alarm configured |

8. Log Review Process#

Routine monitoring is fully automated via the CloudWatch alarms defined in section 7. Manual log review is reserved for security investigations and incident response — not performed on a scheduled basis.

During a security investigation, audit logs in S3 can be queried using the /check-audit-logs Claude Code skill, which:

  1. Lists objects in nn-aiconnectors-audit-{env} for the selected date range
  2. Downloads and filters NDJSON records by MCP, user, outcome, and date
  3. Produces a summary table (total records, MCP breakdown, outcome breakdown, unique users)
  4. Shows individual error records in full
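The filtering and summarisation steps above amount to straightforward NDJSON processing. A minimal sketch, with illustrative function names (this is not the skill's actual code):

```python
import json
from collections import Counter


def filter_audit_records(ndjson_lines, mcp=None, user_upn=None, outcome=None):
    """Filter NDJSON audit lines by MCP, user, and outcome (step 2 above)."""
    records = []
    for line in ndjson_lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        if mcp and rec.get("mcp") != mcp:
            continue
        if user_upn and rec.get("user_upn") != user_upn:
            continue
        if outcome and rec.get("outcome") != outcome:
            continue
        records.append(rec)
    return records


def summarise(records):
    """Summary counts matching step 3: totals, per-MCP and per-outcome breakdowns."""
    return {
        "total": len(records),
        "by_mcp": Counter(r["mcp"] for r in records),
        "by_outcome": Counter(r["outcome"] for r in records),
        "unique_users": len({r.get("user_oid") for r in records}),
    }
```

Date filtering falls out of the bucket layout for free: only the dt= prefixes in the selected range are downloaded in the first place.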

AWS profiles required: ai-connectors-dev / ai-connectors-prod

See incident-response.md §2 for the full list of log sources and query methods used during an investigation.

Audit Instrumentation Verification#

Every tool must have audit logging instrumentation. This is verified per release using the /verify-audit-logging skill, which checks that every @mcp.tool() function has the required try/finally + log_tool_call pattern and that AUDIT_FIREHOSE_STREAM is set in the ECS environment YAMLs.