Logging and Monitoring
Relevant controls: SC.07.01, SC.07.02, SC.07.04, SC.07.06
1. Log Requirements
All MCP tool invocations are logged to an immutable audit trail. Application events (startup, errors, auth failures, HTTP requests) are logged to CloudWatch. Infrastructure activity is captured by AWS CloudTrail.
| Event type | Logged? | Source | Notes |
|---|---|---|---|
| Successful MCP tool calls | Yes | Audit Firehose → S3 | Every tool call, regardless of outcome |
| Failed / errored tool calls | Yes | Audit Firehose → S3 | outcome=error with error message |
| Redacted tool calls (sensitivity filter) | Yes | Audit Firehose → S3 | outcome=redacted |
| OBO token exchange failures | Yes | CloudWatch (ERROR log) | Metric filter + alarm |
| HTTP 4xx responses | Yes | CloudWatch (uvicorn.access) | Metric filter + alarm |
| HTTP 5xx responses | Yes | CloudWatch (uvicorn.access) | Metric filter + alarm |
| User identity (oid, upn) per tool call | Yes | Audit Firehose → S3 | Extracted from Azure AD JWT |
| Container startup / shutdown | Yes | CloudWatch (INFO log) | |
| AWS API calls (infra, secrets access) | Yes | AWS CloudTrail | All AWS API activity |
| SSM secret reads (client ID, secret) | Yes | AWS CloudTrail | At container startup |
| Changes to IAM, ECS, ECR, SSM | Yes | AWS CloudTrail | |
| Azure AD sign-in events | Yes | Azure AD sign-in logs | Managed by Microsoft / NN IT |
2. Audit Log Format and Content
Audit logs are the primary security record. Every MCP tool call produces exactly one NDJSON record written synchronously to Kinesis Firehose before the response is returned to the caller.
2.1 Record Structure
    {
      "event": "tool_call",
      "ts": "2026-04-29T10:32:01.123456Z",
      "mcp": "sharepoint",
      "env": "prod",
      "tool": "search_documents",
      "user_oid": "a1b2c3d4-e5f6-...",
      "user_upn": "user@novonordisk.com",
      "params": { "query": "project plan", "page_size": 25 },
      "outcome": "success",
      "duration_ms": 312,
      "error": null
    }
| Field | Type | Description |
|---|---|---|
| `event` | string | Always "tool_call" |
| `ts` | string | ISO 8601 UTC timestamp |
| `mcp` | string | MCP server name (sharepoint, outlook, teams, databricks) |
| `env` | string | Deployment environment (dev, prod) |
| `tool` | string | Tool function name as registered with FastMCP |
| `user_oid` | string | Azure AD object ID of the calling user |
| `user_upn` | string | User principal name (email address) |
| `params` | object | Key tool arguments (not large content payloads) |
| `outcome` | string | One of `success`, `error`, or `redacted` |
| `duration_ms` | integer | Wall-clock execution time in milliseconds |
| `error` | string or null | Error message when outcome=error; otherwise null |
2.2 Outcome Values
| Outcome | Meaning |
|---|---|
| `success` | Tool executed and returned data to the caller |
| `error` | Tool raised an exception; error field contains the message |
| `redacted` | Tool completed but content was withheld due to sensitivity label filtering |
2.3 Implementation
Audit logging is implemented in connectors/libs/nn-mcp-core/src/nn_mcp_core/audit.py and applied to every tool via the @audit_tool decorator:
    @mcp.tool()
    @audit_tool(enable_oauth=settings.enable_oauth, check_redacted=True)
    async def search_documents(query: str, page_size: int = 25) -> list[dict]:
        ...
The decorator always emits an audit record in a finally block — even if the tool raises an exception. Failures to deliver to Firehose are logged as warnings but never surface to the caller.
User identity (oid, upn) is extracted from the Azure AD JWT Bearer token on every request via get_user_claims() in nn_mcp_core/audit.py.
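The emit-in-finally behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in audit.py: `audit_tool_sketch`, `emit_audit_record`, and the in-memory `AUDIT_SINK` are hypothetical names, the sink stands in for the Firehose delivery call, and identity extraction and redaction handling are left out.

```python
import asyncio
import functools
import json
import logging
import time
from datetime import datetime, timezone

logger = logging.getLogger("audit_sketch")

# Hypothetical in-memory sink standing in for the Firehose delivery call.
AUDIT_SINK: list[str] = []


def emit_audit_record(record: dict) -> None:
    """Serialize one record as a single NDJSON line and hand it to the sink."""
    AUDIT_SINK.append(json.dumps(record) + "\n")


def audit_tool_sketch(mcp: str, env: str):
    """Illustrative decorator: emits exactly one record per call, in `finally`."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(**params):
            started = time.perf_counter()
            outcome, error = "success", None
            try:
                return await func(**params)
            except Exception as exc:
                outcome, error = "error", str(exc)
                raise  # the caller still sees the original exception
            finally:
                record = {
                    "event": "tool_call",
                    "ts": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
                    "mcp": mcp,
                    "env": env,
                    "tool": func.__name__,
                    "params": params,
                    "outcome": outcome,
                    "duration_ms": int((time.perf_counter() - started) * 1000),
                    "error": error,
                }
                try:
                    emit_audit_record(record)
                except Exception:
                    # Delivery failures are logged, never raised to the caller.
                    logger.warning("audit delivery failed", exc_info=True)

        return wrapper

    return decorator
```

Because the record is built and emitted inside `finally`, both the success path and the exception path produce one line in the sink, and a failing sink cannot change what the caller receives.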
3. Application Log Format and Content
Application logs are written as JSON to stdout, captured by the ECS awslogs driver, and sent to CloudWatch Logs.
3.1 Log Record Structure
    {
      "timestamp": "2026-04-29T10:32:01.123456+00:00",
      "level": "INFO",
      "logger": "sharepoint_mcp.server",
      "message": "OBO token exchange failed",
      "mcp": "sharepoint",
      "env": "prod"
    }
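HTTP access log records (logger uvicorn.access) additionally include request fields such as status_code, which the HTTP 4xx and 5xx metric filters in section 7 match on.
Error records include a full traceback in an exception field.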
3.2 Log Levels by Environment
| Environment | Log level | Effect |
|---|---|---|
| Dev | `DEBUG` | Full request/response detail, token exchange steps |
| Prod | `INFO` | Startup events, HTTP requests, errors |
Third-party library loggers (httpx, boto3, azure, msal) are capped at WARNING in all environments to reduce noise.
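A logging setup that matches this description could look roughly like the sketch below. It uses only the standard library; `JsonFormatter` and `configure_logging` are hypothetical names chosen for illustration, and the real service may configure logging differently.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object with the fields from 3.1."""

    def __init__(self, mcp: str, env: str) -> None:
        super().__init__()
        self.mcp = mcp
        self.env = env

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "mcp": self.mcp,
            "env": self.env,
        }
        if record.exc_info:
            # Error records carry the full traceback in an "exception" field.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(mcp: str, env: str) -> None:
    """DEBUG in dev, INFO otherwise; noisy third-party loggers capped at WARNING."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter(mcp, env))
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(logging.DEBUG if env == "dev" else logging.INFO)
    for noisy in ("httpx", "boto3", "azure", "msal"):
        logging.getLogger(noisy).setLevel(logging.WARNING)
```

Setting the level on the named third-party loggers, rather than on the handler, keeps application DEBUG output available in dev while still suppressing library chatter.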
4. Log Sources and Destinations
| Source | Log type | Destination | Format |
|---|---|---|---|
| MCP tool calls (all MCPs) | Audit | Kinesis Firehose → S3 | NDJSON |
| ECS container stdout/stderr | Application | CloudWatch Logs | JSON |
| AWS API activity | Infrastructure | AWS CloudTrail | JSON |
| ALB | Access logs | S3 (`aiconnectors-public-alb-logs-{env}`) | ALB format |
| Azure AD | Sign-in / token events | Azure AD logs | Managed by NN IT |
4.1 CloudWatch Log Groups
Log groups follow the pattern /ecs/{cluster}/{service}/{appname}, e.g.:
/ecs/aiconnectors-{env}-aiconnectors/mcp-{name}-main-svc/mcp-{name}
Log driver: awslogs, region: eu-central-1.
4.2 Audit S3 Bucket Layout
    nn-aiconnectors-audit-{env}/
    └── dt=yyyy-MM-dd/
        └── aiconnectors-audit-{env}-1-yyyy-MM-dd-HH-mm-ss-{uuid}
Each object contains multiple NDJSON lines (buffered up to 5 MB or 60 seconds by Firehose). Objects are partitioned by date, enabling efficient querying by date range using the /check-audit-logs skill.
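Because objects are grouped under `dt=yyyy-MM-dd/` prefixes, a date-range query only needs to list the prefixes for the days in that range. A minimal sketch of that prefix calculation is shown below; `audit_prefixes` is a hypothetical helper, not part of the skill.

```python
from datetime import date, timedelta


def audit_prefixes(start: date, end: date) -> list[str]:
    """Return the dt=yyyy-MM-dd/ key prefixes covering start..end inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days
    return [f"dt={(start + timedelta(days=i)).isoformat()}/" for i in range(days + 1)]
```

Each returned string can be passed as the prefix of an S3 list request, so days outside the range are never scanned.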
4.3 ALB Access Log Bucket
ALB access logs are written by AWS to aiconnectors-public-alb-logs-{env} (S3, eu-central-1), created by the ai-lab-infra alb module.
Bucket configuration:
| Control | Value |
|---|---|
| Retention / lifecycle | 30 days → expire (S3 lifecycle rule log_retention) |
| ACL | log-delivery-write (AWS ELB delivery principal) |
| Insecure transport | s3:PutObject denied over HTTP (policy enforced by module) |
| Public access | Blocked (default S3 bucket public access block) |
| `force_destroy` | `true`: Terraform can destroy the bucket even while it still contains objects (remaining objects are deleted first) |
Note: ALB access logs contain client IP addresses (personal data under GDPR). The 30-day retention is set by the shared ai-lab-infra module default; this is shorter than the audit log retention (365 days). ALB logs are supplementary to the CloudWatch HTTP access logs and the Kinesis Firehose audit trail, which carry the primary security record.
5. Log Retention
| Log type | Retention | Location | Policy |
|---|---|---|---|
| Audit logs (tool calls) | 90 days Standard, then Glacier; 365 days total | S3 `nn-aiconnectors-audit-{env}` | S3 lifecycle: 90d → Glacier; 365d → expire |
| CloudWatch application logs | 30 days | CloudWatch Logs | Retention policy set in ECS task definition |
| Firehose delivery errors | 30 days | CloudWatch `/aws/kinesisfirehose/aiconnectors-audit-{env}` | Hardcoded in audit.tf |
| AWS CloudTrail | 90 days (default) | CloudTrail console | Event history retained by AWS; no separate S3 trail configured |
| ALB access logs | 30 days | S3 `aiconnectors-public-alb-logs-{env}` | S3 lifecycle: 30d → expire (set by ai-lab-infra ALB module) |
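As a quick way to reason about where an audit object should be at a given age, the audit-log lifecycle row above can be expressed as a small function. This is an illustrative sketch only: `audit_object_state` is a hypothetical helper, and S3 applies lifecycle rules asynchronously, so the actual transition may lag the nominal day count.

```python
def audit_object_state(age_days: int) -> str:
    """Nominal storage state of an audit object: 90d -> Glacier, 365d -> expire."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= 365:
        return "expired"
    if age_days >= 90:
        return "glacier"
    return "standard"
```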
6. Log Integrity and Tamper Protection
Audit S3 Bucket
The audit S3 bucket (nn-aiconnectors-audit-{env}) has the following tamper protections:
| Control | Implementation |
|---|---|
| Deletion prevention | Bucket policy explicitly denies s3:DeleteObject and s3:DeleteObjectVersion for all principals (including account root) |
| Public access blocked | All four S3 public access block flags set to true |
| Versioning | Enabled — overwrites create new versions, originals are preserved |
| Encryption at rest | SSE-S3 (AES256) |
| `force_destroy` | `false`: bucket cannot be destroyed via Terraform while objects exist |
No IAM role in the platform has s3:DeleteObject permission on the audit bucket — not the ECS task role, not the GitHub Actions deploy role, and not the Firehose IAM role (which has s3:PutObject only).
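For orientation, a bucket policy statement with the deny-delete effect described in the table generally has the shape built below. This is a sketch under stated assumptions: `deny_delete_policy` and the `Sid` value are hypothetical, and the statement actually deployed by the Terraform configuration may differ in detail.

```python
import json


def deny_delete_policy(bucket: str) -> dict:
    """Build a bucket policy denying object deletion for every principal."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyObjectDeletion",  # hypothetical statement ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_delete_policy("nn-aiconnectors-audit-dev"), indent=2))
```

An explicit `Deny` takes precedence over any `Allow` granted elsewhere, which is why the control holds even if a role is later given broader S3 permissions.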
CloudWatch Logs
ECS task roles do not have logs:DeleteLogGroup or logs:DeleteLogStream permissions. Log groups are created by the ECS service at startup and can only be written to, not deleted, by the application.
Audit Record Integrity
Audit records are written after tool execution in a finally block, meaning they cannot be suppressed by application-level errors. Failures to write to Firehose are logged as WARNING in CloudWatch but do not affect the tool response — this ensures audit delivery failures are themselves visible and alertable.
7. Monitoring and Alerting
Four CloudWatch alarms are provisioned per MCP, each publishing to the shared SNS topic aiconnectors-alarms-{env} (email subscribers: AI connectors team).
| Alarm | Metric filter | Threshold | Period | Action |
|---|---|---|---|---|
| Auth failures | `{ $.message = "OBO token exchange failed" }` | ≥ 5 occurrences | 5 min | SNS email |
| Error rate | `{ $.level = "ERROR" }` | ≥ 20 occurrences | 5 min | SNS email |
| HTTP 4xx spike | `{ $.status_code >= 400 && $.status_code < 500 }` | ≥ 50 occurrences | 5 min | SNS email |
| HTTP 5xx spike | `{ $.status_code >= 500 && $.status_code < 600 }` | ≥ 10 occurrences | 5 min | SNS email |
All alarms use treat_missing_data = notBreaching, so a period with no matching log events does not trigger an alarm. Alarms operate on CloudWatch Logs metric filters in the MCP/Security namespace.
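To sanity-check the thresholds against sample log data offline, the four filters can be mirrored in plain Python. The sketch below is an approximation for local testing, not CloudWatch itself: the alarm keys, `matched_alarms`, and `breaching` are hypothetical names, and the input is assumed to be the parsed JSON log records from one 5-minute window.

```python
from collections import Counter

# Thresholds per 5-minute window, taken from the alarm table above.
THRESHOLDS = {"auth_failures": 5, "error_rate": 20, "http_4xx": 50, "http_5xx": 10}


def matched_alarms(record: dict) -> set[str]:
    """Return the alarm filters that a single JSON log record would match."""
    matched = set()
    if record.get("message") == "OBO token exchange failed":
        matched.add("auth_failures")
    if record.get("level") == "ERROR":
        matched.add("error_rate")
    status = record.get("status_code")
    if isinstance(status, int):
        if 400 <= status < 500:
            matched.add("http_4xx")
        elif 500 <= status < 600:
            matched.add("http_5xx")
    return matched


def breaching(window: list[dict]) -> set[str]:
    """Alarms whose match count in one window reaches the threshold."""
    counts = Counter()
    for record in window:
        counts.update(matched_alarms(record))
    # An empty window yields no counts, mirroring notBreaching for missing data.
    return {name for name, limit in THRESHOLDS.items() if counts[name] >= limit}
```

Note that one record can count toward more than one alarm: an OBO failure logged at ERROR level increments both the auth-failure and error-rate counts.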
Additional Monitoring
| What | How |
|---|---|
| Audit log delivery failures | Firehose CloudWatch log group /aws/kinesisfirehose/aiconnectors-audit-{env} |
| ECS service health | ALB target group health checks (GET /health, 15s interval) |
| Container resource utilisation | ECS CloudWatch metrics (CPU, memory) — available but no alarm configured |
8. Log Review Process
Routine monitoring is fully automated via the CloudWatch alarms defined in section 7. Manual log review is reserved for security investigations and incident response — not performed on a scheduled basis.
During a security investigation, audit logs in S3 can be queried using the /check-audit-logs Claude Code skill, which:
- Lists objects in nn-aiconnectors-audit-{env} for the selected date range
- Downloads and filters NDJSON records by MCP, user, outcome, and date
- Produces a summary table (total records, MCP breakdown, outcome breakdown, unique users)
- Shows individual error records in full
AWS profiles required: ai-connectors-dev / ai-connectors-prod
See incident-response.md §2 for the full list of log sources and query methods used during an investigation.
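The summary step of such an investigation can be approximated locally once the NDJSON objects have been downloaded. The sketch below is not the skill's implementation: `summarize` is a hypothetical function that assumes each line is a complete audit record with the fields described in section 2.1.

```python
import json
from collections import Counter


def summarize(ndjson_text: str) -> dict:
    """Summarize audit records: totals, breakdowns, unique users, error records."""
    records = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
    return {
        "total": len(records),
        "by_mcp": dict(Counter(r["mcp"] for r in records)),
        "by_outcome": dict(Counter(r["outcome"] for r in records)),
        "unique_users": len({r["user_oid"] for r in records}),
        "errors": [r for r in records if r["outcome"] == "error"],
    }
```

Keeping the error records whole in the result makes it easy to print them in full after the summary table.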
Audit Instrumentation Verification
Every tool must have audit logging instrumentation. This is verified per release using the /verify-audit-logging skill, which checks that every @mcp.tool() function has the required try/finally + log_tool_call pattern and that AUDIT_FIREHOSE_STREAM is set in the ECS environment YAMLs.