
Logging and Monitoring#

Relevant controls: SC.07.01, SC.07.02, SC.07.04, SC.07.06


1. Log Requirements#

All MCP tool invocations are logged to an immutable audit trail. Application events (startup, errors, auth failures, HTTP requests) are logged to CloudWatch. Infrastructure activity is captured by AWS CloudTrail.

| Event type | Logged? | Source | Notes |
|---|---|---|---|
| Successful MCP tool calls | Yes | Audit Firehose → S3 | Every tool call, regardless of outcome |
| Failed / errored tool calls | Yes | Audit Firehose → S3 | outcome=error with error message |
| Redacted tool calls (sensitivity filter) | Yes | Audit Firehose → S3 | outcome=redacted |
| OBO token exchange failures | Yes | CloudWatch (ERROR log) | Metric filter + alarm |
| HTTP 4xx responses | Yes | CloudWatch (uvicorn.access) | Metric filter + alarm |
| HTTP 5xx responses | Yes | CloudWatch (uvicorn.access) | Metric filter + alarm |
| User identity (oid, upn) per tool call | Yes | Audit Firehose → S3 | Extracted from Azure AD JWT |
| Container startup / shutdown | Yes | CloudWatch (INFO log) | |
| AWS API calls (infra, secrets access) | Yes | AWS CloudTrail | All AWS API activity |
| SSM secret reads (client ID, secret) | Yes | AWS CloudTrail | At container startup |
| Changes to IAM, ECS, ECR, SSM | Yes | AWS CloudTrail | |
| Azure AD sign-in events | Yes | Azure AD sign-in logs | Managed by Microsoft / NN IT |

2. Audit Log Format and Content#

Audit logs are the primary security record. Every MCP tool call produces exactly one NDJSON record written synchronously to Kinesis Firehose before the response is returned to the caller.

2.1 Record Structure#

```json
{
  "event":       "tool_call",
  "ts":          "2026-04-29T10:32:01.123456Z",
  "mcp":         "sharepoint",
  "env":         "prod",
  "tool":        "search_documents",
  "user_oid":    "a1b2c3d4-e5f6-...",
  "user_upn":    "user@novonordisk.com",
  "params":      { "query": "project plan", "page_size": 25 },
  "outcome":     "success",
  "duration_ms": 312,
  "error":       null
}
```
| Field | Type | Description |
|---|---|---|
| event | string | Always "tool_call" |
| ts | string | ISO 8601 UTC timestamp |
| mcp | string | MCP server name (sharepoint, outlook, teams, databricks) |
| env | string | Deployment environment (dev, prod) |
| tool | string | Tool function name as registered with FastMCP |
| user_oid | string | Azure AD object ID of the calling user |
| user_upn | string | User principal name (email address) |
| params | object | Key tool arguments (not large content payloads) |
| outcome | string | success \| error \| redacted |
| duration_ms | integer | Wall-clock execution time in milliseconds |
| error | string \| null | Error message when outcome=error; otherwise null |

2.2 Outcome Values#

| Outcome | Meaning |
|---|---|
| success | Tool executed and returned data to the caller |
| error | Tool raised an exception; the error field contains the message |
| redacted | Tool completed but content was withheld due to sensitivity label filtering |

2.3 Implementation#

Audit logging is implemented in connectors/libs/nn-mcp-core/src/nn_mcp_core/audit.py and applied to every tool via the @audit_tool decorator:

```python
@mcp.tool()
@audit_tool(enable_oauth=settings.enable_oauth, check_redacted=True)
async def search_documents(query: str, page_size: int = 25) -> list[dict]:
    ...
```

The decorator always emits an audit record in a finally block — even if the tool raises an exception. Failures to deliver to Firehose are logged as warnings but never surface to the caller.

User identity (oid, upn) is extracted from the Azure AD JWT Bearer token on every request via get_user_claims() in nn_mcp_core/audit.py.
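As a rough illustration of the decorator pattern described above — not the actual nn_mcp_core implementation — the structure could look like the sketch below. The injected `firehose` client, the `stream` parameter, and the record assembly are illustrative assumptions; the real decorator also attaches user_oid/user_upn from the JWT claims.

```python
import asyncio
import functools
import json
import time
from datetime import datetime, timezone


def audit_tool(mcp: str, env: str, firehose=None, stream: str = "audit-stream"):
    """Illustrative sketch: emit exactly one audit record per call, even on error."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            start = time.monotonic()
            outcome, error = "success", None
            try:
                return await fn(*args, **kwargs)
            except Exception as exc:
                outcome, error = "error", str(exc)
                raise
            finally:
                # The finally block guarantees a record even when the tool raises.
                record = {
                    "event": "tool_call",
                    "ts": datetime.now(timezone.utc).isoformat(),
                    "mcp": mcp,
                    "env": env,
                    "tool": fn.__name__,
                    "params": kwargs,
                    "outcome": outcome,
                    "duration_ms": int((time.monotonic() - start) * 1000),
                    "error": error,
                }
                try:
                    if firehose is not None:
                        firehose.put_record(
                            DeliveryStreamName=stream,
                            Record={"Data": (json.dumps(record) + "\n").encode()},
                        )
                except Exception:
                    # Delivery failures must never surface to the caller;
                    # the real implementation logs them as WARNING instead.
                    pass
        return wrapper
    return decorator
```

With a real boto3 `firehose` client, `put_record` delivers the NDJSON line to the audit stream; in tests a stub client can capture the record instead.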


3. Application Log Format and Content#

Application logs are written as JSON to stdout, captured by the ECS awslogs driver, and sent to CloudWatch Logs.

3.1 Log Record Structure#

```json
{
  "timestamp": "2026-04-29T10:32:01.123456+00:00",
  "level":     "INFO",
  "logger":    "sharepoint_mcp.server",
  "message":   "OBO token exchange failed",
  "mcp":       "sharepoint",
  "env":       "prod"
}
```

HTTP access log records additionally include:

```json
{
  "status_code": 401,
  "method":      "POST",
  "path":        "/mcp"
}
```

Error records include a full traceback in an exception field.
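A formatter producing records of this shape can be sketched as follows. This is an assumed implementation, not the platform's actual one: the `JsonFormatter` class name and the use of `extra=` for the HTTP fields are illustrative.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Illustrative formatter producing records shaped like the examples above."""

    def __init__(self, mcp: str, env: str):
        super().__init__()
        self.mcp, self.env = mcp, env

    def format(self, record: logging.LogRecord) -> str:
        out = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "mcp": self.mcp,
            "env": self.env,
        }
        # HTTP access records carry extra attributes passed via `extra=...`.
        for key in ("status_code", "method", "path"):
            if hasattr(record, key):
                out[key] = getattr(record, key)
        # Error records include a full traceback in an "exception" field.
        if record.exc_info:
            out["exception"] = self.formatException(record.exc_info)
        return json.dumps(out)
```

An access-log line would then be emitted with, e.g., `log.info("request", extra={"status_code": 401, "method": "POST", "path": "/mcp"})`.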

3.2 Log Levels by Environment#

| Environment | Log level | Effect |
|---|---|---|
| Dev | DEBUG | Full request/response detail, token exchange steps |
| Prod | INFO | Startup events, HTTP requests, errors |

Third-party library loggers (httpx, boto3, azure, msal) are capped at WARNING in all environments to reduce noise.


4. Log Sources and Destinations#

| Source | Log type | Destination | Format |
|---|---|---|---|
| MCP tool calls (all MCPs) | Audit | Kinesis Firehose → S3 | NDJSON |
| ECS container stdout/stderr | Application | CloudWatch Logs | JSON |
| AWS API activity | Infrastructure | AWS CloudTrail | JSON |
| ALB | Access logs | S3 (aiconnectors-public-alb-logs-{env}) | ALB format |
| Azure AD | Sign-in / token events | Azure AD logs | Managed by NN IT |

4.1 CloudWatch Log Groups#

Log groups follow the pattern /ecs/{cluster}/{service}/{appname}, e.g.: /ecs/aiconnectors-{env}-aiconnectors/mcp-{name}-main-svc/mcp-{name}

Log driver: awslogs, region: eu-central-1.

4.2 Audit S3 Bucket Layout#

```
nn-aiconnectors-audit-{env}/
└── dt=yyyy-MM-dd/
    └── aiconnectors-audit-{env}-1-yyyy-MM-dd-HH-mm-ss-{uuid}
```

Each object contains multiple NDJSON lines (buffered up to 5 MB or 60 seconds by Firehose). Objects are partitioned by date, enabling efficient querying by date range using the /check-audit-logs skill.
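Because objects are partitioned under dt= prefixes, a date-range query only needs to enumerate one prefix per day. A minimal sketch of building those prefixes (the function name is illustrative):

```python
from datetime import date, timedelta


def audit_prefixes(start: date, end: date) -> list[str]:
    """Date-partitioned S3 prefixes (dt=yyyy-MM-dd/) for an inclusive query window."""
    prefixes = []
    d = start
    while d <= end:
        prefixes.append(f"dt={d.isoformat()}/")
        d += timedelta(days=1)
    return prefixes
```

Each prefix would then be passed to `list_objects_v2(Bucket="nn-aiconnectors-audit-{env}", Prefix=prefix)` to enumerate that day's Firehose objects without scanning the whole bucket.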

4.3 ALB Access Log Bucket#

ALB access logs are written by AWS to aiconnectors-public-alb-logs-{env} (S3, eu-central-1), created by the ai-lab-infra alb module.

Bucket configuration:

| Control | Value |
|---|---|
| Retention / lifecycle | 30 days → expire (S3 lifecycle rule log_retention) |
| ACL | log-delivery-write (AWS ELB delivery principal) |
| Insecure transport | s3:PutObject denied over HTTP (policy enforced by module) |
| Public access | Blocked (default S3 bucket public access block) |
| force_destroy | true — bucket can be destroyed via Terraform when empty |

Note: ALB access logs contain client IP addresses (personal data under GDPR). The 30-day retention is set by the shared ai-lab-infra module default; this is shorter than the audit log retention (365 days). ALB logs are supplementary to the CloudWatch HTTP access logs and the Kinesis Firehose audit trail, which carry the primary security record.


5. Log Retention#

| Log type | Retention | Location | Policy |
|---|---|---|---|
| Audit logs (tool calls) | 90 days Standard, then Glacier; 365 days total | S3 nn-aiconnectors-audit-{env} | S3 lifecycle: 90d → Glacier; 365d → expire |
| CloudWatch application logs | 30 days | CloudWatch Logs | Retention policy set in ECS task definition |
| Firehose delivery errors | 30 days | CloudWatch /aws/kinesisfirehose/aiconnectors-audit-{env} | Hardcoded in audit.tf |
| AWS CloudTrail | 90 days (default) | CloudTrail console | Event history retained by AWS; no separate S3 trail configured |
| ALB access logs | 30 days | S3 aiconnectors-public-alb-logs-{env} | S3 lifecycle: 30d → expire (set by ai-lab-infra ALB module) |

6. Log Integrity and Tamper Protection#

Audit S3 Bucket#

The audit S3 bucket (nn-aiconnectors-audit-{env}) has the following tamper protections:

| Control | Implementation |
|---|---|
| Deletion prevention | Bucket policy explicitly denies s3:DeleteObject and s3:DeleteObjectVersion for all principals (including account root) |
| Public access blocked | All four S3 public access block flags set to true |
| Versioning | Enabled — overwrites create new versions, originals are preserved |
| Encryption at rest | SSE-S3 (AES256) |
| force_destroy | false — bucket cannot be destroyed via Terraform while objects exist |

No IAM role in the platform has s3:DeleteObject permission on the audit bucket — not the ECS task role, not the GitHub Actions deploy role, and not the Firehose IAM role (which has s3:PutObject only).
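The deny-delete control described above corresponds to a bucket policy statement along these lines. The statement Sid and the bucket name binding are illustrative assumptions, not the actual Terraform resource:

```python
import json

AUDIT_BUCKET = "nn-aiconnectors-audit-prod"  # illustrative; per-environment in practice

# Illustrative deny statement: object deletion is denied for every principal,
# including the account root, regardless of what IAM permissions a role holds.
deny_delete_statement = {
    "Sid": "DenyAuditObjectDeletion",
    "Effect": "Deny",
    "Principal": "*",
    "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
    "Resource": f"arn:aws:s3:::{AUDIT_BUCKET}/*",
}

policy = {"Version": "2012-10-17", "Statement": [deny_delete_statement]}
# Applied via e.g. s3.put_bucket_policy(Bucket=AUDIT_BUCKET, Policy=json.dumps(policy))
```

Because an explicit Deny in S3 overrides any Allow, this holds even for principals that are otherwise granted broad S3 access.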

CloudWatch Logs#

ECS task roles do not have logs:DeleteLogGroup or logs:DeleteLogStream permissions. Log groups are created by the ECS service at startup and can only be written to, not deleted, by the application.

Audit Record Integrity#

Audit records are written after tool execution in a finally block, meaning they cannot be suppressed by application-level errors. Failures to write to Firehose are logged as WARNING in CloudWatch but do not affect the tool response — this ensures audit delivery failures are themselves visible and alertable.


7. Monitoring and Alerting#

Four CloudWatch alarms are provisioned per MCP, each publishing to the shared SNS topic aiconnectors-alarms-{env} (email subscribers: AI connectors team).

| Alarm | Metric filter | Threshold | Period | Action |
|---|---|---|---|---|
| Auth failures | { $.message = "OBO token exchange failed" } | ≥ 5 occurrences | 5 min | SNS email |
| Error rate | { $.level = "ERROR" } | ≥ 20 occurrences | 5 min | SNS email |
| HTTP 4xx spike | { $.status_code >= 400 && $.status_code < 500 } | ≥ 50 occurrences | 5 min | SNS email |
| HTTP 5xx spike | { $.status_code >= 500 && $.status_code < 600 } | ≥ 10 occurrences | 5 min | SNS email |

All alarms use treat_missing_data = notBreaching — silence is not treated as a problem. Alarms operate on CloudWatch Logs metric filters in the MCP/Security namespace.
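As a sketch of how one of these alarms maps onto the CloudWatch APIs — assuming the log group naming from section 4.1 and an illustrative SNS topic ARN (the account ID is a placeholder) — the auth-failure filter and alarm could be expressed as:

```python
def auth_failure_alarm_config(mcp: str, env: str) -> dict:
    """Illustrative boto3 kwargs for the auth-failure metric filter and alarm."""
    log_group = f"/ecs/aiconnectors-{env}-aiconnectors/mcp-{mcp}-main-svc/mcp-{mcp}"
    metric_name = f"{mcp}-auth-failures"
    return {
        # logs.put_metric_filter(**cfg["metric_filter"])
        "metric_filter": {
            "logGroupName": log_group,
            "filterName": f"{metric_name}-filter",
            "filterPattern": '{ $.message = "OBO token exchange failed" }',
            "metricTransformations": [{
                "metricName": metric_name,
                "metricNamespace": "MCP/Security",
                "metricValue": "1",
            }],
        },
        # cloudwatch.put_metric_alarm(**cfg["alarm"])
        "alarm": {
            "AlarmName": f"{metric_name}-alarm",
            "MetricName": metric_name,
            "Namespace": "MCP/Security",
            "Statistic": "Sum",
            "Period": 300,  # 5 minutes
            "EvaluationPeriods": 1,
            "Threshold": 5,
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
            "TreatMissingData": "notBreaching",  # silence is not a problem
            # Placeholder account ID — the real topic ARN is environment-specific.
            "AlarmActions": [
                f"arn:aws:sns:eu-central-1:123456789012:aiconnectors-alarms-{env}"
            ],
        },
    }
```

The other three alarms differ only in filter pattern, metric name, and threshold.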

Additional Monitoring#

| What | How |
|---|---|
| Audit log delivery failures | Firehose CloudWatch log group /aws/kinesisfirehose/aiconnectors-audit-{env} |
| ECS service health | ALB target group health checks (GET /health, 15s interval) |
| Container resource utilisation | ECS CloudWatch metrics (CPU, memory) — available but no alarm configured |

8. Log Review Process#

Routine monitoring is fully automated via the CloudWatch alarms defined in section 7. Manual log review is reserved for security investigations and incident response — not performed on a scheduled basis.

During a security investigation, audit logs in S3 can be queried using the /check-audit-logs Claude Code skill, which:

  1. Lists objects in nn-aiconnectors-audit-{env} for the selected date range
  2. Downloads and filters NDJSON records by MCP, user, outcome, and date
  3. Produces a summary table (total records, MCP breakdown, outcome breakdown, unique users)
  4. Shows individual error records in full
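The filtering and summarisation steps above amount to straightforward NDJSON processing. A minimal sketch, with illustrative function names (this is not the skill's actual code):

```python
import json
from collections import Counter


def filter_audit_records(ndjson_lines, mcp=None, user_upn=None, outcome=None):
    """Filter NDJSON audit lines by MCP, user, and outcome (step 2 above)."""
    records = []
    for line in ndjson_lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        if mcp and rec.get("mcp") != mcp:
            continue
        if user_upn and rec.get("user_upn") != user_upn:
            continue
        if outcome and rec.get("outcome") != outcome:
            continue
        records.append(rec)
    return records


def summarise(records):
    """Summary counts matching step 3: totals, per-MCP and per-outcome breakdowns."""
    return {
        "total": len(records),
        "by_mcp": Counter(r["mcp"] for r in records),
        "by_outcome": Counter(r["outcome"] for r in records),
        "unique_users": len({r.get("user_oid") for r in records}),
    }
```

Date filtering falls out of the bucket layout for free: only the dt= prefixes in the selected range are downloaded in the first place.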

AWS profiles required: ai-connectors-dev / ai-connectors-prod

See incident-response.md §2 for the full list of log sources and query methods used during an investigation.

Audit Instrumentation Verification#

Every tool must have audit logging instrumentation. This is verified per release using the /verify-audit-logging skill, which checks that every @mcp.tool() function has the required try/finally + log_tool_call pattern and that AUDIT_FIREHOSE_STREAM is set in the ECS environment YAMLs.