
# Risk Assessment: Databricks Access

## Risk Matrix

| Rating | Impact | Likelihood |
| --- | --- | --- |
| High | Critical failure | Likely |
| Medium | Business disruption | Possible |
| Low | Minor issue | Unlikely |


## Risk 1: Unauthorized Access to Databricks Data

Related Requirement: @URS:DatabricksAccess (@workspace-access, @catalog-browsing)

What Could Fail: An authenticated user gains access to Databricks data beyond their Unity Catalog permissions, or an unauthenticated request reaches the Databricks REST API.

Cause:

- Missing or bypassed Azure AD token validation on incoming MCP requests
- OBO token exchange skipped, allowing direct use of the user's raw token with elevated scope
- Workspace probing leaking workspace existence to unauthorized users

Impact: High
Likelihood: Low
Risk Level: High

Mitigation:

- Azure AD OAuth2 enforced on all MCP endpoints via FastMCP AzureProvider; unauthenticated requests rejected before any Databricks call is made
- OBO token exchange (auth.py) mandatory for all workspace access; the Databricks token is derived from the user's Azure AD token and inherits their permissions (a minimal sketch follows this list)
- Unity Catalog enforces per-user data access on all REST API and managed MCP calls; the MCP connector cannot grant access beyond what Databricks already permits
- Per-user client cache keyed by Azure AD object ID (oid), preventing cross-user token reuse
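
The following sketch illustrates the shape of the OBO exchange and per-user cache; the real logic lives in auth.py. The endpoint and grant parameters follow the standard Azure AD on-behalf-of flow, and all configuration values are placeholders rather than the project's actual settings.

```python
# Illustrative only: the real OBO exchange and cache live in auth.py.
# The request shape follows the standard Azure AD OBO flow; every
# credential and scope value here is a placeholder.
import httpx

TOKEN_URL = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"


def exchange_obo_token(
    user_token: str, tenant: str, client_id: str, client_secret: str, scope: str
) -> str:
    """Exchange the user's Azure AD token for a user-scoped Databricks token.

    Because the token is derived from the user's own identity, Unity
    Catalog enforces that user's permissions on every downstream call.
    """
    resp = httpx.post(
        TOKEN_URL.format(tenant=tenant),
        data={
            "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
            "assertion": user_token,
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": scope,
            "requested_token_use": "on_behalf_of",
        },
    )
    resp.raise_for_status()
    return resp.json()["access_token"]


# Cache keyed by the Azure AD object ID (oid) claim: a cached Databricks
# token can only ever be returned to the user it was minted for.
_token_cache: dict[str, str] = {}


def get_databricks_token(oid: str, user_token: str, **azure_cfg) -> str:
    if oid not in _token_cache:
        _token_cache[oid] = exchange_obo_token(user_token, **azure_cfg)
    return _token_cache[oid]
```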

Verification:

- Test: Submit a request without a Bearer token → expect 401 before any Databricks API call is made
- Test: Submit a request with a valid token for User A while a cached session exists for User B → expect User A's own data, not User B's
- Test: Attempt to access a catalog the test user does not have Unity Catalog permissions for → expect 403 from Databricks

Residual Risk: Low


## Risk 2: SQL Write Operations Executed via Proxy

Related Requirement: @URS:DatabricksAccess (@sql-query, @read-only-enforcement)

What Could Fail: An AI assistant or user calls a Databricks SQL tool that executes a destructive SQL statement (INSERT, UPDATE, DELETE, DROP, CREATE, ALTER), causing unintended data modification or loss.

Cause:

- The Databricks SQL MCP endpoint exposes both read and write tools (including execute_sql)
- AI assistants may select write tools without understanding the intended read-only scope

Impact: High
Likelihood: Medium
Risk Level: High

Mitigation:

- Application-level blocklist (_BLOCKED_SQL_TOOLS = {"execute_sql"}) prevents execute_sql from being forwarded to Databricks (a minimal sketch follows this list)
- list_sql_tools filters out blocked tools before returning the tool list, so write tools are never advertised to the AI assistant
- proxy_to_sql raises a ValueError if a blocked tool name is submitted directly
- Tool description explicitly states read-only intent: "Only read operations (SELECT, DESCRIBE, EXPLAIN) are permitted"
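
A simplified sketch of the blocklist mechanics; the tool and variable names match the controls above, but the forwarding call to Databricks is stubbed out.

```python
# Simplified sketch of the blocklist controls; forwarding is stubbed.
_BLOCKED_SQL_TOOLS = {"execute_sql"}


def list_sql_tools(upstream_tools: list[dict]) -> list[dict]:
    """Return the upstream tool list with blocked (write) tools removed,
    so they are never advertised to the AI assistant."""
    return [t for t in upstream_tools if t["name"] not in _BLOCKED_SQL_TOOLS]


def proxy_to_sql(tool_name: str, arguments: dict) -> dict:
    """Forward a tool call to the Databricks SQL MCP endpoint.

    Blocked tool names are rejected before any Databricks call is made,
    covering direct submissions that bypass the advertised tool list.
    """
    if tool_name in _BLOCKED_SQL_TOOLS:
        raise ValueError(f"Tool '{tool_name}' is blocked: this proxy is read-only")
    # ... forward to Databricks and return the result (omitted in sketch) ...
    raise NotImplementedError("forwarding is stubbed in this sketch")
```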

Verification:

- Test: Call proxy_to_sql with tool_name="execute_sql" → expect ValueError, no Databricks call made (see the test sketch below)
- Test: Call list_sql_tools → expect execute_sql absent from the returned list
- Test: Submit a SELECT query via proxy_to_sql → expect results returned normally
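
A hypothetical pytest rendering of the first two checks, assuming the functions sketched above are importable from a sql_proxy module (the module name and sample tool list are placeholders):

```python
# Hypothetical tests for the blocklist; sql_proxy is a placeholder module.
import pytest

from sql_proxy import _BLOCKED_SQL_TOOLS, list_sql_tools, proxy_to_sql


def test_execute_sql_is_rejected_before_any_databricks_call():
    with pytest.raises(ValueError):
        proxy_to_sql("execute_sql", {"statement": "DROP TABLE important"})


def test_execute_sql_is_not_advertised():
    upstream = [{"name": "execute_sql"}, {"name": "describe_table"}]
    advertised = {t["name"] for t in list_sql_tools(upstream)}
    assert advertised.isdisjoint(_BLOCKED_SQL_TOOLS)
```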

Residual Risk: Low


## Risk 3: Sensitive Data Exposure via Audit Logs or Error Messages

Related Requirement: @URS:DatabricksAccess (@workspace-access)

What Could Fail: OAuth2 tokens, Databricks bearer tokens, or sensitive query results are written to CloudWatch logs or returned in error responses, enabling unauthorized access to credentials or data.

Cause:

- OBO error handler logging the full Azure AD error response body (may include token fragments)
- Unhandled exceptions propagating raw httpx error details, including Authorization headers, to the MCP response

Impact: High
Likelihood: Medium
Risk Level: High

Mitigation:

- Audit logging via @audit_tool captures tool name, user identity, and outcome; it does not log tool arguments or results (a minimal sketch follows this list)
- Azure AD client secrets stored exclusively in AWS SSM Parameter Store, never in environment variables or code
- OBO token exchange errors are caught and re-raised without exposing downstream API details to the MCP response
- CloudWatch log access controlled by IAM policies (ECS task role, not public)
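
A sketch of the decorator's shape. The real @audit_tool ships records to Kinesis Firehose (see Risk 5); this version uses stdlib logging for brevity, and the user_oid keyword is an assumption about how identity is passed. The key property shown is what is omitted: arguments and results never reach the log record.

```python
# Illustrative shape of @audit_tool. The record carries tool name, user
# identity, and outcome, and deliberately omits arguments and results.
import functools
import logging

logger = logging.getLogger("audit")


def audit_tool(func):
    @functools.wraps(func)
    def wrapper(*args, user_oid: str, **kwargs):
        outcome = "success"
        try:
            return func(*args, user_oid=user_oid, **kwargs)
        except Exception:
            outcome = "failure"
            raise
        finally:
            # Note what is NOT logged: args, kwargs, and the return value.
            logger.info(
                "tool=%s user=%s outcome=%s", func.__name__, user_oid, outcome
            )

    return wrapper
```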

Verification:

- Test: Trigger an OBO token exchange failure → inspect CloudWatch logs for the absence of bearer token values
- Code Review: Confirm the audit_tool decorator does not log tool_args or result content
- Manual Review: Verify SSM parameters are SecureString type with KMS encryption

Residual Risk: Medium
(Note: OBO error body logging — see security finding [E6.6] / issue #124 — is an open gap that elevates residual risk until resolved.)


## Risk 4: Resource Exhaustion via Unbounded Proxy Calls

Related Requirement: @URS:DatabricksAccess (@sql-query, @genie-query)

What Could Fail: A user or AI assistant submits many concurrent long-running Genie or SQL queries, exhausting ECS container resources or exceeding Databricks compute budgets, causing service degradation for other users.

Cause:

- No application-level rate limiting on proxy_to_genie or proxy_to_sql
- Databricks Genie and SQL queries can run up to 300 seconds each (configurable via DATABRICKS_PROXY_TIMEOUT)
- The ECS Fargate container has fixed CPU/memory; many concurrent blocked requests exhaust available threads

Impact: Medium
Likelihood: Medium
Risk Level: Medium

Mitigation:

- ECS auto-scaling increases container count under sustained load
- Databricks itself enforces cluster-level concurrency limits and query queuing
- CloudWatch alarms monitor ECS CPU/memory and trigger alerts before saturation

Verification: - Test: Submit 20 concurrent proxy_to_sql requests from the same user → verify service remains responsive for other users - Manual Review: Confirm CloudWatch alarms are configured for ECS task CPU > 80%

Residual Risk: Medium
(Note: Application-level rate limiting is an open gap; see security finding [D3.3] / issue #123. An illustrative sketch of one possible limit follows.)
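
To make the gap concrete, one possible shape for an application-level limit is a per-user concurrency cap. This is purely illustrative: it is not part of the current implementation, and the cap value is an arbitrary placeholder.

```python
# Purely illustrative: one possible shape of the missing application-level
# rate limit (issue #123). The cap of 5 is an arbitrary placeholder.
import asyncio
from collections import defaultdict

MAX_CONCURRENT_PER_USER = 5

_user_semaphores: defaultdict[str, asyncio.Semaphore] = defaultdict(
    lambda: asyncio.Semaphore(MAX_CONCURRENT_PER_USER)
)


async def with_user_limit(oid: str, coro):
    """Run one proxy call under the caller's concurrency budget.

    Requests beyond the cap wait their turn instead of piling up,
    bounding per-user pressure on the ECS task.
    """
    async with _user_semaphores[oid]:
        return await coro
```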


## Risk 5: Insufficient Audit Trail for Databricks Data Access

Related Requirement: @URS:DatabricksAccess (@catalog-browsing, @sql-query, @genie-query)

What Could Fail: Access to sensitive Databricks data assets cannot be investigated after a security incident because tool invocations are not durably logged with sufficient context.

Cause:

- Audit logging disabled or misconfigured (empty AUDIT_FIREHOSE_STREAM)
- Tool invocations not captured with user identity and workspace context

Impact: Medium
Likelihood: Low
Risk Level: Medium

Mitigation:

- The @audit_tool decorator, applied to all MCP tools, logs tool name, user identity (from the Azure AD token), timestamp, and outcome to Kinesis Firehose → S3 (a minimal sketch of the Firehose write follows this list)
- AUDIT_FIREHOSE_STREAM is set to aiconnectors-audit-dev / aiconnectors-audit-prod in the ECS task definition; local dev uses an empty string (audit disabled only in dev)
- The Databricks platform provides an independent, authoritative audit trail for all SQL, API, and managed MCP access at the workspace level via Unity Catalog audit logs (see the Databricks audit log documentation); this log is managed by DataCore (the Databricks platform owner) and is outside the scope of this MCP connector
- Because all MCP operations are performed via OBO (the user's own identity is forwarded), DataCore can correlate MCP-originating queries with user identity directly from the Databricks audit log if an investigation is required
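
A sketch of the Firehose emission step, assuming boto3; the record field names are illustrative rather than the connector's actual schema.

```python
# Illustrative Firehose write for one audit record; the record schema and
# stream-name handling are assumptions, not the connector's actual code.
import json
import os
from datetime import datetime, timezone

import boto3

STREAM = os.environ.get("AUDIT_FIREHOSE_STREAM", "")  # empty in local dev
_firehose = boto3.client("firehose") if STREAM else None


def emit_audit_record(tool_name: str, user_oid: str, outcome: str) -> None:
    """Ship one audit record to Kinesis Firehose (and on to S3).

    When AUDIT_FIREHOSE_STREAM is empty (local dev), auditing is a no-op.
    """
    if _firehose is None:
        return
    record = {
        "tool": tool_name,
        "user": user_oid,
        "outcome": outcome,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    _firehose.put_record(
        DeliveryStreamName=STREAM,
        Record={"Data": (json.dumps(record) + "\n").encode()},
    )
```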

Verification:

- Test: Invoke list_catalogs as an authenticated user → verify an audit log entry appears in S3 with the correct user ID and tool name
- Test: Invoke proxy_to_sql → verify the audit log entry includes tool name and outcome
- Manual Review: Confirm AUDIT_FIREHOSE_STREAM is non-empty in the production ECS task definition
- Manual Review (DataCore): Confirm Databricks workspace audit logging is enabled and retained per NN data governance policy

Residual Risk: Low


## Risk 6: Exposure of Strictly Confidential Information (SCI) via MCP Queries

Related Requirement: @URS:DatabricksAccess (@sql-query, @genie-query, @catalog-browsing)

What Could Fail: A user retrieves Strictly Confidential Information (SCI) from Databricks through the MCP connector — either via a SQL SELECT on a table containing SCI, or via a natural language Genie query that returns SCI in its response — and that information is subsequently surfaced in an AI assistant context where it should not appear.

Cause:

- The MCP connector is a transparent proxy: it does not inspect, classify, or filter the content of Databricks query results
- Databricks workspaces (in particular the DataCore-managed workspaces) may contain data classified as SCI under Novo Nordisk's data classification policy
- Genie Spaces are a specific concern: access to a Genie Space is controlled by space-level permissions (not catalog-level Unity Catalog tags), so a user with access to a Genie Space could receive SCI-containing responses if the underlying data is not pre-filtered at the source

Impact: High
Likelihood: Medium
Risk Level: High

Mitigation:

- The MCP connector pre-filters SCI-tagged resources to the best of its ability using Unity Catalog tag metadata: catalogs, schemas, and tables carrying an SCI classification tag are excluded from list_catalogs, list_schemas, list_tables, get_table, and search_tables responses before they are returned to the AI assistant (a minimal sketch follows this list)
- All MCP access is user-scoped via OBO; users cannot access Databricks data beyond what their Unity Catalog permissions already permit, and the tag-based pre-filter adds a defence-in-depth layer on top of permission enforcement
- DataCore is responsible for correctly tagging SCI data assets in Unity Catalog; the effectiveness of the MCP's pre-filter depends entirely on DataCore applying classification tags consistently at catalog, schema, and table level
- Genie Spaces are a residual gap: Genie Space access is governed by space-level configuration, not catalog tags, and the MCP cannot inspect or filter Genie response content for SCI. DataCore must ensure that Genie Spaces configured for use via this MCP do not have access to underlying SCI data sources
- The MCP connector's audit trail (Firehose → S3) and the Databricks platform audit log (managed by DataCore) provide a record of all tool invocations, supporting SCI incident investigation
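
A sketch of the tag-based pre-filter; the tag key and value ("classification" / "SCI") and the asset dictionary shape are assumptions, since DataCore owns the real tagging scheme.

```python
# Illustrative tag-based pre-filter. The tag key ("classification") and
# value ("SCI") are assumptions; DataCore owns the actual tagging scheme.
SCI_TAG_KEY = "classification"
SCI_TAG_VALUE = "SCI"


def is_sci_tagged(tags: dict[str, str]) -> bool:
    """True if the asset's Unity Catalog tags mark it as SCI."""
    return tags.get(SCI_TAG_KEY, "").upper() == SCI_TAG_VALUE


def filter_sci(assets: list[dict]) -> list[dict]:
    """Drop SCI-tagged assets before any listing response (list_catalogs,
    list_schemas, list_tables, search_tables) reaches the AI assistant."""
    return [a for a in assets if not is_sci_tagged(a.get("tags", {}))]
```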

Verification:

- Test: Call list_catalogs / list_schemas / list_tables → confirm catalogs/schemas/tables tagged SCI are absent from the responses
- Test: Attempt a proxy_to_sql query against a known SCI-tagged table → expect the table to be absent from list_tables and a 404/403 from Databricks if queried directly
- Manual Review (DataCore): Confirm SCI classification tags are applied consistently to all SCI-containing catalogs, schemas, and tables in workspaces reachable via this MCP
- Manual Review (DataCore): Confirm Genie Spaces exposed via this MCP do not have access to SCI-classified data sources

Residual Risk: Medium
(Pre-filtering on UC tags reduces exposure for catalog/schema/table browsing and SQL. Genie Spaces remain an accepted residual risk: the MCP cannot filter Genie response content, and mitigation depends on DataCore's Genie Space configuration. Any reduction of residual risk for Genie requires DataCore to attest that no exposed Genie Space has access to SCI data.)


## Summary Table

| Risk ID | Description | Impact | Likelihood | Level | Mitigation | Residual |
| --- | --- | --- | --- | --- | --- | --- |
| RISK-001 | Unauthorized access to Databricks data | High | Low | High | Azure AD enforcement, OBO exchange, Unity Catalog permissions, per-user cache | Low |
| RISK-002 | SQL write operations executed via proxy | High | Medium | High | Application blocklist, filtered tool list, explicit ValueError on blocked tools | Low |
| RISK-003 | Sensitive data exposure via logs or errors | High | Medium | High | Audit tool excludes args/results, SSM secrets, IAM-controlled logs | Medium |
| RISK-004 | Resource exhaustion via unbounded proxy calls | Medium | Medium | Medium | ECS auto-scaling, Databricks concurrency limits, CloudWatch alarms | Medium |
| RISK-005 | Insufficient audit trail | Medium | Low | Medium | @audit_tool on all tools, Firehose → S3, Databricks platform audit (DataCore) | Low |
| RISK-006 | SCI data exposure via SQL/Genie queries | High | Medium | High | MCP pre-filters SCI-tagged UC assets; OBO user-scoping; Genie gap mitigated by DataCore space config | Medium |

Version: 1.0
Date: 2026-04-28
Approved by: Pending Review
Related Artifacts:

- User Requirements: requirements/features/databricks-access.feature
- Intended Use: docs/requirements/01-intended-use.md
- Design: docs/design-databricks-access.md
- Open security findings: #122 (E6.2), #123 (D3.3), #124 (E6.6/G7.2), #125 (D3.1), #126 (V8.1/V8.2)