
Installation Validation: Databricks Access#

System Criticality: Business-Critical
Validation Approach: Risk-based
Related Design: DESIGN:DatabricksAccess
Related URS: @URS:DatabricksAccess
Related Risks: RISK:DatabricksAccess

Overview#

This Installation Validation (IV) plan validates the Databricks Access MCP connector against the documented requirements in @URS:DatabricksAccess and verifies the mitigations for the risks identified in RISK:DatabricksAccess. The connector provides read-only access to Databricks workspaces (Unity Catalog, SQL, Genie) over the MCP protocol, using Azure AD OAuth2 on-behalf-of (OBO) authentication.

Validation Scope:

- User-scoped workspace access (OBO authentication; Unity Catalog enforces per-user permissions)
- Read-only SQL enforcement (write tools blocked at the application layer; execute_sql never forwarded)
- Genie Space natural language query forwarding (proxy to the Databricks managed MCP)
- Unity Catalog browsing (catalogs, schemas, tables, table metadata)
- SCI pre-filtering (catalogs/schemas/tables tagged Strictly Confidential excluded from discovery tool responses)
- Authentication enforcement (unauthenticated requests rejected before any Databricks call)
- Audit logging (all tool invocations recorded to Firehose → S3)


Test Strategy#

| Level | Purpose | Owner | When | Coverage |
|---|---|---|---|---|
| Unit | Individual tool functions, OBO exchange, blocklist logic | Developers | During dev | Python unit tests with mocked DatabricksClient and mcp_proxy |
| Integration | Tool → DatabricksClient → Databricks REST API | QA | After unit tests | Dev environment with real Databricks workspace, test AAD account |
| System | End-to-end user flows (browse catalog, SQL query, Genie query) | QA | Before prod deployment | Dev environment with production-like ECS config |
| Security | Authentication enforcement, read-only enforcement | Security team | Before prod deployment | Test accounts without Databricks access; blocked tool submissions |
| Configuration | Audit logging, ECS env vars, SSM secrets | QA + Ops | After infra deploy | Manual inspection of CloudWatch logs and audit S3 bucket |

Test Environments:

| Environment | URL | Purpose |
|---|---|---|
| Dev | https://databricks.dev.connectors.novo-genai.com | Integration, system, and BDD tests |
| Prod | https://databricks.connectors.novo-genai.com | Post-deploy smoke tests only |

Test Accounts:

- testuser-databricks@novonordisk.com: has Unity Catalog access to the test catalog/schema
- testuser-noaccess@novonordisk.com: valid Azure AD account with no Databricks permissions


Traceability Matrix#

| Requirement | Summary | Risk | Test Case | Type | Status |
|---|---|---|---|---|---|
| @workspace-access | User accesses a Databricks workspace | RISK-001 | TEST-001 | Integration | Not Started |
| @catalog-browsing | User browses Unity Catalog data assets | RISK-001, RISK-006 | TEST-002, TEST-010 | Integration, Security | Not Started |
| @sql-query | User retrieves data with read-only SQL | RISK-002 | TEST-003 | Integration | Not Started |
| @genie-query | User asks a natural language question via Genie | RISK-001 | TEST-004 | Integration | Not Started |
| @read-only-enforcement | Only read-only SQL tools are exposed | RISK-002 | TEST-005, TEST-006 | Security | Not Started |
| RISK-001 | Unauthenticated access rejected | RISK-001 | TEST-007 | Security | Not Started |
| RISK-003 | No tokens in audit logs | RISK-003 | TEST-008 | Configuration | Not Started |
| RISK-005 | Audit log entries created per tool call | RISK-005 | TEST-009 | Configuration | Not Started |
| RISK-006 | SCI-tagged assets excluded from discovery responses | RISK-006 | TEST-010 | Security | Not Started |

Coverage: 5/5 URS scenarios + 5 risk mitigations = 10 test cases (Target: 100%)


Test Cases#

TEST-001: User Accesses an Accessible Databricks Workspace#

Related: @workspace-access | RISK-001
Risk Level: High

Preconditions:

- testuser-databricks authenticated via Azure AD OAuth2
- At least one configured Databricks workspace accessible to testuser-databricks

Steps:

1. Connect an MCP client to https://databricks.dev.connectors.novo-genai.com with the testuser-databricks token
2. Call list_genie_spaces (triggers workspace auto-detection)

Expected:

- Response contains at least one Genie space entry
- No authentication error returned
- Audit log entry created for list_genie_spaces

Pass Criteria: All expected results met


TEST-002: User Browses Unity Catalog Data Assets#

Related: @catalog-browsing | RISK-001, RISK-006
Risk Level: High

Preconditions:

- testuser-databricks authenticated; has access to the test catalog test_catalog

Steps:

1. Call list_catalogs → collect catalog names
2. Call list_schemas(catalog_name="test_catalog") → collect schema names
3. Call list_tables(catalog_name="test_catalog", schema_name="test_schema") → collect table names

Expected:

- Each response contains correctly structured dicts with the required fields (name, full_name)
- Results are limited to catalogs/schemas/tables the user has Unity Catalog permissions for
- No 403 or 500 errors
- Any catalogs/schemas/tables tagged Strictly Confidential in Unity Catalog are absent from the responses

Pass Criteria: All three calls succeed, return well-structured results, and contain no SCI-tagged assets
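The structural part of this check can be automated. The sketch below is a hypothetical helper, not the connector's code; it only assumes the two required fields named in the expected results (name, full_name):

```python
def validate_catalog_entries(entries):
    """Return True if every entry is a dict carrying the required fields."""
    required = {"name", "full_name"}
    return all(isinstance(e, dict) and required.issubset(e) for e in entries)

# Entries shaped like a discovery-tool response (illustrative data):
good = [{"name": "test_catalog", "full_name": "test_catalog"}]
bad = [{"name": "test_catalog"}]  # missing full_name

print(validate_catalog_entries(good))  # True
print(validate_catalog_entries(bad))   # False
```

The same helper can be reused for the list_schemas and list_tables responses, since all three are expected to share the name/full_name shape.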


TEST-003: User Retrieves Data with a Read-Only SQL Query#

Related: @sql-query | RISK-002
Risk Level: High

Preconditions:

- testuser-databricks authenticated; has SELECT access to test_catalog.test_schema.customers

Steps:

1. Call list_sql_tools → collect available tool names
2. Call proxy_to_sql(tool_name="execute_sql_read_only", tool_args={"query": "SELECT * FROM test_catalog.test_schema.customers LIMIT 5"})

Expected:

- list_sql_tools returns execute_sql_read_only (and possibly poll_sql_result) but NOT execute_sql
- proxy_to_sql returns up to 5 rows from the customers table
- No authentication or authorization error

Pass Criteria: All expected results met


TEST-004: User Asks a Natural Language Question via a Genie Space#

Related: @genie-query | RISK-001
Risk Level: High

Preconditions:

- testuser-databricks authenticated; has access to Genie Space gs-test-space

Steps:

1. Call list_genie_spaces → collect space IDs; identify gs-test-space
2. Call list_genie_tools(genie_space_id="gs-test-space") → collect tool names
3. Call proxy_to_genie(genie_space_id="gs-test-space", tool_name=<query_tool>, tool_args={"question": "How many records are in the customers table?"})

Expected:

- proxy_to_genie returns either a direct answer or a pending result with a poll token
- No authentication or authorization error

Pass Criteria: Response contains either an answer or a valid polling token; no error
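The either/or pass criterion can be sketched as a predicate. The field names below (answer, status, poll_token) are illustrative assumptions about the response shape, not the connector's documented schema:

```python
def genie_response_ok(resp: dict) -> bool:
    """Pass if the response carries a direct answer, or a pending
    result accompanied by a poll token (assumed field names)."""
    if resp.get("answer"):
        return True
    return resp.get("status") == "pending" and bool(resp.get("poll_token"))

print(genie_response_ok({"answer": "42 records"}))                      # True
print(genie_response_ok({"status": "pending", "poll_token": "tok-1"}))  # True
print(genie_response_ok({"status": "error"}))                           # False
```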


TEST-005: Write-Capable SQL Tool Not Listed#

Related: @read-only-enforcement | RISK-002
Risk Level: High

Preconditions:

- testuser-databricks authenticated

Steps:

1. Call list_sql_tools
2. Inspect the returned tool list

Expected:

- execute_sql is NOT present in the returned list
- At least one read-only tool (execute_sql_read_only or similar) IS present

Pass Criteria: execute_sql absent from results


TEST-006: Direct Submission of Blocked SQL Tool Returns Error#

Related: @read-only-enforcement | RISK-002
Risk Level: High

Preconditions:

- testuser-databricks authenticated

Steps:

1. Call proxy_to_sql(tool_name="execute_sql", tool_args={"query": "DROP TABLE test_catalog.test_schema.customers"})

Expected:

- Error returned immediately (before any Databricks call)
- Error message indicates the tool is blocked

Pass Criteria: Blocked-tool error returned (the server raises ValueError); no Databricks API call made (verify via the absence of an audit log entry for a downstream call)
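A minimal sketch of the application-layer blocklist that TEST-005 and TEST-006 exercise. The real connector's implementation may differ; the behavior under test is what matters, namely that execute_sql is rejected before any call leaves the server:

```python
# Tools never forwarded to Databricks (blocklist; illustrative sketch).
BLOCKED_SQL_TOOLS = {"execute_sql"}

def proxy_to_sql(tool_name: str, tool_args: dict) -> dict:
    """Reject blocked tools up front; otherwise forward downstream."""
    if tool_name in BLOCKED_SQL_TOOLS:
        raise ValueError(f"Tool '{tool_name}' is blocked: read-only access only")
    # ...forwarding to the Databricks managed SQL MCP would happen here...
    return {"forwarded": tool_name}

try:
    proxy_to_sql("execute_sql", {"query": "DROP TABLE t"})
except ValueError as exc:
    print(exc)  # Tool 'execute_sql' is blocked: read-only access only
```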


TEST-007: Unauthenticated Request Rejected Before Databricks Call#

Related: RISK-001
Risk Level: High

Preconditions:

- No Bearer token provided

Steps:

1. Submit a raw HTTP GET to https://databricks.dev.connectors.novo-genai.com/databricks without an Authorization header

Expected:

- 401 response from the MCP server
- No Databricks API call made (verify in CloudWatch logs: no outgoing request logged)

Pass Criteria: 401 returned; no Databricks API call in logs
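The gate this test verifies can be sketched as a header check that runs before any downstream call. This is a hypothetical shape, not the server's code; the real middleware presumably also validates the JWT itself:

```python
def check_auth(headers: dict) -> int:
    """Return the HTTP status an unauthenticated request should receive.
    Rejection happens before any Databricks call is attempted."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return 401  # reject immediately; no downstream request
    return 200      # token present; JWT validation would follow

print(check_auth({}))                                  # 401
print(check_auth({"Authorization": "Bearer eyJ..."}))  # 200
```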


TEST-008: No Token Material in CloudWatch Logs#

Related: RISK-003
Risk Level: High

Preconditions:

- Recent authenticated tool invocations completed (TEST-001 through TEST-004 run)

Steps:

1. Query the CloudWatch log group /ecs/aiconnectors-dev-aiconnectors/mcp-databricks-main-svc/mcp-databricks for the last 30 minutes
2. Search log entries for the patterns: Bearer, access_token, eyJ (JWT prefix)

Expected:

- No log entries contain Bearer token values
- No log entries contain raw JWT strings (eyJ...)

Pass Criteria: Zero matches for token-related patterns in logs
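The pattern search in step 2 can be scripted against exported log lines. A sketch, assuming the three patterns named above; the sample lines are illustrative, not real log output:

```python
import re

# Token-material patterns from the test steps: Bearer values,
# the literal access_token key, and the "eyJ" JWT prefix.
TOKEN_PATTERNS = [r"Bearer\s+\S+", r"access_token", r"\beyJ[A-Za-z0-9_-]+"]

def find_token_leaks(log_lines):
    """Return every log line matching a token-related pattern."""
    return [line for line in log_lines
            if any(re.search(p, line) for p in TOKEN_PATTERNS)]

clean = ["tool=list_catalogs user=oid-123 outcome=ok"]
leaky = ["Authorization: Bearer eyJhbGciOiJSUzI1NiJ9.payload.sig"]
print(find_token_leaks(clean))       # []
print(len(find_token_leaks(leaky)))  # 1
```

The pass criterion is an empty result over the full 30-minute export.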


TEST-009: Audit Log Entry Created for Each Tool Invocation#

Related: RISK-005
Risk Level: Medium

Preconditions:

- testuser-databricks authenticated; AUDIT_FIREHOSE_STREAM set to aiconnectors-audit-dev

Steps:

1. Call list_catalogs as testuser-databricks
2. Wait up to 60 seconds
3. Query the S3 audit bucket for entries with tool_name = "list_catalogs" and the user's oid

Expected:

- At least one audit log entry found with the correct tool_name, user identity, timestamp, and outcome

Pass Criteria: Audit entry present in S3 within 60 seconds of invocation
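The per-entry check in step 3 can be sketched as a small validator. The field names (tool_name, oid, timestamp, outcome) follow the expected results above but are assumptions about the record schema, as is the sample record:

```python
import json

# Fields each audit record is expected to carry (assumed names).
REQUIRED_FIELDS = {"tool_name", "oid", "timestamp", "outcome"}

def audit_entry_valid(raw: str, expected_tool: str) -> bool:
    """True if the JSON record has all required fields and the right tool."""
    entry = json.loads(raw)
    return REQUIRED_FIELDS.issubset(entry) and entry["tool_name"] == expected_tool

record = json.dumps({"tool_name": "list_catalogs", "oid": "oid-123",
                     "timestamp": "2026-04-28T10:00:00Z", "outcome": "success"})
print(audit_entry_valid(record, "list_catalogs"))  # True
```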


TEST-010: SCI-Tagged Assets Excluded from Discovery Responses#

Related: @catalog-browsing | RISK-006
Risk Level: High

Preconditions:

- testuser-databricks authenticated; has Unity Catalog access that would normally include at least one SCI-tagged catalog or schema
- DataCore has applied the SCI classification tag to at least one catalog or schema in a workspace reachable via this MCP (coordinate with DataCore to confirm a tagged test asset exists)

Steps:

1. Call list_catalogs → collect returned catalog names
2. Call list_schemas for any catalog that contains a known SCI-tagged schema → collect returned schema names
3. Call list_tables for any schema that contains a known SCI-tagged table → collect returned table names

Expected:

- SCI-tagged catalogs are absent from the list_catalogs response
- SCI-tagged schemas are absent from the list_schemas response
- SCI-tagged tables are absent from the list_tables response
- Non-SCI assets the user has permission to see are still returned normally

Pass Criteria: All SCI-tagged assets absent; non-SCI assets present as expected

Note: This test requires DataCore to confirm which specific test assets are tagged SCI before execution.
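The pre-filter behavior under test can be sketched as a tag-based exclusion. The tag key and value used here ("classification": "Strictly Confidential") are illustrative assumptions; confirm the actual tag convention with DataCore before execution:

```python
SCI_TAG_VALUE = "Strictly Confidential"

def filter_sci(assets):
    """Drop assets whose classification tag marks them SCI, before
    the discovery response ever reaches the client."""
    return [a for a in assets
            if a.get("tags", {}).get("classification") != SCI_TAG_VALUE]

catalogs = [
    {"name": "test_catalog", "tags": {"classification": "Internal"}},
    {"name": "hr_catalog", "tags": {"classification": "Strictly Confidential"}},
]
print([c["name"] for c in filter_sci(catalogs)])  # ['test_catalog']
```

Note that this is pre-filtering in the connector, not a Unity Catalog permission: the user may hold permissions on the tagged asset, and it must still be absent from the response.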


Quality Gates#

Before production deployment:

- [ ] All HIGH risk test cases (TEST-001 through TEST-007, TEST-010) passed in the dev environment
- [ ] Zero critical or high defects open
- [ ] TEST-008 (no token leakage in logs) passed
- [ ] TEST-009 (audit logging) passed
- [ ] TEST-010 (SCI pre-filtering) passed, with DataCore confirmation of tagged test assets obtained
- [ ] BDD tests pass in CI (pytest requirements/bdd/ -m databricks)
- [ ] CloudWatch alarms verified (ECS CPU, memory, ALB 5xx)
- [ ] SSM Parameter Store secrets confirmed as SecureString type in the prod account


Version: 1.0
Date: 2026-04-28
Approved by: Pending Review

Related Artifacts:

- User Requirements: requirements/features/databricks-access.feature
- Risk Assessment: docs/risks/risk-assessment-databricks-access.md
- Design: docs/design-databricks-access.md
- Intended Use: docs/requirements/01-intended-use.md