Installation Validation: Databricks Access#
System Criticality: Business-Critical
Validation Approach: Risk-based
Related Design: DESIGN:DatabricksAccess
Related URS: @URS:DatabricksAccess
Related Risks: RISK:DatabricksAccess
Overview#
This Installation Validation (IV) plan validates the Databricks Access MCP connector against the documented requirements in @URS:DatabricksAccess and mitigates the risks identified in RISK:DatabricksAccess. The connector provides read-only access to Databricks workspaces (Unity Catalog, SQL, Genie) via the MCP protocol, using Azure AD OAuth2 on-behalf-of (OBO) authentication.
Validation Scope:
- User-scoped workspace access (OBO authentication; Unity Catalog enforces per-user permissions)
- Read-only SQL enforcement (write tools blocked at application layer; execute_sql never forwarded)
- Genie Space natural language query forwarding (proxy to Databricks managed MCP)
- Unity Catalog browsing (catalogs, schemas, tables, table metadata)
- SCI (Strictly Confidential Information) pre-filtering (catalogs/schemas/tables tagged Strictly Confidential excluded from discovery tool responses)
- Authentication enforcement (unauthenticated requests rejected before any Databricks call)
- Audit logging (all tool invocations recorded to Firehose → S3)
Test Strategy#
| Level | Purpose | Owner | When | Coverage |
|---|---|---|---|---|
| Unit | Individual tool functions, OBO exchange, blocklist logic | Developers | During dev | Python unit tests with mocked DatabricksClient and mcp_proxy |
| Integration | Tool → DatabricksClient → Databricks REST API | QA | After unit tests | Dev environment with real Databricks workspace, test AAD account |
| System | End-to-end user flows (browse catalog, SQL query, Genie query) | QA | Before prod deployment | Dev environment with production-like ECS config |
| Security | Authentication enforcement, read-only enforcement | Security team | Before prod deployment | Test accounts without Databricks access; blocked tool submissions |
| Configuration | Audit logging, ECS env vars, SSM secrets | QA + Ops | After infra deploy | Manual inspection of CloudWatch logs and audit S3 bucket |
Test Environments:
| Environment | URL | Purpose |
|---|---|---|
| Dev | https://databricks.dev.connectors.novo-genai.com | Integration, system, and BDD tests |
| Prod | https://databricks.connectors.novo-genai.com | Post-deploy smoke tests only |
Test Accounts:
- testuser-databricks@novonordisk.com — has Unity Catalog access to test catalog/schema
- testuser-noaccess@novonordisk.com — valid Azure AD account with no Databricks permissions
Traceability Matrix#
| Requirement | Summary | Risk | Test Case | Type | Status |
|---|---|---|---|---|---|
| @workspace-access | User accesses a Databricks workspace | RISK-001 | TEST-001 | Integration | Not Started |
| @catalog-browsing | User browses Unity Catalog data assets | RISK-001, RISK-006 | TEST-002, TEST-010 | Integration, Security | Not Started |
| @sql-query | User retrieves data with read-only SQL | RISK-002 | TEST-003 | Integration | Not Started |
| @genie-query | User asks natural language question via Genie | RISK-001 | TEST-004 | Integration | Not Started |
| @read-only-enforcement | Only read-only SQL tools are exposed | RISK-002 | TEST-005, TEST-006 | Security | Not Started |
| RISK-001 | Unauthenticated access rejected | RISK-001 | TEST-007 | Security | Not Started |
| RISK-003 | No tokens in audit logs | RISK-003 | TEST-008 | Configuration | Not Started |
| RISK-005 | Audit log entries created per tool call | RISK-005 | TEST-009 | Configuration | Not Started |
| RISK-006 | SCI-tagged assets excluded from discovery responses | RISK-006 | TEST-010 | Security | Not Started |
Coverage: 5/5 URS scenarios and 5 identified risks covered by 10 test cases (Target: 100%)
Test Cases#
TEST-001: User Accesses an Accessible Databricks Workspace#
Related: @workspace-access | RISK-001 | Risk Level: High
Preconditions:
- testuser-databricks authenticated via Azure AD OAuth2
- At least one configured Databricks workspace accessible to testuser-databricks
Steps:
1. Connect MCP client to https://databricks.dev.connectors.novo-genai.com with testuser-databricks token
2. Call list_genie_spaces (triggers workspace auto-detection)
Expected:
- Response contains at least one Genie space entry
- No authentication error returned
- Audit log entry created for list_genie_spaces
Pass Criteria: All expected results met
TEST-002: User Browses Unity Catalog Data Assets#
Related: @catalog-browsing | RISK-001 | Risk Level: High
Preconditions:
- testuser-databricks authenticated; has access to test catalog test_catalog
Steps:
1. Call list_catalogs → collect catalog names
2. Call list_schemas(catalog_name="test_catalog") → collect schema names
3. Call list_tables(catalog_name="test_catalog", schema_name="test_schema") → collect table names
Expected:
- Each response contains correctly structured dicts with required fields (name, full_name)
- Results limited to catalogs/schemas/tables the user has Unity Catalog permissions for
- No 403 or 500 errors
- Any catalogs/schemas/tables tagged as Strictly Confidential in Unity Catalog are absent from the responses
Pass Criteria: All three calls succeed, return well-structured results, and contain no SCI-tagged assets
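The "correctly structured dicts" check in TEST-002 can be automated with a small helper. A minimal sketch, assuming discovery responses are lists of dicts and using the required field names (name, full_name) from the expected results above:

```python
# Structural check for discovery responses (TEST-002).
# Field names come from the test's expected results; the response shape
# (a list of dicts) is an assumption about the tool output.
REQUIRED_FIELDS = {"name", "full_name"}

def validate_discovery_response(entries):
    """Return a list of problems; an empty list means the response is well-formed."""
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i} is not a dict")
            continue
        missing = REQUIRED_FIELDS - entry.keys()
        if missing:
            problems.append(f"entry {i} missing fields: {sorted(missing)}")
    return problems
```

Running this against each of the three responses and asserting an empty problem list gives a repeatable pass/fail signal for the structure portion of the pass criteria.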
TEST-003: User Retrieves Data with a Read-Only SQL Query#
Related: @sql-query | RISK-002 | Risk Level: High
Preconditions:
- testuser-databricks authenticated; has SELECT access to test_catalog.test_schema.customers
Steps:
1. Call list_sql_tools → collect available tool names
2. Call proxy_to_sql(tool_name="execute_sql_read_only", tool_args={"query": "SELECT * FROM test_catalog.test_schema.customers LIMIT 5"})
Expected:
- list_sql_tools returns execute_sql_read_only (and possibly poll_sql_result) but NOT execute_sql
- proxy_to_sql returns up to 5 rows from the customers table
- No authentication or authorization error
Pass Criteria: All expected results met
TEST-004: User Asks a Natural Language Question via a Genie Space#
Related: @genie-query | RISK-001 | Risk Level: High
Preconditions:
- testuser-databricks authenticated; has access to Genie Space gs-test-space
Steps:
1. Call list_genie_spaces → collect space IDs; identify gs-test-space
2. Call list_genie_tools(genie_space_id="gs-test-space") → collect tool names
3. Call proxy_to_genie(genie_space_id="gs-test-space", tool_name=<query_tool>, tool_args={"question": "How many records are in the customers table?"})
Expected:
- proxy_to_genie returns either a direct answer or a pending result with a poll token
- No authentication or authorization error
Pass Criteria: Response contains either an answer or a valid polling token; no error
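The TEST-004 pass criteria (either an answer or a valid polling token, and no error) can be expressed as a single predicate. This is a sketch only: the field names answer, status, poll_token, and error are assumptions about the proxy response schema, not confirmed API fields.

```python
def genie_response_ok(resp: dict) -> bool:
    """TEST-004 pass check: accept either a direct answer or a pending
    result carrying a poll token; anything else (including errors) fails.
    Field names are assumed, not taken from a confirmed schema."""
    if "error" in resp:
        return False
    if resp.get("answer"):
        return True
    return resp.get("status") == "pending" and bool(resp.get("poll_token"))
```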
TEST-005: Write-Capable SQL Tool Not Listed#
Related: @read-only-enforcement | RISK-002 | Risk Level: High
Preconditions:
- testuser-databricks authenticated
Steps:
1. Call list_sql_tools
2. Inspect the returned tool list
Expected:
- execute_sql is NOT present in the returned list
- At least one read-only tool (execute_sql_read_only or similar) IS present
Pass Criteria: execute_sql absent from results
TEST-006: Direct Submission of Blocked SQL Tool Returns Error#
Related: @read-only-enforcement | RISK-002 | Risk Level: High
Preconditions:
- testuser-databricks authenticated
Steps:
1. Call proxy_to_sql(tool_name="execute_sql", tool_args={"query": "DROP TABLE test_catalog.test_schema.customers"})
Expected:
- Error returned immediately (before any Databricks call)
- Error message indicates the tool is blocked
Pass Criteria: ValueError raised; no Databricks API call made (verify via absence of audit log entry for downstream call)
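The application-layer behaviour TEST-005 and TEST-006 exercise can be sketched as a blocklist applied in two places: tool listing and tool submission. This is an illustrative model of the expected behaviour, not the connector's actual implementation; the blocklist contents and function names are hypothetical.

```python
# Hypothetical application-layer blocklist, modelling the behaviour
# TEST-005 and TEST-006 verify. Names are illustrative.
BLOCKED_SQL_TOOLS = {"execute_sql"}

def filter_sql_tools(upstream_tools):
    """TEST-005 behaviour: blocked tools never appear in the advertised list."""
    return [t for t in upstream_tools if t not in BLOCKED_SQL_TOOLS]

def check_sql_tool_allowed(tool_name):
    """TEST-006 behaviour: raise before forwarding a blocked tool downstream."""
    if tool_name in BLOCKED_SQL_TOOLS:
        raise ValueError(f"Tool '{tool_name}' is blocked: read-only access only")
```

Both tests together confirm the two enforcement points agree: a tool absent from the listing must also be rejected if submitted directly.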
TEST-007: Unauthenticated Request Rejected Before Databricks Call#
Related: RISK-001 | Risk Level: High
Preconditions:
- No Bearer token provided
Steps:
1. Submit a raw HTTP GET to https://databricks.dev.connectors.novo-genai.com/databricks without Authorization header
Expected:
- 401 response from MCP server
- No Databricks API call made (verify in CloudWatch logs: no outgoing request logged)
Pass Criteria: 401 returned; no Databricks API call in logs
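The rejection TEST-007 checks for amounts to a guard that runs before any downstream call. A minimal sketch of that guard, assuming a plain header dict; the function name and return convention are illustrative, not the server's actual middleware:

```python
def authorize(headers: dict) -> int:
    """Reject requests lacking a Bearer token before any downstream call.
    Returns an HTTP status code: 401 when unauthenticated, 200 otherwise.
    (Illustrative sketch; real middleware would also validate the JWT.)"""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return 401
    return 200
```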
TEST-008: No Token Material in CloudWatch Logs#
Related: RISK-003 | Risk Level: High
Preconditions:
- Recent authenticated tool invocations completed (TEST-001 through TEST-004 run)
Steps:
1. Query CloudWatch log group /ecs/aiconnectors-dev-aiconnectors/mcp-databricks-main-svc/mcp-databricks for the last 30 minutes
2. Search log entries for patterns: Bearer, access_token, eyJ (JWT prefix)
Expected:
- No log entries contain Bearer token values
- No log entries contain raw JWT strings (eyJ...)
Pass Criteria: Zero matches for token-related patterns in logs
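The pattern search in TEST-008 step 2 can be scripted against exported log lines rather than run by eye. A sketch using the three patterns named in the steps (Bearer values, access_token fields, and the base64url JWT header prefix eyJ):

```python
import re

# Patterns from TEST-008 step 2: Bearer values, access_token fields,
# and the "eyJ" prefix that opens a base64url-encoded JWT header.
TOKEN_PATTERNS = [
    re.compile(r"Bearer\s+\S+"),
    re.compile(r"access_token"),
    re.compile(r"\beyJ[A-Za-z0-9_-]+"),
]

def find_token_leaks(log_lines):
    """Return (line_number, pattern) pairs for every suspicious match."""
    hits = []
    for n, line in enumerate(log_lines, start=1):
        for pat in TOKEN_PATTERNS:
            if pat.search(line):
                hits.append((n, pat.pattern))
    return hits
```

The pass criteria correspond to find_token_leaks returning an empty list for the full 30-minute log export.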
TEST-009: Audit Log Entry Created for Each Tool Invocation#
Related: RISK-005 | Risk Level: Medium
Preconditions:
- testuser-databricks authenticated; AUDIT_FIREHOSE_STREAM set to aiconnectors-audit-dev
Steps:
1. Call list_catalogs as testuser-databricks
2. Wait up to 60 seconds
3. Query S3 audit bucket for entries with tool_name = "list_catalogs" and the user's oid
Expected:
- At least one audit log entry found with correct tool_name, user identity, timestamp, and outcome
Pass Criteria: Audit entry present in S3 within 60 seconds of invocation
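Step 3's match condition can be captured as a predicate over a parsed audit record. The field names (tool_name, user_oid, outcome, timestamp) are assumptions about the audit schema drawn from the expected results above; adjust them to the real S3 record layout before use.

```python
def audit_entry_matches(entry, tool_name, user_oid, window_start, window_end):
    """TEST-009 check: the audit record carries the expected tool name,
    user identity, an outcome, and a timestamp inside the test window.
    Field names are assumptions about the audit schema."""
    return (
        entry.get("tool_name") == tool_name
        and entry.get("user_oid") == user_oid
        and "outcome" in entry
        and window_start <= entry.get("timestamp", -1) <= window_end
    )
```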
TEST-010: SCI-Tagged Assets Excluded from Discovery Responses#
Related: @catalog-browsing | RISK-006 | Risk Level: High
Preconditions:
- testuser-databricks authenticated; has Unity Catalog access that would normally include at least one SCI-tagged catalog or schema
- DataCore has applied the SCI classification tag to at least one catalog or schema in a workspace reachable via this MCP (coordinate with DataCore to confirm a tagged test asset exists)
Steps:
1. Call list_catalogs → collect returned catalog names
2. Call list_schemas for any catalog that contains a known SCI-tagged schema → collect returned schema names
3. Call list_tables for any schema that contains a known SCI-tagged table → collect returned table names
Expected:
- SCI-tagged catalog is absent from list_catalogs response
- SCI-tagged schema is absent from list_schemas response
- SCI-tagged table is absent from list_tables response
- Non-SCI assets the user has permission to see are still returned normally
Pass Criteria: All SCI-tagged assets absent; non-SCI assets present as expected
Note: This test requires DataCore to confirm which specific test assets are tagged SCI before execution.
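The pre-filtering behaviour TEST-010 verifies can be modelled as a tag-based filter applied before discovery responses are returned. This is an illustrative sketch of the expected behaviour, not the connector's implementation; the tag value and the shape of the asset records are assumptions to be confirmed with DataCore.

```python
# Illustrative SCI pre-filter matching TEST-010's expectations: assets
# carrying the SCI classification tag are removed before the discovery
# response is returned; everything else passes through untouched.
SCI_TAG = "Strictly Confidential"  # assumed tag value; confirm with DataCore

def strip_sci_assets(assets):
    """`assets` is a list of dicts with a 'tags' list (assumed shape)."""
    return [a for a in assets if SCI_TAG not in a.get("tags", [])]
```

TEST-010's expected results then reduce to two assertions per discovery call: known SCI-tagged names are absent, and known non-SCI names the user can see are present.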
Quality Gates#
Before production deployment:
- [ ] All HIGH risk test cases (TEST-001 through TEST-007, TEST-010) passed in dev environment
- [ ] Zero critical or high defects open
- [ ] TEST-008 (no token leakage in logs) passed
- [ ] TEST-009 (audit logging) passed
- [ ] TEST-010 (SCI pre-filtering) passed — DataCore confirmation of tagged test assets obtained
- [ ] BDD tests pass in CI (pytest requirements/bdd/ -m databricks)
- [ ] CloudWatch alarms verified (ECS CPU, memory, ALB 5xx)
- [ ] SSM Parameter Store secrets confirmed as SecureString type in prod account
Version: 1.0
Date: 2026-04-28
Approved by: Pending Review
Related Artifacts:
- User Requirements: requirements/features/databricks-access.feature
- Risk Assessment: docs/risks/risk-assessment-databricks-access.md
- Design: docs/design-databricks-access.md
- Intended Use: docs/requirements/01-intended-use.md