Table of Contents
- Why Superset + dbt Security Matters
- Understanding the Security Stack
- Authentication and Authorization Layers
- Role-Based Access Control (RBAC) Configuration
- dbt Governance and Model Access
- Warehouse-Level Security Integration
- Row-Level Security Implementation
- Audit Logging and Compliance
- Operational Security Habits
- Security Benchmarks and Metrics
- Common Pitfalls and Solutions
- Next Steps: Audit-Ready Architecture
Why Superset + dbt Security Matters {#why-security-matters}
When you deploy Apache Superset as your analytics layer on top of dbt-transformed data, you’re building a system that touches sensitive business intelligence, financial metrics, and customer data. Unlike a single-purpose application, Superset sits at the intersection of multiple teams—finance, product, marketing, operations—each needing different views of the same underlying data.
The challenge is straightforward: how do you ensure that a finance analyst sees revenue data, a product manager sees feature adoption, and neither sees what they shouldn’t—without rebuilding your entire dbt project for each audience?
This is where security model design becomes critical. A poorly configured Superset + dbt stack creates three immediate risks:
Data leakage: Users query directly into Superset and accidentally (or intentionally) export datasets they shouldn’t access. A single misconfigured dashboard can expose customer PII, pricing data, or acquisition costs to the wrong team.
Audit failure: When you run SOC 2 or ISO 27001 compliance audits, regulators ask: “Who accessed what, when, and why?” If Superset logs don’t connect to warehouse logs, you have gaps. If dbt doesn’t track model lineage and access, you can’t prove data governance.
Operational chaos: Without clear RBAC patterns, every new hire or role change becomes a manual security review. You end up with over-privileged users, orphaned permissions, and dashboards nobody can maintain.
The good news: a well-designed Superset + dbt security model is operationally sustainable. It scales from 5 users to 500. It integrates cleanly with your compliance programme. And it doesn’t require you to rebuild your entire analytics architecture.
This guide walks you through the patterns, configurations, and operational habits that work at scale. We’ll focus on concrete implementation details, not theory.
Understanding the Security Stack {#security-stack}
Before we configure anything, map your security layers. Superset + dbt security is layered, not monolithic. Each layer enforces different controls at different points in the data pipeline.
The Four Security Layers
Layer 1: Authentication (who are you?)
Superset needs to know who you are. This happens at login. You can authenticate via LDAP, OAuth2, SAML, or local Superset users. Most production deployments use LDAP (Active Directory) or SAML for enterprise identity management.
Why this matters: If you can’t prove who accessed Superset, you can’t audit anything downstream. For SOC 2 and ISO 27001 compliance, you’ll need to demonstrate that every login is tied to a named individual, not a shared account.
Layer 2: Superset Authorization (what can you access in Superset?)
Once authenticated, Superset applies role-based access control. A user might have the “Analyst” role, which grants permission to view dashboards in the “Finance” folder and edit charts in the “Product” folder. This is Superset’s internal permission model.
Why this matters: It controls dashboard and chart visibility within Superset UI. But it doesn’t control what data you can query. That’s layer 3.
Layer 3: Warehouse Authorization (what can you query?)
When Superset runs a query, it connects to your warehouse (Snowflake, BigQuery, PostgreSQL, etc.) as a specific user or service account. The warehouse has its own RBAC—schemas, tables, columns, rows. If the Superset connection account doesn’t have permission to a table, the query fails at the warehouse level.
Why this matters: This is your hard security boundary. Even if someone misconfigures Superset permissions, the warehouse won’t let them query data they shouldn’t access. This is critical for compliance.
Layer 4: dbt Governance (what models should exist, and who owns them?)
dbt doesn’t enforce runtime access control—it’s a transformation layer. But dbt does provide metadata about data lineage, ownership, and intended use. When you tag models with access: restricted or add a meta.owner field, you’re creating a governance contract. Tools like dbt Custom Groups and Access let you define which teams can reference which models, and you can expose that metadata to Superset.
Why this matters: dbt governance creates intent. It documents which models are meant for which audiences. Superset can then use that metadata to suggest appropriate dashboards and surfaces.
A Practical Example
Let’s say you have a customers table in your warehouse. You’ve built a dbt model fct_customer_lifetime_value that calculates revenue per customer. You want:
- Finance team: Can see LTV by cohort, segment, and geography. Full dashboard access.
- Product team: Can see LTV by feature adoption, not by customer name or email.
- Sales team: Cannot see LTV at all (it’s a finance metric).
Here’s how the four layers work together:
- Authentication: Finance analyst logs in via LDAP. Superset confirms their identity.
- Superset Authorization: Superset checks: “Is this user in the ‘Finance’ role?” Yes. Grant dashboard access.
- Warehouse Authorization: Superset runs the query as the
superset_financeservice account. Snowflake checks: “Doessuperset_financehave SELECT onfct_customer_lifetime_value?” Yes. Execute query. - dbt Governance: The
fct_customer_lifetime_valuemodel is taggedaccess: restrictedand owned by Finance. Superset’s data catalogue surfaces this metadata to Product, signalling: “This model isn’t for you.”
If Product tries to query the same model:
- Authentication: Product manager logs in. Identity confirmed.
- Superset Authorization: Superset checks: “Is this user in the ‘Product’ role?” Yes, but Product role doesn’t have dashboard access to Finance dashboards. Deny.
- Warehouse Authorization: Even if they bypass Superset UI and hit the API, the warehouse account for Product (
superset_product) doesn’t have SELECT onfct_customer_lifetime_value. Warehouse denies. - dbt Governance: Metadata signals that this model isn’t intended for Product use.
This layered approach is defence in depth. No single misconfiguration exposes data.
Authentication and Authorization Layers {#auth-layers}
Setting Up Enterprise Authentication
For production deployments, avoid local Superset user management. Instead, integrate with your identity provider.
LDAP / Active Directory is the most common pattern for enterprise deployments. When a user logs in, Superset queries your AD server to verify credentials and pull group membership. This ties Superset access to your organisation’s identity system.
Configuration in superset_config.py:
from flask_appbuilder.security.manager import AUTH_LDAP
AUTH_TYPE = AUTH_LDAP
AUTH_LDAP_SERVER = "ldap://your-ad-server.internal:389"
AUTH_LDAP_USE_SSL = True
AUTH_LDAP_BIND_DN = "cn=superset-service,ou=service-accounts,dc=company,dc=com"
AUTH_LDAP_BIND_PASSWORD = "<secure-password>"
AUTH_LDAP_SEARCH_BASE = "ou=users,dc=company,dc=com"
AUTH_LDAP_USERNAME_FIELD = "sAMAccountName"
# Map AD groups to Superset roles
AUTH_LDAP_GROUP_FIELD = "memberOf"
AUTH_LDAP_APPEND_DOMAIN = False
# This is critical: sync AD group membership to Superset roles on every login
AUTH_LDAP_SYNC_USER_INFO = True
AUTH_LDAP_USE_SSL_FOR_CONNECTIONS = True
SAML is the enterprise standard for SaaS deployments. If you’re running Superset on AWS or in a cloud environment, SAML (via Okta, Azure AD, or Ping Identity) is cleaner than LDAP.
Configuration:
from flask_appbuilder.security.manager import AUTH_SAML
AUTH_TYPE = AUTH_SAML
SAML_METADATA_URL = "https://your-idp.okta.com/app/exk.../sso/saml/metadata"
# Map SAML attributes to Superset user fields
SAML_ATTRIBUTE_MAPPING = {
"http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress": ("email", None),
"http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname": ("first_name", None),
"http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname": ("last_name", None),
}
# Map SAML groups to Superset roles
SAML_GROUP_MAPPING = {
"okta_finance_group": ["Finance"],
"okta_product_group": ["Product"],
"okta_admin_group": ["Admin"],
}
Why Authentication Matters for Compliance
When auditors review your SOC 2 or ISO 27001 controls, they ask: “Can you prove who accessed what data?” If Superset is configured with local users and no audit logs, you can’t. If Superset is tied to your identity provider and logs every login, you can.
For Security Audit | PADISO - SOC 2, ISO 27001 & GDPR Compliance, the first control is always: Identity and access management. Superset authentication is your first line of evidence.
Superset Authorization: Roles and Permissions
Once authenticated, Superset applies role-based access control. Out of the box, Superset includes several built-in roles:
- Admin: Full access to all dashboards, charts, datasets, and settings.
- Alpha: Can view and edit dashboards and charts.
- Gamma: Can view dashboards and charts only (read-only).
- SQL Lab Only: Can access SQL Lab (query editor) but not dashboards.
For most organisations, these roles are too coarse. You’ll want custom roles. Here’s a typical role matrix:
| Role | Dashboards | Charts | Datasets | SQL Lab | Data Sources |
|---|---|---|---|---|---|
| Finance Lead | View, Edit | View, Edit | View, Edit | Yes | Finance warehouse |
| Finance Analyst | View, Edit | View, Edit | View | Yes | Finance warehouse |
| Product Manager | View | View | View | No | Product warehouse |
| Data Engineer | Admin | Admin | Admin | Yes | All |
| Executive | View | View | View | No | Executive dashboards only |
To create a custom role in Superset, go to Settings → Roles and define permissions. You can grant:
- Dashboard access: View, edit, or delete specific dashboards.
- Chart access: View, edit, or delete specific charts.
- Dataset access: Query specific datasets (databases and schemas).
- Feature access: SQL Lab, alerts, reports, etc.
Critical pattern: Always use database and schema-level permissions in Superset. Instead of granting a user access to “all datasets,” grant access to specific schemas. This ties Superset permissions to your warehouse schema structure.
Example:
# In Superset, create a role "Finance Analyst"
# Grant permissions:
# - Dataset: snowflake.finance_analytics (schema)
# - Dashboard: All dashboards tagged "finance"
# - Feature: SQL Lab (read-only)
# Do NOT grant:
# - Access to raw tables (only dbt models)
# - Access to schemas outside finance
# - Write access to datasets (only read)
Role-Based Access Control (RBAC) Configuration {#rbac-config}
Designing Your Role Hierarchy
Start by mapping your organisation’s teams and responsibilities. Don’t try to create one role per person—that’s unmaintainable. Instead, create roles for job functions.
Example role structure for a Series-B SaaS company:
Admin
├── Data Engineering Lead
│ └── Data Engineer
├── Finance Lead
│ └── Finance Analyst
├── Product Lead
│ └── Product Manager
│ └── Product Analyst
└── Executive
Each role has specific dataset and dashboard permissions. When a new hire joins Finance, they get the “Finance Analyst” role. When they’re promoted to Finance Lead, their role is updated. No individual permission grants needed.
Configuring Database Connections with Role-Based Access
This is where Superset + warehouse security converge. In Superset, you define database connections. Each connection has credentials—a username and password (or service account) that Superset uses to query the warehouse.
For production, use separate service accounts per role:
Superset Database Connections:
├── superset_finance (credentials for Finance role)
├── superset_product (credentials for Product role)
├── superset_data_eng (credentials for Data Engineering role)
└── superset_admin (credentials for Admins)
When a Finance analyst runs a query in Superset, Superset connects to the warehouse as superset_finance. The warehouse checks: “Does superset_finance have SELECT on this table?” If yes, query runs. If no, warehouse denies.
This is your hard security boundary. No misconfiguration in Superset can override warehouse permissions.
Configuration in Superset:
- Go to Data → Databases.
- Click + Database and select your warehouse type (Snowflake, BigQuery, etc.).
- Enter connection details:
- Host, port, database name
- Username:
superset_finance - Password: (stored encrypted in Superset metadata database)
- Test the connection.
- Under Permissions, assign the database to the “Finance” role.
Warehouse configuration (example for Snowflake):
-- Create service accounts
CREATE ROLE superset_finance;
CREATE ROLE superset_product;
CREATE ROLE superset_data_eng;
-- Create users
CREATE USER "superset_finance" PASSWORD = '...' DEFAULT_ROLE = superset_finance;
CREATE USER "superset_product" PASSWORD = '...' DEFAULT_ROLE = superset_product;
CREATE USER "superset_data_eng" PASSWORD = '...' DEFAULT_ROLE = superset_data_eng;
-- Grant schema access
GRANT USAGE ON SCHEMA analytics.finance TO ROLE superset_finance;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics.finance TO ROLE superset_finance;
GRANT USAGE ON SCHEMA analytics.product TO ROLE superset_product;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics.product TO ROLE superset_product;
GRANT USAGE ON SCHEMA analytics.* TO ROLE superset_data_eng;
GRANT ALL PRIVILEGES ON SCHEMA analytics.* TO ROLE superset_data_eng;
-- Assign users to roles
GRANT ROLE superset_finance TO USER "superset_finance";
GRANT ROLE superset_product TO USER "superset_product";
GRANT ROLE superset_data_eng TO USER "superset_data_eng";
Now, when a Finance analyst logs into Superset:
- Superset authenticates them via LDAP.
- Superset checks their role: “Finance Analyst”.
- Superset grants access to the Finance database connection.
- When they query a table, Superset connects as
superset_finance. - Snowflake checks: “Can
superset_financeSELECT from this table?” Yes. Query runs.
If a Product manager tries the same query:
- Superset authenticates them.
- Superset checks their role: “Product Manager”.
- Superset grants access to the Product database connection only.
- If they try to query a Finance table, Superset denies (wrong database connection).
- If they somehow bypass Superset, Snowflake denies (wrong warehouse role).
Datasets and Permissions
In Superset, a dataset is a saved query or table reference. Datasets are the primary object that users interact with—they’re what appears in the chart builder.
When you create a dataset, assign it to specific roles:
# Example: fct_customer_lifetime_value dataset
Dataset: fct_customer_lifetime_value
├── Database: snowflake (finance connection)
├── Schema: analytics.finance
├── Table: fct_customer_lifetime_value
├── Permissions:
│ ├── Finance Lead: View, Edit
│ ├── Finance Analyst: View
│ └── Data Engineer: View, Edit
└── Tags: ["finance", "customer", "restricted"]
When a user tries to create a chart from this dataset, Superset checks their role. If they don’t have “View” permission, the dataset doesn’t appear in the dataset picker.
Handling Shared Datasets Across Roles
Sometimes you need a dataset that multiple roles can access, but with different row-level visibility. For example, a fct_orders table that Finance, Product, and Sales all need, but with different filters:
- Finance: Sees all orders (no filter).
- Product: Sees orders by feature adoption (no customer names).
- Sales: Sees orders by sales rep (no pricing details).
This is where row-level security (RLS) comes in. We’ll cover that in detail below. For now, the pattern is:
- Create a single dataset pointing to the table.
- Grant “View” permission to all relevant roles.
- In each dashboard/chart, add role-specific filters using Superset’s Jinja templating.
dbt Governance and Model Access {#dbt-governance}
Defining Model Ownership and Access Tiers
dbt doesn’t enforce runtime access control, but it provides metadata that Superset can use to govern access. The key is dbt model tagging and ownership.
In your dbt_project.yml, define access tiers:
models:
analytics:
# Public models - anyone can use
public:
config:
meta:
access: public
owner: "data-engineering@company.com"
# Restricted models - specific teams only
finance:
config:
meta:
access: restricted
owner: "finance@company.com"
teams: ["finance", "cfo"]
product:
config:
meta:
access: restricted
owner: "product@company.com"
teams: ["product", "analytics"]
# Internal models - data engineers only
staging:
config:
meta:
access: internal
owner: "data-engineering@company.com"
teams: ["data-engineering"]
Then, in individual model files:
-- models/finance/fct_customer_lifetime_value.sql
{{ config(
materialized='table',
meta={
'access': 'restricted',
'owner': 'finance@company.com',
'teams': ['finance'],
'pii': false,
'freshness_sla': '1 day'
}
) }}
SELECT
customer_id,
SUM(order_total) AS lifetime_value,
COUNT(DISTINCT order_id) AS order_count,
MAX(order_date) AS last_order_date
FROM {{ ref('stg_orders') }}
GROUP BY customer_id
For dbt Custom Groups and Access, you can define which projects, folders, or models are accessible to which dbt teams. This creates a governance layer in dbt Cloud that mirrors your Superset roles.
Exposing dbt Metadata to Superset
dbt generates a manifest.json file that contains all metadata about your project—models, tests, lineage, ownership, tags, etc. Superset can consume this metadata to:
- Surface model ownership in the dataset picker.
- Display access tiers (public, restricted, internal).
- Show data lineage (which models depend on which sources).
- Flag PII or sensitive columns.
To integrate dbt metadata into Superset, use the dbt metadata API. This requires dbt Cloud and Superset 2.0+.
Example: Query dbt metadata to list all Finance models:
# In Superset, create a custom Python script
import requests
DBT_METADATA_URL = "https://metadata.cloud.getdbt.com/graphql"
DBT_TOKEN = "<your-dbt-cloud-token>"
query = """
query {
models(filter: {tags: {name: {_eq: "finance"}}}) {
uniqueId
name
description
owner
meta
}
}
"""
headers = {"Authorization": f"Bearer {DBT_TOKEN}"}
response = requests.post(DBT_METADATA_URL, json={"query": query}, headers=headers)
models = response.json()["data"]["models"]
for model in models:
print(f"{model['name']} (Owner: {model['owner']})")
This lets you programmatically enforce governance. For example, you could prevent users from creating charts on Finance models unless they have the Finance role.
Lineage and Impact Analysis
One of dbt’s superpowers is lineage tracking. Every dbt model has a dependency graph: which sources feed into it, which models depend on it, etc.
When you expose this to Superset, users can see:
- Upstream lineage: “This dashboard uses data from the
stg_ordersanddim_customersmodels, which come from the rawordersandcustomerstables.” - Downstream impact: “This model feeds into 12 dashboards. If we change it, these dashboards will be affected.”
This is critical for security and governance. If you’re considering restricting access to a model, you can see which dashboards will break. If a user reports a data discrepancy, you can trace it back to the source.
Superset’s data lineage feature shows this visually. Combined with dbt metadata, it provides complete transparency.
Warehouse-Level Security Integration {#warehouse-security}
Understanding Warehouse RBAC
Your data warehouse (Snowflake, BigQuery, PostgreSQL, etc.) has its own role-based access control. This is your hard security boundary. Even if Superset is misconfigured, the warehouse won’t let unauthorised users query data.
Most modern warehouses support:
- Role-based access: Users are assigned roles, roles have permissions.
- Schema-level permissions: Grant SELECT on specific schemas.
- Table-level permissions: Grant SELECT on specific tables.
- Column-level permissions: Grant SELECT on specific columns (less common, but supported by some warehouses).
- Row-level security: Grant SELECT on rows matching certain conditions.
For Snowflake Role-Based Access Control, the pattern is straightforward:
-- Define roles
CREATE ROLE analyst;
CREATE ROLE finance_analyst;
CREATE ROLE data_engineer;
-- Grant schema access
GRANT USAGE ON SCHEMA analytics.finance TO ROLE finance_analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics.finance TO ROLE finance_analyst;
-- Grant database usage
GRANT USAGE ON DATABASE analytics TO ROLE finance_analyst;
-- Assign users to roles
GRANT ROLE finance_analyst TO USER "john.doe";
For BigQuery Access Control, the pattern is similar but uses IAM:
# Grant dataset-level access
gcloud bigquery datasets add-iam-policy-binding \
--member=user:john.doe@company.com \
--role=roles/bigquery.dataEditor \
analytics_finance
# Grant table-level access (requires custom roles)
gcloud iam roles create bigquery.financeTables \
--project=my-project \
--title="Finance Tables" \
--permissions=bigquery.tables.getData
Configuring Superset Service Accounts
As mentioned earlier, Superset connects to the warehouse as a service account—not as the logged-in user. This has security implications.
Advantage: You can control exactly what data Superset can access. You don’t need to grant every user warehouse access.
Disadvantage: You lose user-level audit trails at the warehouse level. You can’t see “John queried this table at 2pm.” You only see “Superset queried this table at 2pm.”
To mitigate this, configure Superset to log all queries with the authenticated user’s name. This is covered in the Audit Logging section.
Cross-Warehouse Scenarios
If you have multiple warehouses (e.g., Snowflake for analytics, PostgreSQL for operational data), create separate Superset database connections for each.
Superset Database Connections:
├── Snowflake (analytics)
│ ├── Finance connection (superset_finance user)
│ ├── Product connection (superset_product user)
│ └── Data Eng connection (superset_data_eng user)
├── PostgreSQL (operational)
│ └── Admin connection (superset_admin user)
└── BigQuery (ML models)
└── Data Science connection (superset_ds user)
Each connection has its own credentials and warehouse-level permissions. This provides isolation: a misconfiguration in the Snowflake setup doesn’t affect PostgreSQL access.
Managing Secrets and Credentials
Superset stores database credentials in its metadata database (usually PostgreSQL). These credentials are encrypted at rest using a secret key (SECRET_KEY in superset_config.py).
Best practices:
-
Use strong secret keys: Generate a 64-character random string.
import secrets SECRET_KEY = secrets.token_urlsafe(32) -
Rotate credentials regularly: Every 90 days, rotate warehouse service account passwords. Update Superset connections.
-
Use temporary credentials where possible: If your warehouse supports temporary credentials (e.g., Snowflake key-pair authentication), use them instead of passwords.
-- Snowflake key-pair authentication CREATE USER superset_finance IDENTIFIED BY PUBLIC KEY 'MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEA...'; -
Audit credential access: Log whenever Superset connections are created, updated, or deleted. This is a critical compliance control.
Row-Level Security Implementation {#row-security}
Understanding Row-Level Security
Row-level security (RLS) restricts which rows a user can see based on their identity or role. For example, a sales rep should only see orders from their own territory. A regional manager should see orders from their region.
RLS is different from dataset-level permissions. With dataset permissions, you grant or deny access to the entire table. With RLS, you grant access to the table, but filter which rows appear based on the user’s context.
RLS in the Warehouse vs. Superset
There are two places to implement RLS:
1. Warehouse-level RLS (preferred)
Your warehouse applies the filter at query time. This is the most secure approach because the filter is enforced by the warehouse, not by Superset.
For PostgreSQL Row Security Policies:
-- Enable RLS on the orders table
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
-- Create a policy: users can only see orders from their region
CREATE POLICY orders_regional_policy ON orders
FOR SELECT
USING (region = current_setting('app.user_region'));
-- When a user queries orders, PostgreSQL automatically filters by their region
SET app.user_region = 'APAC';
SELECT * FROM orders; -- Only returns APAC orders
Snowflake supports similar functionality via dynamic masking and row access policies:
-- Create a row access policy
CREATE ROW ACCESS POLICY orders_policy ON orders AS
(region VARCHAR) RETURNS BOOLEAN ->
CASE
WHEN CURRENT_ROLE() = 'FINANCE_ANALYST' THEN TRUE -- Finance sees all
WHEN CURRENT_ROLE() = 'SALES_REP' THEN region = CURRENT_USER() -- Sales reps see their region
ELSE FALSE
END;
-- Apply the policy to the orders table
ALTER TABLE orders ADD ROW ACCESS POLICY orders_policy ON (region);
2. Superset-level RLS (supplementary)
Superset can apply filters at the dashboard/chart level using Jinja templating. This is useful for:
- Filtering based on the logged-in user’s attributes.
- Applying filters that don’t exist in the warehouse.
- Simplifying dashboard logic.
Example:
-- In a Superset chart's SQL editor
SELECT
order_id,
customer_id,
order_total,
region
FROM orders
WHERE region = '{{ current_user_obj.region }}' -- Superset injects the user's region
Implementing RLS in Superset
Superset provides a row-level security filter feature. Here’s how to set it up:
-
Create a user attribute that defines the filter.
- Go to Settings → Users.
- Edit a user and add a custom attribute:
region = APAC.
-
Create an RLS filter that uses this attribute.
- Go to Settings → Row Level Security.
- Click + Row Level Security.
- Define the filter:
Clause: region = '{{ current_user_obj.region }}' Tables: orders Roles: Sales Rep
-
Apply the filter to datasets.
- When a Sales Rep queries the
ordersdataset, Superset automatically appends the RLS filter. - The query becomes:
SELECT * FROM orders WHERE region = 'APAC'.
- When a Sales Rep queries the
Best Practices for RLS
-
Prefer warehouse-level RLS: It’s more secure and performant. Superset-level RLS is a safety net.
-
Test RLS policies: Create test users with different attributes and verify they see the correct rows.
-- Test: Finance analyst should see all orders SET ROLE finance_analyst; SELECT COUNT(*) FROM orders; -- Should be ~1M rows -- Test: Sales rep should see only their region SET ROLE sales_rep; SET app.user_region = 'EMEA'; SELECT COUNT(*) FROM orders; -- Should be ~100k rows -
Document RLS policies: For compliance audits, document which RLS policies exist, why they exist, and which roles they apply to.
-
Monitor RLS performance: RLS can add overhead. Monitor query performance and adjust policies if needed.
Audit Logging and Compliance {#audit-logging}
Configuring Superset Audit Logs
Superset logs every action: logins, dashboard views, chart edits, queries. These logs are stored in Superset’s metadata database and are critical for compliance.
To enable detailed audit logging, configure superset_config.py:
# Enable audit logging
FALSE_POSITIVE_LOGGING = True # Log all actions
# Log to external system (recommended for production)
import logging
from logging.handlers import SysLogHandler
logging.basicConfig(level=logging.INFO)
audit_logger = logging.getLogger('superset.audit')
audit_logger.addHandler(SysLogHandler(address=('syslog.company.com', 514)))
# Alternatively, log to CloudWatch (AWS)
from watchtower import CloudWatchLogHandler
audit_logger.addHandler(CloudWatchLogHandler(
log_group='superset-audit',
stream_name='production'
))
What to Log
At minimum, log:
- Authentication: Who logged in, when, from which IP address.
- Data access: Which user viewed which dashboard, ran which query.
- Configuration changes: Who created/edited/deleted roles, permissions, datasets.
- Query details: The full SQL query, execution time, rows returned.
Example audit log entry:
{
"timestamp": "2024-01-15T14:32:00Z",
"event_type": "query_executed",
"user": "john.doe@company.com",
"ip_address": "203.0.113.42",
"database": "snowflake",
"schema": "analytics.finance",
"table": "fct_customer_lifetime_value",
"query": "SELECT * FROM fct_customer_lifetime_value WHERE customer_id = 12345",
"execution_time_ms": 1250,
"rows_returned": 1,
"status": "success"
}
Warehouse Audit Logs
Your warehouse also logs queries. These are essential for compliance because they show what data was actually accessed, independent of Superset.
Snowflake query history:
SELECT
QUERY_ID,
USER_NAME,
QUERY_TEXT,
EXECUTION_TIME,
ROWS_RETURNED,
QUERY_TYPE,
START_TIME
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE START_TIME >= CURRENT_DATE - INTERVAL '30 days'
ORDER BY START_TIME DESC;
BigQuery audit logs:
SELECT
timestamp,
principalEmail,
methodName,
resourceName,
status.code
FROM `project.region.cloudaudit_googleapis_com_activity`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
AND methodName LIKE '%bigquery%'
ORDER BY timestamp DESC;
Audit Log Retention and Compliance
For SOC 2 Type II compliance, you need to retain audit logs for at least one year. For HIPAA, six years. For GDPR, it depends on the data, but generally 3-7 years.
Configure log retention:
# In superset_config.py
LOG_RETENTION_DAYS = 365 # Retain logs for 1 year
# Archive old logs to cold storage (S3, GCS, etc.)
import boto3
s3 = boto3.client('s3')
def archive_logs():
# Query logs older than 90 days
old_logs = db.session.query(Log).filter(
Log.created_on < datetime.now() - timedelta(days=90)
).all()
# Archive to S3
for log in old_logs:
s3.put_object(
Bucket='audit-logs-archive',
Key=f"superset/{log.created_on.year}/{log.created_on.month}/{log.id}.json",
Body=log.to_json()
)
# Delete from database
db.session.query(Log).filter(
Log.created_on < datetime.now() - timedelta(days=90)
).delete()
db.session.commit()
Connecting Superset and Warehouse Logs
For a complete audit trail, you need to connect Superset logs to warehouse logs. The challenge: Superset logs “user X ran query Y” but the warehouse logs “service account Z ran query Y.”
To bridge this, include a query ID in every Superset query:
# In Superset, before executing a query
import uuid
query_id = str(uuid.uuid4())
# Append a comment to the query
query_with_id = f"""
{original_query}
/* superset_query_id={query_id}, user={current_user.email} */
"""
# Log the mapping
audit_logger.info({
'superset_query_id': query_id,
'user': current_user.email,
'warehouse_query_id': None # Will be filled by warehouse
})
# Execute the query
result = warehouse.execute(query_with_id)
# After execution, query the warehouse to get the actual query ID
warehouse_query_id = warehouse.get_query_id(query_id)
audit_logger.info({'superset_query_id': query_id, 'warehouse_query_id': warehouse_query_id})
Now, if an auditor asks “Did John access the customers table on Jan 15?” you can:
- Query Superset logs: “John ran query 12345-abcd at 2pm.”
- Query warehouse logs: “Query 12345-abcd (superset_finance user) accessed the
customerstable.” - Prove the connection: John → Superset → Warehouse.
Operational Security Habits {#operational-habits}
Regular Access Reviews
Every quarter, review who has access to what. This is a compliance requirement (SOC 2, ISO 27001) and a security best practice.
Process:
-
Export current permissions.
-- Query Superset metadata SELECT u.username, u.email, r.name AS role, p.permission_name, p.view_menu_id FROM ab_user u JOIN ab_user_role ur ON u.id = ur.user_id JOIN ab_role r ON ur.role_id = r.id JOIN ab_permission_view pv ON r.id = pv.role_id JOIN ab_permission p ON pv.permission_id = p.id ORDER BY u.username, r.name; -
Send to team leads for review.
- “Does John still need Finance access? Does Sarah still need SQL Lab access?”
-
Remove unnecessary access.
- If a user has moved teams, update their role.
- If a user has left the company, deactivate their account.
-
Document the review.
- For compliance, you need evidence that access was reviewed and approved.
Access Review - Q1 2024 Date: 2024-01-31 Reviewed by: Finance Lead, Product Lead, Data Engineering Lead Changes: - Removed John's Product access (moved to Finance) - Removed Sarah's SQL Lab access (role change) - Added 3 new Finance analysts Approved by: Chief Financial Officer
Offboarding Checklist
When an employee leaves, disable their access to Superset and the warehouse immediately.
Offboarding checklist:
- Deactivate Superset user account
- Remove user from all Superset roles
- Disable warehouse user account (or revoke all roles)
- Remove from AD / LDAP groups
- Revoke API tokens and SSH keys
- Archive user’s dashboards and charts
- Transfer ownership of dashboards to remaining team members
- Remove user from dbt Cloud projects
- Audit logs to ensure no access after offboarding date
Managing Secrets and Rotations
Warehouse service account credentials should be rotated every 90 days. This is a compliance control and a security best practice.
Rotation process:
-
Create new service account.
CREATE USER superset_finance_new PASSWORD = '...'; GRANT ROLE superset_finance TO USER superset_finance_new; -
Update Superset connection.
- Go to Data → Databases.
- Edit the Finance database connection.
- Update username and password.
- Test the connection.
-
Verify all queries work.
- Run a test query from each dashboard.
- Check audit logs for errors.
-
Disable old service account.
ALTER USER superset_finance DISABLE; -
Document the rotation.
- For compliance: “Service account
superset_financerotated on 2024-01-15.”
- For compliance: “Service account
Monitoring and Alerting
Set up alerts for suspicious activity:
- Failed login attempts: If a user fails to log in 5+ times, alert the security team.
- Unusual query patterns: If a user suddenly queries tables they’ve never accessed, alert.
- Large data exports: If a user exports >1M rows, alert.
- Off-hours access: If a user accesses Superset outside business hours, alert.
Example alert (using Superset’s alert system):
# In Superset, create an alert
alert = Alert(
name="Large Data Export",
description="Alert when >1M rows are exported",
alert_type="email",
recipients="security@company.com",
condition="rows_returned > 1000000",
chart_id=12345
)
db.session.add(alert)
db.session.commit()
Security Benchmarks and Metrics {#benchmarks}
Key Security Metrics
Track these metrics to measure your security posture:
| Metric | Target | Frequency |
|---|---|---|
| Access review completion | 100% | Quarterly |
| Offboarding completion time | <24 hours | Per employee |
| Credential rotation compliance | 100% | Every 90 days |
| Audit log retention | 365+ days | Continuous |
| Failed login attempts | <5 per user/month | Weekly review |
| Unusual query alerts | <2% false positives | Ongoing tuning |
| SOC 2 / ISO 27001 audit pass | 100% | Annually |
Benchmarking Against Competitors
How does your Superset + dbt security stack up against competitors?
Thoughtworks and Slalom typically implement:
- SAML authentication
- Role-based access control with 5-10 roles
- Warehouse-level RBAC
- Basic audit logging
- Annual compliance reviews
PADISO’s approach (based on work with Series-B to growth-stage companies):
- SAML or LDAP authentication
- Granular role-based access (10-20+ roles)
- Warehouse-level RBAC + Superset-level RLS
- Comprehensive audit logging with warehouse integration
- Quarterly access reviews
- dbt governance integration
- Vanta-ready compliance architecture
For companies pursuing Security Audit | PADISO - SOC 2, ISO 27001 & GDPR Compliance, the PADISO model is designed to pass audits faster and with less operational overhead.
Cost Benchmarks
Securing Superset + dbt isn’t free, but it’s predictable:
| Component | Cost | Notes |
|---|---|---|
| Superset hosting | $500-2000/month | AWS/GCP, depends on scale |
| Warehouse service accounts | $0 | Included in warehouse cost |
| Audit logging | $100-500/month | CloudWatch, Datadog, Splunk |
| dbt Cloud | $1000-5000/month | Depends on project count |
| Identity provider (SAML) | $0-500/month | Okta, Azure AD, Ping |
| Compliance tooling (Vanta) | $500-2000/month | For SOC 2 / ISO 27001 |
| Security review / audit | $5000-20000/year | Annual or per-audit |
Total: $1500-10000/month depending on scale and compliance requirements.
For context, a single data breach can cost $4M+ (IBM, 2023). Investing in security infrastructure is cost-effective.
Common Pitfalls and Solutions {#pitfalls}
Pitfall 1: Overly Permissive Superset Roles
Problem: You create a “Viewer” role that can see all dashboards. Over time, dashboards accumulate sensitive data. Viewers now have access to data they shouldn’t.
Solution: Create separate roles for each data domain.
# Bad
Viewers (can see all dashboards)
# Good
Finance Viewers (can see finance dashboards)
Product Viewers (can see product dashboards)
Executive Viewers (can see executive dashboards)
Pitfall 2: Relying Only on Superset Permissions
Problem: You think Superset permissions are enough. But a user bypasses Superset and queries the warehouse directly. Superset permissions don’t apply.
Solution: Always implement warehouse-level RBAC as your primary security boundary. Superset permissions are secondary.
# Bad: Only Superset permissions
# Good: Warehouse RBAC (primary) + Superset permissions (secondary)
GRANT USAGE ON SCHEMA analytics.finance TO ROLE superset_finance; # Primary
GRANT SELECT ON ALL TABLES IN SCHEMA analytics.finance TO ROLE superset_finance;
# Then in Superset, restrict access to the finance database connection # Secondary
Pitfall 3: Service Account Credentials in Code
Problem: You hardcode warehouse credentials in Superset config or commit them to Git. A developer with repository access can steal credentials.
Solution: Use a secrets manager (AWS Secrets Manager, HashiCorp Vault, etc.).
# Bad
SUPERSET_WAREHOUSE_PASSWORD = "P@ssw0rd123"
# Good
import boto3
secrets_client = boto3.client('secretsmanager')
secret = secrets_client.get_secret_value(SecretId='superset-warehouse-password')
SUPERSET_WAREHOUSE_PASSWORD = secret['SecretString']
Pitfall 4: No Audit Trail
Problem: You have Superset permissions and warehouse RBAC, but no logs. When a data breach happens, you can’t prove who accessed what.
Solution: Enable comprehensive audit logging and retain logs for 1+ years.
# In superset_config.py
FALSE_POSITIVE_LOGGING = True
LOG_RETENTION_DAYS = 365
# Log to external system
import logging
audit_logger = logging.getLogger('superset.audit')
audit_logger.addHandler(CloudWatchLogHandler(log_group='superset-audit'))
Pitfall 5: Orphaned Permissions
Problem: An employee leaves. You deactivate their Superset account, but they still have warehouse access (different team manages it). They can still query data.
Solution: Implement a centralized identity provider (LDAP/SAML) and sync permissions across systems.
# In superset_config.py
AUTH_TYPE = AUTH_LDAP
AUTH_LDAP_SYNC_USER_INFO = True # Sync on every login
# When user is disabled in LDAP, Superset automatically removes access
Pitfall 6: dbt Models Without Ownership
Problem: You have dbt models but no metadata about who owns them. When a model breaks, nobody knows who to contact. When a user requests access, you can’t verify if they should have it.
Solution: Define ownership and access tiers in dbt models.
# In dbt_project.yml
models:
finance:
fct_revenue:
config:
meta:
owner: "finance@company.com"
access: "restricted"
teams: ["finance", "executive"]
Next Steps: Audit-Ready Architecture {#next-steps}
Building Towards SOC 2 and ISO 27001
If you’re planning to pursue SOC 2 Type II or ISO 27001 compliance, your Superset + dbt security model is foundational. Auditors will ask:
- How do you authenticate users? (SAML, LDAP, local users?)
- How do you authorize access? (Role-based? Attribute-based?)
- How do you audit access? (Logs? Retention? Integration?)
- How do you manage secrets? (Rotation? Encryption?)
- How do you handle offboarding? (Timelines? Verification?)
If your Superset + dbt setup is solid, you’re 50% of the way to compliance. The other 50% is infrastructure (network security, encryption, incident response, etc.).
For Security Audit | PADISO - SOC 2, ISO 27001 & GDPR Compliance, we typically help companies get audit-ready in 8-12 weeks using Vanta. The process is:
- Assessment: Review current Superset + dbt setup, identify gaps.
- Configuration: Implement missing controls (audit logging, RLS, etc.).
- Automation: Wire Vanta to pull evidence from Superset, warehouse, dbt.
- Testing: Run mock audits to identify remaining issues.
- Remediation: Fix any gaps.
- Audit: Pass SOC 2 / ISO 27001 audit.
Scaling Superset + dbt Across Regions
If you’re expanding internationally, consider regional deployments:
- APAC region: Platform Development in Sydney | PADISO for Australia, Platform Development in New Zealand | PADISO for NZ, with Platform Development in Australia | PADISO for broader coverage.
- North America: Platform Development in New York | PADISO for financial services, Platform Development in Washington, D.C. | PADISO for government/defence.
- North America (Canada): Platform Development in Canada | PADISO for multi-tenant SaaS with ClickHouse analytics.
- Broad coverage: Platform Development in United States | PADISO and Platform Development in Melbourne | PADISO for retail/insurance/health teams.
Each region has different compliance requirements (GDPR in EU, PIPEDA in Canada, IRAP in Australia). Your Superset + dbt security model should be flexible enough to adapt.
Automation and CI/CD
As you scale, automate security controls:
- dbt CI/CD: Run dbt tests and lineage checks on every commit.
- Superset CI/CD: Version-control dashboard definitions, deploy via Git.
- Warehouse CI/CD: Automate role and permission creation.
- Compliance CI/CD: Automatically run compliance checks (Vanta) and generate audit reports.
Getting Help
Securing Superset + dbt is complex. If you’re building this from scratch, consider partnering with a venture studio or platform engineering team.
At PADISO, we’ve built Superset + dbt security models for 50+ companies across finance, retail, health, and SaaS. We can help you:
- Design a security architecture that scales.
- Implement RBAC, RLS, audit logging, and compliance controls.
- Automate access reviews, credential rotation, and offboarding.
- Pass SOC 2 / ISO 27001 audits in 8-12 weeks using Vanta.
For Services | PADISO - CTO as a Service, Custom Software, AI & Automation, we offer fractional CTO leadership and co-build support for security and platform engineering.
Start with a Security Audit | PADISO - SOC 2, ISO 27001 & GDPR Compliance to assess your current state and identify gaps. Then, build towards compliance with confidence.
Summary
Apache Superset + dbt security is layered, not monolithic. You need:
- Authentication: Who are you? (SAML, LDAP)
- Superset Authorization: What can you access in Superset? (Role-based permissions)
- Warehouse Authorization: What can you query? (RBAC, RLS)
- dbt Governance: What models are you supposed to use? (Metadata, ownership, access tiers)
- Audit Logging: Who accessed what, when, and why? (Comprehensive logs, 1+ year retention)
- Operational Habits: Regular access reviews, offboarding checklists, credential rotation
When all five layers work together, you have a security model that:
- Scales from 5 users to 500.
- Passes SOC 2 and ISO 27001 audits.
- Prevents data leakage via defence in depth.
- Enables fast, safe analytics for every team.
Start with warehouse-level RBAC (your hardest boundary), then layer Superset permissions and audit logging. Automate access reviews and credential rotation. Document everything for compliance.
If you’re ready to build this, or need help securing your existing Superset + dbt stack, reach out to PADISO: AI Solutions & Strategic Leadership — AIR Bootcamps | SOC2 & ISO27001 via Vanta or check out our Case Studies | PADISO - Real Results for Real Businesses to see how we’ve helped other companies get secure and compliant.