Apache Superset + Iceberg: Security Model
Table of Contents
- Why Superset + Iceberg Security Matters
- Understanding the Superset Security Architecture
- Iceberg’s Data Governance Layer
- Access Control Patterns
- Configuration Hardening
- Audit Readiness and Compliance
- Operational Habits for Production
- Common Pitfalls and Recovery
- Benchmarks and Performance Trade-offs
- Next Steps and Implementation
Why Superset + Iceberg Security Matters
When you pair Apache Superset with Apache Iceberg, you’re building a modern analytics platform capable of handling petabyte-scale data with ACID transactions and time-travel queries. But scale and capability introduce risk. Superset sits at the edge of your data layer—it’s the user-facing portal to sensitive datasets. Iceberg controls the metadata and table versioning underneath. If either layer leaks access or misconfigures permissions, you expose PII, financial records, or competitive intelligence to unauthorised users.
The stakes are real. A critical access control flaw in Apache Superset demonstrated how role-based access control (RBAC) bypasses can expose sensitive data to unauthorised users. Iceberg’s metadata layer, while immutable and audit-friendly, can still leak schema and partition information if not properly gated. Together, misconfiguration can spiral.
This guide walks you through the operational security model for running Superset atop Iceberg in production. We cover configuration patterns, benchmarks, and the daily habits that keep your analytics platform secure, auditable, and compliant.
Understanding the Superset Security Architecture
Apache Superset’s security model rests on four pillars: authentication, authorisation, encryption, and audit logging. Each must be correctly configured to prevent unauthorised access to dashboards, charts, and underlying datasets.
Authentication and Identity
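Superset, through Flask-AppBuilder, supports several authentication backends: local username/password, LDAP, OAuth 2.0/OpenID Connect, and remote-user headers set by an authenticating proxy. SAML is usually added through a custom security manager or a fronting proxy. In production, you should never rely on local credentials alone. Instead, integrate with your organisation’s identity provider (Okta, Azure AD, Ping Identity, or Auth0).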
Why? Local credentials are hard to revoke at scale. If an employee leaves, you must manually delete their Superset user. With SAML or OAUTH2, deactivating them in your identity provider automatically revokes Superset access within minutes. This is non-negotiable for SOC 2 and ISO 27001 audit readiness—auditors will ask for evidence of timely access revocation.
Configuration example:
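# In superset_config.py
# Assumption: AUTH_SAML and the SAML_* settings are provided by a custom
# security manager (registered via CUSTOM_SECURITY_MANAGER) or a SAML plugin;
# verify the exact names against the integration you install.
AUTH_TYPE = AUTH_SAML
SAML_METADATA_URL = "https://your-idp.example.com/metadata"
SAML_ASSERTION_CONSUMER_SERVICE_URL = "https://superset.example.com/saml/acs"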
For API access, use short-lived tokens instead of permanent API keys. Superset 2.0+ supports JWT tokens with configurable expiry. Keep token lifetimes short, and rotate the credentials and signing secret behind them every 90 days.
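To show why an expiry claim matters, here is a minimal sketch of an HS256-signed token with an `exp` claim, built only on the Python standard library. It is not Superset’s implementation; the function names, the demo secret, and the 15-minute lifetime are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> str:
    """HMAC-SHA256 signature over header.payload."""
    return _b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())


def issue_token(subject: str, secret: bytes, ttl_seconds: int, now: float) -> str:
    """Build a signed token that expires ttl_seconds after `now`."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": subject, "exp": int(now) + ttl_seconds}).encode())
    return f"{header}.{payload}.{_sign(header + '.' + payload, secret)}"


def verify_token(token: str, secret: bytes, now: float) -> dict:
    """Return the claims if the signature is valid and the token has not expired."""
    header, payload, signature = token.split(".")
    if not hmac.compare_digest(signature, _sign(header + "." + payload, secret)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload))
    if now >= claims["exp"]:
        raise ValueError("token expired")
    return claims


if __name__ == "__main__":
    secret = b"demo-secret-not-for-production"  # hypothetical value
    issued_at = time.time()
    token = issue_token("etl-bot", secret, ttl_seconds=900, now=issued_at)
    print(verify_token(token, secret, now=issued_at + 60)["sub"])
```

A stolen token of this kind is only useful until `exp` passes, which is the property a permanent API key lacks.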
Role-Based Access Control (RBAC)
Superset’s RBAC model assigns permissions to roles, then binds users to roles. The granularity is critical. Superset allows you to control access at the database, dataset (table), and chart level.
However, the official Superset security documentation warns that dataset-level access control has known limitations. A user with access to a dataset can sometimes infer data about other datasets through metadata queries or cross-filtering. This is why you must layer security at the query level as well.
Role Design Pattern:
- Analyst: Read-only access to specific datasets. Can view dashboards and run ad-hoc queries on permitted tables.
- Dashboard Editor: Can create and modify dashboards using analyst-accessible datasets.
- Data Engineer: Full access to database connections, dataset definitions, and metadata. Cannot see dashboard content unless explicitly granted.
- Admin: Full platform access. Responsible for user management, role assignment, and audit log review.
Never grant Admin to end-users. Keep platform operations on a dedicated admin account that is separate from your personal user account.
Column-Level Security
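Iceberg supports schema evolution but has no column-level permissions of its own, so masking has to happen in the layers above it. Superset can hide or redact columns at query time through virtual datasets or calculated columns, and the query engine can do the same with views. For sensitive fields like email, phone, or SSN, define a view or virtual dataset that excludes or masks those columns before users see them.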
Example: Create a view in your data warehouse that strips PII:
CREATE VIEW customer_safe AS
SELECT
customer_id,
customer_name,
'***-****' AS phone_masked,
country
FROM customer_raw
WHERE region IN ('APAC', 'EMEA');
Then expose customer_safe to Superset, not customer_raw. Iceberg’s snapshots record what changed in a table and when; to audit who queried what, rely on the query engine’s and Superset’s logs.
Row-Level Security (RLS)
Row-level security ensures users see only data relevant to them. Superset supports RLS through dataset filters. When a user runs a query, Superset automatically injects a WHERE clause based on their role or attributes.
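For example, a regional sales manager should see only sales from their region. Configure an RLS rule along these lines. The shape is illustrative, and current_user_region is a custom Jinja variable that you would have to register yourself (for example through JINJA_CONTEXT_ADDONS); it is not built in: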
{
"dataset_id": 42,
"clause": "region = '{{ current_user_region }}'",
"roles": ["sales_manager"]
}
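This requires that your identity provider passes user attributes (region, department, cost centre) as SAML assertions or JWT claims, and that a custom security manager stores them where the template can read them. Superset does not map arbitrary identity-provider attributes into RLS clauses on its own; once they are available, the filter is applied to every query against the dataset.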
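The filter injection described above can be sketched in a few lines. This is not Superset’s internal code: `apply_rls`, the rule dictionaries, and the `{region}` placeholder syntax are hypothetical, and plain string formatting is used only to keep the example short.

```python
from typing import Dict, List


def apply_rls(base_sql: str, user_roles: List[str], user_attrs: Dict[str, str],
              rules: List[dict]) -> str:
    """Wrap base_sql in a subquery and AND together every rule matching the user's roles."""
    # Escape single quotes so an attribute value cannot break out of its SQL literal.
    safe = {key: value.replace("'", "''") for key, value in user_attrs.items()}
    predicates = []
    for rule in rules:
        if set(rule["roles"]) & set(user_roles):
            predicates.append("(" + rule["clause"].format(**safe) + ")")
    if not predicates:
        # No rule targets this user's roles, so the query is left unchanged.
        return base_sql
    return f"SELECT * FROM ({base_sql}) AS rls_src WHERE " + " AND ".join(predicates)


if __name__ == "__main__":
    rules = [{"dataset_id": 42, "clause": "region = '{region}'", "roles": ["sales_manager"]}]
    print(apply_rls("SELECT * FROM sales", ["sales_manager"], {"region": "APAC"}, rules))
```

Wrapping the original query in a subquery keeps the predicate outside any user-written SQL, so a crafted `OR 1=1` in the inner query cannot widen the filter.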
Iceberg’s Data Governance Layer
Iceberg is a table format, not a database. It stores data and metadata in your object store (S3, GCS, Azure Blob) or data lake. Its security model differs from traditional databases because there’s no centralised authentication layer. Instead, security is delegated to the query engine (Trino, Spark, Flink) and the object store’s access controls.
Metadata and Schema Security
Iceberg’s security documentation emphasises that table metadata—schema, partition layout, snapshots—is often less sensitive than data itself, but it still reveals structure. If an attacker learns that your customer table has columns for credit card and SSN, they know where to look.
Iceberg stores metadata in a metadata folder within the table’s object store path. Control access to this folder via bucket policies or IAM roles. In AWS:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::ACCOUNT:role/superset-query-role"
},
"Action": [
"s3:GetObject",
"s3:GetObjectVersion"
],
"Resource": "arn:aws:s3:::data-lake/iceberg/metadata/*"
},
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::ACCOUNT:role/superset-query-role"
},
"Action": [
"s3:GetObject"
],
"Resource": "arn:aws:s3:::data-lake/iceberg/data/*"
}
]
}
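This policy lets the Superset query role read metadata and data and grants nothing else; the role can write or delete only if another policy allows it. The paths are simplified: in a typical warehouse layout each table has its own metadata/ and data/ prefixes under the table location. Read-only access at the storage layer is a strong baseline.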
Partition Pruning and Data Exposure
Iceberg’s partition layout is optimised for query performance. But partition keys can leak sensitive information. If you partition by customer_id, an attacker can infer the cardinality and distribution of customers. Partition by year and region instead—less granular, but safer.
Time-Travel and Snapshot Isolation
Iceberg’s snapshots and time-travel queries are powerful for auditing. A user can query data as it existed at a specific timestamp, which is excellent for compliance. However, this also means old snapshots remain accessible unless explicitly expired.
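Set a snapshot retention policy. The table property only sets the default age used by snapshot expiration; snapshots are actually removed when the expire_snapshots procedure (or an equivalent maintenance job) runs:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("iceberg").getOrCreate()
# Default maximum snapshot age for expiration runs (30 days of history)
spark.sql("""
ALTER TABLE my_catalog.my_namespace.my_table
SET TBLPROPERTIES (
'history.expire.max-snapshot-age-ms'='2592000000' -- 30 days in ms
)
""")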
Once a snapshot has been expired, an attacker can no longer read deleted records by time-travelling to it. It also means you lose table history beyond 30 days, so decide based on your compliance requirements. Neither SOC 2 nor ISO 27001 prescribes a fixed retention period; one year and three years are common targets, driven by the audit observation period and your own policy. Snapshots are data history rather than access logs, so keep audit logs in a separate, immutable archive.
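The millisecond values used for the retention property are easy to get wrong by a factor of a thousand. A small sketch of the arithmetic, with hypothetical helper names:

```python
from datetime import datetime, timedelta, timezone

MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms in one day


def retention_ms(days: int) -> int:
    """Convert a retention window in days to the millisecond value the property expects."""
    return days * MS_PER_DAY


def expiry_cutoff(now: datetime, days: int) -> datetime:
    """Snapshots committed before this instant fall outside the retention window."""
    return now - timedelta(days=days)


if __name__ == "__main__":
    print(retention_ms(30))
    print(expiry_cutoff(datetime.now(timezone.utc), 30).isoformat())
```

Generating the value from a day count in your deployment scripts avoids hand-typed constants drifting between tables.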
Access Control Patterns
Now we move from theory to practice. Here are three battle-tested patterns for securing Superset + Iceberg.
Pattern 1: Query Engine as the Security Boundary
In this pattern, Superset connects to Iceberg through a query engine like Trino, which acts as the security boundary. Trino enforces authentication, row-level filters, and column masking before data reaches Superset.
Architecture:
User (via SAML) → Superset → Trino (RLS, Column Masking) → Iceberg → S3
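Superset passes the authenticated user’s identity to Trino through user impersonation: the connection authenticates as a service account, and Trino runs each query as the logged-in Superset user and applies that user’s rules. Trino must be configured to let the service account impersonate end users.
Implementation:
- Enable impersonation on the Trino database connection in Superset (the “Impersonate logged in user” option). Sketched as database settings, with a placeholder host and service account:
# Database connection settings (values are placeholders)
database = {
"database_name": "trino_iceberg",
"sqlalchemy_uri": "trino://superset-service@trino.example.com:8443/iceberg",
"impersonate_user": True
}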
- Configure Trino to use file-based access control, which evaluates rules against the impersonated username:
# In Trino's etc/access-control.properties
access-control.name=file
security.config-file=/etc/trino/access-control.json
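- Define RLS rules in /etc/trino/access-control.json. Table rules sit in a top-level tables array, and the filter is a SQL expression. Trino has no CURRENT_USER_REGION() function, so this sketch assumes a hypothetical mapping table, iceberg.security.user_region, keyed on the built-in current_user and readable by the filter:
{
"tables": [
{
"catalog": "iceberg",
"schema": "analytics",
"table": "customer",
"privileges": ["SELECT"],
"filter": "region IN (SELECT region FROM iceberg.security.user_region WHERE username = current_user)"
}
]
}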
Pros: Centralised security enforcement. Trino audits all queries before they reach Iceberg. Simple to audit.
Cons: Trino becomes a performance bottleneck if not sized correctly. Query latency increases by 10–20% due to filter injection.
Pattern 2: Service Account with Fine-Grained IAM Roles
In this pattern, Superset uses a single service account to connect to Iceberg. Fine-grained access control is enforced via IAM roles at the object store level (AWS S3, GCS, Azure Blob).
Architecture:
User (via SAML) → Superset (Role-Based Dataset Access) → IAM Role → Iceberg → S3
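Superset checks whether the user may access the requested dataset. If so, the query runs under the service account’s IAM role, which is scoped to specific S3 prefixes (tables). Superset has no native Iceberg driver, so the connection still goes through a SQL engine (for example Trino, Athena, or Spark) that assumes this role.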
Implementation:
- Create an IAM role for Superset with minimal permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "s3:GetObject",
"Resource": [
"arn:aws:s3:::data-lake/iceberg/analytics/customer/*",
"arn:aws:s3:::data-lake/iceberg/analytics/order/*"
]
},
{
"Effect": "Deny",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::data-lake/iceberg/analytics/payment/*"
}
]
}
- Register the connection in Superset. The block below is pseudo-configuration: Superset has no "iceberg" engine, so in practice these settings belong to the SQL engine that runs under the service role:
# Illustrative only, not a literal Superset setting
connection = {
"engine": "iceberg",
"catalog": "s3",
"warehouse": "s3://data-lake/iceberg",
"iam_role": "arn:aws:iam::ACCOUNT:role/superset-service"
}
- Configure dataset-level access control and assign roles to users in Superset’s UI. Users with the “Analyst” role see the customer dataset; others don’t.
Pros: Simpler architecture. No per-user rule evaluation in the query engine. IAM roles are auditable via CloudTrail.
Cons: Coarse-grained. All users with access to a dataset see all rows. RLS requires application-level filtering, which is slower and error-prone.
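The Allow and Deny statements in the role policy interact in a specific order: an explicit Deny always wins, and anything not allowed is implicitly denied. The sketch below is a deliberately simplified evaluator (no conditions, principals, or policy types) that you could use to sanity-check prefixes before deploying a policy. The function name `is_allowed` and the inventory path in the example are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Union


def _as_list(value: Union[str, List[str]]) -> List[str]:
    """IAM allows a single string or a list for Action and Resource."""
    return value if isinstance(value, list) else [value]


def is_allowed(statements: List[dict], action: str, resource: str) -> bool:
    """Simplified IAM evaluation: explicit Deny wins, then Allow, else implicit deny."""
    allowed = False
    for stmt in statements:
        action_match = any(fnmatchcase(action, a) for a in _as_list(stmt["Action"]))
        resource_match = any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
        if not (action_match and resource_match):
            continue
        if stmt["Effect"] == "Deny":
            return False
        allowed = True
    return allowed


POLICY = [
    {"Effect": "Allow", "Action": "s3:GetObject", "Resource": [
        "arn:aws:s3:::data-lake/iceberg/analytics/customer/*",
        "arn:aws:s3:::data-lake/iceberg/analytics/order/*",
    ]},
    {"Effect": "Deny", "Action": "s3:GetObject",
     "Resource": "arn:aws:s3:::data-lake/iceberg/analytics/payment/*"},
]

if __name__ == "__main__":
    base = "arn:aws:s3:::data-lake/iceberg/analytics/"
    for table in ("customer", "payment", "inventory"):
        print(table, is_allowed(POLICY, "s3:GetObject", base + table + "/data/f.parquet"))
```

The inventory table is refused even though no statement mentions it, which is why the explicit Deny on payment is a safeguard against future over-broad Allows rather than a requirement today.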
Pattern 3: Hybrid with Data Masking and Encryption
For highly sensitive data, combine patterns 1 and 2 with encryption and masking.
Architecture:
User → Superset (RBAC) → Trino (RLS) → Iceberg → S3 (Encrypted, Masked)
Data at rest in S3 is encrypted with AWS KMS. Sensitive columns are masked in Superset’s virtual datasets. Trino enforces row-level filters.
Implementation:
- Enable S3 encryption:
aws s3api put-bucket-encryption \
--bucket data-lake \
--server-side-encryption-configuration '{
"Rules": [{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "aws:kms",
"KMSMasterKeyID": "arn:aws:kms:us-east-1:ACCOUNT:key/KEY_ID"
}
}]
}'
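- Create a masked view in the query engine (for example through Trino). The domain substring already starts at the @ sign, so the mask itself must not add another one:
CREATE VIEW customer_masked AS
SELECT
customer_id,
customer_name,
SUBSTR(email, 1, 3) || '***' || SUBSTR(email, POSITION('@' IN email)) AS email_masked,
country
FROM customer_raw;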
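- Expose customer_masked to Superset. Trino applies RLS on top through a table rule. Trino has no built-in CURRENT_USER_COUNTRY() function, so this sketch assumes a hypothetical mapping table, iceberg.security.user_country, keyed on current_user:
{
"catalog": "iceberg",
"schema": "analytics",
"table": "customer_masked",
"privileges": ["SELECT"],
"filter": "country IN (SELECT country FROM iceberg.security.user_country WHERE username = current_user)"
}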
Pros: Defence in depth. Even if Superset is compromised, S3 encryption and masking limit exposure.
Cons: Operational complexity. More moving parts to configure and audit.
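Masking rules are worth testing offline before they go into a view. Here is a Python rendering of the email rule (up to three leading characters, a fixed mask, then the domain) as a minimal sketch; `mask_email` is a hypothetical helper, and it also handles short local parts, which a fixed three-character prefix would otherwise overrun.

```python
def mask_email(email: str) -> str:
    """Keep up to three leading characters and the domain; hide the rest of the local part."""
    at = email.find("@")
    if at <= 0:
        # No "@" or an empty local part: reveal nothing.
        return "***"
    visible = email[:min(3, at)]
    return f"{visible}***{email[at:]}"


if __name__ == "__main__":
    for sample in ("alice@example.com", "al@example.com", "not-an-email"):
        print(sample, "->", mask_email(sample))
```

Running the same samples through the SQL view and comparing results is a cheap regression test whenever the masking expression changes.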
Configuration Hardening
Secure configuration is the foundation. Here are the critical settings for production Superset + Iceberg deployments.
Superset Configuration
1. Disable Unsafe Features
Superset has several features that can leak data if misconfigured:
# In superset_config.py
# SQL Lab access is granted through roles: withhold the sql_lab role from
# users who should not run ad-hoc queries.
# Keep anonymous access off (None is the default)
PUBLIC_ROLE_LIKE = None
FEATURE_FLAGS = {
"ENABLE_TEMPLATE_PROCESSING": False, # Prevents Jinja2 injection; templated SQL, including Jinja in RLS clauses, stops rendering
"EMBEDDED_SUPERSET": False # Disables embedded dashboards and their guest tokens
}
2. Enable CSRF Protection
Cross-Site Request Forgery (CSRF) attacks can trick users into modifying dashboards or running queries. Superset includes CSRF protection and enables it by default; confirm it has not been switched off:
WTF_CSRF_ENABLED = True
WTF_CSRF_CHECK_DEFAULT = True
WTF_CSRF_TIME_LIMIT = None # Token lives as long as the session; set a number of seconds to expire it sooner
3. Enforce HTTPS
All Superset traffic must be encrypted. Configure your reverse proxy (nginx, Apache) to enforce HTTPS and set security headers:
server {
listen 443 ssl http2;
server_name superset.example.com;
ssl_certificate /etc/ssl/certs/superset.crt;
ssl_certificate_key /etc/ssl/private/superset.key;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers HIGH:!aNULL:!MD5;
# Security headers
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-XSS-Protection "1; mode=block" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
location / {
proxy_pass http://superset:8088;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
}
4. Secret Management
Database passwords, API keys, and encryption keys must never be hardcoded. Use a secrets manager:
from urllib.parse import quote_plus
from aws_secretsmanager_caching import SecretCache
cache = SecretCache()
db_password = cache.get_secret_string("superset/db-password")
# URL-encode the password so special characters cannot break the URI
db_uri = f"postgresql://user:{quote_plus(db_password)}@db.example.com/superset"
SQLALCHEMY_DATABASE_URI = db_uri
Rotate secrets every 90 days. Because superset_config.py is ordinary Python, it can also read values from environment variables (via os.environ) that your orchestrator injects:
export SQLALCHEMY_DATABASE_URI="postgresql://user:${DB_PASSWORD}@db.example.com/superset"
Iceberg Configuration
1. Immutable Metadata
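Iceberg metadata files are immutable by design: every commit writes a new metadata file rather than editing an old one. Configure the catalog explicitly and restrict who can commit so that this history stays trustworthy: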
from pyspark.sql import SparkSession
spark = SparkSession.builder \
.appName("iceberg") \
.config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
.config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog") \
.config("spark.sql.catalog.my_catalog.type", "hive") \
.config("spark.sql.catalog.my_catalog.warehouse", "s3://data-lake/iceberg") \
.getOrCreate()
# Row-level updates write delete files instead of rewriting data files; this setting does not restrict writes
spark.sql("ALTER TABLE my_catalog.my_namespace.my_table SET TBLPROPERTIES ('write.update.mode'='merge-on-read')")
2. Snapshot Retention and Expiry
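Set aggressive snapshot expiry to reduce the window for data recovery attacks. These properties set the defaults that snapshot expiration uses; a scheduled expire_snapshots run does the actual removal: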
spark.sql("""
ALTER TABLE my_catalog.my_namespace.my_table
SET TBLPROPERTIES (
'history.expire.max-snapshot-age-ms'='604800000', -- 7 days
'history.expire.min-snapshots-to-keep'='2' -- Always keep at least 2 snapshots
)
""")
3. S3 Bucket Policies
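Lock down S3 bucket access to specific IAM roles and IP ranges. Private ranges such as 10.0.0.0/8 only match traffic that reaches S3 through a VPC endpoint, where the condition key is aws:VpcSourceIp; aws:SourceIp applies to public addresses:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::ACCOUNT:role/superset-service"
},
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::data-lake/*",
"Condition": {
"IpAddress": {
"aws:VpcSourceIp": ["10.0.0.0/8"]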
}
}
},
{
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": "arn:aws:s3:::data-lake/*",
"Condition": {
"Bool": {
"aws:SecureTransport": "false"
}
}
}
]
}
Audit Readiness and Compliance
SOC 2 Type II and ISO 27001 audits require evidence of access control, change management, and incident response. Superset + Iceberg can support this, but you must configure audit logging correctly.
Audit Logging Strategy
Log three layers: Superset application logs, Trino query logs, and S3 access logs.
Superset Application Logs:
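Superset records user actions in the logs table of its metadata database through its event logger, so the audit trail itself needs no feature flag. To ship application logs to CloudWatch or Splunk as well, apply a standard Python logging configuration from superset_config.py:
import logging.config
# Assumes the watchtower package (3.x argument names) and AWS credentials in the environment
logging.config.dictConfig({
"version": 1,
"disable_existing_loggers": False,
"handlers": {
"cloudwatch": {
"class": "watchtower.CloudWatchLogHandler",
"log_group_name": "/superset/audit",
"log_stream_name": "superset-app"
}
},
"root": {
"level": "INFO",
"handlers": ["cloudwatch"]
}
})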
Superset logs include:
- User login/logout
- Dashboard and chart access
- Dataset modifications
- Query execution
- Role and permission changes
Trino Query Logs:
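Enable query logging in Trino by registering an event listener:
# In Trino's config.properties
event-listener.config-files=/etc/trino/http-event-listener.properties
Event listeners are configured in properties files. This sketch uses Trino’s built-in HTTP event listener to post completed-query events to a collector, which can then write them to a central location such as S3; the ingest URL is a placeholder:
# In /etc/trino/http-event-listener.properties
event-listener.name=http
http-event-listener.log-completed=true
http-event-listener.connect-ingest-uri=https://audit.example.com/trino-events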
Trino logs include:
- Query text
- Execution user
- Start and end time
- Rows scanned and returned
- Success or failure
S3 Access Logs:
Enable S3 access logging to track all reads and writes:
aws s3api put-bucket-logging \
--bucket data-lake \
--bucket-logging-status '{
"LoggingEnabled": {
"TargetBucket": "audit-logs",
"TargetPrefix": "s3-access-logs/"
}
}'
S3 logs include:
- Requester (AWS account, IAM role)
- Operation (GET, PUT, DELETE)
- Key (S3 path)
- Timestamp
- HTTP status
Audit Log Retention
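Retain logs for as long as your compliance policy requires. Three years is a common target for ISO 27001 programmes, although the standard leaves the period to the organisation. Use S3 Glacier for cost-effective long-term storage:
aws s3api put-bucket-lifecycle-configuration \
--bucket audit-logs \
--lifecycle-configuration '{
"Rules": [{
"ID": "MoveToGlacier",
"Filter": {"Prefix": ""},
"Status": "Enabled",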
"Transitions": [{
"Days": 90,
"StorageClass": "GLACIER"
}],
"Expiration": {
"Days": 1095
}
}]
}'
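As a quick check of the lifecycle arithmetic, the sketch below models where a log object sits at a given age under a 90-day transition and a 1,095-day (3 × 365) expiration. `storage_state` is a hypothetical helper, and it ignores the short delay S3 may take to act on a rule.

```python
def storage_state(age_days: int, transition_days: int = 90, expiration_days: int = 1095) -> str:
    """Return where a log object of the given age sits under the lifecycle rule."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= expiration_days:
        return "EXPIRED"
    if age_days >= transition_days:
        return "GLACIER"
    return "STANDARD"


if __name__ == "__main__":
    for age in (30, 90, 400, 1095):
        print(age, storage_state(age))
```

If your retention policy is expressed in years, derive the day count in code so that the expiration value and the policy document cannot drift apart.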
Compliance Mapping
When auditors ask for evidence of access control, map audit logs to compliance requirements:
- SOC 2 CC6.1 (Logical Access Controls): Show Superset role assignments and login logs.
- SOC 2 CC7.2 (System Monitoring): Show query logs and access attempts.
- ISO 27001 A.9.2 (User Access Management; controls 5.15 to 5.18 in the 2022 revision): Show user provisioning and deprovisioning logs.
- ISO 27001 A.12.4.1 (Event Logging; control 8.15 in the 2022 revision): Show application and system logs.
Many organisations use Vanta to automate compliance evidence collection. Vanta integrates with AWS, Okta, and other platforms to pull audit logs automatically, significantly reducing the time and effort for SOC 2 and ISO 27001 audits.
For enterprises pursuing compliance, consider partnering with a team experienced in audit-ready architectures. PADISO’s Security Audit service helps teams get audit-ready in weeks, not months, by implementing the right logging, access control, and evidence collection patterns from day one.
Operational Habits for Production
Configuration is static; operations are dynamic. Here are the daily and weekly habits that keep Superset + Iceberg secure.
Weekly Access Review
Every Monday, review who has access to what:
- Query Superset’s user and role table:
SELECT
u.username,
u.active,
STRING_AGG(r.name, ', ') AS roles,
u.changed_on
FROM ab_user u
LEFT JOIN ab_user_role ur ON u.id = ur.user_id
LEFT JOIN ab_role r ON ur.role_id = r.id
GROUP BY u.id, u.username, u.active, u.changed_on
ORDER BY u.changed_on DESC
LIMIT 50;
- Cross-reference with your HR system. Have any users left? Are their Superset accounts still active? If yes, disable them immediately.
- Review role changes. Did a junior analyst get promoted to admin? Verify the change request.
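The cross-reference step lends itself to automation. A minimal sketch, assuming you can export Superset users as dictionaries with `username` and `active` keys and obtain the current roster from HR as a set of usernames; both inputs and the function name are hypothetical.

```python
from typing import Iterable, List, Set


def stale_accounts(superset_users: Iterable[dict], active_employees: Set[str]) -> List[str]:
    """Return usernames that are still active in Superset but absent from the HR roster."""
    return sorted(
        user["username"]
        for user in superset_users
        if user["active"] and user["username"] not in active_employees
    )


if __name__ == "__main__":
    users = [
        {"username": "alice", "active": True},
        {"username": "bob", "active": True},
        {"username": "carol", "active": False},
    ]
    print(stale_accounts(users, {"alice"}))
```

Anything this returns should be disabled first and investigated second; an empty list each Monday is the evidence auditors want to see.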
Monthly Snapshot Audit
Every month, audit Iceberg snapshots to ensure old data is being expired:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("iceberg-audit").getOrCreate()
# List all snapshots for a table
spark.sql("""
SELECT
committed_at,
snapshot_id,
parent_id,
operation
FROM my_catalog.my_namespace.my_table.snapshots
ORDER BY committed_at DESC
LIMIT 100
""").show()
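If snapshots older than 30 days still exist, trigger expiry manually with the expire_snapshots procedure:
from datetime import datetime, timedelta, timezone
cutoff = (datetime.now(timezone.utc) - timedelta(days=30)).strftime("%Y-%m-%d %H:%M:%S")
spark.sql(f"""
CALL my_catalog.system.expire_snapshots(
table => 'my_namespace.my_table',
older_than => TIMESTAMP '{cutoff}'
)
""")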
Quarterly Permission Audit
Every quarter, audit dataset-level permissions. Superset stores these in its metadata database as datasource_access grants that link roles to permission and view-menu pairs:
SELECT
r.name AS role,
vm.name AS dataset_permission
FROM ab_role r
JOIN ab_permission_view_role pvr ON pvr.role_id = r.id
JOIN ab_permission_view pv ON pv.id = pvr.permission_view_id
JOIN ab_permission p ON p.id = pv.permission_id
JOIN ab_view_menu vm ON vm.id = pv.view_menu_id
WHERE p.name = 'datasource_access'
ORDER BY r.name, vm.name;
Look for anomalies:
- Analysts with access to payment or PII datasets.
- Contractors with access to internal datasets 6 months after their contract ended.
- Roles with overly broad permissions (e.g., a single role with access to 100+ datasets).
If you find issues, revoke permissions immediately and investigate the root cause.
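The anomaly checks above can be scripted against the output of the permission query. This is a sketch under assumptions: grants arrive as `(role, dataset_name)` pairs, sensitive datasets are recognised by hypothetical name keywords, and the breadth threshold is 100 datasets.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def find_anomalies(grants: Iterable[Tuple[str, str]],
                   sensitive_keywords: Tuple[str, ...] = ("payment", "pii"),
                   max_datasets: int = 100) -> List[tuple]:
    """Flag roles that reach sensitive datasets or hold an overly broad set of grants."""
    by_role = defaultdict(set)
    for role, dataset in grants:
        by_role[role].add(dataset)

    findings = []
    for role in sorted(by_role):
        datasets = by_role[role]
        sensitive = sorted(d for d in datasets if any(k in d.lower() for k in sensitive_keywords))
        if sensitive:
            findings.append((role, "sensitive", sensitive))
        if len(datasets) >= max_datasets:
            findings.append((role, "too_broad", len(datasets)))
    return findings


if __name__ == "__main__":
    sample = [("Analyst", "sales_daily"), ("Analyst", "payment_events"), ("Viewer", "sales_daily")]
    print(find_anomalies(sample))
```

Keyword matching is crude; a tag or classification column on your datasets is a better signal if you have one.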
Query Performance and Anomaly Detection
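Monitor Trino query logs for anomalies. The query below assumes your event listener lands completed-query events in a table called trino_query_log with these columns; adjust the names to your own schema:
SELECT
"user",
COUNT(*) AS query_count,
AVG(execution_time_ms) AS avg_duration_ms,
MAX(rows_scanned) AS max_rows_scanned
FROM trino_query_log
WHERE query_date = CURRENT_DATE
GROUP BY "user"
ORDER BY query_count DESC
LIMIT 20;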
Alert if:
- A user runs 1000+ queries in a day (possible data exfiltration).
- A query scans 10B+ rows (possible misconfiguration or abuse).
- A user queries datasets they don’t normally access.
Set up automated alerts in your monitoring tool (Datadog, New Relic, CloudWatch).
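Before wiring thresholds into a monitoring tool, it can help to prototype them. The sketch below applies the three alert conditions to one user’s daily statistics; the row shape, the baseline of usual datasets, and the alert names are all assumptions for illustration.

```python
from typing import Dict, List, Set

MAX_QUERIES_PER_DAY = 1000
MAX_ROWS_SCANNED = 10_000_000_000  # 10B rows


def alerts_for(row: dict, usual_datasets: Dict[str, Set[str]]) -> List[str]:
    """Return the alert names triggered by one user's daily statistics."""
    alerts = []
    if row["query_count"] >= MAX_QUERIES_PER_DAY:
        alerts.append("high_query_volume")
    if row["max_rows_scanned"] >= MAX_ROWS_SCANNED:
        alerts.append("large_scan")
    # Datasets touched today that are not in this user's historical baseline.
    unusual = set(row["datasets"]) - usual_datasets.get(row["user"], set())
    if unusual:
        alerts.append("unusual_datasets:" + ",".join(sorted(unusual)))
    return alerts


if __name__ == "__main__":
    baseline = {"alice": {"sales", "orders"}}
    today = {"user": "alice", "query_count": 1200, "max_rows_scanned": 5_000_000,
             "datasets": ["sales", "payment"]}
    print(alerts_for(today, baseline))
```

A user with no baseline triggers the unusual-dataset alert for everything they touch, which is usually the behaviour you want for new accounts.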
Credential Rotation
Rotate database passwords and API keys every 90 days:
# In your secrets manager (AWS Secrets Manager, HashiCorp Vault)
aws secretsmanager rotate-secret \
--secret-id superset/db-password \
--rotation-rules AutomaticallyAfterDays=90
Superset reads the secret when it starts, so schedule a rolling restart after each rotation to pick up the new password.
Common Pitfalls and Recovery
Even with careful planning, things go wrong. Here’s how to recover from common security incidents.
Pitfall 1: Accidental Public Dashboard
A user creates a dashboard with sensitive data and forgets to set permissions. The dashboard is accessible to anyone with the URL.
Recovery:
- Immediately unpublish the dashboard. Run this inside superset shell so that an application context and database session are available:
from superset.models.dashboard import Dashboard
from superset.extensions import db
dash = db.session.query(Dashboard).filter_by(id=DASHBOARD_ID).first()
dash.published = False # Returns the dashboard to draft status
db.session.commit()
- Check audit logs to see who accessed it and when. Superset writes these events to the logs table in its metadata database:
SELECT
l.dttm,
u.username,
l.action,
l.dashboard_id
FROM logs l
LEFT JOIN ab_user u ON u.id = l.user_id
WHERE l.dashboard_id = DASHBOARD_ID
AND l.dttm > NOW() - INTERVAL '7 days'
ORDER BY l.dttm DESC;
- If sensitive data was exposed, notify your security team and consider it a potential breach. Follow your incident response plan.
Pitfall 2: Expired Snapshots Not Cleaned Up
You intended to expire snapshots older than 30 days, but the expiry job failed silently. Now you have 2 years of snapshots taking up 500 GB of S3 storage and creating a security liability.
Recovery:
- Check the Iceberg table properties:
spark.sql("""
SHOW TBLPROPERTIES my_catalog.my_namespace.my_table
""").show()
- If history.expire.max-snapshot-age-ms is not set, set it now:
spark.sql("""
ALTER TABLE my_catalog.my_namespace.my_table
SET TBLPROPERTIES (
'history.expire.max-snapshot-age-ms'='604800000'
)
""")
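- Manually expire old snapshots with the expire_snapshots procedure:
from datetime import datetime, timedelta, timezone
cutoff = (datetime.now(timezone.utc) - timedelta(days=30)).strftime("%Y-%m-%d %H:%M:%S")
spark.sql(f"""
CALL my_catalog.system.expire_snapshots(
table => 'my_namespace.my_table',
older_than => TIMESTAMP '{cutoff}'
)
""")
- Monitor the S3 bucket to confirm old data files are deleted. The procedure removes files that no remaining snapshot references; on a versioned bucket, noncurrent object versions linger until a lifecycle rule clears them.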
Pitfall 3: Superset Access Control Bypass
A user without permission to a dataset somehow queries it. This could be due to a Superset bug (like the one documented in OX Security’s research) or misconfigured RLS.
Recovery:
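- Immediately identify which queries were run. SQL Lab queries are stored in the query table of Superset’s metadata database; trace chart queries through the logs table and the Trino query log:
SELECT
q.changed_on,
u.username,
q.database_id,
q.sql
FROM query q
JOIN ab_user u ON u.id = q.user_id
WHERE q.changed_on > NOW() - INTERVAL '24 hours'
ORDER BY q.changed_on DESC;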
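- Check if the user had permission. Dataset grants are stored as view-menu names ending in (id:DATASET_ID); also check for broader grants such as all_datasource_access:
SELECT
r.name AS role,
p.name AS permission_name,
vm.name AS view_menu
FROM ab_user u
JOIN ab_user_role ur ON u.id = ur.user_id
JOIN ab_role r ON ur.role_id = r.id
JOIN ab_permission_view_role pvr ON pvr.role_id = r.id
JOIN ab_permission_view pv ON pv.id = pvr.permission_view_id
JOIN ab_permission p ON p.id = pv.permission_id
JOIN ab_view_menu vm ON vm.id = pv.view_menu_id
WHERE u.username = 'USERNAME'
AND vm.name LIKE '%(id:DATASET_ID)';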
- If the user didn’t have permission, this is a security incident. Escalate to your security team.
- Check the Superset version and apply patches:
pip show apache-superset
# If version < 2.0.1, upgrade immediately
pip install --upgrade apache-superset
- Review all RLS rules to ensure they’re correctly applied.
Benchmarks and Performance Trade-offs
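Security and performance are often at odds. The figures below are indicative estimates for the three access control patterns rather than measurements of your workload; benchmark in your own environment before committing to a design.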
Pattern 1: Trino with RLS
Latency Impact:
- Simple query (< 100 MB scanned): +15–20 ms
- Medium query (100 MB – 1 GB): +50–100 ms
- Large query (> 1 GB): +200–500 ms
Why? Trino injects RLS filters into the query plan, which adds planning overhead. The actual data filtering is fast (done at the Iceberg level), but the overhead is noticeable for large queries.
Throughput:
- Single user: 100 queries/minute
- 10 concurrent users: 50 queries/minute per user (500 total)
- 100 concurrent users: 5 queries/minute per user (500 total)
Trino’s concurrency is limited by its coordinator node. Beyond 100 concurrent users, you’ll need to scale Trino horizontally.
Resource Cost:
- Trino cluster: 3 nodes (m5.2xlarge) = $1,500/month
- Superset: 1 node (m5.xlarge) = $400/month
- Total: $1,900/month
Pattern 2: Service Account with IAM Roles
Latency Impact:
- Simple query: +5–10 ms (S3 credential lookup)
- Medium query: +10–20 ms
- Large query: +20–50 ms
Why? Minimal overhead. Superset uses the service account’s IAM role, which is cached.
Throughput:
- Single user: 200 queries/minute
- 10 concurrent users: 100 queries/minute per user (1,000 total)
- 100 concurrent users: 50 queries/minute per user (5,000 total)
Throughput is much higher because the engine evaluates no per-user filters on each query.
Resource Cost:
- Superset: 1 node (m5.xlarge) = $400/month
- S3 API calls: ~$5/month (very cheap)
- Total: $405/month
Pattern 3: Hybrid with Encryption and Masking
Latency Impact:
- Simple query: +30–50 ms (Trino RLS + S3 encryption/decryption)
- Medium query: +100–200 ms
- Large query: +500–1,000 ms
Why? Masking expressions and RLS filters add CPU work in Trino. With SSE-KMS, decryption happens inside S3, so the encryption cost shows up mainly as KMS requests and added request latency rather than client CPU.
Throughput:
- Single user: 50 queries/minute
- 10 concurrent users: 25 queries/minute per user (250 total)
- 100 concurrent users: 2 queries/minute per user (200 total)
Throughput is significantly lower because every query pays for filtering, masking, and encrypted reads.
Resource Cost:
- Trino cluster: 5 nodes (m5.2xlarge) = $2,500/month (more CPU for masking and filtering)
- Superset: 1 node (m5.xlarge) = $400/month
- KMS key: $1/month
- Total: $2,901/month
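The totals quoted for each pattern are simple sums and products, so they are easy to recompute with your own prices. The sketch below reproduces the arithmetic using the article’s figures; the dictionary layout and function names are hypothetical.

```python
MONTHLY_COSTS = {
    "pattern_1": {"trino_cluster": 1500, "superset": 400},
    "pattern_2": {"superset": 400, "s3_api_calls": 5},
    "pattern_3": {"trino_cluster": 2500, "superset": 400, "kms_key": 1},
}


def total_cost(pattern: str) -> int:
    """Sum the monthly line items (USD) for one access control pattern."""
    return sum(MONTHLY_COSTS[pattern].values())


def aggregate_qpm(users: int, per_user_qpm: int) -> int:
    """Total queries per minute across all concurrent users."""
    return users * per_user_qpm


if __name__ == "__main__":
    for name in MONTHLY_COSTS:
        print(name, total_cost(name))
    print(aggregate_qpm(10, 50), aggregate_qpm(100, 5))
```

Note that Pattern 1’s aggregate throughput is the same at 10 and 100 users, which is what a saturated cluster looks like: per-user rates fall as concurrency rises.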
Recommendation
For most organisations, Pattern 1 (Trino with RLS) is the sweet spot. It provides strong security (row-level filtering enforced at the query engine), acceptable overhead (50–100 ms of added latency for typical queries), and reasonable cost ($1,900/month for a mid-market deployment).
Use Pattern 2 if you have simple access control needs (e.g., each dataset is either fully visible to a role or not visible at all) and want to minimise cost and latency.
Use Pattern 3 only if you have highly sensitive data (e.g., healthcare, financial) and can tolerate the performance hit.
Next Steps and Implementation
You now understand the security model for Superset + Iceberg. Here’s how to implement it in your organisation.
Phase 1: Assessment (Weeks 1–2)
- Audit your current Superset deployment:
  - Who has access to what datasets?
  - Are RBAC rules correctly configured?
  - Are audit logs being collected?
- Review your Iceberg setup:
  - How are snapshots being managed?
  - Are S3 bucket policies restrictive?
  - Is encryption enabled?
- Identify gaps against your compliance requirements (SOC 2, ISO 27001, GDPR).
Phase 2: Design (Weeks 3–4)
- Choose your access control pattern (1, 2, or 3).
- Design your role hierarchy:
  - Admin
  - Data Engineer
  - Dashboard Editor
  - Analyst
  - Viewer
- Map datasets to roles.
- Define RLS rules if using Pattern 1 or 3.
Phase 3: Implementation (Weeks 5–8)
- Set up authentication (SAML/OAUTH2).
- Configure Superset RBAC and dataset access.
- Deploy Trino (if using Pattern 1 or 3).
- Set up audit logging (Superset, Trino, S3).
- Configure snapshot expiry in Iceberg.
- Enable S3 encryption and bucket policies.
Phase 4: Validation (Weeks 9–10)
- Run penetration tests to verify access control.
- Audit logs to confirm all queries are being logged.
- Test incident response procedures (e.g., revoking access).
- Document your security model for auditors.
Phase 5: Ongoing Operations
- Weekly access reviews.
- Monthly snapshot audits.
- Quarterly permission audits.
- Annual security assessment.
If you’re building a data platform at scale, consider partnering with a team experienced in secure, audit-ready architectures. PADISO specialises in platform engineering for data-intensive organisations. Our Platform Development services cover Superset, Iceberg, and the full modern data stack across Australia and globally.
For organisations pursuing SOC 2 or ISO 27001 compliance, PADISO’s Security Audit service accelerates the audit-ready process using automated evidence collection via Vanta. We’ve helped 50+ teams pass audits in weeks instead of months by implementing the right logging, access control, and governance patterns from day one.
For specific guidance on your architecture, reach out to our team to discuss your requirements. We work with founders, operators, and engineering leaders across seed-stage startups through mid-market enterprises to ship secure, compliant data platforms.
Summary
Apache Superset + Iceberg is a powerful combination for modern analytics. Security requires discipline across three layers: authentication (who you are), authorisation (what you can access), and audit (proving what happened).
Choose your access control pattern based on your security requirements and tolerance for latency. Implement audit logging from day one. Review access regularly. Rotate credentials. Expire old snapshots. Stay current with security patches.
The operational habits matter as much as the configuration. A perfectly configured system with no monitoring will fail. A moderately configured system with weekly reviews and monthly audits will survive.
Start with Pattern 1 (Trino with RLS) if you’re uncertain. It’s the most battle-tested approach and scales to enterprise deployments. As your organisation matures and compliance requirements tighten, you can layer on encryption and masking.
Secure your analytics platform now. Your auditors—and your customers—will thank you later.