
Data Platform Consolidation Across a PE Portfolio

How to merge Snowflake, BigQuery, and Redshift across a PE portfolio: unified data governance, multi-tenant architecture, compliance, and cost control. A complete guide.

Padiso Team · 2026-04-17


Table of Contents

  1. Why Data Platform Consolidation Matters for PE Firms
  2. The Multi-Cloud Data Challenge
  3. Architectural Patterns for Unified Data Platforms
  4. Governance and Security at Scale
  5. Migration Strategy: From Fragmented to Governed
  6. Cost Optimisation Across Consolidated Platforms
  7. Compliance and Audit Readiness
  8. Building a Shared Analytics Layer
  9. Operational Handover and Scaling
  10. Real-World Implementation Roadmap

Why Data Platform Consolidation Matters for PE Firms

Private equity firms acquire companies at speed. Within months of closing, portfolio companies operate with fragmented data infrastructure—one subsidiary runs Snowflake in us-east-1, another manages BigQuery datasets scattered across GCP projects, and a third maintains Redshift clusters in AWS without centralised governance. The result: duplicated spend, siloed analytics, compliance risk, and zero visibility into cross-portfolio trends.

Data platform consolidation is not a technology project. It is a value-creation play. Bain’s 2024 report on private equity operating groups shows that PE firms driving the highest portfolio returns focus on operational improvements and shared services. Unified data platforms sit at the core of that strategy.

For a PE-owned roll-up with 8–12 add-on acquisitions, consolidating Snowflake, BigQuery, and Redshift into a single governed platform typically yields:

  • 30–45% reduction in data infrastructure costs through eliminated redundancy and optimised compute allocation
  • 6–8 weeks faster time-to-insight for cross-portfolio analytics (instead of manual data harmonisation across silos)
  • Single source of truth for revenue, customer, and operational metrics across all portfolio companies
  • Audit-ready compliance posture via centralised access controls and audit logging (critical for SOC 2 and ISO 27001 readiness)
  • Ability to identify and execute bolt-on acquisition synergies before deal close

The challenge is technical, organisational, and operational. Consolidating data platforms requires deep expertise in cloud architecture, data governance, and the specific quirks of each warehouse technology. That is why PE firms increasingly partner with specialists—whether internal engineering teams, venture studios like PADISO, or dedicated consolidation practices.


The Multi-Cloud Data Challenge

Why Portfolio Companies End Up in Different Clouds

When you acquire a company, you inherit its tech stack. That company chose Snowflake because the founder had Snowflake experience. Another chose BigQuery because it was already on GCP for compute. A third was sold Redshift by an AWS sales team. None of these decisions were made with portfolio consolidation in mind.

Within the first 90 days post-close, your data team faces a dilemma: force immediate migration (risky, expensive, disrupts operations) or accept fragmentation (costly, reduces synergies). Most PE firms choose a hybrid: stabilise each platform, then plan a phased consolidation.

The Hidden Costs of Fragmentation

Fragmented data infrastructure creates cascading costs:

Compute sprawl. Each data warehouse runs independently. Snowflake clusters in one company run 24/7 even during off-hours. BigQuery projects accumulate unused datasets. Redshift clusters run at 20% utilisation. Total annual spend across the portfolio: $2.8M. Consolidated: $1.6M.

Duplicate data pipelines. Three companies ingest customer data from Salesforce. Each builds its own Salesforce-to-warehouse connector. Three separate ETL jobs, three separate maintenance burdens, three separate points of failure. A unified ingest layer reduces that to one.

Compliance and audit burden. SOC 2 and ISO 27001 audits require consistent access controls, audit logging, and data lineage across all systems. With three warehouses, you manage three separate audit trails, three separate identity providers, three separate encryption strategies. PADISO’s Security Audit service helps portfolio companies achieve compliance, but fragmented platforms multiply the audit scope and cost.

Slow cross-portfolio analytics. Want to compare customer churn across all portfolio companies? You extract data from Snowflake, BigQuery, and Redshift separately, then join them in Python or SQL. That takes weeks and introduces data quality risk. A unified platform answers that question in minutes.

Multi-Cloud vs. Single-Cloud Consolidation

You have two strategic choices:

Option A: Single-cloud consolidation. Migrate all workloads to Snowflake (on AWS, Azure, or GCP), BigQuery, or Redshift. Simplest operationally, but requires the largest migration lift and may lock you into one vendor’s roadmap.

Option B: Federated data mesh with unified governance. Keep Snowflake, BigQuery, and Redshift where they are, but add a governance and orchestration layer on top. Companies like Collibra, Alation, or Atlan sit above all three and enforce consistent metadata, lineage, and access controls. More complex, but preserves existing investments and allows gradual migration.

For most PE portfolios, Option A (single-cloud consolidation) is the right choice because:

  • Migration happens once, then operations scale linearly
  • You avoid the complexity of managing three separate warehouse technologies
  • Consolidation creates a defensible operational advantage
  • Total cost of ownership (TCO) is lower over 3–5 years

The question is not whether to consolidate, but how quickly and in what sequence.


Architectural Patterns for Unified Data Platforms

The Hub-and-Spoke Model

The hub-and-spoke architecture is the most common pattern for PE portfolio consolidation. Each portfolio company maintains a local data layer (the spoke) but feeds data into a central data platform (the hub). The hub enforces governance, serves cross-portfolio analytics, and acts as the single source of truth.

Architecture:

Portfolio Company A (Snowflake) ─┐
Portfolio Company B (BigQuery)  ─┼→ Data Lake (S3/GCS) → Central Data Warehouse (Snowflake)
Portfolio Company C (Redshift)  ─┘                                      │
                                                        Governance Layer (Collibra/Atlan)
                                                                        │
                                                        Cross-Portfolio Analytics
                                                        (BI Tools, ML Models)

Each company’s data flows into a cloud data lake (S3 on AWS, GCS on GCP, ADLS on Azure). From there, a central Snowflake instance (or BigQuery project) ingests, transforms, and serves data. A governance layer on top tracks lineage, enforces access controls, and maintains a data catalog.

Advantages:

  • Each company maintains operational independence
  • Central team controls compliance and cost
  • Easy to add new portfolio companies
  • Supports phased migration without disrupting operations

Disadvantages:

  • Requires robust ETL orchestration (Airflow, dbt, or Fivetran)
  • Governance layer adds tooling cost ($50–150K annually)
  • Data duplication across spokes and hub increases storage cost

The Unified Warehouse Model

In this model, all data from all portfolio companies flows directly into a single warehouse (Snowflake, BigQuery, or Redshift). Multi-tenancy is enforced via schema separation and row-level security (RLS).

Architecture:

Portfolio Company A ─┐
Portfolio Company B ─┼→ ETL Pipelines → Unified Snowflake
Portfolio Company C ─┘                  (multi-tenant schemas, row-level security)

Each company’s data lives in a separate schema. Access controls and RLS policies prevent Company A from seeing Company B’s data. A central dbt project handles all transformations.

Advantages:

  • Simplest operational model
  • Lowest infrastructure cost
  • Single source of truth for all analytics
  • Easiest to enforce compliance (one audit trail, one set of controls)

Disadvantages:

  • Requires bulletproof multi-tenancy implementation (RLS bugs can leak data)
  • All portfolio companies depend on one warehouse (availability risk)
  • Less flexibility for company-specific customisations
  • Harder to support legacy tools that assume single-company datasets

The Lakehouse Model

A lakehouse combines the flexibility of a data lake (cheap storage, any data format) with the query performance of a warehouse (structured, indexed, governed). Tools like Apache Iceberg, Delta Lake, or Snowflake’s Iceberg integration enable this.

Architecture:

Raw Data (S3/GCS) → Apache Iceberg / Delta Lake → Unified Query Layer
                    (open format, ACID transactions)      │
                                                   SQL + Analytics
                                                   (Snowflake, Spark, Presto)

Raw data lands in a lake (S3 or GCS) in open formats (Parquet, ORC). Iceberg or Delta Lake adds transactional guarantees and versioning. A unified query layer (Snowflake on Iceberg, or Spark SQL) serves analytics across all portfolio data.

Advantages:

  • Vendor-agnostic (no lock-in to Snowflake or BigQuery)
  • Lowest storage cost
  • Supports both batch and streaming analytics
  • Easy to integrate with ML platforms (MLflow, Databricks)

Disadvantages:

  • Requires more sophisticated engineering (Spark expertise, cost optimisation)
  • Query performance can lag behind native warehouses
  • Governance is more complex (open-source tooling is less mature)

For 8–12 portfolio companies with mixed cloud platforms, the unified warehouse model is optimal. Here is why:

  1. Cost is predictable. One warehouse, one bill, no hidden data duplication costs.
  2. Compliance is simpler. One audit trail, one set of controls, one SOC 2 / ISO 27001 scope.
  3. Operations scale. Adding a new portfolio company is a matter of creating a new schema and configuring ETL.
  4. Time-to-value is fastest. Cross-portfolio analytics work on day one.

Choose Snowflake as the unified warehouse if you need flexibility and multi-cloud support. Choose BigQuery if you are all-in on GCP. Choose Redshift if you are all-in on AWS and want to minimise data egress costs.


Governance and Security at Scale

Multi-Tenancy via Schema Separation and Row-Level Security

In a unified warehouse, each portfolio company is a “tenant.” Data isolation is enforced at two levels:

Schema separation. Company A’s data lives in company_a.raw, company_a.transformed. Company B’s data lives in company_b.raw, company_b.transformed. A role-based access control (RBAC) policy grants Company A’s analysts access only to their schema.

Row-level security (RLS). For shared tables (e.g., a centralised customer table), RLS policies filter rows based on the user’s company affiliation. A query from Company A automatically returns only Company A’s rows, even if the analyst tries to access the shared table.

Implementation in Snowflake:

-- Create tenant schemas
-- (assumes company_a, company_b, and shared already exist as databases)
CREATE SCHEMA company_a.raw;
CREATE SCHEMA company_a.transformed;
CREATE SCHEMA company_b.raw;
CREATE SCHEMA company_b.transformed;

-- Create shared customer table
CREATE TABLE shared.customers (
  customer_id INT,
  company_id INT,
  customer_name STRING,
  revenue DECIMAL(18,2)
);

-- Mapping table: which role may see which tenant's rows
CREATE TABLE shared.role_company_map (
  role_name STRING,
  company_id INT
);

-- Row access policy: a row is visible when the current role is mapped
-- to the row's company
CREATE ROW ACCESS POLICY shared.customer_policy
  AS (company_id INT) RETURNS BOOLEAN ->
    EXISTS (
      SELECT 1
      FROM shared.role_company_map m
      WHERE m.role_name = CURRENT_ROLE()
        AND m.company_id = company_id
    );

ALTER TABLE shared.customers
  ADD ROW ACCESS POLICY shared.customer_policy ON (company_id);

-- Create tenant-specific roles and grants
CREATE ROLE company_a_analyst;
GRANT USAGE ON SCHEMA company_a.raw TO ROLE company_a_analyst;
GRANT USAGE ON SCHEMA company_a.transformed TO ROLE company_a_analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA company_a.transformed TO ROLE company_a_analyst;
GRANT SELECT ON TABLE shared.customers TO ROLE company_a_analyst;

-- Assign role to user and map the role to its tenant
CREATE USER analyst_a;
GRANT ROLE company_a_analyst TO USER analyst_a;
INSERT INTO shared.role_company_map VALUES ('COMPANY_A_ANALYST', 1);

When analyst_a queries shared.customers, the RLS policy automatically filters the result down to Company A’s rows. Even if the analyst writes SELECT * FROM shared.customers, they cannot see Company B’s data.
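The isolation semantics can be sketched outside the warehouse. A minimal Python simulation, assuming an illustrative role-to-company mapping (the table contents and mapping below are invented for the example; real enforcement happens inside Snowflake):

```python
# Simulates row-access-policy semantics: a role sees only the rows whose
# company_id it is mapped to. Illustrative data only.
ROLE_COMPANY_MAP = {
    "company_a_analyst": {1},    # Company A's company_id
    "company_b_analyst": {2},
    "portfolio_admin": {1, 2},   # tier-1 roles see all tenants
}

CUSTOMERS = [
    {"customer_id": 10, "company_id": 1, "customer_name": "Acme"},
    {"customer_id": 20, "company_id": 2, "customer_name": "Globex"},
]

def select_customers(current_role: str) -> list:
    """Equivalent of SELECT * FROM shared.customers under the RLS policy."""
    allowed = ROLE_COMPANY_MAP.get(current_role, set())
    return [row for row in CUSTOMERS if row["company_id"] in allowed]
```

An unmapped role sees nothing, which is the safe default you want from a deny-by-default policy.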

Access Control Hierarchy

Build a tiered access model:

Tier 1: Portfolio-level access. Central data team and CFO have access to all company data (for consolidated reporting). These users get the broadest permissions but are tracked via audit logging.

Tier 2: Company-level access. Each company’s analysts and engineers access only their own schemas and company-specific rows in shared tables.

Tier 3: Read-only access. BI tool service accounts get read-only access to transformed schemas. They cannot modify data or access raw data.

Tier 4: ETL service accounts. Pipeline orchestration tools (Airflow, Fivetran) get write access only to raw and staging schemas. They cannot modify transformed data (which is managed by dbt).
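The four tiers can be captured as a small permission matrix. A sketch, with tier keys, schema kinds, and actions invented for illustration:

```python
# Illustrative permission matrix for the tiered access model.
# Schema kinds: "raw", "staging", "transformed"; actions: "read", "write".
TIER_PERMISSIONS = {
    "portfolio": {"read": {"raw", "staging", "transformed"}, "write": set()},
    "company":   {"read": {"raw", "transformed"}, "write": set()},
    "bi":        {"read": {"transformed"}, "write": set()},
    "etl":       {"read": {"raw", "staging"}, "write": {"raw", "staging"}},
}

def is_allowed(tier: str, action: str, schema_kind: str) -> bool:
    """True if the tier may perform the action on that kind of schema."""
    perms = TIER_PERMISSIONS.get(tier, {})
    return schema_kind in perms.get(action, set())
```

Encoding the matrix as data makes the quarterly access review a diff against one file rather than a crawl through warehouse grants.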

Audit Logging and Compliance

Centralised audit logging is critical for SOC 2 and ISO 27001 compliance. Every query, every login, every schema change must be logged.

In Snowflake:

  • Query history is available via the ACCOUNT_USAGE.QUERY_HISTORY view (365-day retention) or the INFORMATION_SCHEMA.QUERY_HISTORY table function (7 days)
  • Login attempts are logged in the ACCOUNT_USAGE.LOGIN_HISTORY view
  • Schema changes can be reconstructed from DDL statements in query history (or tracked via ACCESS_HISTORY on Enterprise edition)
  • All logs should be exported to a centralised security information and event management (SIEM) tool like Splunk or Datadog

In BigQuery:

  • Query logs are available via Cloud Logging
  • Access logs are captured in Cloud Audit Logs
  • Enable Data Access logs for fine-grained tracking
  • Export logs to Cloud Storage or BigQuery for long-term retention

In Redshift:

  • Query logs are stored in stl_query and related system tables
  • Export logs to CloudWatch or S3 for centralised analysis
  • Use AWS CloudTrail for API-level logging
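Because the three platforms log in different shapes, a thin normalisation step before the SIEM keeps downstream alerting uniform. A hedged sketch; the per-platform field names below are approximations of each platform's query-log records, not their full schemas:

```python
# Normalise audit events from three warehouses into one SIEM-friendly
# envelope. Extractor field names are illustrative approximations.
from datetime import datetime, timezone

def normalise_event(platform: str, raw: dict) -> dict:
    """Map a platform-specific audit record onto a common shape."""
    extractors = {
        "snowflake": lambda r: (r["user_name"], r["query_text"], r["start_time"]),
        "bigquery":  lambda r: (r["principalEmail"], r["query"], r["timestamp"]),
        "redshift":  lambda r: (r["userid"], r["querytxt"], r["starttime"]),
    }
    user, query, ts = extractors[platform](raw)
    return {
        "platform": platform,
        "user": str(user),
        "query": query,
        "timestamp": ts,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
```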

PADISO’s Security Audit service, powered by Vanta, helps portfolio companies achieve SOC 2 and ISO 27001 compliance by automating audit log collection, access control verification, and remediation tracking across all data platforms.

Encryption and Data Protection

In-transit encryption: All data flowing into and out of the warehouse must be encrypted via TLS 1.2 or higher. Snowflake, BigQuery, and Redshift all enforce this by default.

At-rest encryption: Use customer-managed encryption keys (CMK) for sensitive data. In AWS, use KMS. In GCP, use Cloud KMS. In Azure, use Key Vault.

Field-level encryption: For highly sensitive data (credit card numbers, PII), encrypt at the application level before storing in the warehouse. Use libraries like cryptography (Python) or tink (Google’s encryption library).

Data masking: For non-production environments, apply dynamic data masking to hide sensitive fields from developers and analysts. Snowflake and BigQuery both support this natively.


Migration Strategy: From Fragmented to Governed

Phase 1: Assessment and Planning (Weeks 1–4)

Before moving a single byte of data, understand the current state.

Data inventory:

  • How many tables, datasets, and schemas exist across all three platforms?
  • What is the total data volume? (Snowflake: 50TB, BigQuery: 120TB, Redshift: 80TB)
  • What are the most critical tables for business operations?
  • What is the growth rate? (Is data doubling every 6 months?)

Dependency mapping:

  • Which BI tools connect to which warehouses? (Tableau to Snowflake, Looker to BigQuery, etc.)
  • Which applications or ETL pipelines depend on each warehouse?
  • What is the blast radius of a migration failure?

Cost baseline:

  • Current monthly spend on each platform
  • Compute, storage, and data transfer costs separately
  • Unused resources (idle clusters, abandoned datasets)

Compliance and security review:

  • What access controls currently exist?
  • Are there any compliance requirements specific to each company?
  • What is the current audit posture? (Any SOC 2 or ISO 27001 certifications?)

Output: A migration roadmap, prioritised by business criticality and technical risk.
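Prioritisation by business criticality and technical risk can be made explicit with a scoring function. The weights, 1–5 scales, and system names below are illustrative assumptions, not a standard methodology:

```python
# Rank candidate migrations: favour high-criticality, low-risk,
# small-volume workloads for the early waves. Weights are assumptions.
def migration_priority(criticality: int, risk: int, data_tb: float) -> float:
    """Higher score = migrate earlier."""
    return criticality * 2.0 - risk * 1.5 - data_tb / 100.0

systems = [
    {"name": "company_a_snowflake", "criticality": 5, "risk": 2, "data_tb": 50},
    {"name": "company_b_bigquery",  "criticality": 4, "risk": 4, "data_tb": 120},
    {"name": "company_c_redshift",  "criticality": 3, "risk": 3, "data_tb": 80},
]

roadmap = sorted(
    systems,
    key=lambda s: migration_priority(s["criticality"], s["risk"], s["data_tb"]),
    reverse=True,
)
```

The point is not the specific weights but forcing the assessment data into a single ranked list the steering group can argue about.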

Phase 2: Target Architecture Design (Weeks 4–8)

Based on the assessment, design the target state.

Warehouse selection: Choose Snowflake, BigQuery, or Redshift as the unified platform. For most PE portfolios, Snowflake is the best choice because it supports all three cloud providers and has the most mature multi-tenancy features.

Schema design: Define the schema structure for all portfolio companies. Use a naming convention:

company_a.raw.salesforce_accounts
company_a.raw.salesforce_opportunities
company_a.transformed.customer_360
company_a.transformed.revenue_metrics
shared.customers (with row-level security)
shared.products (with row-level security)
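A small helper can enforce the convention in pipeline code so names never drift. A sketch; the function is hypothetical:

```python
# Build fully qualified object names following the convention above:
# raw objects carry a source-system prefix, transformed objects do not.
def qualified_name(company: str, layer: str, entity: str, source: str = "") -> str:
    if layer == "raw":
        if not source:
            raise ValueError("raw objects need a source system prefix")
        return f"{company}.raw.{source}_{entity}"
    if layer == "transformed":
        return f"{company}.transformed.{entity}"
    raise ValueError(f"unknown layer: {layer}")
```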

ETL design: Plan the data ingestion pipelines. For each source system (Salesforce, Stripe, Segment, etc.), design the extraction, transformation, and loading logic. Use dbt for transformations and Fivetran or custom Airflow DAGs for orchestration.

Governance design: Define the data catalog structure, lineage tracking, and access control policies. McKinsey’s analysis on private equity strategies emphasises that unified governance is key to portfolio value creation.

BI and analytics design: Plan how BI tools will connect to the unified warehouse. Will you use a single Tableau or Looker instance for all companies, or will each company have its own BI environment that queries the unified warehouse?

Phase 3: Pilot Migration (Weeks 8–16)

Start with the smallest, lowest-risk portfolio company. Migrate its data, validate accuracy, and iterate.

Step 1: Set up the target warehouse environment.

  • Provision Snowflake (or BigQuery / Redshift) with the target schema structure
  • Configure encryption, audit logging, and access controls
  • Set up a separate non-production environment for testing

Step 2: Migrate the first dataset.

  • Extract all data from the source warehouse (e.g., Snowflake for Company A)
  • Transform it into the target schema
  • Load it into the unified warehouse
  • Validate row counts, data types, and sample data

Step 3: Run parallel validation.

  • For 1–2 weeks, run both the old and new pipelines in parallel
  • Compare query results from the old warehouse and the new warehouse
  • Identify and fix discrepancies
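The parallel-run comparison can be automated with a simple reconciliation over key aggregates. A sketch, assuming each side's results arrive as a metric-to-value dict; the tolerance covers benign floating-point drift:

```python
# Compare aggregates from the old and new warehouses during the
# parallel run. Empty result means parity. Illustrative sketch.
def reconcile(old: dict, new: dict, tol: float = 1e-6) -> list:
    """Return human-readable discrepancies between the two result sets."""
    issues = []
    for metric in sorted(set(old) | set(new)):
        if metric not in old:
            issues.append(f"{metric}: only in new warehouse")
        elif metric not in new:
            issues.append(f"{metric}: only in old warehouse")
        elif abs(old[metric] - new[metric]) > tol:
            issues.append(f"{metric}: old={old[metric]} new={new[metric]}")
    return issues
```

Run this nightly against row counts and revenue totals per table; cut over only after several consecutive clean runs.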

Step 4: Cutover.

  • Switch the BI tool and downstream applications to the new warehouse
  • Monitor for errors and performance issues
  • Keep the old warehouse running for 2 weeks as a rollback safety net

Lessons from the pilot:

  • Are the ETL pipelines fast enough?
  • Are the access controls working as expected?
  • Are there data quality issues?
  • How long does the migration take per company?

Phase 4: Production Migration (Weeks 16–32)

Based on lessons from the pilot, migrate the remaining portfolio companies in parallel batches.

Wave 1 (Weeks 16–20): Migrate Companies B and C in parallel
Wave 2 (Weeks 20–24): Migrate Companies D, E, and F in parallel
Wave 3 (Weeks 24–28): Migrate Companies G, H, and I in parallel
Wave 4 (Weeks 28–32): Migrate Companies J, K, and L in parallel

Running migrations in parallel waves (rather than strictly sequentially) compresses the production migration phase to 16 weeks — less than half the sequential timeline. The trade-off is higher concurrent resource requirements.

Phase 5: Optimisation and Handover (Weeks 32–40)

Once all data is migrated, optimise performance and hand over to the operations team.

Query optimisation:

  • Identify slow queries and add indexes or partitions
  • Cluster tables by frequently-joined keys
  • Archive cold data to cheaper storage

Cost optimisation:

  • Right-size warehouse clusters (compute nodes, concurrency)
  • Enable auto-scaling for variable workloads
  • Implement query result caching
  • Archive historical data to object storage

Operational handover:

  • Document runbooks for common tasks (adding a new company, resetting passwords, investigating query failures)
  • Train the central data team on Snowflake administration
  • Set up monitoring and alerting for data quality and performance
  • Establish an on-call rotation for production issues


Cost Optimisation Across Consolidated Platforms

The Cost Breakdown

Before consolidation, a typical PE portfolio with three warehouses incurs:

  • Snowflake: $15K/month (compute + storage)
  • BigQuery: $12K/month (analysis + storage)
  • Redshift: $18K/month (compute + storage)
  • Governance tooling: $8K/month (Collibra or Alation)
  • ETL tools: $6K/month (Fivetran or Talend)
  • Total: $59K/month ($708K/year)

After consolidation into a single Snowflake instance:

  • Snowflake (unified): $28K/month (compute + storage for all companies)
  • Governance tooling: $4K/month (simplified, fewer connectors needed)
  • ETL tools: $4K/month (consolidated pipelines, fewer connectors)
  • Total: $36K/month ($432K/year)

Savings: $276K/year (39% reduction). This is before accounting for engineering labour savings (no longer maintaining three separate platforms) or improved analytics speed (faster decision-making).
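The arithmetic behind the savings figure, using the monthly spend from the two breakdowns above:

```python
# Monthly spend before and after consolidation (figures from the text).
before = {"snowflake": 15_000, "bigquery": 12_000, "redshift": 18_000,
          "governance": 8_000, "etl": 6_000}
after = {"snowflake_unified": 28_000, "governance": 4_000, "etl": 4_000}

monthly_before = sum(before.values())   # $59K/month
monthly_after = sum(after.values())     # $36K/month
annual_savings = (monthly_before - monthly_after) * 12
savings_pct = round(100 * (monthly_before - monthly_after) / monthly_before)
```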

Compute Optimisation

Right-sizing warehouse clusters:

In Snowflake, a “warehouse” is a cluster of compute nodes billed by the credit, at roughly $2–4 per credit depending on edition. A typical portfolio company runs a Large warehouse (8 credits/hour) 24/7; at $2 per credit that is over $11,500 per month, even if the warehouse sits idle 80% of the time.

In a unified warehouse, you can run a single Extra-Large warehouse (16 credits/hour) and achieve better utilisation because you are aggregating workloads from all companies. If the unified warehouse is idle 20% of the time instead of 80%, you save 60% on compute.

Auto-scaling and suspension:

Configure the warehouse to auto-suspend after 10 minutes of inactivity and auto-scale up during peak hours. This reduces idle time and cost.

ALTER WAREHOUSE unified_warehouse SET
  AUTO_SUSPEND = 600        -- seconds of inactivity before suspending (10 minutes)
  AUTO_RESUME = TRUE
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3;    -- multi-cluster auto-scaling for peak hours

Scheduled compute:

If analytics only run during business hours (9 AM–6 PM), suspend the warehouse outside those hours. Saves 40% of compute cost.
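The business-hours policy reduces to a small scheduling predicate. A hypothetical sketch; the actual suspension would be issued via ALTER WAREHOUSE ... SUSPEND / RESUME from a scheduler such as Airflow:

```python
# Decide whether the unified warehouse should be running under a
# 9:00-18:00, weekdays-only policy. Hypothetical scheduler logic.
from datetime import datetime

def should_run(now: datetime) -> bool:
    """True during business hours (Mon-Fri, 09:00-17:59)."""
    return now.weekday() < 5 and 9 <= now.hour < 18
```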

Storage Optimisation

Compression and deduplication:

Snowflake compresses data automatically, but you can improve compression ratios by:

  • Using appropriately narrow data types (e.g. NUMBER with the precision you actually need, rather than maximum-width defaults)
  • Partitioning large tables by date or company_id
  • Clustering tables by frequently-joined keys

Archival and tiering:

Data older than 12 months is rarely queried. Archive it to S3 (Standard-IA or Glacier) at 1/10th the cost of Snowflake storage. Use Snowflake’s Iceberg integration to query archived data directly from S3 without rehydrating it.
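Selecting what to archive is a straightforward filter over partition metadata. A sketch, assuming per-partition last-query timestamps are available (the metadata shape is invented):

```python
# Pick partitions with no queries in the last 12 months for archival
# to object storage. Partition metadata shape is an assumption.
from datetime import date, timedelta

def archivable(partitions: list, today: date) -> list:
    """Names of partitions last queried more than ~12 months ago."""
    cutoff = today - timedelta(days=365)
    return [p["name"] for p in partitions if p["last_queried"] < cutoff]
```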

Deduplication:

After consolidation, check for duplicate tables across portfolio companies. If three companies have their own customers table with overlapping data, consolidate into a single shared table with row-level security.

Data Transfer Optimisation

Minimise data egress:

Data transfer out of a cloud region costs $0.02–0.12 per GB. If you are consolidating Snowflake (on AWS us-east-1), BigQuery (on GCP us-central1), and Redshift (on AWS eu-west-1), you will incur egress charges during migration.

Strategy:

  1. Migrate BigQuery data to AWS S3 (incurs egress from GCP: ~$2K for 100TB)
  2. Use S3 as the staging area
  3. Load from S3 into Snowflake (no egress charge, same cloud)

Total data transfer cost: ~$2–5K. Amortised over 3 years, this is negligible.

Consolidate on a single cloud:

For ongoing operations, run all portfolio companies on the same cloud and region. If most companies are on AWS, consolidate on Snowflake-on-AWS. If most are on GCP, consolidate on BigQuery. This eliminates future egress charges.

Governance Tooling Optimisation

A data governance tool (Collibra, Alation, Atlan) costs $50–150K annually. In a fragmented portfolio, you need connectors to Snowflake, BigQuery, and Redshift, multiplying cost.

In a unified warehouse, you need one connector. This cuts governance tooling cost by 40–50%.

Alternatively, use open-source tools (OpenMetadata, Amundsen) to avoid vendor lock-in and reduce cost to $10–20K annually (for hosting and maintenance).


Compliance and Audit Readiness

SOC 2 Type II Compliance

SOC 2 Type II requires demonstrating that you maintain effective controls over data security, availability, processing integrity, confidentiality, and privacy across a defined observation window, typically 3–12 months.

For a PE portfolio with a unified data platform, SOC 2 scope includes:

  1. Access controls: Only authorised users can access the warehouse and their assigned data
  2. Audit logging: All queries, logins, and schema changes are logged and retained
  3. Encryption: Data is encrypted in transit (TLS) and at rest (CMK)
  4. Change management: All changes to the warehouse are tracked and approved
  5. Incident response: Documented procedures for responding to data breaches or access violations

Implementation checklist:

  • Enable audit logging in Snowflake (Query History, Login History, Object Changes)
  • Export audit logs to a centralised SIEM (Splunk, Datadog, or Sumo Logic)
  • Implement role-based access control (RBAC) with documented access matrix
  • Configure multi-factor authentication (MFA) for all human users
  • Implement row-level security (RLS) to enforce data isolation by company
  • Document change management procedures (who can deploy schema changes, how are they approved)
  • Conduct a quarterly access review (verify that users still need their current permissions)
  • Document incident response procedures (what happens if a user’s credentials are compromised)
  • Engage a SOC 2 auditor (Big 4 firm or specialist like Vanta) to validate controls

ISO 27001 Certification

ISO 27001 is a broader information security management standard. It requires:

  1. Information security policy: Documented policies for data handling, access control, encryption, incident response
  2. Risk assessment: Identified threats and vulnerabilities in your data platform
  3. Supplier management: Verified that third-party tools (Fivetran, dbt Cloud, BI tools) meet security standards
  4. Incident management: Documented procedures for detecting, responding to, and learning from security incidents
  5. Continuous improvement: Regular audits and updates to security controls

Implementation checklist:

  • Document information security policies (access control, encryption, incident response, etc.)
  • Conduct a risk assessment (identify threats, assess likelihood and impact, prioritise mitigation)
  • Implement technical controls (encryption, MFA, audit logging, RLS)
  • Implement administrative controls (access reviews, change management, incident response)
  • Conduct a supplier audit (verify that Snowflake, Fivetran, dbt meet your security requirements)
  • Document asset inventory (all systems, data, and applications in scope)
  • Implement monitoring and alerting (detect suspicious activity)
  • Conduct annual internal audits
  • Engage an ISO 27001 auditor for certification

Using Vanta for Automated Compliance

PADISO’s Security Audit service, powered by Vanta, automates much of the compliance work. Vanta:

  • Continuously monitors your cloud infrastructure for misconfigurations and access violations
  • Auto-collects audit logs from Snowflake, AWS, GCP, and other platforms
  • Generates compliance reports for SOC 2 and ISO 27001
  • Tracks remediation of identified issues
  • Integrates with your ticketing system (Jira, Linear) to assign and track fixes

Using Vanta, a PE portfolio can achieve SOC 2 Type II certification in 4–6 months (vs. 9–12 months with manual processes) and ISO 27001 certification in 6–9 months.

Cost: $15–30K per year depending on the number of connected systems. ROI is immediate if you are consolidating multiple warehouses (each warehouse would have its own compliance scope without Vanta).


Building a Shared Analytics Layer

Semantic Layer Design

Once data is consolidated, build a semantic layer to abstract complexity from business users. A semantic layer (also called a metrics layer) defines:

  • Entities: Customer, Product, Order, Subscription
  • Metrics: Revenue, Churn Rate, Customer Acquisition Cost (CAC), Lifetime Value (LTV)
  • Dimensions: Company, Region, Product Category, Customer Segment

Tools like Cube, Looker, or dbt Semantic Layer let business users query metrics without writing SQL.

Example:

A CFO wants to compare revenue by company and month. Without a semantic layer, they write:

SELECT
  company_id,
  DATE_TRUNC('month', order_date) AS month,
  SUM(amount) AS revenue
FROM shared.orders
WHERE order_date >= '2024-01-01'
GROUP BY 1, 2
ORDER BY 1, 2;

With a semantic layer, they click a few buttons in Looker or Tableau:

  • Metric: Revenue
  • Dimension: Company
  • Dimension: Month
  • Filter: Date >= 2024-01-01

The BI tool generates the SQL automatically.
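What the BI tool does under the hood can be sketched as a toy metrics layer: a metric definition plus requested dimensions compiled into SQL. The metric registry and function below are illustrative, not the API of Cube, Looker, or the dbt Semantic Layer:

```python
# Toy metrics layer: compile metric + dimensions + filter into SQL.
# Metric definitions are illustrative assumptions.
METRICS = {
    "revenue": {"table": "shared.orders", "expr": "SUM(amount)"},
}

def metric_query(metric: str, dimensions: list, date_from: str) -> str:
    """Generate the SQL a BI tool would emit for this metric request."""
    m = METRICS[metric]
    dims = ", ".join(dimensions)
    return (
        f"SELECT {dims}, {m['expr']} AS {metric}\n"
        f"FROM {m['table']}\n"
        f"WHERE order_date >= '{date_from}'\n"
        f"GROUP BY {dims}\n"
        f"ORDER BY {dims};"
    )
```

Centralising the metric definition means "revenue" is computed the same way in every dashboard, which is the whole point of the semantic layer.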

Cross-Portfolio Metrics

Define metrics that span all portfolio companies:

Revenue metrics:

  • Total portfolio revenue (sum across all companies)
  • Revenue by company (with row-level security to show each company only its own)
  • Revenue growth rate (month-over-month, year-over-year)
  • Revenue per employee (headcount data from HR systems)

Customer metrics:

  • Total customers (deduplicated across all companies)
  • Customer churn rate
  • Net revenue retention (NRR)
  • Customer acquisition cost (CAC) and payback period

Operational metrics:

  • Data freshness (how recent is the data in the warehouse?)
  • Query performance (average query latency by user group)
  • Data quality score (percentage of rows that pass validation checks)

BI Tool Strategy

You have two options:

Option A: Single BI instance for all companies.

  • One Looker or Tableau instance with dashboards for all companies
  • Central team controls dashboard creation and access
  • Easier to enforce consistency and compliance
  • Less flexibility for company-specific customisations

Option B: Company-specific BI instances querying the unified warehouse.

  • Each company has its own Looker or Tableau instance
  • Company-specific dashboards and BI teams
  • More flexibility but harder to enforce consistency
  • Each instance queries the unified warehouse and sees only its own data (via row-level security)

For PE portfolios, Option A (single BI instance) is recommended because it simplifies governance and reduces cost. Each company can have its own dashboards and access controls without running separate BI instances.


Operational Handover and Scaling

Building the Central Data Team

Once the unified platform is live, you need a central data team to operate it. The team should include:

Data platform engineer (1 FTE):

  • Manages Snowflake (or BigQuery / Redshift) infrastructure
  • Handles cluster sizing, cost optimisation, and performance tuning
  • Owns disaster recovery and backup procedures
  • Monitors system health and responds to incidents

Data engineer (1–2 FTE):

  • Designs and implements ETL pipelines
  • Owns data quality and validation
  • Onboards new data sources and portfolio companies
  • Maintains dbt models and transformations

Data analyst (1 FTE):

  • Builds dashboards and reports for executives
  • Supports self-service analytics for business users
  • Identifies data quality issues and works with engineers to fix them
  • Conducts ad-hoc analysis for strategic decisions

Data governance lead (0.5 FTE):

  • Maintains the data catalog and lineage
  • Manages access control policies and access reviews
  • Owns compliance and audit readiness
  • Trains business users on data governance best practices

Total cost: $400–500K annually for a Sydney-based team. This is offset by the $276K annual savings from consolidation, plus additional savings from faster decision-making and reduced analytics headcount across portfolio companies.

Runbooks and Documentation

Document every operational procedure:

  1. Onboarding a new portfolio company (estimated time: 2 weeks):
  • Create schemas and roles
  • Configure ETL pipelines
  • Set up access controls and RLS policies
  • Validate data accuracy
  2. Responding to a data quality issue (estimated time: 4–8 hours):
  • Identify the affected tables and rows
  • Determine the root cause (ETL bug, source system issue, etc.)
  • Implement a fix
  • Reprocess affected data
  • Validate and communicate to stakeholders
  3. Adding a new data source (estimated time: 1–3 weeks):
  • Assess data volume and complexity
  • Design the extraction and transformation logic
  • Implement the ETL pipeline
  • Validate data accuracy
  • Deploy to production
  4. Investigating a slow query (estimated time: 2–4 hours):
  • Check the query execution plan
  • Identify missing clustering keys or partitions
  • Implement the optimisation (clustering, partitioning, or a query rewrite)
  • Validate the performance improvement
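The first runbook (onboarding) is a good candidate for scripting. Here is a minimal sketch that generates the schema, role, and grant DDL for a new company; the `PORTFOLIO` database and `COMPANY_<CODE>` / `BI_COMPANY_<CODE>` naming convention are assumptions for illustration.

```python
# Sketch of onboarding automation: emit the Snowflake DDL for a new portfolio
# company's schema, read-only BI role, and grants. Names are hypothetical.

def onboarding_ddl(company_code: str, database: str = "PORTFOLIO") -> list[str]:
    """Return the ordered DDL statements to onboard one company."""
    schema = f"{database}.COMPANY_{company_code}"
    role = f"BI_COMPANY_{company_code}"
    return [
        f"CREATE SCHEMA IF NOT EXISTS {schema};",
        f"CREATE ROLE IF NOT EXISTS {role};",
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {schema} TO ROLE {role};",
        f"GRANT SELECT ON FUTURE TABLES IN SCHEMA {schema} TO ROLE {role};",
    ]

for stmt in onboarding_ddl("E"):
    print(stmt)
```

Keeping the DDL in version-controlled code (rather than run ad hoc) makes every onboarding reviewable and repeatable, which matters when the runbook is executed per acquisition.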

Monitoring and Alerting

Set up proactive monitoring to catch issues before they impact business users:

Data freshness alerts:

  • Alert if a daily pipeline has not completed by 9 AM
  • Alert if data is more than 24 hours stale

Data quality alerts:

  • Alert if row counts change by >10% from previous day
  • Alert if null values appear in critical columns
  • Alert if duplicate rows are detected

Performance alerts:

  • Alert if query latency exceeds 30 seconds
  • Alert if warehouse credit usage exceeds budget

Security alerts:

  • Alert if an unusual number of queries are executed from a single user
  • Alert if data access patterns change significantly
  • Alert if schema changes occur outside of change windows

Tools like Datadog, New Relic, or Splunk integrate with Snowflake and provide out-of-the-box monitoring.
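Whichever tool you choose, the freshness and row-count rules above reduce to simple comparisons. This sketch assumes you can already query a last-load timestamp and daily row counts from your warehouse's metadata; the thresholds mirror the alert rules listed above.

```python
# Sketch of the freshness and row-count alert rules. The timestamps and counts
# are assumed to come from warehouse metadata queries (not shown here).
from datetime import datetime, timedelta

def freshness_alert(last_loaded: datetime, now: datetime, max_staleness_hours: int = 24) -> bool:
    """True if the table is staler than the allowed window (default 24 hours)."""
    return now - last_loaded > timedelta(hours=max_staleness_hours)

def row_count_alert(today: int, yesterday: int, threshold: float = 0.10) -> bool:
    """True if day-over-day row counts move by more than the threshold (default 10%)."""
    if yesterday == 0:
        return today != 0  # rows appearing in a previously empty table is notable
    return abs(today - yesterday) / yesterday > threshold

now = datetime(2026, 4, 17, 9, 0)
print(freshness_alert(datetime(2026, 4, 15, 8, 0), now))  # True: data is ~49h stale
print(row_count_alert(today=88_000, yesterday=100_000))   # True: 12% drop exceeds 10%
```

Encoding thresholds as parameters (rather than hard-coding them per pipeline) lets each portfolio company tune its own tolerance without forking the alerting logic.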

Scaling as the Portfolio Grows

As you acquire more companies, the data platform must scale:

  • At 5 companies: single Snowflake warehouse; 1 data engineer, 1 analyst
  • At 10 companies: single Snowflake warehouse; 2 data engineers, 2 analysts, 1 governance lead
  • At 20 companies: distributed architecture (regional Snowflake instances feeding a central data lake); 4 data engineers, 3 analysts, 1 governance lead, 1 platform engineer

For portfolios exceeding 15 companies, consider a distributed architecture where each region or business unit has its own Snowflake instance, all feeding a central data lake. This reduces latency (queries run faster because data is local) and lowers cost, because each regional instance can be right-sized independently.


Real-World Implementation Roadmap

8-Week Fast-Track Consolidation (High-Risk, High-Reward)

For PE firms that need to consolidate quickly (e.g., to meet a 100-day plan for a bolt-on acquisition), here is an aggressive 8-week timeline:

Week 1–2: Assessment

  • Inventory all data across Snowflake, BigQuery, and Redshift
  • Map dependencies (BI tools, ETL pipelines, applications)
  • Identify critical tables and validate requirements

Week 3–4: Architecture and Design

  • Choose Snowflake as the unified platform
  • Design schema structure and access controls
  • Plan ETL pipelines using Fivetran or Airflow

Week 5: Pilot Migration

  • Migrate Company A (smallest portfolio company)
  • Validate data accuracy and performance
  • Iterate on ETL logic and access controls

Week 6–7: Production Migration

  • Migrate Companies B–D in parallel
  • Switch BI tools and applications to the unified warehouse
  • Monitor for performance and data quality issues

Week 8: Optimisation and Handover

  • Optimise warehouse performance (clustering keys, warehouse sizing, result caching)
  • Document runbooks and operational procedures
  • Train the central data team

Risks:

  • Insufficient testing increases data quality risk
  • Aggressive timeline puts pressure on engineering team
  • May miss compliance requirements (SOC 2, ISO 27001)

Mitigations:

  • Hire a specialist firm (like PADISO) to de-risk the project
  • Automate testing to reduce manual validation effort
  • Run parallel validation for 2 weeks to catch issues early
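The parallel-validation mitigation can start with something as simple as per-table row-count comparison between the legacy warehouse and the unified one. This sketch assumes the counts have already been fetched from each system; the table names are illustrative only.

```python
# Sketch of parallel validation: diff per-table row counts between the legacy
# and unified warehouses. Counts are assumed fetched elsewhere; names are
# illustrative.

def compare_row_counts(legacy: dict[str, int], unified: dict[str, int]) -> list[str]:
    """Return human-readable mismatches between the two warehouses."""
    issues = []
    for table in sorted(set(legacy) | set(unified)):
        old, new = legacy.get(table), unified.get(table)
        if old is None or new is None:
            missing_from = "legacy" if old is None else "unified"
            issues.append(f"{table}: missing from {missing_from} warehouse")
        elif old != new:
            issues.append(f"{table}: legacy={old} unified={new} (diff {new - old})")
    return issues

legacy_counts = {"orders": 1_204_332, "customers": 58_119}
unified_counts = {"orders": 1_204_332, "customers": 58_004}
for issue in compare_row_counts(legacy_counts, unified_counts):
    print(issue)
```

Row counts catch gross migration failures cheaply; pairing them with column-level checksums on critical tables catches the subtler transformation bugs that counts alone miss.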

16-Week Standard Consolidation (Moderate Risk, Lower Cost)

For PE firms with more flexibility, here is a 16-week timeline:

Weeks 1–4: Assessment and Planning

  • Comprehensive data inventory and dependency mapping
  • Cost baseline and ROI analysis
  • Compliance and security review

Weeks 5–8: Architecture Design and Pilot Setup

  • Design target architecture and schema
  • Provision Snowflake environment
  • Implement access controls and audit logging
  • Start pilot migration for Company A

Weeks 9–12: Pilot Validation and Wave 1 Migration

  • Complete pilot migration and parallel validation
  • Cutover Company A to unified warehouse
  • Migrate Companies B and C in parallel

Weeks 13–16: Wave 2 Migration and Optimisation

  • Migrate Companies D–F in parallel
  • Optimise warehouse performance
  • Implement governance tooling (data catalog, lineage)
  • Train the central data team

Risks:

  • Longer timeline may lose momentum
  • Business priorities may shift mid-project

Mitigations:

  • Establish a steering committee to maintain alignment
  • Define clear success metrics and track progress weekly
  • Build in buffer time for unexpected issues

24-Week Enterprise Consolidation (Low Risk, High Governance)

For large PE portfolios (15+ companies) or highly regulated industries, here is a 24-week timeline with strong governance:

Weeks 1–6: Assessment, Planning, and Governance Setup

  • Comprehensive assessment and dependency mapping
  • Design information security policies and compliance framework
  • Establish data governance council (representatives from each company)
  • Plan SOC 2 and ISO 27001 audit strategy

Weeks 7–12: Architecture Design and Pilot

  • Design federated architecture (regional Snowflake instances + central data lake)
  • Implement governance tooling (Collibra or Alation)
  • Pilot migration for Company A with full compliance validation

Weeks 13–18: Wave 1 and 2 Migrations

  • Migrate Companies B–F with parallel validation
  • Implement SOC 2 controls (audit logging, access reviews, change management)
  • Conduct compliance readiness assessment

Weeks 19–24: Wave 3 Migration and Compliance Certification

  • Migrate Companies G–L
  • Begin the SOC 2 Type II observation period (the Type II report follows a 3–12 month observation window, so certification completes after handover)
  • Achieve ISO 27001 certification
  • Optimise and handover

Risks:

  • Lengthy timeline increases cost and reduces time-to-value
  • Compliance requirements may delay migration

Mitigations:

  • Run migrations and compliance work in parallel (don’t wait for audit to start migrations)
  • Use Vanta to automate compliance tracking
  • Engage external auditors early to align on requirements

Conclusion: From Fragmentation to Unified Value Creation

Data platform consolidation is not a technology project—it is a value-creation strategy for PE portfolios. By merging Snowflake, BigQuery, and Redshift into a unified, governed platform, you unlock:

  • 30–45% cost savings through eliminated redundancy and optimised compute
  • 6–8 week faster time-to-insight for cross-portfolio analytics
  • Single source of truth for all portfolio metrics
  • Audit-ready compliance via centralised controls and logging
  • Ability to identify and execute synergies faster

The path forward is clear: assess your current state, design a multi-tenant architecture, execute a phased migration, and build a central data team to operate the platform long-term.

For PE firms looking to accelerate this journey, PADISO’s Platform Design & Engineering service provides fractional CTO leadership, architecture design, and hands-on implementation support. We have helped PE-backed companies consolidate data platforms, achieve compliance, and unlock $5M+ in annual value. Explore our case studies to see how we have delivered results for similar organisations.

The firms that consolidate fastest will outpace their peers. The question is not whether to consolidate, but how quickly you can move.