A Hybrid Data Platform with Azure Data Factory & dbt - Part 6

Series Overview

This is Part 6 of our series on building a hybrid data platform. If you’re joining mid-series, you may want to catch up on the previous articles first.

Introduction

In my first article, I walked you through our hybrid data platform architecture, from infrastructure foundations to extract-and-load patterns. The medallion architecture provides a clear conceptual framework: bronze for raw data, silver for cleansed transformations, and gold for business-ready analytics. But the medallion pattern doesn’t tell you how to implement those transformations or who should write them.

Today I want to share how we implemented our transformation layer and why we ended up with a dual-track approach that serves both engineering velocity and data analyst expertise. This isn’t a story about choosing the “right” tool, but about recognizing that different types of data transformations can be developed differently and require different development workflows.

In past projects I’ve learned that ‘one size fits all’ often creates more problems than it solves. Our production architecture separates operational transformations (handled by dbt) from financial analysis (handled by analyst-authored SQL), and this separation has proven crucial for both technical performance and development agility.

The Challenge: Different Users, Different Requirements

When we designed our transformation layer, we initially followed the common wisdom: standardize on dbt for all transformations. It seemed elegant—one tool, one workflow, consistent patterns across all our data processing. But production reality quickly taught us that different types of transformations have fundamentally different requirements.

Our operational transformations follow predictable patterns: clean data types, standardize business entities, filter active records, and create normalized views for customer portals and licensing systems. These transformations benefit from dbt’s engineering-friendly features: version control, testing, documentation, and modular design.

Our financial analysis transformations tell a different story. MRR forecasting requires contract-specific business rules, hardcoded exceptions for data quality issues, and sophisticated calculations that evolve rapidly as business requirements change. Our data analyst needs to prototype calculations quickly, test edge cases in production, and implement complex financial logic that often defies standardization.

Forcing sophisticated financial calculations through dbt’s abstraction layer was counterproductive. Translation overhead slowed development, introduced errors, and created friction between the analyst’s domain expertise and the engineering tool’s constraints.

The Dual-Track Architecture

Our solution separates transformations along these requirement boundaries and business domains:

Operational Track: dbt models for engineering-managed transformations

  • Bronze to silver staging models with consistent business logic
  • Operational gold layer outputs to dbt_out schema
  • Version controlled, tested, and deployed through CI/CD pipelines

Financial Track: Analyst-authored SQL for financial controlling

  • Analyst creates views directly in the contoso schema, working from gold-layer data
  • Rapid development cycles through direct production database access
  • Finalized SQL extracted and committed to version control
  • Engineering embeds SQL scripts in ADF pipeline definitions
  • Views deployed and materialized to tables through CI/CD pipelines

This separation aligns tools with users: engineers work in dbt with familiar software development patterns, while the analyst works in SQL with direct database access for rapid iteration.

The dbt Track: Engineering-Driven Transformations

Our dbt implementation follows standard staging model patterns but with business-specific logic embedded through variables. Here’s how we handle complex business entity creation:

{{ config(
    materialized="table",
    schema = "staging"
) }}

WITH source_active_contracts AS (
    SELECT * FROM {{ source('dbt_view','active_contracts') }}
),

source_active_contractitem AS (
    SELECT * FROM {{ source('dbt_view','active_contractitems') }}
),

account_active_product AS (
    SELECT
        c.c_customerid AS accountid,
        a.ac_name AS company,
        p.p_group AS pgroup,
        min(c.c_contractstartdate) AS startdate,
        max(records) AS records,

        (CASE
            WHEN p.p_group IN ({{ var('maintenance_productgroups') }}) THEN 1
            WHEN p.p_group IN ({{ var('contoso_productgroups') }}) 
                AND ci.ci_productcode IN ({{ var('contoso_maintenance_productcodes') }}) THEN 1
            ELSE 0
        END) AS ismaintenance,
        
        (CASE
            WHEN p.p_group IN ({{ var('subscription_productgroups') }}) THEN 1
            WHEN p.p_group IN ({{ var('contoso_productgroups') }}) 
                AND ci.ci_productcode IN ({{ var('contoso_subscription_productcodes') }}) THEN 1
            ELSE 0
        END) AS issubscription

    FROM source_active_contracts AS c
    INNER JOIN source_customer AS a ON c.c_customerid = a.ac_sfid
    LEFT JOIN source_active_contractitem AS ci ON c.c_sfid = ci.ci_contractid
    LEFT JOIN source_product AS p ON ci.ci_productcode = p.p_code

    WHERE a.ac_businessentity = 'ContosoSoftwareentwicklungGmbH'
        AND p.p_group NOT IN ({{ var('consulting_productgroups') }})

    GROUP BY c.c_customerid, a.ac_name, p.p_group, /* ... */
)

SELECT * FROM account_active_product

The key patterns in our dbt models:

Business Logic Variables: Product groupings and classifications are centralized in dbt_project.yml:

vars:
  maintenance_productgroups: " 'contoso OnPrem' "
  subscription_productgroups: " 'contoso Rent','contoso SaaS' "
  consulting_productgroups: " 'contoso ISA' "
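
Because dbt splices these var strings verbatim into the compiled SQL, each value must already be a valid, quoted SQL list that can sit inside an `IN (...)` clause. A minimal Python sketch of that substitution (illustrative only, not dbt’s actual templating engine):

```python
# Illustrative sketch: dbt pastes var strings verbatim into compiled SQL,
# so each value must already be a quoted, comma-separated SQL list.
dbt_vars = {
    "maintenance_productgroups": "'contoso OnPrem'",
    "subscription_productgroups": "'contoso Rent','contoso SaaS'",
}

def render(template: str, variables: dict) -> str:
    # Minimal stand-in for dbt's {{ var('...') }} substitution.
    for name, value in variables.items():
        template = template.replace("{{ var('%s') }}" % name, value)
    return template

compiled = render(
    "WHERE p.p_group IN ({{ var('subscription_productgroups') }})", dbt_vars
)
print(compiled)  # WHERE p.p_group IN ('contoso Rent','contoso SaaS')
```

This is also the main caveat of the pattern: a malformed quote inside a var value only surfaces as a SQL error at run time, which is why we keep the lists centralized in dbt_project.yml.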

Operational Outputs: dbt models target operational systems like customer portals and licensing:

{{ config(
    materialized="incremental",
    incremental_strategy='merge',
    schema="out",
    unique_key='Id'
) }}

-- Customer portal access permissions
WITH csp_access_account AS (
    SELECT
        a.ac_name AS company,
        k.csp_accessing_contact AS contactid,
        c.co_email AS email,
        k.csp_productgroup AS product,
        CONCAT_WS(',',
            CASE WHEN k.csp_access_permission = '1' THEN 'Access' END,
            CASE WHEN k.csp_download_permission = '1' THEN 'Download' END
        ) AS permission
    FROM source_csp_access AS k
    LEFT JOIN source_contact AS c ON k.csp_accessing_contact = c.co_sfid
    LEFT JOIN source_account AS a ON c.co_accountid = a.ac_sfid
)

SELECT * FROM csp_access_account

Model Selection: Container execution selects domain-specific models:

# In dbt container entrypoint
dbt run --select $ENV_DBT_MODEL  # e.g., "sfdc-contoso_dwh"
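
The entrypoint’s job is simply to fail fast when no selector is provided and to scope the run to one domain’s models. A hedged sketch of that logic in Python (the function name is ours, not part of dbt or our actual container image):

```python
# Sketch of the container entrypoint's selection logic: refuse to run
# without a model selector, then build the dbt invocation for just that
# domain's models (ENV_DBT_MODEL mirrors the variable above).
def build_dbt_command(env: dict) -> list:
    selector = env.get("ENV_DBT_MODEL", "").strip()
    if not selector:
        raise ValueError("ENV_DBT_MODEL must be set, e.g. 'sfdc-contoso_dwh'")
    return ["dbt", "run", "--select", selector]

print(build_dbt_command({"ENV_DBT_MODEL": "sfdc-contoso_dwh"}))
# ['dbt', 'run', '--select', 'sfdc-contoso_dwh']
```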

The Analyst SQL Track: Financial Expertise

Financial transformations follow a completely different pattern. Our data analyst works directly in the production database, creating sophisticated views that capture complex business logic:

CREATE OR ALTER VIEW [contoso].[v_SFO_MRR_Forecast] AS 
SELECT
    C_SFID, C_AccountID, C_CustomerID, SM_SubscriptionID,
    DD_Date, C_Number, SM_CurrencyIsoCode, SM_MRR,
    
    -- Business-specific MRR type classification
    CASE WHEN SM_MRR=0 THEN 
        CASE WHEN ISNULL(c.C_RenewalPeriod,0)=0 
            THEN 'expected (No-AutoRenewal)' 
            ELSE 'expected (AutoRenewal)' 
        END 
    ELSE 
        CASE WHEN dd.DD_Date < ISNULL(C_Next_Contract_Term_End_Date, '1900-01-01') 
            THEN 'Contracted' 
            ELSE 'expected (AutoRenewal)' 
        END 
    END MRR_Type,
    
    -- Contract-specific hardcoded exceptions for data quality
    CASE WHEN C_Number IN ('C-2459','C-2460') AND P_Code IN ('CT_M_E') 
        THEN 1080.75  -- Fixed MRR for specific contracts with unfixable data issues
    ELSE 
        CASE WHEN a.AC_SFID ='0017Q000003ZXYpQAO' 
            THEN _MRR.MRR / (100-contoso.fn_SFO_Discount(SM_SubscriptionID, SM_ProductID, DD_Date, 
                             CASE WHEN SM_MRR<>0 THEN 0 ELSE 1 END))*100 
            ELSE _MRR.MRR 
        END
    END MRR_final

FROM contoso.SFO_ContractContoso c
INNER JOIN contoso.SFO_Subscription s ON c.C_SFID = s.S_Contract
INNER JOIN dbo.TBL_Dimension_Date dd ON /* date range logic */
-- Complex joins and business logic continue...

This view demonstrates several patterns that would be difficult to implement in dbt:

Contract-Specific Exceptions: Hardcoded fixes for specific contract numbers where upstream data quality issues can’t be resolved.

Custom Function Integration: References to business-specific functions like contoso.fn_SFO_Discount() that implement proprietary calculation logic.

Financial Domain Logic: MRR type classification and revenue recognition rules that require deep business knowledge.

Pipeline Orchestration and Consistency

The dual-track approach requires careful orchestration to maintain data consistency. Our pipeline structure separates initialization, view management, and data loading:

Init Pipeline: Creates clean state

  1. Drop and recreate table schemas
  2. Create custom functions via embedded SQL script actions
  3. Trigger views pipeline to establish view definitions
  4. Run daily pipeline for initial data load

Views Pipeline: Dependency-managed view creation

{
    "name": "v_SFO_MRR_Forecast",
    "type": "Script",
    "dependsOn": [
        {
            "activity": "v_SFO_SubscriptionMetric_DateRange",
            "dependencyConditions": ["Succeeded"]
        }
    ],
    "typeProperties": {
        "scripts": [
            {
                "type": "Query",
                "text": "CREATE OR ALTER VIEW [contoso].[v_SFO_MRR_Forecast] AS..."
            }
        ]
    }
}

Transactional Consistency: Financial view materialization happens atomically:

SET XACT_ABORT ON;  -- roll back the whole transaction on any runtime error

BEGIN TRY
    BEGIN TRANSACTION;

    -- Materialize all financial views to tables
    DROP TABLE IF EXISTS contoso.SFO_MRR_Forecast_Table;
    SELECT * INTO contoso.SFO_MRR_Forecast_Table FROM contoso.v_SFO_MRR_Forecast;

    DROP TABLE IF EXISTS contoso.SFO_Snowball_Table;
    SELECT * INTO contoso.SFO_Snowball_Table FROM contoso.v_SFO_Snowball;

    -- Additional materializations...

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;  -- surface the error to the pipeline instead of swallowing it
END CATCH;

This approach ensures that financial data is either completely current and consistent, or remains at the previous consistent state. We prefer clear error indication over partial updates that could result in financial reporting inconsistencies.

Development Workflows: Hybrid Approaches for Different Users

The dual-track approach enables workflows optimized for different types of users:

Engineering Workflow (dbt track):

  1. Develop models in local dbt environment
  2. Test with dbt test and validate with sample data
  3. Commit to version control
  4. Deploy through CI/CD pipeline to environments

Analyst Workflow (SQL track):

  1. Prototype calculations directly in production database
  2. Test edge cases with real data immediately
  3. Communicate changes through direct discussion
  4. Extract finalized SQL to baseline directory for git diff review
  5. Embed SQL as text in ADF pipeline JSON definitions
  6. Deploy through CI/CD pipeline alongside infrastructure changes

Infrastructure as Code for Analyst SQL: While the analyst develops directly in production for rapid iteration, the final deployment still follows Infrastructure as Code principles. When the analyst finalizes changes, we extract the SQL scripts to a baseline/ directory:

baseline/
├── functions/
│   └── fn_SFO_Discount.sql
└── views/
    ├── v_SFO_MRR_Forecast.sql
    ├── v_SFO_Snowball.sql
    └── v_SFO_Switch_MtS.sql

These scripts are then embedded as text within ADF pipeline JSON definitions and deployed through the same CI/CD pipelines as our dbt models:

{
    "name": "v_SFO_MRR_Forecast", 
    "type": "Script",
    "typeProperties": {
        "scripts": [
            {
                "type": "Query",
                "text": "CREATE OR ALTER VIEW [contoso].[v_SFO_MRR_Forecast] AS\nSELECT C_SFID, C_AccountID..."
            }
        ]
    }
}
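
The embedding step lends itself to automation. A minimal sketch of such a helper (the function name and directory layout are assumptions for illustration, not our actual tooling):

```python
import json
from pathlib import Path

# Hypothetical helper: wrap a baseline/ SQL file as an ADF Script
# activity, so the view definition in version control is the same text
# the pipeline deploys.
def sql_to_script_activity(sql_path: Path) -> dict:
    return {
        "name": sql_path.stem,  # e.g. "v_SFO_MRR_Forecast"
        "type": "Script",
        "typeProperties": {
            "scripts": [{"type": "Query", "text": sql_path.read_text()}],
        },
    }

# Usage: embed every view under baseline/views/ into a pipeline fragment.
views_dir = Path("baseline/views")
if views_dir.is_dir():
    activities = [sql_to_script_activity(p) for p in sorted(views_dir.glob("*.sql"))]
    print(json.dumps(activities, indent=4))
```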

This approach aims to provide the best of both worlds: the analyst gets rapid iteration cycles during development, while the organization gets version control, git diff review, and repeatable deployments through Infrastructure as Code patterns.

Error Handling Philosophy: Fail Fast and Clear

Throughout our architecture, we prioritize clear error indication over automated recovery. This design principle shapes several implementation decisions:

Pipeline Failure Strategy: If any transformation fails, the entire pipeline stops rather than attempting to continue with partial data. This ensures that downstream consumers either get complete, consistent data or a clear indication that something needs attention.

Transactional Boundaries: Financial view materialization happens within explicit transactions. Either all financial tables are updated to the new state, or all remain at the previous consistent state.

Dependency Management: Views are created in strict dependency order through ADF activity dependencies. If a foundational view fails to create, dependent views don’t execute, preventing cascade failures.
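
This ordering guarantee can be reasoned about as a plain topological sort over the view dependency graph. A sketch with an assumed dependency map (the real wiring lives in the ADF dependsOn properties shown earlier):

```python
from graphlib import TopologicalSorter

# Assumed dependency map mirroring the ADF "dependsOn" wiring: each view
# maps to the set of views that must exist before it can be created.
view_deps = {
    "v_SFO_SubscriptionMetric_DateRange": set(),
    "v_SFO_MRR_Forecast": {"v_SFO_SubscriptionMetric_DateRange"},
    "v_SFO_Snowball": {"v_SFO_MRR_Forecast"},
}

creation_order = list(TopologicalSorter(view_deps).static_order())
print(creation_order)
# ['v_SFO_SubscriptionMetric_DateRange', 'v_SFO_MRR_Forecast', 'v_SFO_Snowball']
```

If a predecessor fails, everything after it in this order is skipped, which is exactly the cascade-prevention behavior described above.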

This approach requires more manual intervention when issues occur, but it gives us confidence that data consumers are never working with partially updated or inconsistent information, without having to invest in complex error recovery for failure modes we have not encountered (yet). Premature optimization is at the root of a lot of technical debt.

Lessons Learned: Matching Tools to Users

Implementing this dual-track transformation architecture taught us several important lessons about data platform design:

Different Users Need Different Tools: Engineers benefit from dbt’s software development patterns—version control, testing, modular design. Financial analysts need direct SQL access for rapid iteration and complex business logic implementation. But both tracks ultimately deploy through Infrastructure as Code patterns, ensuring governance and repeatability regardless of the development workflow.

Hybrid Development Models Work: You don’t have to choose between analyst productivity and engineering governance. Our approach lets analysts work in their preferred environment (direct SQL in production) while still maintaining version control, code review, and automated deployment through CI/CD pipelines.

Business Domain Boundaries Matter: The separation between operational and financial transformations reflects real organizational boundaries. Operational data feeds systems that need standard, predictable formats. Financial data supports analysis that requires flexibility and domain expertise.

Consistency Is Paramount for Financial Data: In financial reporting, being slightly behind but completely consistent is preferable to being current but potentially inconsistent. Our transactional approach to view materialization reflects this priority.

Tool Abstraction Has Costs: While dbt provides excellent abstractions for software engineering workflows, those same abstractions can become barriers when implementing highly specialized business logic. Recognizing when to embrace SQL directly rather than forcing everything through an abstraction layer is crucial.

Error Handling Philosophy Should Be Explicit: Choosing between automated recovery and fail-fast error handling is an architectural decision that affects the entire platform. We chose clarity over automation, and this decision influences everything from pipeline design to monitoring strategies.

Performance and Operational Considerations

The dual-track approach creates some operational complexity that’s worth acknowledging:

Schema Coordination: Managing separate schemas for raw data (raw), dbt staging, operational gold (dbt_out), and financial gold (contoso) requires some coordination to avoid naming conflicts and ensure proper permissions.

Development Environment Strategy: While dbt models can be developed and tested locally, financial views require production data for meaningful testing. We mitigate risks through communication and the analyst’s deep domain knowledge.

Monitoring and Observability: We monitor both pipeline success and resulting data freshness. Every table materialized from an analytical view carries a timestamp recording when it was generated.
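
A freshness check against that timestamp can be as simple as the following sketch (the 26-hour threshold is an assumption, sized for a daily load plus some slack):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Sketch: flag a materialized table as stale when its generated-at
# timestamp is older than the expected daily cadence plus slack.
def is_stale(generated_at: datetime,
             max_age: timedelta = timedelta(hours=26),
             now: Optional[datetime] = None) -> bool:
    now = now or datetime.now(timezone.utc)
    return now - generated_at > max_age

fresh = datetime.now(timezone.utc) - timedelta(hours=2)
print(is_stale(fresh))  # False
```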

Performance Optimization: Complex financial views benefit from the analyst’s ability to optimize SQL directly.

Evolution and Future Considerations

This transformation architecture serves our current needs well, but we’ve designed it to evolve as organizational and technical requirements change.

Scaling the Analyst Track: As financial analysis becomes more complex, we may introduce intermediate staging tables to improve performance while maintaining the analyst’s development flexibility.

dbt Expansion: As operational requirements grow, we might add more sophisticated dbt patterns, such as snapshots for slowly changing dimensions and macros for common business logic.

Organizational Growth: If we add more data analysts, we may need to introduce more formal coordination mechanisms while preserving the rapid iteration benefits of direct SQL development.

Tool Evolution: Both dbt and Azure Data Factory continue to evolve. We’re prepared to adapt our patterns as new capabilities become available, but the fundamental principle of matching tools to users will remain constant.

Conclusion and Looking Forward

Data transformation architecture is ultimately about enabling the people who create business value from data. Our dual-track approach recognizes that different types of transformations serve different users and optimize for different outcomes.

The medallion architecture provides the conceptual framework, but the implementation details—how transformations are authored, deployed, and maintained—determine whether the platform serves its users or constrains them.

By separating engineering-driven operational transformations from analyst-driven financial transformations—while maintaining Infrastructure as Code principles for both—we’ve created a platform that enables rapid business iteration without sacrificing governance and repeatability.

In my next article, I’ll explore how we orchestrate deployments across this complex architecture. I’ll share our CI/CD patterns, repository organization strategies, and the coordination mechanisms that keep everything working together while maintaining the autonomy that makes each track effective.

The goal isn’t to create the perfect abstraction that works for everyone, but to create the right abstractions for each type of user while maintaining overall system coherence and reliability.