Series Overview
This is Part 7 of our series on building a hybrid data platform. If you’re joining mid-series, here is the full series:
- Part 1: From Something-with-Data to Data-as-a-Product - Medallion architecture and business transformation
- Part 2: Infrastructure as Code Foundation with Terraform - IaC patterns and module design
- Part 3: Domain-Driven Design for Data Engineering - Source system separation and Conway’s Law
- Part 4: Hybrid Connectivity Architecture - Integration runtimes and Azure Relay Bridge
- Part 5: Extract and Load Pipeline Evolution - Four-pipeline pattern and deletion detection
- Part 6: Data Transformation Architecture - Dual-track approach with dbt and analyst SQL
- Part 7: CI/CD as Organizational Strategy - Selective deployment and complexity placement
- Part 8: DATEV Integration Patterns - Hardcoding Clients and Embracing Failure
- Part 9: Integrating Product Telemetry - Integrating Open Telemetry Into Unified Analytics
- Part 10: RevOps Funnel Analytics - Building Bowtie GTM Metrics
Introduction
Most CI/CD articles focus on orchestrating complex deployments. This one is about making most deployments simple.
In my first article, I walked you through our hybrid data platform architecture, built around domain separation and evolutionary design. Today I want to share how this organizational thinking transforms CI/CD from complex orchestration into simple, isolated deployments.
Here’s our operational reality: we have a sophisticated orchestration pipeline that can deploy everything, but in practice we deploy individual pieces based on what actually changed. If only Salesforce logic changes, I run just the Salesforce pipeline. If only dbt models change, I run just the container build. Most deployments are single-stage operations, not complex coordinated releases.
This wasn’t an accident. It’s the result of applying Conway’s Law intentionally to create deployment boundaries that enable organizational evolution.
The Selective Deployment Reality
Our main orchestration pipeline looks comprehensive:
```yaml
stages:
  - stage: az_bootstrap        # Bootstrap infrastructure
  - stage: infra_deploy        # Core infrastructure
  - stage: dbt_docker_image    # Build dbt container
  - stage: elt_template        # Deploy shared templates
  - stage: sfdc_contoso_dwh    # Salesforce pipeline
  - stage: product_contoso_dwh # Product telemetry
  - stage: datev_contoso_dwh   # Financial integration
  # Additional domain stages...
```
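For single stages to be runnable in isolation, they can't hard-depend on one another. A minimal sketch of how such independent stages might be declared — the template paths here are illustrative, not our actual file names:

```yaml
# Hypothetical sketch: domain stages declared with dependsOn: [] so each
# can be selected and run on its own from the Azure DevOps "Stages to run"
# picker, without triggering earlier stages. Template paths are illustrative.
stages:
  - stage: sfdc_contoso_dwh
    dependsOn: []            # no implicit dependency on preceding stages
    jobs:
      - template: pipelines/sfdc-deploy.yml

  - stage: dbt_docker_image
    dependsOn: []
    jobs:
      - template: pipelines/dbt-image-build.yml
```

Without `dependsOn: []`, Azure Pipelines treats stages as sequentially dependent, which would force a full run just to deploy one domain.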
But here’s what actually happens in a typical week:
- Monday: Run the sfdc_contoso_dwh stage for Salesforce object updates
- Wednesday: Run the dbt_docker_image stage for new transformation models
- Friday: Run the product_contoso_dwh stage for new product data sources
Coordinated deployments are the exception, not the rule. Maybe once a month I’ll run infrastructure updates followed by affected domains, or deploy a major dbt change that requires multiple domain updates. But 98% of deployments are isolated, single-stage operations.
The domain separation delivers on its promise: most changes are isolated and can be deployed independently. I track dependencies manually, but unless dbt models have changed, deploying just the affected domain repository is sufficient; it automatically picks up the current tagged container versions.
Repository Boundaries Enable Deployment Boundaries
This selective deployment pattern works because we organized our 10+ repositories by domain boundaries:
Domain-Specific Pipelines:
```
ContosoData/sfdc-contoso_dwh     # Salesforce data pipeline
ContosoData/product-contoso_dwh  # Product telemetry pipeline
ContosoData/datev-contoso_dwh    # Financial system integration
```
Shared Infrastructure:
```
SharedData/shared        # Bootstrap and shared modules
SharedData/infra_deploy  # Core infrastructure
SharedData/elt_template  # Reusable pipeline patterns
```
Container Images:
```
ContosoData/image_dbt      # dbt execution containers
SharedData/image_azbridge  # Azure Relay Bridge containers
```
This organization is domain-driven design applied to code structure. The repository boundaries allow multiple developers to work in parallel on different aspects while minimizing coordination needs.
Each repository owns its deployment logic through its own azure-pipeline.yml template, but they all plug into the same orchestration framework. When only Salesforce logic changes, only the sfdc-contoso_dwh repository needs attention, so only that pipeline needs to run.
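To make the "own pipeline, shared framework" split concrete, here is a hedged sketch of what a domain repository's azure-pipeline.yml could look like. The repository resource syntax is standard Azure Pipelines; the template path and parameters are hypothetical, not our actual implementation:

```yaml
# Hypothetical sketch of a domain repository's azure-pipeline.yml.
# The repository owns its deployment entry point but reuses the shared
# template from the elt_template repository. Template path and parameter
# names are illustrative assumptions.
resources:
  repositories:
    - repository: elt_template
      type: git
      name: SharedData/elt_template

stages:
  - stage: sfdc_contoso_dwh
    jobs:
      - template: templates/elt-pipeline.yml@elt_template  # shared framework
        parameters:
          domain: sfdc
          variableGroup: euc-play
```

The `@elt_template` suffix resolves the template from the shared repository, so domain pipelines stay thin while the reusable logic lives in one place.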
The Complexity Trade-off
Currently, stage selection is manual from the Azure DevOps UI, and I maintain dependency oversight manually. This might seem primitive, but it’s a deliberate choice about complexity placement.
Complex problems don’t have simple solutions, but we have the choice where to place the complexity of our solution.
We chose the complexity of managing multiple simple repositories and simple pipelines over the complexity of having a single monorepo with sophisticated automation. Adding automation for dependency detection, feature branch support, and automated coordination would create significant overhead in pipeline complexity.
Instead, we use trunk-based deployment, which keeps pipelines simple. No feature branches, no merge automation, no environment coordination complexity. Just simple, plain development with no merge conflicts. This works well as long as the team is aligned and pulls in the same direction.
It’s a small price to pay in the beginning, given that a historically grown monorepo would be hard to separate later on. This is one of those decisions you either make in the beginning or pay a significant toll later when a monorepo is no longer manageable. We designed with the end in mind.
Environment Strategy: Variable Groups Enable Flexibility
Our three-environment setup uses Azure DevOps variable groups for environment-specific configuration, which supports the selective deployment pattern across environments:
euc-play (Development):

```yaml
variables:
  variableGroup: euc-play
  serviceConnection: datacontoso-euc
```

euc-test (Validation):

```yaml
variables:
  variableGroup: euc-test
  serviceConnection: datacontoso-euc-test
```

euc-prod (Production):

```yaml
variables:
  variableGroup: euc-prod
  serviceConnection: datacontoso-euc-prod
```
Each variable group contains environment-specific values for database connections, container tags, feature flags, and resource sizing. The same pipeline templates work across all environments—only the variable values change.
This approach scales well because it maintains environment isolation without duplicating pipeline logic. When I run just the sfdc_contoso_dwh stage for development, it automatically uses the development variable group. The selective deployment pattern works consistently across all environments.
We handle shared component evolution through versioning variables like DBT_CONTAINER_TAG. When I update dbt models, I build a new container image with a new tag, then update the variable groups to reference the new version. Domain pipelines automatically pick up the new version on their next deployment.
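As a sketch of how that versioning flows through a pipeline: a domain pipeline links the environment's variable group and references the tag variable when pulling the image. The registry and image reference below are illustrative assumptions; only DBT_CONTAINER_TAG and the variable group names come from our setup:

```yaml
# Hypothetical sketch: a domain pipeline resolving the dbt container
# version from its environment's variable group. DBT_CONTAINER_TAG is
# bumped in the variable group after each image build, and domain
# pipelines pick up the new tag on their next run.
variables:
  - group: euc-prod   # contains DBT_CONTAINER_TAG, connections, feature flags

steps:
  - script: |
      docker pull contosoacr.azurecr.io/image_dbt:$(DBT_CONTAINER_TAG)
    displayName: Pull pinned dbt container
```

Pinning by explicit tag rather than `latest` is what makes the rollout deliberate: nothing changes in an environment until its variable group is updated.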
When This Approach Works (And When It Doesn’t)
This CI/CD architecture has specific success conditions that are important to understand:
Works Well When:
- Team is aligned and pulls in the same direction (trunk-based development requirement)
- Change frequency allows manual coordination (not deploying multiple times per day)
- Domain boundaries are clear and stable
- Team size allows one person to maintain overview of dependencies
Would Need Evolution If:
- Team grows beyond aligned, single-direction development
- Deployment frequency increases significantly
- Domain boundaries become blurred or require frequent coordination
- Dependency tracking becomes too complex for manual oversight
The key insight is recognizing when you’re operating within the success conditions versus when you need to evolve the approach. In my experience, simple workflows tend to last much longer than we would have assumed in the beginning.
Our approach works because we’re operating within its success conditions. We have team alignment, reasonable change frequency, and clear domain boundaries. When any of these conditions change, we’ll need to evolve the approach accordingly.
Error Handling: Fail Fast and Clear
Our CI/CD architecture follows the same error handling philosophy as our data pipelines: clear error indication over automated recovery. When failures occur, the domain isolation makes root cause identification straightforward, and we can roll back individual domains without affecting others.
We deliberately avoid automatic retry logic and prefer clear failure indication. This approach requires more manual intervention when issues occur, but it provides confidence that we’re never operating with partially updated or inconsistent deployments.
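In Azure Pipelines terms, that philosophy might look like the following sketch — no retry counts, and follow-up steps gated on success so a failure halts the deployment cleanly. The script names are illustrative:

```yaml
# Hypothetical sketch of the fail-fast preference: no automatic retries,
# and downstream steps run only when everything before them succeeded,
# so a failed deployment stops instead of half-applying.
steps:
  - script: ./deploy_domain.sh
    displayName: Deploy domain
    retryCountOnTaskFailure: 0   # the default; stated here to be explicit
  - script: ./smoke_test.sh
    displayName: Smoke test
    condition: succeeded()       # skipped if any earlier step failed
```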
Operational Lessons: Conway’s Law in Practice
Operating this CI/CD architecture has validated the Conway’s Law principles we designed for:
Domain Autonomy: Teams can modify their pipelines independently without coordination. The Salesforce pipeline team doesn’t need to coordinate with the product telemetry team for deployments.
Parallel Development: Multiple developers can work on different domains simultaneously. Repository boundaries eliminate coordination bottlenecks.
Organizational Evolution: When we’re ready to delegate domain ownership to specialized teams, the repository and deployment boundaries are already established.
Complexity Placement: The approach of multiple simple repositories with manual coordination has worked perfectly for our current scale and team alignment.
The domain-driven repository organization enables deployment separation, which enables team autonomy, which enables the organizational evolution we designed for throughout this series.
The Strategic Insight: Complexity Placement as Architecture
CI/CD architecture is ultimately about enabling organizational capability and evolution. Our approach demonstrates that sophisticated orchestration doesn’t require complex automation—sometimes the right complexity placement delivers better outcomes than premature optimization.
Early separation pays dividends: the upfront cost of multiple repositories and coordination patterns is much lower than the exponential cost of separating a grown monorepo later. Simple patterns scale better than complex automation systems, and they’re easier to understand, debug, and modify.
Most importantly, organizing code and deployment boundaries along anticipated organizational boundaries makes future delegation straightforward. It’s Conway’s Law applied strategically rather than accidentally.
The goal was never to create the perfect CI/CD system from day one, but to establish patterns that work well now and can evolve naturally as organizational and technical needs change.
Or as Gall’s Law reminds us: “A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work. You have to start over, beginning with a working simple system.”
Conclusion
We designed a blueprint that can deploy everything, but in practice we deploy individual pieces based on what changed. This selective deployment pattern transforms complex orchestration into simple, isolated operations.
The architecture works because we made an intentional choice about complexity placement: multiple simple repositories with manual coordination instead of single complex automation. This only works when your team is aligned and domain boundaries are clear, but when those conditions exist, it’s far simpler and more maintainable than sophisticated automation.
In my final article, I’ll synthesize the operational lessons learned from running this platform in production and reflect on how the architectural decisions throughout this series have performed under real-world constraints.
