A hybrid Data Platform with Azure Data Factory & dbt - Part 10

Series Overview

This is Part 10 of our series on building a hybrid data platform. If you’re joining mid-series, here are the previous articles:

Introduction

Having built a Data Warehouse is one thing, but putting it to use is another. In spring 2025, I worked with our RevOps lead who whared with me the problem he is having: marketing reported lead volumes one way, sales tracked pipeline another way, and neither could say how effective the entire sales funnel was. Depending on who you talked to, you would get different answers. So he wanted a single unified metrics system that everyone could use.

While so far we have discussed how we build the data warehouse, this article is about putting the data to use. And how much the non-linear nature of B2B SaaS sales would shape the resulting technical implementation. If you are curious, read on.

The Bowtie Model

The bowtie model visualizes the customer journey as two connected funnels. The first funnel narrows—from many leads to fewer opportunities to even fewer closed deals. The second funnel widens—from new customers to expansion revenue to long-term retention. When you draw it out, it looks like a bowtie.

Our focus was the acquisition side: VM1 through VM5.

  • VM1 (Lead): All leads entering the system
  • VM2 (MQL): Marketing Qualified Leads (companies > X employees, excluding existing customers)
  • VM2.1 (SD-Ready): Leads picked up by Sales Development Reps
  • VM3 (SQL): Sales Qualified Leads—converted to opportunities
  • VM4 (SAL): Sales Accepted Leads—past the qualification stage
  • VM5 (Closed Won): Actual wins

Between each stage, we’d measure conversion rates (CR1, CR2, etc.) and velocity metrics (how many days between stages). Standard funnel analytics.

Except B2B sales processes aren’t standard funnels at all.

Why Simple Funnels Fail

The first data quality check revealed something odd: a number of leads (42.7% of SD-Ready leads) had L_Timestamp_SD_Ready dates earlier than their L_Timestamp_Open dates. In a linear funnel, that’s impossible. You can’t be SD-Ready before you’re qualified as an MQL.

My first instinct was to treat this as a data error. But when we dug into the business process, I learned this was actually how sales worked. High-value leads get fast-tracked. An SDR sees a high value company fill out a demo request and immediately marks it SD-Ready, skipping the normal MQL qualification process.

This pattern repeated throughout the data:

  • Leads could enter at any stage, not just at the top
  • Opportunities could be created without any lead at all (partner referrals with OP_RecordTypeName = 'Qualification' and OP_LeadSource = 'Partner')
  • Stages could be jumped or skipped entirely
  • Timestamps didn’t always follow chronological order

The simple funnel view to query the timestamp fields, calculate differences, count conversions could produce technically correct but business-meaningless numbers.

Evolution 1: From Timestamp Fields to Audit Trails

The initial implementation used timestamp fields directly from the SFO_Lead table:

-- Initial approach: Use timestamp fields directly
COALESCE(L_Timestamp_Open, L_Timestamp_Person_Reached, L_Timestamp_Enrichment) AS Timestamp_VM2

This worked for leads that progressed normally, but the fallback logic was arbitrary. Why prioritize L_Timestamp_Open over L_Timestamp_Person_Reached? The business couldn’t give me a clear answer because these fields represented different internal processes, not customer-facing milestones.

Our SalesForce folks suggested looking at SF_LeadHistory. This is Salesforce’s audit trail table that logs every field change. Specifically, when the Company_Type__c field changes from NULL to a value, that’s the moment a lead becomes qualified as an MQL.

-- LeadHistory approach: Track when qualification actually happened
LEFT JOIN (
    SELECT 
        LH_LeadId,
        CAST(MIN(LH_CreatedDate) AS DATE) AS CompanyType_FirstSet_Date
    FROM [contoso_dwh_play].[raw].[SF_LeadHistory]
    WHERE LH_Field = 'Company_Type__c'
    AND (LH_OldValue IS NULL OR LH_OldValue = 'NULL' OR LH_OldValue = '')
    AND LH_NewValue IS NOT NULL AND LH_NewValue != 'NULL' AND LH_NewValue != ''
    GROUP BY LH_LeadId
) CTC ON L.L_SFID = CTC.LH_LeadId

-- Use audit trail timestamp with fallback
COALESCE(CompanyType_FirstSet_Date, CAST(L_Timestamp_Open AS DATE)) AS Timestamp_VM2

This gave us timestamps based on actual business events, not internal system updates. The audit trail became the source of truth.

Evolution 2: DATE Resolution

Early velocity calculations showed leads moving through stages in fractional days: 0.3 days from MQL to SD-Ready, 0.7 days from SD-Ready to SQL. Technically accurate if timestamps were captured at different times of day, but meaningless for business analysis.

But for the funnel metrics we wanted to see velocity in whole days. Fast-tracking meant same-day progression, not sub-24-hour precision.

-- Convert all timestamps to DATE resolution
CAST(MIN(LH_CreatedDate) AS DATE) AS CompanyType_FirstSet_Date

-- Velocity calculations now in whole days
DATEDIFF(DAY, Timestamp_VM1, Timestamp_VM2) AS Velocity_VM1_to_VM2_Days

This also eliminated same-day timestamp ordering issues. If a lead was marked SD-Ready and qualified as MQL on the same calendar day, both got the same DATE value—no artificial ordering conflict.

Evolution 3: Cohort Analysis Function

The views gave us individual lead and opportunity records with their bowtie metrics. But weneeded to compare cohorts: “How did leads from Q1 2025 perform compared to Q2 2025? What’s our lead-to-opportunity conversion rate trending over time?”

A table-valued function let us calculate aggregate metrics for any date range:

CREATE FUNCTION fn_BowtieMetrics_Cohort(
    @StartDate DATE,
    @EndDate DATE
)
RETURNS TABLE
AS
RETURN (
    SELECT 
        @StartDate AS CohortStartDate,
        @EndDate AS CohortEndDate,
        
        -- Volume Metrics
        SUM(VM1_Lead) AS VM1_Leads,
        SUM(VM2_MQL) AS VM2_MQL,
        SUM(VM2_1_SDReady) AS VM2_1_SDReady,
        SUM(VM3_SQL) AS VM3_SQL,
        SUM(VM4_SAL) AS VM4_SAL,
        SUM(VM5_ClosedWon) AS VM5_ClosedWon,
        
        -- Conversion Rates
        CASE WHEN SUM(VM1_Lead) > 0 
             THEN ROUND(CAST(SUM(VM2_MQL) AS FLOAT) / SUM(VM1_Lead) * 100, 1)
             ELSE 0 END AS CR1_LeadToMQL_Pct,
        
        -- Velocity Metrics
        AVG(CASE WHEN Timestamp_VM1 IS NOT NULL AND Timestamp_VM2 IS NOT NULL 
                 THEN DATEDIFF(DAY, Timestamp_VM1, Timestamp_VM2) END) AS Velocity_VM1_to_VM2_Days
    
    FROM vw_BowtieMetrics_Individual
    WHERE CreatedDate BETWEEN @StartDate AND @EndDate
);

-- Usage: Compare Q1 vs Q2 2025
SELECT * FROM fn_BowtieMetrics_Cohort('2025-01-01', '2025-03-31');
SELECT * FROM fn_BowtieMetrics_Cohort('2025-04-01', '2025-06-30');

Key Takeaways

1. Lead-centric AND deal-centric perspectives A converted lead carries timestamps from both its lead phase (VM1-VM2.1) and opportunity phase (VM3-VM5):

-- For converted leads, use lead timestamps for early stages
COALESCE(L.Lead_FirstCreated_Date, CAST(L.L_Timestamp_New AS DATE)) AS Timestamp_VM1,
COALESCE(L.CompanyType_FirstSet_Date, CAST(L.L_Timestamp_Open AS DATE)) AS Timestamp_VM2,

-- Use opportunity timestamps for later stages
CAST(O.OP_Timestamp_Qualification AS DATE) AS Timestamp_VM3,
CAST(O.OP_Timestamp_Closed AS DATE) AS Timestamp_VM5

This preserved the full customer journey rather than forcing an artificial choice between lead or deal perspective.

2. Qualified opportunities without leads Partner-sourced opportunities bypass the normal lead flow entirely:

WHERE (
    -- Include qualified opportunities
    (OP_RecordTypeName = 'Qualification' AND OP_LeadSource = 'Partner')
    OR 
    -- Include opportunities converted from leads
    EXISTS (SELECT 1 FROM SFO_Lead L 
            WHERE L.L_ConvertedOpportunityId = O.OP_SFID 
            AND L.L_IsConverted = 1)
)

These count toward VM3-VM5 metrics but not VM1-VM2, accurately reflecting that they entered the funnel mid-stream.

What we didn’t get (for another few months): Full Funnel Validation.

B2B SaaS sales cycles run 6-9 months. A lead entering in May 2025 might not close until December 2025 or January 2026. I could see leads flowing into VM1 and converting to VM2 MQLs. I could see some early conversions to VM3 SQL opportunities. But the full picture—whether our velocity metrics accurately predicted time-to-close, whether our conversion rate calculations held up across the entire funnel—wouldn’t be visible until later in 2025.

So we build analytics that wouldn’t prove its value for months.

This is the reality of B2B SaaS analytics: longer feedback loops mean you often don’t get to see immediatly whether your work actually helps. You implement the system, validate what you can in the short term, and hope for the best while the data matures.

But the bowtie model gave RevOps a common language across teams. The LeadHistory audit trail gave them timestamps they could trust. The analytics gave them metrics they will be able to act on.