Part 7/15:
Data Streaming and Processing: The CDC events were streamed into Azure Data Lake and processed with a hybrid approach that paired Spark Structured Streaming with storage-based processing, choosing between them according to the complexity of each transformation. Spark reconstituted full lead records from the individual change events using window functions, while Cosmos DB stored precomputed features for fast lookup.
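The reconstitution step can be sketched in plain Python, assuming a hypothetical event shape in which each CDC event carries a `lead_id`, a commit timestamp `ts`, an `op` type, and only the fields that changed. It groups events per lead and replays them in timestamp order, which is roughly what a Spark window partitioned by lead and ordered by timestamp computes. This is an illustrative stand-in, not the pipeline's actual Spark job.

```python
from collections import defaultdict
from typing import Any

# Hypothetical CDC event shape: {"lead_id", "ts", "op", "changes"},
# where "changes" holds only the fields modified by that event.
Event = dict[str, Any]


def reconstitute_leads(events: list[Event]) -> dict[str, dict[str, Any]]:
    """Rebuild the current state of each lead from partial CDC events.

    Comparable to a window over PARTITION BY lead_id ORDER BY ts: events are
    grouped per lead, replayed oldest-first, and later values overwrite
    earlier ones. In PySpark this is often expressed with
    F.last(col, ignorenulls=True).over(Window.partitionBy("lead_id").orderBy("ts")).
    """
    by_lead: dict[str, list[Event]] = defaultdict(list)
    for ev in events:
        by_lead[ev["lead_id"]].append(ev)

    leads: dict[str, dict[str, Any]] = {}
    for lead_id, partition in by_lead.items():
        state: dict[str, Any] = {}
        deleted = False
        for ev in sorted(partition, key=lambda e: e["ts"]):
            if ev["op"] == "DELETE":
                # A delete wipes the accumulated state for this lead.
                state, deleted = {}, True
            else:
                # INSERT/UPDATE: merge the changed fields into the running state.
                state.update(ev["changes"])
                deleted = False
        if not deleted:
            leads[lead_id] = {"lead_id": lead_id, **state}
    return leads


events = [
    {"lead_id": "L1", "ts": 1, "op": "INSERT", "changes": {"status": "New", "company": "Acme"}},
    {"lead_id": "L1", "ts": 3, "op": "UPDATE", "changes": {"status": "Working"}},
    {"lead_id": "L2", "ts": 2, "op": "INSERT", "changes": {"status": "New"}},
    {"lead_id": "L2", "ts": 4, "op": "DELETE", "changes": {}},
]
current = reconstitute_leads(events)
```

Here `L1` ends up with its latest status and its untouched `company` field carried forward, while `L2` is dropped because its last event is a delete.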
Feature Engineering and Scoring: Pipeline features were generated in Databricks, and real-time scoring was handled by Azure Functions triggered by new events. Decoupling feature generation from scoring kept latency under 10 seconds, well within the original 30-second requirement.
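The decoupled scoring path can be illustrated with the body of an event-triggered handler, assuming features are already precomputed and keyed by lead id. In this sketch an in-memory dict stands in for the Cosmos DB feature store, and the logistic-regression weights, feature names, and `handle_event` function are all hypothetical. The Azure Functions trigger binding itself is omitted.

```python
import math
import time
from typing import Any

# Stand-in for the Cosmos DB feature store (hypothetical features).
FEATURE_STORE: dict[str, dict[str, float]] = {
    "L1": {"email_opens": 4.0, "web_visits": 9.0, "days_since_contact": 2.0},
}

# Hypothetical logistic-regression parameters; a real model would be trained.
WEIGHTS = {"email_opens": 0.30, "web_visits": 0.15, "days_since_contact": -0.20}
BIAS = -2.0
LATENCY_BUDGET_S = 10.0


def score_lead(features: dict[str, float]) -> float:
    """Return a conversion probability via a logistic function."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def handle_event(event: dict[str, Any]) -> dict[str, Any]:
    """Look up precomputed features for the lead and score them.

    Only this handler's own processing time is measured here, not the
    end-to-end latency from the source change to the written-back score.
    """
    start = time.monotonic()
    features = FEATURE_STORE.get(event["lead_id"], {})  # missing lead -> all zeros
    probability = score_lead(features)
    elapsed = time.monotonic() - start
    return {
        "lead_id": event["lead_id"],
        "score": round(probability, 3),
        "within_budget": elapsed < LATENCY_BUDGET_S,
    }


result = handle_event({"lead_id": "L1"})
```

Because the heavy feature computation happens upstream, the handler only performs a key lookup and a few arithmetic operations per event, which is one plausible reason such a design stays comfortably inside a seconds-level latency budget.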
Feedback Loop: The predicted lead scores were written back into Salesforce, enabling sales representatives to prioritize high-probability prospects efficiently.
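Writing a score back to Salesforce could go through the REST API's sObject update endpoint (`PATCH /services/data/<version>/sobjects/Lead/<id>`). The sketch below only builds the request. The instance URL, API version, token, lead id, and the custom field `Lead_Score__c` are all assumptions for illustration, not details from the original system.

```python
import json
import urllib.request

# Assumed values: substitute the org's instance URL, API version, and field.
INSTANCE_URL = "https://example.my.salesforce.com"
API_VERSION = "v59.0"
SCORE_FIELD = "Lead_Score__c"  # hypothetical custom field on the Lead object


def build_score_update(lead_id: str, score: float, token: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that sets a Lead's score."""
    url = f"{INSTANCE_URL}/services/data/{API_VERSION}/sobjects/Lead/{lead_id}"
    body = json.dumps({SCORE_FIELD: score}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_score_update("00Q000000000001AAA", 0.537, "hypothetical-token")
```

Passing `req` to `urllib.request.urlopen` would perform the update; on success Salesforce responds with `204 No Content`. Batching many scores per call (for example via the Composite API) is an alternative when event volume is high.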