Expert in designing and deploying robust, scalable ETL pipelines for structured and semi-structured data ingestion using Azure Databricks and Spark.
Medallion Architecture Mastery
Proven ability to implement enterprise-grade Medallion architecture (Bronze → Silver → Gold) for multi-format data ingestion and transformation.
Enterprise-Grade Solutions
Experienced in deploying secure, high-performance data solutions on Azure Databricks, ensuring data quality, governance, and reliability.
Work Experience: Driving Data Innovation
Associate Data Engineer
V4C.ai | Pune, India
Nov 2025 - Feb 2026
Designed and deployed scalable ETL pipelines for structured and semi-structured data ingestion, leveraging Azure Databricks and Spark.
Built feature-ready and analytics-ready datasets through data modeling, transformation, and validation frameworks.
Implemented distributed data processing workflows, efficiently handling large-scale datasets with high performance.
Applied data quality checks, monitoring, and performance tuning to ensure reliability and SLA adherence.
Managed cloud-based deployments, ensuring optimal scalability, fault tolerance, and production stability for critical data operations.
Flagship Project: E-Commerce Analytics Platform
Architected and implemented an enterprise-grade e-commerce analytics platform on Azure Databricks.
1. Medallion Data Platform
Designed Bronze → Silver → Gold architecture on Azure Databricks using Delta Lake, Unity Catalog (3-tier namespace), external locations, and Service Principal authentication.
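A minimal sketch of how such a Unity Catalog setup might be declared, using hypothetical catalog, schema, credential, and storage-account names:

```sql
-- 3-tier namespace: catalog.schema.table (names here are illustrative).
CREATE CATALOG IF NOT EXISTS ecommerce;
CREATE SCHEMA IF NOT EXISTS ecommerce.bronze;

-- External location over ADLS Gen2; the storage credential wraps the
-- Service Principal that Databricks uses to reach the storage account.
CREATE EXTERNAL LOCATION IF NOT EXISTS raw_landing
  URL 'abfss://raw@<storage-account>.dfs.core.windows.net/landing'
  WITH (STORAGE CREDENTIAL sp_credential);

-- Bronze table registered against the external location.
CREATE TABLE IF NOT EXISTS ecommerce.bronze.orders_raw
  LOCATION 'abfss://raw@<storage-account>.dfs.core.windows.net/landing/orders';
```

Keeping storage access behind an external location plus a Service Principal-backed credential lets Unity Catalog govern all reads and writes centrally instead of scattering access keys across notebooks.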
2. Real-time DLT & Streaming
Built Delta Live Tables (DLT) & Structured Streaming pipelines with schema evolution, data quality constraints, checkpointing, and fault tolerance for near real-time processing.
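A sketch of what a DLT pipeline with quality constraints can look like in SQL, assuming hypothetical table names, columns, and landing paths:

```sql
-- Streaming bronze ingestion with Auto Loader (cloud_files); path is illustrative.
CREATE OR REFRESH STREAMING TABLE orders_bronze
AS SELECT *
FROM cloud_files('/Volumes/ecommerce/landing/orders', 'json',
                 map('cloudFiles.inferColumnTypes', 'true'));

-- Silver table with data quality expectations; violating rows are dropped
-- and the violations are surfaced in the pipeline's event log.
CREATE OR REFRESH STREAMING TABLE orders_silver (
  CONSTRAINT valid_order_id  EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW,
  CONSTRAINT positive_amount EXPECT (amount > 0)           ON VIOLATION DROP ROW
)
AS SELECT order_id, customer_id,
          CAST(amount AS DECIMAL(10, 2)) AS amount,
          order_ts
FROM STREAM(orders_bronze);
```

DLT manages checkpointing and retries for the streaming tables itself, which is what provides the fault tolerance mentioned above without hand-written checkpoint handling.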
3. Automated CI/CD with DABs
Implemented CI/CD using Databricks Asset Bundles (DABs) with GitHub integration, enabling automated multi-environment (dev/prod) deployment of notebooks, jobs, and pipelines.
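A trimmed-down example of what a `databricks.yml` bundle definition for such a setup might look like; the bundle name, workspace hosts, and notebook path are placeholders:

```yaml
# databricks.yml — illustrative Databricks Asset Bundle configuration.
bundle:
  name: ecommerce-analytics

targets:
  dev:
    mode: development
    workspace:
      host: https://adb-dev.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://adb-prod.azuredatabricks.net

resources:
  jobs:
    daily_medallion_run:
      name: daily-medallion-run
      tasks:
        - task_key: silver_transform
          notebook_task:
            notebook_path: ../notebooks/silver_transform.py
```

A GitHub Actions workflow can then run `databricks bundle deploy -t dev` on pull requests and `databricks bundle deploy -t prod` on merges to the main branch, giving the multi-environment deployment described above.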
4. Performance & Governance
Enforced production-level governance (Row/Column-Level Security, Dynamic Data Masking) and performance optimization (OPTIMIZE, VACUUM, Time Travel), improving query performance by ~40%.
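The governance and maintenance pieces can be sketched in Unity Catalog SQL as follows; table names, group names, and the version number are hypothetical:

```sql
-- Row-level security: admins see everything, others only the EMEA region.
CREATE OR REPLACE FUNCTION region_filter(region STRING)
RETURN IS_ACCOUNT_GROUP_MEMBER('admins') OR region = 'EMEA';
ALTER TABLE gold.sales SET ROW FILTER region_filter ON (region);

-- Dynamic data masking for a PII column.
CREATE OR REPLACE FUNCTION mask_email(email STRING)
RETURN CASE WHEN IS_ACCOUNT_GROUP_MEMBER('pii_readers') THEN email ELSE '***' END;
ALTER TABLE gold.customers ALTER COLUMN email SET MASK mask_email;

-- Maintenance: compact small files, then purge files past the retention window.
OPTIMIZE gold.sales;
VACUUM gold.sales RETAIN 168 HOURS;

-- Time Travel: query the table as of an earlier version (version is illustrative).
SELECT * FROM gold.sales VERSION AS OF 42;
```

Note the interaction between the last two commands: VACUUM removes the old data files that Time Travel depends on, so the retention window bounds how far back versioned queries can reach.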
Project: Stock Market Price Prediction with News Sentiment
Developed a system to predict short-term stock price movements by integrating real-time market data with news sentiment analysis.
Built a real-time data pipeline using web scraping & APIs (News API, Yahoo Finance) to collect stock news and historical price data.
Applied Natural Language Processing (NLP) to extract sentiment from financial headlines, combining it with time-series market data for enriched analysis.
Trained and fine-tuned ML models to forecast short-term stock price movements based on sentiment trends.
Deployed the solution on AWS SageMaker, EC2, and S3 for scalable, low-latency cloud predictions.
Designed real-time Power BI dashboards to visualize sentiment scores, stock performance, and prediction outputs.
Project: Databricks Medallion Data Pipeline
1. Enterprise-Grade Architecture
Built a robust Medallion architecture (Bronze, Silver, Gold) for multi-format data ingestion and transformation using Databricks and Delta Lake, ensuring data quality and scalability.
2. Scalable Ingestion & Processing
Implemented Auto Loader, COPY INTO, Delta Live Tables, and MERGE INTO for scalable ingestion and incremental processing of diverse datasets.
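The batch side of this ingestion pattern can be sketched as follows, with illustrative table names and storage paths:

```sql
-- Idempotent batch load from cloud storage into an existing bronze table;
-- COPY INTO tracks which files have already been loaded.
COPY INTO bronze.orders_raw
FROM 'abfss://raw@<storage-account>.dfs.core.windows.net/orders'
FILEFORMAT = JSON;

-- Incremental upsert from bronze into silver, matching on the business key.
MERGE INTO silver.orders AS t
USING bronze.orders_raw AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

COPY INTO suits one-off or scheduled batch loads, while Auto Loader handles continuous file discovery; MERGE INTO then keeps the silver layer incrementally up to date instead of rewriting it wholesale.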
3. Advanced Governance & Optimization
Applied robust governance via Unity Catalog (RLS, CLS, Data Masking) and optimized performance using Liquid Clustering, partitioning, and Z-ordering, with thorough benchmarking analysis.
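The two layout strategies named above differ in where they are declared; a minimal sketch with hypothetical tables and columns:

```sql
-- Liquid Clustering: declared at table creation, no fixed partition columns;
-- the clustering keys can be changed later without rewriting the table schema.
CREATE TABLE gold.events (
  event_id    BIGINT,
  customer_id BIGINT,
  event_ts    TIMESTAMP,
  payload     STRING
) CLUSTER BY (customer_id, event_ts);

-- Z-ordering: applied after the fact to an existing (often partitioned) table,
-- co-locating related values within the data files to reduce files scanned.
OPTIMIZE gold.events_partitioned ZORDER BY (customer_id);
```

Benchmarking typically compares files pruned and scan time for representative filter queries (e.g., by `customer_id`) before and after applying each layout.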
4. Automated CI/CD
Automated deployments using Git integration, Databricks Asset Bundles (DABs), and GitHub Actions CI/CD, streamlining development and deployment cycles.