Ojas Yeole: Data Engineering Visionary
Transforming complex data into intelligent, scalable solutions for tomorrow's challenges.
Summary: Architecting the Future of Data
Scalable ETL Pipelines
Expert in designing and deploying robust, scalable ETL pipelines for structured and semi-structured data ingestion using Azure Databricks and Spark.
Medallion Architecture Mastery
Proven ability to implement enterprise-grade Medallion architecture (Bronze → Silver → Gold) for multi-format data ingestion and transformation.
Enterprise-Grade Solutions
Experienced in deploying secure, high-performance data solutions on Azure Databricks, ensuring data quality, governance, and reliability.
Work Experience: Driving Data Innovation
Associate Data Engineer
V4C.ai | Pune, India
Nov 2025 - Feb 2026
  • Designed and deployed scalable ETL pipelines for structured and semi-structured data ingestion, leveraging Azure Databricks and Spark.
  • Built feature-ready and analytics-ready datasets through meticulous data modeling, transformation, and validation frameworks.
  • Implemented distributed data processing workflows, efficiently handling large-scale datasets with high performance.
  • Applied stringent data quality checks, robust monitoring mechanisms, and performance tuning to ensure reliability and strict SLA adherence.
  • Managed cloud-based deployments, ensuring optimal scalability, fault tolerance, and production stability for critical data operations.
Flagship Project: E-Commerce Analytics Platform
Architected and implemented a comprehensive enterprise-grade analytics platform.
1. Medallion Data Platform
Designed Bronze → Silver → Gold architecture on Azure Databricks using Delta Lake, Unity Catalog (3-tier namespace), external locations, and Service Principal authentication.
2. Real-time DLT & Streaming
Built Delta Live Tables (DLT) & Structured Streaming pipelines with schema evolution, data quality constraints, checkpointing, and fault tolerance for near real-time processing.
3. Automated CI/CD with DABs
Implemented CI/CD using Databricks Asset Bundles (DABs) with GitHub integration, enabling automated multi-environment (dev/prod) deployment of notebooks, jobs, and pipelines.
4. Performance & Governance
Enforced production-level governance (Row/Column-Level Security, Dynamic Data Masking) and performance optimization (OPTIMIZE, VACUUM, Time Travel), improving query performance by ~40%.
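On Databricks, governance controls such as column masking are attached declaratively in Unity Catalog; as a rough illustration of the underlying logic, here is a minimal plain-Python sketch of a dynamic data mask (the function, group name, and masking format are illustrative assumptions, not the platform's actual API):

```python
def mask_email(email: str, caller_groups: set[str]) -> str:
    """Return the raw value for privileged callers, a masked form otherwise.

    Mirrors the shape of a Unity Catalog column mask, where a SQL UDF
    checks group membership before revealing the column value.
    """
    if "pii_readers" in caller_groups:  # illustrative group name
        return email
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain

print(mask_email("ojas@example.com", {"pii_readers"}))  # raw value
print(mask_email("ojas@example.com", {"analysts"}))     # masked value
```

In Unity Catalog itself this logic lives in a SQL UDF applied via `ALTER TABLE ... ALTER COLUMN ... SET MASK`, so the mask is enforced for every query path rather than in application code.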
Project: Stock Market Price Prediction with News Sentiment
Developed a sophisticated system to predict stock price movements by integrating real-time market data with news sentiment analysis.
  • Built a real-time data pipeline using web scraping & APIs (News API, Yahoo Finance) to collect stock news and historical price data.
  • Applied advanced Natural Language Processing (NLP) to extract sentiment from financial headlines, combining it with time-series market data for enriched analysis.
  • Trained and fine-tuned ML models to forecast short-term stock price movements based on sentiment trends.
  • Deployed the solution on AWS SageMaker, EC2, and S3 for scalable, low-latency cloud predictions.
  • Designed real-time Power BI dashboards to visualize sentiment scores, stock performance, and prediction outputs.
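The enrichment step above can be sketched in pandas. The real pipeline scored headlines with NLP models over News API and Yahoo Finance data; this simplified stand-in uses a tiny hand-made lexicon and toy data purely to show how per-day sentiment is joined with price history:

```python
import pandas as pd

# Tiny illustrative lexicon (assumption); the real system used NLP models.
POSITIVE = {"beats", "surges", "record", "growth"}
NEGATIVE = {"misses", "falls", "lawsuit", "recall"}

def headline_score(headline: str) -> int:
    """Count positive minus negative words in a headline."""
    words = headline.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

headlines = pd.DataFrame({
    "date": ["2024-01-02", "2024-01-03"],
    "headline": ["ACME beats earnings, surges on record growth",
                 "ACME misses guidance as stock falls"],
})
prices = pd.DataFrame({"date": ["2024-01-02", "2024-01-03"],
                       "close": [101.2, 98.7]})

# Score each headline, then join sentiment with the price series by date.
headlines["sentiment"] = headlines["headline"].map(headline_score)
enriched = headlines.merge(prices, on="date")
print(enriched[["date", "sentiment", "close"]])
```

The merged frame is the kind of enriched, time-aligned table a forecasting model or Power BI dashboard can consume directly.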
Project: Databricks Medallion Data Pipeline
1. Enterprise-Grade Architecture
Built a robust Medallion architecture (Bronze, Silver, Gold) for multi-format data ingestion and transformation using Databricks and Delta Lake, ensuring data quality and scalability.
2. Scalable Ingestion & Processing
Implemented Auto Loader, COPY INTO, Delta Live Tables, and MERGE INTO for scalable ingestion and incremental processing of diverse datasets.
3. Advanced Governance & Optimization
Applied robust governance via Unity Catalog (RLS, CLS, Data Masking) and optimized performance using Liquid Clustering, partitioning, and Z-ordering, with thorough benchmarking analysis.
4. Automated CI/CD
Automated deployments using Git integration, Databricks Asset Bundles (DABs), and GitHub Actions CI/CD, streamlining development and deployment cycles.
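The Bronze → Silver → Gold layering above runs on Databricks with Delta Lake; as a conceptual sketch only, the same idea can be shown in a few lines of pandas on toy data (column names and values are invented for illustration):

```python
import pandas as pd

# Bronze: raw data landed as-is (in the real pipeline, Auto Loader / COPY INTO
# ingests files into a Delta table) -- raw strings, duplicates, nulls included.
bronze = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "amount": ["10.5", "20.0", "20.0", None],
    "country": ["IN", "in", "in", "US"],
})

# Silver: cleaned, typed, de-duplicated records.
silver = (
    bronze.drop_duplicates(subset="order_id")
          .dropna(subset=["amount"])
          .assign(amount=lambda d: d["amount"].astype(float),
                  country=lambda d: d["country"].str.upper())
)

# Gold: aggregated, analytics-ready view for BI consumption.
gold = silver.groupby("country", as_index=False)["amount"].sum()
print(gold)
```

In production the same layering is expressed as Delta tables with MERGE INTO handling incremental upserts between layers, which this batch sketch deliberately omits.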
Project: End-to-End Telecom Customer Churn Prediction
Developed a comprehensive solution to predict customer churn in the telecom sector, providing actionable insights for retention.
  • Cleaned and transformed telecom customer data using Python and SQL, creating ML-ready datasets and identifying key churn drivers.
  • Conducted extensive Exploratory Data Analysis (EDA) and correlation analysis to pinpoint top churn indicators.
  • Built an interpretable PCA + Logistic Regression model, achieving 94% test accuracy and a 0.95+ ROC-AUC score.
  • Designed interactive Power BI dashboards to visualize churn trends, key performance indicators (KPIs), and customer segmentation insights.
  • Generated actionable retention strategies by analyzing tenure, contract type, and service usage patterns.
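The PCA + Logistic Regression approach can be sketched as a scikit-learn pipeline. The data here is synthetic (the reported 94% accuracy came from the actual telecom dataset, which is not reproduced):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for churn features (tenure, charges, usage patterns, ...).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Scale -> project onto principal components -> interpretable linear classifier.
model = make_pipeline(StandardScaler(),
                      PCA(n_components=10),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.2f}")
```

Keeping the classifier linear over a reduced component space is what makes the model interpretable: component loadings can be traced back to the original churn drivers identified during EDA.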
Education & Certifications
A foundation of academic excellence coupled with industry-recognized certifications.
PG Diploma, Big Data Analytics
CDAC Mumbai, Navi Mumbai, India
Feb 2025 - Aug 2025
B.E., Electronics & Telecommunication
K. K. Wagh Institute, Nashik
Jun 2024
Databricks Certified Data Engineer Associate
Databricks
Jan 2026
AWS Academy Cloud Foundations
AWS Academy
Apr 2025
Technical Expertise & Tools
Proficient across a wide spectrum of programming languages, frameworks, and cloud platforms essential for modern data engineering.
Programming Languages
  • Python
  • SQL
Libraries & Frameworks
  • Pandas, NumPy, Scikit-learn
  • TensorFlow, Seaborn, Matplotlib
  • PySpark, Databricks
Data Engineering & Cloud
  • Azure (Data Factory, Databricks, ADLS Gen2)
  • AWS (EC2, S3, Lambda, SageMaker, IAM)
  • Apache Spark, Azure Databricks
  • Medallion Architecture, ETL Pipelines, Data Modeling
Data Visualization
  • Power BI
  • Tableau
  • Jupyter Notebook
Governance & DevOps Acumen
Unity Catalog
Implementing robust data governance and access control for secure and compliant data environments.
RLS/CLS & Data Masking
Ensuring fine-grained security with Row/Column-Level Security and Dynamic Data Masking.
Git & GitHub Actions
Leveraging version control and CI/CD for automated, efficient, and reliable code deployments.
Databricks Asset Bundles (DABs)
Streamlining Databricks asset management and deployment across multiple environments.