Savita Biradar
Professional Summary
Innovative Data Engineer with 5+ years of experience in the Big Data ecosystem and a recent focus on generative AI technologies. Adept at automating ETL processes, optimizing SQL queries, and designing scalable data solutions. Expertise includes integrating advanced AI models into data pipelines to enhance predictive analytics and support data-driven decision-making. Proficient in cloud platforms and certified in SAS and Java Full Stack. Combines a Master’s in Computer Science and a Post Graduate Diploma in Data Science and Analytics with a passion for applying emerging AI tools to deliver impactful data solutions.
Core Skills
Work History
Photon — Generative AI Engineer
Alpharetta · April 2024 - Present
Project: IDG summary page powered by GenAI for hospice care. Managed data migration and ETL development, and implemented AI-driven summaries while handling sensitive patient PII.
- Migrated data from SQL Server to PostgreSQL in Azure, ensuring data integrity and consistency.
- Developed and deployed ETL processes in Azure Function Apps with time-based triggers.
- Preprocessed staging data before integrating it into the IDG schema to improve data readiness.
- Conducted data modeling to optimize database performance and scalability.
- Built the IDG summary page using OpenAI GPT-4o for AI-generated clinical summaries.
- Deployed models using Azure DevOps pipelines to automate and standardize deployments.
- Added edit/save functionality to capture user feedback for reinforcement learning and fine-tuning.
Environment: GenAI, Azure Function Apps, Azure Event Hub, PostgreSQL, SQL Server, ETL, Data Modeling, OpenAI GPT-4o, Azure DevOps
Capgemini — Generative AI Data Engineer
Alpharetta · June 2021 - March 2024
Led platform decisions, managed Dataiku DSS environments, and designed NLP and generative AI solutions across multiple projects.
- Managed Dataiku DSS setup, maintenance, upgrades, disk and memory tuning, and user permissions.
- Coordinated releases, backups, and migration of DSS instances; automated workflows and code environments.
- Built scalable NLP pipelines, training workflows, and low-latency inference engines (PyTorch/TensorFlow).
- Implemented model fine-tuning, distillation, pruning, and CUDA optimizations to reduce inference latency.
- Automated wealth-management ETL flows using Dataiku and optimized SQL for production readiness.
- Developed ESG data pipelines on AWS using PySpark (EMR), Glue, Lambda, and Redshift.
Environment: Linux, Terminal, Impala, Spark SQL, Python, Dataiku, HiveQL, PyTorch, TensorFlow, TensorFlow Serving, ONNX Runtime, AWS, EMR, S3, Glue
Vizion Technologies — Data Engineer
Ohio, USA · March 2021 - June 2021
Worked on cloud migrations and data lake transformations, including dbt orchestration and Hadoop-to-Azure data movement.
- Configured dbt to connect source (IBM Cloud) and target (Azure Data Lake) and developed dbt models.
- Designed data transformations to meet business rules and implemented migration workflows.
- Analyzed large data sets using Hive, created tables on Hadoop data lakes, and wrote MapReduce/Hive jobs.
Environment: dbt, SQL, Azure Data Lake, Hadoop, HDFS, Hive, PySpark, AWS, EMR, S3, Lambda
Northeastern Illinois University — Graduate Research Assistant
Chicago, IL · Jan 2020 - Dec 2020
Developed a dialogue system to communicate breast cancer information to Latinx patients using NLP.
- Implemented NLP experiments using BERT and cosine similarity models and compared results to human judgments.
- Built backend services with Flask and used TensorFlow and Transformers for model development.
- Used the SNLI corpus for semantic evaluation and applied Python, NumPy, and ML toolchains.
Environment: Python, TensorFlow, Flask, ML, NLP, Anaconda, R
IGSRS-Alchemus — AI Engineer
Mumbai, India · Jan 2018 - Nov 2019
Integrated AI into talent acquisition systems and built ML/NLP models and APIs.
- Developed ML models (LDA, Naive Bayes, PCA) and applied data cleaning, quality checks, and feature engineering.
- Built REST APIs, API tests, and automation scripts using Python and Selenium.
- Participated in product meetings and provided performance feedback and improvement suggestions.
Environment: Python, scikit-learn, Gensim, Data Studio, NLP, MongoDB, Kafka
IAastha Technologies — Business Intelligence
Mumbai, India · Jan 2017 - Apr 2018
Enhanced search relevance using NLP, trained spaCy NER models, and deployed them via Flask APIs.
- Created training data for NER tagging and trained spaCy models; deployed models with Flask APIs.
- Built Python web crawlers with Scrapy and created Apache NiFi workflows to run them; pushed processed data to Elasticsearch.
- Built dashboards and reports in Google Data Studio to demonstrate results and quick wins.
Environment: Python, spaCy, NLTK, Scrapy, Apache NiFi, Google Data Studio
Education
- Northeastern Illinois University — M.S., Computer Science · Jan 2020 - Dec 2020
- iFEEL — Post Graduate Diploma, Data Science and Analytics · Jan 2019 - Dec 2019
- D.Y. Patil University — M.Tech, Integrated Biotechnology (Bioinformatics) · Jul 2013 - Mar 2018