Parth Sanjay Sapre

📞 814‑218‑7384  •  📧 sapreparthu@gmail.com
🔗 LinkedIn

Professional Summary

Led the migration of healthcare risk‑scoring pipelines to AWS using PySpark, EMR, Glue and S3, optimising large‑scale risk and anomaly detection workloads. Owned 75% of the core migration runtime, improved performance by 45%, processed 2.8 TB daily and cut monthly costs by $12K. Designed AI‑driven RAG solutions and predictive models integrated with ML workflows. Built distributed data pipelines for LLM dataset preparation that processed over 1.2 billion tokens. Engineered ETL orchestration with AWS Step Functions and Airflow, tripling concurrency and reducing recovery time by 70%.

Work Experience

Python PySpark Engineer

June 2022 – Present | IBM Contractor – Technology Square Inc, Baltimore MD
  • Led migration of healthcare risk‑scoring pipelines to AWS using PySpark, EMR and S3, improving runtime by 45%, handling 2.8 TB/day and saving $12K/month.
  • Designed AI‑driven RAG solutions and predictive models integrated with ML workflows, producing reproducible datasets for analytics and fine‑tuning.
  • Engineered distributed data pipelines for LLM dataset prep (tokenisation, embeddings, preprocessing) processing 1.2 billion tokens with 60% higher throughput.
  • Built ETL orchestration with AWS Step Functions & Airflow, tripling concurrency, cutting recovery time by 70% and reducing failures by 85%.
  • Applied OOP, modular design and algorithmic optimisation across 12K LOC, increasing maintainability by 40% and code reuse by 55%.
  • Ensured data validation & compliance (HIPAA, FHIR, HL7) with 99.2% validation coverage and zero compliance issues.

Software Engineer

September 2021 – May 2022 | Nest Technology, Sterling VA
  • Built ML pipelines for fraud and customer risk scoring using Python & Scikit‑learn, delivering 35% better detection accuracy and 50% faster inference.
  • Designed ETL monitoring & validation, reducing incidents by 80% and improving coverage by 95%.
  • Partnered with product and data teams to automate reporting, saving 150+ hours/month and boosting adoption by 40%.
  • Developed Airflow ingestion workflows, cutting execution time by 43% with 99.5% uptime.

Research Assistant

January 2020 – January 2021 | Gannon University, Erie PA
  • Preprocessed over 12 million NLP records, increasing model accuracy by 18% and decreasing prep time by 30%.
  • Built a blockchain‑based healthcare data exchange achieving 100% FHIR compliance and a throughput of 10K TPS.
  • Conducted research on secure data sharing and anomaly detection using blockchain frameworks.

Skills

Cloud & Big Data

AWS (Redshift, S3, Lambda, EMR, Step Functions, API Gateway, IAM, CloudFormation), Apache Spark (PySpark), Airflow, distributed systems, ETL pipeline automation and data warehousing.

AI/ML & LLM

LLM data preparation (tokenisation, embeddings, fine‑tuning datasets), NLP, predictive analytics, RAG models, Generative AI (Bedrock), Scikit‑learn, model deployment, experiment tracking and model evaluation (AUC, precision, recall).

Software Engineering

Object‑oriented programming, design patterns, algorithms, data structures, concurrency and system design.

Healthcare & Compliance

CMS RAF risk scoring, HIPAA, FHIR, HL7 and healthcare claims & payer data frameworks.

Observability & Data Quality

CloudWatch, Splunk, SES, SNS, data validation frameworks, schema design and data dictionaries.

Programming & Tools

Python, PySpark and SQL; familiarity with Scala and Java; QuickSight for dashboards and reporting; Airflow and Step Functions for ETL orchestration.

Projects

Credit Risk Classifier

Developed a supervised machine‑learning model using Python and Scikit‑learn to classify high‑risk transactions from historical payment data, achieving 81% recall on fraudulent transactions.

Cyber Threat Pattern Detection

Implemented a Spark‑based anomaly detection pipeline on AWS to analyse log data and identify suspicious access patterns, improving detection speed and scalability.

Education & Certifications

Computer & Information Science (Data Science)

Gannon University, Erie PA | 2021

GPA: 3.94, Scholastic Excellence Award.

Computer Engineering

Dharmsinh Desai University, Nadiad, India | 2019

GPA: 3.64.

Computer Engineering

Maharaja Sayajirao University, Vadodara, India | 2016

GPA: 3.78.

Certifications

Machine Learning Specialisation (Coursera, Andrew Ng); AWS Certified Cloud Practitioner; AWS Certified Data Analytics; PySpark; Python; Natural Language Processing; Machine Learning; Pattern Recognition; Advanced SQL; AWS; Problem Solving.