Data Scientist · Chicago

Sri Hari Sivashanmugam

Data Scientist and Architect specializing in scalable data platforms, machine learning, and cloud systems. Driving public health data modernization in Chicago, serving over 2.7M citizens through interoperable, analytics-ready solutions built on Azure and big data technologies.

4+ Years of Experience
2.7M+ Citizens Served
15+ Data Systems Integrated

Who I Am

High-impact Data Scientist and Data Architect with deep expertise in designing scalable, production-grade data models that drive decision-making across public sector and healthcare environments. Specializing in transforming fragmented, high-volume datasets into centralized, interoperable platforms that support analytics, governance, and regulatory compliance.

Currently at the City of Chicago, leading data modernization efforts to consolidate multi-agency sources into unified, longitudinal models that enable real-time insights and strategic planning. Experienced in USCDI standards, HL7/FHIR integration, and building data infrastructure that scales securely across industries.

Core Competencies: Data Architecture • Cloud Solutions • Scalable Data Pipelines • Machine Learning • Metadata Governance • Standards-Driven Data Modeling • Agile Cross-Functional Collaboration

Academic Background

Master of Data Science

Illinois Institute of Technology

GPA: 3.7/4.0 • 2023

Specialized in Big Data Technologies, Neural Networks, Time Series Analysis, Data Mining, Statistical Learning, and Cloud Computing

Bachelor of Technology

Amrita Vishwa Vidyapeetham

Computer Science • GPA: 8.95/10.0 • 2021

Focus on Neural Networks, Machine Learning, Cloud Computing, Database Management, and Software Engineering

When I'm Not Coding

Coding with music

Developer

Exploring new technologies and building cool things with them. Attending hackathons. Code is my canvas, data is my art.

Hiking adventure

Mountain Explorer

Finding clarity and inspiration in nature. Every hike teaches patience, perspective, and perseverance.

Running outdoors

Fitness Enthusiast

Running towards goals, both personal and professional. Maintaining balance through an active lifestyle.

Experience

2024 – Present

Chicago Department of Public Health

Data Scientist

Driving data modernization and interoperability efforts serving 2.7M+ citizens through scalable Azure-based data systems.

  • Designed a longitudinal health data model using a Medallion Data Lakehouse Architecture, standardizing HL7, FHIR, and CCDA formats to enable seamless cross-agency data interoperability and analytics.
  • Built and deployed an Entity Resolution and Master Data Management system with ~99% accuracy.
  • Orchestrated ETL pipelines using Spark and Databricks to stream and process 100M+ records daily, ensuring near real-time data availability and system scalability.
  • Developed and maintained 50+ Tableau dashboards visualizing community health trends and program KPIs across Chicago.
  • Implemented data governance frameworks using Azure Purview and Active Directory to strengthen transparency and compliance.
Technologies Used: Python • Apache Spark • Databricks • Azure Data Factory • Azure Synapse • Azure SQL • Medallion Lakehouse • HL7 • FHIR • CCDA • Tableau • Azure Purview • Active Directory • Entity Resolution / MDM
2023 – 2024

Intellihot

Data Scientist

Built predictive analytics and IoT-driven insights to optimize industrial heating systems and minimize downtime.

  • Developed asynchronous APIs and data pipelines that increased data accessibility by 97%, improving decision-making speed.
  • Applied time-series modeling for predictive maintenance, reducing system downtime by 15% across deployed units.
  • Designed Power BI dashboards automating KPI monitoring, cutting manual reporting time by 95%.
  • Integrated machine learning models into operational workflows, enhancing system reliability by 12%.
Technologies Used: Python • Scikit-learn • Power BI • Async APIs • ThingWorx • Azure SQL • CData Connectors • Time-series Modeling
2021 – 2023

Illinois Institute of Technology

Graduate Teaching Assistant

Mentored 200+ graduate students in Big Data Technologies and scalable data engineering practices using Hadoop, Spark, and Kafka.

  • Guided 200+ students through hands-on projects in data pipelines, distributed processing, and machine learning.
  • Improved student outcomes by 30% through workshops on data modeling, big data best practices, and cloud scalability.
Technologies Used: Hadoop • Apache Spark • Kafka • Python • SQL • AWS • Azure • Tableau
Editor-in-Chief of TechNews (College Newspaper)

Led editorial operations for the college newspaper, overseeing content strategy, publication workflow, and a cross-functional team of writers and editors.

  • Managed a team of 30+ editors and 150+ writers, ensuring consistent weekly publication with a 98% on-time rate.
  • Conducted training sessions that improved writing quality by 25% and enhanced content clarity and engagement.
  • Implemented editorial process automation and streamlined review cycles, reducing turnaround time by 40%.
  • Partnered with the business team to launch marketing campaigns that increased readership engagement by 40% and ad revenue by 15%.
Technologies Used: WordPress • Google Workspace • Canva • Trello • SEO Analytics
2021

Caterpillar

Data Scientist

Developed marketing analytics and customer retention models to drive sales and optimize business performance.

  • Built a Shapley-based marketing attribution model that boosted sales by 8% and increased monthly repeat purchases by 50K+.
  • Optimized a propensity model by reducing features from 90+ to 40+, increasing the ROC-AUC score by 15%.
  • Designed a data-driven customer insights dashboard supporting 45K+ daily users and improving engagement by 30%.
Technologies Used: Python • Scikit-learn • SQL • GCP • Power BI • Data Studio • Shapley Attribution

Projects

01

Workplace Safety Monitoring

Real-time computer vision system achieving 99% mask detection and 93% pose accuracy, reducing violations by 15%.

02

Google Ads Optimization

CatBoost models increasing impressions by 20%, traffic by 10%, and ROI by 15%.

03

Custom Database System

Python DBMS with B+ tree indexing, reducing query times by 30%.

04

Customer Churn Prediction

ML model with 92% accuracy enabling retention strategies.

05

Fraud Detection System

Real-time anomaly detection reducing false positives by 20%.

06

Sales Forecasting

ARIMA and LSTM models predicting sales with 95% accuracy.

Skills & Technologies

Programming & Analytics

Python
R
SQL
NoSQL
Git/dbt/MLflow

Data Visualization

Tableau
Power BI
Streamlit
Plotly

Machine Learning & AI

Scikit-learn
TensorFlow
Keras
Deep Learning
Feature Engineering
A/B Testing

Big Data Technologies

Apache Spark
Apache Kafka
Hadoop
Hive
DynamoDB

Cloud Platforms

Microsoft Azure
AWS
Google Cloud
Databricks
Snowflake

Data Engineering

ETL Pipelines
Data Lakehouse
Azure Synapse
Azure Data Factory

Business Tools

Salesforce
Google Analytics 4
Microsoft Excel
ThingWorx

Standards & Frameworks

HL7/FHIR
USCDI Standards
Data Governance
Agile Methodology

Let's Connect

Open to new opportunities and collaborations
