Hi 👋, I'm Shawn!
Nice to meet you.

Senior Data Solution Architect with 11+ years’ experience designing and optimizing scalable data solutions. Expert in ETL pipelines, big data processing, and cloud architectures (Talend, NiFi, Airflow, Informatica) across AWS, Azure, and GCP. Skilled in data warehousing (Star, Snowflake, Data Vault) and big data tools (Hadoop, Spark, Kafka, HDFS) for real-time streaming. Strong in data governance, ensuring quality, metadata management, and compliance (HIPAA, GDPR). Experienced in deploying ML models (Scikit-learn, TensorFlow, PyTorch) via Databricks. Proficient in data visualization (Tableau, Power BI, QuickSight, Plotly) to deliver insights. Adept in DevOps practices with Docker, Kubernetes, and CI/CD pipelines for efficient delivery.

Meet Fun Shawn!

Featured

Little more about me!

GitHub

Shawn Iqbal

Explore my code, commits, and collaborative projects.

Skills

A quick snapshot of my toolkit

🐍

Python

98%

🛢️

SQL

96%

⚙️

Golang

88%

🐳

Docker

90%

🦜

Kafka

89%

🧩

Debezium

84%

⚡

Spark

93%

🧠

LangChain

92%

🤗

HuggingFace

91%

📚

RAG

94%

📈

Scikit-Learn

97%

☁️

AWS

90%

🌐

GCP

88%

🖥️

Azure

86%

🗄️

MySQL

92%

🐘

PostgreSQL

94%

🍃

MongoDB

89%

❄️

Snowflake

87%

🧱

Databricks

85%

📊

Tableau

95%

📈

Power BI

96%

🔄

Alteryx

83%

⚙️

Talend

82%

⚡

FastAPI

90%

Experience

Data Solution Architect

2022.10 - Present

Technologies

Apache KafkaAWS (EC2, Lambda)Apache FlinkAWS KinesisGCPAzureAWSDelta LakeDatabricksSnowflake

Highlights

Architected and deployed real-time data streaming infrastructures using Apache Kafka, Apache Flink, and AWS Kinesis, enabling 99.9% uptime for data pipelines and improving supply chain visibility and Operational responsiveness by 35%.
• Designed and implemented scalable, cloud-native data architectures across AWS, Azure, and GCP, integrating Amazon Redshift, GoogleBigQuery, Azure DataLake, Databricks Lakehouse, and Snowflake, leading to a 50% reduction in infrastructure costs and enhanced performance elasticity
• Led end to endmigration of on premise data warehouses to modern cloud ecosystems, leveraging Snowflake, Delta Lake, and Databricks, resulting in 60% improvement in query performance and 70% decrease in maintenance overhead.

Senior Data Engineer

2019.08 - 2022.09

Technologies

MLflowDatabricks Apache Airflow Apache NiFi Amazon S3 HBaseHDFSHiveKafka Spark Hadoop A/B TestingTime Series Modeling

Highlights

Designed and maintained cloud-scale ELT pipelines using Spark, Snowflake, and Airflow to support analytics and reporting at scale.
• Experience implementing data contracts and aligning with Data Mesh principles for decentralized ownership across distributed teams.
• Integrated Snowflake for cloud warehouse migrations, optimizing query performance and enabling real-time analytics dashboards.

Data Engineer

2017.06 - 2019.07

Technologies

REST APIsGreat Expectations PythonApache Beam Google Cloud Dataflow

Highlights

Implemented data governance frameworks, including data lineage, access control, and regulatory compliance for healthcare datasets.
• Architected and implemented a data lake on Google Cloud Platform (GCP), enhancing data accessibility and crossfunctional analytics.
• Developed and automated data quality validation frameworks using Great Expectations, reducing data discrepancies by 40%.

Projects

Data Solution Architect

Regulatory complianceAWS Apache Airflow Apache NiFi Apache Kafka

HIPAA-compliant data pipelines

Designed and led the development of a real-time healthcare analytics platform integrating EHR and claims data using Apache Kafka, Apache Flink, and AWS Kinesis.
Enabled predictive insights for population health management and reduced data processing latency by 60%.
Deployed HIPAA-compliant data pipelines with Apache NiFi and Airflow on AWS, enhancing care quality and regulatory compliance.

Cloud Data Lakehouse Migration

MLflowMachine learning models TalendETL workflows Microsoft Azure Delta Lake Databricks

Led the migration of legacy on-premises data infrastructure to a unified cloud-based lakehouse using Databricks and Delta Lake on Azure.
Streamlined ETL workflows using Apache Spark and Talend, improving data refresh rates by 70%.
Integrated machine learning models with MLflow to forecast energy demands, increasing predictive accuracy by 30%.

Financial Data Pipeline Modernization

Microsoft AzureData quality checks Data validation Data architecture Cloud-native data lake

Developed scalable ETL pipelines with Apache Beam, Python, and Google Cloud Dataflow, processing over 10 million financial records daily.
Designed a cloud-native data lake on GCP, enabling seamless access to structured and unstructured data for cross-team analytics.
Implemented automated data validation and quality checks using Great Expectations, reducing data inconsistencies by 40%.

ML Feature Store for Fraud Detection

Real-time feature engineeringFraud detection Model iteration FeastMLflowDatabricks

Designed and deployed a centralized ML Feature Store using Databricks, MLflow, and Feast, enabling 3× faster model iterations. Reduced fraud detection false positives by 18% through real-time feature engineering.

Hi 👋, I'm Shawn! Nice to meet you.

Featured

Shawn Iqbal

Skills

Experience

Data Solution Architect

Technologies

Highlights

Senior Data Engineer

Technologies

Highlights

Data Engineer

Technologies

Highlights

Projects

Data Solution Architect

Cloud Data Lakehouse Migration

Financial Data Pipeline Modernization

ML Feature Store for Fraud Detection

Hi 👋, I'm Shawn!
Nice to meet you.