Salary:
Start Date: 05/26/2025
Title: Python Data Engineer
Location: Remote
Type: Temp to Perm
Description:
We are looking for a Python Data Engineer who will bring strong expertise in CMS datasets
(MOR, MMR, MAO) and an understanding of healthcare regulations. The role requires
proficiency with modern cloud data engineering tools, including Dataflow, BigQuery, and
Airflow for orchestration, along with solid foundational knowledge in data warehousing
concepts and optimization techniques for large healthcare datasets.
What You Will Do:
Design, develop, and maintain scalable ETL pipelines for CMS datasets using GCP
Dataflow and Python.
Architect and manage data warehouses using BigQuery, ensuring scalability and
cost-efficiency.
Implement Airflow DAGs for orchestration of complex data workflows and
scheduling.
Ensure data quality, validation, lineage, and governance aligned with CMS and
HIPAA compliance standards.
Optimize large-scale datasets through partitioning, clustering, sharding, and
cost-effective query patterns in BigQuery.
Work collaboratively in Agile teams, using Jira for project tracking and Confluence
for documentation.
Monitor and troubleshoot data pipelines, ensuring reliability and operational
excellence.
You Will Be Successful If:
You are self-motivated, proactive, and capable of thriving in a fast-paced, agile
startup environment with minimal supervision.
You demonstrate strong ownership of tasks and deliverables, driving them through to
completion.
You are an eager self-learner who stays current with emerging technologies and
industry trends.
You have excellent written and verbal communication skills and collaborate
effectively across multidisciplinary teams.
What You Will Bring:
Bachelor's degree in Computer Science, Information Systems, or related field.
7+ years of experience in cloud-based data engineering, preferably with healthcare
datasets.
Extensive experience working with risk adjustment.
Expertise in building ETL pipelines using GCP Dataflow (Apache Beam) and Python.
Expert-level experience with BigQuery, including schema design, optimization, and
advanced SQL.
Hands-on experience with Airflow orchestration for large-scale data workflows.
Deep understanding of data warehouse concepts such as star schema, snowflake
schema, normalization, and denormalization.
Expertise in dataset optimization techniques: query optimization, partitioning, and
clustering.
Familiarity with Agile processes, Jira, Confluence, and cloud-native engineering best
practices.
Knowledge of CMS datasets (MOR, MMR, MAO) and healthcare data compliance
(HIPAA).