Job Description
Data Pipeline Architecture and Development:
- Architect and optimize scalable data storage solutions, including data lakes, warehouses, and NoSQL databases, supporting large-scale analytics.
- Design and maintain efficient data pipelines using technologies such as Apache Spark, Kafka, Fabric Data Factory, and Airflow, based on cross-functional team requirements.
Data Integration and ETL:
- Develop robust ETL processes for reliable data ingestion, using tools such as SSIS, ADF, and custom Python scripts to ensure data quality and streamline workflows.
- Optimize ETL performance through techniques like partitioning and parallel processing.
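The partitioning and parallel-processing techniques above can be sketched generically in Python. This is a tool-agnostic illustration, not a Spark or SSIS recipe; the transform and the "name" field are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def transform_partition(rows):
    """Hypothetical transform step: uppercase the 'name' field of each row."""
    return [{**row, "name": row["name"].upper()} for row in rows]

def partition(records, n_parts):
    """Split records into n_parts roughly equal chunks (round-robin)."""
    return [records[i::n_parts] for i in range(n_parts)]

def run_parallel_etl(records, n_parts=4):
    """Partition the input, transform chunks concurrently, then recombine.
    For CPU-bound transforms, a ProcessPoolExecutor would be the usual swap-in."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = pool.map(transform_partition, partition(records, n_parts))
    return [row for chunk in results for row in chunk]

print(len(run_parallel_etl([{"name": f"user{i}"} for i in range(8)])))  # 8
```

Frameworks like Spark apply the same partition-transform-recombine pattern, but distribute the chunks across executors rather than local workers.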
Data Modeling and Schema Design:
- Define and implement data models and schemas for structured and semi-structured sources, ensuring consistency and efficiency while collaborating with data teams to optimize performance.
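For semi-structured sources, schema definition often amounts to declaring expected fields and types and validating records against them. A minimal sketch, assuming a hypothetical "events" schema (field and type choices are illustrative only):

```python
# Hypothetical schema: field name -> (expected type, required?)
EVENT_SCHEMA = {
    "event_id": (str, True),
    "user_id": (int, True),
    "properties": (dict, False),  # semi-structured payload kept as-is
}

def validate_record(record, schema):
    """Return a list of schema violations for one record (empty if valid)."""
    errors = []
    for field, (ftype, required) in schema.items():
        if field not in record:
            if required:
                errors.append(f"missing required field: {field}")
            continue
        if not isinstance(record[field], ftype):
            errors.append(
                f"{field}: expected {ftype.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return errors

print(validate_record({"event_id": "e1", "user_id": 42}, EVENT_SCHEMA))  # []
```

In practice this role would apply the same idea through warehouse DDL, Avro/JSON Schema, or a framework's schema objects rather than hand-rolled checks.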
Data Governance, Security, and Compliance:
- Establish and enforce data governance policies, ensuring data quality, security, and regulatory compliance, leveraging the built-in governance and auditing capabilities of platforms such as Microsoft SQL Server.
- Implement access controls, encryption, and auditing to protect sensitive data and collaborate with IT to address vulnerabilities.
Infrastructure Management and Optimization:
- Manage and optimize cloud and on-premises infrastructure for data processing, monitor system performance, and implement disaster recovery enhancements.
- Leverage automation for provisioning, configuration, and deployment to improve operational efficiency.
Team Leadership and Mentorship:
- Provide technical leadership, mentoring team members in best practices and cloud technologies, while aligning data engineering initiatives with strategic goals.
Skills Required
- Bachelor’s degree or higher in Software Engineering, Computer Science, Engineering, or a related field.
- 3–5 years of experience in data engineering, with a proven track record of designing and implementing complex data infrastructure.
- Proficiency in Python, Scala, or Java, with experience building scalable, distributed systems.
- Strong knowledge of cloud computing platforms and related services such as AWS Glue, Azure Data Factory, or Google Dataflow.
- Expertise in data modeling, schema design, and SQL query optimization for both relational and NoSQL databases.
- Excellent communication and leadership skills, with the ability to collaborate effectively with cross-functional teams and stakeholders.
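The SQL query optimization skill above can be illustrated end to end with Python's built-in sqlite3 module. The table, column, and index names are hypothetical; the point is how an index changes the query plan from a full scan to an index search:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

def plan(sql):
    """Return the detail column of the first EXPLAIN QUERY PLAN step."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

query = "SELECT total FROM orders WHERE customer_id = 7"
before = plan(query)  # without an index: a full scan of the table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)   # with the index: a targeted index search
print(before)
print(after)
```

The same discipline scales up: production warehouses expose analogous EXPLAIN output, and the optimization work is reading those plans and adding indexes, partitions, or rewrites accordingly.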