- Design, develop, and optimize data pipelines and ETL processes using PySpark or Scala to extract, transform, and load large volumes of structured and unstructured data from diverse sources (a brief PySpark sketch follows this list).
- Implement data ingestion, processing, and storage solutions on the Azure cloud platform, leveraging services such as Azure Databricks, Azure Data Lake Storage, and Azure Synapse Analytics.
- Develop and maintain data models, schemas, and metadata to support efficient data access, query performance, and analytics requirements.
- Monitor pipeline performance, troubleshoot issues, and optimize data processing workflows for scalability, reliability, and cost-effectiveness.
- Implement data security and compliance measures to protect sensitive information and ensure regulatory compliance.
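For context on the day-to-day work described above, here is a minimal PySpark ETL sketch, assuming a Databricks cluster with access to an ADLS Gen2 account; the storage paths, container names, and column names (e.g., examplelake, order_id, order_ts) are illustrative placeholders, not details taken from this posting.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read raw CSV files landed in the data lake (path is illustrative).
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("abfss://raw@examplelake.dfs.core.windows.net/orders/")
)

# Transform: basic cleansing plus a derived partition column.
orders = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("order_status").isNotNull())
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write to a curated zone as Delta, partitioned for query performance.
(
    orders.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .save("abfss://curated@examplelake.dfs.core.windows.net/orders/")
)
```

In practice, a curated table like this would typically be registered in the metastore and surfaced to Azure Synapse Analytics or downstream BI tools.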
Requirements
- Proven experience as a Data Engineer, with expertise in building and optimizing data pipelines using PySpark, Scala, and Apache Spark.
- Hands-on experience with cloud platforms, particularly Azure, and proficiency in Azure services such as Azure Databricks, Azure Data Lake Storage, Azure Synapse Analytics, and Azure SQL Database.
- Strong programming skills in Python and Scala, with experience in software development, version control, and CI/CD practices.
- Familiarity with data warehousing concepts, dimensional modeling, and relational databases (e.g., SQL Server, PostgreSQL, MySQL); a brief star-schema example follows this list.
- Experience with big data technologies and frameworks (e.g., Hadoop, Hive, HBase) is a plus.
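As a small illustration of the dimensional-modeling skills listed above, the sketch below runs a star-schema aggregation with Spark SQL; the table names (fact_sales, dim_date) and columns are hypothetical and assume the tables are already registered in the cluster's metastore.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("star-schema-example").getOrCreate()

# Aggregate a hypothetical sales fact table by calendar date via its date dimension.
daily_revenue = spark.sql("""
    SELECT d.calendar_date,
           SUM(f.sales_amount) AS total_revenue
    FROM   fact_sales f
    JOIN   dim_date   d ON f.date_key = d.date_key
    GROUP BY d.calendar_date
    ORDER BY d.calendar_date
""")

daily_revenue.show()
```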
Role: Data Engineer
Industry Type: IT Services & Consulting
Department: Data Science & Analytics
Employment Type: Full Time, Permanent
Role Category: Data Science & Machine Learning
Education
UG: B.Com in Any Specialization, B.Sc in Any Specialization, B.Tech/B.E. in Any Specialization
PG: M.Tech in Any Specialization, MBA/PGDM in Any Specialization, MCA in Any Specialization