Responsibilities:
Design, develop, and deploy scalable and efficient data pipelines using Scala and Spark to process and analyze large volumes of data.
Collaborate with cross-functional teams to understand data requirements, provide technical guidance, and implement data solutions that meet business needs.
Optimize data processing workflows for high performance and reliability.
Implement and maintain data infrastructure on AWS, leveraging services such as Amazon EMR, Amazon S3, AWS Glue, and AWS Lambda.
Monitor and troubleshoot data pipelines to identify and resolve issues promptly.
Develop and enforce data engineering best practices, coding standards, and documentation guidelines.
Stay up to date with the latest advancements in data engineering technologies and methodologies, and actively contribute to improving the team's technical capabilities.
Provide mentorship and guidance to junior team members, fostering a collaborative and growth-oriented environment.
Requirements:
Bachelor's or master's degree in Computer Science, Engineering, or a related field.
Proven experience as a Data Engineer, with a focus on Scala and Spark.
Strong proficiency in the Scala programming language and the Spark ecosystem (Spark Core, Spark SQL, Spark Streaming).
Solid understanding of distributed computing principles and data processing frameworks.
Extensive hands-on experience with AWS, including services such as Amazon EMR, Amazon S3, AWS Glue, AWS Lambda, and AWS Data Pipeline.
Familiarity with Apache NiFi and its components for data integration and processing.
Proficiency in SQL and experience with relational and NoSQL databases.
Strong problem-solving skills and ability to analyze complex data requirements.
Excellent communication and collaboration skills, with the ability to work effectively in a team-oriented environment.
Proactive and self-motivated attitude, with a strong commitment to delivering high-quality solutions.
Preferred Qualifications:
Experience with big data technologies such as Hadoop, Hive, or Presto.
Knowledge of containerization technologies like Docker and orchestration tools like Kubernetes.
Familiarity with data visualization tools and techniques.