Data Engineering

Learn the fundamentals of data engineering in this hands-on course, focusing on building efficient data pipelines through ETL processes. Gain experience with tools like Kafka for data streaming, Redis for in-memory storage, and Hadoop and Spark for big data processing. Develop scalable data pipelines by working on real-world projects that integrate these technologies.

Course Outcomes

  • Learn the fundamentals of data pipelines, including Extract, Transform, Load (ETL) processes and data transformation techniques, for handling both real-time and batch data efficiently (a minimal batch ETL sketch follows this list).
  • Acquire hands-on experience with key data engineering tools and technologies, including Kafka for message queuing, Redis for in-memory data storage, and Hadoop and Spark for big data processing.
  • Apply this knowledge to build scalable, efficient data pipelines from open-source tools, focusing on real-world data engineering challenges.
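
For orientation, here is a minimal batch ETL sketch in plain Python. The orders.csv file, its user_id and amount columns, and the SQLite orders table are hypothetical placeholders, not artifacts from the course.

```python
import csv
import sqlite3

def extract(path):
    """Extract: read raw rows from a CSV file (the path is a made-up example)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: normalize fields and drop incomplete records."""
    cleaned = []
    for row in rows:
        if not row.get("user_id") or not row.get("amount"):
            continue  # skip records missing required fields
        cleaned.append({
            "user_id": row["user_id"].strip(),
            "amount": float(row["amount"]),  # cast the string to a number
        })
    return cleaned

def load(rows, db_path="pipeline.db"):
    """Load: write the cleaned records into a SQLite table."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS orders (user_id TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (user_id, amount) VALUES (:user_id, :amount)", rows
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load(transform(extract("orders.csv")))
```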

Approach

  • Set up Kafka for data streaming and Redis for in-memory caching, then experiment with their core functionality and use cases: producing and consuming messages, and reading and writing cached values with expiry (see the first sketch after this list).
  • Set up big data tools such as Hadoop and Spark and work through sample exercises on large-scale data, such as filtering and aggregating across many partitions (see the second sketch after this list).
  • Undertake a project that builds a simple end-to-end data pipeline: ingest data with Kafka, transform it with Spark, and cache the processed results in Redis, simulating a real-world data engineering scenario (see the final sketch after this list).
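
A sketch of the first bullet using the kafka-python and redis client libraries, one reasonable choice among several; the course does not mandate specific clients. It assumes a Kafka broker on localhost:9092 and a Redis server on localhost:6379, and the events topic and message fields are made up for illustration.

```python
import json
import redis
from kafka import KafkaProducer, KafkaConsumer

# Produce a JSON-encoded message to the (hypothetical) "events" topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("events", {"user_id": "u1", "action": "click"})
producer.flush()  # make sure the message actually reaches the broker

# Consume messages back from the same topic.
consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",   # start from the oldest message
    consumer_timeout_ms=5000,       # stop iterating after 5s of silence
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

# Cache each consumed event in Redis with a 1-hour expiry (TTL).
r = redis.Redis(host="localhost", port=6379, decode_responses=True)
for msg in consumer:
    key = f"event:{msg.offset}"
    r.setex(key, 3600, json.dumps(msg.value))
    print(key, "->", r.get(key))
```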
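
A sketch of a sample Spark exercise using PySpark's DataFrame API. The large_dataset.csv file and its user_id and amount columns are hypothetical; any sizable tabular dataset would do.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start a local Spark session; the file and columns below are
# stand-ins for whatever dataset the exercise actually uses.
spark = SparkSession.builder.appName("sample-exercise").getOrCreate()

df = spark.read.csv("large_dataset.csv", header=True, inferSchema=True)

# A typical large-scale aggregation: total and average amount per user,
# sorted by total spend.
summary = (
    df.groupBy("user_id")
      .agg(
          F.sum("amount").alias("total_amount"),
          F.avg("amount").alias("avg_amount"),
      )
      .orderBy(F.desc("total_amount"))
)
summary.show(10)
spark.stop()
```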
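
Finally, one way the project pipeline could be wired together, simplified here into a single micro-batch script rather than a continuously running job; a production version might use Spark Structured Streaming's Kafka source instead. Hostnames, the events topic, and the field names are the same illustrative assumptions as above.

```python
import json
import redis
from kafka import KafkaConsumer
from pyspark.sql import Row, SparkSession
from pyspark.sql import functions as F

# Ingest: pull one micro-batch of JSON events from Kafka.
consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # drain whatever is there, then stop
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
events = [msg.value for msg in consumer]
if not events:
    raise SystemExit("no events to process; run the producer sketch first")

# Transform: aggregate the batch with Spark.
spark = SparkSession.builder.appName("mini-pipeline").getOrCreate()
df = spark.createDataFrame([Row(**e) for e in events])
counts = df.groupBy("action").agg(F.count("*").alias("n"))

# Load: cache the aggregates in Redis for fast lookup by downstream apps.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)
for row in counts.collect():
    r.set(f"action_count:{row['action']}", row["n"])

print(r.get("action_count:click"))
spark.stop()
```

Collecting results through the driver keeps the sketch short and dependency-free; at real scale the aggregates would be written to Redis from the executors, or the whole job replaced with a streaming query.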