Contact +91 8807923886

Modules Covered in The Ultimate Guide for AWS Data Engineering Course

Module 1

Foundations of Data Engineering
  • What is Data Engineering?
  • Responsibilities and Skillsets of a Data Engineer
  • A Day in the Life of a Data Engineer
  • Tools of the Data Engineer: Databases, Processing, and Scheduling
  • Cloud Providers Overview
  • Distributed Computing Overview

Module 2

Core Database Concepts
  • SQL vs NoSQL
  • OLAP vs OLTP
  • Data Warehouse vs Data Lake
  • Data Warehouse Schemas: Star, Snowflake, and Galaxy (Fact Constellation)
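To make the star schema and the OLAP-style query pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_date) and the sample rows are hypothetical, chosen only for illustration: a central fact table holds measures and foreign keys, and analytical queries join it to dimension tables and aggregate by dimension attributes.

```python
import sqlite3

# Hypothetical star schema: one fact table (fact_sales) surrounded by two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, year     INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO dim_date    VALUES (10, 2023), (11, 2024);
INSERT INTO fact_sales  VALUES (1, 10, 20.0), (1, 11, 30.0), (2, 11, 50.0), (2, 11, 25.0);
""")

# OLAP-style query: join facts to dimensions, then aggregate by dimension attributes.
rows = conn.execute("""
SELECT p.category, d.year, SUM(f.amount) AS total
FROM fact_sales f
JOIN dim_product p ON f.product_id = p.product_id
JOIN dim_date    d ON f.date_id    = d.date_id
GROUP BY p.category, d.year
ORDER BY p.category, d.year
""").fetchall()
print(rows)
```

A snowflake schema would further normalize a dimension (for example, splitting category into its own table referenced by dim_product); a galaxy schema would add a second fact table sharing the same dimensions.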

Module 3

Apache Spark Fundamentals
  • Spark Architecture, RDD and DataFrame Fundamentals
  • DataFrame API and Data Source API
  • Transformations and Actions
  • Reading and Processing CSV, JSON, and XML Files
  • Understanding Spark UI
  • Implement basic transformations using PySpark.
  • Process CSV and JSON files and inspect the resulting jobs and stages in the Spark UI.
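The distinction between transformations and actions comes down to lazy evaluation: transformations only record a plan, and an action triggers execution. The toy class below is a plain-Python model of that idea, not the PySpark API; the class name LazyDataset and its methods are hypothetical and only mirror the shape of Spark's map/filter (transformations) and collect/count (actions).

```python
class LazyDataset:
    """Toy model of Spark-style lazy evaluation (hypothetical; not the PySpark API)."""

    def __init__(self, source, ops=()):
        self._source = source   # underlying records
        self._ops = tuple(ops)  # recorded transformations, not yet executed

    # Transformations: return a new dataset with one more recorded step.
    def map(self, fn):
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    # Actions: only here is the recorded plan actually run.
    def collect(self):
        out = []
        for record in self._source:
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    record = fn(record)
                elif not fn(record):
                    keep = False
                    break
            if keep:
                out.append(record)
        return out

    def count(self):
        return len(self.collect())


calls = []

def double(x):
    calls.append(x)  # records when the function really runs
    return x * 2

ds = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
assert calls == []  # building the plan executed nothing
print(ds.collect())
```

In real Spark the recorded plan is a DAG that the Catalyst optimizer can rewrite before execution, which is why deferring work until an action is useful rather than just a curiosity.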

Module 4

Advanced Spark Techniques
  • Catalyst Optimizer, UDFs, and `DataFrame.explain`
  • Directed Acyclic Graphs (DAGs) and Adaptive Query Execution
  • Handling Complex JSON and Struct Data Types
  • Optimizations: Predicate Pushdown, Projection Pushdown, and Cache/Persist
  • Handling Data Skew with Salting
  • Optimize a Spark job using partitioning and caching techniques.
  • Explore complex transformations and actions in Spark.
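Salting works by spreading one "hot" join key across several sub-keys so that its rows land in different partitions, while replicating the other side of the join once per salt value so every match is preserved. The sketch below shows that mechanism in plain Python on lists of (key, value) pairs; it is an illustration of the technique, not Spark code. The salt factor of 3 is an assumption, and the salt is assigned round-robin here for determinism, whereas Spark jobs commonly use a random salt column.

```python
from collections import defaultdict

NUM_SALTS = 3  # assumed salt factor; in practice tuned to the observed skew


def salted_join(large, small, num_salts=NUM_SALTS):
    """Inner-join two lists of (key, value) pairs using key salting."""
    # Salt the large (skewed) side: each key becomes (key, salt).
    salted_large = [((k, i % num_salts), v) for i, (k, v) in enumerate(large)]

    # Replicate the small side once per salt value so every salted key can match.
    salted_small = defaultdict(list)
    for k, v in small:
        for s in range(num_salts):
            salted_small[(k, s)].append(v)

    # Ordinary hash join on the salted key, then drop the salt from the output.
    result = []
    for (k, s), lv in salted_large:
        for sv in salted_small.get((k, s), []):
            result.append((k, lv, sv))
    return result


large = [("hot", i) for i in range(9)] + [("cold", 100)]
small = [("hot", "H"), ("cold", "C")]
joined = salted_join(large, small)

# Same rows as an unsalted join; only the distribution of work changes.
plain = [(k, lv, sv) for k, lv in large for k2, sv in small if k == k2]
print(sorted(joined) == sorted(plain))
```

The nine "hot" rows are now split across three salted keys instead of one, at the cost of replicating the small side three times, which is the usual trade-off when choosing the salt factor.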

Module 5

AWS Glue Essentials
  • AWS Glue Architecture and Applications
  • AWS Glue Job Scripts and Properties
  • AWS Glue Data Catalog and Databases
  • AWS Glue Connections and AWS Secrets Manager
  • Developing Glue Jobs with Advanced Network Configuration
  • Create an AWS Glue job to process and transform data.
  • Integrate AWS Glue with Amazon RDS for MySQL and Amazon Athena.

Module 6

Streaming and Real-Time Data Processing
  • Kafka Architecture and Spark Structured Streaming
  • Micro-batch Processing and Background Queries
  • Supported Sources and Sinks
  • Writing Streams and Managing Checkpoints
  • Build a streaming application with Spark Structured Streaming.
  • Implement a basic Kafka-Spark pipeline with checkpointing.
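The purpose of a streaming checkpoint is to let a restarted query resume from the last committed position instead of reprocessing everything. The sketch below is a heavily simplified plain-Python model of that idea, not Spark's checkpoint format or Kafka's API: a list stands in for a Kafka topic, a JSON file stores the next offset, and the function name run_micro_batches is hypothetical.

```python
import json
import os
import tempfile


def run_micro_batches(source, sink, checkpoint_path, batch_size=2):
    """Process `source` in micro-batches, committing the next offset after each batch."""
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            offset = json.load(f)["next_offset"]  # resume from last commit

    while offset < len(source):
        batch = source[offset:offset + batch_size]
        sink.extend(r.upper() for r in batch)  # toy "transformation" for the batch
        offset += len(batch)
        # Commit progress only after the batch has been written to the sink.
        with open(checkpoint_path, "w") as f:
            json.dump({"next_offset": offset}, f)
    return offset


ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
events = ["a", "b", "c"]
sink = []
run_micro_batches(events, sink, ckpt)  # first run processes a, b, c
events += ["d", "e"]                   # new data arrives on the "topic"
run_micro_batches(events, sink, ckpt)  # restart resumes at offset 3
print(sink)
```

Because the offset is committed after the sink write, a crash between the two would replay that batch on restart; real pipelines pair checkpoints with idempotent or transactional sinks (such as Delta tables) to avoid duplicates.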

Module 7

Lakehouse and Medallion Architecture
  • Configuring Delta Lake for Lakehouse Architecture
  • Delta Format and Transaction Logs
  • Schema Evolution and Medallion Architecture Layers: Bronze, Silver, Gold
  • Transformations for Silver and Gold Layers
  • Set up a Delta Lake and ingest data into the Bronze layer.
  • Apply transformations to create Silver and Gold layers.
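The division of responsibilities between the Bronze, Silver, and Gold layers can be shown without any Spark or Delta Lake setup. In the plain-Python sketch below, lists of dicts stand in for Delta tables, and the order fields and sample values are hypothetical: Bronze keeps raw records as ingested, Silver applies typing, cleaning, and de-duplication, and Gold holds a business-level aggregate.

```python
from collections import defaultdict

# Bronze: raw records exactly as ingested (strings, duplicates, bad rows included).
bronze = [
    {"order_id": "1", "country": "in", "amount": "100.0"},
    {"order_id": "1", "country": "in", "amount": "100.0"},         # duplicate delivery
    {"order_id": "2", "country": "US", "amount": "40.5"},
    {"order_id": "3", "country": "us", "amount": "not-a-number"},  # malformed row
]


def to_silver(rows):
    """Silver: typed, cleaned, de-duplicated records."""
    seen, out = set(), []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # drop rows that fail type conversion
        oid = int(r["order_id"])
        if oid in seen:
            continue  # drop duplicate order IDs
        seen.add(oid)
        out.append({"order_id": oid, "country": r["country"].upper(), "amount": amount})
    return out


def to_gold(rows):
    """Gold: business-level aggregate (revenue per country)."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["country"]] += r["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

In a Delta Lake pipeline each layer would typically be its own table, so the raw Bronze data stays available to rebuild Silver and Gold if the cleaning rules change.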

Module 8

AWS Lambda and Final Project
  • Building Lambda Functions for Event-Driven Architecture
  • Integrating Lambda with Amazon S3, Amazon DynamoDB, and Amazon SNS
  • Develop a Lambda function triggered by S3 events.
  • Implement a final project integrating AWS Glue, Lambda, and Delta Lake.
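An S3-triggered Lambda function receives an event containing one or more Records, each with the bucket name and object key; keys arrive URL-encoded (for example, spaces as "+"), so they must be decoded before use. The sketch below shows that parsing step in Python. The handler name lambda_handler is an assumption (the handler is configurable per function), and the bucket and key in the sample event are hypothetical. Calls to other services (Glue, DynamoDB, SNS) are left out so the code runs locally.

```python
import json
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Entry point for an S3-triggered Lambda function (handler name is an assumption)."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # decode URL-encoded key
        objects.append({"bucket": bucket, "key": key})
        # A real function would act on the object here, e.g. start a Glue job
        # or publish to SNS via boto3 (omitted so the sketch runs locally).
    return {"statusCode": 200, "body": json.dumps(objects)}


# Hypothetical test event with the same shape S3 sends to Lambda.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-raw-bucket"},
                "object": {"key": "landing/orders+2024.csv"}}}
    ]
}
response = lambda_handler(sample_event, None)
print(response)
```

Keeping the event parsing in a small, side-effect-free function like this makes the handler easy to unit-test with saved sample events before wiring it to real AWS resources.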