AWS Data Engineering Course
Modules covered in The Ultimate Guide for AWS Data Engineering Course
Module 1
Foundations of Data Engineering
- What is Data Engineering?
- Responsibilities and Skillsets of a Data Engineer
- A Day in the Life of a Data Engineer
- Tools of the Data Engineer: Databases, Processing Tasks, Scheduling Tasks
- Cloud Providers Overview
- Distributed Computing Overview
Module 2
Core Database Concepts
- SQL vs NoSQL
- OLAP vs OLTP
- Data Warehouse vs Data Lake
- Data Warehouse Schemas: Snowflake, Star, Galaxy
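
To make the star schema topic concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are illustrative assumptions, not course material; a real warehouse would use a dedicated engine rather than SQLite.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table surrounded by two dimension tables (hypothetical names)."""
    conn.executescript(
        """
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE fact_sales (
            sale_id INTEGER PRIMARY KEY,
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount REAL
        );
        """
    )
    # Made-up sample data, for illustration only.
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "books"), (2, "games")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)", [(10, 2023), (11, 2024)])
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 1, 10, 20.0), (2, 1, 11, 30.0), (3, 2, 11, 50.0)],
    )


def sales_by_category(conn):
    """Typical OLAP-style query: join the fact table to a dimension and aggregate."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON f.product_id = p.product_id
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    print(sales_by_category(connection))
```

In a snowflake schema, `dim_product` would itself be normalized further (for example, a separate category table), at the cost of extra joins.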
Module 3
Apache Spark Fundamentals
- Spark Architecture: RDD, DataFrame Fundamentals
- DataFrame API and Data Source API
- Transformations and Actions
- Reading and Processing CSV, JSON, and XML Files
- Understanding Spark UI
- Implement basic transformations using PySpark.
- Process CSV and JSON files and visualize the results in Spark UI.
Module 4
Advanced Spark Techniques
- Catalyst Optimizer, UDFs, and `DataFrame.explain`
- Directed Acyclic Graphs (DAGs) and Adaptive Query Execution
- Handling Complex JSON and Struct Data Types
- Optimizations: Predicate Pushdown, Projection Pushdown, and Cache/Persist
- Handling Data Skew and Salting
- Optimize a Spark job using partitioning and caching techniques.
- Explore complex transformations and actions in Spark.
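
The idea behind salting a skewed key can be sketched in plain Python, without a Spark cluster. The partitioner below is a deliberately simplified stand-in (an assumption for illustration, not Spark's actual hash partitioner), and `NUM_PARTITIONS`, `NUM_SALTS`, and the `"hot"` key are hypothetical. It shows two things: a salt spreads one hot key across partitions, and the aggregation then needs a second stage that drops the salt.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical cluster size
NUM_SALTS = 8       # hypothetical number of salt values


def partition_of(key, salt=0):
    """Toy partitioner: a stable hash of the key, shifted by the salt."""
    return (zlib.crc32(key.encode("utf-8")) + salt) % NUM_PARTITIONS


def partition_loads(records, salted):
    """Count how many records land in each partition."""
    loads = [0] * NUM_PARTITIONS
    for i, (key, _value) in enumerate(records):
        salt = i % NUM_SALTS if salted else 0
        loads[partition_of(key, salt)] += 1
    return loads


def salted_sum(records):
    """Two-stage aggregation: partial sums per (key, salt), then per key."""
    partial = defaultdict(int)
    for i, (key, value) in enumerate(records):
        partial[(key, i % NUM_SALTS)] += value
    totals = defaultdict(int)
    for (key, _salt), subtotal in partial.items():
        totals[key] += subtotal  # second stage removes the salt
    return dict(totals)


if __name__ == "__main__":
    skewed = [("hot", 1)] * 1000  # every record shares one key
    print("unsalted loads:", partition_loads(skewed, salted=False))
    print("salted loads:  ", partition_loads(skewed, salted=True))
    print("totals:        ", salted_sum(skewed))
```

In PySpark the same pattern is usually expressed by adding a salt column, grouping by key and salt, and then grouping again by key alone.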
Module 5
AWS Glue Essentials
- AWS Glue Architecture and Applications
- AWS Glue Job Scripts and Properties
- AWS Glue Data Catalog and Databases
- AWS Glue Connections and AWS Secrets Manager
- Developing Glue Jobs with Advanced Network Configuration
- Create an AWS Glue job to process and transform data.
- Integrate AWS Glue with Amazon RDS MySQL and Athena.
Module 6
Streaming and Real-Time Data Processing
- Kafka Architecture and Spark Structured Streaming
- Basic Micro-batch and Background Queries
- Supported Sources and Sinks
- Writing Streams and Managing Checkpoints
- Build a streaming application with Spark Structured Streaming.
- Implement a basic Kafka-Spark pipeline with checkpointing.
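
Checkpointing in a micro-batch pipeline can be illustrated with a small, framework-free sketch. This is not the Spark Structured Streaming API; it assumes an in-memory list as the source and a JSON file as the checkpoint, and every function name here is hypothetical. The point is the order of operations: write the batch to the sink, then record the new offset, so a restarted job resumes where it stopped.

```python
import json
import os


def load_offset(checkpoint_path):
    """Return the last committed offset, or 0 if no checkpoint exists yet."""
    if not os.path.exists(checkpoint_path):
        return 0
    with open(checkpoint_path) as f:
        return json.load(f)["offset"]


def save_offset(checkpoint_path, offset):
    """Write the checkpoint to a temp file, then swap it into place atomically."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"offset": offset}, f)
    os.replace(tmp_path, checkpoint_path)


def run_micro_batches(source, sink, checkpoint_path, batch_size, max_batches):
    """Process up to max_batches batches, committing the offset after each one."""
    offset = load_offset(checkpoint_path)
    for _ in range(max_batches):
        batch = source[offset:offset + batch_size]
        if not batch:
            break  # nothing new to read
        sink.extend(item.upper() for item in batch)  # the "transformation"
        offset += len(batch)
        save_offset(checkpoint_path, offset)
    return offset
```

If the process dies between the sink write and `save_offset`, the batch is replayed on restart, which gives at-least-once delivery; exactly-once behavior additionally needs an idempotent or transactional sink.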
Module 7
Lakehouse and Medallion Architecture
- Configuring Delta Lake for Lakehouse Architecture
- Delta Format and Transaction Logs
- Schema Evolution and Medallion Architecture Layers: Bronze, Silver, Gold
- Transformations for Silver and Gold Layers
- Set up a Delta Lake and ingest data into the Bronze layer.
- Apply transformations to create Silver and Gold layers.
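
The Bronze, Silver, and Gold layers can be sketched with plain Python lists before moving to Delta tables. The field names (`order_id`, `country`, `amount`) and the cleaning rules are assumptions chosen for illustration: Bronze holds raw rows as ingested, Silver holds validated and de-duplicated rows, and Gold holds business-level aggregates.

```python
def to_silver(bronze_rows):
    """Clean raw rows: drop incomplete records, de-duplicate, normalize types."""
    seen_ids = set()
    silver = []
    for row in bronze_rows:
        order_id = row.get("order_id")
        amount = row.get("amount")
        if order_id is None or amount in (None, ""):
            continue  # incomplete record stays in Bronze only
        if order_id in seen_ids:
            continue  # duplicate delivery of the same order
        seen_ids.add(order_id)
        silver.append(
            {
                "order_id": order_id,
                "country": row["country"].strip().upper(),
                "amount": float(amount),
            }
        )
    return silver


def to_gold(silver_rows):
    """Aggregate cleaned rows into revenue per country."""
    totals = {}
    for row in silver_rows:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return totals


if __name__ == "__main__":
    bronze = [
        {"order_id": 1, "country": " us", "amount": "10.5"},
        {"order_id": 1, "country": " us", "amount": "10.5"},  # duplicate
        {"order_id": 2, "country": "IN", "amount": "4"},
        {"order_id": 3, "country": "us", "amount": ""},       # missing amount
        {"order_id": 4, "country": "US ", "amount": "2.5"},
    ]
    print(to_gold(to_silver(bronze)))
```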
Module 8
AWS Lambda and Final Project
- Building Lambda Functions for Event-Driven Architecture
- Lambda with Amazon S3, DynamoDB, and SNS
- Develop a Lambda function triggered by S3 events.
- Implement a final project integrating AWS Glue, Lambda, and Delta Lake.
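
As a starting point for the S3-triggered function, here is a minimal handler sketch. It relies on the standard S3 event notification shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with URL-encoded keys); the return value and the sample bucket and key are assumptions for illustration. It only extracts object locations and leaves the actual processing step open.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Collect the bucket and key of every object referenced in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications (spaces arrive as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append({"bucket": bucket, "key": key})
    # Hypothetical response shape; a real function might start a Glue job here.
    return {"processed": len(objects), "objects": objects}


if __name__ == "__main__":
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "example-bucket"},
                    "object": {"key": "raw/my+file.csv"}}}
        ]
    }
    print(lambda_handler(sample_event, None))
```

The handler can be exercised locally with a hand-built event, as in the `__main__` block, before it is deployed and wired to a bucket notification.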