Learn Data on Cloud

Contact +91 8807923886

Modules covered in The Ultimate Guide for AWS Data Engineering Course

Module 1

Foundations of Data Engineering

What is Data Engineering?
Responsibilities and Skillsets of a Data Engineer
A Day in the Life of a Data Engineer
Tools of the Data Engineer: Databases, Processing Tasks, Scheduling Tasks
Cloud Providers Overview
Distributed Computing Overview

Module 2

Core Database Concepts

SQL vs NoSQL
OLAP vs OLTP
Data Warehouse vs Data Lake
Data Warehouse Schemas: Star, Snowflake, Galaxy

Module 3

Apache Spark Fundamentals

Spark Architecture: RDD, DataFrame Fundamentals
DataFrame API and Data Source API
Transformations and Actions
Reading and Processing CSV, JSON, and XML Files
Understanding Spark UI
Implement basic transformations using PySpark.
Process CSV and JSON files and visualize the results in Spark UI.

Module 4

Advanced Spark Techniques

Catalyst Optimizer, UDFs, and `DataFrame.explain`
Directed Acyclic Graphs (DAGs) and Adaptive Query Execution
Handling Complex JSON and Struct Data Types
Optimizations: Predicate Pushdown, Projection Pushdown, and Cache/Persist
Handling Data Skew and Salting
Optimize a Spark job using partitioning and caching techniques.
Explore complex transformations and actions in Spark.
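
To give a feel for the "Handling Data Skew and Salting" topic above, here is a minimal sketch in plain Python rather than Spark. It assumes a made-up dataset where one key ("hot") dominates, and the helper names (`salt_key`, `partition_of`, `partition_sizes`) and the constants are hypothetical, chosen only for illustration. In a real Spark job the same idea is usually expressed by adding a salt column before a join or aggregation.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8   # assumed number of shuffle partitions
NUM_SALTS = 16       # assumed number of salt values per key


def salt_key(key, num_salts, rng):
    """Append a random suffix so a single hot key becomes several sub-keys."""
    return f"{key}_{rng.randrange(num_salts)}"


def partition_of(key, num_partitions):
    """Assign a key to a partition with a deterministic hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    """Count how many rows land in each partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


rng = random.Random(42)

# Skewed input: 9,000 rows share one key, 1,000 rows have unique keys.
keys = ["hot"] * 9000 + [f"user{i}" for i in range(1000)]
salted_keys = [salt_key(k, NUM_SALTS, rng) for k in keys]

unsalted = partition_sizes(keys, NUM_PARTITIONS)
salted = partition_sizes(salted_keys, NUM_PARTITIONS)

# Stage 1: aggregate per salted key (this work is spread across partitions).
partial_counts = Counter(salted_keys)

# Stage 2: strip the salt and merge the partial results per original key.
final_counts = Counter()
for salted_key, n in partial_counts.items():
    original_key = salted_key.rsplit("_", 1)[0]
    final_counts[original_key] += n

print("largest partition without salting:", max(unsalted))
print("largest partition with salting:", max(salted))
print("rows counted for 'hot' after merging:", final_counts["hot"])
```

The two-stage aggregation is the part that keeps the result correct: the salt only changes how the work is distributed, and removing it in the second stage restores the per-key totals.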

Module 5

AWS Glue Essentials

AWS Glue Architecture and Applications
AWS Glue Job Scripts and Properties
AWS Glue Data Catalog and Databases
AWS Glue Connections and AWS Secrets Manager
Developing Glue Jobs with Advanced Network Configuration
Create an AWS Glue job to process and transform data.
Integrate AWS Glue with Amazon RDS MySQL and Athena.

Module 6

Streaming and Real-Time Data Processing

Kafka Architecture and Spark Structured Streaming
Basic Micro-batch and Background Queries
Supported Sources and Sinks
Writing Streams and Managing Checkpoints
Build a streaming application with Spark Structured Streaming.
Implement a basic Kafka-Spark pipeline with checkpointing.

Module 7

Lakehouse and Medallion Architecture

Configuring Delta Lake for Lakehouse Architecture
Delta Format and Transaction Logs
Schema Evolution and Medallion Architecture Layers: Bronze, Silver, Gold
Transformations for Silver and Gold Layers
Set up a Delta Lake and ingest data into the Bronze layer.
Apply transformations to create Silver and Gold layers.

Module 8

AWS Lambda and Final Project

Building Lambda Functions for Event-Driven Architecture
Lambda with Amazon S3, DynamoDB, and SNS
Develop a Lambda function triggered by S3 events.
Implement a final project integrating AWS Glue, Lambda, and Delta Lake.
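
As a rough illustration of the "Lambda function triggered by S3 events" exercise above, the sketch below shows one way a handler might read the bucket and object key from an S3 event notification. The bucket name, object key, and returned dictionary shape are assumptions made up for this example. The handler is invoked locally with a hand-written sample event, so it runs without an AWS account.

```python
import urllib.parse


def lambda_handler(event, context):
    """Collect the bucket and key of every object named in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded (a space becomes '+'), so decode them.
        key = urllib.parse.unquote_plus(s3_info["object"]["key"])
        objects.append({"bucket": bucket, "key": key})
    return {"processed": len(objects), "objects": objects}


# Hand-written sample event with hypothetical bucket and key names.
sample_event = {
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-raw-bucket"},
                "object": {"key": "raw/orders+2024.csv"},
            },
        }
    ]
}

result = lambda_handler(sample_event, None)
print(result)
```

In a deployed function, the loop body is where downstream work would typically start, for example triggering a Glue job or writing a record to DynamoDB.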