Reviewed for technical accuracy May 12, 2022
© 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS Reference Architecture
Modern Data Analytics Reference Architecture on AWS
This architecture enables customers to build data analytics pipelines using a Modern Data Analytics approach to
derive insights from the data.
[Diagram: Data Sources (SaaS applications, SQL/NoSQL databases, file shares, devices, logs, social media) feed the Data Ingestion layer, which loads a Scalable Data Lake in the AWS Cloud. Unified Governance, Seamless Data Movement, and Purpose-built Analytics & Insights are built around the data lake. Callouts 1 through 13 in the diagram mark the components described in the paragraphs that follow.]
Data is collected from multiple data sources
across the enterprise, SaaS applications, edge
devices, logs, streaming media, and social
networks.
Based on the type of the data source, AWS
Database Migration Service, AWS
DataSync, Amazon Kinesis, Amazon
Managed Streaming for Apache Kafka,
AWS IoT Core, and Amazon AppFlow
are used to ingest the data into a Data Lake
in AWS.
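
The choice of ingestion service by source type can be sketched as a simple lookup. This is a minimal illustration, not part of the reference architecture itself: the `SourceType` categories and the `pick_ingestion_service` helper are hypothetical names, and the pairing reflects each service's typical use rather than a rule stated in the diagram.

```python
from enum import Enum


class SourceType(Enum):
    """Hypothetical source categories, loosely following the diagram's data sources."""
    DATABASE = "database"
    FILE_SHARE = "file_share"
    EVENT_STREAM = "event_stream"
    KAFKA_TOPIC = "kafka_topic"
    IOT_DEVICE = "iot_device"
    SAAS_APPLICATION = "saas_application"


# Illustrative pairing of source types with the ingestion services named above.
INGESTION_SERVICE = {
    SourceType.DATABASE: "AWS Database Migration Service",
    SourceType.FILE_SHARE: "AWS DataSync",
    SourceType.EVENT_STREAM: "Amazon Kinesis",
    SourceType.KAFKA_TOPIC: "Amazon Managed Streaming for Apache Kafka",
    SourceType.IOT_DEVICE: "AWS IoT Core",
    SourceType.SAAS_APPLICATION: "Amazon AppFlow",
}


def pick_ingestion_service(source_type: SourceType) -> str:
    """Return the ingestion service this sketch associates with a source type."""
    return INGESTION_SERVICE[source_type]


print(pick_ingestion_service(SourceType.FILE_SHARE))  # prints "AWS DataSync"
```

In practice the decision also depends on latency, volume, and existing tooling, so a real pipeline may use several of these services side by side.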
AWS Data Exchange is used for integrating
third-party data into the Data Lake.
AWS Lake Formation is used to build the
scalable data lake, and Amazon S3 is used as
the data lake storage.
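
One common way to organize data lake objects in Amazon S3 is a Hive-style partitioned key layout, which query engines such as Amazon Athena can use to prune partitions. The sketch below assumes a hypothetical zone/source/date scheme; the `data_lake_key` helper and the example names are illustrative and are not prescribed by the architecture.

```python
from datetime import date


def data_lake_key(zone: str, source: str, day: date, filename: str) -> str:
    """Build a Hive-style partitioned S3 object key (assumed layout, not mandated)."""
    return (
        f"{zone}/source={source}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )


key = data_lake_key("raw", "crm", date(2022, 5, 12), "part-0000.json")
print(key)  # prints "raw/source=crm/year=2022/month=05/day=12/part-0000.json"
```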
AWS Lake Formation is also used to enable
unified governance to centrally manage the
security, access control, and audit trails.
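
As a rough illustration of centrally managed access control, the sketch below builds the arguments for a Lake Formation `grant_permissions` call through boto3, granting `SELECT` on one table. The principal ARN, database, and table names are placeholders, and `build_select_grant` and `grant_select` are hypothetical helpers. The actual call needs the boto3 package and AWS credentials, so it is kept in a separate function that is not invoked here.

```python
def build_select_grant(principal_arn: str, database: str, table: str) -> dict:
    """Build keyword arguments for LakeFormation.Client.grant_permissions."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }


def grant_select(principal_arn: str, database: str, table: str) -> dict:
    """Send the grant to Lake Formation (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without the SDK

    client = boto3.client("lakeformation")
    return client.grant_permissions(
        **build_select_grant(principal_arn, database, table)
    )


# Placeholder identifiers for illustration only.
request = build_select_grant(
    "arn:aws:iam::111122223333:role/AnalystRole", "sales_db", "orders"
)
print(request["Permissions"])  # prints "['SELECT']"
```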
AWS Glue and AWS Glue DataBrew are used to
catalog, transform, enrich, move, and replicate
data across multiple data stores and the data
lake.
Amazon Kinesis Data Analytics is used to
transform and analyze streaming data in real
time.
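
To make "analyzing streaming data in real time" more concrete, the following plain-Python sketch counts events per tumbling (fixed, non-overlapping) time window. It illustrates the kind of windowed aggregation a streaming analytics application might perform; it does not use the Kinesis Data Analytics API, and the event shape `(timestamp_seconds, key)` is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[float, str]], window_seconds: int = 60
) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, key) using fixed, non-overlapping windows."""
    counts: Counter = Counter()
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0.5, "click"), (12.0, "click"), (61.0, "view"), (75.2, "click")]
print(tumbling_window_counts(events))
```

With a 60-second window, the first two clicks fall into the window starting at 0, and the remaining events fall into the window starting at 60.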
Amazon QuickSight provides machine
learning-powered business intelligence.
Amazon OpenSearch can be used for operational
analytics.
Amazon Redshift is used as a Cloud Data
Warehouse.
Amazon EMR provides the cloud big data
platform for processing vast amounts of data
using open source tools.
Amazon SageMaker and AWS AI services
can be used to build, train and deploy
machine learning models, and add
intelligence to your applications.
Amazon Redshift Spectrum and Amazon
Athena enable interactive querying,
analyzing, and processing capabilities.
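
Interactive querying of the data lake with Amazon Athena might look like the sketch below, which builds the arguments for boto3's `start_query_execution`. The SQL text, database name, and S3 output location are placeholders, and `build_athena_request` and `run_query` are hypothetical helpers. Submitting the query needs the boto3 package and AWS credentials, so that step is defined but not called.

```python
def build_athena_request(sql: str, database: str, output_location: str) -> dict:
    """Build keyword arguments for Athena.Client.start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql: str, database: str, output_location: str) -> str:
    """Submit the query and return its execution ID (requires boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without the SDK

    client = boto3.client("athena")
    response = client.start_query_execution(
        **build_athena_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]


# Placeholder query, database, and bucket for illustration only.
request = build_athena_request(
    "SELECT source, COUNT(*) AS n FROM events GROUP BY source",
    "data_lake_db",
    "s3://example-athena-results/queries/",
)
print(request["QueryExecutionContext"])  # prints "{'Database': 'data_lake_db'}"
```

Athena runs queries asynchronously, so a caller would normally poll the execution status with the returned ID before reading results.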
This version of the reference architecture diagram has been archived. For the latest version, see
https://docs.aws.amazon.com/architecture-diagrams/latest/modern-data-analytics-on-aws/modern-data-analytics-on-aws.html