VisionToSOP
Cloud Computing · Intermediate · New

Google Associate Data Practitioner

BigQuery, Dataflow, and the data lifecycle on Google Cloud

5 Modules · 150 Practice Questions · 5 Field Missions

ACDP

Google Cloud

$49 one-time purchase

Pay once. Own forever. No subscription.

Secure payment via Stripe.

Exam Details

Exam Code: ACDP
Exam Body: Google Cloud
Exam Fee: $125
Difficulty: Intermediate

Free Preview — Module 1

Module 1 — Data Preparation and Ingestion

Data is only as good as the pipeline that delivers it. Master the lifecycle from raw ingestion to validated production data using BigQuery, Dataflow, Pub/Sub, and Cloud Storage.

1. Strategic Foundations of Data Preparation

The Data Preparation Lifecycle: Discovery, Cleansing, Enrichment, and Validation, and why "garbage in, garbage out" is amplified in cloud-scale ML environments.

2. Exploring and Profiling Data with BigQuery

Schema-on-read profiling, descriptive statistics (COUNT, MIN, MAX, AVG, COUNT(DISTINCT)), distribution analysis, and identifying data quality issues before pipeline design.

3. Designing Scalable Ingestion Architectures

Batch (BigQuery load jobs from Cloud Storage) vs. streaming (Pub/Sub + Dataflow), and how to choose between them using the Three Vs: Volume, Velocity, and Variety.

4. Ensuring Data Quality through Cleansing and Validation

Schema enforcement, Dead-Letter Queues for invalid records, deduplication with DISTINCT/ROW_NUMBER(), and data normalization for consistent analytics.
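As a rough illustration of the cleansing and validation topics above, here is a minimal plain-Python sketch, assuming records arrive as dictionaries. The schema, the column names (order_id, amount, updated_at), and the helper functions are all hypothetical. In a real Google Cloud pipeline the dead-letter list could instead be a Pub/Sub dead-letter topic or a separate BigQuery error table, and deduplication would usually run as SQL inside BigQuery.

```python
from dataclasses import dataclass, field

# Hypothetical target schema: column name -> expected Python type.
SCHEMA = {"order_id": str, "amount": float, "updated_at": str}

@dataclass
class IngestResult:
    valid: list = field(default_factory=list)
    dead_letter: list = field(default_factory=list)  # (record, reason) pairs

def enforce_schema(records):
    """Route each record to `valid` or `dead_letter` based on SCHEMA."""
    result = IngestResult()
    for rec in records:
        missing = [col for col in SCHEMA if col not in rec]
        if missing:
            result.dead_letter.append((rec, f"missing columns: {missing}"))
            continue
        bad = [col for col, typ in SCHEMA.items() if not isinstance(rec[col], typ)]
        if bad:
            result.dead_letter.append((rec, f"wrong type: {bad}"))
            continue
        result.valid.append(rec)
    return result

def dedupe_latest(records, key="order_id", order_by="updated_at"):
    """Keep one row per key: the one with the greatest `order_by` value.

    Mirrors the SQL pattern
      ROW_NUMBER() OVER (PARTITION BY key ORDER BY order_by DESC) = 1
    ISO-8601 timestamp strings in one format compare correctly as text.
    """
    latest = {}
    for rec in records:
        current = latest.get(rec[key])
        if current is None or rec[order_by] > current[order_by]:
            latest[rec[key]] = rec
    return list(latest.values())

rows = [
    {"order_id": "A1", "amount": 10.0, "updated_at": "2024-01-01T00:00:00"},
    {"order_id": "A1", "amount": 12.5, "updated_at": "2024-01-02T00:00:00"},
    {"order_id": "B7", "amount": "oops", "updated_at": "2024-01-01T00:00:00"},
    {"order_id": "C3", "amount": 4.0},
]
result = enforce_schema(rows)
clean = dedupe_latest(result.valid)
print(len(clean), len(result.dead_letter))  # 1 2
```

Invalid rows are set aside with a reason rather than silently dropped, so they can be inspected and replayed later; that is the main point of a dead-letter queue.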

Sample Practice Questions

Question 1

A marketing firm receives thousands of ad-click events per second and needs to visualize this data in a real-time dashboard with a maximum 30-second delay. Which ingestion pattern should be implemented?

a. Use a cron job to run a BigQuery load job every hour from Cloud Storage
b. Buffer the events in Cloud Pub/Sub and use Cloud Dataflow to stream them into BigQuery
c. Use gsutil cp to move logs to a bucket and create an external table in BigQuery
d. Manually export the data from source apps as CSV files and upload to BigQuery

Correct answer: b. Among these options, only Pub/Sub + Dataflow supports continuous streaming that can meet a 30-second freshness target. Pub/Sub absorbs high-velocity event ingestion, and Dataflow processes the events and streams them into BigQuery as they arrive. The other options are batch or manual processes whose delay is far longer than 30 seconds.
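Streaming pipelines such as those built on Dataflow commonly group events into fixed (tumbling) windows before writing aggregates to BigQuery. The plain-Python sketch below only simulates that idea on an in-memory list; it is not the Apache Beam API, and the event shape (timestamp_seconds, ad_id) is an assumption for illustration.

```python
from collections import Counter

WINDOW_SECONDS = 30  # matches the 30-second freshness target in the question

def window_start(ts, size=WINDOW_SECONDS):
    """Align an integer event timestamp (seconds) to the start of its fixed window."""
    return ts - (ts % size)

def clicks_per_window(events, size=WINDOW_SECONDS):
    """Count clicks per (window_start, ad_id), like a fixed-window aggregation."""
    counts = Counter()
    for ts, ad_id in events:
        counts[(window_start(ts, size), ad_id)] += 1
    return dict(counts)

events = [(0, "ad-1"), (12, "ad-1"), (29, "ad-2"), (31, "ad-1"), (59, "ad-1")]
counts = clicks_per_window(events)
print(counts)  # {(0, 'ad-1'): 2, (0, 'ad-2'): 1, (30, 'ad-1'): 2}
```

Because each window covers only 30 seconds, its aggregate can be emitted shortly after the window closes, which an hourly load job cannot match.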

Question 2

During data profiling, you notice the customer_email column contains 'N/A' instead of valid email addresses or NULL. Converting these to standard NULL values is an example of:

a. Data Enrichment
b. Data Normalization
c. Data Cleansing
d. Data Discovery

Correct answer: c. Data Cleansing is the task of correcting or standardizing inaccurate, corrupted, or improperly formatted values. Discovery finds the problem; cleansing fixes it. Enrichment adds new data from other sources, and normalization, in the database-design sense, restructures tables to reduce redundancy.
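To connect discovery and cleansing, here is a minimal plain-Python sketch that profiles an email column for placeholder values and then converts them to None, Python's stand-in for SQL NULL. The set of placeholder tokens and the sample data are assumptions for illustration. In BigQuery itself, the profiling would be a COUNT / COUNT(DISTINCT) query, and the fix could use NULLIF or a CASE expression.

```python
# Hypothetical placeholder tokens that stand in for "no value".
PLACEHOLDERS = {"n/a", "na", "none", "null", ""}

def is_placeholder(v):
    """True for strings that only pretend to be a value, e.g. 'N/A'."""
    return v is not None and v.strip().lower() in PLACEHOLDERS

def profile(values):
    """Rough equivalents of COUNT(col), COUNT(DISTINCT col), and a placeholder count."""
    non_null = [v for v in values if v is not None]  # COUNT(col) skips NULLs
    return {
        "count": len(non_null),
        "distinct": len(set(non_null)),
        "placeholders": sum(1 for v in non_null if is_placeholder(v)),
    }

def cleanse(values):
    """Replace placeholder strings with None (a real NULL)."""
    return [None if is_placeholder(v) else v for v in values]

emails = ["a@example.com", "N/A", None, "n/a ", "b@example.com"]
before = profile(emails)           # discovery: spot the problem
after = profile(cleanse(emails))   # cleansing: fix it, then re-check
print(before)  # {'count': 4, 'distinct': 4, 'placeholders': 2}
print(after)   # {'count': 2, 'distinct': 2, 'placeholders': 0}
```

Re-running the same profile after cleansing is a cheap validation step: the placeholder count should drop to zero.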

Question 3

You are designing batch ingestion for 10,000 historical PDF manuals. You want scalable, metadata-taggable storage without managing a file server. Which Google Cloud service should you use?

a. Cloud Storage
b. Cloud SQL
c. Cloud Bigtable
d. Firestore

Correct answer: a. Cloud Storage is object storage for unstructured data such as PDFs; it scales virtually without limit and lets you attach custom metadata to each object. Cloud SQL is a managed relational database, Bigtable is a wide-column NoSQL database for high-throughput, low-latency workloads, and Firestore is a document database. None of them is designed to store raw binary files at this scale.

Full course includes 5 modules, 150 practice questions, and 5 field missions.

Need team licenses?

One purchase covers your entire crew. No per-seat fees.
