Google Associate Data Practitioner
BigQuery, Dataflow, and the data lifecycle on Google Cloud
5 Modules
150 Practice Questions
5 Field Missions
ACDP
Google Cloud
$49 one-time purchase
Pay once. Own forever. No subscription. Secure payment via Stripe.
Free Preview — Module 1
Module 1 — Data Preparation and Ingestion
Data is only as good as the pipeline that delivers it. Master the lifecycle from raw ingestion to validated production data using BigQuery, Dataflow, Pub/Sub, and Cloud Storage.
Strategic Foundations of Data Preparation
The Data Preparation Lifecycle: Discovery, Cleaning, Enrichment, and Validation — why 'garbage in, garbage out' is amplified in cloud-scale ML environments.
Exploring and Profiling Data with BigQuery
Schema-on-Read profiling, descriptive statistics (COUNT/MIN/MAX/AVG/DISTINCT), distribution analysis, and identifying data quality issues before pipeline design.
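The profiling statistics above can be sketched in a few lines. This is a minimal local illustration, assuming a single numeric column; the table `my_project.sales.orders` and column `order_total` in the SQL string are hypothetical. The Python function computes the same aggregates that the query would, ignoring NULLs the way SQL aggregate functions do.

```python
from collections.abc import Iterable
from typing import Any

# The BigQuery query this sketch mirrors (hypothetical table and column names).
PROFILE_SQL = """
SELECT
  COUNT(*)                     AS row_count,
  COUNTIF(order_total IS NULL) AS null_count,
  COUNT(DISTINCT order_total)  AS distinct_count,
  MIN(order_total)             AS min_value,
  MAX(order_total)             AS max_value,
  AVG(order_total)             AS avg_value
FROM `my_project.sales.orders`
"""


def profile_column(values: Iterable[Any]) -> dict[str, Any]:
    """Compute COUNT/MIN/MAX/AVG/DISTINCT-style statistics for one numeric column.

    None stands in for SQL NULL: it is counted in row_count and null_count
    but excluded from DISTINCT, MIN, MAX and AVG.
    """
    values = list(values)
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "min_value": min(non_null) if non_null else None,
        "max_value": max(non_null) if non_null else None,
        "avg_value": sum(non_null) / len(non_null) if non_null else None,
    }
```

For example, profiling [10.0, 25.5, None, 10.0, -3.0] reports one NULL and a negative minimum. Both are the kind of quality issue worth catching before any pipeline is designed.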
Designing Scalable Ingestion Architectures
Batch (BigQuery Load Jobs + Cloud Storage) vs. Streaming (Pub/Sub + Dataflow) — choosing based on the Three Vs: Volume, Velocity, and Variety.
Ensuring Data Quality through Cleansing and Validation
Schema enforcement, Dead-Letter Queues for invalid records, deduplication with DISTINCT/ROW_NUMBER(), and data normalization for consistent analytics.
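The sketch below combines schema enforcement, dead-letter routing and ROW_NUMBER()-style deduplication. It is a plain-Python illustration under stated assumptions: the three-field `SCHEMA` is hypothetical, and the dead-letter queue is modeled as an in-memory list. In a real pipeline that output would typically go to a separate Pub/Sub topic or BigQuery table. Deduplication keeps the newest row per key, which is the same result as filtering on `rn = 1` in the SQL shown in the docstring.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical schema: field name -> required Python type.
SCHEMA: dict[str, type] = {"event_id": str, "user_id": str, "ts": int}


@dataclass
class RoutedRecords:
    valid: list[dict[str, Any]] = field(default_factory=list)
    dead_letter: list[dict[str, Any]] = field(default_factory=list)


def enforce_schema(records: list[dict[str, Any]]) -> RoutedRecords:
    """Send each record to the valid output or to a dead-letter list with reasons."""
    routed = RoutedRecords()
    for rec in records:
        errors = [
            f"{name}: expected {typ.__name__}"
            for name, typ in SCHEMA.items()
            if not isinstance(rec.get(name), typ)
        ]
        if errors:
            # Keep the bad record and why it failed instead of dropping it silently.
            routed.dead_letter.append({"record": rec, "errors": errors})
        else:
            routed.valid.append(rec)
    return routed


def dedupe_latest(
    records: list[dict[str, Any]], key: str = "event_id", order_by: str = "ts"
) -> list[dict[str, Any]]:
    """Keep one row per key: the one with the largest `order_by` value.

    Same result as:
      SELECT * EXCEPT(rn) FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY ts DESC) AS rn
        FROM events)
      WHERE rn = 1
    """
    latest: dict[Any, dict[str, Any]] = {}
    for rec in records:
        k = rec[key]
        if k not in latest or rec[order_by] > latest[k][order_by]:
            latest[k] = rec
    return list(latest.values())
```

Routing records before deduplicating means malformed rows never reach the comparison on `ts`. The dead-letter entries keep enough context (the record plus its errors) to repair and replay them later.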
Sample Practice Questions
Question 1
A marketing firm receives thousands of ad-click events per second and needs to visualize this data in a real-time dashboard with a maximum 30-second delay. Which ingestion pattern should be implemented?
Pub/Sub + Dataflow is the pattern that meets a 30-second freshness target. Pub/Sub absorbs the high-velocity click events, and Dataflow processes them and streams the results into BigQuery continuously. The other answer choices are batch processes, which load data on a schedule and cannot keep a dashboard within 30 seconds of real time.
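A streaming job for this dashboard would typically group clicks into fixed (tumbling) windows. In Apache Beam, which Dataflow runs, that is `beam.WindowInto(window.FixedWindows(30))`. The sketch below shows the windowing arithmetic in plain Python, assuming event timestamps in seconds. It illustrates the idea only and is not the Beam API.

```python
from collections import Counter
from collections.abc import Iterable

WINDOW_SECONDS = 30  # matches the 30-second freshness target in the question


def window_start(event_ts: float, size: int = WINDOW_SECONDS) -> int:
    """Start time of the fixed (tumbling) window that contains event_ts."""
    return int(event_ts // size) * size


def clicks_per_window(
    event_timestamps: Iterable[float], size: int = WINDOW_SECONDS
) -> dict[int, int]:
    """Count click events per fixed window, keyed by window start time."""
    counts = Counter(window_start(ts, size) for ts in event_timestamps)
    return dict(sorted(counts.items()))
```

Each window closes after 30 seconds, so the dashboard can refresh once per window. A batch load that runs every 15 minutes could leave the data up to 15 minutes stale.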
Question 2
During data profiling, you notice the customer_email column contains 'N/A' instead of valid email addresses or NULL. Converting these to standard NULL values is an example of:
Data Cleansing is the specific task of correcting or standardizing inaccurate, corrupted, or improperly formatted data. Discovery finds the problem; Cleansing fixes it. Enrichment adds new data; Normalization reorganizes structure to reduce redundancy.
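This cleansing step can be written in BigQuery as `NULLIF(TRIM(customer_email), 'N/A')`. The same idea in Python is sketched below. The set of placeholder strings is an assumption, and in practice you would extend it with whatever values profiling actually turns up.

```python
from typing import Optional

# Placeholder strings treated as missing (an assumption; extend after profiling).
NULL_PLACEHOLDERS = {"n/a", "na", "none", "null", "-", ""}


def clean_email(value: Optional[str]) -> Optional[str]:
    """Map placeholder strings to None (SQL NULL); otherwise return the trimmed value."""
    if value is None:
        return None
    stripped = value.strip()
    if stripped.lower() in NULL_PLACEHOLDERS:
        return None
    return stripped
```

Turning placeholders into real NULLs means `COUNTIF(customer_email IS NULL)` and similar checks report the true number of missing emails. It also stops 'N/A' from being counted as a distinct value.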
Question 3
You are designing batch ingestion for 10,000 historical PDF manuals. You want scalable, metadata-taggable storage without managing a file server. Which Google Cloud service should you use?
Cloud Storage is object storage for unstructured data like PDFs. It scales virtually without limit, and each object can carry custom metadata. Cloud SQL is a managed relational database. Bigtable is a wide-column NoSQL database for high-throughput, low-latency key-based access. Firestore is a document database. None of these three is designed for storing raw binary files at this scale.
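A minimal upload sketch follows, assuming the `google-cloud-storage` client library and working credentials. The bucket name, the `manuals/<product_line>/` object layout, and the metadata keys are all hypothetical. `plan_upload` is kept as a pure function so the naming and tagging logic can be checked without touching Cloud Storage.

```python
from pathlib import Path


def plan_upload(pdf_path: str, product_line: str) -> tuple[str, dict[str, str]]:
    """Build a hypothetical object name and custom metadata for one manual."""
    object_name = f"manuals/{product_line}/{Path(pdf_path).name}"
    metadata = {"product_line": product_line, "source": "historical-archive"}
    return object_name, metadata


def upload_manual(bucket_name: str, pdf_path: str, product_line: str) -> str:
    """Upload one PDF with custom metadata (requires google-cloud-storage and credentials)."""
    # Imported here so plan_upload stays usable where the library is not installed.
    from google.cloud import storage

    object_name, metadata = plan_upload(pdf_path, product_line)
    blob = storage.Client().bucket(bucket_name).blob(object_name)
    blob.metadata = metadata  # stored as custom object metadata
    blob.upload_from_filename(pdf_path, content_type="application/pdf")
    return f"gs://{bucket_name}/{object_name}"
```

For 10,000 files you would usually parallelize the uploads, for example with a thread pool around `upload_manual` or with `gcloud storage cp`. Nothing in this design requires provisioning or managing a file server.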
Full course includes 5 modules, 150 practice questions, and 5 field missions.
Need team licenses?
One purchase covers your entire crew. No per-seat fees.