Real interview territory, not generic definitions. The AI draws on these areas and pushes into the sub-topics where you are weakest.
These are the kinds of scenario-based questions Joshua asks. In a live session they adapt to your answers and your target role.
A Spark job is slow because of skew on one key. How do you detect and fix it?
Explain Slowly Changing Dimensions and when you would use Type 2 over Type 1.
Design an idempotent daily pipeline that can safely re-run after a partial failure.
Compare exactly-once and at-least-once delivery in a Kafka pipeline. What does exactly-once really cost?
When would you choose Iceberg or Delta Lake over plain Parquet on object storage?
Yes. Spark batch ETL, Airflow orchestration and Kafka streaming are all in scope, along with data modeling and SQL.
The concepts are cloud-agnostic. Pair this track with the AWS or GCP track if your target role is platform-specific.
Yes. Expect window functions, optimization and skew questions, plus modeling and pipeline design.
Start a free Data Engineering mock interview now. Get scored live and see the ideal answer to every question.