A. Core Data Engineering Concepts
  SQL (joins, window functions, performance tuning)
  Data Modeling (star vs snowflake, normalization)
  ETL/ELT pipelines (batch vs streaming, orchestration tools like Airflow)
B. Apache Spark / PySpark
  Catalyst Optimizer & Tungsten
  Narrow vs Wide transformations
  Joins (broadcast, sort-merge), Skew handling
  AQE (Adaptive Query Execution)
  Partitioning, Predicate Pushdown
  Execution Plan (DAG → Stage → Tasks)
  Spark UI and Job Debugging
  SCD Type 2 Implementation in PySpark
C. AWS
  S3, Glue, Athena, Lambda, EMR, Redshift
  Event-driven design (S3 → EventBridge → Lambda)
  Security: IAM roles, bucket policies, encryption
  CI/CD in AWS (CodePipeline, CloudFormation)
D. Python
  Writing modular, reusable code
  Working with Pandas, Boto3 (for AWS interaction)
  Exception handling, logging
  Lambda functions and decorators
E. Kafka / Streaming
  Kafka topic partitioning, consumer groups
  Offset management
  Integration with Spark Structured Streaming
Senior Data Engineer Interview Questions
2,569 senior data engineer interview questions shared by candidates
PySpark memory optimization, different types of keys in SQL
Questions about Glue and Lambda, some questions on Python, ChatGPT, tuples vs lists in Python, and DynamoDB
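For the tuple vs list part of the question above, a minimal sketch of the two differences interviewers usually look for: mutability and hashability. The variable names are illustrative only.

```python
# Lists are mutable; tuples are immutable.
point_list = [1, 2]
point_tuple = (1, 2)

point_list.append(3)  # allowed: a list can grow in place

try:
    point_tuple[0] = 99  # a tuple rejects item assignment
    tuple_mutated = True
except TypeError:
    tuple_mutated = False

# A tuple of hashable items can be a dict key; a list cannot be hashed.
labels = {point_tuple: "A"}
try:
    hash(point_list)
    list_hashable = True
except TypeError:
    list_hashable = False
```

In practice this is why tuples are used for fixed records and dictionary keys, while lists are used for collections that change over time.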
Remove duplicates, BigQuery partitions, 3rd highest salary, how Airflow works, my previous project
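The duplicate-removal and 3rd-highest-salary questions above can be sketched with Python's built-in `sqlite3` module. The `employee` table, its columns, and the sample rows are assumptions made up for illustration, and `rowid` is SQLite-specific; other engines would typically use `ROW_NUMBER()` for the deduplication step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, salary INTEGER)")
rows = [
    (1, "Ann", 100),
    (2, "Bob", 300),
    (2, "Bob", 300),  # exact duplicate of the previous row
    (3, "Cy", 200),
    (4, "Di", 300),   # ties with Bob on salary
    (5, "Ed", 50),
]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)

# Remove duplicates: keep the lowest rowid within each group of identical rows.
conn.execute("""
    DELETE FROM employee
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM employee GROUP BY emp_id, name, salary
    )
""")
remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

# 3rd highest salary: rank distinct salaries so ties count once, skip the top two.
third_highest = conn.execute("""
    SELECT DISTINCT salary FROM employee
    ORDER BY salary DESC
    LIMIT 1 OFFSET 2
""").fetchone()[0]
```

Using `DISTINCT` before `LIMIT ... OFFSET` means tied salaries share a rank, which is usually the behavior the question intends; it is worth confirming that with the interviewer.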
Explain GCP services and Data flow pipelines
Tell us about how you solved a problem you've experienced before. Try to describe it in detail
Consider an employee DataFrame with columns empID, department, and salary. Get the minimum, maximum, and average salary and the employee count for each department. Write it in a single statement.
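One possible answer to the question above, sketched with pandas named aggregation. The question does not say which DataFrame library is meant, so pandas is an assumption here, and the sample data and output column names are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data matching the columns named in the question.
employees = pd.DataFrame({
    "empID": [1, 2, 3, 4, 5],
    "department": ["eng", "eng", "eng", "ops", "ops"],
    "salary": [100, 200, 300, 150, 250],
})

# Single statement: group by department and compute all four aggregates.
summary = employees.groupby("department").agg(
    min_salary=("salary", "min"),
    max_salary=("salary", "max"),
    avg_salary=("salary", "mean"),
    employee_count=("empID", "count"),
)
```

If the interviewer means a PySpark DataFrame, the same shape applies: `df.groupBy("department").agg(F.min("salary"), F.max("salary"), F.avg("salary"), F.count("empID"))`, with `F` being `pyspark.sql.functions`.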
Received an online assessment test, followed later by a code review of the submission.
Why does it take so long to release the offer letter after the interview is done?
1. JobMaster, Dispatcher, and ResourceManager in Flink
2. What is Sqoop?
3. Schema evolution handling