Basic Questions
What is Apache Kafka?
- Answer: Kafka is a distributed streaming platform used for building real-time data pipelines and streaming applications. It can handle large volumes of data efficiently and is used for publish/subscribe messaging.
What are the main components of Kafka?
- Answer:
- Producer: Sends data to Kafka topics.
- Consumer: Reads data from Kafka topics.
- Broker: A Kafka server that stores data.
- Topic: A category or stream name where data is published.
- Partition: Sub-divisions of topics for parallel processing.
- ZooKeeper (deprecated in newer versions): Manages Kafka metadata.
How does Kafka achieve fault tolerance?
- Answer: Kafka achieves fault tolerance through replication. Each partition can have multiple replicas, and one of them acts as the leader, while the others act as followers. If the leader fails, one of the followers is promoted as the new leader.
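The failover described above can be sketched with a toy model. This is not Kafka's controller logic; the `Partition` class, its fields, and the "promote the first remaining replica" rule are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy model of one partition's replica set (hypothetical, not a Kafka API)."""
    leader: int                               # broker id of the current leader
    isr: list = field(default_factory=list)   # in-sync replicas, leader included

    def fail_broker(self, broker_id: int) -> None:
        """Remove a failed broker and, if it was the leader, promote a follower."""
        if broker_id in self.isr:
            self.isr.remove(broker_id)
        if broker_id == self.leader:
            if not self.isr:
                raise RuntimeError("no in-sync replica left to promote")
            # Assumption: the first remaining in-sync replica becomes leader.
            self.leader = self.isr[0]


partition = Partition(leader=1, isr=[1, 2, 3])
partition.fail_broker(1)
print(partition.leader, partition.isr)  # prints: 2 [2, 3]
```

Losing a follower leaves the leader unchanged; only losing the leader triggers a promotion.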
What is a Kafka topic?
- Answer: A topic is a logical channel where messages are written by producers and read by consumers.
What is a partition in Kafka, and why is it important?
- Answer: A partition is a sub-division of a Kafka topic that allows data to be distributed across multiple brokers for parallelism and scalability.
Intermediate Questions
How does Kafka ensure message durability?
- Answer: Kafka writes messages to disk and replicates them across multiple brokers. The producer's acks configuration determines how many broker acknowledgments are required before a write is treated as successful.
What is the role of ZooKeeper in Kafka?
- Answer: ZooKeeper manages the metadata for Kafka, such as broker information, topic configurations, and leader elections. In newer versions, Kafka has replaced ZooKeeper with the Kafka Raft Protocol (KRaft) for metadata management.
What are consumer groups in Kafka?
- Answer: A consumer group is a group of consumers working together to consume messages from a topic. Each partition is assigned to only one consumer within a group, ensuring that messages are processed in parallel but not duplicated.
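To make the "one owner per partition" rule concrete, here is a minimal sketch of a round-robin assignment. The function name `assign_partitions` is hypothetical, and Kafka's built-in assignors are more involved than this; the sketch only shows the invariant that each partition ends up with exactly one consumer in the group.

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment: every partition gets exactly one owner in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


print(assign_partitions([0, 1, 2, 3, 4], ["c1", "c2"]))
# prints: {'c1': [0, 2, 4], 'c2': [1, 3]}
```

If the group has more consumers than the topic has partitions, the extra consumers receive an empty list and sit idle.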
What is the difference between log.retention.ms and log.retention.bytes?
- Answer:
- log.retention.ms: The maximum time Kafka retains a log segment before it becomes eligible for deletion.
- log.retention.bytes: The maximum size a partition's log may reach before Kafka deletes its oldest segments.
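A rough sketch of how the two retention limits interact is shown below. It is a simplified model, not broker code: the `apply_retention` function, the `(age_ms, size_bytes)` tuple layout, and the rule that the active segment is never removed are all assumptions for illustration.

```python
def apply_retention(segments, retention_ms=None, retention_bytes=None):
    """Return the segments that survive, oldest first.

    segments: list of (age_ms, size_bytes) tuples, oldest first. The last one
    is treated as the active segment and is never removed in this toy model.
    """
    closed, active = list(segments[:-1]), list(segments[-1:])
    if retention_ms is not None:
        # Time limit: drop closed segments older than the limit.
        closed = [seg for seg in closed if seg[0] <= retention_ms]
    if retention_bytes is not None:
        excess = sum(size for _, size in closed + active) - retention_bytes
        # Size limit: drop oldest segments only while the log stays at or above the limit.
        while closed and excess - closed[0][1] >= 0:
            excess -= closed.pop(0)[1]
    return closed + active


segments = [(900, 100), (600, 100), (300, 100), (10, 50)]
print(apply_retention(segments, retention_ms=700))
print(apply_retention(segments, retention_bytes=200))
```

Either limit alone can trigger deletion; when both are set, a segment is removed as soon as one of them applies.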
How does Kafka handle message ordering?
- Answer: Kafka guarantees message order only within a partition. It provides no ordering across partitions, so a total order requires either a single-partition topic or additional coordination in the application.
Advanced Questions
How does Kafka handle backpressure?
- Answer: Kafka has no explicit backpressure mechanism. Because consumers pull data, each one controls its own consumption rate by polling at its own pace. On the producer side, when brokers cannot keep up because of disk or network constraints, the client's send buffer fills and send() blocks (up to max.block.ms) before failing.
What are ISR (In-Sync Replicas) in Kafka?
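- Answer: ISR is the set of replicas, the leader included, that are caught up with the leader's log within replica.lag.time.max.ms. A follower that falls behind is dropped from the ISR until it catches up. These replicas ensure data consistency and durability.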
What happens when a Kafka producer sends a message to a topic with no leader for the partition?
- Answer: The producer will retry based on its retry configuration. If no leader is elected within the retry window, the producer throws an error.
Explain Kafka’s exactly-once semantics.
- Answer: Kafka ensures exactly-once processing using idempotent producers and transactional messaging, which guarantees that messages are not duplicated during retries or failures.
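The idempotent-producer half of this can be illustrated with a toy log that ignores retried writes. This is a sketch under stated assumptions: `ToyPartitionLog` is a hypothetical class, and it only tracks the last sequence number per producer. The real broker does more (for example, it also rejects gaps in the sequence), and transactions are not modeled here at all.

```python
class ToyPartitionLog:
    """Toy partition log that deduplicates by (producer_id, sequence)."""

    def __init__(self):
        self.records = []
        self.last_sequence = {}  # producer_id -> last sequence number appended

    def append(self, producer_id, sequence, value):
        """Append once; a retried (producer_id, sequence) pair is ignored."""
        if sequence <= self.last_sequence.get(producer_id, -1):
            return False  # duplicate caused by a retry, nothing is written
        self.records.append(value)
        self.last_sequence[producer_id] = sequence
        return True


log = ToyPartitionLog()
log.append("p1", 0, "order-created")
log.append("p1", 0, "order-created")  # retry of the same send
log.append("p1", 1, "order-paid")
print(log.records)  # prints: ['order-created', 'order-paid']
```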
What is the difference between Kafka Streams and Kafka Connect?
- Answer:
- Kafka Streams: A library for processing and transforming data in Kafka topics.
- Kafka Connect: A tool for integrating Kafka with external systems like databases, files, etc.
How is Kafka different from traditional message brokers like RabbitMQ or ActiveMQ?
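- Answer: Kafka is built on a distributed, replicated, partitioned log, which gives it high throughput and durability. Unlike traditional brokers, which typically remove a message once it has been consumed and acknowledged, Kafka retains messages for a configured period and lets each consumer track its own position, so data can be replayed. This makes it well suited to large-scale, event-driven architectures.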
What is the role of partition keys in Kafka?
- Answer: Partition keys determine the partition to which a message is sent, ensuring all messages with the same key go to the same partition, enabling message ordering for those keys.
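The key-to-partition mapping can be sketched as a hash modulo the partition count. Note the assumption: `zlib.crc32` is used here only as a stand-in hash so the example runs with the standard library; it is not the hash Kafka's default partitioner uses, and `partition_for` is a hypothetical helper.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition: same key and partition count, same partition."""
    # Stand-in hash for illustration; Kafka's own partitioner hashes differently.
    return zlib.crc32(key) % num_partitions


first = partition_for(b"user-42", 6)
second = partition_for(b"user-42", 6)
print(first == second)  # prints: True
```

Because the result depends on the partition count, adding partitions to a topic can move a key to a different partition, which breaks ordering for that key across the change.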
What is the difference between Kafka’s acks=0, acks=1, and acks=all?
- Answer:
- acks=0: Producer does not wait for any acknowledgment.
- acks=1: Producer waits for the leader's acknowledgment.
- acks=all: Producer waits for all in-sync replicas (ISR) to acknowledge.
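The three acks levels can be compared with a small decision function. This is a toy model, not the producer client: `is_acknowledged` and its parameters are hypothetical, and broker-side settings such as min.insync.replicas are deliberately left out.

```python
def is_acknowledged(acks, leader_wrote, followers_in_isr_wrote):
    """Decide whether the producer treats the send as successful in this toy model."""
    if acks == "0":
        return True  # fire and forget: no broker response is awaited
    if acks == "1":
        return leader_wrote  # only the leader's write matters
    if acks == "all":
        # Leader plus every follower currently in the ISR must have the record.
        return leader_wrote and all(followers_in_isr_wrote)
    raise ValueError(f"unsupported acks value: {acks!r}")


# Leader has the record, one of two in-sync followers does not yet.
for level in ("0", "1", "all"):
    print(level, is_acknowledged(level, True, [True, False]))
```

With acks=0 the send counts as successful even if the leader never wrote the record, which is why it offers the weakest durability.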
What are the key metrics to monitor in Kafka?
- Answer:
- Consumer lag.
- Under-replicated partition count.
- Broker disk usage.
- Request latency.
- Throughput (producer and consumer).
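Consumer lag, the first metric in the list, is the gap between the newest offset in a partition and the offset the group has committed. A minimal sketch follows; the `consumer_lag` helper and its dictionary inputs are hypothetical, and treating a partition with no committed offset as starting from 0 is an assumption made here.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag = log end offset - committed offset."""
    return {
        partition: end - committed_offsets.get(partition, 0)  # assumption: no commit means 0
        for partition, end in log_end_offsets.items()
    }


print(consumer_lag({0: 100, 1: 50}, {0: 90, 1: 50}))  # prints: {0: 10, 1: 0}
```

A lag that keeps growing over time usually means the consumers are processing more slowly than producers are writing.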
How do you handle large messages in Kafka?
- Answer:
- Increase message.max.bytes (broker) and fetch.message.max.bytes (consumer; called max.partition.fetch.bytes in the Java client).
- Use external storage (e.g., S3) and store references to the messages in Kafka.
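The external-storage option can be sketched as follows. Everything here is hypothetical: `ToyObjectStore` is an in-memory stand-in for a service like S3, and `to_record` / `from_record` and the 1,000,000-byte threshold are illustrative choices, not Kafka APIs or defaults.

```python
import uuid


class ToyObjectStore:
    """In-memory stand-in for external storage such as S3."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = str(uuid.uuid4())
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def to_record(payload: bytes, store, max_inline_bytes=1_000_000):
    """Build the value to publish: small payloads inline, large ones by reference."""
    if len(payload) <= max_inline_bytes:
        return {"inline": payload}
    return {"ref": store.put(payload)}


def from_record(record, store) -> bytes:
    """Resolve a consumed record back to the original payload."""
    return record["inline"] if "inline" in record else store.get(record["ref"])


store = ToyObjectStore()
record = to_record(b"x" * 2_000_000, store)
print("ref" in record, len(from_record(record, store)))  # prints: True 2000000
```

The trade-off is that consumers now depend on the external store being available, and the stored objects need their own retention policy.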