When to create new Consumer in ConsumerGroup
I am newbie in Kafka world and was reading about Consumer and ConsumerGroup.I got the difference between them and understand why we need ConsumerGroup in Kafka.
But here my question is When we should decide when to create new Consumer within same Group.
When we have huge amount of data?
Could someone help me to understand any real use case.
Thanks
apache-kafka
add a comment |
I am newbie in Kafka world and was reading about Consumer and ConsumerGroup.I got the difference between them and understand why we need ConsumerGroup in Kafka.
But here my question is When we should decide when to create new Consumer within same Group.
When we have huge amount of data?
Could someone help me to understand any real use case.
Thanks
apache-kafka
add a comment |
I am newbie in Kafka world and was reading about Consumer and ConsumerGroup.I got the difference between them and understand why we need ConsumerGroup in Kafka.
But here my question is When we should decide when to create new Consumer within same Group.
When we have huge amount of data?
Could someone help me to understand any real use case.
Thanks
apache-kafka
I am newbie in Kafka world and was reading about Consumer and ConsumerGroup.I got the difference between them and understand why we need ConsumerGroup in Kafka.
But here my question is When we should decide when to create new Consumer within same Group.
When we have huge amount of data?
Could someone help me to understand any real use case.
Thanks
apache-kafka
apache-kafka
asked Nov 21 '18 at 7:26


Still LearningStill Learning
5403827
5403827
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
I think some very good points have already been mentioned and here are my few cents. As your primary question seems to be "When" to add a consumer in a group...
There are 2 scenarios I could think of:
If one or more consumers in a Consumer group are overloaded by consumption from multiple partitions and you intend to distribute that load and increase parallelism. In this case, you could add consumers and trigger a rebalance.
If the partitions in a topic are increasing. This is quite a tricky scenario and may disturb the existing consumers in some ways. Following are a few examples of when this might happen:
a) If the semantics of your data are changing as partitioning a topic
based on the semantics is quite a common use case
b) If the data volume is increasing and the semantics are also changing
c) If only the volume is increasing that is leading to Scenario 1
However, as you've pointed out in your question - if only the volume is increasing and the consumers in a group are nicely mapped to the partitions on a 1-to-1 basis then you may be better off leaving things as they are. Otherwise, you might end up in the Scenario 2b.
Hope this helps!
add a comment |
In Apache Kafka, the level of parallelism is defined by the number of partitions. The higher the number of partitions, the higher the level of parallelism one can achieve. Depending on the volume of data, you should set the number of partitions to the desired value. Note that you can not have more active consumers than number of partitions.
For example, assume that you have a topic test
with 5 partitions and a consumer group test-group
. At any given time, only 5 consumers can be active withing test-group
. Say we've got 1000 messages in topic test
, then each of the 5 active consumers will consume (approximately) 200 messages. In case you run more than 5 partitions, the remaining will be inactive meaning that they won't consumer any messages at all. Similarly, if you have less consumers than partitions, then some of your active consumers will consumer messages from more than one partition.
Another -less straight-forward- example would be the following (taken from):
In this scenario, we do have two topics (A
and B
), each of which has 3 partitions. Two consumers belonging to the same consumer group are consuming messages from both topics.
add a comment |
As mentioned above, Kafka scales the topic consumption by distributing partitions among a consumer group. A consumer group is nothing, but a set of consumers sharing the common identifier.
A consumer is responsible to consumer messages from one or more partitions. If there is a single consumer running in the consumer group, it will consume data from all partitions. If there are multiple consumers running with in same group, they distribute the load in consumes from different-different partitions.
Maximum number of consumers are equal to the maximum number of partitions. If the consumers number exceeds than number of partitions, excessive consumers will be idle.
Let's say if there is a topic with 4 partitions. There are two consumer groups A and B. Group A has two consumers C1,C2. Both consumers will consume from approx 2 and 2 partitions.
While in Consumer Group B, there are 4 consumers, each consumer will consume from one partition.
When to use single consumer or multiple consumer : It depends on the use case. If you want a consolidated output from the processing where the calculations are based on the entire data in the topic, you should use single consumer unless you have a post processing logic to merge the output from each consumer.
If you are just reading the data and want to parallelize the process by distributing load, use multiple consumers
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53407112%2fwhen-to-create-new-consumer-in-consumergroup%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
I think some very good points have already been mentioned and here are my few cents. As your primary question seems to be "When" to add a consumer in a group...
There are 2 scenarios I could think of:
If one or more consumers in a Consumer group are overloaded by consumption from multiple partitions and you intend to distribute that load and increase parallelism. In this case, you could add consumers and trigger a rebalance.
If the partitions in a topic are increasing. This is quite a tricky scenario and may disturb the existing consumers in some ways. Following are a few examples of when this might happen:
a) If the semantics of your data are changing as partitioning a topic
based on the semantics is quite a common use case
b) If the data volume is increasing and the semantics are also changing
c) If only the volume is increasing that is leading to Scenario 1
However, as you've pointed out in your question - if only the volume is increasing and the consumers in a group are nicely mapped to the partitions on a 1-to-1 basis then you may be better off leaving things as they are. Otherwise, you might end up in the Scenario 2b.
Hope this helps!
add a comment |
I think some very good points have already been mentioned and here are my few cents. As your primary question seems to be "When" to add a consumer in a group...
There are 2 scenarios I could think of:
If one or more consumers in a Consumer group are overloaded by consumption from multiple partitions and you intend to distribute that load and increase parallelism. In this case, you could add consumers and trigger a rebalance.
If the partitions in a topic are increasing. This is quite a tricky scenario and may disturb the existing consumers in some ways. Following are a few examples of when this might happen:
a) If the semantics of your data are changing as partitioning a topic
based on the semantics is quite a common use case
b) If the data volume is increasing and the semantics are also changing
c) If only the volume is increasing that is leading to Scenario 1
However, as you've pointed out in your question - if only the volume is increasing and the consumers in a group are nicely mapped to the partitions on a 1-to-1 basis then you may be better off leaving things as they are. Otherwise, you might end up in the Scenario 2b.
Hope this helps!
add a comment |
I think some very good points have already been mentioned and here are my few cents. As your primary question seems to be "When" to add a consumer in a group...
There are 2 scenarios I could think of:
If one or more consumers in a Consumer group are overloaded by consumption from multiple partitions and you intend to distribute that load and increase parallelism. In this case, you could add consumers and trigger a rebalance.
If the partitions in a topic are increasing. This is quite a tricky scenario and may disturb the existing consumers in some ways. Following are a few examples of when this might happen:
a) If the semantics of your data are changing as partitioning a topic
based on the semantics is quite a common use case
b) If the data volume is increasing and the semantics are also changing
c) If only the volume is increasing that is leading to Scenario 1
However, as you've pointed out in your question - if only the volume is increasing and the consumers in a group are nicely mapped to the partitions on a 1-to-1 basis then you may be better off leaving things as they are. Otherwise, you might end up in the Scenario 2b.
Hope this helps!
I think some very good points have already been mentioned and here are my few cents. As your primary question seems to be "When" to add a consumer in a group...
There are 2 scenarios I could think of:
If one or more consumers in a Consumer group are overloaded by consumption from multiple partitions and you intend to distribute that load and increase parallelism. In this case, you could add consumers and trigger a rebalance.
If the partitions in a topic are increasing. This is quite a tricky scenario and may disturb the existing consumers in some ways. Following are a few examples of when this might happen:
a) If the semantics of your data are changing as partitioning a topic
based on the semantics is quite a common use case
b) If the data volume is increasing and the semantics are also changing
c) If only the volume is increasing that is leading to Scenario 1
However, as you've pointed out in your question - if only the volume is increasing and the consumers in a group are nicely mapped to the partitions on a 1-to-1 basis then you may be better off leaving things as they are. Otherwise, you might end up in the Scenario 2b.
Hope this helps!
answered Nov 21 '18 at 12:32


LalitLalit
406210
406210
add a comment |
add a comment |
In Apache Kafka, the level of parallelism is defined by the number of partitions. The higher the number of partitions, the higher the level of parallelism one can achieve. Depending on the volume of data, you should set the number of partitions to the desired value. Note that you can not have more active consumers than number of partitions.
For example, assume that you have a topic test
with 5 partitions and a consumer group test-group
. At any given time, only 5 consumers can be active withing test-group
. Say we've got 1000 messages in topic test
, then each of the 5 active consumers will consume (approximately) 200 messages. In case you run more than 5 partitions, the remaining will be inactive meaning that they won't consumer any messages at all. Similarly, if you have less consumers than partitions, then some of your active consumers will consumer messages from more than one partition.
Another -less straight-forward- example would be the following (taken from):
In this scenario, we do have two topics (A
and B
), each of which has 3 partitions. Two consumers belonging to the same consumer group are consuming messages from both topics.
add a comment |
In Apache Kafka, the level of parallelism is defined by the number of partitions. The higher the number of partitions, the higher the level of parallelism one can achieve. Depending on the volume of data, you should set the number of partitions to the desired value. Note that you can not have more active consumers than number of partitions.
For example, assume that you have a topic test
with 5 partitions and a consumer group test-group
. At any given time, only 5 consumers can be active withing test-group
. Say we've got 1000 messages in topic test
, then each of the 5 active consumers will consume (approximately) 200 messages. In case you run more than 5 partitions, the remaining will be inactive meaning that they won't consumer any messages at all. Similarly, if you have less consumers than partitions, then some of your active consumers will consumer messages from more than one partition.
Another -less straight-forward- example would be the following (taken from):
In this scenario, we do have two topics (A
and B
), each of which has 3 partitions. Two consumers belonging to the same consumer group are consuming messages from both topics.
add a comment |
In Apache Kafka, the level of parallelism is defined by the number of partitions. The higher the number of partitions, the higher the level of parallelism one can achieve. Depending on the volume of data, you should set the number of partitions to the desired value. Note that you can not have more active consumers than number of partitions.
For example, assume that you have a topic test
with 5 partitions and a consumer group test-group
. At any given time, only 5 consumers can be active withing test-group
. Say we've got 1000 messages in topic test
, then each of the 5 active consumers will consume (approximately) 200 messages. In case you run more than 5 partitions, the remaining will be inactive meaning that they won't consumer any messages at all. Similarly, if you have less consumers than partitions, then some of your active consumers will consumer messages from more than one partition.
Another -less straight-forward- example would be the following (taken from):
In this scenario, we do have two topics (A
and B
), each of which has 3 partitions. Two consumers belonging to the same consumer group are consuming messages from both topics.
In Apache Kafka, the level of parallelism is defined by the number of partitions. The higher the number of partitions, the higher the level of parallelism one can achieve. Depending on the volume of data, you should set the number of partitions to the desired value. Note that you can not have more active consumers than number of partitions.
For example, assume that you have a topic test
with 5 partitions and a consumer group test-group
. At any given time, only 5 consumers can be active withing test-group
. Say we've got 1000 messages in topic test
, then each of the 5 active consumers will consume (approximately) 200 messages. In case you run more than 5 partitions, the remaining will be inactive meaning that they won't consumer any messages at all. Similarly, if you have less consumers than partitions, then some of your active consumers will consumer messages from more than one partition.
Another -less straight-forward- example would be the following (taken from):
In this scenario, we do have two topics (A
and B
), each of which has 3 partitions. Two consumers belonging to the same consumer group are consuming messages from both topics.
edited Nov 21 '18 at 7:54
answered Nov 21 '18 at 7:46


Giorgos MyrianthousGiorgos Myrianthous
4,16021437
4,16021437
add a comment |
add a comment |
As mentioned above, Kafka scales the topic consumption by distributing partitions among a consumer group. A consumer group is nothing, but a set of consumers sharing the common identifier.
A consumer is responsible to consumer messages from one or more partitions. If there is a single consumer running in the consumer group, it will consume data from all partitions. If there are multiple consumers running with in same group, they distribute the load in consumes from different-different partitions.
Maximum number of consumers are equal to the maximum number of partitions. If the consumers number exceeds than number of partitions, excessive consumers will be idle.
Let's say if there is a topic with 4 partitions. There are two consumer groups A and B. Group A has two consumers C1,C2. Both consumers will consume from approx 2 and 2 partitions.
While in Consumer Group B, there are 4 consumers, each consumer will consume from one partition.
When to use single consumer or multiple consumer : It depends on the use case. If you want a consolidated output from the processing where the calculations are based on the entire data in the topic, you should use single consumer unless you have a post processing logic to merge the output from each consumer.
If you are just reading the data and want to parallelize the process by distributing load, use multiple consumers
add a comment |
As mentioned above, Kafka scales the topic consumption by distributing partitions among a consumer group. A consumer group is nothing, but a set of consumers sharing the common identifier.
A consumer is responsible to consumer messages from one or more partitions. If there is a single consumer running in the consumer group, it will consume data from all partitions. If there are multiple consumers running with in same group, they distribute the load in consumes from different-different partitions.
Maximum number of consumers are equal to the maximum number of partitions. If the consumers number exceeds than number of partitions, excessive consumers will be idle.
Let's say if there is a topic with 4 partitions. There are two consumer groups A and B. Group A has two consumers C1,C2. Both consumers will consume from approx 2 and 2 partitions.
While in Consumer Group B, there are 4 consumers, each consumer will consume from one partition.
When to use single consumer or multiple consumer : It depends on the use case. If you want a consolidated output from the processing where the calculations are based on the entire data in the topic, you should use single consumer unless you have a post processing logic to merge the output from each consumer.
If you are just reading the data and want to parallelize the process by distributing load, use multiple consumers
add a comment |
As mentioned above, Kafka scales the topic consumption by distributing partitions among a consumer group. A consumer group is nothing, but a set of consumers sharing the common identifier.
A consumer is responsible to consumer messages from one or more partitions. If there is a single consumer running in the consumer group, it will consume data from all partitions. If there are multiple consumers running with in same group, they distribute the load in consumes from different-different partitions.
Maximum number of consumers are equal to the maximum number of partitions. If the consumers number exceeds than number of partitions, excessive consumers will be idle.
Let's say if there is a topic with 4 partitions. There are two consumer groups A and B. Group A has two consumers C1,C2. Both consumers will consume from approx 2 and 2 partitions.
While in Consumer Group B, there are 4 consumers, each consumer will consume from one partition.
When to use single consumer or multiple consumer : It depends on the use case. If you want a consolidated output from the processing where the calculations are based on the entire data in the topic, you should use single consumer unless you have a post processing logic to merge the output from each consumer.
If you are just reading the data and want to parallelize the process by distributing load, use multiple consumers
As mentioned above, Kafka scales the topic consumption by distributing partitions among a consumer group. A consumer group is nothing, but a set of consumers sharing the common identifier.
A consumer is responsible to consumer messages from one or more partitions. If there is a single consumer running in the consumer group, it will consume data from all partitions. If there are multiple consumers running with in same group, they distribute the load in consumes from different-different partitions.
Maximum number of consumers are equal to the maximum number of partitions. If the consumers number exceeds than number of partitions, excessive consumers will be idle.
Let's say if there is a topic with 4 partitions. There are two consumer groups A and B. Group A has two consumers C1,C2. Both consumers will consume from approx 2 and 2 partitions.
While in Consumer Group B, there are 4 consumers, each consumer will consume from one partition.
When to use single consumer or multiple consumer : It depends on the use case. If you want a consolidated output from the processing where the calculations are based on the entire data in the topic, you should use single consumer unless you have a post processing logic to merge the output from each consumer.
If you are just reading the data and want to parallelize the process by distributing load, use multiple consumers
answered Nov 21 '18 at 9:05
Nishu TayalNishu Tayal
12.2k73481
12.2k73481
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53407112%2fwhen-to-create-new-consumer-in-consumergroup%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown