Cassandra sequential repair does not repair all nodes on one run?
The day before yesterday, I issued a full sequential repair on one of the nodes in my 5-node Cassandra cluster, for a single table, using the command below.
nodetool repair -full -seq -tr <keyspace> <table> > <logfile>
The node on which the command was issued was repaired properly, as can be inferred from the output of the command below.
nodetool cfstats -H <keyspace.columnFamily>
The same, however, cannot be said about the other nodes: for them I get a seemingly random repair %, significantly lower.
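For reference, a loop along these lines can collect the repaired percentage from every node (host names are placeholders, and this relies on the "Percent repaired" line that recent Cassandra versions print in cfstats output):
# Placeholder host names; substitute the addresses of your own nodes.
for host in node1 node2 node3 node4 node5; do
  echo "== $host =="
  ssh "$host" nodetool cfstats -H <keyspace.columnFamily> | grep -i 'percent repaired'
done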
I am not sure what is happening here; it looks like the only node that was repaired for the keyspace and column family was the node on which the repair command was issued. Any guesses on what might be going on, or how to properly investigate this issue?
Thanks!
cassandra datastax scylla
edited Jan 2 at 22:12 by Nadav Har'El
asked Jan 2 at 12:56 by Naman Gupta
1 Answer
You said your cluster has 5 nodes, but not which replication factor (RF) you are using for your table, so I'll assume the common RF=3. With RF=3, each piece of data is replicated 3 times across the five nodes.
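If you're not sure which RF your table's keyspace uses, you can check its definition, for example with cqlsh (the keyspace name below is a placeholder):
# Prints the CREATE KEYSPACE statement, including its replication settings.
cqlsh -e "DESCRIBE KEYSPACE <keyspace>;"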
The key point you have missed is that in such a setup each specific node does not contain all the data. How much of the total data does it contain? Let's do some simple math: if the amount of actual data inserted into the table is X, then the total amount of data stored by the cluster is 3*X (since RF=3, there are three copies of each piece of data). This total is spread across 5 nodes, so each node will hold (3*X)/5, i.e., 3/5*X.
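You can see this share directly: when you give "nodetool status" a keyspace argument, its "Owns (effective)" column reports each node's effective ownership, which for RF=3 on 5 evenly-balanced nodes should read roughly 60% per node.
# With RF=3 across 5 balanced nodes, expect "Owns (effective)" of about 60% each.
nodetool status <keyspace>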
When you start a repair on one specific node, it repairs only the data which this node holds, i.e., as we just calculated, 3/5 of the total data. For each piece of data held by this node, the repair compares it against the copies held by the other replicas, finds any inconsistencies, and fixes all these copies. This means that when the repair is over, all the data on the node we repaired has been repaired. But on the other nodes, not all the data was repaired - only the part which intersects with the data held by the node that initiated the repair. That intersection should be roughly 3/5 * 3/5 = 9/25, i.e., 36% of the data (of course everything is distributed randomly, so you're likely to get a number close to 36%, not exactly 36%).
So, as you probably realized by now, "nodetool repair" is not a cluster-wide operation. If you start it on one node, it is only guaranteed to repair all the data on that one node, and may repair less on the other nodes. You must therefore run the repair on each of the nodes separately.
Now you may be asking: since repairing node 1 also repaired 36% of node 2, wouldn't it be a waste to also repair node 2, given that we already did 36% of the work? Indeed, it would. So Cassandra has a repair option "-pr" ("primary range") which ensures that only one of the 3 replicas of each piece of data will repair it. With RF=3, "nodetool repair -pr" will be three times faster than without "-pr". You still need to run it separately on each of the nodes (see the sketch below), and when all nodes finish, your data will be 100% repaired on all nodes.
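A minimal sketch of that per-node loop, assuming SSH access and placeholder host names (run the nodes one at a time and stop on the first failure):
# Placeholder host names; substitute your actual nodes.
for host in node1 node2 node3 node4 node5; do
  echo "Repairing primary ranges on $host ..."
  ssh "$host" nodetool repair -full -pr <keyspace> <table> || {
    echo "Repair failed on $host; investigate before continuing." >&2
    break
  }
done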
All of this is fairly inconvenient, and it's also hard to recover from transient failures during a long repair. This is why both commercial Cassandra offerings - from DataStax and ScyllaDB - include a separate repair tool which is more convenient than "nodetool repair", making sure that the entire cluster is repaired in the most efficient way possible, and recovering from transient problems without redoing the lengthy repair process from the beginning.
answered Jan 2 at 16:36 by Nadav Har'El