Cassandra repair causes timeouts on one node

We are using a Cassandra (3.10-1) cluster with 5 nodes, each with 8 cores and 23 GiB of memory, all in the same DC.
Replication factor: 2
Consistency level: 2



Lately, during the scheduled repair that runs about once a week, there are many query timeouts from one node, and it is always the same node across different repairs (the repair schedule is sketched just below).
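
A weekly schedule like this typically boils down to a cron entry invoking nodetool; the line below is only an illustrative sketch (the actual tooling, flags, and timing may differ):

0 2 * * 0  nodetool repair -pr keyspace1
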
Looking at that node's logs:



debug.log shows several errors while building the Merkle trees for the repair:



DEBUG [MemtablePostFlush:7354] 2018-12-30 23:52:08,314 ColumnFamilyStore.java:954 - forceFlush requested but everything is clean in user_device_validation_configuration
ERROR [ValidationExecutor:973] 2018-12-30 23:52:08,315 Validator.java:268 - Failed creating a merkle tree for [repair #b3887f60-0c8d-11e9-b894-754001cf0917 on keyspace1/user_device_validation_configuration, [(-8096630393642361664,-8073407512196669022]]], /10.52.5.42 (see log for details)
DEBUG [AntiEntropyStage:1] 2018-12-30 23:52:08,318 RepairMessageVerbHandler.java:114 - Validating ValidationRequest{gcBefore=1545349928} org.apache.cassandra.repair.messages.ValidationRequest@5c1c2b28
DEBUG [ValidationExecutor:973] 2018-12-30 23:52:08,319 StorageService.java:3313 - Forcing flush on keyspace keyspace1, CF raw_sensors
DEBUG [MemtablePostFlush:7354] 2018-12-30 23:52:08,319 ColumnFamilyStore.java:954 - forceFlush requested but everything is clean in raw_sensors
ERROR [ValidationExecutor:973] 2018-12-30 23:52:08,319 Validator.java:268 - Failed creating a merkle tree for [repair #b3887f60-0c8d-11e9-b894-754001cf0917 on keyspace1/raw_sensors, [(-8096630393642361664,-8073407512196669022]]], /10.52.5.42 (see log for details)
DEBUG [AntiEntropyStage:1] 2018-12-30 23:52:08,321 RepairMessageVerbHandler.java:114 - Validating ValidationRequest{gcBefore=1545349928} org.apache.cassandra.repair.messages.ValidationRequest@5c1c2b28
DEBUG [AntiEntropyStage:1] 2018-12-30 23:52:08,321 RepairMessageVerbHandler.java:142 - Got anticompaction request AnticompactionRequest{parentRepairSession=b387e320-0c8d-11e9-b894-754001cf0917} org.apache.cassandra.repair.messages.AnticompactionRequest@d4b7ed7b
ERROR [AntiEntropyStage:1] 2018-12-30 23:52:08,322 RepairMessageVerbHandler.java:168 - Got error, removing parent repair session
ERROR [AntiEntropyStage:1] 2018-12-30 23:52:08,322 CassandraDaemon.java:229 - Exception in thread Thread[AntiEntropyStage:1,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: Parent repair session with id = b387e320-0c8d-11e9-b894-754001cf0917 has failed.
at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:171) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_131]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_131]
Caused by: java.lang.RuntimeException: Parent repair session with id = b387e320-0c8d-11e9-b894-754001cf0917 has failed.
at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:400) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.ActiveRepairService.doAntiCompaction(ActiveRepairService.java:435) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:143) ~[apache-cassandra-3.10.jar:3.10]
... 7 common frames omitted
DEBUG [ValidationExecutor:973] 2018-12-30 23:52:08,323 StorageService.java:3313 - Forcing flush on keyspace keyspace1, CF mouse_interactions
DEBUG [MemtablePostFlush:7354] 2018-12-30 23:52:08,323 ColumnFamilyStore.java:954 - forceFlush requested but everything is clean in mouse_interactions
ERROR [ValidationExecutor:973] 2018-12-30 23:52:08,327 Validator.java:268 - Failed creating a merkle tree for [repair #b3887f60-0c8d-11e9-b894-754001cf0917 on keyspace1/mouse_interactions, [(-8096630393642361664,-8073407512196669022]]], /10.52.5.42 (see log for details)
DEBUG [GossipStage:1] 2018-12-30 23:52:10,643 FailureDetector.java:457 - Ignoring interval time of 2000164738 for /10.52.3.47
DEBUG [GossipStage:1] 2018-12-30 23:52:13,643 FailureDetector.java:457 - Ignoring interval time of 2000239040 for /10.52.3.47
DEBUG [ReadRepairStage:407] 2018-12-30 23:52:15,133 ReadCallback.java:242 - Digest mismatch:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(7486012912397474412, 000467657474000020376337363933643363613837616231643531633936396564616234636363613400) (a0e45fcd73255bcd93a63b15d41e0843 vs 7dff1a87a755cf62150befc8352f59e8)
at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_131]
DEBUG [GossipStage:1] 2018-12-30 23:52:26,639 FailureDetector.java:457 - Ignoring interval time of 2000385682 for /10.52.3.47


After a few hours in the GC logs, I noticed that GC runs roughly every 20 seconds and stops the application for more than 10 seconds each time:



2018-12-31T06:24:57.450+0000: 1184437.292: Total time for which application threads were stopped: 18.0318658 seconds, Stopping threads took: 0.0005000 seconds
2018-12-31T06:24:57.483+0000: 1184437.325: Total time for which application threads were stopped: 0.0053815 seconds, Stopping threads took: 0.0007325 seconds
2018-12-31T06:24:57.565+0000: 1184437.407: Total time for which application threads were stopped: 0.0118127 seconds, Stopping threads took: 0.0057652 seconds
2018-12-31T06:24:57.604+0000: 1184437.446: Total time for which application threads were stopped: 0.0064909 seconds, Stopping threads took: 0.0023037 seconds
2018-12-31T06:24:57.701+0000: 1184437.543: Total time for which application threads were stopped: 0.0066540 seconds, Stopping threads took: 0.0031299 seconds
{Heap before GC invocations=1377084 (full 108682):
par new generation total 943744K, used 943711K [0x00000005c0000000, 0x0000000600000000, 0x0000000600000000)
eden space 838912K, 100% used [0x00000005c0000000, 0x00000005f3340000, 0x00000005f3340000)
from space 104832K, 99% used [0x00000005f99a0000, 0x00000005ffff7ce0, 0x0000000600000000)
to space 104832K, 0% used [0x00000005f3340000, 0x00000005f3340000, 0x00000005f99a0000)
concurrent mark-sweep generation total 7340032K, used 7340032K [0x0000000600000000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 71629K, capacity 73844K, committed 75000K, reserved 1116160K
class space used 8521K, capacity 8909K, committed 9176K, reserved 1048576K
2018-12-31T06:24:58.029+0000: 1184437.870: [Full GC (Allocation Failure) 2018-12-31T06:24:58.029+0000: 1184437.871: [CMS: 7340032K->7340031K(7340032K), 15.2051822 secs] 8283743K->7443230K(8283776K), [Metaspace: 71629K->71629K(1116160K)], 15.2055451 secs] [Times: user=13.94 sys=1.28, real=15.20 secs]
Heap after GC invocations=1377085 (full 108683):
par new generation total 943744K, used 103198K [0x00000005c0000000, 0x0000000600000000, 0x0000000600000000)
eden space 838912K, 12% used [0x00000005c0000000, 0x00000005c64c7950, 0x00000005f3340000)
from space 104832K, 0% used [0x00000005f99a0000, 0x00000005f99a0000, 0x0000000600000000)
to space 104832K, 0% used [0x00000005f3340000, 0x00000005f3340000, 0x00000005f99a0000)
concurrent mark-sweep generation total 7340032K, used 7340031K [0x0000000600000000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 71629K, capacity 73844K, committed 75000K, reserved 1116160K
class space used 8521K, capacity 8909K, committed 9176K, reserved 1048576K
}
2018-12-31T06:25:13.235+0000: 1184453.076: Total time for which application threads were stopped: 15.2103050 seconds, Stopping threads took: 0.0004204 seconds
2018-12-31T06:25:13.243+0000: 1184453.085: Total time for which application threads were stopped: 0.0047592 seconds, Stopping threads took: 0.0008416 seconds
2018-12-31T06:25:13.272+0000: 1184453.114: Total time for which application threads were stopped: 0.0085538 seconds, Stopping threads took: 0.0046376 seconds
2018-12-31T06:25:13.298+0000: 1184453.140: [GC (CMS Initial Mark) [1 CMS-initial-mark: 7340031K(7340032K)] 7536074K(8283776K), 0.0650538 secs] [Times: user=0.12 sys=0.01, real=0.06 secs]
2018-12-31T06:25:13.364+0000: 1184453.206: Total time for which application threads were stopped: 0.0728487 seconds, Stopping threads took: 0.0039520 seconds
2018-12-31T06:25:13.364+0000: 1184453.206: [CMS-concurrent-mark-start]
{Heap before GC invocations=1377085 (full 108684):
par new generation total 943744K, used 943215K [0x00000005c0000000, 0x0000000600000000, 0x0000000600000000)
eden space 838912K, 100% used [0x00000005c0000000, 0x00000005f3340000, 0x00000005f3340000)
from space 104832K, 99% used [0x00000005f99a0000, 0x00000005fff7bd98, 0x0000000600000000)
to space 104832K, 0% used [0x00000005f3340000, 0x00000005f3340000, 0x00000005f99a0000)
concurrent mark-sweep generation total 7340032K, used 7340031K [0x0000000600000000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 71631K, capacity 73844K, committed 75000K, reserved 1116160K
class space used 8521K, capacity 8909K, committed 9176K, reserved 1048576K
2018-12-31T06:25:13.821+0000: 1184453.662: [Full GC (Allocation Failure) 2018-12-31T06:25:13.821+0000: 1184453.663: [CMS2018-12-31T06:25:16.960+0000: 1184456.802: [CMS-concurrent-mark: 3.592/3.596 secs] [Times: user=12.47 sys=0.38, real=3.60 secs]
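
For reference, the stop times can be pulled out of gc.log with something like the following (the log path and awk field index are assumptions based on the format shown above):

grep 'Total time for which application threads were stopped' gc.log | awk '{print $1, $11}'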


So I started checking the data distribution in the cluster - we are using num_tokens: 32,
and the data seems balanced, roughly 40% on each node (nodetool status output below):



UN  10.52.2.94  672.64 GiB  32  ?  ad3d1365-bbb7-4229-b586-40667ec22b41  rack1
UN  10.52.3.47  640.91 GiB  32  ?  cdba952b-9685-4769-aaf4-22e538a5c37f  rack1
UN  10.52.1.57  719.34 GiB  32  ?  13bb7573-eb30-489b-80c4-6e5a7c8e5f5e  rack1
UN  10.52.5.42  743.04 GiB  32  ?  c9e892c6-9281-4a49-b4c4-a147a387c3de  rack1
UN  10.52.4.43  691.1 GiB   32  ?  53e3724e-f5a9-46b1-b330-7bb542f15f90  rack1


After checking the logs on the other nodes, I can't find any reason for those timeouts on that specific node.
Any thoughts or ideas as to what causes this to happen on the same node again and again?

1 Answer

That's really odd to only see this on one node. Double-check that the configs are the same. Otherwise, you might be writing/querying a large partition which that node is primarily responsible for.




You wrote: "Replication factor - 2, Consistency level - 2".




In general, repairs can cause nodes to have trouble serving requests, as building Merkle Trees and streaming data are quite resource-intensive. I see two problems here:




1. Long GC pauses.

2. An RF/CL ratio which does not allow for any node to be down.


Starting with #2: when you have RF=2 and you require 2 replicas to respond to your queries, you are essentially querying at consistency ALL. Therefore, should a node become overwhelmed and short on resources, your queries will be unable to complete. If it were me, I would increase the RF to 3 and run a repair (assuming the nodes have adequate storage). The other option would be to query at a consistency level of ONE, which is probably what you should be doing with RF=2 anyway.
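
A minimal sketch of both options (the keyspace name is taken from your logs; the replication class is an assumption, match whatever your schema already uses):

-- raise the replication factor, then repair
ALTER KEYSPACE keyspace1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
-- afterwards, on each node: nodetool repair -full keyspace1

-- or keep RF=2 and drop the request consistency from TWO to ONE, e.g. in cqlsh:
CONSISTENCY ONE;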



Also, when diagnosing query problems, it usually helps to see the actual query being run (which you have not provided). With Cassandra, more often than not, query issues are the result of queries which don't fit the designed table.



As for the long GC pauses, that's a trickier problem to solve. There's an old Cassandra JIRA ticket (CASSANDRA-8150) which talks about optimal settings for the CMS Garbage Collector. You should give that a read.



What is your MAX_HEAP set to? I see your new generation is less than 1GB, which is going to be too small. With 23GB of RAM on each node, I'd start with the following settings for CMS GC (a jvm.options sketch follows the list):




• Max Heap Size (Xmx) of 8GB-12GB (you want to leave about 50% of RAM for the OS/page cache).

• Initial Heap Size (Xms) equal to Max Heap Size, to avoid the performance hit of resizing the heap.

• Heap New Size (Xmn) of 40-50% of your Max Heap Size (so somewhere between 3GB-6GB). You want plenty of room available for the new generation.

• MaxTenuringThreshold of 6 to 8, as you want objects to be passed around the new gen survivor spaces until they die, in lieu of being promoted to the old gen. By default, I think this is set to 1, which means objects will get promoted too quickly.
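
As a sketch, those settings could look something like this in jvm.options (the sizes are illustrative; pick values within the ranges above):

-Xms8G
-Xmx8G
-Xmn3G
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:MaxTenuringThreshold=6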


Basically, new-to-old/old-to-permanent promotion is where the long pauses happen. Ideally, you'd like all objects on your heap to be created, live, and die in the new gen, without being promoted.



Otherwise, it might be worth your while to try using G1GC. For G1, I'd go with a Max/Initial Heap of about 50% of RAM, a MaxGCPauseMillis of 400-500ms, and don't set Heap New Size at all (G1 computes that).
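
A comparable jvm.options sketch for G1 (sizes again illustrative; ~50% of 23GB of RAM is roughly 11-12GB):

-Xms11G
-Xmx11G
-XX:+UseG1GC
-XX:MaxGCPauseMillis=500
# note: no -Xmn / fixed new-gen size with G1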






• First of all, thanks for your detailed response. Looking at our configuration, we set the GC to use G1GC, but for some reason Cassandra is using CMS. As for the query you mention, it's not one specific query; all queries fail with timeouts because that node is busy. For now I've set the repair to split into more segments and to use only 1 repair thread in order to lower the 'work' on that node, which seems to help.

– guy dadon, Jan 7 at 7:49
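
(With plain nodetool, splitting into segments with a single repair thread amounts to a series of single-threaded subrange runs, roughly like the line below; the token range here is just the one from the log above:)

nodetool repair -full -j 1 -st -8096630393642361664 -et -8073407512196669022 keyspace1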











• @guydadon Cassandra can accept GC configuration in both the cassandra-env.sh and jvm.options files. Make sure that you check both, as one might be overriding the other.

– Aaron, Jan 7 at 16:13
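
(A quick way to check which collector the running JVM actually got, and what each config file asks for; the paths below are the usual package-install locations and may differ on your machines:)

ps -o command= -p "$(pgrep -f CassandraDaemon)" | tr ' ' '\n' | grep -E 'Use.*GC'
grep -E 'UseG1GC|UseConcMarkSweepGC' /etc/cassandra/jvm.options /etc/cassandra/cassandra-env.sh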











• I've checked both cassandra-env.sh and jvm.options, and both configure it to use G1GC. I'll continue investigating.

– guy dadon, Jan 8 at 9:06



















          0








          0







          That's really odd to only see this on one node. Double-check that the configs are the same. Otherwise, you might be writing/querying a large partition which that node is primarily responsible for.




          Replication factor - 2 Consistency level - 2




          In general, repairs can cause nodes to have trouble serving requests, as building Merkle Trees and streaming data are quite resource-intensive. I see two problems here:




          1. Long GC pauses.

          2. A RF/CL ratio which does not allow for any node to be down.


          Starting with #2, when you have a RF=2 and you're requiring 2 replicas to respond on your queries, you are essentially querying at CONSISTENCY_ALL. Therefore, should a node become overwhelmed and short on resources, your queries will be unable to complete. If it were me, I would increase the RF to 3, and run a repair (assuming the nodes have adequate storage). The other option, would be to query at a consistency level of 1 (ONE), which is probably what you should be doing with a RF=2 anyway.



          Also, when diagnosing query problems, it usually helps to see the actual query being run (which you have not provided). With Cassandra, more often than not, query issues are the result of queries which don't fit the designed table.



          As for the long GC pauses, that's a trickier problem to solve. There's an old Cassandra JIRA (CASSANDRA-8150) ticket which talks about optimal settings for the CMS Garbage Collector. You should give that a read.



          What is your MAX_HEAP set to? I see your new generation is less than 1GB, which is going to be too small. With 23GB of RAM on each node, I'd start with the following settings for using CMS GC:




          • Max Heap Size (Xmx) of 8GB-12GB (you want to leave about 50% of RAM for the OS/page-cache).

          • Initial Heap Size (Xms) equal to Max Heap Size to avoid the performance hit of resizing the heap.

          • Heap New Size (Xmn) of 40-50% of your Max Heap Size (so somewhere between 3GB-6GB). You want plenty of room available for the new generation.


          • MaxTenuringThreshold of 6 to 8, as you want objects to be passed around the new gen survivor spaces until they die, in lieu of being promoted to the old gen. By default, I think this is set to 1, which means objects will get promoted too quickly.


          Basically, new-to-old/old-to-permanent promotion is where the long pauses happen. Ideally, you'd like all objects on your heap to be created, live, and die in the new gen, without being promoted.



          Otherwise, it might be worth your while to try using G1GC. For G1, I'd go with a Max/Initial Heap of about 50% of RAM, a MaxGCPauseMillis of 400-500ms, and don't set Heap New Size at all (G1 computes that).






          share|improve this answer













          That's really odd to only see this on one node. Double-check that the configs are the same. Otherwise, you might be writing/querying a large partition which that node is primarily responsible for.




          Replication factor - 2 Consistency level - 2




          In general, repairs can cause nodes to have trouble serving requests, as building Merkle Trees and streaming data are quite resource-intensive. I see two problems here:




          1. Long GC pauses.

          2. A RF/CL ratio which does not allow for any node to be down.


          Starting with #2, when you have a RF=2 and you're requiring 2 replicas to respond on your queries, you are essentially querying at CONSISTENCY_ALL. Therefore, should a node become overwhelmed and short on resources, your queries will be unable to complete. If it were me, I would increase the RF to 3, and run a repair (assuming the nodes have adequate storage). The other option, would be to query at a consistency level of 1 (ONE), which is probably what you should be doing with a RF=2 anyway.



          Also, when diagnosing query problems, it usually helps to see the actual query being run (which you have not provided). With Cassandra, more often than not, query issues are the result of queries which don't fit the designed table.



          As for the long GC pauses, that's a trickier problem to solve. There's an old Cassandra JIRA (CASSANDRA-8150) ticket which talks about optimal settings for the CMS Garbage Collector. You should give that a read.



          What is your MAX_HEAP set to? I see your new generation is less than 1GB, which is going to be too small. With 23GB of RAM on each node, I'd start with the following settings for using CMS GC:




          • Max Heap Size (Xmx) of 8GB-12GB (you want to leave about 50% of RAM for the OS/page-cache).

          • Initial Heap Size (Xms) equal to Max Heap Size to avoid the performance hit of resizing the heap.

          • Heap New Size (Xmn) of 40-50% of your Max Heap Size (so somewhere between 3GB-6GB). You want plenty of room available for the new generation.


          • MaxTenuringThreshold of 6 to 8, as you want objects to be passed around the new gen survivor spaces until they die, in lieu of being promoted to the old gen. By default, I think this is set to 1, which means objects will get promoted too quickly.


          Basically, new-to-old/old-to-permanent promotion is where the long pauses happen. Ideally, you'd like all objects on your heap to be created, live, and die in the new gen, without being promoted.



          Otherwise, it might be worth your while to try using G1GC. For G1, I'd go with a Max/Initial Heap of about 50% of RAM, a MaxGCPauseMillis of 400-500ms, and don't set Heap New Size at all (G1 computes that).







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Jan 4 at 14:42









          AaronAaron

          35k1076101




          35k1076101













          • First of all thanks for you detailed response. From looking in our configuration we are setting the GC to use G1GC, but from some reason cassandra is using CMS, as for the query you mention, this is not specific query, all the queries failed due to timeout since that node is busy. For now i've set the repair to split for more segments and use only 1 repair thread in order to lower the 'work' on that node what seems to work.

            – guy dadon
            Jan 7 at 7:49











          • @guydadon Cassandra can accept GC configuration in both the cassandra-env.sh and jvm.options files. Make sure that you check both, as one might be overriding the other.

            – Aaron
            Jan 7 at 16:13











          • I've checked both cassandra-env.sh and jvm.options and both configure it to use G1GC. ill continue investigating it

            – guy dadon
            Jan 8 at 9:06



















          • First of all thanks for you detailed response. From looking in our configuration we are setting the GC to use G1GC, but from some reason cassandra is using CMS, as for the query you mention, this is not specific query, all the queries failed due to timeout since that node is busy. For now i've set the repair to split for more segments and use only 1 repair thread in order to lower the 'work' on that node what seems to work.

            – guy dadon
            Jan 7 at 7:49











          • @guydadon Cassandra can accept GC configuration in both the cassandra-env.sh and jvm.options files. Make sure that you check both, as one might be overriding the other.

            – Aaron
            Jan 7 at 16:13











          • I've checked both cassandra-env.sh and jvm.options and both configure it to use G1GC. ill continue investigating it

            – guy dadon
            Jan 8 at 9:06

















          First of all thanks for you detailed response. From looking in our configuration we are setting the GC to use G1GC, but from some reason cassandra is using CMS, as for the query you mention, this is not specific query, all the queries failed due to timeout since that node is busy. For now i've set the repair to split for more segments and use only 1 repair thread in order to lower the 'work' on that node what seems to work.

          – guy dadon
          Jan 7 at 7:49





          First of all thanks for you detailed response. From looking in our configuration we are setting the GC to use G1GC, but from some reason cassandra is using CMS, as for the query you mention, this is not specific query, all the queries failed due to timeout since that node is busy. For now i've set the repair to split for more segments and use only 1 repair thread in order to lower the 'work' on that node what seems to work.

          – guy dadon
          Jan 7 at 7:49













          @guydadon Cassandra can accept GC configuration in both the cassandra-env.sh and jvm.options files. Make sure that you check both, as one might be overriding the other.

          – Aaron
          Jan 7 at 16:13





          @guydadon Cassandra can accept GC configuration in both the cassandra-env.sh and jvm.options files. Make sure that you check both, as one might be overriding the other.

          – Aaron
          Jan 7 at 16:13













          I've checked both cassandra-env.sh and jvm.options and both configure it to use G1GC. ill continue investigating it

          – guy dadon
          Jan 8 at 9:06





          I've checked both cassandra-env.sh and jvm.options and both configure it to use G1GC. ill continue investigating it

          – guy dadon
          Jan 8 at 9:06




















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54003123%2fcassandra-repair-cause-timeout-in-node%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          MongoDB - Not Authorized To Execute Command

          in spring boot 2.1 many test slices are not allowed anymore due to multiple @BootstrapWith

          How to fix TextFormField cause rebuild widget in Flutter