How do I limit write operations to 1k records/sec?
Currently, I am able to write to database in the batchsize of 500. But due to the memory shortage error and delay synchronization between child aggregator and leaf node of database, sometimes I am running into Leaf Node Memory Error. The only solution for this is if I limit my write operations to 1k records per second, I can get rid of the error.
dataStream
.map(line => readJsonFromString(line))
.grouped(memsqlBatchSize)
.foreach { recordSet =>
val dbRecords = recordSet.map(m => (m, Events.transform(m)))
dbRecords.map { record =>
try {
Events.setValues(eventInsert, record._2)
eventInsert.addBatch
} catch {
case e: Exception =>
logger.error(s"error adding batch: ${e.getMessage}")
val error_event = Events.jm.writeValueAsString(mapAsJavaMap(record._1.asInstanceOf[Map[String, Object]]))
logger.error(s"event: $error_event")
}
}
// Bulk Commit Records
try {
eventInsert.executeBatch
} catch {
case e: java.sql.BatchUpdateException =>
val updates = e.getUpdateCounts
logger.error(s"failed commit: ${updates.toString}")
updates.zipWithIndex.filter { case (v, i) => v == Statement.EXECUTE_FAILED }.foreach { case (v, i) =>
val error = Events.jm.writeValueAsString(mapAsJavaMap(dbRecords(i)._1.asInstanceOf[Map[String, Object]]))
logger.error(s"insert error: $error")
logger.error(e.getMessage)
}
}
finally {
connection.commit
eventInsert.clearBatch
logger.debug(s"committed: ${dbRecords.length.toString}")
}
}
The reason for 1k records is that, some of the data that I am trying to write can contains tons of json records and if batch size if 500, that may result in 30k records per second. Is there any way so that I can make sure that only 1000 records will be written to the database in a batch irrespective of the number of records?
multithreading scala performance thread-synchronization memsql
add a comment |
Currently, I am able to write to database in the batchsize of 500. But due to the memory shortage error and delay synchronization between child aggregator and leaf node of database, sometimes I am running into Leaf Node Memory Error. The only solution for this is if I limit my write operations to 1k records per second, I can get rid of the error.
dataStream
.map(line => readJsonFromString(line))
.grouped(memsqlBatchSize)
.foreach { recordSet =>
val dbRecords = recordSet.map(m => (m, Events.transform(m)))
dbRecords.map { record =>
try {
Events.setValues(eventInsert, record._2)
eventInsert.addBatch
} catch {
case e: Exception =>
logger.error(s"error adding batch: ${e.getMessage}")
val error_event = Events.jm.writeValueAsString(mapAsJavaMap(record._1.asInstanceOf[Map[String, Object]]))
logger.error(s"event: $error_event")
}
}
// Bulk Commit Records
try {
eventInsert.executeBatch
} catch {
case e: java.sql.BatchUpdateException =>
val updates = e.getUpdateCounts
logger.error(s"failed commit: ${updates.toString}")
updates.zipWithIndex.filter { case (v, i) => v == Statement.EXECUTE_FAILED }.foreach { case (v, i) =>
val error = Events.jm.writeValueAsString(mapAsJavaMap(dbRecords(i)._1.asInstanceOf[Map[String, Object]]))
logger.error(s"insert error: $error")
logger.error(e.getMessage)
}
}
finally {
connection.commit
eventInsert.clearBatch
logger.debug(s"committed: ${dbRecords.length.toString}")
}
}
The reason for 1k records is that, some of the data that I am trying to write can contains tons of json records and if batch size if 500, that may result in 30k records per second. Is there any way so that I can make sure that only 1000 records will be written to the database in a batch irrespective of the number of records?
multithreading scala performance thread-synchronization memsql
1
The way I see it, given code does not tell us any thing about the problem mentioned by you. And it looks like thatdbRecords.map
should bedbRecords.foreach
– Sarvesh Kumar Singh
Nov 21 '18 at 8:45
@SarveshKumarSingh came up with a solution. I think either I introduce delay of 1.0 sec or use Thread.sleep(1000) so that I can limit it. May be I did not understand the requirement right before. According to the architecture of memsql, all loads into memsql are done into a rowstore first, from there memsql will merge into the columnstore. But Memsql first collects data into local memory and then flushes to leafs. That resulted into the leaf error everytime. Reducing the batchsize and introducing a delay helped me writing 120,000 records/sec. Performed testing with this benchmark.
– Aniruddha Tekade
Nov 21 '18 at 21:22
add a comment |
Currently, I am able to write to database in the batchsize of 500. But due to the memory shortage error and delay synchronization between child aggregator and leaf node of database, sometimes I am running into Leaf Node Memory Error. The only solution for this is if I limit my write operations to 1k records per second, I can get rid of the error.
dataStream
.map(line => readJsonFromString(line))
.grouped(memsqlBatchSize)
.foreach { recordSet =>
val dbRecords = recordSet.map(m => (m, Events.transform(m)))
dbRecords.map { record =>
try {
Events.setValues(eventInsert, record._2)
eventInsert.addBatch
} catch {
case e: Exception =>
logger.error(s"error adding batch: ${e.getMessage}")
val error_event = Events.jm.writeValueAsString(mapAsJavaMap(record._1.asInstanceOf[Map[String, Object]]))
logger.error(s"event: $error_event")
}
}
// Bulk Commit Records
try {
eventInsert.executeBatch
} catch {
case e: java.sql.BatchUpdateException =>
val updates = e.getUpdateCounts
logger.error(s"failed commit: ${updates.toString}")
updates.zipWithIndex.filter { case (v, i) => v == Statement.EXECUTE_FAILED }.foreach { case (v, i) =>
val error = Events.jm.writeValueAsString(mapAsJavaMap(dbRecords(i)._1.asInstanceOf[Map[String, Object]]))
logger.error(s"insert error: $error")
logger.error(e.getMessage)
}
}
finally {
connection.commit
eventInsert.clearBatch
logger.debug(s"committed: ${dbRecords.length.toString}")
}
}
The reason for 1k records is that, some of the data that I am trying to write can contains tons of json records and if batch size if 500, that may result in 30k records per second. Is there any way so that I can make sure that only 1000 records will be written to the database in a batch irrespective of the number of records?
multithreading scala performance thread-synchronization memsql
Currently, I am able to write to database in the batchsize of 500. But due to the memory shortage error and delay synchronization between child aggregator and leaf node of database, sometimes I am running into Leaf Node Memory Error. The only solution for this is if I limit my write operations to 1k records per second, I can get rid of the error.
dataStream
.map(line => readJsonFromString(line))
.grouped(memsqlBatchSize)
.foreach { recordSet =>
val dbRecords = recordSet.map(m => (m, Events.transform(m)))
dbRecords.map { record =>
try {
Events.setValues(eventInsert, record._2)
eventInsert.addBatch
} catch {
case e: Exception =>
logger.error(s"error adding batch: ${e.getMessage}")
val error_event = Events.jm.writeValueAsString(mapAsJavaMap(record._1.asInstanceOf[Map[String, Object]]))
logger.error(s"event: $error_event")
}
}
// Bulk Commit Records
try {
eventInsert.executeBatch
} catch {
case e: java.sql.BatchUpdateException =>
val updates = e.getUpdateCounts
logger.error(s"failed commit: ${updates.toString}")
updates.zipWithIndex.filter { case (v, i) => v == Statement.EXECUTE_FAILED }.foreach { case (v, i) =>
val error = Events.jm.writeValueAsString(mapAsJavaMap(dbRecords(i)._1.asInstanceOf[Map[String, Object]]))
logger.error(s"insert error: $error")
logger.error(e.getMessage)
}
}
finally {
connection.commit
eventInsert.clearBatch
logger.debug(s"committed: ${dbRecords.length.toString}")
}
}
The reason for 1k records is that, some of the data that I am trying to write can contains tons of json records and if batch size if 500, that may result in 30k records per second. Is there any way so that I can make sure that only 1000 records will be written to the database in a batch irrespective of the number of records?
multithreading scala performance thread-synchronization memsql
multithreading scala performance thread-synchronization memsql
edited Nov 21 '18 at 8:46
Sarvesh Kumar Singh
8,3311935
8,3311935
asked Nov 20 '18 at 23:00
Aniruddha TekadeAniruddha Tekade
128111
128111
1
The way I see it, given code does not tell us any thing about the problem mentioned by you. And it looks like thatdbRecords.map
should bedbRecords.foreach
– Sarvesh Kumar Singh
Nov 21 '18 at 8:45
@SarveshKumarSingh came up with a solution. I think either I introduce delay of 1.0 sec or use Thread.sleep(1000) so that I can limit it. May be I did not understand the requirement right before. According to the architecture of memsql, all loads into memsql are done into a rowstore first, from there memsql will merge into the columnstore. But Memsql first collects data into local memory and then flushes to leafs. That resulted into the leaf error everytime. Reducing the batchsize and introducing a delay helped me writing 120,000 records/sec. Performed testing with this benchmark.
– Aniruddha Tekade
Nov 21 '18 at 21:22
add a comment |
1
The way I see it, given code does not tell us any thing about the problem mentioned by you. And it looks like thatdbRecords.map
should bedbRecords.foreach
– Sarvesh Kumar Singh
Nov 21 '18 at 8:45
@SarveshKumarSingh came up with a solution. I think either I introduce delay of 1.0 sec or use Thread.sleep(1000) so that I can limit it. May be I did not understand the requirement right before. According to the architecture of memsql, all loads into memsql are done into a rowstore first, from there memsql will merge into the columnstore. But Memsql first collects data into local memory and then flushes to leafs. That resulted into the leaf error everytime. Reducing the batchsize and introducing a delay helped me writing 120,000 records/sec. Performed testing with this benchmark.
– Aniruddha Tekade
Nov 21 '18 at 21:22
1
1
The way I see it, given code does not tell us any thing about the problem mentioned by you. And it looks like that
dbRecords.map
should be dbRecords.foreach
– Sarvesh Kumar Singh
Nov 21 '18 at 8:45
The way I see it, given code does not tell us any thing about the problem mentioned by you. And it looks like that
dbRecords.map
should be dbRecords.foreach
– Sarvesh Kumar Singh
Nov 21 '18 at 8:45
@SarveshKumarSingh came up with a solution. I think either I introduce delay of 1.0 sec or use Thread.sleep(1000) so that I can limit it. May be I did not understand the requirement right before. According to the architecture of memsql, all loads into memsql are done into a rowstore first, from there memsql will merge into the columnstore. But Memsql first collects data into local memory and then flushes to leafs. That resulted into the leaf error everytime. Reducing the batchsize and introducing a delay helped me writing 120,000 records/sec. Performed testing with this benchmark.
– Aniruddha Tekade
Nov 21 '18 at 21:22
@SarveshKumarSingh came up with a solution. I think either I introduce delay of 1.0 sec or use Thread.sleep(1000) so that I can limit it. May be I did not understand the requirement right before. According to the architecture of memsql, all loads into memsql are done into a rowstore first, from there memsql will merge into the columnstore. But Memsql first collects data into local memory and then flushes to leafs. That resulted into the leaf error everytime. Reducing the batchsize and introducing a delay helped me writing 120,000 records/sec. Performed testing with this benchmark.
– Aniruddha Tekade
Nov 21 '18 at 21:22
add a comment |
2 Answers
2
active
oldest
votes
I don't think Thead.sleep is a good idea to handle this situation. Generally we don't recommend to do so in Scala and we don't want to block the thread in any case.
One suggestion would be using any Streaming techniques such as Akka.Stream, Monix.Observable. There are some pro and cons between those libraries I don't want to spend too much paragraph on it. But they do support back pressure to control the producing rate when consumer is slower than producer. For example, in your case your consumer is database writing and your producer maybe is reading some json files and doing some aggregations.
The following code illustrates the idea and you will need to modify as your need:
val sourceJson = Source(dataStream.map(line => readJsonFromString(line)))
val sinkDB = Sink(Events.jm.writeValueAsString) // you will need to figure out how to generate the Sink
val flowThrottle = Flow[String]
.throttle(1, 1.second, 1, ThrottleMode.shaping)
val runnable = sourceJson.via[flowThrottle].toMat(sinkDB)(Keep.right)
val result = runnable.run()
but I don't understand the runnable statement in your code. My requirement is if have I threads=4, batchSize=500 and I somehow want to throttle the writing on producer side limited to 1k records/sec, how can I assure this with your code?
– Aniruddha Tekade
Dec 10 '18 at 23:56
add a comment |
The code block is already called by a thread and there are multiple threads running in parallel. Either I can use Thread.sleep(1000)
or delay(1.0)
in this scala code. But if I use delay()
it will use a promise which might have to call outside the function. Looks like Thread.sleep()
is the best option along with batch size of 1000
. After performing the testing, I could benchmark 120,000 records/thread/sec without any problem.
According to the architecture of memsql, all loads into memsql are done into a rowstore first into the local memory and from there memsql will merge into the columnstore at the end leaves. That resulted into the leaf error everytime I pushed more number of data causing bottleneck. Reducing the batchsize and introducing a Thread.sleep() helped me writing 120,000 records/sec. Performed testing with this benchmark.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53402884%2fhow-do-i-limit-write-operations-to-1k-records-sec%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
I don't think Thead.sleep is a good idea to handle this situation. Generally we don't recommend to do so in Scala and we don't want to block the thread in any case.
One suggestion would be using any Streaming techniques such as Akka.Stream, Monix.Observable. There are some pro and cons between those libraries I don't want to spend too much paragraph on it. But they do support back pressure to control the producing rate when consumer is slower than producer. For example, in your case your consumer is database writing and your producer maybe is reading some json files and doing some aggregations.
The following code illustrates the idea and you will need to modify as your need:
val sourceJson = Source(dataStream.map(line => readJsonFromString(line)))
val sinkDB = Sink(Events.jm.writeValueAsString) // you will need to figure out how to generate the Sink
val flowThrottle = Flow[String]
.throttle(1, 1.second, 1, ThrottleMode.shaping)
val runnable = sourceJson.via[flowThrottle].toMat(sinkDB)(Keep.right)
val result = runnable.run()
but I don't understand the runnable statement in your code. My requirement is if have I threads=4, batchSize=500 and I somehow want to throttle the writing on producer side limited to 1k records/sec, how can I assure this with your code?
– Aniruddha Tekade
Dec 10 '18 at 23:56
add a comment |
I don't think Thead.sleep is a good idea to handle this situation. Generally we don't recommend to do so in Scala and we don't want to block the thread in any case.
One suggestion would be using any Streaming techniques such as Akka.Stream, Monix.Observable. There are some pro and cons between those libraries I don't want to spend too much paragraph on it. But they do support back pressure to control the producing rate when consumer is slower than producer. For example, in your case your consumer is database writing and your producer maybe is reading some json files and doing some aggregations.
The following code illustrates the idea and you will need to modify as your need:
val sourceJson = Source(dataStream.map(line => readJsonFromString(line)))
val sinkDB = Sink(Events.jm.writeValueAsString) // you will need to figure out how to generate the Sink
val flowThrottle = Flow[String]
.throttle(1, 1.second, 1, ThrottleMode.shaping)
val runnable = sourceJson.via[flowThrottle].toMat(sinkDB)(Keep.right)
val result = runnable.run()
but I don't understand the runnable statement in your code. My requirement is if have I threads=4, batchSize=500 and I somehow want to throttle the writing on producer side limited to 1k records/sec, how can I assure this with your code?
– Aniruddha Tekade
Dec 10 '18 at 23:56
add a comment |
I don't think Thead.sleep is a good idea to handle this situation. Generally we don't recommend to do so in Scala and we don't want to block the thread in any case.
One suggestion would be using any Streaming techniques such as Akka.Stream, Monix.Observable. There are some pro and cons between those libraries I don't want to spend too much paragraph on it. But they do support back pressure to control the producing rate when consumer is slower than producer. For example, in your case your consumer is database writing and your producer maybe is reading some json files and doing some aggregations.
The following code illustrates the idea and you will need to modify as your need:
val sourceJson = Source(dataStream.map(line => readJsonFromString(line)))
val sinkDB = Sink(Events.jm.writeValueAsString) // you will need to figure out how to generate the Sink
val flowThrottle = Flow[String]
.throttle(1, 1.second, 1, ThrottleMode.shaping)
val runnable = sourceJson.via[flowThrottle].toMat(sinkDB)(Keep.right)
val result = runnable.run()
I don't think Thead.sleep is a good idea to handle this situation. Generally we don't recommend to do so in Scala and we don't want to block the thread in any case.
One suggestion would be using any Streaming techniques such as Akka.Stream, Monix.Observable. There are some pro and cons between those libraries I don't want to spend too much paragraph on it. But they do support back pressure to control the producing rate when consumer is slower than producer. For example, in your case your consumer is database writing and your producer maybe is reading some json files and doing some aggregations.
The following code illustrates the idea and you will need to modify as your need:
val sourceJson = Source(dataStream.map(line => readJsonFromString(line)))
val sinkDB = Sink(Events.jm.writeValueAsString) // you will need to figure out how to generate the Sink
val flowThrottle = Flow[String]
.throttle(1, 1.second, 1, ThrottleMode.shaping)
val runnable = sourceJson.via[flowThrottle].toMat(sinkDB)(Keep.right)
val result = runnable.run()
answered Nov 22 '18 at 7:10
tomcytomcy
15517
15517
but I don't understand the runnable statement in your code. My requirement is if have I threads=4, batchSize=500 and I somehow want to throttle the writing on producer side limited to 1k records/sec, how can I assure this with your code?
– Aniruddha Tekade
Dec 10 '18 at 23:56
add a comment |
but I don't understand the runnable statement in your code. My requirement is if have I threads=4, batchSize=500 and I somehow want to throttle the writing on producer side limited to 1k records/sec, how can I assure this with your code?
– Aniruddha Tekade
Dec 10 '18 at 23:56
but I don't understand the runnable statement in your code. My requirement is if have I threads=4, batchSize=500 and I somehow want to throttle the writing on producer side limited to 1k records/sec, how can I assure this with your code?
– Aniruddha Tekade
Dec 10 '18 at 23:56
but I don't understand the runnable statement in your code. My requirement is if have I threads=4, batchSize=500 and I somehow want to throttle the writing on producer side limited to 1k records/sec, how can I assure this with your code?
– Aniruddha Tekade
Dec 10 '18 at 23:56
add a comment |
The code block is already called by a thread and there are multiple threads running in parallel. Either I can use Thread.sleep(1000)
or delay(1.0)
in this scala code. But if I use delay()
it will use a promise which might have to call outside the function. Looks like Thread.sleep()
is the best option along with batch size of 1000
. After performing the testing, I could benchmark 120,000 records/thread/sec without any problem.
According to the architecture of memsql, all loads into memsql are done into a rowstore first into the local memory and from there memsql will merge into the columnstore at the end leaves. That resulted into the leaf error everytime I pushed more number of data causing bottleneck. Reducing the batchsize and introducing a Thread.sleep() helped me writing 120,000 records/sec. Performed testing with this benchmark.
add a comment |
The code block is already called by a thread and there are multiple threads running in parallel. Either I can use Thread.sleep(1000)
or delay(1.0)
in this scala code. But if I use delay()
it will use a promise which might have to call outside the function. Looks like Thread.sleep()
is the best option along with batch size of 1000
. After performing the testing, I could benchmark 120,000 records/thread/sec without any problem.
According to the architecture of memsql, all loads into memsql are done into a rowstore first into the local memory and from there memsql will merge into the columnstore at the end leaves. That resulted into the leaf error everytime I pushed more number of data causing bottleneck. Reducing the batchsize and introducing a Thread.sleep() helped me writing 120,000 records/sec. Performed testing with this benchmark.
add a comment |
The code block is already called by a thread and there are multiple threads running in parallel. Either I can use Thread.sleep(1000)
or delay(1.0)
in this scala code. But if I use delay()
it will use a promise which might have to call outside the function. Looks like Thread.sleep()
is the best option along with batch size of 1000
. After performing the testing, I could benchmark 120,000 records/thread/sec without any problem.
According to the architecture of memsql, all loads into memsql are done into a rowstore first into the local memory and from there memsql will merge into the columnstore at the end leaves. That resulted into the leaf error everytime I pushed more number of data causing bottleneck. Reducing the batchsize and introducing a Thread.sleep() helped me writing 120,000 records/sec. Performed testing with this benchmark.
The code block is already called by a thread and there are multiple threads running in parallel. Either I can use Thread.sleep(1000)
or delay(1.0)
in this scala code. But if I use delay()
it will use a promise which might have to call outside the function. Looks like Thread.sleep()
is the best option along with batch size of 1000
. After performing the testing, I could benchmark 120,000 records/thread/sec without any problem.
According to the architecture of memsql, all loads into memsql are done into a rowstore first into the local memory and from there memsql will merge into the columnstore at the end leaves. That resulted into the leaf error everytime I pushed more number of data causing bottleneck. Reducing the batchsize and introducing a Thread.sleep() helped me writing 120,000 records/sec. Performed testing with this benchmark.
answered Nov 21 '18 at 21:34
Aniruddha TekadeAniruddha Tekade
128111
128111
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53402884%2fhow-do-i-limit-write-operations-to-1k-records-sec%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
The way I see it, given code does not tell us any thing about the problem mentioned by you. And it looks like that
dbRecords.map
should bedbRecords.foreach
– Sarvesh Kumar Singh
Nov 21 '18 at 8:45
@SarveshKumarSingh came up with a solution. I think either I introduce delay of 1.0 sec or use Thread.sleep(1000) so that I can limit it. May be I did not understand the requirement right before. According to the architecture of memsql, all loads into memsql are done into a rowstore first, from there memsql will merge into the columnstore. But Memsql first collects data into local memory and then flushes to leafs. That resulted into the leaf error everytime. Reducing the batchsize and introducing a delay helped me writing 120,000 records/sec. Performed testing with this benchmark.
– Aniruddha Tekade
Nov 21 '18 at 21:22