Out of memory while adding elements to an immutable Set in Scala

I'm getting an out-of-memory error in a loop while adding elements to an immutable Set. There are already a lot of objects in the set, so I guess it's consuming a lot of memory. My understanding is that when adding an element to an immutable collection, Scala first copies the existing collection into a new set, adds the element to that new set, and returns the new set.



So suppose my JVM heap is 500 MB and the set is consuming 400 MB. Before adding a new element, Scala tries to copy the old set into a new set (which I think would consume another 400 MB). At this point it has already exceeded the JVM memory (800 MB consumed in total), and hence it throws an out-of-memory error.
The code looks a little bit like this:



private def getNewCollection(myMuttableSet: Set[MyType]): Set[MyType] = {
  myMuttableSet.flatMap(c => {
    val returnedSet = doSomeCalculationsAndreturnASet // this method returns a large collection, so during the loop the collection grows exponentially
    if (returnedSet.isEmpty) Set.empty[MyType]
    else doSomeCalculationsAndreturnASet + MyType(constArg1, constArg2) // MyType is a case class
  })
}


Kindly advise if my understanding is correct.

scala collections garbage-collection out-of-memory

edited Nov 20 '18 at 6:57
asked Nov 19 '18 at 14:51 by whysoseriousson

  • Could you include your loop code in your question?
    – hasumedic
    Nov 19 '18 at 14:56
  • Your understanding is correct
    – Dima
    Nov 19 '18 at 15:04
  • Yes you're correct, but it would be nice to see your code, because if you're doing something like: var set = Set.empty[Int]; for (i <- 1 to 1000) { set = set + i }, then, IMHO, it would be better to just use a mutable set (a short sketch contrasting the two follows these comments).
    – Luis Miguel Mejía Suárez
    Nov 19 '18 at 15:18
  • Preference between mutable val vs immutable var should depend in part on visibility. See stackoverflow.com/a/11386867/3226045 for an explanation on referential transparency.
    – Shane Perry
    Nov 19 '18 at 20:33
  • added the algo that is used in the program
    – whysoseriousson
    Nov 20 '18 at 6:57
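
To make the mutable-vs-immutable suggestions above concrete, here is a minimal sketch (my own illustration, not code from the question) of the two styles being contrasted: an immutable Set rebound to a var versus a mutable Set held in a val. Which one is preferable depends on visibility and referential transparency, as the linked answer discusses.

import scala.collection.mutable

object SetStyles extends App {
  // Style 1: immutable Set, the var is rebound on every addition
  var immutableAcc = Set.empty[Int]
  for (i <- 1 to 1000) immutableAcc = immutableAcc + i

  // Style 2: mutable Set held in a val, updated in place
  val mutableAcc = mutable.Set.empty[Int]
  for (i <- 1 to 1000) mutableAcc += i

  println(immutableAcc.size == mutableAcc.size) // true: both end up with 1000 elements
}
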
2 Answers



It is not quite as simple as that, because it depends on the size of the elements in the Set.



Creating a new Set is a shallow operation and does not copy the elements in the set; it just creates a new wrapper (typically a hash table of some sort) pointing to the same objects.
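
As a quick illustration of that point, here is a small sketch (my own, using a made-up Payload case class rather than anything from the question) showing that the set produced by an addition references the very same element objects as the original:

case class Payload(id: Int, blob: Vector[Byte])

object SharingDemo extends App {
  val original: Set[Payload] =
    (1 to 3).map(i => Payload(i, Vector.fill(1024)(0: Byte))).toSet
  val extended = original + Payload(4, Vector.fill(1024)(0: Byte))

  // every element of the original set appears in the new set as the same object (reference equality)
  println(original.forall(p => extended.exists(_ eq p))) // true
}
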



If you have a small set of large objects, then duplicating that set might not take much storage, because the objects will be shared between the two sets. Most of the memory is used by the objects in the set, and these do not need to be copied to create a new set. So your 400 MB might become 450 MB and fit within the memory limit.



If you have a large set of small objects, then duplicating that set may double the storage. Most of the memory is used by the Set structure itself and can't be shared between the original set and the copy. In this case your 400 MB could easily become close to 800 MB.



Since you are running out of memory and you say there are a lot of objects, it sounds like this is the problem, but we would need to see the code to tell for sure.

answered Nov 19 '18 at 19:17 by Tim

  • Thanks Tim for the answer. I am checking the code and will come back
    – whysoseriousson
    Nov 21 '18 at 3:14



"Before adding a new element, Scala tries to copy the old set into a new set (which I think would consume another 400 MB)..."




This is not correct.



Immutable collections in Scala (including Sets) are implemented as persistent data structures, which usually have a property called "structural sharing". That means that when the structure is updated, it is not fully copied; instead, most of it is reused, with only a relatively small part being re-created from scratch.



The easiest example to illustrate that is List, which is implemented as a singly-linked list, with the root pointing to the head.



For example, you have the following code:



val a = List(3,2,1)
val b = 4 :: a
val c = 5 :: b


Although the three lists combined have 3 + 4 + 5 = 12 elements in total, they physically share the nodes, and there are only 5 List nodes.



5 → 4 → 3 → 2 → 1
↑   ↑   ↑
c   b   a


A similar principle applies to Set. Set in Scala is implemented as a HashTrie. I won't go into the specifics of a trie; just think of it as a tree with a high branching factor. When that tree is updated, it is not copied completely: only the nodes on the path from the tree root to the new/updated node are copied.



For the HashTrie the depth of the tree cannot be more than 7 levels. So, when updating a Set in Scala, you're looking at memory allocation proportional to O(7 * 32) in the worst case (7 levels max, each node roughly speaking being an array of 32), regardless of the Set size.
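
To get a feel for that, here is a rough sketch (my own, not from the original answer) that grows an immutable Set one element at a time. If every update copied the whole set, this would be quadratic and fail long before a million elements; thanks to structural sharing it finishes almost instantly:

object TrieGrowthSketch extends App {
  var acc = Set.empty[Int]                   // immutable Set, rebound on each addition
  val t0 = System.nanoTime()
  (1 to 1000000).foreach(i => acc = acc + i) // one million single-element updates
  val elapsedMs = (System.nanoTime() - t0) / 1000000
  println(s"size = ${acc.size}, built in ~$elapsedMs ms")
}
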





Looking at your code, you have the following things in memory:





  1. myMuttableSet is present until getNewCollection returns.

  2. myMuttableSet.flatMap creates a mutable buffer underneath. Also, after flatMap is done, buffer.result will copy the content of the mutable buffer over to an immutable set. So there is actually a brief moment when two sets exist.

  3. On every step of flatMap, returnedSet also retains memory.


Side note: why are you calling doSomeCalculationsAndreturnASet again if you already have its result cached in returnedSet? Could that be the root of the problem?
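
If that second call really is redundant (assuming doSomeCalculationsAndreturnASet has no side effects that require invoking it twice), the method from the question could be reworked along these lines, reusing the cached result:

private def getNewCollection(myMuttableSet: Set[MyType]): Set[MyType] =
  myMuttableSet.flatMap { c =>
    val returnedSet = doSomeCalculationsAndreturnASet
    if (returnedSet.isEmpty) Set.empty[MyType]
    else returnedSet + MyType(constArg1, constArg2)
  }
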



So, at any given point in time you have in memory (whichever is larger):





  • myMuttableSet + mutable result set buffer + returnedSet + the (second?) result of doSomeCalculationsAndreturnASet


  • myMuttableSet + mutable result set buffer + immutable result set


To conclude, whatever your problems with memory are, adding the element to the Set most probably is not the culprit. My suggestion would be to suspend your program in a debugger and use a profiler (such as VisualVM) to take heap dumps at different stages.
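
If attaching a profiler at exactly the right moment is awkward, a heap dump can also be triggered programmatically via the HotSpot diagnostic MXBean (a sketch under the assumption you run on a HotSpot JVM; the output path is just an example) and then opened in VisualVM:

import java.lang.management.ManagementFactory
import com.sun.management.HotSpotDiagnosticMXBean

object HeapDumpSketch {
  // Writes an .hprof snapshot of the heap to the given path.
  def dump(path: String = "/tmp/heap.hprof"): Unit = {
    val bean = ManagementFactory.newPlatformMXBeanProxy(
      ManagementFactory.getPlatformMBeanServer,
      "com.sun.management:type=HotSpotDiagnostic",
      classOf[HotSpotDiagnosticMXBean])
    bean.dumpHeap(path, true) // true = dump only live (reachable) objects
  }
}
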

edited Nov 20 '18 at 8:17
answered Nov 20 '18 at 8:03 by Aivean

  • thanks for the detailed analysis .. I will check and come back on this.
    – whysoseriousson
    Nov 21 '18 at 3:14