How does the Haskell runtime distinguish between pointers and unboxed word-sized values?
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
On a 64-bit platform, OCaml's int
type is 63-bits due to pointer tagging. This allows ints to be unboxed and still be distinguishable from pointers at runtime, allowing for a precise GC. IIRC, the GC in the GHC RTS is precise too, but GHC's Ints are 64-bit and they can be unboxed. If that's the case, then how the runtime system distinguish between Ints and pointers? It seems the same problem would arise with distinguishing between other unboxed word-sized values and pointers too.
haskell garbage-collection ghc
add a comment |
On a 64-bit platform, OCaml's int
type is 63-bits due to pointer tagging. This allows ints to be unboxed and still be distinguishable from pointers at runtime, allowing for a precise GC. IIRC, the GC in the GHC RTS is precise too, but GHC's Ints are 64-bit and they can be unboxed. If that's the case, then how the runtime system distinguish between Ints and pointers? It seems the same problem would arise with distinguishing between other unboxed word-sized values and pointers too.
haskell garbage-collection ghc
add a comment |
On a 64-bit platform, OCaml's int
type is 63-bits due to pointer tagging. This allows ints to be unboxed and still be distinguishable from pointers at runtime, allowing for a precise GC. IIRC, the GC in the GHC RTS is precise too, but GHC's Ints are 64-bit and they can be unboxed. If that's the case, then how the runtime system distinguish between Ints and pointers? It seems the same problem would arise with distinguishing between other unboxed word-sized values and pointers too.
haskell garbage-collection ghc
On a 64-bit platform, OCaml's int
type is 63-bits due to pointer tagging. This allows ints to be unboxed and still be distinguishable from pointers at runtime, allowing for a precise GC. IIRC, the GC in the GHC RTS is precise too, but GHC's Ints are 64-bit and they can be unboxed. If that's the case, then how the runtime system distinguish between Ints and pointers? It seems the same problem would arise with distinguishing between other unboxed word-sized values and pointers too.
haskell garbage-collection ghc
haskell garbage-collection ghc
asked Jan 3 at 5:36


theindigamertheindigamer
851621
851621
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
Essentially, this is described in the GHC RTS documentation, when it details the format of the heap objects, comprising a header and a payload.
The header describes which words of the payload are pointers, so that garbage collection can work.
This means that there is, roughly, an overhead of a word for any heap object.
add a comment |
The short version is: values are allocated with all of the pointers and all of the non-pointers grouped together, and include a bit of metadata so that the GC knows what to follow.
Note that the Haskell report actually allows Int
to be 31 or 63 bits, making the OCaml strategy valid -- but it's not what GHC does.
The slightly longer version is that said "metadata" is really a couple of functions that are used by the garbage collector. To give a rough sketch you could think of values in Haskell as being represented at runtime as an OO-style object with methods:
class Fn:
# By far the most used; this evaluates the value:
enter(...) -> ...
# Used by the garbage collector:
scavenge(...) -> ...
evacuate(...) -> ...
The upshot of this is that values know enough about themselves to do the bookkeeping, and for some common layouts GHC defines specialized versions of these functions; the garbage collector can be mostly oblivious to exactly how scavenge and evacuate work. The segregating of pointers & non-pointers is so that a generic implementation can be made and shared for the common case.
Note that the 'enter' function exists even for Haskell values that are not "functions," since laziness means that even if the type is e.g. Int, evaluation may still involve computation.
If you want the very long version, I suggest reading:
https://www.microsoft.com/en-us/research/publication/implementing-lazy-functional-languages-on-stock-hardware-the-spineless-tagless-g-machine/
which goes into quite a lot of detail on how Haskell gets mapped to hardware. It's a fascinating read, and there's a lot of neat stuff in there that differs substantially from how most (strict) functional languages are implemented. The paper is old, but still relevant.
So an object header (for say a type likedata Foo = Foo {-# UNPACK #-} !Int String
) contains an offset indicating the number of non-pointer words?
– theindigamer
Jan 3 at 6:12
The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.
– zenhack
Jan 3 at 6:51
But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (likeFoo
), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions forentry
/scavenge
/evacuate
for a plain value likeFoo 1 "Hi"
.
– theindigamer
Jan 3 at 7:05
@theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.
– zenhack
Jan 3 at 7:23
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54016849%2fhow-does-the-haskell-runtime-distinguish-between-pointers-and-unboxed-word-sized%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Essentially, this is described in the GHC RTS documentation, when it details the format of the heap objects, comprising a header and a payload.
The header describes which words of the payload are pointers, so that garbage collection can work.
This means that there is, roughly, an overhead of a word for any heap object.
add a comment |
Essentially, this is described in the GHC RTS documentation, when it details the format of the heap objects, comprising a header and a payload.
The header describes which words of the payload are pointers, so that garbage collection can work.
This means that there is, roughly, an overhead of a word for any heap object.
add a comment |
Essentially, this is described in the GHC RTS documentation, when it details the format of the heap objects, comprising a header and a payload.
The header describes which words of the payload are pointers, so that garbage collection can work.
This means that there is, roughly, an overhead of a word for any heap object.
Essentially, this is described in the GHC RTS documentation, when it details the format of the heap objects, comprising a header and a payload.
The header describes which words of the payload are pointers, so that garbage collection can work.
This means that there is, roughly, an overhead of a word for any heap object.
answered Jan 3 at 11:26
chichi
77.4k287146
77.4k287146
add a comment |
add a comment |
The short version is: values are allocated with all of the pointers and all of the non-pointers grouped together, and include a bit of metadata so that the GC knows what to follow.
Note that the Haskell report actually allows Int
to be 31 or 63 bits, making the OCaml strategy valid -- but it's not what GHC does.
The slightly longer version is that said "metadata" is really a couple of functions that are used by the garbage collector. To give a rough sketch you could think of values in Haskell as being represented at runtime as an OO-style object with methods:
class Fn:
# By far the most used; this evaluates the value:
enter(...) -> ...
# Used by the garbage collector:
scavenge(...) -> ...
evacuate(...) -> ...
The upshot of this is that values know enough about themselves to do the bookkeeping, and for some common layouts GHC defines specialized versions of these functions; the garbage collector can be mostly oblivious to exactly how scavenge and evacuate work. The segregating of pointers & non-pointers is so that a generic implementation can be made and shared for the common case.
Note that the 'enter' function exists even for Haskell values that are not "functions," since laziness means that even if the type is e.g. Int, evaluation may still involve computation.
If you want the very long version, I suggest reading:
https://www.microsoft.com/en-us/research/publication/implementing-lazy-functional-languages-on-stock-hardware-the-spineless-tagless-g-machine/
which goes into quite a lot of detail on how Haskell gets mapped to hardware. It's a fascinating read, and there's a lot of neat stuff in there that differs substantially from how most (strict) functional languages are implemented. The paper is old, but still relevant.
So an object header (for say a type likedata Foo = Foo {-# UNPACK #-} !Int String
) contains an offset indicating the number of non-pointer words?
– theindigamer
Jan 3 at 6:12
The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.
– zenhack
Jan 3 at 6:51
But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (likeFoo
), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions forentry
/scavenge
/evacuate
for a plain value likeFoo 1 "Hi"
.
– theindigamer
Jan 3 at 7:05
@theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.
– zenhack
Jan 3 at 7:23
add a comment |
The short version is: values are allocated with all of the pointers and all of the non-pointers grouped together, and include a bit of metadata so that the GC knows what to follow.
Note that the Haskell report actually allows Int
to be 31 or 63 bits, making the OCaml strategy valid -- but it's not what GHC does.
The slightly longer version is that said "metadata" is really a couple of functions that are used by the garbage collector. To give a rough sketch you could think of values in Haskell as being represented at runtime as an OO-style object with methods:
class Fn:
# By far the most used; this evaluates the value:
enter(...) -> ...
# Used by the garbage collector:
scavenge(...) -> ...
evacuate(...) -> ...
The upshot of this is that values know enough about themselves to do the bookkeeping, and for some common layouts GHC defines specialized versions of these functions; the garbage collector can be mostly oblivious to exactly how scavenge and evacuate work. The segregating of pointers & non-pointers is so that a generic implementation can be made and shared for the common case.
Note that the 'enter' function exists even for Haskell values that are not "functions," since laziness means that even if the type is e.g. Int, evaluation may still involve computation.
If you want the very long version, I suggest reading:
https://www.microsoft.com/en-us/research/publication/implementing-lazy-functional-languages-on-stock-hardware-the-spineless-tagless-g-machine/
which goes into quite a lot of detail on how Haskell gets mapped to hardware. It's a fascinating read, and there's a lot of neat stuff in there that differs substantially from how most (strict) functional languages are implemented. The paper is old, but still relevant.
So an object header (for say a type likedata Foo = Foo {-# UNPACK #-} !Int String
) contains an offset indicating the number of non-pointer words?
– theindigamer
Jan 3 at 6:12
The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.
– zenhack
Jan 3 at 6:51
But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (likeFoo
), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions forentry
/scavenge
/evacuate
for a plain value likeFoo 1 "Hi"
.
– theindigamer
Jan 3 at 7:05
@theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.
– zenhack
Jan 3 at 7:23
add a comment |
The short version is: values are allocated with all of the pointers and all of the non-pointers grouped together, and include a bit of metadata so that the GC knows what to follow.
Note that the Haskell report actually allows Int
to be 31 or 63 bits, making the OCaml strategy valid -- but it's not what GHC does.
The slightly longer version is that said "metadata" is really a couple of functions that are used by the garbage collector. To give a rough sketch you could think of values in Haskell as being represented at runtime as an OO-style object with methods:
class Fn:
# By far the most used; this evaluates the value:
enter(...) -> ...
# Used by the garbage collector:
scavenge(...) -> ...
evacuate(...) -> ...
The upshot of this is that values know enough about themselves to do the bookkeeping, and for some common layouts GHC defines specialized versions of these functions; the garbage collector can be mostly oblivious to exactly how scavenge and evacuate work. The segregating of pointers & non-pointers is so that a generic implementation can be made and shared for the common case.
Note that the 'enter' function exists even for Haskell values that are not "functions," since laziness means that even if the type is e.g. Int, evaluation may still involve computation.
If you want the very long version, I suggest reading:
https://www.microsoft.com/en-us/research/publication/implementing-lazy-functional-languages-on-stock-hardware-the-spineless-tagless-g-machine/
which goes into quite a lot of detail on how Haskell gets mapped to hardware. It's a fascinating read, and there's a lot of neat stuff in there that differs substantially from how most (strict) functional languages are implemented. The paper is old, but still relevant.
The short version is: values are allocated with all of the pointers and all of the non-pointers grouped together, and include a bit of metadata so that the GC knows what to follow.
Note that the Haskell report actually allows Int
to be 31 or 63 bits, making the OCaml strategy valid -- but it's not what GHC does.
The slightly longer version is that said "metadata" is really a couple of functions that are used by the garbage collector. To give a rough sketch you could think of values in Haskell as being represented at runtime as an OO-style object with methods:
class Fn:
# By far the most used; this evaluates the value:
enter(...) -> ...
# Used by the garbage collector:
scavenge(...) -> ...
evacuate(...) -> ...
The upshot of this is that values know enough about themselves to do the bookkeeping, and for some common layouts GHC defines specialized versions of these functions; the garbage collector can be mostly oblivious to exactly how scavenge and evacuate work. The segregating of pointers & non-pointers is so that a generic implementation can be made and shared for the common case.
Note that the 'enter' function exists even for Haskell values that are not "functions," since laziness means that even if the type is e.g. Int, evaluation may still involve computation.
If you want the very long version, I suggest reading:
https://www.microsoft.com/en-us/research/publication/implementing-lazy-functional-languages-on-stock-hardware-the-spineless-tagless-g-machine/
which goes into quite a lot of detail on how Haskell gets mapped to hardware. It's a fascinating read, and there's a lot of neat stuff in there that differs substantially from how most (strict) functional languages are implemented. The paper is old, but still relevant.
edited Jan 3 at 7:22
answered Jan 3 at 5:58
zenhackzenhack
1112
1112
So an object header (for say a type likedata Foo = Foo {-# UNPACK #-} !Int String
) contains an offset indicating the number of non-pointer words?
– theindigamer
Jan 3 at 6:12
The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.
– zenhack
Jan 3 at 6:51
But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (likeFoo
), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions forentry
/scavenge
/evacuate
for a plain value likeFoo 1 "Hi"
.
– theindigamer
Jan 3 at 7:05
@theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.
– zenhack
Jan 3 at 7:23
add a comment |
So an object header (for say a type likedata Foo = Foo {-# UNPACK #-} !Int String
) contains an offset indicating the number of non-pointer words?
– theindigamer
Jan 3 at 6:12
The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.
– zenhack
Jan 3 at 6:51
But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (likeFoo
), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions forentry
/scavenge
/evacuate
for a plain value likeFoo 1 "Hi"
.
– theindigamer
Jan 3 at 7:05
@theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.
– zenhack
Jan 3 at 7:23
So an object header (for say a type like
data Foo = Foo {-# UNPACK #-} !Int String
) contains an offset indicating the number of non-pointer words?– theindigamer
Jan 3 at 6:12
So an object header (for say a type like
data Foo = Foo {-# UNPACK #-} !Int String
) contains an offset indicating the number of non-pointer words?– theindigamer
Jan 3 at 6:12
The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.
– zenhack
Jan 3 at 6:51
The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.
– zenhack
Jan 3 at 6:51
But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (like
Foo
), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions for entry
/scavenge
/evacuate
for a plain value like Foo 1 "Hi"
.– theindigamer
Jan 3 at 7:05
But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (like
Foo
), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions for entry
/scavenge
/evacuate
for a plain value like Foo 1 "Hi"
.– theindigamer
Jan 3 at 7:05
@theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.
– zenhack
Jan 3 at 7:23
@theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.
– zenhack
Jan 3 at 7:23
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54016849%2fhow-does-the-haskell-runtime-distinguish-between-pointers-and-unboxed-word-sized%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown