How does the Haskell runtime distinguish between pointers and unboxed word-sized values?

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}

On a 64-bit platform, OCaml's int type is 63-bits due to pointer tagging. This allows ints to be unboxed and still be distinguishable from pointers at runtime, allowing for a precise GC. IIRC, the GC in the GHC RTS is precise too, but GHC's Ints are 64-bit and they can be unboxed. If that's the case, then how the runtime system distinguish between Ints and pointers? It seems the same problem would arise with distinguishing between other unboxed word-sized values and pointers too.

asked Jan 3 at 5:36

theindigamer

851621

add a comment |

asked Jan 3 at 5:36

theindigamer

851621

add a comment |

asked Jan 3 at 5:36

theindigamer

851621

haskell garbage-collection ghc

asked Jan 3 at 5:36

theindigamer

851621

asked Jan 3 at 5:36

theindigamer

851621

asked Jan 3 at 5:36

theindigamer

851621

asked Jan 3 at 5:36

theindigamer

851621

asked Jan 3 at 5:36

theindigamer

851621

add a comment |

2 Answers
2

active

oldest

votes

Essentially, this is described in the GHC RTS documentation, when it details the format of the heap objects, comprising a header and a payload.

The header describes which words of the payload are pointers, so that garbage collection can work.

This means that there is, roughly, an overhead of a word for any heap object.

answered Jan 3 at 11:26

chi

77.4k287146

add a comment |

The short version is: values are allocated with all of the pointers and all of the non-pointers grouped together, and include a bit of metadata so that the GC knows what to follow.

Note that the Haskell report actually allows Int to be 31 or 63 bits, making the OCaml strategy valid -- but it's not what GHC does.

The slightly longer version is that said "metadata" is really a couple of functions that are used by the garbage collector. To give a rough sketch you could think of values in Haskell as being represented at runtime as an OO-style object with methods:

class Fn:

    # By far the most used; this evaluates the value:

    enter(...) -> ...

    # Used by the garbage collector:

    scavenge(...) -> ...

    evacuate(...) -> ...

The upshot of this is that values know enough about themselves to do the bookkeeping, and for some common layouts GHC defines specialized versions of these functions; the garbage collector can be mostly oblivious to exactly how scavenge and evacuate work. The segregating of pointers & non-pointers is so that a generic implementation can be made and shared for the common case.

Note that the 'enter' function exists even for Haskell values that are not "functions," since laziness means that even if the type is e.g. Int, evaluation may still involve computation.

If you want the very long version, I suggest reading:

https://www.microsoft.com/en-us/research/publication/implementing-lazy-functional-languages-on-stock-hardware-the-spineless-tagless-g-machine/

which goes into quite a lot of detail on how Haskell gets mapped to hardware. It's a fascinating read, and there's a lot of neat stuff in there that differs substantially from how most (strict) functional languages are implemented. The paper is old, but still relevant.

edited Jan 3 at 7:22

answered Jan 3 at 5:58

zenhack

1112

So an object header (for say a type like data Foo = Foo {-# UNPACK #-} !Int String) contains an offset indicating the number of non-pointer words?

– theindigamer
Jan 3 at 6:12

The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.

– zenhack
Jan 3 at 6:51

But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (like Foo), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions for entry/scavenge/evacuate for a plain value like Foo 1 "Hi".

– theindigamer
Jan 3 at 7:05

@theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.

– zenhack
Jan 3 at 7:23

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54016849%2fhow-does-the-haskell-runtime-distinguish-between-pointers-and-unboxed-word-sized%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

Essentially, this is described in the GHC RTS documentation, when it details the format of the heap objects, comprising a header and a payload.

The header describes which words of the payload are pointers, so that garbage collection can work.

This means that there is, roughly, an overhead of a word for any heap object.

answered Jan 3 at 11:26

chi

77.4k287146

add a comment |

Essentially, this is described in the GHC RTS documentation, when it details the format of the heap objects, comprising a header and a payload.

The header describes which words of the payload are pointers, so that garbage collection can work.

This means that there is, roughly, an overhead of a word for any heap object.

answered Jan 3 at 11:26

chi

77.4k287146

add a comment |

Essentially, this is described in the GHC RTS documentation, when it details the format of the heap objects, comprising a header and a payload.

The header describes which words of the payload are pointers, so that garbage collection can work.

This means that there is, roughly, an overhead of a word for any heap object.

answered Jan 3 at 11:26

chi

77.4k287146

Essentially, this is described in the GHC RTS documentation, when it details the format of the heap objects, comprising a header and a payload.

The header describes which words of the payload are pointers, so that garbage collection can work.

This means that there is, roughly, an overhead of a word for any heap object.

answered Jan 3 at 11:26

chi

77.4k287146

answered Jan 3 at 11:26

chi

77.4k287146

answered Jan 3 at 11:26

chi

77.4k287146

answered Jan 3 at 11:26

chi

77.4k287146

add a comment |

The short version is: values are allocated with all of the pointers and all of the non-pointers grouped together, and include a bit of metadata so that the GC knows what to follow.

Note that the Haskell report actually allows Int to be 31 or 63 bits, making the OCaml strategy valid -- but it's not what GHC does.

class Fn:

    # By far the most used; this evaluates the value:

    enter(...) -> ...

    # Used by the garbage collector:

    scavenge(...) -> ...

    evacuate(...) -> ...

Note that the 'enter' function exists even for Haskell values that are not "functions," since laziness means that even if the type is e.g. Int, evaluation may still involve computation.

If you want the very long version, I suggest reading:

https://www.microsoft.com/en-us/research/publication/implementing-lazy-functional-languages-on-stock-hardware-the-spineless-tagless-g-machine/

edited Jan 3 at 7:22

answered Jan 3 at 5:58

zenhack

1112

So an object header (for say a type like data Foo = Foo {-# UNPACK #-} !Int String) contains an offset indicating the number of non-pointer words?

– theindigamer
Jan 3 at 6:12

The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.

– zenhack
Jan 3 at 6:51

But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (like Foo), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions for entry/scavenge/evacuate for a plain value like Foo 1 "Hi".

– theindigamer
Jan 3 at 7:05

@theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.

– zenhack
Jan 3 at 7:23

add a comment |

The short version is: values are allocated with all of the pointers and all of the non-pointers grouped together, and include a bit of metadata so that the GC knows what to follow.

Note that the Haskell report actually allows Int to be 31 or 63 bits, making the OCaml strategy valid -- but it's not what GHC does.

class Fn:

    # By far the most used; this evaluates the value:

    enter(...) -> ...

    # Used by the garbage collector:

    scavenge(...) -> ...

    evacuate(...) -> ...

Note that the 'enter' function exists even for Haskell values that are not "functions," since laziness means that even if the type is e.g. Int, evaluation may still involve computation.

If you want the very long version, I suggest reading:

https://www.microsoft.com/en-us/research/publication/implementing-lazy-functional-languages-on-stock-hardware-the-spineless-tagless-g-machine/

edited Jan 3 at 7:22

answered Jan 3 at 5:58

zenhack

1112

So an object header (for say a type like data Foo = Foo {-# UNPACK #-} !Int String) contains an offset indicating the number of non-pointer words?

– theindigamer
Jan 3 at 6:12

The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.

– zenhack
Jan 3 at 6:51

But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (like Foo), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions for entry/scavenge/evacuate for a plain value like Foo 1 "Hi".

– theindigamer
Jan 3 at 7:05

@theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.

– zenhack
Jan 3 at 7:23

add a comment |

The short version is: values are allocated with all of the pointers and all of the non-pointers grouped together, and include a bit of metadata so that the GC knows what to follow.

Note that the Haskell report actually allows Int to be 31 or 63 bits, making the OCaml strategy valid -- but it's not what GHC does.

class Fn:

    # By far the most used; this evaluates the value:

    enter(...) -> ...

    # Used by the garbage collector:

    scavenge(...) -> ...

    evacuate(...) -> ...

Note that the 'enter' function exists even for Haskell values that are not "functions," since laziness means that even if the type is e.g. Int, evaluation may still involve computation.

If you want the very long version, I suggest reading:

https://www.microsoft.com/en-us/research/publication/implementing-lazy-functional-languages-on-stock-hardware-the-spineless-tagless-g-machine/

edited Jan 3 at 7:22

answered Jan 3 at 5:58

zenhack

1112

The short version is: values are allocated with all of the pointers and all of the non-pointers grouped together, and include a bit of metadata so that the GC knows what to follow.

Note that the Haskell report actually allows Int to be 31 or 63 bits, making the OCaml strategy valid -- but it's not what GHC does.

class Fn:

    # By far the most used; this evaluates the value:

    enter(...) -> ...

    # Used by the garbage collector:

    scavenge(...) -> ...

    evacuate(...) -> ...

Note that the 'enter' function exists even for Haskell values that are not "functions," since laziness means that even if the type is e.g. Int, evaluation may still involve computation.

If you want the very long version, I suggest reading:

https://www.microsoft.com/en-us/research/publication/implementing-lazy-functional-languages-on-stock-hardware-the-spineless-tagless-g-machine/

edited Jan 3 at 7:22

answered Jan 3 at 5:58

zenhack

1112

edited Jan 3 at 7:22

answered Jan 3 at 5:58

zenhack

1112

answered Jan 3 at 5:58

zenhack

1112

answered Jan 3 at 5:58

zenhack

1112

So an object header (for say a type like data Foo = Foo {-# UNPACK #-} !Int String) contains an offset indicating the number of non-pointer words?

– theindigamer
Jan 3 at 6:12

The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.

– zenhack
Jan 3 at 6:51

But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (like Foo), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions for entry/scavenge/evacuate for a plain value like Foo 1 "Hi".

– theindigamer
Jan 3 at 7:05

@theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.

– zenhack
Jan 3 at 7:23

add a comment |

So an object header (for say a type like data Foo = Foo {-# UNPACK #-} !Int String) contains an offset indicating the number of non-pointer words?

– theindigamer
Jan 3 at 6:12

The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.

– zenhack
Jan 3 at 6:51

But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (like Foo), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions for entry/scavenge/evacuate for a plain value like Foo 1 "Hi".

– theindigamer
Jan 3 at 7:05

@theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.

– zenhack
Jan 3 at 7:23

So an object header (for say a type like data Foo = Foo {-# UNPACK #-} !Int String) contains an offset indicating the number of non-pointer words?

– theindigamer
Jan 3 at 6:12

The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.

– zenhack
Jan 3 at 6:51

But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (like Foo), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions for entry/scavenge/evacuate for a plain value like Foo 1 "Hi".

– theindigamer
Jan 3 at 7:05

@theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.

– zenhack
Jan 3 at 7:23

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

Search This Blog

Ufyukyu