How does the Haskell runtime distinguish between pointers and unboxed word-sized values?





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







4















On a 64-bit platform, OCaml's int type is 63-bits due to pointer tagging. This allows ints to be unboxed and still be distinguishable from pointers at runtime, allowing for a precise GC. IIRC, the GC in the GHC RTS is precise too, but GHC's Ints are 64-bit and they can be unboxed. If that's the case, then how the runtime system distinguish between Ints and pointers? It seems the same problem would arise with distinguishing between other unboxed word-sized values and pointers too.










share|improve this question





























    4















    On a 64-bit platform, OCaml's int type is 63-bits due to pointer tagging. This allows ints to be unboxed and still be distinguishable from pointers at runtime, allowing for a precise GC. IIRC, the GC in the GHC RTS is precise too, but GHC's Ints are 64-bit and they can be unboxed. If that's the case, then how the runtime system distinguish between Ints and pointers? It seems the same problem would arise with distinguishing between other unboxed word-sized values and pointers too.










    share|improve this question

























      4












      4








      4


      1






      On a 64-bit platform, OCaml's int type is 63-bits due to pointer tagging. This allows ints to be unboxed and still be distinguishable from pointers at runtime, allowing for a precise GC. IIRC, the GC in the GHC RTS is precise too, but GHC's Ints are 64-bit and they can be unboxed. If that's the case, then how the runtime system distinguish between Ints and pointers? It seems the same problem would arise with distinguishing between other unboxed word-sized values and pointers too.










      share|improve this question














      On a 64-bit platform, OCaml's int type is 63-bits due to pointer tagging. This allows ints to be unboxed and still be distinguishable from pointers at runtime, allowing for a precise GC. IIRC, the GC in the GHC RTS is precise too, but GHC's Ints are 64-bit and they can be unboxed. If that's the case, then how the runtime system distinguish between Ints and pointers? It seems the same problem would arise with distinguishing between other unboxed word-sized values and pointers too.







      haskell garbage-collection ghc






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Jan 3 at 5:36









      theindigamertheindigamer

      851621




      851621
























          2 Answers
          2






          active

          oldest

          votes


















          1














          Essentially, this is described in the GHC RTS documentation, when it details the format of the heap objects, comprising a header and a payload.



          The header describes which words of the payload are pointers, so that garbage collection can work.



          This means that there is, roughly, an overhead of a word for any heap object.






          share|improve this answer































            6














            The short version is: values are allocated with all of the pointers and all of the non-pointers grouped together, and include a bit of metadata so that the GC knows what to follow.



            Note that the Haskell report actually allows Int to be 31 or 63 bits, making the OCaml strategy valid -- but it's not what GHC does.



            The slightly longer version is that said "metadata" is really a couple of functions that are used by the garbage collector. To give a rough sketch you could think of values in Haskell as being represented at runtime as an OO-style object with methods:



            class Fn:
            # By far the most used; this evaluates the value:
            enter(...) -> ...
            # Used by the garbage collector:
            scavenge(...) -> ...
            evacuate(...) -> ...


            The upshot of this is that values know enough about themselves to do the bookkeeping, and for some common layouts GHC defines specialized versions of these functions; the garbage collector can be mostly oblivious to exactly how scavenge and evacuate work. The segregating of pointers & non-pointers is so that a generic implementation can be made and shared for the common case.



            Note that the 'enter' function exists even for Haskell values that are not "functions," since laziness means that even if the type is e.g. Int, evaluation may still involve computation.



            If you want the very long version, I suggest reading:



            https://www.microsoft.com/en-us/research/publication/implementing-lazy-functional-languages-on-stock-hardware-the-spineless-tagless-g-machine/



            which goes into quite a lot of detail on how Haskell gets mapped to hardware. It's a fascinating read, and there's a lot of neat stuff in there that differs substantially from how most (strict) functional languages are implemented. The paper is old, but still relevant.






            share|improve this answer


























            • So an object header (for say a type like data Foo = Foo {-# UNPACK #-} !Int String) contains an offset indicating the number of non-pointer words?

              – theindigamer
              Jan 3 at 6:12











            • The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.

              – zenhack
              Jan 3 at 6:51











            • But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (like Foo), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions for entry/scavenge/evacuate for a plain value like Foo 1 "Hi".

              – theindigamer
              Jan 3 at 7:05











            • @theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.

              – zenhack
              Jan 3 at 7:23












            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54016849%2fhow-does-the-haskell-runtime-distinguish-between-pointers-and-unboxed-word-sized%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            1














            Essentially, this is described in the GHC RTS documentation, when it details the format of the heap objects, comprising a header and a payload.



            The header describes which words of the payload are pointers, so that garbage collection can work.



            This means that there is, roughly, an overhead of a word for any heap object.






            share|improve this answer




























              1














              Essentially, this is described in the GHC RTS documentation, when it details the format of the heap objects, comprising a header and a payload.



              The header describes which words of the payload are pointers, so that garbage collection can work.



              This means that there is, roughly, an overhead of a word for any heap object.






              share|improve this answer


























                1












                1








                1







                Essentially, this is described in the GHC RTS documentation, when it details the format of the heap objects, comprising a header and a payload.



                The header describes which words of the payload are pointers, so that garbage collection can work.



                This means that there is, roughly, an overhead of a word for any heap object.






                share|improve this answer













                Essentially, this is described in the GHC RTS documentation, when it details the format of the heap objects, comprising a header and a payload.



                The header describes which words of the payload are pointers, so that garbage collection can work.



                This means that there is, roughly, an overhead of a word for any heap object.







                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Jan 3 at 11:26









                chichi

                77.4k287146




                77.4k287146

























                    6














                    The short version is: values are allocated with all of the pointers and all of the non-pointers grouped together, and include a bit of metadata so that the GC knows what to follow.



                    Note that the Haskell report actually allows Int to be 31 or 63 bits, making the OCaml strategy valid -- but it's not what GHC does.



                    The slightly longer version is that said "metadata" is really a couple of functions that are used by the garbage collector. To give a rough sketch you could think of values in Haskell as being represented at runtime as an OO-style object with methods:



                    class Fn:
                    # By far the most used; this evaluates the value:
                    enter(...) -> ...
                    # Used by the garbage collector:
                    scavenge(...) -> ...
                    evacuate(...) -> ...


                    The upshot of this is that values know enough about themselves to do the bookkeeping, and for some common layouts GHC defines specialized versions of these functions; the garbage collector can be mostly oblivious to exactly how scavenge and evacuate work. The segregating of pointers & non-pointers is so that a generic implementation can be made and shared for the common case.



                    Note that the 'enter' function exists even for Haskell values that are not "functions," since laziness means that even if the type is e.g. Int, evaluation may still involve computation.



                    If you want the very long version, I suggest reading:



                    https://www.microsoft.com/en-us/research/publication/implementing-lazy-functional-languages-on-stock-hardware-the-spineless-tagless-g-machine/



                    which goes into quite a lot of detail on how Haskell gets mapped to hardware. It's a fascinating read, and there's a lot of neat stuff in there that differs substantially from how most (strict) functional languages are implemented. The paper is old, but still relevant.






                    share|improve this answer


























                    • So an object header (for say a type like data Foo = Foo {-# UNPACK #-} !Int String) contains an offset indicating the number of non-pointer words?

                      – theindigamer
                      Jan 3 at 6:12











                    • The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.

                      – zenhack
                      Jan 3 at 6:51











                    • But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (like Foo), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions for entry/scavenge/evacuate for a plain value like Foo 1 "Hi".

                      – theindigamer
                      Jan 3 at 7:05











                    • @theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.

                      – zenhack
                      Jan 3 at 7:23
















                    6














                    The short version is: values are allocated with all of the pointers and all of the non-pointers grouped together, and include a bit of metadata so that the GC knows what to follow.



                    Note that the Haskell report actually allows Int to be 31 or 63 bits, making the OCaml strategy valid -- but it's not what GHC does.



                    The slightly longer version is that said "metadata" is really a couple of functions that are used by the garbage collector. To give a rough sketch you could think of values in Haskell as being represented at runtime as an OO-style object with methods:



                    class Fn:
                    # By far the most used; this evaluates the value:
                    enter(...) -> ...
                    # Used by the garbage collector:
                    scavenge(...) -> ...
                    evacuate(...) -> ...


                    The upshot of this is that values know enough about themselves to do the bookkeeping, and for some common layouts GHC defines specialized versions of these functions; the garbage collector can be mostly oblivious to exactly how scavenge and evacuate work. The segregating of pointers & non-pointers is so that a generic implementation can be made and shared for the common case.



                    Note that the 'enter' function exists even for Haskell values that are not "functions," since laziness means that even if the type is e.g. Int, evaluation may still involve computation.



                    If you want the very long version, I suggest reading:



                    https://www.microsoft.com/en-us/research/publication/implementing-lazy-functional-languages-on-stock-hardware-the-spineless-tagless-g-machine/



                    which goes into quite a lot of detail on how Haskell gets mapped to hardware. It's a fascinating read, and there's a lot of neat stuff in there that differs substantially from how most (strict) functional languages are implemented. The paper is old, but still relevant.






                    share|improve this answer


























                    • So an object header (for say a type like data Foo = Foo {-# UNPACK #-} !Int String) contains an offset indicating the number of non-pointer words?

                      – theindigamer
                      Jan 3 at 6:12











                    • The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.

                      – zenhack
                      Jan 3 at 6:51











                    • But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (like Foo), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions for entry/scavenge/evacuate for a plain value like Foo 1 "Hi".

                      – theindigamer
                      Jan 3 at 7:05











                    • @theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.

                      – zenhack
                      Jan 3 at 7:23














                    6












                    6








                    6







                    The short version is: values are allocated with all of the pointers and all of the non-pointers grouped together, and include a bit of metadata so that the GC knows what to follow.



                    Note that the Haskell report actually allows Int to be 31 or 63 bits, making the OCaml strategy valid -- but it's not what GHC does.



                    The slightly longer version is that said "metadata" is really a couple of functions that are used by the garbage collector. To give a rough sketch you could think of values in Haskell as being represented at runtime as an OO-style object with methods:



                    class Fn:
                    # By far the most used; this evaluates the value:
                    enter(...) -> ...
                    # Used by the garbage collector:
                    scavenge(...) -> ...
                    evacuate(...) -> ...


                    The upshot of this is that values know enough about themselves to do the bookkeeping, and for some common layouts GHC defines specialized versions of these functions; the garbage collector can be mostly oblivious to exactly how scavenge and evacuate work. The segregating of pointers & non-pointers is so that a generic implementation can be made and shared for the common case.



                    Note that the 'enter' function exists even for Haskell values that are not "functions," since laziness means that even if the type is e.g. Int, evaluation may still involve computation.



                    If you want the very long version, I suggest reading:



                    https://www.microsoft.com/en-us/research/publication/implementing-lazy-functional-languages-on-stock-hardware-the-spineless-tagless-g-machine/



                    which goes into quite a lot of detail on how Haskell gets mapped to hardware. It's a fascinating read, and there's a lot of neat stuff in there that differs substantially from how most (strict) functional languages are implemented. The paper is old, but still relevant.






                    share|improve this answer















                    The short version is: values are allocated with all of the pointers and all of the non-pointers grouped together, and include a bit of metadata so that the GC knows what to follow.



                    Note that the Haskell report actually allows Int to be 31 or 63 bits, making the OCaml strategy valid -- but it's not what GHC does.



                    The slightly longer version is that said "metadata" is really a couple of functions that are used by the garbage collector. To give a rough sketch you could think of values in Haskell as being represented at runtime as an OO-style object with methods:



                    class Fn:
                    # By far the most used; this evaluates the value:
                    enter(...) -> ...
                    # Used by the garbage collector:
                    scavenge(...) -> ...
                    evacuate(...) -> ...


                    The upshot of this is that values know enough about themselves to do the bookkeeping, and for some common layouts GHC defines specialized versions of these functions; the garbage collector can be mostly oblivious to exactly how scavenge and evacuate work. The segregating of pointers & non-pointers is so that a generic implementation can be made and shared for the common case.



                    Note that the 'enter' function exists even for Haskell values that are not "functions," since laziness means that even if the type is e.g. Int, evaluation may still involve computation.



                    If you want the very long version, I suggest reading:



                    https://www.microsoft.com/en-us/research/publication/implementing-lazy-functional-languages-on-stock-hardware-the-spineless-tagless-g-machine/



                    which goes into quite a lot of detail on how Haskell gets mapped to hardware. It's a fascinating read, and there's a lot of neat stuff in there that differs substantially from how most (strict) functional languages are implemented. The paper is old, but still relevant.







                    share|improve this answer














                    share|improve this answer



                    share|improve this answer








                    edited Jan 3 at 7:22

























                    answered Jan 3 at 5:58









                    zenhackzenhack

                    1112




                    1112













                    • So an object header (for say a type like data Foo = Foo {-# UNPACK #-} !Int String) contains an offset indicating the number of non-pointer words?

                      – theindigamer
                      Jan 3 at 6:12











                    • The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.

                      – zenhack
                      Jan 3 at 6:51











                    • But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (like Foo), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions for entry/scavenge/evacuate for a plain value like Foo 1 "Hi".

                      – theindigamer
                      Jan 3 at 7:05











                    • @theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.

                      – zenhack
                      Jan 3 at 7:23



















                    • So an object header (for say a type like data Foo = Foo {-# UNPACK #-} !Int String) contains an offset indicating the number of non-pointer words?

                      – theindigamer
                      Jan 3 at 6:12











                    • The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.

                      – zenhack
                      Jan 3 at 6:51











                    • But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (like Foo), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions for entry/scavenge/evacuate for a plain value like Foo 1 "Hi".

                      – theindigamer
                      Jan 3 at 7:05











                    • @theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.

                      – zenhack
                      Jan 3 at 7:23

















                    So an object header (for say a type like data Foo = Foo {-# UNPACK #-} !Int String) contains an offset indicating the number of non-pointer words?

                    – theindigamer
                    Jan 3 at 6:12





                    So an object header (for say a type like data Foo = Foo {-# UNPACK #-} !Int String) contains an offset indicating the number of non-pointer words?

                    – theindigamer
                    Jan 3 at 6:12













                    The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.

                    – zenhack
                    Jan 3 at 6:51





                    The object header is really just a pointer to a struct with those call(actually called entry)/scavenge/evacuate functions, followed some implementation-specific data. The general-purpose implementation would contain lengths/offsets (though the pointer section is actually first), but for super common layouts you might just have the function pointers point to specialized versions that don't need to look up this info. See also section 7.1 in the paper I linked.

                    – zenhack
                    Jan 3 at 6:51













                    But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (like Foo), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions for entry/scavenge/evacuate for a plain value like Foo 1 "Hi".

                    – theindigamer
                    Jan 3 at 7:05





                    But that section (and a majority of your answer) is about how functions are implemented, whereas my question was primarily about plain values (like Foo), not functions/closures. I thought there was be a difference in implementation between the two, as you wouldn't need functions for entry/scavenge/evacuate for a plain value like Foo 1 "Hi".

                    – theindigamer
                    Jan 3 at 7:05













                    @theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.

                    – zenhack
                    Jan 3 at 7:23





                    @theindigamer, it's actually the same, because of laziness; I made a few edits to try to make this more clear/explicit in my answer.

                    – zenhack
                    Jan 3 at 7:23


















                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54016849%2fhow-does-the-haskell-runtime-distinguish-between-pointers-and-unboxed-word-sized%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    MongoDB - Not Authorized To Execute Command

                    How to fix TextFormField cause rebuild widget in Flutter

                    in spring boot 2.1 many test slices are not allowed anymore due to multiple @BootstrapWith