How to turn a binary string into a byte?












If I take the letter 'à' and encode it in UTF-8, I obtain the following result:

    'à'.encode('utf-8')
    >> b'\xc3\xa0'

Now, starting from a bytearray, I would like to convert 'à' into a binary string and then turn it back into 'à'. To do so I execute the following code:

    byte = bytearray('à', 'utf-8')
    for x in byte:
        print(bin(x))

I get 0b11000011 and 0b10100000, which are 195 and 160. Then I fuse them together and take the 0b part out. Now I execute this code:

    s = '1100001110100000'
    value1 = s[0:8].encode('utf-8')
    value2 = s[9:16].encode('utf-8')
    value = value1 + value2
    print(chr(int(value, 2)))
    >> 憠

No matter how I develop the latter part, I get symbols and never seem to be able to get back my 'à'. Why is that, and how can I get an 'à'?







      python unicode utf-8 utf






asked Nov 21 '18 at 23:44 by jatrp5
























          3 Answers
    >>> bytes(int(s[i:i+8], 2) for i in range(0, len(s), 8)).decode('utf-8')
    'à'

There are multiple parts to this. The bytes constructor creates a byte string from a sequence of integers. The integers are formed from strings using int with a base of 2. The range combined with the slicing peels off 8 characters at a time. Finally, decode converts those bytes back into Unicode characters.
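
For readers who find the one-liner dense, the same steps can be unpacked roughly as follows. This is only a sketch: the helper name `bits_to_text` and the length check are illustrative additions and do not come from the answer itself.

```python
def bits_to_text(bits, encoding="utf-8"):
    """Decode a string of '0'/'1' characters (8 per byte) into text.

    Hypothetical helper, written only to illustrate the steps.
    """
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    # Peel off 8 characters at a time and parse each chunk as a base-2 integer.
    byte_values = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    # bytes() accepts an iterable of integers in range(256).
    return bytes(byte_values).decode(encoding)


print(bits_to_text("1100001110100000"))  # à
```

Checking the length first means a truncated bit string fails with a clear message instead of silently producing a wrong final byte.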







answered Nov 21 '18 at 23:50 by Mark Ransom

Note also that the OP can use ''.join('{:08b}'.format(i) for i in byte) on the original byte-array object. This is pretty similar: we take the byte-array apart, one byte at a time, and format each one using :08b to get an eight-bit zero-filled string representation, then join all the strings without whitespace.
– torek, Nov 22 '18 at 0:03
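
A minimal round trip combining the formatting approach from the comment above with the chunked decoding might look like this. The variable names are illustrative, and Python 3 is assumed.

```python
text = "à"
raw = bytearray(text, "utf-8")

# Forward: each byte becomes an 8-character, zero-padded binary string.
bits = "".join("{:08b}".format(b) for b in raw)
print(bits)  # 1100001110100000

# Why the padding matters: bin() drops leading zeros, so widths would vary.
print(bin(0x0A))              # 0b1010
print("{:08b}".format(0x0A))  # 00001010

# Backward: split into 8-bit chunks, parse each as base 2, then decode.
restored = bytes(
    int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)
).decode("utf-8")
print(restored)  # à
```

Because every byte is padded to a fixed width of eight, the bit string can be split again without any separators.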














You need your second slice to be s[8:16] (or just s[8:]); otherwise you get 0100000, which is only seven bits.

You also need to convert your "bit string" back to an integer before treating it as a byte, for example with int("0010101", 2).

    s = '1100001110100000'
    value1 = bytearray([int(s[:8], 2),   # bits 0..7 (8 total)
                        int(s[8:], 2)])  # bits 8..15 (8 total)
    print(value1.decode("utf8"))
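
To see why the code in the question ends up with an unrelated symbol, it can help to print the intermediate values. The following is a short sketch, assuming Python 3; the names `joined` and `code_point` are illustrative.

```python
s = "1100001110100000"

# The original second slice starts at index 9, so it has only 7 characters.
print(s[8:16], len(s[8:16]))  # 10100000 8
print(s[9:16], len(s[9:16]))  # 0100000 7

# Encoding the slices yields the ASCII bytes for the characters '0' and '1',
# not the byte values 195 and 160.
joined = s[0:8].encode("utf-8") + s[9:16].encode("utf-8")
print(joined)  # b'110000110100000'

# int() reads all 15 digits as one number, and chr() treats that number
# as a single code point rather than as two UTF-8 bytes.
code_point = int(joined, 2)
print(hex(code_point))  # 0x61a0
print(chr(code_point) == "\u61a0")  # True
```

So there are two separate problems: the dropped bit changes the value, and chr() interprets the combined number as a code point, whereas 'à' is the result of decoding the two bytes 0xC3 and 0xA0 as UTF-8.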






answered Nov 21 '18 at 23:51 by Joran Beasley























Convert the base-2 value back to an integer with int(s, 2), convert that integer to bytes with int.to_bytes (using the original length divided by 8 as the byte count, and big-endian order to keep the bytes in the right order), then .decode() it (the default encoding in Python 3 is UTF-8):

    >>> s = '1100001110100000'
    >>> int(s, 2)
    50080
    >>> int(s, 2).to_bytes(len(s) // 8, 'big')
    b'\xc3\xa0'
    >>> int(s, 2).to_bytes(len(s) // 8, 'big').decode()
    'à'
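
One detail worth illustrating is why the byte count is taken from the length of the string rather than from the integer. A small sketch follows; the helper name `bits_to_bytes` and the length check are assumptions made for the example.

```python
def bits_to_bytes(bits):
    """Convert a bit string to bytes via one big integer (illustrative helper)."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    # The byte count comes from the string length, not from the integer,
    # so leading zero bytes survive the conversion.
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


print(bits_to_bytes("1100001110100000"))                  # b'\xc3\xa0'
print(bits_to_bytes("0000000001000001"))                  # b'\x00A'
print(bits_to_bytes("1100001110100000").decode("utf-8"))  # à
```

The second call shows that a leading zero byte is kept, which would not be the case if the length were derived from the integer value alone.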






answered Nov 22 '18 at 7:29 by Mark Tolonen





























