PyTesseract - recognize digits in simple image

I'm trying to use pytesseract to recognize the two digits (49) in this image:

[image: https://i.stack.imgur.com/oAAXR.png]

  • I have tried --psm 6 through 10.
  • I have tried -c tessedit_char_whitelist=0123456789

None of these returns 49. The closest I got was 4, without the 9.

Do you have any tips on how to make Tesseract recognize it?

      python ocr tesseract pytesseract






      edited Jan 2 at 4:27 by Davide Fiocco
      asked Jan 1 at 20:26 by PovilasK
























          2 Answers

          Have you tried a different --oem? I would also try a --psm higher than 10.
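Trying every combination by hand gets tedious. Below is a minimal sketch of a sweep over --psm and --oem values, assuming pytesseract and Pillow are installed. build_config, candidate_configs and sweep are hypothetical helper names, not part of pytesseract, and the ranges (PSM 6 to 13, OEM 0 to 3) are assumptions chosen to cover what the question already tried plus the values above 10 suggested here.

```python
import itertools

DIGITS = "0123456789"


def build_config(psm, oem, whitelist=DIGITS):
    """Build a Tesseract config string, e.g. '--psm 13 --oem 3 -c ...'."""
    config = f"--psm {psm} --oem {oem}"
    if whitelist:
        config += f" -c tessedit_char_whitelist={whitelist}"
    return config


def candidate_configs(psms=range(6, 14), oems=(0, 1, 2, 3)):
    """Yield (psm, oem, config) for every PSM/OEM combination to try."""
    for psm, oem in itertools.product(psms, oems):
        yield psm, oem, build_config(psm, oem)


def sweep(image, expected="49"):
    """OCR `image` with every candidate config and return the (psm, oem)
    pairs whose stripped output equals `expected`.

    pytesseract is imported here rather than at module level so the
    helpers above work without a Tesseract installation.
    """
    import pytesseract

    hits = []
    for psm, oem, config in candidate_configs():
        try:
            text = pytesseract.image_to_string(image, lang="eng", config=config)
        except pytesseract.TesseractError:
            # For example, an OEM whose model is missing from the
            # installed traineddata; skip that combination.
            continue
        if text.strip() == expected:
            hits.append((psm, oem))
    return hits
```

For example, sweep(Image.open("oAAXR.png")) returns the (psm, oem) pairs that produce exactly 49; an empty list means no combination worked on that build.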







          edited Jan 6 at 14:15 by Davide Fiocco
          answered Jan 1 at 22:37 by QuarKUS7













          • Yes, I have tried --oem 3 without any luck.

            – PovilasK
            Jan 2 at 20:09



















          Try --psm 13 --oem 3 (--oem 1 or --oem 2 should also work):



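          import io

          import pytesseract
          import requests
          from PIL import Image

          # Download the image from the question and open it with Pillow.
          response = requests.get('https://i.stack.imgur.com/oAAXR.png')
          response.raise_for_status()
          image = Image.open(io.BytesIO(response.content))

          # --psm 13: raw line (treat the image as a single text line).
          # --oem 3: default engine, based on what is available.
          # The whitelist restricts the output to digits.
          text = pytesseract.image_to_string(
              image,
              lang='eng',
              config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789',
          )

          print(text.strip())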


          On my machine this prints 49, as expected.



          I get the same result by downloading the image locally and running:



          tesseract oAAXR.png output --oem 3 --psm 13 -l eng


          For reference, tesseract --version on my machine gives:
          tesseract 4.0.0 leptonica-1.77.0 libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.1) : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.1 Found AVX2 Found AVX Found SSE.
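Since --psm 13 does not appear in every version of the manual page, here is a small reference sketch of the page segmentation and engine modes as the Tesseract 4.x help text describes them (in 4.x builds, tesseract --help-psm and tesseract --help-oem should print the authoritative list for your installation). PSM_MODES, OEM_MODES and describe are illustrative names, not part of pytesseract.

```python
# Page segmentation modes (--psm) and OCR engine modes (--oem), paraphrased
# from the Tesseract 4.x help text; your build's --help-psm / --help-oem
# output is authoritative.
PSM_MODES = {
    0: "Orientation and script detection (OSD) only.",
    1: "Automatic page segmentation with OSD.",
    2: "Automatic page segmentation, but no OSD, or OCR. (not implemented)",
    3: "Fully automatic page segmentation, but no OSD. (Default)",
    4: "Assume a single column of text of variable sizes.",
    5: "Assume a single uniform block of vertically aligned text.",
    6: "Assume a single uniform block of text.",
    7: "Treat the image as a single text line.",
    8: "Treat the image as a single word.",
    9: "Treat the image as a single word in a circle.",
    10: "Treat the image as a single character.",
    11: "Sparse text. Find as much text as possible in no particular order.",
    12: "Sparse text with OSD.",
    13: "Raw line. Treat the image as a single text line, "
        "bypassing hacks that are Tesseract-specific.",
}

OEM_MODES = {
    0: "Legacy engine only.",
    1: "Neural nets LSTM engine only.",
    2: "Legacy + LSTM engines.",
    3: "Default, based on what is available.",
}


def describe(psm, oem):
    """Return a one-line explanation of a --psm/--oem pair."""
    if psm not in PSM_MODES:
        raise ValueError(f"unknown --psm value: {psm}")
    if oem not in OEM_MODES:
        raise ValueError(f"unknown --oem value: {oem}")
    return f"--psm {psm}: {PSM_MODES[psm]} --oem {oem}: {OEM_MODES[oem]}"


print(describe(13, 3))
```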







          edited Jan 8 at 21:13
          answered Jan 4 at 20:27 by Davide Fiocco













          • Where is --psm 13 documented? I only see 1-10 here: TESSERACT(1) Manual Page.

            – user3169
            Jan 7 at 6:55











          • Ah, it looks like their documentation is inconsistent or refers to different versions; see github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage for --psm values above 10.

            – Davide Fiocco
            Jan 7 at 10:14













          • Thanks for the answer, but your code gives me the string "ay" instead of 49. My tesseract --version: tesseract 4.0.0 leptonica-1.77.0 libgif 5.1.4 : libjpeg 9c : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.1 : libopenjp2 2.3.0 Found AVX2 Found AVX Found SSE. I'm also using macOS (Mojave); maybe that has something to do with it.

            – PovilasK
            Jan 8 at 15:10











          • I have edited the answer to include my configuration. I'm not sure what could be going wrong :(

            – Davide Fiocco
            Jan 8 at 21:10













          • Yes, I can see that the only difference is that your libjpeg is 8d and mine is 9c. Everything else is the same.

            – PovilasK
            Jan 10 at 12:22
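The environment comparison in these comments can be done mechanically. A rough sketch, assuming tesseract --version lists components in the "name version" / "name-version" shape of the two lines quoted in this thread; parse_versions and diff_versions are hypothetical helpers, and the regular expression is tuned only to these two lines. Run on those lines, it reports libjpeg (8d vs 9c), and it also shows that only one build lists libjpeg-turbo and only the other lists libopenjp2.

```python
import re

# Matches "name version" or "name-version" pairs such as "libpng 1.6.36",
# "leptonica-1.77.0" or "libjpeg-turbo 2.0.1". Tokens starting with an
# uppercase letter ("Found AVX2") are ignored.
_COMPONENT = re.compile(r"\b([a-z][a-z0-9]*(?:-[a-z]+)?)[ -](\d[\w.]*)")


def parse_versions(version_line):
    """Return {component: version} for one `tesseract --version` line."""
    return dict(_COMPONENT.findall(version_line))


def diff_versions(line_a, line_b):
    """Return {component: (version_a, version_b)} for every component that
    differs; a component missing from one line shows up as None there."""
    a, b = parse_versions(line_a), parse_versions(line_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


# The two version lines quoted in the answer and in the comments.
ANSWER_BUILD = (
    "tesseract 4.0.0 leptonica-1.77.0 libgif 5.1.4 : libjpeg 8d "
    "(libjpeg-turbo 2.0.1) : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 "
    ": libwebp 1.0.1 Found AVX2 Found AVX Found SSE."
)
ASKER_BUILD = (
    "tesseract 4.0.0 leptonica-1.77.0 libgif 5.1.4 : libjpeg 9c : "
    "libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.1 : "
    "libopenjp2 2.3.0 Found AVX2 Found AVX Found SSE"
)

for name, (answer, asker) in diff_versions(ANSWER_BUILD, ASKER_BUILD).items():
    print(f"{name}: {answer} vs {asker}")
```

This only compares what each build reports; it does not show which difference, if any, explains the "ay" output.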


















