Replacing all non-alphanumeric characters with empty strings












174















I tried using this, but it didn't work:



return value.replaceAll("/[^A-Za-z0-9 ]/", "");



























  • 27





    Guys, you forget there are alphabets other than the Latin one.

    – Mateva
    Oct 14 '15 at 16:48






















java regex non-alphanumeric
















asked Nov 26 '09 at 20:28









Alex Gomes




















12 Answers


















216














Use [^A-Za-z0-9].



Note: removed the space since that is not typically considered alphanumeric.



























  • 10





    Neither should the space at the end of the character class.

    – Andrew Duffy
    Nov 26 '09 at 20:31






  • 6





    He's probably used to programming in PHP.

    – William
    Nov 26 '09 at 20:31






  • 10





    @William -- it's unfortunate that PHP is now getting credit for PCRE

    – Thomas Dignan
    Feb 11 '13 at 3:10



















116














Try



return value.replaceAll("[^A-Za-z0-9]", "");


or



return value.replaceAll("[\\W]|_", "");
























  • 3





    To keep underscores: return value.replaceAll("\\W", "");

    – erickson
    Nov 26 '09 at 20:35











  • Of course. Compilers are great at spotting that sort of thing.

    – Andrew Duffy
    Nov 26 '09 at 20:36






  • 1





    The second one doesn't answer the question. What about characters like : / etc?

    – WW.
    Dec 29 '14 at 4:03
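To make the underscore discussion concrete, here is a minimal sketch (the class and method names are made up for illustration). By default `\W` matches anything outside `[a-zA-Z_0-9]`, so it removes characters such as `:` and `/` but leaves underscores alone; adding `|_` removes underscores as well.

```java
final class WordClassDemo {
    // \W alone: strips everything except ASCII letters, digits and underscore.
    static String stripNonWord(String value) {
        return value.replaceAll("\\W", "");
    }

    // [\W]|_ : additionally strips the underscore.
    static String stripNonWordAndUnderscore(String value) {
        return value.replaceAll("[\\W]|_", "");
    }

    public static void main(String[] args) {
        String input = "snake_case: a/b";
        System.out.println(stripNonWord(input));              // snake_caseab
        System.out.println(stripNonWordAndUnderscore(input)); // snakecaseab
    }
}
```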



















48














return value.replaceAll("[^A-Za-z0-9 ]", "");


This will leave spaces intact. I assume that's what you want. Otherwise, remove the space from the regex.





































    47














    Be aware that [^a-zA-Z] replaces every character outside the ranges A-Z and a-z. That means non-ASCII letters such as é or ß, as well as Cyrillic characters and the like, will be removed.

    If you don't want these characters removed, use predefined character classes instead:

     someString.replaceAll("[^\\p{IsAlphabetic}^\\p{IsDigit}]", "");

    PS: \p{Alnum} does not achieve this effect; it acts the same as [A-Za-z0-9].



























    • 6





      Thanks a lot for this post - it was very useful to me. Additionally, I believe this is the actual answer to the question. The Latin alphabet isn't the only one in the world!

      – Mateva
      Oct 15 '15 at 7:15








    • 1





      Actually, the stated regex will treat "^" as a valid character, since only the first occurrence of "^" is negating the meaning of the selection. [^\p{IsAlphabetic}\p{IsDigit}] works well.

      – Bogdan Klichuk
      Jan 19 '18 at 17:22











    • Only [^\p{Alpha}\p{Digit}] works for me

      – Jakub Turcovsky
      Apr 17 '18 at 13:47








    • 1





      @JakubTurcovsky docs.oracle.com/javase/10/docs/api/java/util/regex/Pattern.html defines IsAlphabetic and IsDigit as binary properties. Alpha and Digit are POSIX character classes (US-ASCII only), unless the docs.oracle.com/javase/10/docs/api/java/util/regex/… flag is specified.

      – Andre Steingress
      Apr 17 '18 at 14:39













    • @AndreSteingress Correct, the reason {IsDigit} doesn't work for me and {Digit} does is that I'm trying this on Android, and Android has UNICODE_CHARACTER_CLASS turned on by default. Thanks for the clarification.

      – Jakub Turcovsky
      Apr 30 '18 at 11:28
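The stray-caret point from the comments can be checked with a short sketch (class and method names are hypothetical; the input is written with Unicode escapes so the file compiles regardless of source encoding). For the input "Grüße, мир! 2^3", the corrected class should yield "Grüßeмир23", while the version with the second `^` keeps the caret and yields "Grüßeмир2^3".

```java
final class UnicodeAlnumDemo {
    // Corrected class from the comments: one leading ^ negates the whole set.
    static String strip(String s) {
        return s.replaceAll("[^\\p{IsAlphabetic}\\p{IsDigit}]", "");
    }

    // Pattern as posted in the answer: the second ^ is a literal caret,
    // so a '^' in the input is treated as allowed and survives.
    static String stripWithStrayCaret(String s) {
        return s.replaceAll("[^\\p{IsAlphabetic}^\\p{IsDigit}]", "");
    }

    public static void main(String[] args) {
        // German and Cyrillic letters given as Unicode escapes.
        String input = "Gr\u00fc\u00dfe, \u043c\u0438\u0440! 2^3";
        System.out.println(strip(input));
        System.out.println(stripWithStrayCaret(input));
    }
}
```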





















    21














    You could also try this simpler regex:



     str = str.replaceAll("\\P{Alnum}", "");


























    • 2





      Or, preserving whitespace: str.replaceAll("[^\\p{Alnum}\\s]", "")

      – Jonik
      Dec 29 '15 at 10:28











    • Or \p{Alnum}\p{Space}.

      – membersound
      Dec 15 '16 at 11:22
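A quick sketch of both variants, assuming the default flags (no UNICODE_CHARACTER_CLASS); the class and method names are illustrative only. Because `\p{Alnum}` is ASCII-only by default, accented letters are stripped too: for "Crème brûlée 2024!" the first method should give "Crmebrle2024" and the second "Crme brle 2024".

```java
final class PosixAlnumDemo {
    // Removes everything that is not an ASCII letter or digit.
    static String stripNonAlnum(String s) {
        return s.replaceAll("\\P{Alnum}", "");
    }

    // Same, but whitespace is kept.
    static String stripNonAlnumKeepSpaces(String s) {
        return s.replaceAll("[^\\p{Alnum}\\s]", "");
    }

    public static void main(String[] args) {
        // Accented letters given as Unicode escapes.
        String input = "Cr\u00e8me br\u00fbl\u00e9e 2024!";
        System.out.println(stripNonAlnum(input));
        System.out.println(stripNonAlnumKeepSpaces(input));
    }
}
```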



















    10














    Java's regular expressions don't require you to put a forward-slash (/) or any other delimiter around the regex, as opposed to other languages like Perl, for example.
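This is why the pattern in the question appears to do nothing. A small sketch (names are made up for illustration): in Java the slashes are ordinary characters, so `/[^A-Za-z0-9 ]/` only matches a non-alphanumeric character that happens to sit between two literal slashes.

```java
final class DelimiterDemo {
    // Pattern from the question: the slashes are matched literally.
    static String withSlashes(String value) {
        return value.replaceAll("/[^A-Za-z0-9 ]/", "");
    }

    // Same character class without the Perl/PHP-style delimiters.
    static String withoutSlashes(String value) {
        return value.replaceAll("[^A-Za-z0-9 ]", "");
    }

    public static void main(String[] args) {
        System.out.println(withSlashes("a-b, c!"));    // a-b, c! (unchanged)
        System.out.println(withSlashes("x/#/y"));      // xy
        System.out.println(withoutSlashes("a-b, c!")); // ab c
    }
}
```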





































      8














      I made this method for creating filenames:



      public static String safeChar(String input)
      {
          // Characters that are allowed to appear in the resulting filename.
          char[] allowed = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_".toCharArray();
          char[] charArray = input.toCharArray();
          StringBuilder result = new StringBuilder();
          for (char c : charArray)
          {
              for (char a : allowed)
              {
                  // Keep the character only if it is on the allow-list.
                  if (c == a) result.append(a);
              }
          }
          return result.toString();
      }
























      • 4





        This is pretty brute-force. Regex is the way to go with the OP's situation.

        – Michael Peterson
        Mar 20 '12 at 0:28






      • 1





        You're right, regex is better. But at the time, regex and I didn't get along well.

        – zneo
        Apr 12 '12 at 19:10











      • Hah, does anyone really get along that well with regex? ;)

        – Michael Peterson
        Apr 12 '12 at 22:46











      • You're so right! After it's written, it sort of turns into machine language.

        – zneo
        Apr 14 '12 at 18:04
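For comparison, the same allow-list can probably be expressed as a single regex. This is only a sketch of the regex route suggested in the comments, not the original author's code; the class name is invented. The hyphen is placed last in the character class so that it is taken literally.

```java
final class SafeFileNameDemo {
    // Keeps digits, ASCII letters, underscore and hyphen; drops everything else.
    static String safeChar(String input) {
        return input.replaceAll("[^0-9A-Za-z_-]", "");
    }

    public static void main(String[] args) {
        System.out.println(safeChar("my file (v2).txt")); // myfilev2txt
        System.out.println(safeChar("a_b-c"));            // a_b-c
    }
}
```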



















      1














      Simple method:



      public boolean isBlank(String value) {
          return (value == null || value.equals("") || value.equals("null") || value.trim().equals(""));
      }

      public String normalizeOnlyLettersNumbers(String str) {
          if (!isBlank(str)) {
              // Remove every run of characters that are neither Unicode letters nor decimal digits.
              return str.replaceAll("[^\\p{L}\\p{Nd}]+", "");
          } else {
              return "";
          }
      }




































        1














        public static void main(String[] args) {
            String value = " Chlamydia_spp. IgG, IgM & IgA Abs (8006) ";

            System.out.println(value.replaceAll("[^A-Za-z0-9]", ""));
        }


        output: ChlamydiasppIgGIgMIgAAbs8006



        Github: https://github.com/AlbinViju/Learning/blob/master/StripNonAlphaNumericFromString.java







































          1














          If you also want to allow alphanumeric characters outside the ASCII character set, for instance German umlauts, consider the following solution:



           String value = "your value";

          // this could be a static final constant, so the pattern is compiled only once
          Pattern pattern = Pattern.compile("[^\\w]", Pattern.UNICODE_CHARACTER_CLASS);

          value = pattern.matcher(value).replaceAll("");


          Please note that using the UNICODE_CHARACTER_CLASS flag may impose a performance penalty (see the Javadoc of this flag).
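A small sketch of the difference the flag makes (class and method names are hypothetical; the input is written with Unicode escapes). One thing worth noting is that `\w` also covers the underscore, so underscores are kept either way. For "Größe_1 (ok)" the Unicode-aware pattern should return "Größe_1ok", while the default one returns "Gre_1ok".

```java
import java.util.regex.Pattern;

final class UnicodeWordDemo {
    // Compiled once and reused, as the answer suggests.
    private static final Pattern NON_WORD_UNICODE =
            Pattern.compile("[^\\w]", Pattern.UNICODE_CHARACTER_CLASS);
    private static final Pattern NON_WORD_ASCII = Pattern.compile("[^\\w]");

    static String stripUnicode(String s) {
        return NON_WORD_UNICODE.matcher(s).replaceAll("");
    }

    static String stripAscii(String s) {
        return NON_WORD_ASCII.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // German letters given as Unicode escapes.
        String input = "Gr\u00f6\u00dfe_1 (ok)";
        System.out.println(stripUnicode(input));
        System.out.println(stripAscii(input));
    }
}
```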





































            1














            Solution:



            value.replaceAll("[^A-Za-z0-9]", "")



            Explanation:




            [^abc]
            When a caret ^ appears as the first character inside square brackets, it negates the pattern. This pattern matches any character except a or b or c.




            Think of the square brackets as two functions:


            • [(Pattern)] = match(Pattern)


            • [^(Pattern)] = notMatch(Pattern)


            As for the pattern itself:


            • A-Z = all characters from A to Z


            • a-z = all characters from a to z


            • 0-9 = all characters from 0 to 9



            Therefore, replaceAll substitutes every character NOT included in the pattern with the empty string.
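The explanation above can be tried out with a minimal sketch (class and method names are invented for illustration):

```java
final class NegatedClassDemo {
    // [abc] matches a, b or c, so those characters are removed.
    static String removeListed(String s) {
        return s.replaceAll("[abc]", "");
    }

    // [^abc] matches everything except a, b or c, so only those remain.
    static String removeUnlisted(String s) {
        return s.replaceAll("[^abc]", "");
    }

    // The solution from this answer: keep ASCII letters and digits only.
    static String keepAlnum(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(removeListed("aXbYc"));      // XY
        System.out.println(removeUnlisted("aXbYc"));    // abc
        System.out.println(keepAlnum("R2-D2 & C-3PO")); // R2D2C3PO
    }
}
```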





































              0














              Using Guava you can easily combine different types of criteria. For your specific case you can use:



              value = CharMatcher.inRange('0', '9')
                  .or(CharMatcher.inRange('a', 'z')
                  .or(CharMatcher.inRange('A', 'Z'))).retainFrom(value);























































                Answer scored 216: answered Nov 26 '09 at 20:30 by Mirek Pluta; edited Sep 18 '17 at 17:14 by Dave Jarvis































                Answer scored 116: answered Nov 26 '09 at 20:33 by Andrew Duffy





























                    Answer scored 48: answered Nov 26 '09 at 20:31 by erickson































                        Answer scored 47: answered Sep 17 '15 at 10:25 by Andre Steingress; edited Sep 18 '17 at 10:26








                        • 6





                          Thanks a lot for this post - it was very useful to me. Additionally, I believe this is the actual answer to the question. The Latin alphabet isn't the only one in the world!

                          – Mateva
                          Oct 15 '15 at 7:15








                        • 1





                          Actually, the stated regex will treat "^" as a valid character, since only the first occurrence of "^" is negating the meaning of the selection. [^\p{IsAlphabetic}\p{IsDigit}] works well.

                          – Bogdan Klichuk
                          Jan 19 '18 at 17:22











                        • Only [^\p{Alpha}\p{Digit}] works for me

                          – Jakub Turcovsky
                          Apr 17 '18 at 13:47








                        • 1





                          @JakubTurcovsky docs.oracle.com/javase/10/docs/api/java/util/regex/Pattern.html defines IsAlphabetic and IsDigit as binary properties. Alpha and Digit are POSIX character classes (US-ASCII only). Except the docs.oracle.com/javase/10/docs/api/java/util/regex/… flag is specified.

                          – Andre Steingress
                          Apr 17 '18 at 14:39













                        • @AndreSteingress Correct, the reason {IsDigit} doesn't work for me and {Digit} does is that I'm trying this on Android. And Android has UNICODE_CHARACTER_CLASS turned on by default. Thanks for clearance.

                          – Jakub Turcovsky
                          Apr 30 '18 at 11:28
















                        • 6





                          Thanks a lot for this post - it was very useful to me. Additionally, I believe this is the actual answer to the question. The Latin alphabet isn't the only one in the world!

                          – Mateva
                          Oct 15 '15 at 7:15








                        • 1





                          Actually, the stated regex will treat "^" as a valid character, since only the first occurrence of "^" is negating the meaning of the selection. [^\p{IsAlphabetic}\p{IsDigit}] works well.

                          – Bogdan Klichuk
                          Jan 19 '18 at 17:22











                        • Only [^\p{Alpha}\p{Digit}] works for me

                          – Jakub Turcovsky
                          Apr 17 '18 at 13:47








                        • 1





                          @JakubTurcovsky docs.oracle.com/javase/10/docs/api/java/util/regex/Pattern.html defines IsAlphabetic and IsDigit as binary properties. Alpha and Digit are POSIX character classes (US-ASCII only). Except the docs.oracle.com/javase/10/docs/api/java/util/regex/… flag is specified.

                          – Andre Steingress
                          Apr 17 '18 at 14:39













                        • @AndreSteingress Correct, the reason {IsDigit} doesn't work for me and {Digit} does is that I'm trying this on Android, and Android has UNICODE_CHARACTER_CLASS turned on by default. Thanks for the clarification.

                          – Jakub Turcovsky
                          Apr 30 '18 at 11:28
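
The difference discussed in these comments can be checked with a small sketch. The class name `PosixVsUnicode` and the sample string are made up for illustration, and the behaviour noted in the comments is what I would expect from standard `java.util.regex` on a desktop JVM (Java 7 or later), not on Android.

```java
import java.util.regex.Pattern;

final class PosixVsUnicode {
    // "naïve 42", written with a Unicode escape so the source survives any file encoding.
    static final String SAMPLE = "na\u00efve 42";

    // POSIX classes: US-ASCII only unless UNICODE_CHARACTER_CLASS is set.
    static String posix(String s) {
        return s.replaceAll("[^\\p{Alpha}\\p{Digit}]", "");
    }

    // Unicode binary properties: letters and digits of any script.
    static String unicode(String s) {
        return s.replaceAll("[^\\p{IsAlphabetic}\\p{IsDigit}]", "");
    }

    // The same POSIX pattern, compiled with the flag, behaves like the Unicode version.
    static String posixWithFlag(String s) {
        return Pattern.compile("[^\\p{Alpha}\\p{Digit}]", Pattern.UNICODE_CHARACTER_CLASS)
                      .matcher(s)
                      .replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(posix(SAMPLE));         // the ï is dropped: nave42
        System.out.println(unicode(SAMPLE));       // the ï is kept
        System.out.println(posixWithFlag(SAMPLE)); // the ï is kept
    }
}
```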










                        21







                        You could also try this simpler regex:



                         str = str.replaceAll("\\P{Alnum}", "");
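
As a rough sketch of what this does (the class and method names here are only illustrative): `\P{Alnum}` with a capital `P` is the complement of `\p{Alnum}`, which by default covers ASCII letters and digits only. The second method keeps whitespace as well by adding `\s` to a negated class.

```java
final class AlnumDemo {
    // \P{Alnum} matches anything that is not an ASCII letter or digit (default mode).
    static String stripNonAlnum(String s) {
        return s.replaceAll("\\P{Alnum}", "");
    }

    // Variant that also keeps whitespace characters.
    static String stripKeepingSpaces(String s) {
        return s.replaceAll("[^\\p{Alnum}\\s]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripNonAlnum("Hello, World! 123"));      // HelloWorld123
        System.out.println(stripKeepingSpaces("Hello, World! 123")); // Hello World 123
    }
}
```

Note the doubled backslashes: inside a Java string literal, `\\P` is what produces the regex token `\P`.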













                        edited May 20 '14 at 3:14
                        nhinkle

                        answered Aug 6 '13 at 12:17
                        saurav








                        • 2





                          Or, preserving whitespace: str.replaceAll("[^\\p{Alnum}\\s]", "")

                          – Jonik
                          Dec 29 '15 at 10:28











                        • Or \p{Alnum}\p{Space}.

                          – membersound
                          Dec 15 '16 at 11:22

































                            10







                            Java's regular expressions don't require you to put a forward-slash (/) or any other delimiter around the regex, as opposed to other languages like Perl, for example.
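
To illustrate (a minimal sketch; the class and method names are hypothetical), the slashes in the asker's pattern are treated as ordinary characters that must match literally, so the pattern only fires on text like `/-/`:

```java
final class NoDelimiters {
    // The asker's pattern: matches a slash, one disallowed character, then another slash.
    static String withSlashes(String s) {
        return s.replaceAll("/[^A-Za-z0-9 ]/", "");
    }

    // The same character class without the Perl-style delimiters.
    static String withoutSlashes(String s) {
        return s.replaceAll("[^A-Za-z0-9 ]", "");
    }

    public static void main(String[] args) {
        System.out.println(withSlashes("a-b c!"));    // a-b c!  (nothing matched)
        System.out.println(withSlashes("a/-/b"));     // ab      (only "/-/" matched)
        System.out.println(withoutSlashes("a-b c!")); // ab c
    }
}
```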
















                            answered Nov 26 '09 at 20:39









                            abyx































                                8







                                I made this method for creating filenames:



                                public static String safeChar(String input)
                                {
                                    // Characters that are allowed to appear in the file name.
                                    char[] allowed = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_".toCharArray();
                                    char[] charArray = input.toCharArray();
                                    StringBuilder result = new StringBuilder();
                                    for (char c : charArray)
                                    {
                                        for (char a : allowed)
                                        {
                                            // Keep the character only if it is in the allowed set.
                                            if (c == a) result.append(a);
                                        }
                                    }
                                    return result.toString();
                                }
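
For comparison, here is a sketch of how the same allow-list could be written as a regex. This is not the original method, just an assumed equivalent; the class name `SafeCharRegex` is made up.

```java
final class SafeCharRegex {
    static String safeChar(String input) {
        // Same allow-list as the loop version: digits, ASCII letters, '_' and '-'.
        // The '-' comes last in the class so it is a literal hyphen, not a range.
        return input.replaceAll("[^0-9A-Za-z_-]", "");
    }

    public static void main(String[] args) {
        System.out.println(safeChar("my report (v2).txt")); // myreportv2txt
        System.out.println(safeChar("a-b_c"));              // a-b_c
    }
}
```

Note that the dot is stripped too, so a file extension would have to be handled separately.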















                                answered Nov 27 '09 at 2:08









                                zneo








                                • 4





                                  This is pretty brute-force. Regex is the way to go with the OP's situation.

                                  – Michael Peterson
                                  Mar 20 '12 at 0:28






                                • 1





                                  You're right, regex is better. But at the time, regex and I didn't get along well.

                                  – zneo
                                  Apr 12 '12 at 19:10











                                • Hah, does anyone really get along that well with regex? ;)

                                  – Michael Peterson
                                  Apr 12 '12 at 22:46











                                • You're so right! After it's written, it sort of turns into machine language.

                                  – zneo
                                  Apr 14 '12 at 18:04

































                                    1







                                    Simple method:



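                                    // Treats null, the empty string, whitespace-only strings and the literal text "null" as blank.
                                    public boolean isBlank(String value) {
                                        return (value == null || value.equals("") || value.equals("null") || value.trim().equals(""));
                                    }

                                    // Keeps only Unicode letters (\p{L}) and decimal digits (\p{Nd}).
                                    public String normalizeOnlyLettersNumbers(String str) {
                                        if (!isBlank(str)) {
                                            return str.replaceAll("[^\\p{L}\\p{Nd}]+", "");
                                        } else {
                                            return "";
                                        }
                                    }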
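
A quick sketch of how the Unicode categories differ from a plain ASCII class. The class name and sample text are invented for illustration, and the accented letter is written as an escape so the file encoding does not matter.

```java
final class LettersAndDigits {
    // Unicode-aware: keeps letters (\p{L}) and decimal digits (\p{Nd}) of any script.
    static String normalize(String s) {
        return s.replaceAll("[^\\p{L}\\p{Nd}]+", "");
    }

    // ASCII-only class for comparison.
    static String asciiOnly(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "");
    }

    public static void main(String[] args) {
        String sample = "caf\u00e9 au lait, 7!"; // "café au lait, 7!"
        System.out.println(normalize(sample));   // é is kept
        System.out.println(asciiOnly(sample));   // é is removed: cafaulait7
    }
}
```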















                                    answered Nov 1 '16 at 19:36









                                    Alberto Cerqueira































                                            1







                                            public static void main(String[] args) {
                                                String value = " Chlamydia_spp. IgG, IgM & IgA Abs (8006) ";

                                                // Remove everything except ASCII letters and digits.
                                                System.out.println(value.replaceAll("[^A-Za-z0-9]", ""));
                                            }


                                            output: ChlamydiasppIgGIgMIgAAbs8006



                                            Github: https://github.com/AlbinViju/Learning/blob/master/StripNonAlphaNumericFromString.java














                                            edited Aug 23 '17 at 15:46









                                            Jason Roman

                                            answered Aug 23 '17 at 15:21
                                            Albin































                                                    1







                                                    If you also want to allow alphanumeric characters outside the ASCII character set, such as German umlauts, consider the following solution:



                                                    String value = "your value";

                                                    // this could be stored in a static final constant, so the pattern is compiled only once
                                                    Pattern pattern = Pattern.compile("[^\\w]", Pattern.UNICODE_CHARACTER_CLASS);

                                                    value = pattern.matcher(value).replaceAll("");


                                                    Note that using the UNICODE_CHARACTER_CLASS flag may impose a performance penalty (see the Javadoc for this flag).
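
A minimal sketch of the "compile once" variant mentioned in the code comment, with a hypothetical class name. One thing to be aware of: `\w` also covers the underscore, so underscores survive this pattern.

```java
import java.util.regex.Pattern;

final class UnicodeWordStripper {
    // Compiled once and reused; Pattern instances are immutable and safe to share.
    private static final Pattern NON_WORD =
            Pattern.compile("[^\\w]", Pattern.UNICODE_CHARACTER_CLASS);

    static String strip(String value) {
        return NON_WORD.matcher(value).replaceAll("");
    }

    public static void main(String[] args) {
        // "Grüße_2024!" written with escapes; only the '!' is removed.
        System.out.println(strip("Gr\u00fc\u00dfe_2024!"));
    }
}
```

If underscores should go as well, a class such as `[^\\p{IsAlphabetic}\\p{IsDigit}]` would be one alternative.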
















                                                    answered May 24 '18 at 10:18









                                                    snap























                                                        1














                                                        Solution:



                                                        value.replaceAll("[^A-Za-z0-9]", "")



                                                        Explanation:




                                                        [^abc]
                                                        When a caret ^ appears as the first character inside square brackets, it negates the pattern. This pattern matches any character except a or b or c.




Think of the brackets as two functions:

• [(Pattern)] = match(Pattern)
• [^(Pattern)] = notMatch(Pattern)

As for the ranges inside the pattern:

• A-Z = all characters from A to Z
• a-z = all characters from a to z
• 0-9 = all characters from 0 to 9

Therefore, replaceAll replaces every character NOT included in the pattern with the empty string.
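
To make the explanation concrete, here is a minimal sketch that runs the negated character class next to the pattern from the question. The class and method names (AlnumFilter, stripNonAlphanumeric, withLiteralSlashes) are hypothetical and only there for illustration. The second method shows why the original attempt appeared to do nothing: Java regex strings have no /.../ delimiters, so the slashes are matched as literal characters.

```java
// Hypothetical helper class, for illustration only.
final class AlnumFilter {

    // Removes every character that is not an ASCII letter or digit.
    static String stripNonAlphanumeric(String value) {
        return value.replaceAll("[^A-Za-z0-9]", "");
    }

    // The pattern from the question: the slashes are literal characters here,
    // so it only matches "/x/" where x is not a letter, digit, or space.
    static String withLiteralSlashes(String value) {
        return value.replaceAll("/[^A-Za-z0-9 ]/", "");
    }

    public static void main(String[] args) {
        String input = "Hello, World! 123";
        System.out.println(stripNonAlphanumeric(input)); // prints HelloWorld123
        System.out.println(withLiteralSlashes(input));   // prints Hello, World! 123 (unchanged)
    }
}
```

Note that this keeps only ASCII letters and digits; letters from other alphabets are removed as well.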
















                                                            answered Nov 21 '18 at 12:07









GalloCedrone























                                                                0














Using Guava, you can easily combine different types of criteria. For your specific case, you can use:



value = CharMatcher.inRange('0', '9')
        .or(CharMatcher.inRange('a', 'z'))
        .or(CharMatcher.inRange('A', 'Z'))
        .retainFrom(value);
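
If Guava is not on the classpath, the same "retain only these ranges" idea can be sketched with the JDK alone. This is a swapped-in plain-Java equivalent, not Guava's implementation, and the class and method names (AsciiAlnumRetainer, retainAsciiAlphanumerics) are hypothetical.

```java
// Hypothetical plain-JDK stand-in for the CharMatcher chain above.
final class AsciiAlnumRetainer {

    // Keeps only characters in the ranges 0-9, a-z and A-Z, in their original order.
    static String retainAsciiAlphanumerics(String value) {
        StringBuilder kept = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean digit = c >= '0' && c <= '9';
            boolean lower = c >= 'a' && c <= 'z';
            boolean upper = c >= 'A' && c <= 'Z';
            if (digit || lower || upper) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(retainAsciiAlphanumerics("Hello, World! 123")); // prints HelloWorld123
    }
}
```

A loop like this avoids the regex engine entirely, at the cost of hard-coding the ranges rather than composing them as matchers.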













                                                                    edited Oct 4 '18 at 8:40

























                                                                    answered Oct 4 '18 at 7:45









Debmalya Biswas





























