Replacing all non-alphanumeric characters with empty strings
I tried using this, but it didn't work:
return value.replaceAll("/[^A-Za-z0-9 ]/", "");
java regex non-alphanumeric
asked Nov 26 '09 at 20:28 by Alex Gomes
27
Guys, you forget there are alphabets other than the Latin one.
– Mateva
Oct 14 '15 at 16:48
12 Answers
Solution:
value.replaceAll("[^A-Za-z0-9]", "")
Explanation:
[^abc]
When a caret ^ appears as the first character inside square brackets, it negates the pattern. This pattern matches any character except a, b, or c.
Think of the brackets as two functions:
[(Pattern)] = match(Pattern)
[^(Pattern)] = notMatch(Pattern)
As for the ranges inside the pattern:
A-Z = all characters from A to Z
a-z = all characters from a to z
0-9 = all characters from 0 to 9
Therefore it replaces every character NOT included in the pattern.
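To make the match/notMatch idea concrete, here is a minimal sketch. The class and method names are made up for illustration; only String.replaceAll comes from the JDK.

```java
final class NegatedClassDemo {
    // [abc] matches a, b or c, so those characters are removed.
    static String removeListed(String s) {
        return s.replaceAll("[abc]", "");
    }

    // [^abc] matches everything except a, b or c, so only those survive.
    static String removeUnlisted(String s) {
        return s.replaceAll("[^abc]", "");
    }

    public static void main(String[] args) {
        System.out.println(removeListed("axbycz"));   // xyz
        System.out.println(removeUnlisted("axbycz")); // abc
    }
}
```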
Using Guava, you can easily combine different types of criteria. For your specific case you can use:
value = CharMatcher.inRange('0', '9')
        .or(CharMatcher.inRange('a', 'z'))
        .or(CharMatcher.inRange('A', 'Z'))
        .retainFrom(value);
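If pulling in Guava is not an option, roughly the same predicate-based filtering can be sketched with the JDK alone. This is not the Guava API; RetainAsciiAlnum and isAsciiAlnum are hypothetical names used only for this illustration.

```java
final class RetainAsciiAlnum {
    // Hypothetical predicate: true only for 0-9, a-z and A-Z.
    static boolean isAsciiAlnum(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z');
    }

    // Copies only the characters accepted by the predicate, in order.
    static String retain(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (isAsciiAlnum(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(retain("a-b_c 1!2")); // abc12
    }
}
```

The predicate can be swapped for something like Character.isLetterOrDigit if non-ASCII letters should be kept as well.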
Use [^A-Za-z0-9].
Note: I removed the space, since it is not typically considered alphanumeric.
edited Sep 18 '17 at 17:14 by Dave Jarvis
answered Nov 26 '09 at 20:30 by Mirek Pluta
10
Neither should the space at the end of the character class.
– Andrew Duffy
Nov 26 '09 at 20:31
6
He's probably used to programming in PHP.
– William
Nov 26 '09 at 20:31
10
@William -- it's unfortunate that PHP is now getting credit for PCRE
– Thomas Dignan
Feb 11 '13 at 3:10
Try
return value.replaceAll("[^A-Za-z0-9]", "");
or
return value.replaceAll("[\\W]|_", "");
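The difference between these variants is the underscore: \W treats it as a word character and leaves it alone. A small sketch to show this (class and method names are invented for the example):

```java
final class UnderscoreDemo {
    // Explicit ASCII ranges: the underscore is not listed, so it is removed.
    static String explicitRanges(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "");
    }

    // \W alone: the underscore counts as a word character and is kept.
    static String nonWordOnly(String s) {
        return s.replaceAll("\\W", "");
    }

    // \W or underscore: same result as the explicit ranges for ASCII input.
    static String nonWordOrUnderscore(String s) {
        return s.replaceAll("[\\W]|_", "");
    }

    public static void main(String[] args) {
        String value = "snake_case: v2.0!";
        System.out.println(explicitRanges(value));      // snakecasev20
        System.out.println(nonWordOnly(value));         // snake_casev20
        System.out.println(nonWordOrUnderscore(value)); // snakecasev20
    }
}
```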
answered Nov 26 '09 at 20:33 by Andrew Duffy
3
With underscores, return value.replaceAll("\\W", "");
– erickson
Nov 26 '09 at 20:35
Of course. Compilers are great at spotting that sort of thing.
– Andrew Duffy
Nov 26 '09 at 20:36
1
The second one doesn't answer the question. What about characters like : / etc?
– WW.
Dec 29 '14 at 4:03
return value.replaceAll("[^A-Za-z0-9 ]", "");
This will leave spaces intact. I assume that's what you want. Otherwise, remove the space from the regex.
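Note that a literal space in the class keeps only the plain space character; tabs and newlines are still stripped. If all whitespace should survive, \s can be used instead. A quick sketch, with names made up for the example:

```java
final class WhitespaceDemo {
    // Keeps ASCII letters, digits and the plain space character only.
    static String keepSpaces(String s) {
        return s.replaceAll("[^A-Za-z0-9 ]", "");
    }

    // Keeps ASCII letters, digits and every whitespace character (space, tab, newline, ...).
    static String keepAllWhitespace(String s) {
        return s.replaceAll("[^A-Za-z0-9\\s]", "");
    }

    public static void main(String[] args) {
        String value = "a\tb c\n1!";
        System.out.println(keepSpaces(value).equals("ab c1"));            // true
        System.out.println(keepAllWhitespace(value).equals("a\tb c\n1")); // true
    }
}
```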
answered Nov 26 '09 at 20:31 by erickson
You should be aware that [^a-zA-Z] matches every character outside the ranges A-Z and a-z. That means letters such as é and ß, Cyrillic characters, and the like will be removed.
If you do not want these characters removed, use predefined character classes instead:
someString.replaceAll("[^\\p{IsAlphabetic}^\\p{IsDigit}]", "");
PS: \p{Alnum} does not achieve this effect; it acts the same as [A-Za-z0-9].
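Here is a small comparison of the three patterns, as a sketch for a standard JDK 7 or later (a comment on this answer reports different behaviour on Android). The class and method names are invented for the example, the Unicode-aware pattern is written without the second caret, and the test string is "Größe 42, Привет!" spelled with Unicode escapes so the file compiles under any source encoding.

```java
final class UnicodeAlnumDemo {
    // Explicit ASCII ranges.
    static String asciiOnly(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "");
    }

    // POSIX class, ASCII-only unless UNICODE_CHARACTER_CLASS is set.
    static String posixAlnum(String s) {
        return s.replaceAll("\\P{Alnum}", "");
    }

    // Unicode binary properties: keeps letters and digits of any script.
    static String unicodeAware(String s) {
        return s.replaceAll("[^\\p{IsAlphabetic}\\p{IsDigit}]", "");
    }

    public static void main(String[] args) {
        // German word with o-umlaut and sharp s, then a Cyrillic word.
        String value = "Gr\u00f6\u00dfe 42, \u041f\u0440\u0438\u0432\u0435\u0442!";
        System.out.println(asciiOnly(value));    // Gre42
        System.out.println(posixAlnum(value));   // Gre42
        System.out.println(unicodeAware(value)); // original text minus spaces, comma and '!'
    }
}
```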
edited Sep 18 '17 at 10:26
answered Sep 17 '15 at 10:25 by Andre Steingress
6
Thanks a lot for this post - it was very useful to me. Additionally, I believe this is the actual answer to the question. The Latin alphabet isn't the only one in the world!
– Mateva
Oct 15 '15 at 7:15
1
Actually, the stated regex will treat "^" as a valid character, since only the first occurrence of "^" negates the selection. [^\p{IsAlphabetic}\p{IsDigit}] works well.
– Bogdan Klichuk
Jan 19 '18 at 17:22
Only [^\p{Alpha}\p{Digit}] works for me
– Jakub Turcovsky
Apr 17 '18 at 13:47
1
@JakubTurcovsky docs.oracle.com/javase/10/docs/api/java/util/regex/Pattern.html defines IsAlphabetic and IsDigit as binary properties. Alpha and Digit are POSIX character classes (US-ASCII only). Except the docs.oracle.com/javase/10/docs/api/java/util/regex/… flag is specified.
– Andre Steingress
Apr 17 '18 at 14:39
@AndreSteingress Correct, the reason {IsDigit} doesn't work for me and {Digit} does is that I'm trying this on Android, and Android has UNICODE_CHARACTER_CLASS turned on by default. Thanks for the clarification.
– Jakub Turcovsky
Apr 30 '18 at 11:28
You could also try this simpler regex:
str = str.replaceAll("\\P{Alnum}", "");
edited May 20 '14 at 3:14 by nhinkle
answered Aug 6 '13 at 12:17 by saurav
2
Or, preserving whitespace: str.replaceAll("[^\\p{Alnum}\\s]", "")
– Jonik
Dec 29 '15 at 10:28
Or \p{Alnum}\p{Space}.
– membersound
Dec 15 '16 at 11:22
Java's regular expressions don't require you to put a forward slash (/) or any other delimiter around the regex, unlike other languages such as Perl.
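To see why the original attempt appeared to do nothing: in Java the slashes are ordinary characters, so the pattern only matches a slash, one non-alphanumeric character, and another slash. A minimal sketch (class and method names are invented for the example):

```java
final class DelimiterDemo {
    // Pattern from the question: the slashes are matched literally.
    static String withSlashes(String s) {
        return s.replaceAll("/[^A-Za-z0-9 ]/", "");
    }

    // Same character class without the slashes.
    static String withoutSlashes(String s) {
        return s.replaceAll("[^A-Za-z0-9 ]", "");
    }

    public static void main(String[] args) {
        String value = "a/!/b, c!";
        System.out.println(withSlashes(value));    // ab, c!  (only the "/!/" sequence is removed)
        System.out.println(withoutSlashes(value)); // ab c
    }
}
```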
answered Nov 26 '09 at 20:39 by abyx
I made this method for creating filenames:
public static String safeChar(String input)
{
    // Whitelist of characters that may appear in the result
    char[] allowed = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_".toCharArray();
    char[] charArray = input.toCharArray();
    StringBuilder result = new StringBuilder();
    for (char c : charArray)
    {
        // Append c only if it is found in the whitelist
        for (char a : allowed)
        {
            if (c == a) result.append(a);
        }
    }
    return result.toString();
}
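For comparison, the same whitelist can probably be expressed as a single precompiled regex. This is a sketch rather than a drop-in replacement; SafeFileName and its members are hypothetical names.

```java
import java.util.regex.Pattern;

final class SafeFileName {
    // Anything that is not 0-9, A-Z, a-z, underscore or hyphen.
    // The hyphen comes last in the class so it is taken literally.
    private static final Pattern UNSAFE = Pattern.compile("[^0-9A-Za-z_-]");

    static String safeChar(String input) {
        return UNSAFE.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(safeChar("my file (v1.2)-final_copy.txt")); // myfilev12-final_copytxt
    }
}
```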
answered Nov 27 '09 at 2:08 by zneo
4
This is pretty brute-force. Regex is the way to go with the OP's situation.
– Michael Peterson
Mar 20 '12 at 0:28
1
You're right, regex is better. But at the time, regex and I didn't get along well.
– zneo
Apr 12 '12 at 19:10
Hah, does anyone really get along that well with regex? ;)
– Michael Peterson
Apr 12 '12 at 22:46
You're so right! After it's written, it sort of turns into machine language.
– zneo
Apr 14 '12 at 18:04
Simple method:
public boolean isBlank(String value) {
    return (value == null || value.equals("") || value.equals("null") || value.trim().equals(""));
}
public String normalizeOnlyLettersNumbers(String str) {
    if (!isBlank(str)) {
        return str.replaceAll("[^\\p{L}\\p{Nd}]+", "");
    } else {
        return "";
    }
}
answered Nov 1 '16 at 19:36 by Alberto Cerqueira
public static void main(String[] args) {
    String value = " Chlamydia_spp. IgG, IgM & IgA Abs (8006) ";
    System.out.println(value.replaceAll("[^A-Za-z0-9]", ""));
}
output: ChlamydiasppIgGIgMIgAAbs8006
Github: https://github.com/AlbinViju/Learning/blob/master/StripNonAlphaNumericFromString.java
edited Aug 23 '17 at 15:46 by Jason Roman
answered Aug 23 '17 at 15:21 by Albin
add a comment |
add a comment |
If you also want to keep alphanumeric characters outside the ASCII character set, such as German umlauts, consider the following solution:
String value = "your value";
// This could be a static final constant, so that the pattern is compiled only once
Pattern pattern = Pattern.compile("[^\\w]", Pattern.UNICODE_CHARACTER_CLASS);
value = pattern.matcher(value).replaceAll("");
Please note that using the UNICODE_CHARACTER_CLASS flag may impose a performance penalty (see the Javadoc of this flag).
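The "compile once" idea from the comment above could look roughly like this. It is only a sketch, and the class, constant, and method names are invented for the example. One thing to keep in mind: \w also matches the underscore, so this variant leaves underscores in place.

```java
import java.util.regex.Pattern;

final class UnicodeWordFilter {

    // Compiled once and reused; with UNICODE_CHARACTER_CLASS, \w follows the
    // Unicode definition of a word character instead of the ASCII-only one.
    private static final Pattern NON_WORD =
            Pattern.compile("[^\\w]", Pattern.UNICODE_CHARACTER_CLASS);

    // Hypothetical helper: removes every character that is not a word character.
    static String strip(String value) {
        return NON_WORD.matcher(value).replaceAll("");
    }

    public static void main(String[] args) {
        // "Größe_1!" written with escapes; prints Größe_1 (underscore is kept)
        System.out.println(strip("Gr\u00f6\u00dfe_1!"));
    }
}
```

If underscores should go as well, the \p{L} and \p{Nd} based pattern from the earlier answer is one way to get that.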
answered May 24 '18 at 10:18
snap
Solution:
value.replaceAll("[^A-Za-z0-9]", "")
Explanation:
[^abc]
When a caret ^ appears as the first character inside square brackets, it negates the character class. This pattern matches any character except a, b, or c.
Think of the brackets as two functions:
[(Pattern)] = match(Pattern)
[^(Pattern)] = notMatch(Pattern)
As for the ranges inside the brackets:
A-Z = all characters from A to Z
a-z = all characters from a to z
0-9 = all characters from 0 to 9
Therefore it replaces every character NOT included in the pattern with an empty string.
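A small sketch of the difference between a plain and a negated character class (the class and method names are just for the demo):

```java
final class NegatedClassDemo {

    // Removes the characters a, b and c.
    static String removeAbc(String s) {
        return s.replaceAll("[abc]", "");
    }

    // Removes everything EXCEPT a, b and c (the caret negates the class).
    static String keepAbc(String s) {
        return s.replaceAll("[^abc]", "");
    }

    // The solution above: removes everything that is not an ASCII letter or digit.
    static String keepAsciiAlnum(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(removeAbc("abcdef"));                  // prints def
        System.out.println(keepAbc("abcdef"));                    // prints abc
        System.out.println(keepAsciiAlnum("Hello, World 2009!")); // prints HelloWorld2009
    }
}
```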
answered Nov 21 '18 at 12:07


GalloCedrone
Using Guava, you can easily combine different types of criteria. For your specific case you can use:
// requires: import com.google.common.base.CharMatcher;
value = CharMatcher.inRange('0', '9')
    .or(CharMatcher.inRange('a', 'z'))
    .or(CharMatcher.inRange('A', 'Z'))
    .retainFrom(value);
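If pulling in Guava is not an option, the same "retain only matching characters" idea can be sketched with the JDK alone. To be clear, this is not Guava's API, just a plain loop with made-up names that keeps the same three ASCII ranges without using a regex.

```java
final class AsciiAlnumFilter {

    // True for '0'-'9', 'a'-'z' and 'A'-'Z' only.
    static boolean isAsciiAlnum(char c) {
        return (c >= '0' && c <= '9')
                || (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z');
    }

    // Hypothetical helper: copies only the ASCII letters and digits of value.
    static String retainAsciiAlnum(String value) {
        StringBuilder kept = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (isAsciiAlnum(c)) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        // prints ChlamydiasppIgGIgMIgAAbs8006
        System.out.println(retainAsciiAlnum(" Chlamydia_spp. IgG, IgM & IgA Abs (8006) "));
    }
}
```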
edited Oct 4 '18 at 8:40
answered Oct 4 '18 at 7:45


Debmalya Biswas