Python RegEx - Negative Lookahead not working after a ? quantifier
I'm new to regex, and I want to find all instances of "po" and its variants (i.e. "p.o. | p. o. | p o") that AREN'T followed by "box", because I'm interested in purchase orders and not PO boxes. The code below isn't working: it matches the "po" even when it's followed by "box". Any ideas?
import re

string = " po pobox po box po box p.o. p.o.box p.o. box p.o. box"
re.findall(r' p\.?\s?o\.?(?!\s*box)', string)
# expected output
[' po', ' p.o.']
# actual output
[' po', ' p.o.', ' p.o', ' p.o', ' p.o']
python regex
The "p.o" in "p.o.box" is not followed by "box". It's followed by ".".
– user2357112 Nov 20 '18 at 22:36
Are you referring to the expected output part? It was supposed to read “p.o.” instead of “p.o”, so box WOULD follow “p.o.”
– Will Blanton Nov 21 '18 at 2:40
asked Nov 20 '18 at 22:31 by Will Blanton, edited Nov 21 '18 at 2:38
1 Answer
You placed the lookahead after an optional pattern and backtracking makes it possible to match the string in another way.
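To make the backtracking visible, here is a minimal sketch that runs the pattern from the question (with its backslashes restored) and prints what follows each match. The variable names are just for illustration.

```python
import re

text = " po pobox po box po box p.o. p.o.box p.o. box p.o. box"

# Pattern from the question: the trailing \.? sits before the lookahead.
pattern = re.compile(r" p\.?\s?o\.?(?!\s*box)")

matches = []
for m in pattern.finditer(text):
    following = text[m.end():m.end() + 5]
    matches.append(m.group())
    # When the lookahead fails after " p.o.", the engine gives the dot back,
    # retries at " p.o", and now the next character is "." rather than "box".
    print(repr(m.group()), "is followed by", repr(following))

print(matches)  # [' po', ' p.o.', ' p.o', ' p.o', ' p.o']
```

The last three matches are the false positives: each one stops before the dot, so the lookahead sees ".box" or ". box" instead of "box".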
If Python supported possessive quantifiers, it would be easy to solve by adding + after the \.? that is before the lookahead: p\.?\s?o\.?+(?!\s*box). It would prevent the engine from backtracking into the \.? pattern.
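As far as I know, the standard re module accepts possessive quantifiers starting with Python 3.11. The sketch below guards on the interpreter version so that it still runs on older versions, where the ?+ syntax is rejected; the leading space in the pattern is carried over from the question.

```python
import re
import sys

text = " po pobox po box po box p.o. p.o.box p.o. box p.o. box"

if sys.version_info >= (3, 11):
    # \.?+ is possessive: once it has consumed the dot, it never gives it back,
    # so the lookahead is always checked right after the whole "p.o." prefix.
    possessive = re.compile(r" p\.?\s?o\.?+(?!\s*box)")
    result = possessive.findall(text)
else:
    # Older interpreters raise re.error for this pattern, so skip it here.
    result = None

print(result)
```

On 3.11 or later this should print [' po', ' p.o.']; on earlier versions it prints None.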
However, since Python's re did not support them until version 3.11, you need to move the lookahead right after the o, the obligatory part, and add \.? to the lookahead:
r'p\.?\s?o(?!\.?\s*box)\.?'
          ^^^^^^^^^^^^^
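A quick check of the rearranged pattern against the string from the question. This sketch keeps the leading space from the question's pattern so the output is directly comparable with the expected output there.

```python
import re

text = " po pobox po box po box p.o. p.o.box p.o. box p.o. box"

# The lookahead now runs right after the mandatory "o" and accounts for the
# optional dot itself, so there is nothing optional left to backtrack into.
fixed = re.compile(r" p\.?\s?o(?!\.?\s*box)\.?")

matches = fixed.findall(text)
print(matches)  # [' po', ' p.o.']
```

The trailing \.? after the lookahead is what keeps the final dot of "p.o." in the match.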
See the regex demo. Add \b after box if you plan to match it as a whole word. Same with the first p: you may want to add a \b before it to match p as a whole word.
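Here is a small sketch of what the two word boundaries change. The sample strings and the helper name find_po are made up for illustration.

```python
import re

# Without boundaries, and with \b before "p" and after "box".
loose = re.compile(r"p\.?\s?o(?!\.?\s*box)\.?")
strict = re.compile(r"\bp\.?\s?o(?!\.?\s*box\b)\.?")


def find_po(sample: str) -> list[str]:
    """Return the purchase-order style matches found by the strict pattern."""
    return strict.findall(sample)


print(loose.findall("repo 5"))   # ['po']  ("po" inside "repo" is matched)
print(find_po("repo 5"))         # []      (\b requires "p" to start a word)
print(find_po("po 123"))         # ['po']
print(find_po("p.o. box 7"))     # []
print(find_po("po boxes"))       # ['po']  ("boxes" is not the whole word "box")
```

Whether "po boxes" should count as a match depends on your data; drop the \b after box if it should be rejected as well.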
Details
p - a p
\.? - an optional (1 or 0) dot
\s? - an optional (1 or 0) whitespace
o - an o
(?!\.?\s*box) - a negative lookahead that fails the match if, immediately to the right of the current location, there is an optional dot, 0+ whitespaces and box
\.? - an optional (1 or 0) dot
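The same breakdown can be kept next to the pattern itself with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is only a sketch; the name PO_NOT_BOX is an assumption of mine.

```python
import re

PO_NOT_BOX = re.compile(
    r"""
    p               # a literal "p"
    \.?             # an optional dot
    \s?             # an optional whitespace character
    o               # a literal "o"
    (?!\.?\s*box)   # fail if an optional dot, any whitespace and "box" follow
    \.?             # an optional trailing dot, kept in the match
    """,
    re.VERBOSE,
)

text = " po pobox po box po box p.o. p.o.box p.o. box p.o. box"
print(PO_NOT_BOX.findall(text))                    # ['po', 'p.o.']
print(PO_NOT_BOX.findall("p. o. 55 and p o box"))  # ['p. o.']
```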
That’s exactly what the issue was! I thought my overall expression was right; it was just that backtracking issue causing the false positives. But now the last period is chopped off. Is there a way to get it to still show if it’s there?
– Will Blanton Nov 21 '18 at 2:37
Never mind! I missed the “\.?” at the end. That took care of that! Thanks again!
– Will Blanton Nov 21 '18 at 2:56
answered Nov 20 '18 at 22:34 by Wiktor Stribiżew, edited Nov 20 '18 at 22:39