Python RegEx - Negative Lookahead not working after a ? quantifier












I'm new to regex, and I want to find all instances of "po" and its variants (e.g. "p.o.", "p. o.", "p o") that AREN'T followed by "box", because I'm interested in purchase orders, not PO boxes. The code below isn't working: it still matches the "po" even when it's followed by "box". Any ideas?



import re

string = " po  pobox  po box  po  box    p.o.  p.o.box  p.o. box  p.o.  box"

re.findall(r' p\.?\s?o\.?(?!\s*box)', string)

# expected output
[' po', ' p.o.']

# actual output
[' po', ' p.o.', ' p.o', ' p.o', ' p.o']









  • The p.o in p.o.box is not followed by box. It's followed by a dot (.).

    – user2357112
    Nov 20 '18 at 22:36











  • Are you referring to the expected output part? It was supposed to read “p.o.” instead of “p.o”, so box WOULD follow “p.o.”

    – Will Blanton
    Nov 21 '18 at 2:40
















python regex






edited Nov 21 '18 at 2:38

asked Nov 20 '18 at 22:31
Will Blanton








1 Answer
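You placed the lookahead after an optional pattern, and backtracking makes it possible to match the string in another way: when the lookahead fails after \.? has consumed the dot, the engine gives the dot back and retries the lookahead one character earlier, where the next character is a dot rather than box.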
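
To see the backtracking concretely, here is a minimal sketch. It assumes the pattern from the question is written with its backslashes intact (\. for a literal dot, \s for whitespace); the two sample strings are made up for illustration.

```python
import re

# Pattern from the question: the lookahead sits after the optional dot.
pattern = re.compile(r' p\.?\s?o\.?(?!\s*box)')

# With the dot consumed, the lookahead sees "box" and fails. The engine then
# backtracks, lets \.? match nothing, and retries the lookahead in front of
# ".box", where it succeeds because "." is not "box".
backtracked = pattern.search(" p.o.box")
print(backtracked.group(0))

# Without a dot to give back there is nothing to backtrack into, so no match.
no_dot = pattern.search(" pobox")
print(no_dot)
```

This is why only the dotted variants slipped through in the question's output.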



If Python supported possessive quantifiers, it would be easy to solve by adding + after the \.? that is before the lookahead: p\.?\s?o\.?+(?!\s*box). It would prevent the engine from backtracking into the \.? pattern.
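
As a rough sketch of the same idea without possessive quantifiers: a capturing lookahead followed by a backreference is a common way to emulate an atomic match, because the engine does not backtrack into a lookahead once it has succeeded. The variable names below are only for illustration. Python 3.11 and later also accept the possessive form directly, so that variant is guarded by a version check.

```python
import re
import sys

text = " po  pobox  po box  po  box    p.o.  p.o.box  p.o. box  p.o.  box"

# Emulated atomic match: (?=(\.?)) captures the optional dot without
# consuming it, and \1 then consumes exactly what was captured.
emulated = re.compile(r' p\.?\s?o(?=(\.?))\1(?!\s*box)')

# finditer + group(0) is used because findall would return the captured group.
emulated_matches = [m.group(0) for m in emulated.finditer(text)]
print(emulated_matches)

# Possessive quantifier, only available in Python 3.11 and later.
if sys.version_info >= (3, 11):
    possessive = re.compile(r' p\.?\s?o\.?+(?!\s*box)')
    possessive_matches = possessive.findall(text)
    print(possessive_matches)
```

Moving the lookahead, as described next, is simpler to read and works on every Python version.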



However, Python's re module did not support them before version 3.11, so the portable fix is to move the lookahead right after the o, the obligatory part, and add \.? to the lookahead:



r'p\.?\s?o(?!\.?\s*box)\.?'
          ^^^^^^^^^^^^^
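
A quick before/after check on the string from the question might look like this. It is only a sketch: the leading space from the question's pattern is kept in both versions so the results line up with the expected output there.

```python
import re

text = " po  pobox  po box  po  box    p.o.  p.o.box  p.o. box  p.o.  box"

# Original: lookahead after the optional dot (backtracking defeats it).
before = re.findall(r' p\.?\s?o\.?(?!\s*box)', text)

# Fixed: lookahead right after the mandatory "o", with \.? moved inside it.
after = re.findall(r' p\.?\s?o(?!\.?\s*box)\.?', text)

print(before)
print(after)
```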


Add \b after box if you plan to match it as a whole word. Likewise, you may want to add a \b before the first p to match p as a whole word.
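
To illustrate what the word boundaries change, here is a small sketch on a made-up sample string (not from the question). Match start offsets are compared because every match is the same text, "po".

```python
import re

sample = "hippo po boxer po box"

# Without word boundaries: the "po" inside "hippo" matches, and "po boxer"
# is rejected because "boxer" starts with "box".
plain = re.compile(r'p\.?\s?o(?!\.?\s*box)\.?')
plain_starts = [m.start() for m in plain.finditer(sample)]

# With \b before p and after box: "hippo" is skipped, "po boxer" is kept,
# and "po box" is still rejected.
bounded = re.compile(r'\bp\.?\s?o(?!\.?\s*box\b)\.?')
bounded_starts = [m.start() for m in bounded.finditer(sample)]

print(plain_starts)
print(bounded_starts)
```

Whether "po boxer" should count as a purchase order depends on your data, so treat the boundaries as optional.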



Details

  • p - a p

  • \.? - an optional dot (1 or 0 occurrences)

  • \s? - an optional whitespace character (1 or 0 occurrences)

  • o - an o

  • (?!\.?\s*box) - a negative lookahead that fails the match if, immediately to the right of the current location, there is an optional dot, 0+ whitespace characters and box

  • \.? - an optional dot (1 or 0 occurrences)
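
The breakdown above can also be kept next to the pattern itself with re.VERBOSE. This is just one possible way to write it; the name po_pattern is made up for the example.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part carries
# its own comment. Whitespace inside the pattern is ignored in this mode.
po_pattern = re.compile(r"""
    p                 # a literal p
    \.?               # an optional dot
    \s?               # an optional whitespace character
    o                 # a literal o
    (?!\.?\s*box)     # not followed by: optional dot, any whitespace, box
    \.?               # an optional trailing dot
""", re.VERBOSE)

text = " po  pobox  po box  po  box    p.o.  p.o.box  p.o. box  p.o.  box"
verbose_matches = po_pattern.findall(text)
print(verbose_matches)
```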






  • That’s exactly what the issue was! I thought my overall expression was right; it was just that backtracking issue causing the false positives. But now, the last period is chopped off. Is there a way to get it to still show if it’s there?

    – Will Blanton
    Nov 21 '18 at 2:37








  • Never mind! I missed the “\.?” at the end. That took care of that! Thanks again!

    – Will Blanton
    Nov 21 '18 at 2:56











edited Nov 20 '18 at 22:39

answered Nov 20 '18 at 22:34
Wiktor Stribiżew












