Bug with re.finditer() function and re.DOTALL flag in re module of Python 3.6? [duplicate]





















This question already has an answer here:




  • Python regex: greedy pattern returning multiple empty matches

    1 answer



  • Regex plus vs star difference?

    9 answers



  • String.replaceAll(regex) makes the same replacement twice

    1 answer




I'm getting weird results when I use re.DOTALL with re.finditer() in Python 3.6.
I don't know if this is the expected behavior, if I'm missing something, or if it's a bug.



CASE 1



I try this version of a string with an embedded newline.



I expect to get 2 matched values back: m1 = 'abc' and m2 = ' de'



import re
result = re.finditer('.*', 'abc\n de', flags=0)
m1 = next(result)
# <_sre.SRE_Match object; span=(0, 3), match='abc'>
m2 = next(result)
# <_sre.SRE_Match object; span=(3, 3), match=''>
m3 = next(result)
# <_sre.SRE_Match object; span=(4, 7), match=' de'>
m4 = next(result)
# <_sre.SRE_Match object; span=(7, 7), match=''>


What's with the match values m2 and m4?



CASE 2



I try this with re.DOTALL, and I expect to get back one match, m1 = 'abc\n de'



result = re.finditer('.*', 'abc\n de', flags=re.DOTALL)
m1 = next(result)
# <_sre.SRE_Match object; span=(0, 7), match='abc\n de'>
m2 = next(result)
# <_sre.SRE_Match object; span=(7, 7), match=''>


What's with the extra matches? How do I make the results work as expected?



I want the first case to return ...



m1 = 'abc'
m2 = ' de'


... and the second case to return



m1 = 'abc\n de'


and nothing else.























marked as duplicate by James, Wiktor Stribiżew
Jan 3 at 7:46





















  • If you don't want to match empty strings, use .+ instead of .*.

    – Tim Peters
    Jan 3 at 2:58


















python regex pattern-matching newline flags






asked Jan 3 at 2:55









P Moran

1 Answer
Your pattern is



.*


This means "match zero or more characters"; zero-width matches are permitted.
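To see that .* really can succeed without consuming anything, here is a small check using only standard re calls: match it against an empty string, and against a string that starts with a newline, with and without DOTALL.

```python
import re

# '.*' can succeed while consuming zero characters.
empty = re.fullmatch('.*', '')
print(empty.span())  # (0, 0)

# Without DOTALL, '.' refuses '\n', so the match at index 0 is empty.
at_newline = re.match('.*', '\nabc')
print(at_newline.span())  # (0, 0)

# With DOTALL, '.' also matches '\n', so the whole string is consumed.
dotall = re.match('.*', '\nabc', flags=re.DOTALL)
print(dotall.span())  # (0, 4)
```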



In your first case, m2 and m4 exist because of how finditer continues after a match. Without re.DOTALL, . does not match the newline, so the first match stops at index 3. finditer then tries to find a new match starting at that position (index 3). No characters can be matched there, but .* permits a zero-character match, so an empty match is reported. Hence the first match has



span=(0, 3)


and the second match has



span=(3, 3)


The same thing happens at the end of the string: that is the empty span=(7, 7) match in m4, and in m2 of your DOTALL code.
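The scanning behavior described above can be sketched as a loop. This is a simplified model, not CPython's actual implementation, and scan_spans is a hypothetical helper name. It assumes the pre-3.7 rule that, after an empty match at position p, the next search starts at p + 1 (Python 3.7 relaxed this so a non-empty match may start right where an empty one ended). For the question's two inputs it yields the same spans as re.finditer.

```python
import re

def scan_spans(pattern, text, flags=0):
    """Hypothetical helper: a simplified model of re.finditer's search loop."""
    regex = re.compile(pattern, flags)
    spans = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        spans.append(m.span())
        # After an empty match, step past it so the loop cannot stall;
        # after a non-empty match, resume exactly where it ended.
        pos = m.end() + 1 if m.end() == m.start() else m.end()
    return spans

text = 'abc\n de'
for flags in (0, re.DOTALL):
    model = scan_spans('.*', text, flags)
    actual = [m.span() for m in re.finditer('.*', text, flags)]
    print(model, actual)
# [(0, 3), (3, 3), (4, 7), (7, 7)] [(0, 3), (3, 3), (4, 7), (7, 7)]
# [(0, 7), (7, 7)] [(0, 7), (7, 7)]
```

The empty matches come from the loop trying again at the position where the previous match ended (index 3, and the end of the string at index 7), where .* is always able to succeed with zero characters.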



It sounds like you want a match only if there's at least one character, so repeat with + rather than *:



re.finditer('.+', 'abc\n de')
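Applying the + suggestion to both of the question's cases gives exactly the results asked for. A quick check, using only re.finditer and Match.group:

```python
import re

text = 'abc\n de'

# '.+' needs at least one character, so no empty matches are produced.
without_dotall = [m.group() for m in re.finditer('.+', text)]
with_dotall = [m.group() for m in re.finditer('.+', text, flags=re.DOTALL)]

print(without_dotall)  # ['abc', ' de']
print(with_dotall)     # ['abc\n de']
```

Without DOTALL the newline splits the text into two matches; with DOTALL the single match spans the whole string, and in neither case is an empty match reported.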





        answered Jan 3 at 3:01









        CertainPerformance
