Bug with re.finditer() function and re.DOTALL flag in re module of Python 3.6? [duplicate]





















This question already has an answer here:




  • Python regex: greedy pattern returning multiple empty matches

    1 answer



  • Regex plus vs star difference?

    9 answers



  • String.replaceAll(regex) makes the same replacement twice

    1 answer




I'm getting weird results when I use re.DOTALL with re.finditer() in Python 3.6.
I don't know whether this is expected behavior, whether I'm missing something, or whether it's a bug.



CASE 1



I try this version of a string with an embedded newline.



I expect to get 2 matched values back: m1 = 'abc' and m2 = ' de'



import re
result = re.finditer('.*', 'abc\n de', flags=0)
m1 = next(result)
# <_sre.SRE_Match object; span=(0, 3), match='abc'>
m2 = next(result)
# <_sre.SRE_Match object; span=(3, 3), match=''>
m3 = next(result)
# <_sre.SRE_Match object; span=(4, 7), match=' de'>
m4 = next(result)
# <_sre.SRE_Match object; span=(7, 7), match=''>


What's with the empty match values m2 and m4?



CASE 2



I try this with re.DOTALL, and I expect to get back one match, m1 = 'abc\n de'



result = re.finditer('.*', 'abc\n de', flags=re.DOTALL)
m1 = next(result)
# <_sre.SRE_Match object; span=(0, 7), match='abc\n de'>
m2 = next(result)
# <_sre.SRE_Match object; span=(7, 7), match=''>


What's with the extra matches? How do I make the results work as expected?



I want the first case to return ...



m1 = 'abc'
m2 = ' de'


... and the second case to return



m1 = 'abc\n de'


and nothing else.























marked as duplicate by James, Wiktor Stribiżew python
Users with the  python badge can single-handedly close python questions as duplicates and reopen them as needed.

Jan 3 at 7:46


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.



















  • If you don't want to match empty strings, use .+ instead of .*.

    – Tim Peters
    Jan 3 at 2:58


















python regex pattern-matching newline flags
















asked Jan 3 at 2:55









P Moran
















1 Answer














Your pattern is



.*


This means "match zero or more characters"; zero-width matches are permitted.



In your first case, m2 and m4 exist because, without re.DOTALL, the dot does not match a newline. The pattern stops matching at the newline, then tries to find a new match starting at that position (index 3). No characters are matched there, but .* still permits a match of zero characters, hence the first match has



span=(0, 3)


and the second match has



span=(3, 3)


The same thing happens at the end of the string: span=(7, 7) in m4, and m2 in your DOTALL code.
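To make the zero-width matches visible, here is a minimal sketch that collects the spans for both cases. It assumes the test string is 'abc\n de' (an embedded newline, as the spans in the question imply); the variable names text, spans_default and spans_dotall are only illustrative.

```python
import re

# Assumed input: 'abc', a newline, then ' de' (7 characters in total).
text = 'abc\n de'

# Without DOTALL, "." does not match the newline, so ".*" also yields
# an empty match at the newline and another at the end of the string.
spans_default = [m.span() for m in re.finditer('.*', text)]

# With DOTALL, "." consumes the newline too; only the empty match at the
# end of the string remains.
spans_dotall = [m.span() for m in re.finditer('.*', text, flags=re.DOTALL)]

print(spans_default)  # [(0, 3), (3, 3), (4, 7), (7, 7)]
print(spans_dotall)   # [(0, 7), (7, 7)]
```

Every span whose start equals its end is a match of zero characters.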



It sounds like you want a match only when there is at least one character, so repeat with + rather than *:



re.finditer('.+', 'abc\n de')
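As a quick check that .+ gives the results asked for in both cases, a short sketch follows. Again the input 'abc\n de' is assumed from the spans in the question, and the names text, matches_default and matches_dotall are hypothetical.

```python
import re

# Assumed input: 'abc', a newline, then ' de'.
text = 'abc\n de'

# ".+" requires at least one character, so no empty matches are produced.
matches_default = [m.group() for m in re.finditer('.+', text)]
matches_dotall = [m.group() for m in re.finditer('.+', text, flags=re.DOTALL)]

print(matches_default)  # ['abc', ' de']
print(matches_dotall)   # ['abc\n de']
```

Iterating with a list comprehension (or a for loop) also avoids calling next() past the last match, which would raise StopIteration.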















        answered Jan 3 at 3:01









CertainPerformance
















