Bug with re.finditer() function and re.DOTALL flag in re module of Python 3.6? [duplicate]

This question already has an answer here:

Python regex: greedy pattern returning multiple empty matches

1 answer

Regex plus vs star difference?

9 answers

String.replaceAll(regex) makes the same replacement twice

1 answer

I'm getting weird results when I use re.DOTALL with re.finditer() in Python 3.6.
I don't know whether this is the expected behavior, whether I'm missing something, or whether it's a bug.

CASE 1

I try this version of a string with an embedded newline.

I expect to get 2 matched values back: m1 = 'abc' and m2 = ' de'

import re

result = re.finditer('.*', 'abc\n de', flags=0)

m1 = result.__next__()

#    <_sre.SRE_Match object; span=(0, 3), match='abc'>

m2 = result.__next__()

#    <_sre.SRE_Match object; span=(3, 3), match=''>

m3 = result.__next__()

#    <_sre.SRE_Match object; span=(4, 7), match=' de'>

m4 = result.__next__()

#    <_sre.SRE_Match object; span=(7, 7), match=''>

What's with the match values m2 and m4?

CASE 2

I try this with re.DOTALL, and I expect to get back one match, m1 = 'abc\n de'

result = re.finditer('.*', 'abc\n de', flags=re.DOTALL)

m1 = result.__next__()

#     <_sre.SRE_Match object; span=(0, 7), match='abc\n de'>

m2 = result.__next__()

#     <_sre.SRE_Match object; span=(7, 7), match=''>

What's with the extra matches? How do I make the results work as expected?

I want the first case to return ...

m1 = 'abc'

m2 = ' de'

... and the second case to return

m1 = 'abc\n de'

and nothing else.

asked Jan 3 at 2:55

P Moran

marked as duplicate by James, Wiktor Stribiżew python
Users with the python badge can single-handedly close python questions as duplicates and reopen them as needed.

Jan 3 at 7:46

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

If you don't want to match empty strings, use .+ instead of .*.

– Tim Peters
Jan 3 at 2:58


python regex pattern-matching newline flags


1 Answer
Your pattern is

.*

This means "match zero or more characters"; zero-width matches are permitted.

In your first case, m2 and m4 exist because the pattern stops matching at the newline, then tries to find a new match starting at that position (index 3). No characters are matched there, but .* still permits a match, so the first match has

span=(0, 3)

and the second match has

span=(3, 3)

The same thing is happening for span=(7, 7) in m4 and in your DOTALL code.
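To see where the zero-width matches come from, here is a minimal sketch that collects the spans for both cases. It assumes the input string is 'abc\n de' with a real newline at index 3 (which the spans in the question imply); the variable names are only for illustration.

```python
import re

text = 'abc\n de'  # assumed input: a real newline sits at index 3

# Without DOTALL, '.' stops at the newline, so '.*' also matches the
# empty string at index 3 and again at the end of the string (index 7).
default_spans = [m.span() for m in re.finditer('.*', text)]

# With DOTALL, '.' consumes the newline too, so only the trailing
# empty match at index 7 is left over.
dotall_spans = [m.span() for m in re.finditer('.*', text, flags=re.DOTALL)]

print(default_spans)  # [(0, 3), (3, 3), (4, 7), (7, 7)]
print(dotall_spans)   # [(0, 7), (7, 7)]
```

Every span whose start equals its end is a zero-width match.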

It sounds like you want a match only if there's at least one character - repeat with + rather than *:

re.finditer('.+', 'abc\n de')
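As a quick check, this sketch runs the + version against both cases from the question. It again assumes the input is 'abc\n de' with a real newline; case1 and case2 are illustrative names, not part of the original code.

```python
import re

text = 'abc\n de'  # assumed input with an embedded newline

# '.+' needs at least one character per match, so the zero-width
# matches from '.*' no longer appear.
case1 = [m.group() for m in re.finditer('.+', text)]
case2 = [m.group() for m in re.finditer('.+', text, flags=re.DOTALL)]

print(case1)  # ['abc', ' de']
print(case2)  # ['abc\n de']
```

Both results line up with what the question asks for: two matches without the flag, one match with re.DOTALL.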

answered Jan 3 at 3:01

CertainPerformance

