Bug with re.finditer() function and re.DOTALL flag in re module of Python 3.6? [duplicate]
This question already has an answer here:
Python regex: greedy pattern returning multiple empty matches
1 answer
Regex plus vs star difference?
9 answers
String.replaceAll(regex) makes the same replacement twice
1 answer
I'm getting weird results when I use re.DOTALL with re.finditer() in Python 3.6.
I don't know whether this is the expected behavior, whether I'm missing something, or whether it's a bug.
CASE 1
I try this version of a string with an embedded newline.
I expect to get 2 matched values back: m1 = 'abc' and m2 = ' de'
import re
result = re.finditer('.*', 'abc\n de', flags=0)
m1 = result.__next__()
# <_sre.SRE_Match object; span=(0, 3), match='abc'>
m2 = result.__next__()
# <_sre.SRE_Match object; span=(3, 3), match=''>
m3 = result.__next__()
# <_sre.SRE_Match object; span=(4, 7), match=' de'>
m4 = result.__next__()
# <_sre.SRE_Match object; span=(7, 7), match=''>
What's with the match values m2 and m4?
CASE 2
I try this with re.DOTALL, and I expect to get back one match, m1 = 'abc\n de'
result = re.finditer('.*', 'abc\n de', flags=re.DOTALL)
m1 = result.__next__()
# <_sre.SRE_Match object; span=(0, 7), match='abc\n de'>
m2 = result.__next__()
# <_sre.SRE_Match object; span=(7, 7), match=''>
What's with the extra matches? How do I make the results work as expected?
I want the first case to return ...
m1 = 'abc'
m2 = ' de'
... and the second case to return
m1 = 'abc\n de'
and nothing else.
python regex pattern-matching newline flags
marked as duplicate by James, Wiktor Stribiżew
Jan 3 at 7:46
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
If you don't want to match empty strings, use .+ instead of .*.
– Tim Peters, Jan 3 at 2:58
asked Jan 3 at 2:55 by P Moran
1 Answer
Your pattern is .*
This means "match zero or more characters"; zero-width matches are permitted. (Without re.DOTALL, the . does not match a newline.)
In your first case, m2 and m4 exist because the pattern stops matching at the newline, then tries to find a new match starting at that position (index 3). No characters are matched, but the pattern still permits it, because it's .*, hence the first match has span=(0, 3) and the second match has span=(3, 3). The same thing is happening for span=(7, 7) in m4 and in your DOTALL code.
It sounds like you want a match only if there's at least one character - repeat with + rather than *:
re.finditer('.+', 'abc\n de')
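To make the difference concrete, here is a minimal sketch that runs both patterns over the string from the question, with and without re.DOTALL, and collects the spans. The helper name `spans` and the variable names are only illustrative and are not part of the question or the re module; the input is assumed to be 'abc\n de' with a real newline at index 3.

```python
import re

text = 'abc\n de'  # 7 characters; the newline sits at index 3


def spans(pattern, flags=0):
    """Return the (start, end) span of every match of pattern in text."""
    return [m.span() for m in re.finditer(pattern, text, flags)]


# '.*' may match zero characters, so an empty match is reported wherever
# the previous match stopped (before the newline and at the end of the string).
star_default = spans('.*')
star_dotall = spans('.*', re.DOTALL)

# '.+' requires at least one character, so the empty matches disappear.
plus_default = spans('.+')
plus_dotall = spans('.+', re.DOTALL)

print(star_default)  # [(0, 3), (3, 3), (4, 7), (7, 7)]
print(star_dotall)   # [(0, 7), (7, 7)]
print(plus_default)  # [(0, 3), (4, 7)]
print(plus_dotall)   # [(0, 7)]

# The matched text for the '.+' version, as asked for in the question.
plus_default_text = [m.group() for m in re.finditer('.+', text)]
plus_dotall_text = [m.group() for m in re.finditer('.+', text, re.DOTALL)]
```

With `.+`, the first case yields 'abc' and ' de', and the DOTALL case yields the whole string as a single match, which is what the question asks for.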
answered Jan 3 at 3:01 by CertainPerformance