keep quoted blocks intact when splitting by delimiter
Given an example string s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
I want to separate it into the following chunks:
# To Do: something like {l = s.split(',')}
l = ['Hi', 'my name is Humpty-Dumpty', '"Alice, Through the Looking Glass"']
I don't know where and how many delimiters I'll find.
This is my initial idea. It is quite long and not exact: it removes all the delimiters, while I want the delimiters inside quotes to survive:
s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
ss = []
inner_string = ""
delimiter = ','
for item in s.split(delimiter):
    if not inner_string:
        if '"' not in item:  # regular string, not interesting
            ss.append(item)
        else:
            inner_string += item  # start inner string
    elif inner_string:
        inner_string += item
        if '"' in item:  # end inner string
            ss.append(inner_string)
            inner_string = ""
        else:  # middle of inner string
            pass
print(ss)
# prints ['Hi', ' my name is Humpty-Dumpty', ' from "Alice Through the Looking Glass"'] which is OK-ish
python python-3.x split
asked Nov 20 '18 at 11:13 by CIsForCookies, edited Nov 20 '18 at 13:00 by fferri
3 Answers
You can split by regular expressions with re.split:
>>> import re
>>> [x for x in re.split(r'([^",]*(?:"[^"]*"[^",]*)*)', s) if x not in (',','')]
when s is equal to:
'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
it outputs:
['Hi', ' my name is Humpty-Dumpty', ' from "Alice, Through the Looking Glass"']
Regular expression explained:
(
  [^",]*          zero or more chars other than " or ,
  (?:             non-capturing group
    "[^"]*"       quoted block
    [^",]*        followed by zero or more chars other than " or ,
  )*              zero or more times
)
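As a minimal sketch, the same pattern can be wrapped in a small helper that uses re.findall instead of re.split and trims the whitespace around each chunk. The name split_outside_quotes is made up for illustration, and the helper assumes the double quotes in the input are balanced:

```python
import re

# Same pattern as in the answer, without the outer capturing group:
# unquoted runs, optionally interleaved with "quoted blocks".
FIELD = re.compile(r'[^",]*(?:"[^"]*"[^",]*)*')


def split_outside_quotes(text):
    """Split text on commas that are not inside double quotes (hypothetical helper)."""
    # findall also returns empty matches between fields; drop them,
    # then strip leading/trailing whitespace from what is left.
    return [field.strip() for field in FIELD.findall(text) if field]


s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
print(split_outside_quotes(s))
```

Note that completely empty fields (as in 'a,,b') are dropped by the `if field` filter, so this sketch is not a good fit when empty fields matter.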
I solved this problem by avoiding split entirely:
s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
l = []
substr = ""
quotes_open = False
for c in s:
    if c == ',' and not quotes_open:  # check for comma only if no quotes open
        l.append(substr)
        substr = ""
    elif c == '"':
        quotes_open = not quotes_open  # the quote character itself is dropped
    else:
        substr += c
l.append(substr)
print(l)
Output:
['Hi', ' my name is Humpty-Dumpty', ' from Alice, Through the Looking Glass']
A more generalised function could look something like:
def custom_split(input_str, delimiter=' ', avoid_between_char='"'):
    l = []
    substr = ""
    between_avoid_chars = False
    for c in input_str:  # iterate over the argument, not the global s
        if c == delimiter and not between_avoid_chars:
            l.append(substr)
            substr = ""
        elif c == avoid_between_char:
            between_avoid_chars = not between_avoid_chars
        else:
            substr += c
    l.append(substr)
    return l
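If the quote characters should survive, as in the output the question asks for, one possible variation is to toggle the flag on a quote but still keep the character. This is only a sketch: split_keep_quotes is a hypothetical name, stripping the whitespace is an added assumption, and unbalanced quotes are not handled:

```python
def split_keep_quotes(text, delimiter=',', quote='"'):
    """Split on delimiter outside quotes, keeping the quote characters (hypothetical helper)."""
    parts = []
    current = []
    in_quotes = False
    for ch in text:
        if ch == quote:
            in_quotes = not in_quotes  # toggle state; the quote is still appended below
        if ch == delimiter and not in_quotes:
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current).strip())  # last chunk after the final delimiter
    return parts


s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
print(split_keep_quotes(s))
```

Collecting characters in a list and joining once per chunk avoids repeated string concatenation, but the logic is otherwise the same single pass over the input.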
This would work for this specific case and can provide a starting point.
import re
s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
cut = re.search('(".*")', s)  # the quoted block (greedy: first quote to last quote)
r = re.sub('(".*")', '$VAR$', s).split(',')  # mask the block, then split
res = []
for i in r:
    # plain str.replace here: '$' is a regex anchor, so re.sub('$VAR$', ...) would never match
    res.append(i.replace('$VAR$', cut.group(1)))
Output
print(res)
['Hi', ' my name is Humpty-Dumpty', ' from "Alice, Through the Looking Glass"']
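For input with more than one quoted block, one way to extend the placeholder idea is to give each block its own numbered placeholder. The sketch below is an assumption-laden illustration, not part of the answer: split_with_placeholders is a hypothetical name, and it assumes the text contains no NUL characters, the quotes are balanced, and the delimiter is neither a digit nor NUL:

```python
import re

QUOTED = re.compile(r'"[^"]*"')            # one quoted block, non-nested
PLACEHOLDER = re.compile(r'\x00(\d+)\x00')  # NUL-delimited index, assumed absent from input


def split_with_placeholders(text, delimiter=','):
    """Mask quoted blocks, split, then restore them (hypothetical helper)."""
    blocks = []

    def stash(match):
        # Remember the quoted block and leave a numbered placeholder in its place.
        blocks.append(match.group(0))
        return '\x00{}\x00'.format(len(blocks) - 1)

    def restore(match):
        return blocks[int(match.group(1))]

    masked = QUOTED.sub(stash, text)
    return [PLACEHOLDER.sub(restore, part) for part in masked.split(delimiter)]


s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
print(split_with_placeholders(s))
```

Because the replacements are functions rather than strings, the restored text is inserted literally, so backslashes inside a quoted block are not interpreted as escape sequences.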
Answer using re.split: answered Nov 20 '18 at 11:36 by fferri, edited Nov 20 '18 at 12:56
Answer avoiding split: answered Nov 20 '18 at 11:31 by Aquarthur, edited Nov 20 '18 at 11:39
Answer using a placeholder: answered Nov 20 '18 at 11:31 by Richy