Removing elements from Pandas Series of lists
I've been searching this site for solutions and hints, but couldn't find a question directly related to my case.
I have scraped text data from various sites and split the text using str.split('\n'). The text contains a lot of '\n' characters, so splitting this way gave a reasonably clean result. (Please let me know if this is a bad approach.)
df['scrape']
0 \nWebsite:\n\n\n\nVisit\n\n \nWhite paper:\n\n...
1 \nWebsite:\n\n\n\nVisit\n\n \nWhite paper:\n\n...
2 \nWebsite:\n\n\n\nVisit\n\n \nWhite paper:\n\n...
3 \nWebsite:\n\n\n\nVisit\n\n \nWhite paper:\n\n...
4 \nWebsite:\n\n\n\nVisit\n\n \nWhite paper:\n\n...
5 \nWebsite:\n\n\n\nVisit\n\n \nWhite paper:\n\n...
The result was a Pandas Series of lists: all elements are lists of strings.
df['split'] = df['scrape'].str.split('\n')
0 [, Website:, , , , Visit, , , White paper:, ,...
1 [, Website:, , , , Visit, , , White paper:, ,...
2 [, Website:, , , , Visit, , , White paper:, ,...
3 [, Website:, , , , Visit, , , White paper:, ,...
4 [, Website:, , , , Visit, , , White paper:, ,...
5 [, Website:, , , , Visit, , , White paper:, ,...
6 [, Website:, , , , Visit, , , White paper:, ,...
I want to get rid of the empty elements ('' and ' ') in each list.
I tried looping:
for i in series:
    while '' in i:
        i.remove('')
The code above works on an arbitrary example I made, but with my real data it produces an error.
>>> for i in df['split']:
...     while '' in i:
...         i.remove('')
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
TypeError: argument of type 'float' is not iterable
I'm not sure why I am getting an error with my data. Could I get some advice on this? Thanks!
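A TypeError mentioning 'float' usually means that some rows are not lists at all. When a row of the source column is missing (NaN), str.split leaves it as NaN, which is a float, and `'' in i` then fails on that row. The sketch below reproduces this on a tiny made-up frame (the data and the names `is_list` and `bad_rows` are assumptions for illustration, not the asker's real data) and lists the offending row labels.

```python
import pandas as pd

# Hypothetical stand-in for the scraped column; the second row is missing (NaN).
df = pd.DataFrame({"scrape": ["\nWebsite:\n\nVisit\n \nWhite paper:", float("nan")]})
df["split"] = df["scrape"].str.split("\n")

# Flag the rows that actually hold a list; everything else breaks the loop.
is_list = df["split"].map(lambda value: isinstance(value, list))
bad_rows = df.loc[~is_list].index.tolist()

print(bad_rows)
print(df["split"].map(type).value_counts())
```

Running the same check on the real column should show whether missing values are the cause before any filtering is attempted.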
python string pandas list series
Don't store lists in a Series
– user3483203
Nov 19 '18 at 19:32
What's the suggestion for this case then, if I don't store lists in a Series?
– Matthew Son
Nov 19 '18 at 19:50
Solution, thanks to Toby's idea:
def remover(lst):
    return [s for s in lst if s != '' and s != ' ']
df['new'] = df['split'].apply(remover)
With this method you don't need to drop NaN values.
– Matthew Son
Nov 19 '18 at 22:00
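If the column still holds NaN rows, a function that iterates over its argument would raise on them, as the traceback in the question shows. One option is to add a type guard. The sketch below is a variant of the remover function from the comment above; the name `remove_blanks` and the sample data are assumptions. It also uses `s.strip()`, which drops every whitespace-only string and is therefore slightly broader than testing for '' and ' ' only.

```python
import pandas as pd

def remove_blanks(value):
    """Drop empty and whitespace-only strings; pass non-list values through."""
    if not isinstance(value, list):
        return value  # e.g. NaN from a missing scrape, left untouched here
    return [s for s in value if s.strip()]

# Hypothetical data: one real list and one missing row.
df = pd.DataFrame({"split": [["", "Website:", " ", "Visit"], float("nan")]})
df["new"] = df["split"].apply(remove_blanks)

print(df["new"])
```

Returning an empty list instead of the original value is an equally reasonable choice if downstream code expects every row to be a list.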
edited Nov 19 '18 at 19:46
Matthew Son
asked Nov 19 '18 at 19:28


1 Answer
You could use a list comprehension:
new_series = [s for s in series if s != '' and s != ' ' and s is not None]
To apply the list comprehension to each element in a Pandas Series of lists (essentially a list of lists), you need to nest the list comprehension like this:
new_series = [[s for s in element if s != '' and s != ' ' and s is not None] for element in series]
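The difference between the two comprehensions can be seen on a plain list of lists, with no pandas involved. This is a minimal sketch with made-up data: in the flat version each `s` is a whole list, which never compares equal to '' or ' ', so nothing is removed; in the nested version the inner loop sees the individual strings.

```python
# Hypothetical rows standing in for the Series of lists.
rows = [["", "Website:", " ", "Visit"], ["", "White paper:", ""]]

# Flat: the condition is tested against each inner list, so every list is kept as-is.
flat = [s for s in rows if s != "" and s != " "]

# Nested: the condition is tested against each string inside each list.
nested = [[s for s in row if s != "" and s != " "] for row in rows]

print(flat == rows)
print(nested)
```

The nested form still assumes every row is iterable, so it would fail on a row that holds NaN rather than a list.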
It doesn't work. I tried series = [s for s in df['split'] if s!='' and s!=' '] and the result still contains '' and ' ' values.
– Matthew Son
Nov 19 '18 at 19:51
Do you also need to add a None criterion? See my updated example.
– Toby Petty
Nov 19 '18 at 20:15
It still doesn't work. I tried converting it into a list of lists too. Your suggestion yields one big collapsed list, but I need to keep the lists separate.
– Matthew Son
Nov 19 '18 at 20:40
Ah OK, I think I understand: you want to apply the list comprehension to each list in the series (essentially a list of lists). If I understand correctly, this should work: [[s for s in x if s!='' and s!=' ' and s!=None] for x in series]
– Toby Petty
Nov 19 '18 at 20:52
Thanks for the continued updates. This answer looks like what I want, but I honestly don't know why it still raises an error:
>>> [[s for s in x if s!='' and s!=' ' and s!=None] for x in df['split']]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
TypeError: 'float' object is not iterable
– Matthew Son
Nov 19 '18 at 21:18
edited Nov 19 '18 at 20:59
answered Nov 19 '18 at 19:33
Toby Petty