Remove some characters from a CSV file in Python
I have a dataset of Indonesian recipes with 3 columns (the first column is the recipe name, the second is the ingredients, and the third is the steps).
The second and third columns contain some special characters such as '#' and '/'. How do I remove them? I followed this, but it shows some errors. Here is the dataset!
This is my code:
import csv
input = open('dataset-ayam-baru.csv', 'rb')
lines = csv.reader(input)
output = open('new_dataset.csv', 'wb')
writer = csv.writer(output)
conversion = '-"/.$'
text = input.read()
newtext = '_'
for c in text:
    newtext += '_' if c in conversion else c
    writer.writerow(c)
input.close()
output.close()
I am getting the following error:
TypeError                                 Traceback (most recent call last)
<ipython-input-28-05d606ed80df> in <module>()
     10 newtext = ''
     11 for c in text:
---> 12     newtext += '' if c in conversion else c
     13     writer.writerow(c)
     14
TypeError: 'in <string>' requires string as left operand, not int
python csv
asked Nov 20 '18 at 11:30 by Ab Rohi
edited Nov 20 '18 at 11:40 by Jaba
Can you post the error log here?
– Sharvin Shah
Nov 20 '18 at 11:32
Here is the error: TypeError Traceback (most recent call last) <ipython-input-28-05d606ed80df> in <module>() 10 newtext = '' 11 for c in text: ---> 12 newtext += '' if c in conversion else c 13 writer.writerow(c) 14 TypeError: 'in <string>' requires string as left operand, not int
– Ab Rohi
Nov 20 '18 at 11:33
Please add the code you used.
– Jaba
Nov 20 '18 at 11:36
I've added the code I used.
– Ab Rohi
Nov 20 '18 at 11:39
Is there any solution, then?
– Ab Rohi
Nov 20 '18 at 11:52
2 Answers
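The error occurs because you open the file in binary mode, so reading it returns bytes, and iterating over bytes yields integers rather than characters. Open the input file with "rt" instead of "rb" (and the output file with "w" instead of "wb").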
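To see why the loop fails, here is a minimal sketch (the sample bytes are made up for illustration): in Python 3, iterating over a bytes object yields integers, so `c in conversion` ends up testing an int against a str.

```python
conversion = '-"/.$'

# Reading a file opened with "rb" returns bytes; this literal stands in for it.
raw = b"ayam/goreng"
first = next(iter(raw))  # an int (the byte value), not a 1-character string

try:
    first in conversion  # int on the left, str on the right
    failed = False
except TypeError:
    failed = True

# Decoding (or opening the file in text mode) gives str, and the test works.
text = raw.decode("utf-8")
cleaned = "".join("_" if c in conversion else c for c in text)
```

Opening the file in text mode performs the decoding step for you, which is why the mode change alone makes the membership test valid.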
Based on the Stack Overflow question you cited, an answer that works for me is:
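import csv

# Open both files in text mode; newline="" lets the csv module handle line endings itself.
with open("dataset-ayam-baru.csv", "rt", encoding="utf-8", newline="") as infile, \
        open("new_dataset.csv", "w", encoding="utf-8", newline="") as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    conversion = set('_"/.$')  # characters to replace with '_'; adjust as needed, e.g. add '#'
    for row in reader:
        # Replace each unwanted character in every field of the row.
        newrow = [''.join('_' if c in conversion else c for c in entry) for entry in row]
        writer.writerow(newrow)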
Important: mind the encoding of the input file. I had to convert it to ANSI to make it work; you need to know the encoding of the dataset in advance (e.g., utf-8) and pass it to open().
The follow-up question (the one regarding bytes and encoding) is here: csv.Error: iterator should return strings, not bytes
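The code above replaces the listed characters with '_'. If the goal is to delete them outright, as the question asks for '#' and '/', one possible sketch uses `str.translate`. The helper name `strip_chars` and the in-memory sample rows are assumptions for illustration, not part of the original answer.

```python
import csv
import io


def strip_chars(rows, unwanted="#/"):
    """Yield rows with every character in `unwanted` deleted from each field."""
    # The third argument of str.maketrans lists characters to delete.
    table = str.maketrans("", "", unwanted)
    for row in rows:
        yield [field.translate(table) for field in row]


# In-memory stand-ins for the input and output files (made-up sample data).
infile = io.StringIO(
    'name,ingredients,steps\n'
    'Ayam Goreng,"#ayam, bawang/putih",goreng/tiriskan\n'
)
outfile = io.StringIO()

writer = csv.writer(outfile)
writer.writerows(strip_chars(csv.reader(infile)))
result = outfile.getvalue()
```

With real files, the same generator can be fed from `csv.reader(infile)` inside the `with open(...)` block shown earlier. Deleting characters can change meaning (for example, a fraction written with '/'), so it is worth checking a few rows by eye.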
answered Nov 20 '18 at 11:57 by FMarazzi
edited Nov 20 '18 at 13:41
Hi, thank you for the solution, but I got a new error: 'charmap' codec can't decode byte 0x9d in position 3291: character maps to <undefined>. What could be causing it?
– Ab Rohi
Nov 20 '18 at 12:19
Hello, it depends on the encoding of your original CSV. You must know the encoding of the file beforehand; it cannot be inferred. I edited the answer to add the "encoding" parameter to the open() function. If you do not know the encoding of the file, you have to convert it to a known encoding.
– FMarazzi
Nov 20 '18 at 13:41
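As a rough sketch of the point in this comment: a 'charmap' error typically appears when open() falls back to the platform's default encoding (for example cp1252 on Windows) and the file contains a byte that encoding cannot decode. Passing `encoding` explicitly avoids that. The helper `read_text` and its candidate list are hypothetical, and trying encodings in turn is only a heuristic, not a reliable way to detect the real encoding.

```python
def read_text(raw, candidates=("utf-8", "cp1252")):
    """Decode `raw` bytes with the first candidate encoding that succeeds.

    Returns (text, encoding). Falls back to UTF-8 with replacement
    characters so the caller always gets a string back.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace"), "utf-8 (with replacements)"


# Made-up sample: the right double quotation mark (U+201D) encoded as UTF-8
# ends with byte 0x9d, the byte named in the error message above.
sample = "garam \u201d".encode("utf-8")
text, used = read_text(sample)

# Bytes that are not valid UTF-8 fall through to the next candidate.
legacy_text, legacy_used = read_text(b"\xe9t\xe9")
```

For a file on disk, the bytes could come from `open(path, "rb").read()`; once the encoding is known, passing it to `open(path, "rt", encoding=...)` is the simpler route.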
Here is something I found for removing the special characters, in case someone needs it.
import emoji  # third-party package; `data` is assumed to be a pandas DataFrame

def give_emoji_free_text(text):
    # Note: emoji.UNICODE_EMOJI is the API of the emoji package as of 2018;
    # later releases of the package changed it.
    emoji_list = [c for c in text if c in emoji.UNICODE_EMOJI]
    # Drop every whitespace-separated word that contains an emoji.
    return ' '.join(word for word in text.split() if not any(e in word for e in emoji_list))

# Series.apply replaces the row-by-row get_value loop (get_value was removed in pandas 1.0).
for column in ['Title', 'Ingredients', 'Steps']:
    data[column] = data[column].apply(give_emoji_free_text)
Thank you.
answered Nov 22 '18 at 8:18 by Ab Rohi