Remove some characters from a CSV file in Python

-1

I have a dataset of Indonesian recipes with 3 columns (the first column is the recipe name, the second is the ingredients, and the third is the steps).

In the second and third columns there are some special characters such as '#' and '/'. How do I remove them? I followed this, but it shows some errors. Here is the dataset!

This is my code:

import csv

input = open('dataset-ayam-baru.csv', 'rb')
lines = csv.reader(input)
output = open('new_dataset.csv', 'wb')
writer = csv.writer(output)

conversion = '-"/.$'
text = input.read()
newtext = '_'
for c in text:
    newtext += '_' if c in conversion else c
    writer.writerow(c)

    input.close()
    output.close()

I am getting the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-28-05d606ed80df> in <module>()
     10 newtext = ''
     11 for c in text:
---> 12     newtext += '' if c in conversion else c
     13     writer.writerow(c)
     14

TypeError: 'in <string>' requires string as left operand, not int

edited Nov 20 '18 at 11:40

Jaba

7,051175394

asked Nov 20 '18 at 11:30

Ab Rohi

214

Can you post the error log here?

– Sharvin Shah
Nov 20 '18 at 11:32

1

here is the error: TypeError Traceback (most recent call last) <ipython-input-28-05d606ed80df> in <module>() 10 newtext = '' 11 for c in text: ---> 12 newtext += '' if c in conversion else c 13 writer.writerow(c) 14 TypeError: 'in <string>' requires string as left operand, not int

– Ab Rohi
Nov 20 '18 at 11:33

Please add the code you used

– Jaba
Nov 20 '18 at 11:36

1

I've added the code I used.

– Ab Rohi
Nov 20 '18 at 11:39

1

Then is there any solution?

– Ab Rohi
Nov 20 '18 at 11:52

python csv

2 Answers
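The error occurs because you are opening the file in binary mode, so read() returns bytes. In Python 3, iterating over bytes yields integers, which is why the check "c in conversion" fails with "requires string as left operand, not int". Open the input file with "rt" instead of "rb" (and the output file with "w" instead of "wb").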
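The int in the error message can be reproduced in isolation. The following is a minimal sketch, with made-up sample values and a hypothetical helper name (replace_special), showing that iterating over bytes yields integers while iterating over a str yields one-character strings:

```python
conversion = '-"/.$'

raw = b"a/b"    # what read() returns for a file opened in binary mode ("rb")
text = "a/b"    # what read() returns for a file opened in text mode ("rt")

# Iterating over bytes yields integers (the byte values).
byte_items = list(raw)

# Iterating over str yields one-character strings.
char_items = list(text)


def replace_special(s, chars=conversion, replacement="_"):
    """Return s with every character found in chars replaced."""
    return "".join(replacement if c in chars else c for c in s)


# The same membership check the failing loop performs on each byte.
try:
    byte_items[0] in conversion
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# With a str, the per-character check works as intended.
cleaned = replace_special(text)
```

Printing error_message should show the same TypeError text as in the question, while cleaned holds the string with the listed characters replaced.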

Based on the Stack Overflow question you cited, this works for me:

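import csv

# Open both files in text mode; newline="" is what the csv module expects.
with open("dataset-ayam-baru.csv", "rt", encoding="utf-8", newline="") as infile, \
        open("new_dataset.csv", "w", encoding="utf-8", newline="") as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    # Characters to replace; add others (e.g. '#') as needed.
    conversion = set('_"/.$')
    for row in reader:
        # Replace each listed character in every field with an underscore.
        newrow = [''.join('_' if c in conversion else c for c in entry) for entry in row]
        writer.writerow(newrow)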
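If the goal is to delete the characters rather than replace them with an underscore, str.translate is one option. This is only a sketch: the sample rows are invented, the helper name clean_rows is hypothetical, and in-memory io.StringIO objects stand in for the real files so the snippet runs on its own:

```python
import csv
import io

# Characters to delete; '#' and '/' are the ones named in the question.
REMOVE_TABLE = str.maketrans("", "", "#/")


def clean_rows(rows):
    """Yield each row with the unwanted characters deleted from every field."""
    for row in rows:
        yield [field.translate(REMOVE_TABLE) for field in row]


# In-memory stand-ins for the input and output CSV files (invented sample data).
sample = io.StringIO(
    'name,ingredient,step\n'
    'Ayam Goreng,"1/2 kg ayam #segar","goreng/tiriskan"\n',
    newline="",
)
output = io.StringIO(newline="")

csv.writer(output).writerows(clean_rows(csv.reader(sample)))
result = output.getvalue()
```

With real files, the two StringIO objects would be replaced by the open() calls in text mode, and the set of characters in str.maketrans can be extended as needed.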

Important: you need to know the encoding of the input file in advance and pass it to open() (e.g., encoding="utf-8"). In my case I had to convert the file to ANSI to make it work.

A related question about bytes and encoding: "csv.Error: iterator should return strings, not bytes".
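When the encoding is not known, one pragmatic approach is to try a short list of candidates and keep the first that decodes cleanly. This is a sketch under assumptions: the function name decode_with_fallback and the candidate list are made up for illustration, and a successful decode does not prove the guess is right.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes raw without error."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# UTF-8 bytes for curly quotes include 0x9d, a byte that cp1252 leaves undefined,
# so decoding this data as cp1252 raises UnicodeDecodeError.
sample = "resep \u201cayam\u201d".encode("utf-8")
text, used = decode_with_fallback(sample)
```

Note that latin-1 maps every byte to a character, so it never fails; if it ends up being the one used, the text may still contain wrong characters. The encoding that works can then be passed as the encoding argument to open().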

edited Nov 20 '18 at 13:41

answered Nov 20 '18 at 11:57

FMarazzi

323213

1

Hi, thank you for the solution, but I got a new error: 'charmap' codec can't decode byte 0x9d in position 3291: character maps to <undefined>. What could be causing it?

– Ab Rohi
Nov 20 '18 at 12:19

Hello, it depends on the encoding of your original CSV. You must know the encoding of the file beforehand; it cannot be reliably inferred. I edited the answer to add the "encoding" parameter to the open() call. If you do not know the encoding of the file, you have to convert it to a known encoding.

– FMarazzi
Nov 20 '18 at 13:41

Here is something I found to remove the special characters, in case someone needs it.

import emoji

def give_emoji_free_text(text):
    # Collect the emoji characters that occur in the text.
    # emoji.UNICODE_EMOJI comes from older releases of the emoji package;
    # check the version you have installed.
    emoji_list = [c for c in text if c in emoji.UNICODE_EMOJI]
    # Drop every whitespace-separated word that contains one of those emoji.
    clean_text = ' '.join(word for word in text.split() if not any(e in word for e in emoji_list))
    return clean_text

# data is assumed to be a pandas DataFrame with these three columns.
for column in ['Title', 'Ingredients', 'Steps']:
    data[column] = data[column].apply(give_emoji_free_text)
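For anyone who would rather not depend on the emoji package, a rough standard-library alternative is to delete characters whose Unicode category is "So" (Symbol, other), which covers many emoji. This is a sketch, not an equivalent of the function above: the name strip_symbol_characters is made up, it deletes only the symbol itself instead of the whole word, and it does not handle emoji modifiers, joiners, or variation selectors, which fall in other categories.

```python
import unicodedata


def strip_symbol_characters(text):
    """Delete characters in Unicode category 'So' and collapse leftover whitespace."""
    kept = "".join(c for c in text if unicodedata.category(c) != "So")
    return " ".join(kept.split())


# Invented sample title containing two symbol characters.
cleaned = strip_symbol_characters("Ayam \U0001F357 goreng \u2764 pedas")
```

The same function can be applied to each column of the dataset in place of give_emoji_free_text.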

Thank you.

answered Nov 22 '18 at 8:18

Ab Rohi

214
