Remove some characters from a CSV file in Python



























I have a dataset of Indonesian recipes with three columns (the first is the recipe name, the second the ingredients, the third the steps).

The second and third columns contain special characters such as '#' and '/'. How do I remove them? I followed this, but it raises an error. Here is the dataset!



This is my code:



import csv

input = open('dataset-ayam-baru.csv', 'rb')
lines = csv.reader(input)
output = open('new_dataset.csv', 'wb')
writer = csv.writer(output)

conversion = '-"/.$'
text = input.read()
newtext = '_'
for c in text:
    newtext += '_' if c in conversion else c
    writer.writerow(c)

input.close()
output.close()


I am getting the following error:




TypeError                                 Traceback (most recent call last)
<ipython-input-28-05d606ed80df> in <module>()
     10 newtext = ''
     11 for c in text:
---> 12     newtext += '' if c in conversion else c
     13     writer.writerow(c)
     14

TypeError: 'in <string>' requires string as left operand, not int




































  • Can you post the error log here?

    – Sharvin Shah
    Nov 20 '18 at 11:32

  • Here is the error: TypeError Traceback (most recent call last) <ipython-input-28-05d606ed80df> in <module>() 10 newtext = '' 11 for c in text: ---> 12 newtext += '' if c in conversion else c 13 writer.writerow(c) 14 TypeError: 'in <string>' requires string as left operand, not int

    – Ab Rohi
    Nov 20 '18 at 11:33

  • Please add the code you used.

    – Jaba
    Nov 20 '18 at 11:36

  • I've added the code I used.

    – Ab Rohi
    Nov 20 '18 at 11:39

  • Then is there any solution?

    – Ab Rohi
    Nov 20 '18 at 11:52
























python csv














edited Nov 20 '18 at 11:40
Jaba

asked Nov 20 '18 at 11:30
Ab Rohi

























2 Answers
































The error occurs because you are reading the file as bytes: in Python 3, iterating over bytes yields integers, not characters. Open the input file with "rt" instead of "rb" (and the output file with "w" instead of "wb").

Adapted from the Stack Overflow question you cited, this works for me:



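import csv

# Open both files in text mode; newline="" lets the csv module handle line endings itself.
with open("dataset-ayam-baru.csv", "rt", encoding="utf-8", newline="") as infile, \
        open("new_dataset.csv", "w", encoding="utf-8", newline="") as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    conversion = set('_"/.$')  # characters to replace with '_'
    for row in reader:
        # Replace every character found in `conversion`, field by field.
        newrow = [''.join('_' if c in conversion else c for c in entry) for entry in row]
        writer.writerow(newrow)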
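To see why binary mode triggers the TypeError from the question, here is a minimal sketch (the sample bytes and the helper name `check` are made up for illustration): iterating over a bytes object yields integers, while iterating over a str yields one-character strings.

```python
conversion = '-"/.$'

raw = b"a/b"                 # what read() returns for a file opened with 'rb'
text = raw.decode("utf-8")   # what a text-mode read would return

byte_items = list(raw)       # integers, one per byte
char_items = list(text)      # one-character strings


def check(item):
    """Return the result of `item in conversion`, or the TypeError message."""
    try:
        return item in conversion
    except TypeError as exc:
        return str(exc)


byte_result = check(byte_items[1])  # an int on the left of `in <str>` raises TypeError
char_result = check(char_items[1])  # '/' is one of the conversion characters
```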


Important: mind the encoding of the input file! You need to know the dataset's encoding in advance (e.g., utf-8) and pass it to open(); I had to convert the file to ANSI to make it work.

The follow-up question about bytes and encoding is here: csv.Error: iterator should return strings, not bytes
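If the goal is to delete '#' and '/' outright from the ingredient and step columns only, rather than replace them with underscores, one possible sketch uses str.translate. The helper name `clean_rows`, the zero-based column indices, and the in-memory sample row are assumptions for illustration; with real files you would pass file objects opened in text mode as above.

```python
import csv
import io

# Characters to delete; '#' and '/' come from the question, extend as needed.
REMOVE = str.maketrans("", "", "#/")


def clean_rows(infile, outfile, columns=(1, 2)):
    """Copy CSV rows from infile to outfile, deleting the REMOVE characters
    from the given zero-based columns (hypothetical helper)."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    for row in reader:
        writer.writerow(
            [cell.translate(REMOVE) if i in columns else cell
             for i, cell in enumerate(row)]
        )


# Small in-memory demonstration with made-up data.
sample = io.StringIO('Ayam #1,1/2 kg ayam,"Goreng, lalu #sajikan"\n')
result = io.StringIO()
clean_rows(sample, result)
cleaned = result.getvalue()
```

The first column is left untouched here, so the '#' in the recipe name survives; pass `columns=(0, 1, 2)` to clean every column.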



























  • Hi, thank you for the solution, but I got a new error: 'charmap' codec can't decode byte 0x9d in position 3291: character maps to <undefined>. What could be causing it?

    – Ab Rohi
    Nov 20 '18 at 12:19

  • Hello, it depends on the encoding of your original CSV. You must know the file's encoding beforehand; it cannot be reliably inferred. I edited the answer to add the "encoding" parameter to the open() call. If you do not know the file's encoding, you have to convert it to a known one.

    – FMarazzi
    Nov 20 '18 at 13:41
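When the encoding is unknown, one pragmatic approach is to try a few likely candidates in order and keep the first that decodes without error. This is only a sketch: the helper name `read_text` and the candidate list are assumptions, and a clean decode does not prove the guess is right (latin-1, for instance, accepts any byte sequence).

```python
def read_text(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Decode raw bytes with the first candidate encoding that succeeds.

    Returns (text, encoding). Hypothetical helper, not part of any library.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Made-up sample: a right double quotation mark (U+201D) encoded as UTF-8.
# Its last byte is 0x9d, which cp1252 (a common "charmap" codec on Windows)
# cannot decode.
raw = "resep \u201d".encode("utf-8")
text, used = read_text(raw)
```

With a real file, the bytes would come from reading it in "rb" mode, and the detected encoding can then be passed to open() when re-reading it with the csv module.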





























edited Nov 20 '18 at 13:41

answered Nov 20 '18 at 11:57
FMarazzi





















I found this somewhere for removing the special characters, in case someone needs it.



import emoji  # third-party package; UNICODE_EMOJI is the lookup used by older releases

def give_emoji_free_text(text):
    # Collect the emoji characters that occur in the text.
    emoji_list = [c for c in text if c in emoji.UNICODE_EMOJI]
    # Keep only the whitespace-separated words that contain none of them.
    clean_text = ' '.join(word for word in text.split() if not any(e in word for e in emoji_list))
    return clean_text

# `data` is assumed to be a pandas DataFrame with these three text columns.
for column in ['Ingredients', 'Title', 'Steps']:
    data[column] = data[column].apply(give_emoji_free_text)
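A dependency-free alternative, sketched under the assumption that the unwanted characters are emoji-like symbols plus a few known punctuation marks: drop every character whose Unicode category is "So" (symbol, other) and every character in an explicit removal set. The function name `strip_symbols` and the sample string are made up; modifier characters such as variation selectors fall in other categories and would survive this filter.

```python
import unicodedata


def strip_symbols(text, extra="#/"):
    """Remove 'So'-category characters (most emoji) and any character in `extra`."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "So" and ch not in extra
    )


# U+1F357 (poultry leg) is in category 'So'; '#' and '/' are removed via `extra`.
cleaned = strip_symbols("Ayam \U0001F357 goreng #1/2")
```

Unlike the snippet above, this removes only the offending characters and keeps the rest of each word.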


Thank you.

















answered Nov 22 '18 at 8:18
Ab Rohi





























