keep quoted blocks intact when splitting by delimiter



























Given an example string s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"', I want to separate it into the following chunks:

    # To Do: something like {l = s.split(',')}
    l = ['Hi', 'my name is Humpty-Dumpty', '"Alice, Through the Looking Glass"']


I don't know in advance where the delimiters will be or how many there are.

This is my initial idea. It is quite long and not exact: it removes all the delimiters, whereas I want the delimiters inside quotes to survive:



    s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
    ss = []
    inner_string = ""
    delimiter = ','

    for item in s.split(delimiter):
        if not inner_string:
            if '"' not in item:  # regular string, not interesting
                ss.append(item)
            else:
                inner_string += item  # start inner string

        elif inner_string:
            inner_string += item

            if '"' in item:  # end inner string
                ss.append(inner_string)
                inner_string = ""
            else:  # middle of inner string
                pass

    print(ss)
    # prints ['Hi', ' my name is Humpty-Dumpty', ' from "Alice Through the Looking Glass"'] which is OK-ish









Tags: python, python-3.x, split














edited Nov 20 '18 at 13:00 by fferri
asked Nov 20 '18 at 11:13 by CIsForCookies
























          3 Answers
You can split by regular expressions with re.split:

    >>> import re
    >>> [x for x in re.split(r'([^",]*(?:"[^"]*"[^",]*)*)', s) if x not in (',','')]

When s is equal to:

    'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'

it outputs:

    ['Hi', ' my name is Humpty-Dumpty', ' from "Alice, Through the Looking Glass"']

Regular expression explained:

    (
        [^",]*         zero or more chars other than " or ,
        (?:            non-capturing group
            "[^"]*"    quoted block
            [^",]*     followed by zero or more chars other than " or ,
        )*             zero or more times
    )
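
As a minimal sketch, the one-liner can be wrapped in a small helper that also trims the leading space left on each chunk. The helper name split_keep_quoted and the constant CHUNK are made up for illustration, and the sketch assumes that double quotes are balanced.

```python
import re

# Pattern taken from the answer above: a run of characters that are neither
# quotes nor commas, optionally interleaved with complete quoted blocks.
CHUNK = re.compile(r'([^",]*(?:"[^"]*"[^",]*)*)')


def split_keep_quoted(text):
    """Hypothetical helper: split on commas outside double quotes, trimming each chunk."""
    pieces = CHUNK.split(text)
    # re.split returns the captured chunks mixed with bare commas and empty
    # strings, so those are filtered out before trimming whitespace.
    return [p.strip() for p in pieces if p not in (',', '')]


s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
print(split_keep_quoted(s))
# ['Hi', 'my name is Humpty-Dumpty', 'from "Alice, Through the Looking Glass"']
```

Trimming is optional; drop the .strip() call if the surrounding whitespace matters.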














answered Nov 20 '18 at 11:36 by fferri (edited Nov 20 '18 at 12:56)

























I solved this problem by avoiding split entirely:

    s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
    l = []
    substr = ""
    quotes_open = False

    for c in s:
        if c == ',' and not quotes_open:  # check for comma only if no quotes open
            l.append(substr)
            substr = ""
        elif c == '"':
            quotes_open = not quotes_open
        else:
            substr += c

    l.append(substr)

    print(l)

Output (note that the quote characters themselves are dropped):

    ['Hi', ' my name is Humpty-Dumpty', ' from Alice, Through the Looking Glass']

A more generalised function could look something like:

    def custom_split(input_str, delimiter=' ', avoid_between_char='"'):
        l = []
        substr = ""
        between_avoid_chars = False
        for c in input_str:  # iterate over the argument, not the global s
            if c == delimiter and not between_avoid_chars:
                l.append(substr)
                substr = ""
            elif c == avoid_between_char:
                between_avoid_chars = not between_avoid_chars
            else:
                substr += c
        l.append(substr)
        return l
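
If the quote characters should survive in the result, as the question asks, the same character scan can append them instead of dropping them. The following is only a sketch under that assumption: the function name split_outside_quotes is hypothetical, quotes are assumed to be balanced, and each chunk is stripped of surrounding whitespace.

```python
def split_outside_quotes(text, delimiter=",", quote='"'):
    """Hypothetical variant: split on delimiter, ignoring delimiters between quotes."""
    parts = []
    current = []          # characters of the chunk being built
    in_quotes = False
    for ch in text:
        if ch == quote:
            in_quotes = not in_quotes
            current.append(ch)  # keep the quote character in the output
        elif ch == delimiter and not in_quotes:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
print(split_outside_quotes(s))
# ['Hi', 'my name is Humpty-Dumpty', 'from "Alice, Through the Looking Glass"']
```

Collecting characters in a list and joining once per chunk avoids repeated string concatenation, which matters only for long inputs.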














answered Nov 20 '18 at 11:31 by Aquarthur (edited Nov 20 '18 at 11:39)























This works for this specific case (a single quoted block) and can provide a starting point.

    import re
    s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'

    # grab the quoted block, then swap it for a placeholder before splitting
    cut = re.search('(".*")', s)

    r = re.sub('(".*")', '$VAR$', s).split(',')
    res = []
    for i in r:
        # str.replace treats '$VAR$' literally; as a regex, '$' is an anchor
        res.append(i.replace('$VAR$', cut.group(1)))

Output

    print(res)
    ['Hi', ' my name is Humpty-Dumpty', ' from "Alice, Through the Looking Glass"']
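
One detail of the placeholder approach is worth a small illustration: in a regular expression `$` is an end-of-string anchor, so a placeholder such as $VAR$ is only matched literally when it is escaped or when str.replace is used instead. The variable names below are made up for this sketch.

```python
import re

text = ' from $VAR$'
quoted = '"Alice, Through the Looking Glass"'

# As a regex, '$VAR$' means "end of string, then VAR, then end of string",
# which can never match, so the text comes back unchanged.
unescaped = re.sub('$VAR$', 'X', text)

# Escaping the placeholder makes re.sub treat it literally. A function is used
# as the replacement so backslashes in the quoted text are not interpreted.
escaped = re.sub(re.escape('$VAR$'), lambda m: quoted, text)

# str.replace needs no escaping at all.
replaced = text.replace('$VAR$', quoted)

print(unescaped)
print(escaped)
print(replaced)
```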
















answered Nov 20 '18 at 11:31 by Richy





























