How do I find “Judge Randolph M. Hammock” with regular expressions?

-1

I want to capture a judge's name that is surrounded by a bunch of text.

Here is some sample text:

® @ Stperio,l LED
>
Cay
OCT 9, "se"
-aeentative Ruling Sherr p 8 29
by C. 17
% Exeo, ive On Z—
Judge Randolph M. Hammock, Department 47 Fie oH/erp
a, Copy,
HEARING DATE: October 18, 2017 TRIAL DATE: March 27, 20 18
. CASE: Roger Lee Harrison v. Taylor Hackford, et al. ©
CASE NO.: BC596850

The text file will always say "Judge FirstName LastName".

This is the code I tried:

import re

def get_judge_name(judge_file_name):
    # PATH is a directory prefix defined elsewhere in the script
    with open("{}{}".format(PATH, judge_file_name), "r") as j:
        judge_contents = j.read()
    judge = re.search('Judge (.*?)([A-Z]{2,})', judge_contents)
    print(judge)

I expected an output of "Judge Randolph M. Hammock" but got None.
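
A minimal reproduction (assuming the file holds the judge's line and the two lines after it, as in the sample) suggests why the result is None: without `re.DOTALL` the dot does not match a newline, and the judge's line contains no two adjacent capital letters, so `[A-Z]{2,}` has nothing to match on it. Adding `re.DOTALL` does produce a match, but only because the lazy group runs on to "HEARING" two lines later, so that is not a fix either.

```python
import re

# Assumed excerpt: the judge's line and the two lines after it from the sample.
sample = (
    "Judge Randolph M. Hammock, Department 47 Fie oH/erp\n"
    "a, Copy,\n"
    "HEARING DATE: October 18, 2017 TRIAL DATE: March 27, 20 18\n"
)

pattern = r"Judge (.*?)([A-Z]{2,})"  # the pattern from the question

# "." cannot cross the newline, and the judge's line has no two adjacent capitals.
print(re.search(pattern, sample))  # None

# With re.DOTALL the search succeeds, but only by reaching "HEARING" two lines later.
m = re.search(pattern, sample, re.DOTALL)
print(m.group(2))  # HEARING
```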
python regex

edited Jan 1 at 21:24

ggorlen
7,5283826

asked Jan 1 at 21:13

rachelvsamuel
5610

  • Probably, Judge [^,]+ will be enough.

    – Wiktor Stribiżew
    Jan 1 at 21:20

  • Thank you! That works.

    – rachelvsamuel
    Jan 1 at 21:39

  • What if there is no "," after the judge? Is there a regular expression to capture the entire name?

    – rachelvsamuel
    Jan 1 at 21:40

  • It is hard to say, since you provide no exact specs. Judge(?: +[A-Z][^\W\d_]*\.?)+ might work, but you might need to add some stopwords at the end of the regex, like Department, etc. See regex101.com/r/3wB5z5/3

    – Wiktor Stribiżew
    Jan 1 at 21:43

  • Thank you! I tried it.

    – rachelvsamuel
    Jan 1 at 21:55
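
To make the comment thread concrete, here is a small sketch that runs both suggested patterns (with their backslashes written out) on a hypothetical variant of the sample in which the commas are missing. It illustrates why a stopword such as Department comes up.

```python
import re

# Hypothetical variant of the sample with the commas removed.
no_comma = "Judge Randolph M. Hammock Department 47 Fie oH/erp\na Copy"

comma_pat = re.compile(r"Judge [^,]+")                    # first comment
caps_pat = re.compile(r"Judge(?: +[A-Z][^\W\d_]*\.?)+")  # later comment

# [^,] also matches newlines, so without a comma it runs to the end of the text.
print(repr(comma_pat.search(no_comma).group()))

# Capitalized words stop at "47", but "Department" is still swallowed.
print(caps_pat.search(no_comma).group())  # Judge Randolph M. Hammock Department
```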

2 Answers
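
1

>>> import re
>>> # st is your string (the contents of the file)
>>> m = re.search(r"Judge ([^,]*)", st)
>>> m.group(0)
'Judge Randolph M. Hammock'
>>> m.group(1)
'Randolph M. Hammock'

I don't know which one you want, but this might do the job: group(0) is the whole match including "Judge", and group(1) is only the name.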
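
If you need this inside a function like the one in the question, a sketch might look like the following. The helper name extract_judge_name and its include_title flag are hypothetical; the important part is checking for None before calling group(), because re.search returns None when "Judge " does not appear at all.

```python
import re
from typing import Optional


def extract_judge_name(text: str, include_title: bool = False) -> Optional[str]:
    """Return the text after "Judge " up to the next comma, or None if absent."""
    m = re.search(r"Judge ([^,]*)", text)
    if m is None:
        return None  # "Judge " does not occur in the text
    # group(0) is the whole match, group(1) only the captured name
    return m.group(0) if include_title else m.group(1)


line = "Judge Randolph M. Hammock, Department 47 Fie oH/erp"
print(extract_judge_name(line))                      # Randolph M. Hammock
print(extract_judge_name(line, include_title=True))  # Judge Randolph M. Hammock
print(extract_judge_name("no judge here"))           # None
```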

answered Jan 1 at 21:32

barkın evgin
715

  • What if there is no "," after the judge? Is there a regular expression to capture the entire name?

    – rachelvsamuel
    Jan 1 at 21:44

0

> What if there is no "," after the judge? Is there a regular expression to capture the entire name?

Understanding the problem

It all depends on how varied the judges' names and their formatting are. If nothing semantically distinguishes the judge's name from the text that follows it, then any solution will, by the very nature of the task, capture either too much or too little. You could train a machine learning model to recognize the properties of names in your dataset, but that would almost certainly cost more time than it is worth unless your program must quickly and accurately scrape large datasets of judges' names. A database of judges' names would probably be the most practical solution, though it would require maintenance.
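
As a rough illustration of the database idea, the sketch below uses a hard-coded list (KNOWN_JUDGES, a hypothetical stand-in for whatever store you maintain) and only accepts names from it. Because the name must match a known entry exactly, no comma or stopword is needed to find where it ends.

```python
import re

# Hypothetical stand-in for a maintained database of judges' names.
KNOWN_JUDGES = ["Randolph M. Hammock", "Jane Q. Example"]

# Try longer names first so a shorter name sharing a prefix cannot win the alternation.
names = sorted(KNOWN_JUDGES, key=len, reverse=True)
known_pat = re.compile(r"Judge (" + "|".join(map(re.escape, names)) + r")\b")

text = "Judge Randolph M. Hammock Department 47 Fie oH/erp"  # note: no comma
m = known_pat.search(text)
print(m.group(1) if m else None)  # Randolph M. Hammock
```

The trade-off is the one the answer describes: unknown judges are simply not found, so the list has to be kept up to date.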

This task does have one thing in its favor: we know that the name will always start with the word "Judge". That also means we have to be careful never to discard text starting with "Judge".

Possible regex solutions

Wiktor Stribiżew's solution is a reasonable approximation:

Judge(?: +[A-Z][^\W\d_]*\.?)+

It also has the nice effect of allowing a wider range of Unicode letters with [^\W\d_] (a word character that is not a digit or an underscore, which roughly means a letter), where my more ASCII-centric approach would have been [A-Za-z]. Note that it still does not account for names beginning with a capital letter outside the ASCII range [A-Z], though that is probably less common. The suggested extension with stopwords also has potential.
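
A short check of the difference between the two character classes, using made-up accented names (an assumption for illustration, not data from the question):

```python
import re

unicode_pat = re.compile(r"Judge(?: +[A-Z][^\W\d_]*\.?)+")
ascii_pat = re.compile(r"Judge(?: +[A-Z][A-Za-z]*\.?)+")

# Hypothetical names; \u escapes keep the accented letters precomposed.
accented = "Judge Jos\u00e9 N\u00fa\u00f1ez, Department 12"
print(unicode_pat.search(accented).group())  # Judge José Núñez
print(ascii_pat.search(accented).group())    # Judge Jos

# The leading [A-Z] still rejects a non-ASCII capital such as "É".
print(unicode_pat.search("Judge \u00c9lodie Martin"))  # None
```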

However, it has one major flaw: if "Judge" is not followed by at least one word matching the criteria, the regex does not match at all, so that occurrence of "Judge" is discarded. I would modify it to use a * quantifier instead of a + quantifier:

Judge(?: +[A-Z][^\W\d_]*\.?)*
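
The difference between the two quantifiers shows up on a line where nothing capitalized follows "Judge". The line below is a hypothetical OCR fragment, not taken from the question's file:

```python
import re

plus_pat = re.compile(r"Judge(?: +[A-Z][^\W\d_]*\.?)+")
star_pat = re.compile(r"Judge(?: +[A-Z][^\W\d_]*\.?)*")

# Hypothetical OCR line where the name after "Judge" did not survive.
text = "Tentative Ruling by Judge 47 Fie oH/erp"

print(plus_pat.search(text))          # None: "Judge" is dropped entirely
print(star_pat.search(text).group())  # Judge
```

One side effect of *: the pattern now also matches the first five letters of a word such as "Judgement"; writing Judge\b at the start would rule that out.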

Also, I would take a different approach with the stopwords: instead of searching for a stopword after the judge's name, assume that the name itself never contains a stopword and stop matching as soon as one appears. This is more efficient, but it will also cut off part of a judge's name if that part happens to be a stopword:

Judge(?: +(?!(?:Department|OtherStopword)\b)[A-Z][^\W\d_]*\.?)*
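
A sketch of the stopword variant that builds the negative lookahead from a list, so more stopwords can be added later. OtherStopword is the answer's own placeholder, and the comma-free line is a hypothetical variant of the sample:

```python
import re

# "OtherStopword" is the placeholder from the answer; extend the list as needed.
STOPWORDS = ["Department", "OtherStopword"]

stop_alt = "|".join(map(re.escape, STOPWORDS))
judge_pat = re.compile(
    r"Judge(?: +(?!(?:" + stop_alt + r")\b)[A-Z][^\W\d_]*\.?)*"
)

with_comma = "Judge Randolph M. Hammock, Department 47 Fie oH/erp"
no_comma = "Judge Randolph M. Hammock Department 47 Fie oH/erp"  # hypothetical

print(judge_pat.search(with_comma).group())  # Judge Randolph M. Hammock
print(judge_pat.search(no_comma).group())    # Judge Randolph M. Hammock
```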

Takeaway

In the end, unless the source documents follow a standardized format, all of this is an approximation. That is why standardized formats make things so much easier for programmers.

Errata

If Python's built-in re module supported it, I would have changed the space character's + quantifier to a possessive ++ for increased efficiency. The third-party regex module supports possessive quantifiers and other more sophisticated regex features.
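
Note that since Python 3.11 the built-in re module does support possessive quantifiers (and atomic groups), so this change no longer requires the third-party module. A version-guarded sketch, assuming the stopword pattern shown earlier in this answer:

```python
import re
import sys

# Possessive quantifiers such as "++" exist in the standard re module only from Python 3.11;
# older versions reject them, so fall back to the ordinary "+" there.
space = " ++" if sys.version_info >= (3, 11) else " +"
judge_pat = re.compile(
    r"Judge(?:" + space + r"(?!(?:Department|OtherStopword)\b)[A-Z][^\W\d_]*\.?)*"
)

line = "Judge Randolph M. Hammock, Department 47 Fie oH/erp"
print(judge_pat.search(line).group())  # Judge Randolph M. Hammock
```

Both forms find the same matches here; the possessive one only spares the engine from retrying shorter runs of spaces when the next word is a stopword or is not capitalized.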

answered Jan 2 at 1:11

Graham
626419