How do I find “Judge Randolph M. Hammock” with regular expressions?

I want to capture a judge's name that is surrounded by a bunch of text.

Here is some sample text:

® @ Stperio,l LED

>

Cay

OCT 9, "se"

-aeentative Ruling Sherr p 8 29

by C. 17

% Exeo, ive On Z—

Judge Randolph M. Hammock, Department 47 Fie oH/erp

a, Copy,

HEARING DATE: October 18, 2017 TRIAL DATE: March 27, 20 18

. CASE: Roger Lee Harrison v. Taylor Hackford, et al. ©

CASE NO.: BC596850

The text file will always say "Judge FirstName LastName".

This is the code I tried:

def get_judge_name(judge_file_name):
    j = open("{}{}".format(PATH, judge_file_name), "r")
    judge_contents = j.read()
    j.close()
    judge = re.search('Judge (.*?)([A-Z]{2,})', judge_contents)
    print(judge)

I expected an output of Judge Randolph M. Hammock but got None.

asked Jan 1 at 21:13 by rachelvsamuel
edited Jan 1 at 21:24 by ggorlen

Probably, Judge [^,]+ will be enough.

– Wiktor Stribiżew
Jan 1 at 21:20

Thank you! That works.

– rachelvsamuel
Jan 1 at 21:39

What if there is no "," after the judge? Is there a regular expression to capture the entire name?

– rachelvsamuel
Jan 1 at 21:40

It is hard to say, since you provide no exact specs. Judge(?: +[A-Z][^\W\d_]*\.?)+ might work, but you might need to add some stopwords at the end of the regex, like Department, etc. See regex101.com/r/3wB5z5/3

– Wiktor Stribiżew
Jan 1 at 21:43

Thank you! I tried it.

– rachelvsamuel
Jan 1 at 21:55


python regex


2 Answers

>>> import re
>>> # st is your string
>>> m = re.search(r"Judge ([^,]*)", st)
>>> m.group(0)
'Judge Randolph M. Hammock'
>>> m.group(1)
'Randolph M. Hammock'

I don't know which one you want, but this might do the job.
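As a rough sketch, the same idea can be wrapped in a function that also copes with a missing match. The function name find_judge and the extra \n in the character class are my own assumptions, not part of the answer above; excluding \n simply stops the capture at the end of the line when no comma follows the name.

```python
import re

# Assumption: the name never spans lines, so stop at a comma OR a newline.
JUDGE_UP_TO_COMMA = re.compile(r"Judge ([^,\n]*)")

def find_judge(text):
    """Return the text after 'Judge ' up to the next comma or line end, or None."""
    m = JUDGE_UP_TO_COMMA.search(text)
    return m.group(1).strip() if m else None

sample = "Judge Randolph M. Hammock, Department 47 Fie oH/erp"
print(find_judge(sample))          # name captured up to the comma
print(find_judge("no name here"))  # no 'Judge ' in the text, so None
```

Checking for None before calling group() avoids an AttributeError on files where the OCR lost the word "Judge".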

answered Jan 1 at 21:32 by barkın evgin

What if there is no "," after the judge? Is there a regular expression to capture the entire name?

– rachelvsamuel
Jan 1 at 21:44


What if there is no "," after the judge? Is there a regular expression to capture the entire name?

Understanding the problem

It really all depends on the variety of the judges' names and their formatting. If nothing semantically distinguishes the judge's name from the text after it, then any regex will, by the very nature of the task, be either too loose or too strict about how much text it captures. You could train a machine learning model to recognize names in your dataset, but that would almost certainly take more time than it's worth unless your program needs to scrape large datasets of judges' names quickly and accurately. A database of judges' names would probably be the most practical solution, but it would require maintenance.
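To make the database idea concrete, here is a minimal sketch that checks the text against a list of known names. The roster contents and the function name find_known_judge are made up for illustration; a real roster would have to come from somewhere and be kept up to date.

```python
import re

# Hypothetical roster; in practice this would be loaded from a maintained source.
KNOWN_JUDGES = ["Randolph M. Hammock", "Jane Q. Example"]

def find_known_judge(text, roster=KNOWN_JUDGES):
    """Return the first roster name that appears right after 'Judge', or None."""
    for name in roster:
        # Escape each name part and allow any run of whitespace between parts,
        # since OCR output may contain irregular spacing.
        parts = [re.escape(part) for part in name.split()]
        pattern = r"Judge\s+" + r"\s+".join(parts)
        if re.search(pattern, text):
            return name
    return None

print(find_known_judge("Judge Randolph M. Hammock Department 47"))
print(find_known_judge("Judge Unknown Person"))
```

This never captures too much or too little, but it can only find names that are already on the list.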

This task does have one aspect in its favor: we know that the name will always start with the word "Judge". This also means we have to be careful to never discard text starting with "Judge".

Possible regex solutions

Wiktor Stribiżew's solution is a reasonable approximation:

Judge(?: +[A-Z][^\W\d_]*\.?)+

It also has the nice effect of allowing more diverse Unicode letters with [^\W\d_], where my more ASCII-centric approach would have been [A-Za-z]. (Note that it still does not account for names beginning with letters outside the capital ASCII letters, though this is probably less common.) The suggested expansion with stopwords also has potential.
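A small comparison may make the difference clearer. The names below are invented for illustration, and the non-ASCII letters are written as escapes so the example does not depend on how the source file is encoded.

```python
import re

# [^\W\d_] means "a word character that is neither a digit nor an underscore",
# i.e. a letter, including non-ASCII letters for str patterns in Python 3.
unicode_aware = re.compile(r"Judge(?: +[A-Z][^\W\d_]*\.?)+")
ascii_only = re.compile(r"Judge(?: +[A-Z][A-Za-z]*\.?)+")

accented = "Judge Jos\u00e9 N\u00fa\u00f1ez"       # hypothetical name with accents inside
leading_accent = "Judge \u00c9mile Zola"           # hypothetical name starting with a non-ASCII capital

print(unicode_aware.search(accented).group(0))  # keeps the accented letters
print(ascii_only.search(accented).group(0))     # stops at the first accented letter
print(unicode_aware.search(leading_accent))     # None: [A-Z] rejects the first letter
```

The last line shows the limitation mentioned in the note: the first letter of each word is still restricted to A-Z.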

However, it has one major flaw: it discards some text containing the word "Judge" if the text isn't followed by a word matching the criteria. I would modify it to use a * quantifier instead of a + quantifier:

Judge(?: +[A-Z][^\W\d_]*\.?)*
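Here is a quick sketch of how the two quantifiers differ. The second input string is a made-up example of "Judge" followed by something that does not look like a name.

```python
import re

plus_pattern = re.compile(r"Judge(?: +[A-Z][^\W\d_]*\.?)+")
star_pattern = re.compile(r"Judge(?: +[A-Z][^\W\d_]*\.?)*")

line = "Judge Randolph M. Hammock, Department 47 Fie oH/erp"
bare = "Judge presiding"  # hypothetical input: no capitalized word after "Judge"

# Both patterns agree when a name follows.
print(plus_pattern.search(line).group(0))
print(star_pattern.search(line).group(0))

# They differ when nothing name-like follows "Judge".
print(plus_pattern.search(bare))           # None: the occurrence is dropped
print(star_pattern.search(bare).group(0))  # the bare word "Judge" is still reported
```

With the * version, the caller can tell "the word Judge was present but no name was recognized" apart from "Judge did not appear at all".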

Also, I would take a different approach with the stopwords and assume that the judge's name won't contain a stopword, instead of searching for a stopword after the judge's name. This is more efficient but will also ignore part of a judge's name if it happens to be a stopword:

Judge(?: +(?!(?:Department|OtherStopword)\b)[A-Z][^\W\d_]*\.?)*
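As a sketch of how this could be used, the pattern can be built from a list of stopwords. The list contents ("Department", "Dept") and the helper name build_judge_pattern are assumptions for illustration; the right stopwords depend on your documents.

```python
import re

# Hypothetical stopword list; it must not be empty, or the lookahead
# would reject every word and only the bare "Judge" would ever match.
STOPWORDS = ["Department", "Dept"]

def build_judge_pattern(stopwords):
    """Compile the name pattern with a negative lookahead for each stopword."""
    alternatives = "|".join(re.escape(word) for word in stopwords)
    return re.compile(
        r"Judge(?: +(?!(?:" + alternatives + r")\b)[A-Z][^\W\d_]*\.?)*"
    )

pattern = build_judge_pattern(STOPWORDS)

no_comma = "Judge Randolph M. Hammock Department 47"
with_comma = "Judge Randolph M. Hammock, Department 47"

print(pattern.search(no_comma).group(0))    # stops before "Department"
print(pattern.search(with_comma).group(0))  # stops before the comma
```

Without the lookahead, the first input would also swallow "Department", because it is a capitalized word like any other.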

Takeaway

In the end, unless a standardized format is followed by the source documents, this is all an approximation. That's why standardized formats often make things easier for programmers.

Errata

If Python's built-in re module supported it, I would have changed the space character's + quantifier to a possessive ++ for increased efficiency (possessive quantifiers were only added to re in Python 3.11). The third-party regex module can handle more sophisticated regex patterns.

answered Jan 2 at 1:11 by Graham
