Matcher for keyword and its children spacy
I have a set of keywords I am already matching for. It is a medical context, so I've made up an equivalent scenario, at least for the parsing I'm trying to do:
I have a car with chrome 1000-inch rims.
Let's say I want to return as a phrase all child words/tokens of the keyword rims, where rims is already marked by spaCy as an entity of type CARPART.
In Python, this is what I'm doing:
test_phrases = nlp("""I have a car with chrome 100-inch rims.""")
print(test_phrases.cats)
for t in test_phrases:
    print('Token: {} || POS: {} || DEP: {} CHILDREN: {} || ent_type: {}'.format(
        t, t.pos_, t.dep_, [c for c in t.children], t.ent_type_))
Token: I || POS: PRON || DEP: nsubj CHILDREN: [] || ent_type:
Token: have || POS: VERB || DEP: ROOT CHILDREN: [I, car, .] || ent_type:
Token: a || POS: DET || DEP: det CHILDREN: [] || ent_type:
Token: car || POS: NOUN || DEP: dobj CHILDREN: [a, with] || ent_type:
Token: with || POS: ADP || DEP: prep CHILDREN: [rims] || ent_type:
Token: chrome || POS: ADJ || DEP: amod CHILDREN: [] || ent_type:
Token: 100-inch || POS: NOUN || DEP: compound CHILDREN: [] || ent_type:
Token: rims || POS: NOUN || DEP: pobj CHILDREN: [chrome, 100-inch] || ent_type: CARPART
Token: . || POS: PUNCT || DEP: punct CHILDREN: [] || ent_type:
So, what I want to do is use something like:
test_matcher = Matcher(nlp.vocab)
test_phrase = ['']
patterns = [[{'ENT':'CARPART',????}] for kp in test_phrase]
test_matcher.add('CARPHRASE', None, *patterns)
and then call test_matcher on the doc and have it return:
chrome 100-inch rims
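
For reference, spaCy's token-based Matcher does accept an ENT_TYPE attribute, which matches against token.ent_type_. A minimal sketch of a pattern keyed on the entity label rather than the literal word might look like the following; it assumes that something upstream (a custom NER model or an entity ruler) actually assigns the CARPART label:

from spacy.matcher import Matcher

test_matcher = Matcher(nlp.vocab)
# ENT_TYPE checks token.ent_type_, so this only fires on tokens that an
# earlier pipeline component has already labelled as CARPART
patterns = [[{'ENT_TYPE': 'CARPART'}]]
test_matcher.add('CARPHRASE', None, *patterns)

On its own this still only matches the single keyword token; pulling in the children is what the answer below handles.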
1 Answer
I think I found a satisfactory solution that will work when building a custom spaCy class. You can test this out to make sure it works for your case, then add something like this to the spaCy pipeline:
from spacy.matcher import Matcher

test_matcher = Matcher(nlp.vocab)
keyword_list = ['rims']
patterns = [[{'LOWER': kw}] for kw in keyword_list]
test_matcher.add('TESTPHRASE', None, *patterns)

def add_children_matches(doc, keyword_matcher):
    '''Print the keyword and the phrase spanning the keyword and its children.'''
    matches = keyword_matcher(doc)
    for match_id, start, end in matches:
        tokens = doc[start:end]
        print('keyword:', tokens)
        # Since we are getting children for the keyword, there should only be one token
        if len(tokens) != 1:
            print('Skipping {}. Too many tokens to match.'.format(tokens))
            continue
        keyword_token = tokens[0]
        # Indices of the keyword and its direct children, in document order
        sorted_children = sorted([c.i for c in keyword_token.children] + [keyword_token.i])
        print('keyphrase:', doc[min(sorted_children):max(sorted_children) + 1])

doc = nlp("""I have a car with chrome 1000-inch rims.""")
add_children_matches(doc, test_matcher)
This gives:
keyword: rims
keyphrase: chrome 1000-inch rims
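
Since the idea is to run this as part of the spaCy pipeline, one way to wire it in is to wrap the function as a custom component. This is a sketch only: it assumes spaCy v2.x, where nlp.add_pipe accepts a plain function, assumes test_matcher is defined as above, and the component name 'carphrase' is made up for illustration:

def carphrase_component(doc):
    # Hypothetical wrapper: run the matcher-based helper on every doc
    add_children_matches(doc, test_matcher)
    return doc

nlp.add_pipe(carphrase_component, name='carphrase', last=True)
doc = nlp("I have a car with chrome 1000-inch rims.")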
Edit: To fully answer my own question, you'd have to use something like:
def add_children_matches(doc, keyword_matcher):
    '''Collect the keyword-plus-children phrases as labelled spans.'''
    matches = keyword_matcher(doc)
    spans = []
    for match_id, start, end in matches:
        tokens = doc[start:end]
        print('keyword:', tokens)
        # Since we are getting children for the keyword, there should only be one token
        if len(tokens) != 1:
            print('Skipping {}. Too many tokens to match.'.format(tokens))
            continue
        keyword_token = tokens[0]
        sorted_children = sorted([c.i for c in keyword_token.children] + [keyword_token.i])
        keyphrase = doc[min(sorted_children):max(sorted_children) + 1]
        print('keyphrase:', keyphrase)
        # Re-create the keyphrase as a labelled Span from its character offsets
        span = doc.char_span(keyphrase.start_char, keyphrase.end_char, label='CARPHRASE')
        if span is not None:
            spans.append(span)
    return doc
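
As a side note (not part of the original answer), token.subtree yields the token together with all of its descendants in document order, so the keyphrase span can also be built without sorting indices by hand. A small sketch, with the caveat that subtree includes grandchildren as well, so it can pull in more than the direct children used above:

def keyphrase_for(keyword_token):
    # subtree yields the token and every descendant, already in document order
    subtree = list(keyword_token.subtree)
    return keyword_token.doc[subtree[0].i:subtree[-1].i + 1]

doc = nlp("I have a car with chrome 1000-inch rims.")
for match_id, start, end in test_matcher(doc):
    print(keyphrase_for(doc[start]))  # -> chrome 1000-inch rims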