Matcher for keyword and its children spacy
I have a set of keywords I am already matching for. It is a medical context, so I've made up an equivalent scenario, at least for the parsing I'm trying to do:
I have a car with chrome 1000-inch rims.
Let's say I want to return as a phrase all child words/tokens of the keyword rims, where rims is already marked by spaCy as an entity of type CARPART.
In Python, this is what I'm doing:
test_phrases = nlp("""I have a car with chrome 100-inch rims.""")
print(test_phrases.cats)
for t in test_phrases:
    print('Token: {} || POS: {} || DEP: {} CHILDREN: {} || ent_type: {}'.format(
        t, t.pos_, t.dep_, [c for c in t.children], t.ent_type_))
Token: I || POS: PRON || DEP: nsubj CHILDREN: [] || ent_type:
Token: have || POS: VERB || DEP: ROOT CHILDREN: [I, car, .] || ent_type:
Token: a || POS: DET || DEP: det CHILDREN: [] || ent_type:
Token: car || POS: NOUN || DEP: dobj CHILDREN: [a, with] || ent_type:
Token: with || POS: ADP || DEP: prep CHILDREN: [rims] || ent_type:
Token: chrome || POS: ADJ || DEP: amod CHILDREN: [] || ent_type:
Token: 100-inch || POS: NOUN || DEP: compound CHILDREN: [] || ent_type:
Token: rims || POS: NOUN || DEP: pobj CHILDREN: [chrome, 100-inch] || ent_type: CARPART
Token: . || POS: PUNCT || DEP: punct CHILDREN: [] || ent_type:
So, what I want to do is use something like:
test_matcher = Matcher(nlp.vocab)
test_phrase = ['']
patterns = [[{'ENT':'CARPART',????}] for kp in test_phrase]
test_matcher.add('CARPHRASE', None, *patterns)
and then call test_matcher on the doc and have it return:
chrome 100-inch rims
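
For reference, spaCy's token-based Matcher does accept an ENT_TYPE attribute, which matches against token.ent_type_. A minimal sketch of a pattern keyed on the entity label rather than the literal word might look like the following; it assumes that something upstream (a custom NER model or an entity ruler) actually assigns the CARPART label:

from spacy.matcher import Matcher

test_matcher = Matcher(nlp.vocab)
# ENT_TYPE checks token.ent_type_, so this only fires on tokens that an
# earlier pipeline component has already labelled as CARPART
patterns = [[{'ENT_TYPE': 'CARPART'}]]
test_matcher.add('CARPHRASE', None, *patterns)

On its own this still only matches the single keyword token; pulling in the children is what the answer below handles.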
1 Answer
I think I found a satisfactory solution that will work when building a custom spaCy class. You can test this out to make sure it works for your case, then add something like this to the spaCy pipeline:
from spacy.matcher import Matcher

test_matcher = Matcher(nlp.vocab)
keyword_list = ['rims']
patterns = [[{'LOWER': kw}] for kw in keyword_list]
test_matcher.add('TESTPHRASE', None, *patterns)

def add_children_matches(doc, keyword_matcher):
    '''Print the keyword and the phrase spanning the keyword and its children.'''
    matches = keyword_matcher(doc)
    for match_id, start, end in matches:
        tokens = doc[start:end]
        print('keyword:', tokens)
        # Since we are getting children for the keyword, there should only be one token
        if len(tokens) != 1:
            print('Skipping {}. Too many tokens to match.'.format(tokens))
            continue
        keyword_token = tokens[0]
        # Indices of the keyword and its direct children, in document order
        sorted_children = sorted([c.i for c in keyword_token.children] + [keyword_token.i])
        print('keyphrase:', doc[min(sorted_children):max(sorted_children) + 1])

doc = nlp("""I have a car with chrome 1000-inch rims.""")
add_children_matches(doc, test_matcher)
This gives:
keyword: rims
keyphrase: chrome 1000-inch rims
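
Since the idea is to run this as part of the spaCy pipeline, one way to wire it in is to wrap the function as a custom component. This is a sketch only: it assumes spaCy v2.x, where nlp.add_pipe accepts a plain function, assumes test_matcher is defined as above, and the component name 'carphrase' is made up for illustration:

def carphrase_component(doc):
    # Hypothetical wrapper: run the matcher-based helper on every doc
    add_children_matches(doc, test_matcher)
    return doc

nlp.add_pipe(carphrase_component, name='carphrase', last=True)
doc = nlp("I have a car with chrome 1000-inch rims.")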
Edit: To fully answer my own question, you'd have to use something like:
def add_children_matches(doc, keyword_matcher):
    '''Collect the keyword-plus-children phrases as labelled spans.'''
    matches = keyword_matcher(doc)
    spans = []
    for match_id, start, end in matches:
        tokens = doc[start:end]
        print('keyword:', tokens)
        # Since we are getting children for the keyword, there should only be one token
        if len(tokens) != 1:
            print('Skipping {}. Too many tokens to match.'.format(tokens))
            continue
        keyword_token = tokens[0]
        sorted_children = sorted([c.i for c in keyword_token.children] + [keyword_token.i])
        keyphrase = doc[min(sorted_children):max(sorted_children) + 1]
        print('keyphrase:', keyphrase)
        # Re-create the keyphrase as a labelled Span from its character offsets
        span = doc.char_span(keyphrase.start_char, keyphrase.end_char, label='CARPHRASE')
        if span is not None:
            spans.append(span)
    return doc
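
As a side note (not part of the original answer), token.subtree yields the token together with all of its descendants in document order, so the keyphrase span can also be built without sorting indices by hand. A small sketch, with the caveat that subtree includes grandchildren as well, so it can pull in more than the direct children used above:

def keyphrase_for(keyword_token):
    # subtree yields the token and every descendant, already in document order
    subtree = list(keyword_token.subtree)
    return keyword_token.doc[subtree[0].i:subtree[-1].i + 1]

doc = nlp("I have a car with chrome 1000-inch rims.")
for match_id, start, end in test_matcher(doc):
    print(keyphrase_for(doc[start]))  # -> chrome 1000-inch rims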