How to use ELMo word embeddings with the original pre-trained model (5.5B) in interactive mode
I am trying to learn how to use ELMo embeddings via this tutorial:
https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md
I am specifically trying to use the interactive mode, as described like this:

$ ipython
> from allennlp.commands.elmo import ElmoEmbedder
> elmo = ElmoEmbedder()
> tokens = ["I", "ate", "an", "apple", "for", "breakfast"]
> vectors = elmo.embed_sentence(tokens)
> assert(len(vectors) == 3)  # one for each layer in the ELMo output
> assert(len(vectors[0]) == len(tokens))  # the vector elements correspond with the input tokens
> import scipy
> vectors2 = elmo.embed_sentence(["I", "ate", "a", "carrot", "for", "breakfast"])
> scipy.spatial.distance.cosine(vectors[2][3], vectors2[2][3])  # cosine distance between "apple" and "carrot" in the last layer
0.18020617961883545
My overall question is: how do I make sure I am using the pre-trained ELMo model trained on the original 5.5B dataset (described here: https://allennlp.org/elmo)?
I also don't quite understand why we have to call "assert", or why we use the [2][3] indexing on the vector output.
My ultimate purpose is to average all the word embeddings in order to get a sentence embedding, so I want to make sure I do it right!
Thanks for your patience as I am pretty new in all this.
python machine-learning nlp artificial-intelligence
asked Jan 2 at 2:29 by somethingstrang
1 Answer
By default, ElmoEmbedder uses the "Original" weights and options, pretrained on the 1 Billion Word Benchmark (about 800 million tokens). To make sure you are using the largest (5.5B) model, look at the arguments of the ElmoEmbedder class; from there you can see that you can set the options and weights of the model explicitly:
elmo = ElmoEmbedder(
options_file='https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_options.json',
weight_file='https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5'
)
I got these links from the pretrained models table provided by AllenNLP.
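As a quick sanity check that the larger model loaded, you can inspect the output dimensions. This is a minimal sketch, assuming both files downloaded successfully; embed_sentence returns a NumPy array of shape (3, n_tokens, 1024) for this model:

vectors = elmo.embed_sentence(["Hello", "world"])
print(vectors.shape)  # expect (3, 2, 1024): 3 layers, 2 tokens, 1024 dimensions each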
assert is a convenient way to test and enforce expected values of variables: if the condition is false, Python raises an AssertionError. For example, the first assert statement ensures the embedding has three output matrices, one per layer.
Going off of that, we index with [i][j] because the model outputs 3 layer matrices (we choose the i-th), and each matrix contains n token vectors (we choose the j-th), each of length 1024. Notice how the code compares the similarity of "apple" and "carrot", both of which are the 4th token, at index j=3. Per the example documentation, the layer index i corresponds to:
The first layer corresponds to the context insensitive token representation, followed by the two LSTM layers. See the ELMo paper or follow up work at EMNLP 2018 for a description of what types of information is captured in each layer.
The paper provides the details on those two LSTM layers.
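To make the indexing concrete, here is a minimal sketch (reusing the elmo embedder from above) that compares "apple" and "carrot" at every layer, not just the top one:

from scipy.spatial.distance import cosine

apple = elmo.embed_sentence(["I", "ate", "an", "apple", "for", "breakfast"])
carrot = elmo.embed_sentence(["I", "ate", "a", "carrot", "for", "breakfast"])
# layer i=0 is the character-based (context-insensitive) layer; i=1, 2 are the LSTM layers
for i in range(3):
    # token j=3 is "apple" in one sentence and "carrot" in the other
    print(i, cosine(apple[i][3], carrot[i][3]))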
Lastly, if you have a set of sentences, with ELMo you don't need to average token vectors just to handle them: the model builds its token representations from characters (a character CNN feeding two bidirectional LSTM layers), so it works fine on tokenized whole sentences. Use one of the methods designed for sets of sentences, such as embed_sentences() or embed_batch(). More details are in the code.
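That said, since your stated goal is a sentence embedding, one common recipe (an assumption about what you want, not the only option) is to embed a batch of sentences and then mean-pool the token vectors of the top layer:

sentences = [["I", "ate", "an", "apple"], ["I", "ate", "a", "carrot"]]
# embed_sentences yields one (3, n_tokens, 1024) NumPy array per sentence
sentence_vecs = [emb[2].mean(axis=0) for emb in elmo.embed_sentences(sentences)]
print(sentence_vecs[0].shape)  # (1024,)

Note that embed_sentences() itself does no averaging; it returns per-token embeddings, so any pooling is up to you.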
answered Jan 3 at 22:13 by Alex L

Does embed_sentences() do straightforward vector averaging? – somethingstrang Jan 4 at 15:03