Keras Lambda Layer Before Embedding: Use to Convert Text to Integers
I currently have a Keras model that uses an Embedding layer. Something like this:
input = tf.keras.layers.Input(shape=(20,), dtype='int32')
x = tf.keras.layers.Embedding(input_dim=1000,
                              output_dim=50,
                              input_length=20,
                              trainable=True,
                              embeddings_initializer='glorot_uniform',
                              mask_zero=False)(input)
This is great and works as expected. However, I want to be able to send text to my model, have it preprocess the text into integers, and continue normally.
Two issues:
1) The Keras docs say that Embedding layers can only be used as the first layer in a model: https://keras.io/layers/embeddings/
2) Even if I could add a Lambda layer before the Embedding, I'd need it to keep track of certain state (like a dictionary mapping specific words to integers). How might I go about this stateful preprocessing?
In short, I need to modify the underlying TensorFlow graph, so that when I save my model and upload it to ML Engine, it will be able to handle being sent raw text.
Thanks!
python tensorflow keras
This is basically the same question as link except with a different description. You still have not explained why you want to convert the text to integers inside the model during training rather than beforehand.
– Jordan Patterson
Oct 20 '18 at 0:53
From this description, it sounds like your best course of action is to have encoder and decoder functions that are executed outside of training: you run the encoder on your dataset before sending it through your model, and you run the logits (output) of your model through the decoder to see what the predictions were (a sketch of this follows the comment thread).
– Jordan Patterson
Oct 20 '18 at 1:00
@JordanPatterson I want to convert the text to integers inside the model during training so that when I upload the saved_model.pb to ML Engine, it will use the same preprocessing as was done in training.
– bclayman
Oct 20 '18 at 16:47
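A minimal sketch of the out-of-model encode/decode approach described in the comments (the vocabulary and helper names here are hypothetical, not from the original discussion):

word_index = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3}   # built from the training corpus

def encode(text, maxlen=20):
    # Map a sentence to a fixed-length list of integer ids (0 = padding / unknown word).
    ids = [word_index.get(w, 0) for w in text.lower().split()][:maxlen]
    return ids + [0] * (maxlen - len(ids))

def decode(logits, label_names):
    # Map the model's output (a NumPy array of logits) back to human-readable labels.
    return [label_names[i] for i in logits.argmax(axis=-1)]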
1 Answer
Here are the first few layers of a model which uses a string input:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Lambda, Embedding

# vocab_list, num_oov_buckets and number_of_categories come from your training vocabulary.
input = keras.layers.Input(shape=(1,), dtype="string", name='input_1')
lookup_table_op = tf.contrib.lookup.index_table_from_tensor(
    mapping=vocab_list,
    num_oov_buckets=num_oov_buckets,
    default_value=-1,
)
lambda_output = Lambda(lookup_table_op.lookup)(input)
emb_layer = Embedding(int(number_of_categories), int(number_of_categories ** 0.25))(lambda_output)
Then you can continue the model as you normally would after an embedding layer. This is working for me and the model trains fine from string inputs.
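For completeness, a fuller TF 1.x-style sketch of the same pattern, with a placeholder vocabulary and head layers (these names are illustrative, not from the original answer). The one extra step to be aware of is that the contrib lookup table has to be initialized in the Keras session before training; note also that tf.contrib was removed in TensorFlow 2, where tf.lookup tables or the TextVectorization layer cover this use case.

import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda, Embedding, Flatten, Dense
from tensorflow.keras.models import Model

vocab_list = ["the", "cat", "sat", "on", "mat"]   # placeholder vocabulary
num_oov_buckets = 1
vocab_size = len(vocab_list) + num_oov_buckets

# Build the in-graph string -> int lookup table.
table = tf.contrib.lookup.index_table_from_tensor(
    mapping=vocab_list, num_oov_buckets=num_oov_buckets, default_value=-1)

text_in = Input(shape=(20,), dtype="string")   # 20 pre-split tokens per example
ids = Lambda(table.lookup)(text_in)            # string -> integer ids, inside the graph
emb = Embedding(vocab_size, 50)(ids)
out = Dense(1, activation="sigmoid")(Flatten()(emb))
model = Model(text_in, out)

# In TF 1.x the lookup table must be initialized before fit/predict.
tf.keras.backend.get_session().run(tf.tables_initializer())
model.compile(optimizer="adam", loss="binary_crossentropy")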
That said, it is recommended that you do the string -> int conversion in a separate preprocessing step to speed up training. Then, after the model is trained, you can create a second Keras model that just converts string -> int and combine the two models to get the full string -> target model (sketched below).
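A minimal sketch of that two-model composition, assuming trained_model is the usual integer-input model from the question and table is a string -> int lookup table like the one above (both names are placeholders, not from the original answer):

import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda
from tensorflow.keras.models import Model

# trained_model: Input(shape=(20,), dtype='int32') -> Embedding -> ... (already trained)
# table: the in-graph string -> int lookup table built earlier
string_in = Input(shape=(20,), dtype="string")
int_ids = Lambda(lambda t: tf.cast(table.lookup(t), "int32"))(string_in)
serving_model = Model(string_in, trained_model(int_ids))
# serving_model accepts raw tokens and can be saved and served the same way as before.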