Keras Lambda Layer Before Embedding: Use to Convert Text to Integers
I currently have a Keras model which uses an Embedding layer. Something like this:



input = tf.keras.layers.Input(shape=(20,), dtype='int32')
x = tf.keras.layers.Embedding(input_dim=1000,
                              output_dim=50,
                              input_length=20,
                              trainable=True,
                              embeddings_initializer='glorot_uniform',
                              mask_zero=False)(input)


This is great and works as expected. However, I want to be able to send text to my model, have it preprocess the text into integers, and continue normally.



Two issues:



1) The Keras docs say that Embedding layers can only be used as the first layer in a model: https://keras.io/layers/embeddings/



2) Even if I could add a Lambda layer before the Embedding, I'd need it to keep track of certain state (like a dictionary mapping specific words to integers). How might I go about this stateful preprocessing?



In short, I need to modify the underlying TensorFlow graph, so when I save my model and upload it to ML Engine, it'll be able to handle my sending it raw text.



Thanks!
  • This is basically the same question as link except with a different description. You still have not explained why you want to convert the text to integers inside the model during training rather than beforehand.

    – Jordan Patterson
    Oct 20 '18 at 0:53













  • It sounds like from this description that your best course of action is to have an encoder and decoder function that is executed outside of training. You run the encoder on your dataset before sending it through your model, and you run the logits (output) of your model through the decoder to see what the predictions were.

    – Jordan Patterson
    Oct 20 '18 at 1:00











  • @JordanPatterson I want to convert the text to integers inside the model during training so that when I upload the saved_model.pb to ML Engine, it will use the same preprocessing as was done in training.

    – bclayman
    Oct 20 '18 at 16:47
python tensorflow keras
asked Oct 19 '18 at 21:12
bclayman

1 Answer
Here are the first few layers of a model which uses a string input:



input = keras.layers.Input(shape=(1,), dtype="string", name='input_1')
lookup_table_op = tf.contrib.lookup.index_table_from_tensor(
    mapping=vocab_list,
    num_oov_buckets=num_oov_buckets,
    default_value=-1,
)
lambda_output = Lambda(lookup_table_op.lookup)(input)
emb_layer = Embedding(int(number_of_categories), int(number_of_categories ** 0.25))(lambda_output)


Then you can continue the model as you normally would after an embedding layer. This is working for me and the model trains fine from string inputs.



It is recommended that you do the string -> int conversion in a separate preprocessing step to speed up training. Then, after the model is trained, you can create a second Keras model that just converts string -> int, and combine the two models to get the full string -> target model.
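The lookup semantics above (in-vocabulary words map to their index in `vocab_list`; unknown words hash into one of `num_oov_buckets` extra ids appended after the vocabulary) can be sketched in plain Python. This is only an illustration of what `index_table_from_tensor` does, not its implementation: the helper name is made up, and TensorFlow uses a Fingerprint64 hash rather than the `crc32` stand-in here.

```python
# Sketch of the lookup-table behavior: known words get their vocabulary
# index, out-of-vocabulary (OOV) words hash into one of `num_oov_buckets`
# extra ids after the vocabulary. `crc32` is a stand-in for TF's hash.
from zlib import crc32

def make_lookup(vocab_list, num_oov_buckets):
    index = {word: i for i, word in enumerate(vocab_list)}
    vocab_size = len(vocab_list)

    def lookup(word):
        if word in index:
            return index[word]
        # Deterministic hash so the same OOV word always gets the same bucket.
        bucket = crc32(word.encode("utf-8")) % num_oov_buckets
        return vocab_size + bucket

    return lookup

lookup = make_lookup(["the", "cat", "sat"], num_oov_buckets=2)
print(lookup("cat"))        # 1: in-vocabulary index
print(lookup("dog") >= 3)   # True: OOV ids are 3 or 4
```

This also shows why the Embedding layer needs `vocab_size + num_oov_buckets` rows: OOV ids extend past the vocabulary.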
        answered Jan 1 at 5:51
– Dustin