Keras Lambda Layer Before Embedding: Use to Convert Text to Integers
I currently have a Keras model which uses an Embedding layer. Something like this:



input = tf.keras.layers.Input(shape=(20,), dtype='int32')
x = tf.keras.layers.Embedding(input_dim=1000,
                              output_dim=50,
                              input_length=20,
                              trainable=True,
                              embeddings_initializer='glorot_uniform',
                              mask_zero=False)(input)


This is great and works as expected. However, I want to be able to send text to my model, have it preprocess the text into integers, and continue normally.



Two issues:



1) The Keras docs say that Embedding layers can only be used as the first layer in a model: https://keras.io/layers/embeddings/



2) Even if I could add a Lambda layer before the Embedding, I'd need it to keep track of certain state (like a dictionary mapping specific words to integers). How might I go about this stateful preprocessing?



In short, I need to modify the underlying TensorFlow graph, so when I save my model and upload it to ML Engine, it'll be able to handle my sending it raw text.



Thanks!
  • This is basically the same question as link except with a different description. You still have not explained why you want to convert the text to integers inside the model during training rather than beforehand.

    – Jordan Patterson
    Oct 20 '18 at 0:53













  • It sounds like from this description that your best course of action is to have an encoder and decoder function that is executed outside of training. You run the encoder on your dataset before sending it through your model, and you run the logits (output) of your model through the decoder to see what the predictions were.

    – Jordan Patterson
    Oct 20 '18 at 1:00











  • @JordanPatterson I want to convert the text to integers inside the model during training so that when I upload the saved_model.pb to ML Engine, it will use the same preprocessing as was done in training.

    – bclayman
    Oct 20 '18 at 16:47
python tensorflow keras
asked Oct 19 '18 at 21:12
bclayman

1 Answer
Here are the first few layers of a model which uses a string input:



input = keras.layers.Input(shape=(1,), dtype="string", name='input_1')
lookup_table_op = tf.contrib.lookup.index_table_from_tensor(
    mapping=vocab_list,
    num_oov_buckets=num_oov_buckets,
    default_value=-1,
)
lambda_output = Lambda(lookup_table_op.lookup)(input)
emb_layer = Embedding(int(number_of_categories), int(number_of_categories ** 0.25))(lambda_output)


Then you can continue the model as you normally would after an embedding layer. This is working for me and the model trains fine from string inputs.



It is recommended that you do the string -> int conversion in a separate preprocessing step to speed up training. Then, after the model is trained, you can create a second Keras model that just converts string -> int, and combine the two models to get the full string -> target model.
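The lookup semantics above (in-vocabulary words map to their index in `vocab_list`; unknown words hash into one of `num_oov_buckets` extra ids appended after the vocabulary) can be sketched in plain Python. This is only an illustration of what `index_table_from_tensor` does, not its implementation: the helper name is made up, and TensorFlow uses a Fingerprint64 hash rather than the `crc32` stand-in here.

```python
# Sketch of the lookup-table behavior: known words get their vocabulary
# index, out-of-vocabulary (OOV) words hash into one of `num_oov_buckets`
# extra ids after the vocabulary. `crc32` is a stand-in for TF's hash.
from zlib import crc32

def make_lookup(vocab_list, num_oov_buckets):
    index = {word: i for i, word in enumerate(vocab_list)}
    vocab_size = len(vocab_list)

    def lookup(word):
        if word in index:
            return index[word]
        # Deterministic hash so the same OOV word always gets the same bucket.
        bucket = crc32(word.encode("utf-8")) % num_oov_buckets
        return vocab_size + bucket

    return lookup

lookup = make_lookup(["the", "cat", "sat"], num_oov_buckets=2)
print(lookup("cat"))        # 1: in-vocabulary index
print(lookup("dog") >= 3)   # True: OOV ids are 3 or 4
```

This also shows why the Embedding layer needs `vocab_size + num_oov_buckets` rows: OOV ids extend past the vocabulary.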
        answered Jan 1 at 5:51
– Dustin