Initialization state in DQN



























I am initializing the state of my environment with some value s'.
I also reinitialize the state of the environment every time a new episode starts. However, I have noticed that when I create the environment and initialize the state as, say, [10, 3], the policy obtained after training is not close to optimal at all, whereas with other starting states such as [20, 3], [20, 7], etc., I get results quite close to optimal. So the question is: is it possible that starting from the state [10, 3] causes the network to get stuck in a local minimum?
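
For illustration, here is a minimal sketch of the kind of reset logic I mean (gym-style; the class and attribute names are only illustrative, not my actual code):

    import numpy as np

    class MyEnv:
        """Illustrative environment: reset to one fixed state every episode."""

        def __init__(self, initial_state=(10, 3)):
            # The fixed starting state chosen when the environment is created,
            # e.g. [10, 3] or [20, 3] as described above.
            self.initial_state = np.array(initial_state, dtype=np.float32)
            self.state = self.initial_state.copy()

        def reset(self):
            # Called at the start of every episode: the agent always begins
            # from the same point in the state space.
            self.state = self.initial_state.copy()
            return self.state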




















































































      deep-learning reinforcement-learning
















      asked Nov 22 '18 at 11:13









Siddhant Tandon

























          1 Answer
































Strictly answering the question: sure, it can result in sub-optimal policies. A basic case would be if the agent is not exploring enough and it is not easy to reach the final state from the state you've chosen for initialization. The agent would then end up in a local minimum because it never left that 'local' part of the state space.



One question you might want to ask yourself is: why don't you initialize your state randomly? Sure, there are cases where it makes more sense to have one main initialization state, but if your algorithm learns better from other starting points, it may be worth initializing each episode with a different state and letting the agent generalize over the state space better. Another suggestion would be to check your exploration strategy and see whether it is having enough of an impact; a sketch of both ideas follows.
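
For example, here is a minimal sketch (with illustrative names and an assumed state range, not tied to your code) of per-episode random initialization and an epsilon-greedy schedule you could inspect:

    import numpy as np

    rng = np.random.default_rng(0)

    def random_reset(env, low=(0.0, 0.0), high=(25.0, 10.0)):
        # Sketch: start each episode from a state sampled uniformly from
        # the (assumed) valid range of the state space instead of a fixed one.
        env.state = rng.uniform(low, high).astype(np.float32)
        return env.state

    def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
        # Linear epsilon decay; if this drops too quickly, the agent may never
        # explore far enough from its starting state to escape a local optimum.
        frac = min(step / decay_steps, 1.0)
        return eps_start + frac * (eps_end - eps_start)

    def select_action(q_values, step):
        # Standard epsilon-greedy action selection over the Q-value estimates.
        if rng.random() < epsilon_at(step):
            return int(rng.integers(len(q_values)))  # explore
        return int(np.argmax(q_values))              # exploit

Printing or plotting epsilon_at(step) over training is a quick way to see whether the agent still explores by the time it has learned anything about the region around [10, 3].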





























                answered Nov 22 '18 at 14:06









Filip O.
































