Create index and then insert or insert and then create index?












I'm inserting a large volume of data into a MySQL table, and I need to create an index so the data can be accessed quickly. Is there a performance difference between these two scenarios?




  • Create an index and then insert all data

  • Insert all data and then create an index


Thanks in advance!










  • My gut feeling says scenario 2 is faster. Scenario 1: The index is there, a record is inserted and the index tree must be altered, another record gets inserted and the index tree must be altered, ... Scenario 2: All records get inserted straight away. Then an index is built with all branches perfectly balanced, as the data is already known. I may be wrong, but I'm pretty sure I'm right ;-)

    – Thorsten Kettner
    Nov 21 '18 at 15:41













  • Let's see SHOW CREATE TABLE; there could be other subtle issues. Also, will you be replacing the entire table, or merely augmenting it?

    – Rick James
    Nov 21 '18 at 22:59
















mysql database indexing






edited Nov 21 '18 at 15:37 by Luis Montano

asked Nov 21 '18 at 15:32 by Luis Montano








1 Answer
With the InnoDB storage engine, it is faster to define the clustered index (i.e. the PRIMARY KEY) on the table before inserting data.



This is because, if no PRIMARY KEY is defined on the table, InnoDB uses the first UNIQUE index whose columns are all NOT NULL as the clustered index; if there is none, it generates a hidden clustered index on a 6-byte, monotonically increasing row ID. If a PRIMARY KEY is specified later, the entire table has to be rebuilt.
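
As a minimal sketch of this point, assuming a hypothetical table named `measurements` (the table, column names, and types are invented for illustration and do not come from the question), the PRIMARY KEY can be declared in the CREATE TABLE statement so that the clustered index exists before any row is loaded:

```sql
-- Hypothetical table; names and types are illustrative only.
CREATE TABLE measurements (
    id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    sensor_id   INT UNSIGNED    NOT NULL,
    recorded_at DATETIME        NOT NULL,
    reading     DOUBLE          NOT NULL,
    PRIMARY KEY (id)  -- clustered index is in place before the load starts
) ENGINE=InnoDB;
```

Declaring the key up front avoids a later `ALTER TABLE ... ADD PRIMARY KEY`, which would make InnoDB rebuild the whole table.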





For secondary (i.e. non-clustered) indexes with InnoDB, it is usually faster to insert the data with no secondary indexes defined, and then build the secondary indexes after the data is loaded.
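
The load-then-index order could look roughly like the sketch below. The table `measurements`, its columns, the sample rows, and the index name `idx_sensor_time` are all assumptions made up for this example:

```sql
-- Hypothetical table with only its PRIMARY KEY defined at load time.
CREATE TABLE measurements (
    id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    sensor_id   INT UNSIGNED    NOT NULL,
    recorded_at DATETIME        NOT NULL,
    reading     DOUBLE          NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- 1. Load the data while no secondary index exists
--    (two sample rows stand in for the real bulk load).
INSERT INTO measurements (sensor_id, recorded_at, reading)
VALUES (1, '2018-11-21 15:32:00', 20.5),
       (2, '2018-11-21 15:33:00', 21.0);

-- 2. Build the secondary index once, after the load has finished.
ALTER TABLE measurements
    ADD INDEX idx_sensor_time (sensor_id, recorded_at);
```

Whether this order actually wins depends on the data volume and the server configuration, so it is worth timing both variants on a realistic data set.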





FOLLOWUP



As for the speed of loading a table (in particular, a table that is truncated or emptied and then reloaded), dropping and re-creating indexes is a well-known technique for speeding up processing, not just with MySQL but with other RDBMSs such as Oracle.



There is no guarantee that the processing will be faster; as with most things in databases, we need to run tests to determine which approach is faster.



If the table already contains millions of rows and we are adding only a few hundred, dropping and rebuilding the indexes is likely to be much slower, because of all the extra work of re-indexing the existing rows. In that case it is faster to let the indexes be maintained while the rows are inserted.



In terms of speeding up a load, the "drop and recreate indexes" technique isn't going to give us the kind of dramatic improvement we get from other changes. For example, it won't come anywhere near the improvement we would see from using LOAD DATA in place of INSERT statements, or from using multi-row INSERT statements instead of a series of singleton INSERT statements.
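
To make the three loading styles concrete, here is a hedged sketch. The table `readings`, its columns, the sample values, and the file path `/tmp/readings.csv` are hypothetical; the three variants are shown as alternatives rather than as steps of one script:

```sql
-- Hypothetical target table for illustration.
CREATE TABLE readings (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    sensor  INT UNSIGNED    NOT NULL,
    reading DOUBLE          NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Variant A: a series of singleton INSERT statements (one row each).
INSERT INTO readings (sensor, reading) VALUES (1, 20.5);
INSERT INTO readings (sensor, reading) VALUES (2, 21.0);
INSERT INTO readings (sensor, reading) VALUES (3, 19.8);

-- Variant B: one multi-row INSERT statement carrying the same rows.
INSERT INTO readings (sensor, reading)
VALUES (1, 20.5),
       (2, 21.0),
       (3, 19.8);

-- Variant C: LOAD DATA from a comma-separated file.
-- The path is an assumption; the server's secure_file_priv setting
-- may restrict which directories can be read.
LOAD DATA INFILE '/tmp/readings.csv'
INTO TABLE readings
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(sensor, reading);
```

Variants B and C reduce the per-statement overhead, which is where the larger gains mentioned above come from.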






edited Nov 21 '18 at 21:48

answered Nov 21 '18 at 16:03 by spencer7593































