Create index and then insert or insert and then create index?
I'm inserting a large volume of data into a MySQL table, and I need to create an index so that the data can be accessed quickly. Is there a performance difference between these two scenarios?
- Create the index and then insert all the data
- Insert all the data and then create the index
Thanks in advance!
mysql database indexing
My gut feeling says scenario 2 is faster. Scenario 1: The index is there, a record is inserted and the index tree must be altered, another record gets inserted and the index tree must be altered, ... Scenario 2: All records get inserted straight away. Then an index is built with all branches perfectly balanced, as the data is already known. I may be wrong, but I'm pretty sure I'm right ;-)
– Thorsten Kettner
Nov 21 '18 at 15:41
Let's see SHOW CREATE TABLE; there could be other subtle issues. Also, will you be replacing the entire table, or merely augmenting it?
– Rick James
Nov 21 '18 at 22:59
edited Nov 21 '18 at 15:37
Luis Montano
asked Nov 21 '18 at 15:32


Luis Montano
1 Answer
For the InnoDB storage engine, it will be faster to define the clustered index (i.e. the PRIMARY KEY) on the table before inserting data.
This is because if a PRIMARY KEY is not defined on the table (and there is no UNIQUE index whose columns are all NOT NULL), InnoDB will use a hidden 6-byte auto-incremented row ID as the clustered index. If a PRIMARY KEY is specified later, the entire table will need to be rebuilt.
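As a rough sketch of what that looks like, assuming a hypothetical table named measurements (the table and column names are made up for illustration and are not from the question):

```sql
-- Hypothetical table; names and columns are illustrative only.
CREATE TABLE measurements (
    id          BIGINT UNSIGNED NOT NULL,
    sensor_id   INT UNSIGNED    NOT NULL,
    recorded_at DATETIME        NOT NULL,
    reading     DOUBLE          NOT NULL,
    PRIMARY KEY (id)  -- clustered index is defined before any rows are loaded
) ENGINE=InnoDB;

-- The pattern to avoid: creating the table without a PRIMARY KEY, loading it,
-- and then running something like the statement below, which rebuilds the table.
-- ALTER TABLE measurements ADD PRIMARY KEY (id);
```

Loading the rows in PRIMARY KEY order, where that is practical, also tends to help, since InnoDB stores the rows in clustered index order.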
For secondary indexes (i.e. non-clustered indexes) with InnoDB, it is usually faster to insert the data without the secondary indexes defined, and then build the secondary indexes after the data is loaded.
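A minimal sketch of that ordering, again using a hypothetical measurements table and made-up index names:

```sql
-- Hypothetical table with only the PRIMARY KEY defined at load time.
CREATE TABLE measurements (
    id          BIGINT UNSIGNED NOT NULL,
    sensor_id   INT UNSIGNED    NOT NULL,
    recorded_at DATETIME        NOT NULL,
    reading     DOUBLE          NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- 1. Load the data while no secondary indexes exist.
INSERT INTO measurements (id, sensor_id, recorded_at, reading) VALUES
    (1, 10, '2018-11-21 15:32:00', 20.5),
    (2, 10, '2018-11-21 15:33:00', 20.7),
    (3, 11, '2018-11-21 15:32:00', 18.9);

-- 2. Build the secondary indexes afterwards.
--    A single ALTER TABLE can add several indexes in one statement.
ALTER TABLE measurements
    ADD INDEX idx_sensor_time (sensor_id, recorded_at),
    ADD INDEX idx_recorded_at (recorded_at);
```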
FOLLOWUP
As far as the speed of loading a table is concerned (in particular, a table that is truncated/emptied and then reloaded), dropping and re-creating indexes is a well-known technique for speeding up processing, not just with MySQL but with other RDBMSs such as Oracle.
There isn't a guarantee that the processing will be faster; as with most things database, we need tests to determine which is faster.
If the table already contains millions of rows and we're adding only a few dozen or a few hundred rows, then dropping and rebuilding the indexes is likely to be a lot slower, because of all the extra work to re-index the existing rows. In that case it would be faster to let the index maintenance happen while the rows are being inserted.
In terms of speeding up a load, the "drop and recreate indexes" technique isn't going to give us the kind of dramatic improvements we get from other changes. For example, it won't be anywhere near the improvement we would see by using LOAD DATA in place of INSERT statements, or by using multi-row INSERT statements instead of a series of singleton INSERT statements.
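To illustrate the three loading styles mentioned above, here is a sketch against a hypothetical measurements table; the table, the sample values, and the file path /tmp/measurements.csv are all assumptions for the example:

```sql
-- Hypothetical target table.
CREATE TABLE measurements (
    id          BIGINT UNSIGNED NOT NULL,
    sensor_id   INT UNSIGNED    NOT NULL,
    recorded_at DATETIME        NOT NULL,
    reading     DOUBLE          NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Singleton INSERTs: one statement (and one round trip) per row.
INSERT INTO measurements (id, sensor_id, recorded_at, reading)
    VALUES (1, 10, '2018-11-21 15:32:00', 20.5);
INSERT INTO measurements (id, sensor_id, recorded_at, reading)
    VALUES (2, 10, '2018-11-21 15:33:00', 20.7);

-- Multi-row INSERT: many rows in a single statement.
INSERT INTO measurements (id, sensor_id, recorded_at, reading) VALUES
    (3, 11, '2018-11-21 15:32:00', 18.9),
    (4, 11, '2018-11-21 15:33:00', 19.1),
    (5, 12, '2018-11-21 15:32:00', 22.4);

-- LOAD DATA: bulk load from a comma-separated file with a header line.
-- The path is hypothetical; the server's secure_file_priv setting restricts
-- which directories can be read.
LOAD DATA INFILE '/tmp/measurements.csv'
INTO TABLE measurements
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(id, sensor_id, recorded_at, reading);
```

Which of these is fastest for a given table and data set is still something to measure rather than assume.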
edited Nov 21 '18 at 21:48
answered Nov 21 '18 at 16:03
spencer7593