Does SQL Server Table-Scan Time depend on the Query?
I observed that doing a full table scan takes a different time based on the query. I believed that under similar conditions (set of columns under select, column data types) a table scan should take a somewhat similar time. Seems like it's not the case. I just want to understand the reason behind that.
I have used "CHECKPOINT" and "DBCC DROPCLEANBUFFERS" before querying to make sure there is no impact from the query cache.
Table:
- 10 Columns
- 10M rows Each column has different densities ranging from 0.1 to 0.000001
- No indexes
Queries:
Query A: returned 100 rows, time took: ~ 900ms
SELECT [COL00]
FROM [TEST].[dbo].[Test]
WHERE COL07 = 50000
Query B: returned 910595 rows, time took: ~ 15000ms
SELECT [COL00]
FROM [TEST].[dbo].[Test]
WHERE COL01 = 5
** Where column COL07 was randomly populated with integers ranging from 0 to 100000 and column COL01 was randomly populated with integers ranging from 0 to 10
Time Taken:
Query A: around 900 ms
Query B: around 18000 ms
What's the point I'm missing here?
sql sql-server
add a comment |
I observed that doing a full table scan takes a different time based on the query. I believed that under similar conditions (set of columns under select, column data types) a table scan should take a somewhat similar time. Seems like it's not the case. I just want to understand the reason behind that.
I have used "CHECKPOINT" and "DBCC DROPCLEANBUFFERS" before querying to make sure there is no impact from the query cache.
Table:
- 10 Columns
- 10M rows Each column has different densities ranging from 0.1 to 0.000001
- No indexes
Queries:
Query A: returned 100 rows, time took: ~ 900ms
SELECT [COL00]
FROM [TEST].[dbo].[Test]
WHERE COL07 = 50000
Query B: returned 910595 rows, time took: ~ 15000ms
SELECT [COL00]
FROM [TEST].[dbo].[Test]
WHERE COL01 = 5
** Where column COL07 was randomly populated with integers ranging from 0 to 100000 and column COL01 was randomly populated with integers ranging from 0 to 10
Time Taken:
Query A: around 900 ms
Query B: around 18000 ms
What's the point I'm missing here?
sql sql-server
2
There are about x100 more rows to fetch in the second test, that only could explain why it took 20 times longer
– GMB
Jan 2 at 23:21
1
Switch to a count(1) as the result.
– user2864740
Jan 2 at 23:28
add a comment |
I observed that doing a full table scan takes a different time based on the query. I believed that under similar conditions (set of columns under select, column data types) a table scan should take a somewhat similar time. Seems like it's not the case. I just want to understand the reason behind that.
I have used "CHECKPOINT" and "DBCC DROPCLEANBUFFERS" before querying to make sure there is no impact from the query cache.
Table:
- 10 Columns
- 10M rows Each column has different densities ranging from 0.1 to 0.000001
- No indexes
Queries:
Query A: returned 100 rows, time took: ~ 900ms
SELECT [COL00]
FROM [TEST].[dbo].[Test]
WHERE COL07 = 50000
Query B: returned 910595 rows, time took: ~ 15000ms
SELECT [COL00]
FROM [TEST].[dbo].[Test]
WHERE COL01 = 5
** Where column COL07 was randomly populated with integers ranging from 0 to 100000 and column COL01 was randomly populated with integers ranging from 0 to 10
Time Taken:
Query A: around 900 ms
Query B: around 18000 ms
What's the point I'm missing here?
sql sql-server
I observed that doing a full table scan takes a different time based on the query. I believed that under similar conditions (set of columns under select, column data types) a table scan should take a somewhat similar time. Seems like it's not the case. I just want to understand the reason behind that.
I have used "CHECKPOINT" and "DBCC DROPCLEANBUFFERS" before querying to make sure there is no impact from the query cache.
Table:
- 10 Columns
- 10M rows Each column has different densities ranging from 0.1 to 0.000001
- No indexes
Queries:
Query A: returned 100 rows, time took: ~ 900ms
SELECT [COL00]
FROM [TEST].[dbo].[Test]
WHERE COL07 = 50000
Query B: returned 910595 rows, time took: ~ 15000ms
SELECT [COL00]
FROM [TEST].[dbo].[Test]
WHERE COL01 = 5
** Where column COL07 was randomly populated with integers ranging from 0 to 100000 and column COL01 was randomly populated with integers ranging from 0 to 10
Time Taken:
Query A: around 900 ms
Query B: around 18000 ms
What's the point I'm missing here?
sql sql-server
sql sql-server
edited Jan 3 at 0:21
GMB
20.8k51028
20.8k51028
asked Jan 2 at 23:12
MalingaMalinga
387314
387314
2
There are about x100 more rows to fetch in the second test, that only could explain why it took 20 times longer
– GMB
Jan 2 at 23:21
1
Switch to a count(1) as the result.
– user2864740
Jan 2 at 23:28
add a comment |
2
There are about x100 more rows to fetch in the second test, that only could explain why it took 20 times longer
– GMB
Jan 2 at 23:21
1
Switch to a count(1) as the result.
– user2864740
Jan 2 at 23:28
2
2
There are about x100 more rows to fetch in the second test, that only could explain why it took 20 times longer
– GMB
Jan 2 at 23:21
There are about x100 more rows to fetch in the second test, that only could explain why it took 20 times longer
– GMB
Jan 2 at 23:21
1
1
Switch to a count(1) as the result.
– user2864740
Jan 2 at 23:28
Switch to a count(1) as the result.
– user2864740
Jan 2 at 23:28
add a comment |
3 Answers
3
active
oldest
votes
Query A: (returned 100 rows, time took: ~ 900ms)
Query B: (returned 910595 rows, time took: ~ 15000ms)
I believe that what you are missing is that there are about x100 more rows to fetch in the second query. That only could explain why it took 20 times longer.
Yap! seems like that. However what I thought is, isn't table scan mean going through actual row data? if that's the case anyway both queries have to go through all the data (I mean fetch data). Or is there a separate cost involved in fetching?
– Malinga
Jan 3 at 2:36
As an example think I used SELECT [COL07] FROM [TEST].[dbo].[Test] WHERE COL07 = 50000 and SELECT [COL01] FROM [TEST].[dbo].[Test] WHERE COL01 = 5 (which behave similar to what I had in question ) now I'm selecting whatever in the where clause. If the query has to go through all values for the columns in where clause to find the matching rows, why there is an additional fetching cost?
– Malinga
Jan 3 at 2:37
1
It still has to actually do something with the rows when you find them, which takes time as well. If you tried Select count(1) from ... instead of Select *, that would minimise the amount of processing to be done per row, and I suspect the times would be a lot closer.
– DancingFool
Jan 3 at 2:54
Ya, I checked with count(1) and Query B now takes only a very small time. I might have to dig deep to understand what that something is. Thanks.
– Malinga
Jan 3 at 5:38
add a comment |
The two columns have different density of the data.
Query A, COL07: 10000000/100000 = 100
Query B, COL05: 10000000/10 = 1000000
The fact that both the search parameters are in the middle of the data range doesn't necessarily impact the speed of the search. This is depending on the number of times the engine scans the column to return the values of the search predicate.
In order to see if this is indeed the case, I would try the following:
COL04: 10000000/1000 = 10000. Filtering on WHERE COL04 = 500
COL08: 10000000/10000 = 1000. Filtering on WHERE COL05 = 5000
Considering the times from the initial test, you would expect to see COL04 at ~7200ms and COL05 at ~3600ms.
An interesting article about SQL Server COUNT() Function Performance Comparison
add a comment |
Full Table Scan (also known as Sequential Scan) is a scan made on a database where each row of the table under scan is read in a sequential (serial) order
Reference
In your case, full table scan scans sequentially (in ordered way) so that it does not need to scan whole table in order to advance next record because Col7 is ordered.
but in Query2 the case is not like that, Col01 is randomly distributed so full table scan is needed.
Query 1 is optimistic scan where as Query 2 is pessimistic can.
None of them are ordered
– Malinga
Jan 3 at 2:16
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54014377%2fdoes-sql-server-table-scan-time-depend-on-the-query%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
Query A: (returned 100 rows, time took: ~ 900ms)
Query B: (returned 910595 rows, time took: ~ 15000ms)
I believe that what you are missing is that there are about x100 more rows to fetch in the second query. That only could explain why it took 20 times longer.
Yap! seems like that. However what I thought is, isn't table scan mean going through actual row data? if that's the case anyway both queries have to go through all the data (I mean fetch data). Or is there a separate cost involved in fetching?
– Malinga
Jan 3 at 2:36
As an example think I used SELECT [COL07] FROM [TEST].[dbo].[Test] WHERE COL07 = 50000 and SELECT [COL01] FROM [TEST].[dbo].[Test] WHERE COL01 = 5 (which behave similar to what I had in question ) now I'm selecting whatever in the where clause. If the query has to go through all values for the columns in where clause to find the matching rows, why there is an additional fetching cost?
– Malinga
Jan 3 at 2:37
1
It still has to actually do something with the rows when you find them, which takes time as well. If you tried Select count(1) from ... instead of Select *, that would minimise the amount of processing to be done per row, and I suspect the times would be a lot closer.
– DancingFool
Jan 3 at 2:54
Ya, I checked with count(1) and Query B now takes only a very small time. I might have to dig deep to understand what that something is. Thanks.
– Malinga
Jan 3 at 5:38
add a comment |
Query A: (returned 100 rows, time took: ~ 900ms)
Query B: (returned 910595 rows, time took: ~ 15000ms)
I believe that what you are missing is that there are about x100 more rows to fetch in the second query. That only could explain why it took 20 times longer.
Yap! seems like that. However what I thought is, isn't table scan mean going through actual row data? if that's the case anyway both queries have to go through all the data (I mean fetch data). Or is there a separate cost involved in fetching?
– Malinga
Jan 3 at 2:36
As an example think I used SELECT [COL07] FROM [TEST].[dbo].[Test] WHERE COL07 = 50000 and SELECT [COL01] FROM [TEST].[dbo].[Test] WHERE COL01 = 5 (which behave similar to what I had in question ) now I'm selecting whatever in the where clause. If the query has to go through all values for the columns in where clause to find the matching rows, why there is an additional fetching cost?
– Malinga
Jan 3 at 2:37
1
It still has to actually do something with the rows when you find them, which takes time as well. If you tried Select count(1) from ... instead of Select *, that would minimise the amount of processing to be done per row, and I suspect the times would be a lot closer.
– DancingFool
Jan 3 at 2:54
Ya, I checked with count(1) and Query B now takes only a very small time. I might have to dig deep to understand what that something is. Thanks.
– Malinga
Jan 3 at 5:38
add a comment |
Query A: (returned 100 rows, time took: ~ 900ms)
Query B: (returned 910595 rows, time took: ~ 15000ms)
I believe that what you are missing is that there are about x100 more rows to fetch in the second query. That only could explain why it took 20 times longer.
Query A: (returned 100 rows, time took: ~ 900ms)
Query B: (returned 910595 rows, time took: ~ 15000ms)
I believe that what you are missing is that there are about x100 more rows to fetch in the second query. That only could explain why it took 20 times longer.
answered Jan 2 at 23:25
GMBGMB
20.8k51028
20.8k51028
Yap! seems like that. However what I thought is, isn't table scan mean going through actual row data? if that's the case anyway both queries have to go through all the data (I mean fetch data). Or is there a separate cost involved in fetching?
– Malinga
Jan 3 at 2:36
As an example think I used SELECT [COL07] FROM [TEST].[dbo].[Test] WHERE COL07 = 50000 and SELECT [COL01] FROM [TEST].[dbo].[Test] WHERE COL01 = 5 (which behave similar to what I had in question ) now I'm selecting whatever in the where clause. If the query has to go through all values for the columns in where clause to find the matching rows, why there is an additional fetching cost?
– Malinga
Jan 3 at 2:37
1
It still has to actually do something with the rows when you find them, which takes time as well. If you tried Select count(1) from ... instead of Select *, that would minimise the amount of processing to be done per row, and I suspect the times would be a lot closer.
– DancingFool
Jan 3 at 2:54
Ya, I checked with count(1) and Query B now takes only a very small time. I might have to dig deep to understand what that something is. Thanks.
– Malinga
Jan 3 at 5:38
add a comment |
Yap! seems like that. However what I thought is, isn't table scan mean going through actual row data? if that's the case anyway both queries have to go through all the data (I mean fetch data). Or is there a separate cost involved in fetching?
– Malinga
Jan 3 at 2:36
As an example think I used SELECT [COL07] FROM [TEST].[dbo].[Test] WHERE COL07 = 50000 and SELECT [COL01] FROM [TEST].[dbo].[Test] WHERE COL01 = 5 (which behave similar to what I had in question ) now I'm selecting whatever in the where clause. If the query has to go through all values for the columns in where clause to find the matching rows, why there is an additional fetching cost?
– Malinga
Jan 3 at 2:37
1
It still has to actually do something with the rows when you find them, which takes time as well. If you tried Select count(1) from ... instead of Select *, that would minimise the amount of processing to be done per row, and I suspect the times would be a lot closer.
– DancingFool
Jan 3 at 2:54
Ya, I checked with count(1) and Query B now takes only a very small time. I might have to dig deep to understand what that something is. Thanks.
– Malinga
Jan 3 at 5:38
Yap! seems like that. However what I thought is, isn't table scan mean going through actual row data? if that's the case anyway both queries have to go through all the data (I mean fetch data). Or is there a separate cost involved in fetching?
– Malinga
Jan 3 at 2:36
Yap! seems like that. However what I thought is, isn't table scan mean going through actual row data? if that's the case anyway both queries have to go through all the data (I mean fetch data). Or is there a separate cost involved in fetching?
– Malinga
Jan 3 at 2:36
As an example think I used SELECT [COL07] FROM [TEST].[dbo].[Test] WHERE COL07 = 50000 and SELECT [COL01] FROM [TEST].[dbo].[Test] WHERE COL01 = 5 (which behave similar to what I had in question ) now I'm selecting whatever in the where clause. If the query has to go through all values for the columns in where clause to find the matching rows, why there is an additional fetching cost?
– Malinga
Jan 3 at 2:37
As an example think I used SELECT [COL07] FROM [TEST].[dbo].[Test] WHERE COL07 = 50000 and SELECT [COL01] FROM [TEST].[dbo].[Test] WHERE COL01 = 5 (which behave similar to what I had in question ) now I'm selecting whatever in the where clause. If the query has to go through all values for the columns in where clause to find the matching rows, why there is an additional fetching cost?
– Malinga
Jan 3 at 2:37
1
1
It still has to actually do something with the rows when you find them, which takes time as well. If you tried Select count(1) from ... instead of Select *, that would minimise the amount of processing to be done per row, and I suspect the times would be a lot closer.
– DancingFool
Jan 3 at 2:54
It still has to actually do something with the rows when you find them, which takes time as well. If you tried Select count(1) from ... instead of Select *, that would minimise the amount of processing to be done per row, and I suspect the times would be a lot closer.
– DancingFool
Jan 3 at 2:54
Ya, I checked with count(1) and Query B now takes only a very small time. I might have to dig deep to understand what that something is. Thanks.
– Malinga
Jan 3 at 5:38
Ya, I checked with count(1) and Query B now takes only a very small time. I might have to dig deep to understand what that something is. Thanks.
– Malinga
Jan 3 at 5:38
add a comment |
The two columns have different density of the data.
Query A, COL07: 10000000/100000 = 100
Query B, COL05: 10000000/10 = 1000000
The fact that both the search parameters are in the middle of the data range doesn't necessarily impact the speed of the search. This is depending on the number of times the engine scans the column to return the values of the search predicate.
In order to see if this is indeed the case, I would try the following:
COL04: 10000000/1000 = 10000. Filtering on WHERE COL04 = 500
COL08: 10000000/10000 = 1000. Filtering on WHERE COL05 = 5000
Considering the times from the initial test, you would expect to see COL04 at ~7200ms and COL05 at ~3600ms.
An interesting article about SQL Server COUNT() Function Performance Comparison
add a comment |
The two columns have different density of the data.
Query A, COL07: 10000000/100000 = 100
Query B, COL05: 10000000/10 = 1000000
The fact that both the search parameters are in the middle of the data range doesn't necessarily impact the speed of the search. This is depending on the number of times the engine scans the column to return the values of the search predicate.
In order to see if this is indeed the case, I would try the following:
COL04: 10000000/1000 = 10000. Filtering on WHERE COL04 = 500
COL08: 10000000/10000 = 1000. Filtering on WHERE COL05 = 5000
Considering the times from the initial test, you would expect to see COL04 at ~7200ms and COL05 at ~3600ms.
An interesting article about SQL Server COUNT() Function Performance Comparison
add a comment |
The two columns have different density of the data.
Query A, COL07: 10000000/100000 = 100
Query B, COL05: 10000000/10 = 1000000
The fact that both the search parameters are in the middle of the data range doesn't necessarily impact the speed of the search. This is depending on the number of times the engine scans the column to return the values of the search predicate.
In order to see if this is indeed the case, I would try the following:
COL04: 10000000/1000 = 10000. Filtering on WHERE COL04 = 500
COL08: 10000000/10000 = 1000. Filtering on WHERE COL05 = 5000
Considering the times from the initial test, you would expect to see COL04 at ~7200ms and COL05 at ~3600ms.
An interesting article about SQL Server COUNT() Function Performance Comparison
The two columns have different density of the data.
Query A, COL07: 10000000/100000 = 100
Query B, COL05: 10000000/10 = 1000000
The fact that both the search parameters are in the middle of the data range doesn't necessarily impact the speed of the search. This is depending on the number of times the engine scans the column to return the values of the search predicate.
In order to see if this is indeed the case, I would try the following:
COL04: 10000000/1000 = 10000. Filtering on WHERE COL04 = 500
COL08: 10000000/10000 = 1000. Filtering on WHERE COL05 = 5000
Considering the times from the initial test, you would expect to see COL04 at ~7200ms and COL05 at ~3600ms.
An interesting article about SQL Server COUNT() Function Performance Comparison
edited Jan 4 at 22:15
answered Jan 3 at 0:18
Marc0Marc0
1116
1116
add a comment |
add a comment |
Full Table Scan (also known as Sequential Scan) is a scan made on a database where each row of the table under scan is read in a sequential (serial) order
Reference
In your case, full table scan scans sequentially (in ordered way) so that it does not need to scan whole table in order to advance next record because Col7 is ordered.
but in Query2 the case is not like that, Col01 is randomly distributed so full table scan is needed.
Query 1 is optimistic scan where as Query 2 is pessimistic can.
None of them are ordered
– Malinga
Jan 3 at 2:16
add a comment |
Full Table Scan (also known as Sequential Scan) is a scan made on a database where each row of the table under scan is read in a sequential (serial) order
Reference
In your case, full table scan scans sequentially (in ordered way) so that it does not need to scan whole table in order to advance next record because Col7 is ordered.
but in Query2 the case is not like that, Col01 is randomly distributed so full table scan is needed.
Query 1 is optimistic scan where as Query 2 is pessimistic can.
None of them are ordered
– Malinga
Jan 3 at 2:16
add a comment |
Full Table Scan (also known as Sequential Scan) is a scan made on a database where each row of the table under scan is read in a sequential (serial) order
Reference
In your case, full table scan scans sequentially (in ordered way) so that it does not need to scan whole table in order to advance next record because Col7 is ordered.
but in Query2 the case is not like that, Col01 is randomly distributed so full table scan is needed.
Query 1 is optimistic scan where as Query 2 is pessimistic can.
Full Table Scan (also known as Sequential Scan) is a scan made on a database where each row of the table under scan is read in a sequential (serial) order
Reference
In your case, full table scan scans sequentially (in ordered way) so that it does not need to scan whole table in order to advance next record because Col7 is ordered.
but in Query2 the case is not like that, Col01 is randomly distributed so full table scan is needed.
Query 1 is optimistic scan where as Query 2 is pessimistic can.
answered Jan 2 at 23:23
Derviş KayımbaşıoğluDerviş Kayımbaşıoğlu
15.7k22042
15.7k22042
None of them are ordered
– Malinga
Jan 3 at 2:16
add a comment |
None of them are ordered
– Malinga
Jan 3 at 2:16
None of them are ordered
– Malinga
Jan 3 at 2:16
None of them are ordered
– Malinga
Jan 3 at 2:16
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54014377%2fdoes-sql-server-table-scan-time-depend-on-the-query%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
There are about x100 more rows to fetch in the second test, that only could explain why it took 20 times longer
– GMB
Jan 2 at 23:21
1
Switch to a count(1) as the result.
– user2864740
Jan 2 at 23:28