Does SQL Server Table-Scan Time depend on the Query?
I observed that a full table scan takes a different amount of time depending on the query. I expected that, under similar conditions (same selected columns, same column data types), a table scan would take roughly the same time. That does not seem to be the case, and I want to understand why.



I ran CHECKPOINT and DBCC DROPCLEANBUFFERS before each query to make sure cached data pages had no impact on the timings.



Table:




  • 10 columns

  • 10M rows

  • Each column has a different density, ranging from 0.1 to 0.000001

  • No indexes


Queries:



Query A: returned 100 rows, took ~900 ms



SELECT [COL00]
FROM [TEST].[dbo].[Test]
WHERE COL07 = 50000


Query B: returned 910,595 rows, took ~15,000 ms



SELECT [COL00]
FROM [TEST].[dbo].[Test]
WHERE COL01 = 5


** COL07 was randomly populated with integers from 0 to 100,000, and COL01 was randomly populated with integers from 0 to 10.



Time Taken:





  • Query A: around 900 ms


  • Query B: around 18000 ms


What am I missing here?
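The cold-cache setup can be sketched as follows (it assumes the TEST.dbo.Test table described above; SET STATISTICS TIME/IO report elapsed time and page reads per run):

```sql
-- Cold-cache timing sketch: run before each test query.
USE [TEST];
CHECKPOINT;              -- write dirty pages to disk
DBCC DROPCLEANBUFFERS;   -- empty the buffer pool so the scan reads from disk
SET STATISTICS TIME ON;  -- report CPU and elapsed time per statement
SET STATISTICS IO ON;    -- report logical/physical reads per table

SELECT [COL00]
FROM [TEST].[dbo].[Test]
WHERE COL07 = 50000;
```

With STATISTICS IO on, both queries should report the same number of reads for a full scan, which makes it easier to see whether the time difference comes from I/O or from something else.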










Comments:

  • There are about x100 more rows to fetch in the second test; that alone could explain why it took 20 times longer. – GMB, Jan 2 at 23:21

  • Switch to a count(1) as the result. – user2864740, Jan 2 at 23:28
















Tags: sql, sql-server






asked Jan 2 at 23:12 by Malinga; edited Jan 3 at 0:21 by GMB
3 Answers






Answer (score: 3) — GMB, answered Jan 2 at 23:25
Query A: returned 100 rows, took ~900 ms

Query B: returned 910,595 rows, took ~15,000 ms




I believe that what you are missing is that there are about x100 more rows to fetch in the second query. That alone could explain why it took 20 times longer.






  • Yep, it seems like that. However, doesn't a table scan mean going through the actual row data? If so, both queries have to go through all the data anyway. Or is there a separate cost involved in fetching? – Malinga, Jan 3 at 2:36

  • As an example, suppose I used SELECT [COL07] FROM [TEST].[dbo].[Test] WHERE COL07 = 50000 and SELECT [COL01] FROM [TEST].[dbo].[Test] WHERE COL01 = 5 (which behave similarly to the queries in the question), so I'm selecting exactly the column in the WHERE clause. If the query has to go through all values of that column to find the matching rows, why is there an additional fetching cost? – Malinga, Jan 3 at 2:37

  • It still has to actually do something with the rows when it finds them, which takes time as well. If you tried SELECT COUNT(1) FROM ... instead of SELECT *, that would minimise the amount of processing per row, and I suspect the times would be a lot closer. – DancingFool, Jan 3 at 2:54

  • Ya, I checked with count(1) and Query B now takes only a very small time. I might have to dig deeper to understand what that something is. Thanks. – Malinga, Jan 3 at 5:38



















Answer (score: 0) — Marc0, answered Jan 3 at 0:18, edited Jan 4 at 22:15
The two columns have different data densities:

Query A, COL07: 10,000,000 / 100,000 = 100 matching rows

Query B, COL01: 10,000,000 / 10 = 1,000,000 matching rows

The fact that both search parameters are in the middle of the data range doesn't necessarily impact the speed of the search. What matters is how many rows the engine has to return for the search predicate.

To see whether this is indeed the case, I would try the following:

COL04: 10,000,000 / 1,000 = 10,000 matching rows, filtering on WHERE COL04 = 500

COL08: 10,000,000 / 10,000 = 1,000 matching rows, filtering on WHERE COL08 = 5000

Given the times from the initial test, you would expect to see COL04 at ~7200 ms and COL08 at ~3600 ms.

An interesting article about SQL Server COUNT() Function Performance Comparison
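The intermediate-density probes suggested above could be sketched as follows (the value ranges of COL04 and COL08 are assumptions about how the test table was populated):

```sql
-- ~10,000 matching rows expected if COL04 holds values 0..1,000
SELECT [COL00] FROM [TEST].[dbo].[Test] WHERE COL04 = 500;

-- ~1,000 matching rows expected if COL08 holds values 0..10,000
SELECT [COL00] FROM [TEST].[dbo].[Test] WHERE COL08 = 5000;
```

If scan time grows with the number of matching rows across these probes, that supports the row-fetch-cost explanation.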






Answer (score: -1) — Derviş Kayımbaşıoğlu, answered Jan 2 at 23:23
    Full Table Scan (also known as Sequential Scan) is a scan made on a database where each row of the table under scan is read in a sequential (serial) order

    Reference

In your case, the full table scan reads rows sequentially (in order), so it does not need to scan the whole table to advance to the next record, because COL07 is ordered.

But in Query 2 that is not the case: COL01 is randomly distributed, so a full table scan is needed.

Query 1 is an optimistic scan, whereas Query 2 is a pessimistic scan.






  • None of them are ordered. – Malinga, Jan 3 at 2:16











