SQL Server: more efficient way to cluster a timeline OR reconstruct a batch number























I'm working with a large dataset (~150k rows per day) from a tester database. Each row contains the data of one test of a product; each tester inserts the results of his tests.

I want to compute measurements such as the pass/fail rate over a shift, per product and tester. The problem is that no batch numbers are assigned, so I can't select on them easily.



Consider the following subselect of the whole table:



 id   tBegin                orderId
------------------------------------
 1    2018-10-20 00:00:05   1
 2    2018-10-20 00:05:15   1
 3    2018-10-20 01:00:05   1
 10   2018-10-20 10:03:05   3
 12   2018-10-20 11:04:05   8
 20   2018-10-20 14:15:05   3
 37   2018-10-20 18:12:05   1


My goal is to cluster the data into the following:



 id   tBegin                orderId   pCount
--------------------------------------------
 1    2018-10-20 00:00:05   1         3
 10   2018-10-20 10:03:05   3         1
 12   2018-10-20 11:04:05   8         1
 20   2018-10-20 14:15:05   3         1
 37   2018-10-20 18:12:05   1         1


A simple GROUP BY orderId won't do the trick, so I came up with the following:



SELECT
    MIN(c.id)     AS id,
    MIN(c.tBegin) AS tBegin,
    c.orderId,
    COUNT(*)      AS pCount
FROM (
    SELECT t2.id, t2.tBegin, t2.orderId,
           ( SELECT TOP 1 t.id
             FROM history t
             WHERE t.tBegin > t2.tBegin
               AND t.orderId <> t2.orderId
               AND <restrict date here further>
             ORDER BY t.tBegin
           ) AS nextId
    FROM history t2
) AS c
WHERE <restrict date here>
GROUP BY c.orderId, c.nextId


I left out the WHERE clauses that select the correct date and tester.
This works, but it seems very inefficient: the correlated subquery runs once per row. I have worked with small databases before, but I'm new to SQL Server 2017.



I appreciate your help very much!










      sql sql-server date group-by sql-server-2017






asked yesterday by John Bart · edited 17 hours ago by Salman A
2 Answers

















Accepted answer (Salman A, answered yesterday):










          You can use window functions for this:



DECLARE @t TABLE (id INT, tBegin DATETIME, orderId INT);
INSERT INTO @t VALUES
(1 , '2018-10-20 00:00:05', 1),
(2 , '2018-10-20 00:05:15', 1),
(3 , '2018-10-20 01:00:05', 1),
(10, '2018-10-20 10:03:05', 3),
(12, '2018-10-20 11:04:05', 8),
(20, '2018-10-20 14:15:05', 3),
(37, '2018-10-20 18:12:05', 1);

WITH cte1 AS (
    SELECT *, CASE WHEN orderId = LAG(orderId) OVER (ORDER BY tBegin) THEN 0 ELSE 1 END AS chg
    FROM @t
), cte2 AS (
    SELECT *, SUM(chg) OVER (ORDER BY tBegin) AS grp
    FROM cte1
), cte3 AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY tBegin) AS rn,
              COUNT(*)     OVER (PARTITION BY grp) AS pCount  -- group size, matching the desired output
    FROM cte2
)
SELECT id, tBegin, orderId, pCount
FROM cte3
WHERE rn = 1



• The first CTE flags each row whose orderId differs from the previous row's (ordered by tBegin)

• The second CTE turns those 1s and 0s into a running sum, producing a group number for each island of consecutive rows

• Finally, rows are numbered within each group and the first row per group is selected


          Demo on DB Fiddle
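The change-flag plus running-sum idea is easy to sanity-check outside SQL Server. The following is a minimal Python sketch (my addition, not part of the original answer) that mirrors cte1/cte2 over the sample rows:

```python
rows = [
    (1,  "2018-10-20 00:00:05", 1),
    (2,  "2018-10-20 00:05:15", 1),
    (3,  "2018-10-20 01:00:05", 1),
    (10, "2018-10-20 10:03:05", 3),
    (12, "2018-10-20 11:04:05", 8),
    (20, "2018-10-20 14:15:05", 3),
    (37, "2018-10-20 18:12:05", 1),
]

def cluster(rows):
    """Flag orderId changes (cte1), accumulate them into groups (cte2),
    and keep the first row of each group together with its size (pCount)."""
    out = []
    prev_order = object()  # sentinel: the first row always starts a group
    for rid, t_begin, order_id in sorted(rows, key=lambda r: r[1]):
        if order_id != prev_order:   # chg = 1 -> start a new group
            out.append([rid, t_begin, order_id, 1])
        else:                        # chg = 0 -> extend the current group
            out[-1][3] += 1
        prev_order = order_id
    return out

for row in cluster(rows):
    print(row)
# first line: [1, '2018-10-20 00:00:05', 1, 3]
```

This reproduces the desired five-row result from the question.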






Answer (Yogesh Sharma, answered yesterday):

You can use a cumulative approach:



select min(id) as id, min(tBegin) as tBegin, orderid, count(*) as pCount
from (select h.*,
             row_number() over (order by id) as seq1,
             row_number() over (partition by orderid order by id) as seq2
      from history h
     ) h
group by orderid, (seq1 - seq2)
order by id;
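To see why grouping by seq1 - seq2 works: within a consecutive run of the same orderid, both row numbers advance in lockstep, so their difference is constant, and the difference changes whenever the run is interrupted by another orderid. A small Python sketch (my addition, not the answerer's) over the sample (id, orderId) pairs:

```python
from collections import defaultdict

rows = [  # (id, orderId) from the sample, already in id order
    (1, 1), (2, 1), (3, 1), (10, 3), (12, 8), (20, 3), (37, 1),
]

per_order = defaultdict(int)   # running count per orderId -> seq2
groups = defaultdict(list)     # (orderId, seq1 - seq2) -> ids in that island
for seq1, (rid, order_id) in enumerate(rows, start=1):
    per_order[order_id] += 1
    seq2 = per_order[order_id]
    groups[(order_id, seq1 - seq2)].append(rid)

# Each (orderId, seq1 - seq2) key identifies one consecutive run ("island").
summary = sorted((min(ids), oid, len(ids)) for (oid, _), ids in groups.items())
print(summary)
# [(1, 1, 3), (10, 3, 1), (12, 8, 1), (20, 3, 1), (37, 1, 1)]
```

The summary tuples (first id, orderId, count) match the desired output table.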




