SQL Server: more efficient way to cluster a timeline OR reconstruct a batch number























I'm working with a large dataset (~150k rows per day) from a tester database. Each row contains the data of one test of a product; each tester inserts the results of his tests.

I want to compute measurements such as the pass/fail rate over a shift, per product and tester. The problem is that no batch numbers are assigned, so I can't select on them easily.



Consider the following subselect of the whole table:



 id   tBegin                orderId
------------------------------------
 1    2018-10-20 00:00:05   1
 2    2018-10-20 00:05:15   1
 3    2018-10-20 01:00:05   1
 10   2018-10-20 10:03:05   3
 12   2018-10-20 11:04:05   8
 20   2018-10-20 14:15:05   3
 37   2018-10-20 18:12:05   1


My goal is to cluster the data into the following:



 id   tBegin                orderId   pCount
--------------------------------------------
 1    2018-10-20 00:00:05   1         3
 10   2018-10-20 10:03:05   3         1
 12   2018-10-20 11:04:05   8         1
 20   2018-10-20 14:15:05   3         1
 37   2018-10-20 18:12:05   1         1


A simple GROUP BY orderId won't do the trick, so I came up with the following:



SELECT
    MIN(c.id)     AS id,
    MIN(c.tBegin) AS tBegin,
    c.orderId,
    COUNT(*)      AS pCount
FROM (
    SELECT t2.id, t2.tBegin, t2.orderId,
           ( SELECT TOP 1 t.id
             FROM history t
             WHERE t.tBegin > t2.tBegin
               AND t.orderId <> t2.orderId
               AND <restrict date here further>
             ORDER BY t.tBegin
           ) AS nextId
    FROM history t2
) AS c
WHERE <restrict date here>
GROUP BY c.orderId, c.nextId


I left out the WHERE clauses that select the correct date and tester.
This works, but it seems very inefficient: the correlated subquery runs once per row. I have worked with small databases before, but I'm new to SQL Server 2017.



I appreciate your help very much!










      sql sql-server date group-by sql-server-2017






asked yesterday by John Bart · edited 17 hours ago by Salman A
2 Answers

















Accepted answer (Salman A, answered yesterday):










          You can use window functions for this:



DECLARE @t TABLE (id INT, tBegin DATETIME, orderId INT);
INSERT INTO @t VALUES
(1 , '2018-10-20 00:00:05', 1),
(2 , '2018-10-20 00:05:15', 1),
(3 , '2018-10-20 01:00:05', 1),
(10, '2018-10-20 10:03:05', 3),
(12, '2018-10-20 11:04:05', 8),
(20, '2018-10-20 14:15:05', 3),
(37, '2018-10-20 18:12:05', 1);

WITH cte1 AS (
    SELECT *, CASE WHEN orderId = LAG(orderId) OVER (ORDER BY tBegin) THEN 0 ELSE 1 END AS chg
    FROM @t
), cte2 AS (
    SELECT *, SUM(chg) OVER (ORDER BY tBegin) AS grp
    FROM cte1
), cte3 AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY tBegin) AS rn,
              COUNT(*)     OVER (PARTITION BY grp) AS pCount  -- group size, matching the desired output
    FROM cte2
)
SELECT id, tBegin, orderId, pCount
FROM cte3
WHERE rn = 1



• The first CTE flags each row whose orderId differs from the previous row's (ordered by tBegin)

• The second CTE turns those 1s and 0s into a running sum, producing a group number for each island of consecutive rows

• Finally, rows are numbered within each group and the first row per group is selected


          Demo on DB Fiddle
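The change-flag plus running-sum idea is easy to sanity-check outside SQL Server. The following is a minimal Python sketch (my addition, not part of the original answer) that mirrors cte1/cte2 over the sample rows:

```python
rows = [
    (1,  "2018-10-20 00:00:05", 1),
    (2,  "2018-10-20 00:05:15", 1),
    (3,  "2018-10-20 01:00:05", 1),
    (10, "2018-10-20 10:03:05", 3),
    (12, "2018-10-20 11:04:05", 8),
    (20, "2018-10-20 14:15:05", 3),
    (37, "2018-10-20 18:12:05", 1),
]

def cluster(rows):
    """Flag orderId changes (cte1), accumulate them into groups (cte2),
    and keep the first row of each group together with its size (pCount)."""
    out = []
    prev_order = object()  # sentinel: the first row always starts a group
    for rid, t_begin, order_id in sorted(rows, key=lambda r: r[1]):
        if order_id != prev_order:   # chg = 1 -> start a new group
            out.append([rid, t_begin, order_id, 1])
        else:                        # chg = 0 -> extend the current group
            out[-1][3] += 1
        prev_order = order_id
    return out

for row in cluster(rows):
    print(row)
# first line: [1, '2018-10-20 00:00:05', 1, 3]
```

This reproduces the desired five-row result from the question.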






Answer (Yogesh Sharma, answered yesterday):

You can use a cumulative approach:



select min(id) as id, min(tBegin) as tBegin, orderid, count(*) as pCount
from (select h.*,
             row_number() over (order by id) as seq1,
             row_number() over (partition by orderid order by id) as seq2
      from history h
     ) h
group by orderid, (seq1 - seq2)
order by id;
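To see why grouping by seq1 - seq2 works: within a consecutive run of the same orderid, both row numbers advance in lockstep, so their difference is constant, and the difference changes whenever the run is interrupted by another orderid. A small Python sketch (my addition, not the answerer's) over the sample (id, orderId) pairs:

```python
from collections import defaultdict

rows = [  # (id, orderId) from the sample, already in id order
    (1, 1), (2, 1), (3, 1), (10, 3), (12, 8), (20, 3), (37, 1),
]

per_order = defaultdict(int)   # running count per orderId -> seq2
groups = defaultdict(list)     # (orderId, seq1 - seq2) -> ids in that island
for seq1, (rid, order_id) in enumerate(rows, start=1):
    per_order[order_id] += 1
    seq2 = per_order[order_id]
    groups[(order_id, seq1 - seq2)].append(rid)

# Each (orderId, seq1 - seq2) key identifies one consecutive run ("island").
summary = sorted((min(ids), oid, len(ids)) for (oid, _), ids in groups.items())
print(summary)
# [(1, 1, 3), (10, 3, 1), (12, 8, 1), (20, 3, 1), (37, 1, 1)]
```

The summary tuples (first id, orderId, count) match the desired output table.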




