SQL Server: more efficient way to cluster a timeline OR reconstruct a batch number
up vote
2
down vote
favorite
I'm working on a large dataset (150k / day) of a tester database. Each row contains data about a specific test of the product. Each tester inserts the results of his test.
I want to do some measurements like pass-fail-rate over a shift per product and tester. The problem is there are no batch numbers assigned so I can't select this easy.
Considering the given subselect of the whole table:
id tBegin orderId
------------------------------------
1 2018-10-20 00:00:05 1
2 2018-10-20 00:05:15 1
3 2018-10-20 01:00:05 1
10 2018-10-20 10:03:05 3
12 2018-10-20 11:04:05 8
20 2018-10-20 14:15:05 3
37 2018-10-20 18:12:05 1
My goal is it to cluster the data to the following
id tBegin orderId pCount
--------------------------------------------
1 2018-10-20 00:00:05 1 3
10 2018-10-20 10:03:05 3 1
12 2018-10-20 11:04:05 8 1
20 2018-10-20 14:15:05 3 1
37 2018-10-20 18:12:05 1 1
A simple GROUP BY orderID
won't do the trick, so I came upwith the following
SELECT
MIN(c.id) AS id,
MIN(c.tBegin) AS tBegin,
c.orderId,
COUNT(*) AS pCount
FROM (
SELECT t2.id, t2.tBegin, t2.orderId,
( SELECT TOP 1 t.id
FROM history t
WHERE t.tBegin > t2.tBegin
AND t.orderID <> t2.orderID
AND <restrict date here further>
ORDER BY t.tBegin
) AS nextId
FROM history t2
) AS c
WHERE <restrict date here>
GROUP BY c.orderID, c.nextId
I left out the WHERE
s that select the correct date and tester.
This works, but it seams very inefficient. I have worked with small databases, but I'm new to SQL Server 2017.
I appreciate your help very much!
sql sql-server date group-by sql-server-2017
New contributor
add a comment |
up vote
2
down vote
favorite
I'm working on a large dataset (150k / day) of a tester database. Each row contains data about a specific test of the product. Each tester inserts the results of his test.
I want to do some measurements like pass-fail-rate over a shift per product and tester. The problem is there are no batch numbers assigned so I can't select this easy.
Considering the given subselect of the whole table:
id tBegin orderId
------------------------------------
1 2018-10-20 00:00:05 1
2 2018-10-20 00:05:15 1
3 2018-10-20 01:00:05 1
10 2018-10-20 10:03:05 3
12 2018-10-20 11:04:05 8
20 2018-10-20 14:15:05 3
37 2018-10-20 18:12:05 1
My goal is it to cluster the data to the following
id tBegin orderId pCount
--------------------------------------------
1 2018-10-20 00:00:05 1 3
10 2018-10-20 10:03:05 3 1
12 2018-10-20 11:04:05 8 1
20 2018-10-20 14:15:05 3 1
37 2018-10-20 18:12:05 1 1
A simple GROUP BY orderID
won't do the trick, so I came upwith the following
SELECT
MIN(c.id) AS id,
MIN(c.tBegin) AS tBegin,
c.orderId,
COUNT(*) AS pCount
FROM (
SELECT t2.id, t2.tBegin, t2.orderId,
( SELECT TOP 1 t.id
FROM history t
WHERE t.tBegin > t2.tBegin
AND t.orderID <> t2.orderID
AND <restrict date here further>
ORDER BY t.tBegin
) AS nextId
FROM history t2
) AS c
WHERE <restrict date here>
GROUP BY c.orderID, c.nextId
I left out the WHERE
s that select the correct date and tester.
This works, but it seams very inefficient. I have worked with small databases, but I'm new to SQL Server 2017.
I appreciate your help very much!
sql sql-server date group-by sql-server-2017
New contributor
add a comment |
up vote
2
down vote
favorite
up vote
2
down vote
favorite
I'm working on a large dataset (150k / day) of a tester database. Each row contains data about a specific test of the product. Each tester inserts the results of his test.
I want to do some measurements like pass-fail-rate over a shift per product and tester. The problem is there are no batch numbers assigned so I can't select this easy.
Considering the given subselect of the whole table:
id tBegin orderId
------------------------------------
1 2018-10-20 00:00:05 1
2 2018-10-20 00:05:15 1
3 2018-10-20 01:00:05 1
10 2018-10-20 10:03:05 3
12 2018-10-20 11:04:05 8
20 2018-10-20 14:15:05 3
37 2018-10-20 18:12:05 1
My goal is it to cluster the data to the following
id tBegin orderId pCount
--------------------------------------------
1 2018-10-20 00:00:05 1 3
10 2018-10-20 10:03:05 3 1
12 2018-10-20 11:04:05 8 1
20 2018-10-20 14:15:05 3 1
37 2018-10-20 18:12:05 1 1
A simple GROUP BY orderID
won't do the trick, so I came upwith the following
SELECT
MIN(c.id) AS id,
MIN(c.tBegin) AS tBegin,
c.orderId,
COUNT(*) AS pCount
FROM (
SELECT t2.id, t2.tBegin, t2.orderId,
( SELECT TOP 1 t.id
FROM history t
WHERE t.tBegin > t2.tBegin
AND t.orderID <> t2.orderID
AND <restrict date here further>
ORDER BY t.tBegin
) AS nextId
FROM history t2
) AS c
WHERE <restrict date here>
GROUP BY c.orderID, c.nextId
I left out the WHERE
s that select the correct date and tester.
This works, but it seams very inefficient. I have worked with small databases, but I'm new to SQL Server 2017.
I appreciate your help very much!
sql sql-server date group-by sql-server-2017
New contributor
I'm working on a large dataset (150k / day) of a tester database. Each row contains data about a specific test of the product. Each tester inserts the results of his test.
I want to do some measurements like pass-fail-rate over a shift per product and tester. The problem is there are no batch numbers assigned so I can't select this easy.
Considering the given subselect of the whole table:
id tBegin orderId
------------------------------------
1 2018-10-20 00:00:05 1
2 2018-10-20 00:05:15 1
3 2018-10-20 01:00:05 1
10 2018-10-20 10:03:05 3
12 2018-10-20 11:04:05 8
20 2018-10-20 14:15:05 3
37 2018-10-20 18:12:05 1
My goal is it to cluster the data to the following
id tBegin orderId pCount
--------------------------------------------
1 2018-10-20 00:00:05 1 3
10 2018-10-20 10:03:05 3 1
12 2018-10-20 11:04:05 8 1
20 2018-10-20 14:15:05 3 1
37 2018-10-20 18:12:05 1 1
A simple GROUP BY orderID
won't do the trick, so I came upwith the following
SELECT
MIN(c.id) AS id,
MIN(c.tBegin) AS tBegin,
c.orderId,
COUNT(*) AS pCount
FROM (
SELECT t2.id, t2.tBegin, t2.orderId,
( SELECT TOP 1 t.id
FROM history t
WHERE t.tBegin > t2.tBegin
AND t.orderID <> t2.orderID
AND <restrict date here further>
ORDER BY t.tBegin
) AS nextId
FROM history t2
) AS c
WHERE <restrict date here>
GROUP BY c.orderID, c.nextId
I left out the WHERE
s that select the correct date and tester.
This works, but it seams very inefficient. I have worked with small databases, but I'm new to SQL Server 2017.
I appreciate your help very much!
sql sql-server date group-by sql-server-2017
sql sql-server date group-by sql-server-2017
New contributor
New contributor
edited 17 hours ago
Salman A
171k65328413
171k65328413
New contributor
asked yesterday
John Bart
132
132
New contributor
New contributor
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
up vote
0
down vote
accepted
You can use window functions for this:
DECLARE @t TABLE (id INT, tBegin DATETIME, orderId INT);
INSERT INTO @t VALUES
(1 , '2018-10-20 00:00:05', 1),
(2 , '2018-10-20 00:05:15', 1),
(3 , '2018-10-20 01:00:05', 1),
(10, '2018-10-20 10:03:05', 3),
(12, '2018-10-20 11:04:05', 8),
(20, '2018-10-20 14:15:05', 3),
(37, '2018-10-20 18:12:05', 1);
WITH cte1 AS (
SELECT *, CASE WHEN orderId = LAG(orderId) OVER (ORDER BY tBegin) THEN 0 ELSE 1 END AS chg
FROM @t
), cte2 AS (
SELECT *, SUM(chg) OVER(ORDER BY tBegin) AS grp
FROM cte1
), cte3 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY tBegin) AS rn
FROM cte2
)
SELECT *
FROM cte3
WHERE rn = 1
- The first cte assigns a "change flag" to each row where the value changed
- The second cte uses a running sum to convert 1s and 0s to a number which can be used to group rows
- Finally you number rows within each group and select first row per group
Demo on DB Fiddle
add a comment |
up vote
0
down vote
You can use cumulative approach :
select min(id) as id, max(tBegin), orderid, count(*)
from (select h.*,
row_number() over (order by id) as seq1,
row_number() over (partition by orderid order by id) as seq2
from history h
) h
group by orderid, (seq1 - seq2)
order by id;
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
0
down vote
accepted
You can use window functions for this:
DECLARE @t TABLE (id INT, tBegin DATETIME, orderId INT);
INSERT INTO @t VALUES
(1 , '2018-10-20 00:00:05', 1),
(2 , '2018-10-20 00:05:15', 1),
(3 , '2018-10-20 01:00:05', 1),
(10, '2018-10-20 10:03:05', 3),
(12, '2018-10-20 11:04:05', 8),
(20, '2018-10-20 14:15:05', 3),
(37, '2018-10-20 18:12:05', 1);
WITH cte1 AS (
SELECT *, CASE WHEN orderId = LAG(orderId) OVER (ORDER BY tBegin) THEN 0 ELSE 1 END AS chg
FROM @t
), cte2 AS (
SELECT *, SUM(chg) OVER(ORDER BY tBegin) AS grp
FROM cte1
), cte3 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY tBegin) AS rn
FROM cte2
)
SELECT *
FROM cte3
WHERE rn = 1
- The first cte assigns a "change flag" to each row where the value changed
- The second cte uses a running sum to convert 1s and 0s to a number which can be used to group rows
- Finally you number rows within each group and select first row per group
Demo on DB Fiddle
add a comment |
up vote
0
down vote
accepted
You can use window functions for this:
DECLARE @t TABLE (id INT, tBegin DATETIME, orderId INT);
INSERT INTO @t VALUES
(1 , '2018-10-20 00:00:05', 1),
(2 , '2018-10-20 00:05:15', 1),
(3 , '2018-10-20 01:00:05', 1),
(10, '2018-10-20 10:03:05', 3),
(12, '2018-10-20 11:04:05', 8),
(20, '2018-10-20 14:15:05', 3),
(37, '2018-10-20 18:12:05', 1);
WITH cte1 AS (
SELECT *, CASE WHEN orderId = LAG(orderId) OVER (ORDER BY tBegin) THEN 0 ELSE 1 END AS chg
FROM @t
), cte2 AS (
SELECT *, SUM(chg) OVER(ORDER BY tBegin) AS grp
FROM cte1
), cte3 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY tBegin) AS rn
FROM cte2
)
SELECT *
FROM cte3
WHERE rn = 1
- The first cte assigns a "change flag" to each row where the value changed
- The second cte uses a running sum to convert 1s and 0s to a number which can be used to group rows
- Finally you number rows within each group and select first row per group
Demo on DB Fiddle
add a comment |
up vote
0
down vote
accepted
up vote
0
down vote
accepted
You can use window functions for this:
DECLARE @t TABLE (id INT, tBegin DATETIME, orderId INT);
INSERT INTO @t VALUES
(1 , '2018-10-20 00:00:05', 1),
(2 , '2018-10-20 00:05:15', 1),
(3 , '2018-10-20 01:00:05', 1),
(10, '2018-10-20 10:03:05', 3),
(12, '2018-10-20 11:04:05', 8),
(20, '2018-10-20 14:15:05', 3),
(37, '2018-10-20 18:12:05', 1);
WITH cte1 AS (
SELECT *, CASE WHEN orderId = LAG(orderId) OVER (ORDER BY tBegin) THEN 0 ELSE 1 END AS chg
FROM @t
), cte2 AS (
SELECT *, SUM(chg) OVER(ORDER BY tBegin) AS grp
FROM cte1
), cte3 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY tBegin) AS rn
FROM cte2
)
SELECT *
FROM cte3
WHERE rn = 1
- The first cte assigns a "change flag" to each row where the value changed
- The second cte uses a running sum to convert 1s and 0s to a number which can be used to group rows
- Finally you number rows within each group and select first row per group
Demo on DB Fiddle
You can use window functions for this:
DECLARE @t TABLE (id INT, tBegin DATETIME, orderId INT);
INSERT INTO @t VALUES
(1 , '2018-10-20 00:00:05', 1),
(2 , '2018-10-20 00:05:15', 1),
(3 , '2018-10-20 01:00:05', 1),
(10, '2018-10-20 10:03:05', 3),
(12, '2018-10-20 11:04:05', 8),
(20, '2018-10-20 14:15:05', 3),
(37, '2018-10-20 18:12:05', 1);
WITH cte1 AS (
SELECT *, CASE WHEN orderId = LAG(orderId) OVER (ORDER BY tBegin) THEN 0 ELSE 1 END AS chg
FROM @t
), cte2 AS (
SELECT *, SUM(chg) OVER(ORDER BY tBegin) AS grp
FROM cte1
), cte3 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY tBegin) AS rn
FROM cte2
)
SELECT *
FROM cte3
WHERE rn = 1
- The first cte assigns a "change flag" to each row where the value changed
- The second cte uses a running sum to convert 1s and 0s to a number which can be used to group rows
- Finally you number rows within each group and select first row per group
Demo on DB Fiddle
edited yesterday
answered yesterday
Salman A
171k65328413
171k65328413
add a comment |
add a comment |
up vote
0
down vote
You can use cumulative approach :
select min(id) as id, max(tBegin), orderid, count(*)
from (select h.*,
row_number() over (order by id) as seq1,
row_number() over (partition by orderid order by id) as seq2
from history h
) h
group by orderid, (seq1 - seq2)
order by id;
add a comment |
up vote
0
down vote
You can use cumulative approach :
select min(id) as id, max(tBegin), orderid, count(*)
from (select h.*,
row_number() over (order by id) as seq1,
row_number() over (partition by orderid order by id) as seq2
from history h
) h
group by orderid, (seq1 - seq2)
order by id;
add a comment |
up vote
0
down vote
up vote
0
down vote
You can use cumulative approach :
select min(id) as id, max(tBegin), orderid, count(*)
from (select h.*,
row_number() over (order by id) as seq1,
row_number() over (partition by orderid order by id) as seq2
from history h
) h
group by orderid, (seq1 - seq2)
order by id;
You can use cumulative approach :
select min(id) as id, max(tBegin), orderid, count(*)
from (select h.*,
row_number() over (order by id) as seq1,
row_number() over (partition by orderid order by id) as seq2
from history h
) h
group by orderid, (seq1 - seq2)
order by id;
answered yesterday
Yogesh Sharma
26.4k51334
26.4k51334
add a comment |
add a comment |
John Bart is a new contributor. Be nice, and check out our Code of Conduct.
John Bart is a new contributor. Be nice, and check out our Code of Conduct.
John Bart is a new contributor. Be nice, and check out our Code of Conduct.
John Bart is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53372583%2fsql-server-more-efficient-way-to-cluster-a-timeline-or-reconstruct-a-batch-numb%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown