
Question

There is a table messages that contains data as shown below:

Id   Name   Other_Columns

-------------------------

1    A       A_data_1

2    A       A_data_2

3    A       A_data_3

4    B       B_data_1

5    B       B_data_2

6    C       C_data_1

If I run a query select * from messages group by name, I will get the result as:

1    A       A_data_1

4    B       B_data_1

6    C       C_data_1

What query will return the following result?

3    A       A_data_3

5    B       B_data_2

6    C       C_data_1

That is, the last record in each group should be returned.

At present, this is the query that I use:

SELECT

  *

FROM (SELECT

  *

FROM messages

ORDER BY id DESC) AS x

GROUP BY name

But this looks highly inefficient. Any other ways to achieve the same result?

see accepted answer in stackoverflow.com/questions/1379565/… for a more efficient solution — Jun 25 '12 at 12:45
Why can't you just add DESC, i.e. select * from messages group by name DESC — Dec 3 '15 at 6:41
Possible duplicate of How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL? — Jun 12 '16 at 22:19
@KimPrince It seems like the answer you are suggesting doesn't do what is expected! I just tried your method and it took FIRST row for each group and ordered DESC. It does NOT take the last row of each group — May 22 '17 at 15:34

score 750 · Accepted Answer · 2017-12-26 20:38:20Z

MySQL 8.0 now supports window functions, like almost all popular SQL implementations. With this standard syntax, we can write greatest-n-per-group queries:

WITH ranked_messages AS (

  SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn

  FROM messages AS m

)

SELECT * FROM ranked_messages WHERE rn = 1;

Below is the original answer I wrote for this question in 2009:

I write the solution this way:

SELECT m1.*

FROM messages m1 LEFT JOIN messages m2

 ON (m1.name = m2.name AND m1.id < m2.id)

WHERE m2.id IS NULL;
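Both queries above can be checked against the sample data from the question. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL (window functions need SQLite 3.25 or newer), and the ORDER BY id is added only to make the output order predictable.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (1, "A", "A_data_1"),
    (2, "A", "A_data_2"),
    (3, "A", "A_data_3"),
    (4, "B", "B_data_1"),
    (5, "B", "B_data_2"),
    (6, "C", "C_data_1"),
]

# Window-function form (MySQL 8.0+ in the answer; SQLite 3.25+ here).
WINDOW_SQL = """
WITH ranked_messages AS (
  SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
  FROM messages AS m
)
SELECT id, name, other_columns FROM ranked_messages WHERE rn = 1 ORDER BY id
"""

# Self-join form: keep rows for which no later row with the same name exists.
LEFT_JOIN_SQL = """
SELECT m1.id, m1.name, m1.other_columns
FROM messages m1 LEFT JOIN messages m2
  ON (m1.name = m2.name AND m1.id < m2.id)
WHERE m2.id IS NULL
ORDER BY m1.id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
)
conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", ROWS)

window_result = conn.execute(WINDOW_SQL).fetchall()
left_join_result = conn.execute(LEFT_JOIN_SQL).fetchall()
print(window_result)
print(left_join_result)
```

On this data both forms should return rows 3, 5 and 6, which is the result the question asks for.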

Regarding performance, one solution or the other can be better, depending on the nature of your data. So test both queries and use the one that performs better on your database.

For example, I have a copy of the StackOverflow August data dump. I'll use that for benchmarking. There are 1,114,357 rows in the Posts table. This is running on MySQL 5.0.75 on my Macbook Pro 2.40GHz.

I'll write a query to find the most recent post for a given user ID (mine).

First using the technique shown by @Eric with the GROUP BY in a subquery:

SELECT p1.postid

FROM Posts p1

INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid

            FROM Posts pi GROUP BY pi.owneruserid) p2

  ON (p1.postid = p2.maxpostid)

WHERE p1.owneruserid = 20860;



1 row in set (1 min 17.89 sec)

Even the EXPLAIN analysis takes over 16 seconds:

+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+

| id | select_type | table      | type   | possible_keys              | key         | key_len | ref          | rows    | Extra       |

+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+

|  1 | PRIMARY     | <derived2> | ALL    | NULL                       | NULL        | NULL    | NULL         |   76756 |             | 

|  1 | PRIMARY     | p1         | eq_ref | PRIMARY,PostId,OwnerUserId | PRIMARY     | 8       | p2.maxpostid |       1 | Using where | 

|  2 | DERIVED     | pi         | index  | NULL                       | OwnerUserId | 8       | NULL         | 1151268 | Using index | 

+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+

3 rows in set (16.09 sec)

Now produce the same query result using my technique with LEFT JOIN:

SELECT p1.postid

FROM Posts p1 LEFT JOIN posts p2

  ON (p1.owneruserid = p2.owneruserid AND p1.postid < p2.postid)

WHERE p2.postid IS NULL AND p1.owneruserid = 20860;



1 row in set (0.28 sec)

The EXPLAIN analysis shows that both tables are able to use their indexes:

+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+

| id | select_type | table | type | possible_keys              | key         | key_len | ref   | rows | Extra                                |

+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+

|  1 | SIMPLE      | p1    | ref  | OwnerUserId                | OwnerUserId | 8       | const | 1384 | Using index                          | 

|  1 | SIMPLE      | p2    | ref  | PRIMARY,PostId,OwnerUserId | OwnerUserId | 8       | const | 1384 | Using where; Using index; Not exists | 

+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+

2 rows in set (0.00 sec)

Here's the DDL for my Posts table:

CREATE TABLE `posts` (

  `PostId` bigint(20) unsigned NOT NULL auto_increment,

  `PostTypeId` bigint(20) unsigned NOT NULL,

  `AcceptedAnswerId` bigint(20) unsigned default NULL,

  `ParentId` bigint(20) unsigned default NULL,

  `CreationDate` datetime NOT NULL,

  `Score` int(11) NOT NULL default '0',

  `ViewCount` int(11) NOT NULL default '0',

  `Body` text NOT NULL,

  `OwnerUserId` bigint(20) unsigned NOT NULL,

  `OwnerDisplayName` varchar(40) default NULL,

  `LastEditorUserId` bigint(20) unsigned default NULL,

  `LastEditDate` datetime default NULL,

  `LastActivityDate` datetime default NULL,

  `Title` varchar(250) NOT NULL default '',

  `Tags` varchar(150) NOT NULL default '',

  `AnswerCount` int(11) NOT NULL default '0',

  `CommentCount` int(11) NOT NULL default '0',

  `FavoriteCount` int(11) NOT NULL default '0',

  `ClosedDate` datetime default NULL,

  PRIMARY KEY  (`PostId`),

  UNIQUE KEY `PostId` (`PostId`),

  KEY `PostTypeId` (`PostTypeId`),

  KEY `AcceptedAnswerId` (`AcceptedAnswerId`),

  KEY `OwnerUserId` (`OwnerUserId`),

  KEY `LastEditorUserId` (`LastEditorUserId`),

  KEY `ParentId` (`ParentId`),

  CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`PostTypeId`) REFERENCES `posttypes` (`PostTypeId`)

) ENGINE=InnoDB;

Really? What happens if you have a ton of entries? For example, if you're working w/ an in-house version control, say, and you have a ton of versions per file, that join result would be massive. Have you ever benchmarked the subquery method with this one? I'm pretty curious to know which would win, but not curious enough to not ask you first. — Aug 21 '09 at 18:19
Did some testing. On a small table (~300k records, ~190k groups, so not massive groups or anything), the queries tied (8 seconds each). — Aug 21 '09 at 18:44
@BillKarwin: See meta.stackexchange.com/questions/123017, especially the comments below Adam Rackis' answer. Let me know if you want to reclaim your answer on the new question. — Feb 21 '12 at 18:06
@Tim, no, <= will not help if you have a non-unique column. You must use a unique column as a tiebreaker. — Jul 3 '15 at 7:13
The performance degrades exponentially as the number of rows increases or when groups become larger. For example a group consisting of 5 dates will yield 4+3+2+1+1 = 11 rows via left join out of which one row is filtered in the end. Performance of joining with grouped results is almost linear. Your tests look flawed. — Oct 16 '15 at 12:12

score 125 · Answer 2 · 2017-03-31 15:08:26Z

Update (2017-03-31): MySQL 5.7.5 enabled the ONLY_FULL_GROUP_BY mode by default, so non-deterministic GROUP BY queries are now rejected. The GROUP BY implementation was also updated, so the solution might not work as expected anymore even with the mode disabled. One needs to check.

Bill Karwin's solution above works fine when the item count within groups is rather small, but the performance of the query becomes bad when the groups are rather large, since for a group of n rows the solution requires about n*n/2 + n/2 IS NULL comparisons alone.
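To make the quadratic growth concrete, the sketch below counts the rows the LEFT JOIN produces for a single group before the IS NULL filter is applied. It is an illustration only: it runs on SQLite through Python's sqlite3 module, the helper names are hypothetical, and it counts logical join rows rather than the work MySQL actually does (the "Not exists" note in the EXPLAIN output of the accepted answer suggests MySQL can stop early for each outer row).

```python
import sqlite3


def left_join_row_count(group_size):
    """Count rows of the self LEFT JOIN for one group of `group_size` rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO messages VALUES (?, 'A')",
        [(i,) for i in range(1, group_size + 1)],
    )
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM messages m1 LEFT JOIN messages m2 "
        "ON m1.name = m2.name AND m1.id < m2.id"
    ).fetchone()
    conn.close()
    return count


def predicted_row_count(n):
    """Each row pairs with every later row; the last row yields one NULL-extended row."""
    return n * (n - 1) // 2 + 1


for n in (5, 100, 1000):
    print(n, left_join_row_count(n), predicted_row_count(n))
```

For a group of 5 rows this gives 4+3+2+1+1 = 11 joined rows, the same order of growth as the n*n/2 + n/2 estimate.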

I ran my tests on an InnoDB table of 18,684,446 rows with 1,182 groups. The table, testresults, contains results of functional tests and has (test_id, request_id) as the primary key. Thus, test_id is a group and I was searching for the last request_id for each test_id.

Bill's solution has already been running for several hours on my Dell E4310 and I do not know when it is going to finish, even though it operates on a covering index (hence "Using index" in EXPLAIN).

I have a couple of other solutions that are based on the same ideas:

If the underlying index is a BTREE index (which is usually the case), the largest (group_id, item_value) pair is the last value within each group_id, that is, the first for each group_id if we walk through the index in descending order.

If we read the values which are covered by an index, the values are read in the order of the index.

Each index implicitly contains the primary key columns appended to it (that is, the primary key is in the covering index). In the solutions below I operate directly on the primary key; in your case, you will just need to add the primary key columns to the result.

In many cases it is much cheaper to collect the required row ids in the required order in a subquery and join the result of the subquery on the id. Since for each row in the subquery result MySQL will need a single fetch based on the primary key, the subquery will be put first in the join and the rows will be output in the order of the ids in the subquery (if we omit an explicit ORDER BY for the join).

3 ways MySQL uses indexes is a great article to understand some details.

Solution 1

This one is incredibly fast; it takes about 0.8 seconds on my 18M+ rows:

SELECT test_id, MAX(request_id), request_id

FROM testresults

GROUP BY test_id DESC;

If you want to change the order to ASC, put it in a subquery, return the ids only and use that as the subquery to join to the rest of the columns:

SELECT test_id, request_id

FROM (

    SELECT test_id, MAX(request_id), request_id

    FROM testresults

    GROUP BY test_id DESC) as ids

ORDER BY test_id;

This one takes about 1.2 seconds on my data.

Solution 2

Here is another solution that takes about 19 seconds for my table:

SELECT test_id, request_id

FROM testresults, (SELECT @group:=NULL) as init

WHERE IF(IFNULL(@group, -1)=@group:=test_id, 0, 1)

ORDER BY test_id DESC, request_id DESC

It returns tests in descending order as well. It is much slower since it does a full index scan but it is here to give you an idea how to output N max rows for each group.

The disadvantage of the query is that its result cannot be cached by the query cache.

Please link to a dump of your tables so that people can test it on their platforms. — Feb 3 '15 at 3:44
Solution 1 can't work, you can't select request_id without having that in group by clause, — Mar 9 '17 at 9:57
@giò, this answer is 5 years old. Until MySQL 5.7.5 ONLY_FULL_GROUP_BY was disabled by default and this solution worked out of the box dev.mysql.com/doc/relnotes/mysql/5.7/en/…. Now I'm not sure if the solution still works when you disable the mode, because the implementation of the GROUP BY has been changed. — Mar 31 '17 at 14:58
If you wanted ASC in the first solution, would it work if you turn MAX to MIN? — May 9 '17 at 15:45

score 82 · Answer 3 · 2009-08-21 17:14:13Z


Use your subquery to return the correct grouping, because you're halfway there.

Try this:

select

    a.*

from

    messages a

    inner join 

        (select name, max(id) as maxid from messages group by name) as b on

        a.id = b.maxid

If it's not id you want the max of:

select

    a.*

from

    messages a

    inner join 

        (select name, max(other_col) as other_col 

         from messages group by name) as b on

        a.name = b.name

        and a.other_col = b.other_col

This way, you avoid correlated subqueries and/or ordering in your subqueries, which tend to be very slow/inefficient.
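The difference between the two joins shows up when the column being maximised is not unique within a name. The sketch below is a hypothetical illustration: the other_col values are invented for the example, and SQLite (via Python's sqlite3 module) stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_col INTEGER)"
)
# Invented data: rows 2 and 3 tie for the largest other_col within name 'A'.
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [(1, "A", 10), (2, "A", 30), (3, "A", 30), (4, "B", 20)],
)

# Join on the per-name maximum of a non-unique column: ties all come back.
BY_OTHER_COL = """
SELECT a.id, a.name, a.other_col
FROM messages a
INNER JOIN (SELECT name, MAX(other_col) AS other_col
            FROM messages GROUP BY name) AS b
  ON a.name = b.name AND a.other_col = b.other_col
ORDER BY a.id
"""

# Join on the per-name maximum of the unique id: exactly one row per name.
BY_ID = """
SELECT a.id, a.name, a.other_col
FROM messages a
INNER JOIN (SELECT name, MAX(id) AS maxid
            FROM messages GROUP BY name) AS b
  ON a.id = b.maxid
ORDER BY a.id
"""

tied = conn.execute(BY_OTHER_COL).fetchall()
by_id = conn.execute(BY_ID).fetchall()
print(tied)   # two rows for name 'A' because of the tie
print(by_id)  # one row per name
```

If exactly one row per name is required, a unique column has to act as the tiebreaker.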

answered Aug 21 '09 at 17:06 by Eric, edited Aug 21 '09 at 17:14

Note a caveat for the solution with other_col: if that column is not unique you may get multiple records back with the same name, if they tie for max(other_col). I found this post that describes a solution for my needs, where I need exactly one record per name. – Eric Simonton, Aug 21 '15 at 13:48
In some situations you can only use this solution but not the accepted one. – tom10271, Sep 4 '15 at 2:59
In my experience, it is grouping the whole damn messages table that tends to be slow/inefficient! In other words, note that the subquery requires a full table scan, and does a grouping on that to boot... unless your optimizer is doing something that mine is not. So this solution depends heavily on holding the entire table in memory. – Timo, Apr 30 at 14:56

score 35 · Answer 4 · 2012-02-20 21:46:38Z

I arrived at a different solution, which is to get the IDs for the last post within each group, then select from the messages table using the result from the first query as the argument for a WHERE x IN construct:

SELECT id, name, other_columns

FROM messages

WHERE id IN (

    SELECT MAX(id)

    FROM messages

    GROUP BY name

);

I don't know how this performs compared to some of the other solutions, but it worked spectacularly for my table with 3+ million rows. (4 second execution with 1200+ results)

This should work both on MySQL and SQL Server.
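A small runnable check of the IN form on the question's sample data follows. It is a sketch, not a benchmark: SQLite (through Python's sqlite3 module) stands in for MySQL, and the composite index with the hypothetical name idx_messages_name_id is an assumption added so that the inner GROUP BY has a chance of being answered from the index. Whether a given optimizer actually uses it should be confirmed with EXPLAIN.

```python
import sqlite3

ROWS = [
    (1, "A", "A_data_1"),
    (2, "A", "A_data_2"),
    (3, "A", "A_data_3"),
    (4, "B", "B_data_1"),
    (5, "B", "B_data_2"),
    (6, "C", "C_data_1"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
)
conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", ROWS)

# Assumed supporting index: groups by name, with ids ordered inside each group.
conn.execute("CREATE INDEX idx_messages_name_id ON messages (name, id)")

last_rows = conn.execute(
    """
    SELECT id, name, other_columns
    FROM messages
    WHERE id IN (SELECT MAX(id) FROM messages GROUP BY name)
    ORDER BY id
    """
).fetchall()
print(last_rows)
```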

answered Feb 20 '12 at 21:46 by JYelton

Just make sure you have an index on (name, id). – Samuel Åslund, Apr 22 '16 at 11:58
Much better than self joins – anwerj, Dec 23 '16 at 7:40
I learned something from you that is a good job and this query is faster – Humphrey, Feb 23 at 7:48

score 25 · Answer 5 · 2013-12-25 08:36:42Z

Solution by subquery (fiddle link):

select * from messages where id in

(select max(id) from messages group by Name)

Solution by join condition (fiddle link):

select m1.* from messages m1 

left outer join messages m2 

on ( m1.id<m2.id and m1.name=m2.name )

where m2.id is null

Reason for this post is to give fiddle link only.
Same SQL is already provided in other answers.

answered Dec 25 '13 at 8:36 by Vipin

What's the point of the 'fiddle' if you can't run it? – Alexander Suraphel, Jul 4 at 9:41
@AlexanderSuraphel MySQL 5.5 is not available in fiddle now, and the fiddle link was created using it. Nowadays fiddle supports MySQL 5.6; I changed the database to MySQL 5.6 and I am able to build the schema and run the SQL. – Vipin, Jul 4 at 17:21

score 7 · Answer 6 · 2013-02-14 07:07:11Z

I've not yet tested with large DB but I think this could be faster than joining tables:

SELECT *, Max(Id) FROM messages GROUP BY Name

answered Mar 31 '12 at 14:44 by user942821, edited Feb 14 '13 at 7:07 by Shai

This returns arbitrary data. In other words, the returned columns might not be from the record with MAX(Id). – harm, Jul 3 '14 at 15:05
Useful to select the max Id from a set of records with a WHERE condition: "SELECT Max(Id) FROM Prod WHERE Pn='" + Pn + "'". It returns the max Id from a set of records with the same Pn. In C# use reader.GetString(0) to get the result. – Nicola, Apr 8 '15 at 9:24

Steve Kass · Answer 7 · 2009-08-21 17:26:12Z

Here are two suggestions. First, if MySQL supports ROW_NUMBER(), it's very simple:

WITH Ranked AS (

  SELECT Id, Name, OtherColumns,

    ROW_NUMBER() OVER (

      PARTITION BY Name

      ORDER BY Id DESC

    ) AS rk

  FROM messages

)

  SELECT Id, Name, OtherColumns

  FROM Ranked

  WHERE rk = 1;

I'm assuming by "last" you mean last in Id order. If not, change the ORDER BY clause of the ROW_NUMBER() window accordingly.

Second, if ROW_NUMBER() isn't available, this is often a good way to proceed:

SELECT

  Id, Name, OtherColumns

FROM messages

WHERE NOT EXISTS (

  SELECT * FROM messages as M2

  WHERE M2.Name = messages.Name

  AND M2.Id > messages.Id

)

In other words, select messages where there is no later-Id message with the same Name.
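The NOT EXISTS form can be tried on the question's sample data. This is a sketch under assumptions: SQLite via Python's sqlite3 module stands in for MySQL, the column is named Other_Columns as in the question, and the ORDER BY is there only to fix the output order.

```python
import sqlite3

ROWS = [
    (1, "A", "A_data_1"),
    (2, "A", "A_data_2"),
    (3, "A", "A_data_3"),
    (4, "B", "B_data_1"),
    (5, "B", "B_data_2"),
    (6, "C", "C_data_1"),
]

# Keep a row only if no row with the same Name has a larger Id.
NOT_EXISTS_SQL = """
SELECT Id, Name, Other_Columns
FROM messages
WHERE NOT EXISTS (
  SELECT 1 FROM messages AS M2
  WHERE M2.Name = messages.Name
    AND M2.Id > messages.Id
)
ORDER BY Id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (Id INTEGER PRIMARY KEY, Name TEXT, Other_Columns TEXT)"
)
conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", ROWS)

not_exists_result = conn.execute(NOT_EXISTS_SQL).fetchall()
print(not_exists_result)
```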

MySQL doesn't support ROW_NUMBER() or CTEs. – Bill Karwin, Aug 21 '09 at 17:37

score 5 · Answer 8 · 2017-06-08 19:03:49Z

Here is my solution:

SELECT 

  DISTINCT NAME,

  MAX(MESSAGES) OVER(PARTITION BY NAME) MESSAGES 

FROM MESSAGE;

answered Jun 8 '17 at 18:49 by Abhishek Yadav, edited Jun 8 '17 at 19:03 by Paul Roub

score 4 · Answer 9 · 2014-03-30 06:01:52Z

Here is another way to get the last related record, using GROUP_CONCAT with ORDER BY and SUBSTRING_INDEX to pick one record from the list:

SELECT 

  MAX(`Id`) AS `Id`,

  `Name`,

  SUBSTRING_INDEX(

    GROUP_CONCAT(

      `Other_Columns` 

      ORDER BY `Id` DESC 

      SEPARATOR '||'

    ),

    '||',

    1

  ) Other_Columns 

FROM

  messages 

GROUP BY `Name`

The query above groups all the Other_Columns values that share the same Name. Using ORDER BY Id DESC inside GROUP_CONCAT joins the values of each group in descending Id order with the given separator (I used ||). SUBSTRING_INDEX then picks the first entry from that list.

Fiddle Demo

score 4 · Answer 10 · 2014-05-04 11:38:30Z

SELECT 

  column1,

  column2 

FROM

  table_name 

WHERE id IN 

  (SELECT 

    MAX(id) 

  FROM

    table_name 

  GROUP BY column1) 

ORDER BY column1 ;

answered Apr 11 '14 at 6:55 by jeet singh parmar, edited May 4 '14 at 11:38 by M Khalid Junaid

Could you elaborate a bit on your answer? Why is your query preferable to Vijay's original query? – janfoeh, May 4 '14 at 11:57

Brock Adams · Answer 11 · 2011-07-15 13:47:27Z

Try this:

SELECT jos_categories.title AS name,

       joined.catid,

       joined.title,

       joined.introtext

FROM   jos_categories

       INNER JOIN (SELECT *

                   FROM   (SELECT `title`,

                                  catid,

                                  `created`,

                                  introtext

                           FROM   `jos_content`

                           WHERE  `sectionid` = 6

                           ORDER  BY `id` DESC) AS yes

                   GROUP  BY `yes`.`catid` DESC

                   ORDER  BY `yes`.`created` DESC) AS joined

         ON( joined.catid = jos_categories.id )

score 3 · Answer 12 · 2015-09-28 09:07:12Z

You can also view it here:

http://sqlfiddle.com/#!9/ef42b/9

FIRST SOLUTION

SELECT d1.ID,Name,City FROM Demo_User d1

INNER JOIN

(SELECT MAX(ID) AS ID FROM Demo_User GROUP By NAME) AS P ON (d1.ID=P.ID);

SECOND SOLUTION

SELECT * FROM (SELECT * FROM Demo_User ORDER BY ID DESC) AS T GROUP BY NAME ;

answered Sep 28 '15 at 9:07 by Shrikant Gupta

Second Solution doesn't work for my case – dikirill, Apr 28 '17 at 18:41

score 2 · Answer 13 · 2010-10-08 01:57:49Z

Is there any way we could use this method to delete duplicates in a table? The result set is basically a collection of unique records, so if we could delete all records not in the result set, we would effectively have no duplicates. I tried this, but MySQL gave a 1093 error.

DELETE FROM messages WHERE id NOT IN

 (SELECT m1.id  

 FROM messages m1 LEFT JOIN messages m2  

 ON (m1.name = m2.name AND m1.id < m2.id)  

 WHERE m2.id IS NULL)

Is there a way to maybe save the output to a temp variable then delete from NOT IN (temp variable)? @Bill thanks for a very useful solution.

EDIT: I think I found the solution:

DROP TABLE IF EXISTS UniqueIDs; 

CREATE Temporary table UniqueIDs (id Int(11)); 



INSERT INTO UniqueIDs 

    (SELECT T1.ID FROM Table T1 LEFT JOIN Table T2 ON 

    (T1.Field1 = T2.Field1 AND T1.Field2 = T2.Field2 #Comparison Fields  

    AND T1.ID < T2.ID) 

    WHERE T2.ID IS NULL); 



DELETE FROM Table WHERE id NOT IN (SELECT ID FROM UniqueIDs);
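The collect-then-delete pattern can be exercised on the question's sample data as below. This is a sketch with assumptions: it runs on SQLite through Python's sqlite3 module, which does not raise MySQL's error 1093, so it only demonstrates the logic of keeping the ids in a temporary table and then deleting everything else.

```python
import sqlite3

ROWS = [
    (1, "A", "A_data_1"),
    (2, "A", "A_data_2"),
    (3, "A", "A_data_3"),
    (4, "B", "B_data_1"),
    (5, "B", "B_data_2"),
    (6, "C", "C_data_1"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
)
conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", ROWS)

conn.executescript(
    """
    -- Step 1: remember the id of the last row in each name group.
    CREATE TEMP TABLE UniqueIDs (id INTEGER PRIMARY KEY);
    INSERT INTO UniqueIDs
      SELECT m1.id
      FROM messages m1 LEFT JOIN messages m2
        ON (m1.name = m2.name AND m1.id < m2.id)
      WHERE m2.id IS NULL;

    -- Step 2: delete every row whose id was not remembered.
    DELETE FROM messages WHERE id NOT IN (SELECT id FROM UniqueIDs);
    """
)

remaining = conn.execute("SELECT id, name FROM messages ORDER BY id").fetchall()
print(remaining)
```

Only the last row of each name group should remain after the delete.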

score 2 · Answer 14 · 2011-11-18 20:21:00Z

The below query will work fine as per your question.

SELECT M1.* 

FROM MESSAGES M1,

(

 SELECT SUBSTR(Others_data,1,2),MAX(Others_data) AS Max_Others_data

 FROM MESSAGES

 GROUP BY 1

) M2

WHERE M1.Others_data = M2.Max_Others_data

ORDER BY Others_data;

answered Nov 18 '11 at 20:19 by Teja, edited Nov 18 '11 at 20:21 by animuson

bikashphp · Answer 15 · 2014-10-21 14:08:16Z

Hi @Vijay Dev, if your messages table has Id as an auto-increment primary key, then to fetch the latest record per group based on the primary key your query should read as below:

SELECT m1.* FROM messages m1 INNER JOIN (SELECT max(Id) as lastmsgId FROM messages GROUP BY Name) m2 ON m1.Id=m2.lastmsgId

Wanderer · Answer 16 · 2015-11-19 04:36:11Z

If you want the last row for each Name, you can assign a row number to each row within its Name group, ordered by Id in descending order, and keep the rows numbered 1.

QUERY

SELECT t1.Id, 

       t1.Name, 

       t1.Other_Columns

FROM 

(

     SELECT Id, 

            Name, 

            Other_Columns,

    (

        CASE Name WHEN @curA 

        THEN @curRow := @curRow + 1 

        ELSE @curRow := 1 AND @curA := Name END 

    ) + 1 AS rn 

    FROM messages t, 

    (SELECT @curRow := 0, @curA := '') r 

    ORDER BY Name,Id DESC 

)t1

WHERE t1.rn = 1

ORDER BY t1.Id;
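The running-counter idea behind this query (restart the counter whenever Name changes, otherwise increment it) can be sketched outside SQL. The Python below is an illustration of that idea only, not a translation of MySQL's user-variable semantics; the function name is hypothetical.

```python
ROWS = [
    (1, "A", "A_data_1"),
    (2, "A", "A_data_2"),
    (3, "A", "A_data_3"),
    (4, "B", "B_data_1"),
    (5, "B", "B_data_2"),
    (6, "C", "C_data_1"),
]


def number_rows_per_group(rows):
    """Append a per-Name row number, counting from the highest Id downwards."""
    # Equivalent of ORDER BY Name, Id DESC: sort by Id descending first,
    # then apply a stable sort by Name.
    ordered = sorted(rows, key=lambda r: r[0], reverse=True)
    ordered.sort(key=lambda r: r[1])

    numbered = []
    current_name = None
    counter = 0
    for row in ordered:
        if row[1] == current_name:
            counter += 1          # same group: keep counting
        else:
            current_name = row[1]
            counter = 1           # new group: restart at 1
        numbered.append(row + (counter,))
    return numbered


numbered = number_rows_per_group(ROWS)
# Row number 1 marks the last (highest Id) row of each group.
last_per_group = sorted(row[:3] for row in numbered if row[3] == 1)
print(last_per_group)
```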

SQL Fiddle

Song Zhengyi · Answer 17 · 2018-03-10 20:33:11Z

An approach with considerable speed is as follows.

SELECT * 

FROM messages a

WHERE Id = (SELECT MAX(Id) FROM messages WHERE a.Name = Name)

Result

Id  Name    Other_Columns

3   A   A_data_3

5   B   B_data_2

6   C   C_data_1

score 2 · Answer 18 · 2018-04-30 06:20:59Z

Clearly there are lots of different ways of getting the same results. Your question seems to be about an efficient way of getting the last result in each group in MySQL. If you are working with huge amounts of data, and assuming you are using InnoDB with even the latest versions of MySQL (such as 5.7.21 and 8.0.4-rc), then there might not be an efficient way of doing this.

We sometimes need to do this with tables with even more than 60 million rows.

For these examples I will use data with only about 1.5 million rows where the queries would need to find results for all groups in the data. In our actual cases we would often need to return back data from about 2,000 groups (which hypothetically would not require examining very much of the data).

I will use the following tables:

CREATE TABLE temperature(

  id INT UNSIGNED NOT NULL AUTO_INCREMENT, 

  groupID INT UNSIGNED NOT NULL, 

  recordedTimestamp TIMESTAMP NOT NULL, 

  recordedValue INT NOT NULL,

  INDEX groupIndex(groupID, recordedTimestamp), 

  PRIMARY KEY (id)

);



CREATE TEMPORARY TABLE selected_group(id INT UNSIGNED NOT NULL, PRIMARY KEY(id));

The temperature table is populated with about 1.5 million random records, and with 100 different groups.
The selected_group is populated with those 100 groups (in our cases this would normally be less than 20% for all of the groups).

As this data is random it means that multiple rows can have the same recordedTimestamps. What we want is to get a list of all of the selected groups in order of groupID with the last recordedTimestamp for each group, and if the same group has more than one matching row like that then the last matching id of those rows.

If hypothetically MySQL had a last() function which returned values from the last row in a special ORDER BY clause then we could simply do:

SELECT 

  last(t1.id) AS id, 

  t1.groupID, 

  last(t1.recordedTimestamp) AS recordedTimestamp, 

  last(t1.recordedValue) AS recordedValue

FROM selected_group g

INNER JOIN temperature t1 ON t1.groupID = g.id

ORDER BY t1.recordedTimestamp, t1.id

GROUP BY t1.groupID;

which would only need to examine a few 100 rows in this case as it doesn't use any of the normal GROUP BY functions. This would execute in 0 seconds and hence be highly efficient.
Note that normally in MySQL we would see an ORDER BY clause following the GROUP BY clause however this ORDER BY clause is used to determine the ORDER for the last() function, if it was after the GROUP BY then it would be ordering the GROUPS. If no GROUP BY clause is present then the last values will be the same in all of the returned rows.

However MySQL does not have this so let's look at different ideas of what it does have and prove that none of these are efficient.

Example 1

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue

FROM selected_group g

INNER JOIN temperature t1 ON t1.id = (

  SELECT t2.id

  FROM temperature t2 

  WHERE t2.groupID = g.id

  ORDER BY t2.recordedTimestamp DESC, t2.id DESC

  LIMIT 1

);

This examined 3,009,254 rows and took ~0.859 seconds on 5.7.21 and slightly longer on 8.0.4-rc
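The tie-breaking behaviour of Example 1 (latest recordedTimestamp first, then highest id) can be checked on a tiny data set. This is a sketch under assumptions: SQLite via Python's sqlite3 module stands in for MySQL, timestamps are stored as ISO text, the five rows are invented for the illustration, and an ORDER BY is added to fix the output order. It says nothing about the timings reported here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE temperature (
      id INTEGER PRIMARY KEY,
      groupID INTEGER NOT NULL,
      recordedTimestamp TEXT NOT NULL,
      recordedValue INTEGER NOT NULL
    );
    CREATE INDEX groupIndex ON temperature (groupID, recordedTimestamp);
    CREATE TABLE selected_group (id INTEGER PRIMARY KEY);
    """
)
# Invented rows: group 1 has two rows sharing the latest timestamp;
# in group 2 the latest timestamp does not belong to the highest id.
conn.executemany(
    "INSERT INTO temperature VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2018-01-01 00:00:00", 10),
        (2, 1, "2018-01-02 00:00:00", 11),
        (3, 1, "2018-01-02 00:00:00", 12),
        (4, 2, "2018-01-03 00:00:00", 20),
        (5, 2, "2018-01-01 00:00:00", 21),
    ],
)
conn.executemany("INSERT INTO selected_group VALUES (?)", [(1,), (2,)])

latest = conn.execute(
    """
    SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
    FROM selected_group g
    INNER JOIN temperature t1 ON t1.id = (
      SELECT t2.id
      FROM temperature t2
      WHERE t2.groupID = g.id
      ORDER BY t2.recordedTimestamp DESC, t2.id DESC
      LIMIT 1
    )
    ORDER BY t1.groupID
    """
).fetchall()
print(latest)
```

Group 1 resolves its timestamp tie in favour of id 3, and group 2 returns id 4 even though id 5 is larger, which is why MAX(id) alone would not be enough for this requirement.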

Example 2

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 

FROM temperature t1

INNER JOIN ( 

  SELECT max(t2.id) AS id   

  FROM temperature t2

  INNER JOIN (

    SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp

    FROM selected_group g

    INNER JOIN temperature t3 ON t3.groupID = g.id

    GROUP BY t3.groupID

  ) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp

  GROUP BY t2.groupID

) t5 ON t5.id = t1.id;

This examined 1,505,331 rows and took ~1.25 seconds on 5.7.21 and slightly longer on 8.0.4-rc

Example 3

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 

FROM temperature t1

WHERE t1.id IN ( 

  SELECT max(t2.id) AS id   

  FROM temperature t2

  INNER JOIN (

    SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp

    FROM selected_group g

    INNER JOIN temperature t3 ON t3.groupID = g.id

    GROUP BY t3.groupID

  ) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp

  GROUP BY t2.groupID

)

ORDER BY t1.groupID;

This examined 3,009,685 rows and took ~1.95 seconds on 5.7.21 and slightly longer on 8.0.4-rc

Example 4

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue

FROM selected_group g

INNER JOIN temperature t1 ON t1.id = (

  SELECT max(t2.id)

  FROM temperature t2 

  WHERE t2.groupID = g.id AND t2.recordedTimestamp = (

      SELECT max(t3.recordedTimestamp)

      FROM temperature t3 

      WHERE t3.groupID = g.id

    )

);

This examined 6,137,810 rows and took ~2.2 seconds on 5.7.21 and slightly longer on 8.0.4-rc

Example 5

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue

FROM (

  SELECT 

    t2.id, 

    t2.groupID, 

    t2.recordedTimestamp, 

    t2.recordedValue, 

    row_number() OVER (

      PARTITION BY t2.groupID ORDER BY t2.recordedTimestamp DESC, t2.id DESC

    ) AS rowNumber

  FROM selected_group g 

  INNER JOIN temperature t2 ON t2.groupID = g.id

) t1 WHERE t1.rowNumber = 1;

This examined 6,017,808 rows and took ~4.2 seconds on 8.0.4-rc

Example 6

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 

FROM (

  SELECT 

    last_value(t2.id) OVER w AS id, 

    t2.groupID, 

    last_value(t2.recordedTimestamp) OVER w AS recordedTimestamp, 

    last_value(t2.recordedValue) OVER w AS recordedValue

  FROM selected_group g

  INNER JOIN temperature t2 ON t2.groupID = g.id

  WINDOW w AS (

    PARTITION BY t2.groupID 

    ORDER BY t2.recordedTimestamp, t2.id 

    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

  )

) t1

GROUP BY t1.groupID;

This examined 6,017,908 rows and took ~17.5 seconds on 8.0.4-rc

Example 7

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 

FROM selected_group g

INNER JOIN temperature t1 ON t1.groupID = g.id

LEFT JOIN temperature t2 

  ON t2.groupID = g.id 

  AND (

    t2.recordedTimestamp > t1.recordedTimestamp 

    OR (t2.recordedTimestamp = t1.recordedTimestamp AND t2.id > t1.id)

  )

WHERE t2.id IS NULL

ORDER BY t1.groupID;

This one was taking forever so I had to kill it.

score 1 · Answer 19 · 2016-06-18 14:21:07Z

select * from messages group by name desc

answered Jun 18 '16 at 14:12 by huuang, edited Jun 18 '16 at 14:21 by Tunaki

this works fine! see here also stackoverflow.com/questions/1313120/… – user2241289, Feb 12 '17 at 18:45

Azathoth · Answer 20 · 2016-11-30 10:50:40Z

How about this:

SELECT DISTINCT ON (name) *

FROM messages

ORDER BY name, id DESC;

I had a similar issue (on PostgreSQL, though) on a table with 1M records. This solution takes 1.7s vs 44s for the one with LEFT JOIN.
In my case I had to filter the counterpart of your name field against NULL values, which improved performance by a further 0.2 seconds.

jabko87 · Answer 21 · 2018-05-02 15:05:59Z

If performance is really your concern you can introduce a new column on the table called IsLastInGroup of type BIT.

Set it to true on the rows which are last in their group and maintain it with every row insert/update/delete. Writes will be slower, but you'll benefit on reads. It depends on your use case and I recommend it only if you're read-focused.

So your query will look like:

SELECT * FROM Messages WHERE IsLastInGroup = 1
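One way the flag could be maintained is with a trigger. The sketch below is an assumption, not something this answer prescribes: it uses SQLite through Python's sqlite3 module, the trigger name messages_mark_last is hypothetical, it handles inserts only (updates and deletes would need their own handling), and it assumes new rows always arrive with a higher Id than existing rows of the same Name.

```python
import sqlite3

ROWS = [
    (1, "A", "A_data_1"),
    (2, "A", "A_data_2"),
    (3, "A", "A_data_3"),
    (4, "B", "B_data_1"),
    (5, "B", "B_data_2"),
    (6, "C", "C_data_1"),
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Messages (
      Id INTEGER PRIMARY KEY,
      Name TEXT NOT NULL,
      Other_Columns TEXT,
      IsLastInGroup INTEGER NOT NULL DEFAULT 1  -- a new row starts as the last one
    );

    -- After each insert, clear the flag on the older rows of the same group.
    CREATE TRIGGER messages_mark_last AFTER INSERT ON Messages
    BEGIN
      UPDATE Messages
      SET IsLastInGroup = 0
      WHERE Name = NEW.Name AND Id < NEW.Id AND IsLastInGroup = 1;
    END;
    """
)
conn.executemany(
    "INSERT INTO Messages (Id, Name, Other_Columns) VALUES (?, ?, ?)", ROWS
)

flagged = conn.execute(
    "SELECT Id, Name, Other_Columns FROM Messages WHERE IsLastInGroup = 1 ORDER BY Id"
).fetchall()
print(flagged)
```

The read query stays a plain filter on the flag, at the cost of one extra UPDATE per insert.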


  `FavoriteCount` int(11) NOT NULL default '0',

  `ClosedDate` datetime default NULL,

  PRIMARY KEY  (`PostId`),

  UNIQUE KEY `PostId` (`PostId`),

  KEY `PostTypeId` (`PostTypeId`),

  KEY `AcceptedAnswerId` (`AcceptedAnswerId`),

  KEY `OwnerUserId` (`OwnerUserId`),

  KEY `LastEditorUserId` (`LastEditorUserId`),

  KEY `ParentId` (`ParentId`),

  CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`PostTypeId`) REFERENCES `posttypes` (`PostTypeId`)

) ENGINE=InnoDB;

Really? What happens if you have a ton of entries? For example, if you're working w/ an in-house version control, say, and you have a ton of versions per file, that join result would be massive. Have you ever benchmarked the subquery method with this one? I'm pretty curious to know which would win, but not curious enough to not ask you first. — Aug 21 '09 at 18:19
Did some testing. On a small table (~300k records, ~190k groups, so not massive groups or anything), the queries tied (8 seconds each). — Aug 21 '09 at 18:44
@BillKarwin: See meta.stackexchange.com/questions/123017, especially the comments below Adam Rackis' answer. Let me know if you want to reclaim your answer on the new question. — Feb 21 '12 at 18:06
@Tim, no, <= will not help if you have a non-unique column. You must use a unique column as a tiebreaker. — Jul 3 '15 at 7:13
The performance degrades exponentially as the number of rows increases or when groups become larger. For example a group consisting of 5 dates will yield 4+3+2+1+1 = 11 rows via left join out of which one row is filtered in the end. Performance of joining with grouped results is almost linear. Your tests look flawed. — Oct 16 '15 at 12:12
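The row count quoted in the last comment can be checked directly. This is a minimal sketch, again using SQLite through Python's sqlite3 module as an assumed stand-in for MySQL, that counts the rows the self LEFT JOIN produces for a single group before the IS NULL filter is applied. For a group of n rows the join yields n(n-1)/2 matched pairs plus one unmatched row, so the logical row count grows quadratically with group size. How many of those rows an engine actually materializes depends on its optimizer.

```python
import sqlite3


def left_join_row_count(group_size):
    """Count the rows the self LEFT JOIN yields for one group of `group_size` rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO messages VALUES (?, 'A')",
        [(i,) for i in range(1, group_size + 1)],
    )
    # No WHERE clause: this counts every joined row, including the single
    # row (the one with the largest id) that has no match on the right side.
    (count,) = conn.execute(
        """
        SELECT COUNT(*)
        FROM messages m1 LEFT JOIN messages m2
          ON (m1.name = m2.name AND m1.id < m2.id)
        """
    ).fetchone()
    conn.close()
    return count
```

For a group of five rows this gives 4+3+2+1+1 = 11, matching the comment.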

score 125 · Answer 23 · 2017-03-31 15:08:26Z

UPD: 2017-03-31. MySQL 5.7.5 enabled the ONLY_FULL_GROUP_BY mode by default, so non-deterministic GROUP BY queries are now rejected. The GROUP BY implementation was also updated, so the solution might not work as expected anymore even with the mode disabled. You need to check.

Bill Karwin's solution above works fine when the item count within groups is rather small, but the performance of the query becomes bad when the groups are rather large, since the solution requires about n*n/2 + n/2 IS NULL comparisons alone.

I ran my tests on an InnoDB table of 18684446 rows with 1182 groups. The table contains test results for functional tests and has (test_id, request_id) as the primary key. Thus, test_id is a group and I was searching for the last request_id for each test_id.

Bill's solution has already been running for several hours on my Dell E4310 and I do not know when it is going to finish, even though it operates on a covering index (hence "Using index" in EXPLAIN).

I have a couple of other solutions that are based on the same ideas:

if the underlying index is BTREE index (which is usually the case), the largest (group_id, item_value) pair is the last value within each group_id, that is the first for each group_id if we walk through the index in descending order;

if we read the values which are covered by an index, the values are read in the order of the index;

each index implicitly contains the primary key columns appended to it (that is, the primary key is in the covering index). In the solutions below I operate directly on the primary key; in your case, you will just need to add the primary key columns to the result.

in many cases it is much cheaper to collect the required row ids in the required order in a subquery and join the result of the subquery on the id. Since for each row in the subquery result MySQL will need a single fetch based on primary key, the subquery will be put first in the join and the rows will be output in the order of the ids in the subquery (if we omit explicit ORDER BY for the join)

3 ways MySQL uses indexes is a great article to understand some details.
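The first idea in the list above can be modelled without a database. In this sketch a sorted Python list of (group_id, item_value) tuples stands in for the BTREE index (an assumption for illustration only; it is not how MySQL stores an index), and walking it in descending order shows that the first entry met for each group is that group's maximum.

```python
def last_per_group_from_index(index_entries):
    """Walk a sorted (group_id, item_value) list backwards and keep the
    first entry seen for each group, which is that group's largest value."""
    result = {}
    for group_id, item_value in reversed(index_entries):
        if group_id not in result:
            result[group_id] = item_value
    return result


# Sorting the tuples mimics the key order of an index on (group_id, item_value).
entries = sorted([(2, 10), (1, 7), (1, 3), (2, 4), (3, 1), (1, 9)])
```

Calling last_per_group_from_index(entries) touches every entry here; a real index scan can skip ahead to the next group instead of reading each entry, which is presumably where the speed of the grouped query comes from.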

Solution 1

This one is incredibly fast, it takes about 0,8 secs on my 18M+ rows:

SELECT test_id, MAX(request_id), request_id

FROM testresults

GROUP BY test_id DESC;

If you want to change the order to ASC, put it in a subquery, return the ids only and use that as the subquery to join to the rest of the columns:

SELECT test_id, request_id

FROM (

    SELECT test_id, MAX(request_id), request_id

    FROM testresults

    GROUP BY test_id DESC) as ids

ORDER BY test_id;

This one takes about 1,2 secs on my data.
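Because both columns of interest are part of the primary key, the same information can be read from test_id and MAX(request_id) alone, without selecting a non-aggregated column. The sketch below shows that form; it is a variation on Solution 1 rather than the query benchmarked above, and it runs on SQLite through Python's sqlite3 module as an assumed stand-in for MySQL. The function name and the result alias are hypothetical.

```python
import sqlite3


def last_request_per_test(rows):
    """Return (test_id, largest request_id) pairs, one per test_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE testresults ("
        " test_id INTEGER, request_id INTEGER,"
        " PRIMARY KEY (test_id, request_id))"
    )
    conn.executemany("INSERT INTO testresults VALUES (?, ?)", rows)
    # Only the grouping column and an aggregate are selected, so the query
    # does not depend on how non-aggregated columns are resolved.
    result = conn.execute(
        """
        SELECT test_id, MAX(request_id) AS last_request_id
        FROM testresults
        GROUP BY test_id
        ORDER BY test_id
        """
    ).fetchall()
    conn.close()
    return result
```

No timing claim is made for this form; it would need to be measured on the original 18M-row table.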

Solution 2

Here is another solution that takes about 19 seconds for my table:

SELECT test_id, request_id

FROM testresults, (SELECT @group:=NULL) as init

WHERE IF(IFNULL(@group, -1)=@group:=test_id, 0, 1)

ORDER BY test_id DESC, request_id DESC

It returns tests in descending order as well. It is much slower since it does a full index scan but it is here to give you an idea how to output N max rows for each group.

The disadvantage of the query is that its result cannot be cached by the query cache.
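For the "N max rows for each group" case mentioned above, here is a portable sketch that swaps the MySQL user-variable trick for a different technique: a correlated COUNT subquery that keeps a row only if fewer than N rows in its group have a larger request_id. It runs on SQLite through Python's sqlite3 module (an assumed stand-in for MySQL), and the function name is hypothetical.

```python
import sqlite3


def top_n_per_group(rows, n):
    """Return the n largest request_ids for each test_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE testresults ("
        " test_id INTEGER, request_id INTEGER,"
        " PRIMARY KEY (test_id, request_id))"
    )
    conn.executemany("INSERT INTO testresults VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT t1.test_id, t1.request_id
        FROM testresults t1
        WHERE (
            -- number of rows in the same group that rank above t1
            SELECT COUNT(*)
            FROM testresults t2
            WHERE t2.test_id = t1.test_id
              AND t2.request_id > t1.request_id
        ) < ?
        ORDER BY t1.test_id DESC, t1.request_id DESC
        """,
        (n,),
    ).fetchall()
    conn.close()
    return result
```

The correlated subquery runs once per row, so this form is likely to be slow on large groups; it is shown for clarity, not speed.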

Please link to a dump of your tables so that people can test it on their platforms. — Feb 3 '15 at 3:44
Solution 1 can't work, you can't select request_id without having that in group by clause, — Mar 9 '17 at 9:57
@giò, this is answer is 5 years old. Until MySQL 5.7.5 ONLY_FULL_GROUP_BY was disabled by default and this solution worked out of the box dev.mysql.com/doc/relnotes/mysql/5.7/en/…. Now I'm not sure if the solution still works when you disable the mode, because the implementation of the GROUP BY has been changed. — Mar 31 '17 at 14:58
If you wanted ASC in the first solution, would it work if you turn MAX to MIN? — May 9 '17 at 15:45

score 82 · Answer 24 · 2009-08-21 17:14:13Z

Use your subquery to return the correct grouping, because you're halfway there.

Try this:

select

    a.*

from

    messages a

    inner join 

        (select name, max(id) as maxid from messages group by name) as b on

        a.id = b.maxid

If it's not id you want the max of:

select

    a.*

from

    messages a

    inner join 

        (select name, max(other_col) as other_col 

         from messages group by name) as b on

        a.name = b.name

        and a.other_col = b.other_col

This way, you avoid correlated subqueries and/or ordering in your subqueries, which tend to be very slow/inefficient.
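When the column being maximised is not unique, the second query can return more than one row per name. The sketch below reproduces that on a small table and then shows one possible tie-break, picking the highest id among the tied rows. It uses SQLite through Python's sqlite3 module as an assumed stand-in for MySQL; the integer other_col values and the helper names are made up for illustration.

```python
import sqlite3

ROWS = [(1, "A", 10), (2, "A", 20), (3, "A", 20), (4, "B", 5)]


def make_db():
    """Small messages table in which two rows of group A tie for the maximum."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_col INTEGER)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", ROWS)
    return conn


def ids_by_max_other_col(conn):
    """Join on (name, MAX(other_col)); tied rows are all returned."""
    rows = conn.execute(
        """
        SELECT a.id
        FROM messages a
        INNER JOIN (SELECT name, MAX(other_col) AS other_col
                    FROM messages GROUP BY name) AS b
          ON a.name = b.name AND a.other_col = b.other_col
        ORDER BY a.id
        """
    ).fetchall()
    return [r[0] for r in rows]


def ids_with_tiebreak(conn):
    """Exactly one row per name: highest other_col, then highest id."""
    rows = conn.execute(
        """
        SELECT a.id
        FROM messages a
        WHERE a.id = (SELECT m.id FROM messages m
                      WHERE m.name = a.name
                      ORDER BY m.other_col DESC, m.id DESC
                      LIMIT 1)
        ORDER BY a.id
        """
    ).fetchall()
    return [r[0] for r in rows]
```

The tie-break version uses a correlated subquery, which trades away the advantage described above, so it is only worth it when exactly one row per group is required.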

edited Aug 21 '09 at 17:14

answered Aug 21 '09 at 17:06

Eric

Note a caveat for the solution with other_col: if that column is not unique you may get multiple records back with the same name, if they tie for max(other_col). I found this post that describes a solution for my needs, where I need exactly one record per name.
– Eric Simonton
Aug 21 '15 at 13:48

In some situations you can only use this solution but not the accepted one.
– tom10271
Sep 4 '15 at 2:59

In my experience, it is grouping the whole damn messages table that tends to be slow/inefficient! In other words, note that the subquery requires a full table scan, and does a grouping on that to boot... unless your optimizer is doing something that mine is not. So this solution depends heavily on holding the entire table in memory.
– Timo
Apr 30 at 14:56

score 35 · Answer 25 · 2012-02-20 21:46:38Z

I arrived at a different solution, which is to get the IDs for the last post within each group, then select from the messages table using the result from the first query as the argument for a WHERE x IN construct:

SELECT id, name, other_columns

FROM messages

WHERE id IN (

    SELECT MAX(id)

    FROM messages

    GROUP BY name

);

I don't know how this performs compared to some of the other solutions, but it worked spectacularly for my table with 3+ million rows. (4 second execution with 1200+ results)

This should work both on MySQL and SQL Server.
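A minimal sketch of the WHERE ... IN form on the question's sample data, with a composite index on (name, id) as a commenter on this answer suggests. It runs on SQLite through Python's sqlite3 module as an assumed stand-in for MySQL; the index name idx_messages_name_id and the function names are hypothetical.

```python
import sqlite3

SAMPLE_ROWS = [
    (1, "A", "A_data_1"), (2, "A", "A_data_2"), (3, "A", "A_data_3"),
    (4, "B", "B_data_1"), (5, "B", "B_data_2"), (6, "C", "C_data_1"),
]


def make_messages_db():
    """Build the sample table plus an index that covers the grouped subquery."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", SAMPLE_ROWS)
    # With (name, id) indexed, MAX(id) per name can be read from the index alone.
    conn.execute("CREATE INDEX idx_messages_name_id ON messages (name, id)")
    return conn


def last_messages_via_in(conn):
    """Select the rows whose id is the largest id of some name group."""
    return conn.execute(
        """
        SELECT id, name, other_columns
        FROM messages
        WHERE id IN (SELECT MAX(id) FROM messages GROUP BY name)
        ORDER BY id  -- added only to make the output order deterministic
        """
    ).fetchall()
```

Whether the index is actually used is up to each engine's optimizer, so it is worth confirming with EXPLAIN on the real table.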

answered Feb 20 '12 at 21:46

JYelton

Just make sure you have an index on (name, id).
– Samuel Åslund
Apr 22 '16 at 11:58

Much better than self joins
– anwerj
Dec 23 '16 at 7:40

I learned something from you that is a good job and this query is faster
– Humphrey
Feb 23 at 7:48

score 25 · Answer 26 · 2013-12-25 08:36:42Z

Solution by subquery (fiddle link):

select * from messages where id in

(select max(id) from messages group by Name)

Solution by join condition (fiddle link):

select m1.* from messages m1 

left outer join messages m2 

on ( m1.id<m2.id and m1.name=m2.name )

where m2.id is null

Reason for this post is to give fiddle link only.
Same SQL is already provided in other answers.

answered Dec 25 '13 at 8:36

Vipin

What's the point of the 'fiddle' if you can't run it?
– Alexander Suraphel
Jul 4 at 9:41

@AlexanderSuraphel MySQL 5.5 is not available in the fiddle now; the fiddle link was created using that. Nowadays the fiddle supports MySQL 5.6. I changed the database to MySQL 5.6 and I am able to build the schema and run the SQL.
– Vipin
Jul 4 at 17:21

score 7 · Answer 27 · 2013-02-14 07:07:11Z

I've not yet tested with large DB but I think this could be faster than joining tables:

SELECT *, Max(Id) FROM messages GROUP BY Name

edited Feb 14 '13 at 7:07

Shai

answered Mar 31 '12 at 14:44

user942821

This returns arbitrary data. In other words, the returned columns might not be from the record with MAX(Id).
– harm
Jul 3 '14 at 15:05

Useful to select the max Id from a set of records with a WHERE condition: "SELECT Max(Id) FROM Prod WHERE Pn='" + Pn + "'". It returns the max Id from a set of records with the same Pn. In C# use reader.GetString(0) to get the result.
– Nicola
Apr 8 '15 at 9:24

Steve Kass 6,1951321 · Answer 28 · 2009-08-21 17:26:12Z

Here are two suggestions. First, if mysql supports ROW_NUMBER(), it's very simple:

WITH Ranked AS (

  SELECT Id, Name, OtherColumns,

    ROW_NUMBER() OVER (

      PARTITION BY Name

      ORDER BY Id DESC

    ) AS rk

  FROM messages

)

SELECT Id, Name, OtherColumns

FROM Ranked

WHERE rk = 1;

I'm assuming by "last" you mean last in Id order. If not, change the ORDER BY clause of the ROW_NUMBER() window accordingly.

Second, if ROW_NUMBER() isn't available, this is often a good way to proceed:

SELECT

  Id, Name, OtherColumns

FROM messages

WHERE NOT EXISTS (

  SELECT * FROM messages as M2

  WHERE M2.Name = messages.Name

  AND M2.Id > messages.Id

)

In other words, select messages where there is no later-Id message with the same Name.
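Both suggestions can be tried side by side on the question's sample data. The sketch below uses SQLite through Python's sqlite3 module as an assumed stand-in for MySQL; window functions need SQLite 3.25 or later, and the outer query of the ROW_NUMBER() form reads from the Ranked CTE. The other_columns column name and the helper names are assumptions for illustration.

```python
import sqlite3

SAMPLE_ROWS = [
    (1, "A", "A_data_1"), (2, "A", "A_data_2"), (3, "A", "A_data_3"),
    (4, "B", "B_data_1"), (5, "B", "B_data_2"), (6, "C", "C_data_1"),
]


def make_messages_db():
    """Build an in-memory copy of the question's messages table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def last_by_row_number(conn):
    """Number rows per name from the newest id down and keep number 1."""
    return conn.execute(
        """
        WITH Ranked AS (
          SELECT id, name, other_columns,
                 ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rk
          FROM messages
        )
        SELECT id, name, other_columns FROM Ranked WHERE rk = 1 ORDER BY id
        """
    ).fetchall()


def last_by_not_exists(conn):
    """Keep rows for which no later-id row with the same name exists."""
    return conn.execute(
        """
        SELECT id, name, other_columns
        FROM messages
        WHERE NOT EXISTS (
          SELECT * FROM messages AS M2
          WHERE M2.name = messages.name AND M2.id > messages.id
        )
        ORDER BY id
        """
    ).fetchall()
```

The NOT EXISTS form needs no window-function support, so it is the one to reach for on older engines.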

MySQL doesn't support ROW_NUMBER() or CTE's.
– Bill Karwin
Aug 21 '09 at 17:37

score 5 · Answer 29 · 2017-06-08 19:03:49Z

Here is my solution:

SELECT 

  DISTINCT NAME,

  MAX(MESSAGES) OVER(PARTITION BY NAME) MESSAGES 

FROM MESSAGE;

edited Jun 8 '17 at 19:03

Paul Roub

answered Jun 8 '17 at 18:49

Abhishek Yadav

score 4 · Answer 30 · 2014-03-30 06:01:52Z

Here is another way to get the last related record: use GROUP_CONCAT with ORDER BY, then SUBSTRING_INDEX to pick the first record from the list.

SELECT 

  `Id`,

  `Name`,

  SUBSTRING_INDEX(

    GROUP_CONCAT(

      `Other_Columns` 

      ORDER BY `Id` DESC 

      SEPARATOR '||'

    ),

    '||',

    1

  ) Other_Columns 

FROM

  messages 

GROUP BY `Name`

The query above groups all the Other_Columns values that share the same Name and, using ORDER BY Id DESC, concatenates them in descending order with the provided separator (in my case I used ||). Applying SUBSTRING_INDEX to this list then picks the first value.

Fiddle Demo

score 4 · Answer 31 · 2014-05-04 11:38:30Z

SELECT 

  column1,

  column2 

FROM

  table_name 

WHERE id IN 

  (SELECT 

    MAX(id) 

  FROM

    table_name 

  GROUP BY column1) 

ORDER BY column1 ;

edited May 4 '14 at 11:38

M Khalid Junaid

answered Apr 11 '14 at 6:55

jeet singh parmar

Could you elaborate a bit on your answer? Why is your query preferable to Vijay's original query?
– janfoeh
May 4 '14 at 11:57

Brock Adams 67.6k14153211 · Answer 32 · 2011-07-15 13:47:27Z

Try this:

SELECT jos_categories.title AS name,

       joined.catid,

       joined.title,

       joined.introtext

FROM   jos_categories

       INNER JOIN (SELECT *

                   FROM   (SELECT `title`,

                                  catid,

                                  `created`,

                                  introtext

                           FROM   `jos_content`

                           WHERE  `sectionid` = 6

                           ORDER  BY `id` DESC) AS yes

                   GROUP  BY `yes`.`catid` DESC

                   ORDER  BY `yes`.`created` DESC) AS joined

         ON( joined.catid = jos_categories.id )

score 3 · Answer 33 · 2015-09-28 09:07:12Z

You can also view it here:

http://sqlfiddle.com/#!9/ef42b/9

FIRST SOLUTION

SELECT d1.ID,Name,City FROM Demo_User d1

INNER JOIN

(SELECT MAX(ID) AS ID FROM Demo_User GROUP By NAME) AS P ON (d1.ID=P.ID);

SECOND SOLUTION

SELECT * FROM (SELECT * FROM Demo_User ORDER BY ID DESC) AS T GROUP BY NAME ;

answered Sep 28 '15 at 9:07

Shrikant Gupta

Second Solution doesn't work for my case
– dikirill
Apr 28 '17 at 18:41

score 2 · Answer 34 · 2010-10-08 01:57:49Z

Is there any way we could use this method to delete duplicates in a table? The result set is basically a collection of unique records, so if we could delete all records not in the result set, we would effectively have no duplicates. I tried this but MySQL gave a 1093 error.

DELETE FROM messages WHERE id NOT IN

 (SELECT m1.id  

 FROM messages m1 LEFT JOIN messages m2  

 ON (m1.name = m2.name AND m1.id < m2.id)  

 WHERE m2.id IS NULL)

Is there a way to maybe save the output to a temp variable then delete from NOT IN (temp variable)? @Bill thanks for a very useful solution.

EDIT: I think I found the solution:

DROP TABLE IF EXISTS UniqueIDs; 

CREATE Temporary table UniqueIDs (id Int(11)); 



INSERT INTO UniqueIDs 

    (SELECT T1.ID FROM Table T1 LEFT JOIN Table T2 ON 

    (T1.Field1 = T2.Field1 AND T1.Field2 = T2.Field2 #Comparison Fields  

    AND T1.ID < T2.ID) 

    WHERE T2.ID IS NULL); 



DELETE FROM Table WHERE id NOT IN (SELECT ID FROM UniqueIDs);
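MySQL raises error 1093 when the table being deleted from is also read directly in the subquery, which is why staging the ids in a temporary table helps. The sketch below shows the same shape on the question's messages table: collect the ids to keep, then delete everything else. It runs on SQLite through Python's sqlite3 module, an assumed stand-in that does not have this restriction, so it only illustrates the sequence of steps. The table name keep_ids and the function name are hypothetical, and it stages MAX(id) per name instead of the LEFT JOIN.

```python
import sqlite3

SAMPLE_ROWS = [
    (1, "A", "A_data_1"), (2, "A", "A_data_2"), (3, "A", "A_data_3"),
    (4, "B", "B_data_1"), (5, "B", "B_data_2"), (6, "C", "C_data_1"),
]


def make_messages_db():
    """Build an in-memory copy of the question's messages table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def delete_all_but_last(conn):
    """Keep only the highest-id row per name; return how many rows were deleted."""
    conn.execute("DROP TABLE IF EXISTS keep_ids")
    conn.execute("CREATE TEMPORARY TABLE keep_ids (id INTEGER PRIMARY KEY)")
    # Step 1: stage the ids of the rows to keep.
    conn.execute("INSERT INTO keep_ids SELECT MAX(id) FROM messages GROUP BY name")
    # Step 2: delete every row whose id was not staged.
    deleted = conn.execute(
        "DELETE FROM messages WHERE id NOT IN (SELECT id FROM keep_ids)"
    ).rowcount
    conn.execute("DROP TABLE keep_ids")
    conn.commit()
    return deleted
```

On a production table it would be prudent to run the staging SELECT on its own first and inspect the ids before deleting anything.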

score 2 · Answer 35 · 2011-11-18 20:21:00Z

The below query will work fine as per your question.

SELECT M1.* 

FROM MESSAGES M1,

(

 SELECT SUBSTR(Others_data,1,2),MAX(Others_data) AS Max_Others_data

 FROM MESSAGES

 GROUP BY 1

) M2

WHERE M1.Others_data = M2.Max_Others_data

ORDER BY Others_data;

edited Nov 18 '11 at 20:21

animuson♦

answered Nov 18 '11 at 20:19

Teja

bikashphp 7429 · Answer 36 · 2014-10-21 14:08:16Z

Hi @Vijay Dev, if your table messages contains Id, which is an auto-increment primary key, then to fetch the latest record based on the primary key your query should read as below:

SELECT m1.* FROM messages m1 INNER JOIN (SELECT max(Id) as lastmsgId FROM messages GROUP BY Name) m2 ON m1.Id=m2.lastmsgId

Wanderer 10.1k42143 · Answer 37 · 2015-11-19 04:36:11Z

If you want the last row for each Name, you can assign a row number to each row, partitioned by Name and ordered by Id in descending order, and then keep the rows numbered 1.

QUERY

SELECT t1.Id, 

       t1.Name, 

       t1.Other_Columns

FROM 

(

     SELECT Id, 

            Name, 

            Other_Columns,

    (

        CASE Name WHEN @curA 

        THEN @curRow := @curRow + 1 

        ELSE @curRow := 1 AND @curA := Name END 

    ) + 1 AS rn 

    FROM messages t, 

    (SELECT @curRow := 0, @curA := '') r 

    ORDER BY Name,Id DESC 

)t1

WHERE t1.rn = 1

ORDER BY t1.Id;

SQL Fiddle

Song Zhengyi 1615 · Answer 38 · 2018-03-10 20:33:11Z

An approach with considerable speed is as follows.

SELECT * 

FROM messages a

WHERE Id = (SELECT MAX(Id) FROM messages WHERE a.Name = Name)

Result

Id  Name    Other_Columns

3   A   A_data_3

5   B   B_data_2

6   C   C_data_1

score 2 · Answer 39 · 2018-04-30 06:20:59Z

Clearly there are lots of different ways of getting the same results; your question seems to be what is an efficient way of getting the last results in each group in MySQL. If you are working with huge amounts of data, and assuming you are using InnoDB with even the latest versions of MySQL (such as 5.7.21 and 8.0.4-rc), then there might not be an efficient way of doing this.

We sometimes need to do this with tables with even more than 60 million rows.

For these examples I will use data with only about 1.5 million rows where the queries would need to find results for all groups in the data. In our actual cases we would often need to return back data from about 2,000 groups (which hypothetically would not require examining very much of the data).

I will use the following tables:

CREATE TABLE temperature(

  id INT UNSIGNED NOT NULL AUTO_INCREMENT, 

  groupID INT UNSIGNED NOT NULL, 

  recordedTimestamp TIMESTAMP NOT NULL, 

  recordedValue INT NOT NULL,

  INDEX groupIndex(groupID, recordedTimestamp), 

  PRIMARY KEY (id)

);



CREATE TEMPORARY TABLE selected_group(id INT UNSIGNED NOT NULL, PRIMARY KEY(id));

The temperature table is populated with about 1.5 million random records, and with 100 different groups.
The selected_group table is populated with those 100 groups (in our cases this would normally be less than 20% of all of the groups).

As this data is random it means that multiple rows can have the same recordedTimestamps. What we want is to get a list of all of the selected groups in order of groupID with the last recordedTimestamp for each group, and if the same group has more than one matching row like that then the last matching id of those rows.

If hypothetically MySQL had a last() function which returned values from the last row in a special ORDER BY clause then we could simply do:

SELECT 

  last(t1.id) AS id, 

  t1.groupID, 

  last(t1.recordedTimestamp) AS recordedTimestamp, 

  last(t1.recordedValue) AS recordedValue

FROM selected_group g

INNER JOIN temperature t1 ON t1.groupID = g.id

ORDER BY t1.recordedTimestamp, t1.id

GROUP BY t1.groupID;

which would only need to examine a few hundred rows in this case, as it doesn't use any of the normal GROUP BY functions. This would execute in 0 seconds and hence be highly efficient.
Note that normally in MySQL we would see an ORDER BY clause following the GROUP BY clause. Here, however, the ORDER BY clause determines the order for the last() function; if it came after the GROUP BY, it would order the groups instead. If no GROUP BY clause is present, then the last values will be the same in all of the returned rows.

However MySQL does not have this so let's look at different ideas of what it does have and prove that none of these are efficient.

Example 1

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue

FROM selected_group g

INNER JOIN temperature t1 ON t1.id = (

  SELECT t2.id

  FROM temperature t2 

  WHERE t2.groupID = g.id

  ORDER BY t2.recordedTimestamp DESC, t2.id DESC

  LIMIT 1

);

This examined 3,009,254 rows and took ~0.859 seconds on 5.7.21 and slightly longer on 8.0.4-rc
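The tie-breaking rule (latest recordedTimestamp, then highest id) is easy to get wrong, so here is a small sketch that runs the Example 1 query on five hand-made rows where two rows of group 1 share the latest timestamp. It uses SQLite through Python's sqlite3 module as an assumed stand-in for MySQL, stores timestamps as ISO text, and adds an ORDER BY so the output is deterministic; the sample data and function names are made up.

```python
import sqlite3

TEMPERATURE_ROWS = [
    (1, 1, "2018-01-01 00:00:00", 10),
    (2, 1, "2018-01-02 00:00:00", 11),
    (3, 1, "2018-01-02 00:00:00", 12),  # same timestamp as id 2, higher id
    (4, 2, "2018-01-03 00:00:00", 20),
    (5, 2, "2018-01-01 00:00:00", 21),  # higher id but older timestamp
]


def make_temperature_db():
    """Create the two tables from the answer with a handful of rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE temperature ("
        " id INTEGER PRIMARY KEY, groupID INTEGER NOT NULL,"
        " recordedTimestamp TEXT NOT NULL, recordedValue INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX groupIndex ON temperature (groupID, recordedTimestamp)"
    )
    conn.execute("CREATE TABLE selected_group (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO temperature VALUES (?, ?, ?, ?)", TEMPERATURE_ROWS)
    conn.executemany("INSERT INTO selected_group VALUES (?)", [(1,), (2,)])
    return conn


def last_row_per_selected_group(conn):
    """Example 1: per group, the row with the latest timestamp, ties by id."""
    return conn.execute(
        """
        SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
        FROM selected_group g
        INNER JOIN temperature t1 ON t1.id = (
          SELECT t2.id
          FROM temperature t2
          WHERE t2.groupID = g.id
          ORDER BY t2.recordedTimestamp DESC, t2.id DESC
          LIMIT 1
        )
        ORDER BY t1.groupID
        """
    ).fetchall()
```

Group 1 resolves to id 3 (the tie on timestamp is broken by id) and group 2 to id 4 (the latest timestamp wins over the higher id).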

Example 2

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 

FROM temperature t1

INNER JOIN ( 

  SELECT max(t2.id) AS id   

  FROM temperature t2

  INNER JOIN (

    SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp

    FROM selected_group g

    INNER JOIN temperature t3 ON t3.groupID = g.id

    GROUP BY t3.groupID

  ) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp

  GROUP BY t2.groupID

) t5 ON t5.id = t1.id;

This examined 1,505,331 rows and took ~1.25 seconds on 5.7.21 and slightly longer on 8.0.4-rc

Example 3

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 

FROM temperature t1

WHERE t1.id IN ( 

  SELECT max(t2.id) AS id   

  FROM temperature t2

  INNER JOIN (

    SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp

    FROM selected_group g

    INNER JOIN temperature t3 ON t3.groupID = g.id

    GROUP BY t3.groupID

  ) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp

  GROUP BY t2.groupID

)

ORDER BY t1.groupID;

This examined 3,009,685 rows and took ~1.95 seconds on 5.7.21 and slightly longer on 8.0.4-rc

Example 4

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue

FROM selected_group g

INNER JOIN temperature t1 ON t1.id = (

  SELECT max(t2.id)

  FROM temperature t2 

  WHERE t2.groupID = g.id AND t2.recordedTimestamp = (

      SELECT max(t3.recordedTimestamp)

      FROM temperature t3 

      WHERE t3.groupID = g.id

    )

);

This examined 6,137,810 rows and took ~2.2 seconds on 5.7.21 and slightly longer on 8.0.4-rc

Example 5

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue

FROM (

  SELECT 

    t2.id, 

    t2.groupID, 

    t2.recordedTimestamp, 

    t2.recordedValue, 

    row_number() OVER (

      PARTITION BY t2.groupID ORDER BY t2.recordedTimestamp DESC, t2.id DESC

    ) AS rowNumber

  FROM selected_group g 

  INNER JOIN temperature t2 ON t2.groupID = g.id

) t1 WHERE t1.rowNumber = 1;

This examined 6,017,808 rows and took ~4.2 seconds on 8.0.4-rc

Example 6

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 

FROM (

  SELECT 

    last_value(t2.id) OVER w AS id, 

    t2.groupID, 

    last_value(t2.recordedTimestamp) OVER w AS recordedTimestamp, 

    last_value(t2.recordedValue) OVER w AS recordedValue

  FROM selected_group g

  INNER JOIN temperature t2 ON t2.groupID = g.id

  WINDOW w AS (

    PARTITION BY t2.groupID 

    ORDER BY t2.recordedTimestamp, t2.id 

    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

  )

) t1

GROUP BY t1.groupID;

This examined 6,017,908 rows and took ~17.5 seconds on 8.0.4-rc

Example 7

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 

FROM selected_group g

INNER JOIN temperature t1 ON t1.groupID = g.id

LEFT JOIN temperature t2 

  ON t2.groupID = g.id 

  AND (

    t2.recordedTimestamp > t1.recordedTimestamp 

    OR (t2.recordedTimestamp = t1.recordedTimestamp AND t2.id > t1.id)

  )

WHERE t2.id IS NULL

ORDER BY t1.groupID;

This one was taking forever so I had to kill it.

score 1 · Answer 40 · 2016-06-18 14:21:07Z

select * from messages group by name desc

edited Jun 18 '16 at 14:21

Tunaki

answered Jun 18 '16 at 14:12

huuang

this works fine! see here also stackoverflow.com/questions/1313120/…
– user2241289
Feb 12 '17 at 18:45

Azathoth 3601522 · Answer 41 · 2016-11-30 10:50:40Z

How about this:

SELECT DISTINCT ON (name) *

FROM messages

ORDER BY name, id DESC;

I had a similar issue (on PostgreSQL, though) on a table with 1M records. This solution takes 1.7s vs 44s for the one with LEFT JOIN.
In my case I had to filter the counterpart of your name field against NULL values, which improved performance by a further 0.2 secs.

jabko87 1,46011221 · Answer 42 · 2018-05-02 15:05:59Z

If performance is really your concern you can introduce a new column on the table called IsLastInGroup of type BIT.

Set it to true on the rows which are last in their group and maintain it with every row insert/update/delete. Writes will be slower, but you'll benefit on reads. It depends on your use case and I recommend it only if you're read-focused.

So your query will look like:

SELECT * FROM Messages WHERE IsLastInGroup = 1
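Keeping the flag correct is the hard part. This is a minimal sketch of the insert path only, assuming the newest row in a group is always the one just inserted: clear the flag on the group's current last row and set it on the new row, inside one transaction. It runs on SQLite through Python's sqlite3 module as an assumed stand-in for MySQL; the function names are hypothetical, and updates and deletes would need matching logic that is not shown.

```python
import sqlite3


def create_schema(conn):
    """Messages table with the extra IsLastInGroup flag column."""
    conn.execute(
        "CREATE TABLE messages ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " name TEXT NOT NULL,"
        " other_columns TEXT,"
        " IsLastInGroup INTEGER NOT NULL DEFAULT 0)"
    )


def insert_message(conn, name, other_columns):
    """Insert a row and move the group's 'last' flag onto it atomically."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "UPDATE messages SET IsLastInGroup = 0"
            " WHERE name = ? AND IsLastInGroup = 1",
            (name,),
        )
        cur = conn.execute(
            "INSERT INTO messages (name, other_columns, IsLastInGroup)"
            " VALUES (?, ?, 1)",
            (name, other_columns),
        )
    return cur.lastrowid


def last_messages(conn):
    """Read path: one flagged row per group, no grouping or join needed."""
    return conn.execute(
        "SELECT id, name, other_columns FROM messages"
        " WHERE IsLastInGroup = 1 ORDER BY id"
    ).fetchall()
```

An index on the flag column (or on name plus the flag) would probably be needed for the read to be cheap on a large table.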


Retrieving the last record in each group - MySQL
