Retrieving the last record in each group - MySQL











up vote
725
down vote

favorite
323












There is a table messages that contains data as shown below:



Id   Name   Other_Columns
-------------------------
1 A A_data_1
2 A A_data_2
3 A A_data_3
4 B B_data_1
5 B B_data_2
6 C C_data_1


If I run a query select * from messages group by name, I will get the result as:



1    A       A_data_1
4 B B_data_1
6 C C_data_1


What query will return the following result?



3    A       A_data_3
5 B B_data_2
6 C C_data_1


That is, the last record in each group should be returned.



At present, this is the query that I use:



SELECT
*
FROM (SELECT
*
FROM messages
ORDER BY id DESC) AS x
GROUP BY name


But this looks highly inefficient. Any other ways to achieve the same result?






























  • 2




    see accepted answer in stackoverflow.com/questions/1379565/… for a more efficient solution
    – eyaler
    Jun 25 '12 at 12:45










  • Duplicate of stackoverflow.com/q/121387/684229
    – TMS
    Jun 14 '13 at 20:10






  • 4




    Why can't you just add DESC, i.e. select * from messages group by name DESC
    – Kim Prince
    Dec 3 '15 at 6:41










  • Possible duplicate of How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?
    – Ciro Santilli 新疆改造中心 六四事件 法轮功
    Jun 12 '16 at 22:19






  • 1




    @KimPrince It seems like the answer you are suggesting doesn't do what is expected! I just tried your method and it took FIRST row for each group and ordered DESC. It does NOT take the last row of each group
    – Ayrat
    May 22 '17 at 15:34






















sql mysql group-by greatest-n-per-group














edited Mar 26 at 9:50









DineshDB

3,89431938














asked Aug 21 '09 at 17:04









Vijay Dev

12.5k166794
























21 Answers























up vote
750
down vote



accepted










MySQL 8.0 now supports windowing functions, like almost all popular SQL implementations. With this standard syntax, we can write greatest-n-per-group queries:



WITH ranked_messages AS (
SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
FROM messages AS m
)
SELECT * FROM ranked_messages WHERE rn = 1;
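As a quick sanity check of the window-function query against the question's sample data, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL 8.0 (an assumption made only so the example is self-contained; SQLite needs version 3.25 or later for window functions), and the lowercase column names are mine.

```python
import sqlite3

# SQLite stands in for MySQL 8.0 here (assumption for a self-contained demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (1, "A", "A_data_1"), (2, "A", "A_data_2"), (3, "A", "A_data_3"),
        (4, "B", "B_data_1"), (5, "B", "B_data_2"), (6, "C", "C_data_1"),
    ],
)

# Number the rows of each name from the highest id down, then keep row 1.
rows = conn.execute("""
    WITH ranked_messages AS (
        SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
        FROM messages AS m
    )
    SELECT id, name, other_columns
    FROM ranked_messages
    WHERE rn = 1
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

Changing `rn = 1` to `rn <= N` would return the last N rows per group with the same query shape.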


Below is the original answer I wrote for this question in 2009:





I write the solution this way:



SELECT m1.*
FROM messages m1 LEFT JOIN messages m2
ON (m1.name = m2.name AND m1.id < m2.id)
WHERE m2.id IS NULL;
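To see what the LEFT JOIN does on the question's sample table, the following sketch runs it through Python's sqlite3 module. SQLite is only a convenient stand-in for MySQL here (my assumption, not part of the original benchmark), and the lowercase column names are mine.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption for a self-contained demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (1, "A", "A_data_1"), (2, "A", "A_data_2"), (3, "A", "A_data_3"),
        (4, "B", "B_data_1"), (5, "B", "B_data_2"), (6, "C", "C_data_1"),
    ],
)

# A row m1 survives only if no row m2 with the same name has a larger id,
# i.e. the outer join found no partner and m2.id came back as NULL.
rows = conn.execute("""
    SELECT m1.id, m1.name, m1.other_columns
    FROM messages m1 LEFT JOIN messages m2
      ON (m1.name = m2.name AND m1.id < m2.id)
    WHERE m2.id IS NULL
    ORDER BY m1.id
""").fetchall()

for row in rows:
    print(row)
```

The ORDER BY is added only to make the output order predictable; it is not needed for correctness.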


Regarding performance, one solution or the other can be better, depending on the nature of your data. So you should test both queries and use the one that performs better on your database.



For example, I have a copy of the StackOverflow August data dump. I'll use that for benchmarking. There are 1,114,357 rows in the Posts table. This is running on MySQL 5.0.75 on my Macbook Pro 2.40GHz.



I'll write a query to find the most recent post for a given user ID (mine).



First using the technique shown by @Eric with the GROUP BY in a subquery:



SELECT p1.postid
FROM Posts p1
INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid
FROM Posts pi GROUP BY pi.owneruserid) p2
ON (p1.postid = p2.maxpostid)
WHERE p1.owneruserid = 20860;

1 row in set (1 min 17.89 sec)


Even the EXPLAIN analysis takes over 16 seconds:



+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 76756 | |
| 1 | PRIMARY | p1 | eq_ref | PRIMARY,PostId,OwnerUserId | PRIMARY | 8 | p2.maxpostid | 1 | Using where |
| 2 | DERIVED | pi | index | NULL | OwnerUserId | 8 | NULL | 1151268 | Using index |
+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
3 rows in set (16.09 sec)


Now produce the same query result using my technique with LEFT JOIN:



SELECT p1.postid
FROM Posts p1 LEFT JOIN posts p2
ON (p1.owneruserid = p2.owneruserid AND p1.postid < p2.postid)
WHERE p2.postid IS NULL AND p1.owneruserid = 20860;

1 row in set (0.28 sec)


The EXPLAIN analysis shows that both tables are able to use their indexes:



+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
| 1 | SIMPLE | p1 | ref | OwnerUserId | OwnerUserId | 8 | const | 1384 | Using index |
| 1 | SIMPLE | p2 | ref | PRIMARY,PostId,OwnerUserId | OwnerUserId | 8 | const | 1384 | Using where; Using index; Not exists |
+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
2 rows in set (0.00 sec)




Here's the DDL for my Posts table:



CREATE TABLE `posts` (
`PostId` bigint(20) unsigned NOT NULL auto_increment,
`PostTypeId` bigint(20) unsigned NOT NULL,
`AcceptedAnswerId` bigint(20) unsigned default NULL,
`ParentId` bigint(20) unsigned default NULL,
`CreationDate` datetime NOT NULL,
`Score` int(11) NOT NULL default '0',
`ViewCount` int(11) NOT NULL default '0',
`Body` text NOT NULL,
`OwnerUserId` bigint(20) unsigned NOT NULL,
`OwnerDisplayName` varchar(40) default NULL,
`LastEditorUserId` bigint(20) unsigned default NULL,
`LastEditDate` datetime default NULL,
`LastActivityDate` datetime default NULL,
`Title` varchar(250) NOT NULL default '',
`Tags` varchar(150) NOT NULL default '',
`AnswerCount` int(11) NOT NULL default '0',
`CommentCount` int(11) NOT NULL default '0',
`FavoriteCount` int(11) NOT NULL default '0',
`ClosedDate` datetime default NULL,
PRIMARY KEY (`PostId`),
UNIQUE KEY `PostId` (`PostId`),
KEY `PostTypeId` (`PostTypeId`),
KEY `AcceptedAnswerId` (`AcceptedAnswerId`),
KEY `OwnerUserId` (`OwnerUserId`),
KEY `LastEditorUserId` (`LastEditorUserId`),
KEY `ParentId` (`ParentId`),
CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`PostTypeId`) REFERENCES `posttypes` (`PostTypeId`)
) ENGINE=InnoDB;
























  • 7




    Really? What happens if you have a ton of entries? For example, if you're working w/ an in-house version control, say, and you have a ton of versions per file, that join result would be massive. Have you ever benchmarked the subquery method with this one? I'm pretty curious to know which would win, but not curious enough to not ask you first.
    – Eric
    Aug 21 '09 at 18:19






  • 1




    Did some testing. On a small table (~300k records, ~190k groups, so not massive groups or anything), the queries tied (8 seconds each).
    – Eric
    Aug 21 '09 at 18:44






  • 1




    @BillKarwin: See meta.stackexchange.com/questions/123017, especially the comments below Adam Rackis' answer. Let me know if you want to reclaim your answer on the new question.
    – Robert Harvey
    Feb 21 '12 at 18:06








  • 2




    @Tim, no, <= will not help if you have a non-unique column. You must use a unique column as a tiebreaker.
    – Bill Karwin
    Jul 3 '15 at 7:13






  • 1




    The performance degrades exponentially as the number of rows increases or when groups become larger. For example a group consisting of 5 dates will yield 4+3+2+1+1 = 11 rows via left join out of which one row is filtered in the end. Performance of joining with grouped results is almost linear. Your tests look flawed.
    – Salman A
    Oct 16 '15 at 12:12


















up vote
125
down vote













UPD 2017-03-31: MySQL 5.7.5 enabled the ONLY_FULL_GROUP_BY mode by default, so non-deterministic GROUP BY queries are now rejected. The GROUP BY implementation was also updated, so the solution might no longer work as expected even with the mode disabled. Check before relying on it.



Bill Karwin's solution above works fine when the item count within groups is rather small, but the query performs badly when the groups are rather large, since the solution requires about n*n/2 + n/2 comparisons for the IS NULL check alone.
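To make the quadratic growth concrete, here is a small sketch that counts the rows the self-join produces for a single group of n rows before the IS NULL filter is applied. It uses Python's sqlite3 as a stand-in for MySQL, and the helper name `join_rows_for_group` is hypothetical. This is a logical row count only; it says nothing about how MySQL's optimizer actually executes the join.

```python
import sqlite3


def join_rows_for_group(n):
    """Count LEFT JOIN result rows for one group of n rows (before filtering)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO messages VALUES (?, 'A')", [(i,) for i in range(1, n + 1)]
    )
    (count,) = conn.execute("""
        SELECT COUNT(*)
        FROM messages m1 LEFT JOIN messages m2
          ON (m1.name = m2.name AND m1.id < m2.id)
    """).fetchone()
    conn.close()
    return count


# Each row pairs with every later row of its group; the last row pairs with
# nothing and contributes one NULL-extended row: n*(n-1)/2 + 1 rows in total.
for n in (5, 100, 1000):
    print(n, join_rows_for_group(n), n * (n - 1) // 2 + 1)
```

For a group of 5 this gives 4+3+2+1+1 = 11 rows, the same order of growth as the estimate quoted above.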



I ran my tests on an InnoDB table of 18684446 rows with 1182 groups. The table contains testresults for functional tests and has (test_id, request_id) as the primary key. Thus, test_id is a group and I was searching for the last request_id for each test_id.



Bill's solution has already been running for several hours on my Dell E4310 and I do not know when it is going to finish, even though it operates on a covering index (hence Using index in EXPLAIN).



I have a couple of other solutions that are based on the same ideas:




  • if the underlying index is a BTREE index (which is usually the case), the largest (group_id, item_value) pair is the last value within each group_id, that is, the first for each group_id if we walk through the index in descending order;

  • if we read the values which are covered by an index, the values are read in the order of the index;

  • each index implicitly contains the primary key columns appended to it (that is, the primary key is in the covering index). In the solutions below I operate directly on the primary key; in your case, you will just need to add the primary key columns to the result;

  • in many cases it is much cheaper to collect the required row ids in the required order in a subquery and join the result of the subquery on the id. Since for each row in the subquery result MySQL will need a single fetch based on the primary key, the subquery will be put first in the join and the rows will be output in the order of the ids in the subquery (if we omit an explicit ORDER BY for the join).


3 ways MySQL uses indexes is a great article to understand some details.



Solution 1



This one is incredibly fast; it takes about 0.8 seconds on my 18M+ rows:



SELECT test_id, MAX(request_id), request_id
FROM testresults
GROUP BY test_id DESC;


If you want to change the order to ASC, put it in a subquery, return the ids only and use that as the subquery to join to the rest of the columns:



SELECT test_id, request_id
FROM (
SELECT test_id, MAX(request_id), request_id
FROM testresults
GROUP BY test_id DESC) as ids
ORDER BY test_id;


This one takes about 1.2 seconds on my data.



Solution 2



Here is another solution that takes about 19 seconds for my table:



SELECT test_id, request_id
FROM testresults, (SELECT @group:=NULL) as init
WHERE IF(IFNULL(@group, -1)=@group:=test_id, 0, 1)
ORDER BY test_id DESC, request_id DESC


It returns tests in descending order as well. It is much slower since it does a full index scan but it is here to give you an idea how to output N max rows for each group.



The disadvantage of the query is that its result cannot be cached by the query cache.





























  • A related answer: stackoverflow.com/a/14836418/68998
    – newtover
    Feb 13 '13 at 9:13












  • Please link to a dump of your tables so that people can test it on their platforms.
    – Pacerier
    Feb 3 '15 at 3:44






  • 2




    Solution 1 can't work, you can't select request_id without having that in group by clause,
    – giò
    Mar 9 '17 at 9:57






  • 2




    @giò, this answer is 5 years old. Until MySQL 5.7.5, ONLY_FULL_GROUP_BY was disabled by default and this solution worked out of the box: dev.mysql.com/doc/relnotes/mysql/5.7/en/…. Now I'm not sure if the solution still works when you disable the mode, because the implementation of GROUP BY has been changed.
    – newtover
    Mar 31 '17 at 14:58










  • If you wanted ASC in the first solution, would it work if you turn MAX to MIN?
    – Jin Izzraeel
    May 9 '17 at 15:45


















up vote
82
down vote













Use your subquery to return the correct grouping, because you're halfway there.



Try this:



select
a.*
from
messages a
inner join
(select name, max(id) as maxid from messages group by name) as b on
a.id = b.maxid


If it's not id you want the max of:



select
a.*
from
messages a
inner join
(select name, max(other_col) as other_col
from messages group by name) as b on
a.name = b.name
and a.other_col = b.other_col


This way, you avoid correlated subqueries and/or ordering in your subqueries, which tend to be very slow/inefficient.
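Here is a small sketch of both variants, run with Python's sqlite3 as a stand-in for MySQL. The `other_col` values are invented for illustration, and they deliberately include a tie so you can see that the second variant can return more than one row per name when the maximum is not unique.

```python
import sqlite3

# SQLite stands in for MySQL; the other_col values are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_col INTEGER)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [(1, "A", 10), (2, "A", 30), (3, "A", 30), (4, "B", 7), (5, "B", 5)],
)

# Variant 1: join on the per-name maximum id (unique, so one row per name).
by_id = conn.execute("""
    SELECT a.id, a.name
    FROM messages a
    INNER JOIN (SELECT name, MAX(id) AS maxid FROM messages GROUP BY name) AS b
      ON a.id = b.maxid
    ORDER BY a.id
""").fetchall()

# Variant 2: join on the per-name maximum of a non-unique column.
by_other_col = conn.execute("""
    SELECT a.id, a.name, a.other_col
    FROM messages a
    INNER JOIN (SELECT name, MAX(other_col) AS other_col
                FROM messages GROUP BY name) AS b
      ON a.name = b.name AND a.other_col = b.other_col
    ORDER BY a.id
""").fetchall()

print(by_id)         # one row per name
print(by_other_col)  # name A appears twice because ids 2 and 3 tie on other_col
```

If you need exactly one row per name in the second case, you need a unique column as a tiebreaker.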

























  • 1




    Note a caveat for the solution with other_col: if that column is not unique you may get multiple records back with the same name, if they tie for max(other_col). I found this post that describes a solution for my needs, where I need exactly one record per name.
    – Eric Simonton
    Aug 21 '15 at 13:48












    In some situations you can only use this solution but not the accepted one.
    – tom10271
    Sep 4 '15 at 2:59










  • In my experience, it is grouping the whole damn messages table that tends to be slow/inefficient! In other words, note that the subquery requires a full table scan, and does a grouping on that to boot... unless your optimizer is doing something that mine is not. So this solution depends heavily on holding the entire table in memory.
    – Timo
    Apr 30 at 14:56




















up vote
35
down vote













I arrived at a different solution, which is to get the IDs for the last post within each group, then select from the messages table using the result from the first query as the argument for a WHERE x IN construct:



SELECT id, name, other_columns
FROM messages
WHERE id IN (
SELECT MAX(id)
FROM messages
GROUP BY name
);


I don't know how this performs compared to some of the other solutions, but it worked spectacularly for my table with 3+ million rows. (4 second execution with 1200+ results)
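If you want to convince yourself that the IN (SELECT MAX(id) ...) form picks the right rows, a sketch like the following cross-checks it against a plain Python computation on seeded random data. It uses sqlite3 as a stand-in for MySQL; the index name `idx_name_id` and the generated data are my own assumptions, and the timing figures above are not reproduced here.

```python
import random
import sqlite3

random.seed(0)  # fixed seed so the generated data is reproducible

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
)
data = [(i, random.choice("ABCDEFGH"), f"data_{i}") for i in range(1, 1001)]
conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", data)

# Composite index on (name, id); hypothetical name, added for illustration.
conn.execute("CREATE INDEX idx_name_id ON messages (name, id)")

rows = conn.execute("""
    SELECT id, name, other_columns
    FROM messages
    WHERE id IN (
        SELECT MAX(id)
        FROM messages
        GROUP BY name
    )
    ORDER BY id
""").fetchall()

# Reference result: walk the rows in id order and remember the last per name.
last_by_name = {}
for row in data:
    last_by_name[row[1]] = row
expected = sorted(last_by_name.values())

print(rows == expected, len(rows))
```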



This should work both on MySQL and SQL Server.



























  • Just make sure you have an index on (name, id).
    – Samuel Åslund
    Apr 22 '16 at 11:58






  • 1




    Much better than self joins
    – anwerj
    Dec 23 '16 at 7:40










  • I learned something from you that is a good job and this query is faster
    – Humphrey
    Feb 23 at 7:48


















up vote
25
down vote













Solution by sub query fiddle Link



select * from messages where id in
(select max(id) from messages group by Name)


Solution By join condition fiddle link



select m1.* from messages m1 
left outer join messages m2
on ( m1.id<m2.id and m1.name=m2.name )
where m2.id is null


Reason for this post is to give fiddle link only.
Same SQL is already provided in other answers.



























  • What's the point of the 'fiddle' if you can't run it?
    – Alexander Suraphel
    Jul 4 at 9:41










  • @AlexanderSuraphel mysql5.5 is not available in fiddle now, fiddle link was created using that. Now a days fiddle supports mysql5.6, i changed database to mysql 5.6 and i am able to build schema and run the sql.
    – Vipin
    Jul 4 at 17:21


















up vote
7
down vote













I've not yet tested with large DB but I think this could be faster than joining tables:



SELECT *, Max(Id) FROM messages GROUP BY Name
























  • 4




    This returns arbitrary data. In other words, the returned columns might not be from the record with MAX(Id).
    – harm
    Jul 3 '14 at 15:05










  • Useful to select the max Id from a set of record with WHERE condition : "SELECT Max(Id) FROM Prod WHERE Pn='" + Pn + "'" It returns the max Id from a set of records with same Pn.In c# use reader.GetString(0) to get the result
    – Nicola
    Apr 8 '15 at 9:24




















up vote
5
down vote













Here are two suggestions. First, if mysql supports ROW_NUMBER(), it's very simple:



WITH Ranked AS (
SELECT Id, Name, OtherColumns,
ROW_NUMBER() OVER (
PARTITION BY Name
ORDER BY Id DESC
) AS rk
FROM messages
)
SELECT Id, Name, OtherColumns
FROM Ranked
WHERE rk = 1;


I'm assuming by "last" you mean last in Id order. If not, change the ORDER BY clause of the ROW_NUMBER() window accordingly.



Second, if ROW_NUMBER() isn't available, this is often a good way to proceed:



SELECT
Id, Name, OtherColumns
FROM messages
WHERE NOT EXISTS (
SELECT * FROM messages as M2
WHERE M2.Name = messages.Name
AND M2.Id > messages.Id
)


In other words, select messages where there is no later-Id message with the same Name.
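The NOT EXISTS form and the LEFT JOIN ... IS NULL form are two spellings of the same anti-join. The sketch below checks that they agree on seeded random data, using Python's sqlite3 as a stand-in for MySQL; the generated data and the lowercase `other_columns` name are assumptions for illustration.

```python
import random
import sqlite3

random.seed(1)  # fixed seed so the generated data is reproducible

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [(i, random.choice("ABCDE"), f"data_{i}") for i in range(1, 201)],
)

# Keep a message only if no later-id message exists with the same name.
not_exists = conn.execute("""
    SELECT id, name, other_columns
    FROM messages
    WHERE NOT EXISTS (
        SELECT * FROM messages AS m2
        WHERE m2.name = messages.name
          AND m2.id > messages.id
    )
    ORDER BY id
""").fetchall()

# Same condition written as an outer join that found no partner row.
left_join = conn.execute("""
    SELECT m1.id, m1.name, m1.other_columns
    FROM messages m1 LEFT JOIN messages m2
      ON (m1.name = m2.name AND m1.id < m2.id)
    WHERE m2.id IS NULL
    ORDER BY m1.id
""").fetchall()

print(not_exists == left_join, len(not_exists))
```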























  • 8




    MySQL doesn't support ROW_NUMBER() or CTE's.
    – Bill Karwin
    Aug 21 '09 at 17:37


















up vote
5
down vote













Here is my solution:



SELECT 
DISTINCT NAME,
MAX(MESSAGES) OVER(PARTITION BY NAME) MESSAGES
FROM MESSAGE;



































    up vote
    4
    down vote













    Here is another way to get the last related record: use GROUP_CONCAT with ORDER BY, then SUBSTRING_INDEX to pick the first record from the list.



    SELECT 
    MAX(`Id`) AS `Id`,
    `Name`,
    SUBSTRING_INDEX(
    GROUP_CONCAT(
    `Other_Columns`
    ORDER BY `Id` DESC
    SEPARATOR '||'
    ),
    '||',
    1
    ) Other_Columns
    FROM
    messages
    GROUP BY `Name`


    The query above concatenates all the Other_Columns values within each Name group, ordered by Id descending and joined with the provided separator (I used ||). SUBSTRING_INDEX over this list then picks the first one.



    Fiddle Demo




































      up vote
      4
      down vote













      SELECT 
      column1,
      column2
      FROM
      table_name
      WHERE id IN
      (SELECT
      MAX(id)
      FROM
      table_name
      GROUP BY column1)
      ORDER BY column1 ;




























      • Could you elaborate a bit on your answer? Why is your query preferable to Vijay's original query?
        – janfoeh
        May 4 '14 at 11:57


















      up vote
      3
      down vote













      Try this:



      SELECT jos_categories.title AS name,
      joined.catid,
      joined.title,
      joined.introtext
      FROM jos_categories
      INNER JOIN (SELECT *
      FROM (SELECT `title`,
      catid,
      `created`,
      introtext
      FROM `jos_content`
      WHERE `sectionid` = 6
      ORDER BY `id` DESC) AS yes
      GROUP BY `yes`.`catid` DESC
      ORDER BY `yes`.`created` DESC) AS joined
      ON( joined.catid = jos_categories.id )



































        up vote
        3
        down vote













        You can also view it here:



        http://sqlfiddle.com/#!9/ef42b/9



        FIRST SOLUTION



        SELECT d1.ID,Name,City FROM Demo_User d1
        INNER JOIN
        (SELECT MAX(ID) AS ID FROM Demo_User GROUP By NAME) AS P ON (d1.ID=P.ID);


        SECOND SOLUTION



        SELECT * FROM (SELECT * FROM Demo_User ORDER BY ID DESC) AS T GROUP BY NAME ;


























        • Second Solution doesn't work for my case
          – dikirill
          Apr 28 '17 at 18:41


















        up vote
        2
        down vote













        Is there any way we could use this method to delete duplicates in a table? The result set is basically a collection of unique records, so if we could delete all records not in the result set, we would effectively have no duplicates. I tried this, but MySQL gave a 1093 error.



        DELETE FROM messages WHERE id NOT IN
        (SELECT m1.id
        FROM messages m1 LEFT JOIN messages m2
        ON (m1.name = m2.name AND m1.id < m2.id)
        WHERE m2.id IS NULL)


        Is there a way to maybe save the output to a temp variable then delete from NOT IN (temp variable)? @Bill thanks for a very useful solution.



        EDIT: I think I found the solution:



        DROP TABLE IF EXISTS UniqueIDs; 
        CREATE Temporary table UniqueIDs (id Int(11));

        INSERT INTO UniqueIDs
        (SELECT T1.ID FROM Table T1 LEFT JOIN Table T2 ON
        (T1.Field1 = T2.Field1 AND T1.Field2 = T2.Field2 #Comparison Fields
        AND T1.ID < T2.ID)
        WHERE T2.ID IS NULL);

        DELETE FROM Table WHERE id NOT IN (SELECT ID FROM UniqueIDs);
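The temporary-table pattern can be sketched end to end as follows, using Python's sqlite3 as a stand-in for MySQL. Note the assumption: SQLite does not raise MySQL's error 1093 for a DELETE that reads from its own target table, so this only illustrates the two-step shape of the workaround, not the error itself. The table and column names follow the question's messages table rather than the generic Table/Field1 placeholders.

```python
import sqlite3

# SQLite stands in for MySQL; it does not reproduce error 1093 (assumption noted above).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "B"), (6, "C")],
)

# Step 1: store the ids to keep (the last row of each name) in a temporary table.
conn.execute("CREATE TEMPORARY TABLE unique_ids (id INTEGER PRIMARY KEY)")
conn.execute("""
    INSERT INTO unique_ids
    SELECT m1.id
    FROM messages m1 LEFT JOIN messages m2
      ON (m1.name = m2.name AND m1.id < m2.id)
    WHERE m2.id IS NULL
""")

# Step 2: delete everything else; the subquery no longer reads from messages.
conn.execute("DELETE FROM messages WHERE id NOT IN (SELECT id FROM unique_ids)")

remaining = conn.execute("SELECT id, name FROM messages ORDER BY id").fetchall()
print(remaining)
```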



































          up vote
          2
          down vote













          The below query will work fine as per your question.



          SELECT M1.* 
          FROM MESSAGES M1,
          (
          SELECT SUBSTR(Others_data,1,2),MAX(Others_data) AS Max_Others_data
          FROM MESSAGES
          GROUP BY 1
          ) M2
          WHERE M1.Others_data = M2.Max_Others_data
          ORDER BY Others_data;



































            up vote
            2
            down vote













            Hi @Vijay Dev, if your table messages contains an Id column that is an auto-increment primary key, then to fetch the latest record based on the primary key your query should read as below:



            SELECT m1.* FROM messages m1 INNER JOIN (SELECT max(Id) as lastmsgId FROM messages GROUP BY Name) m2 ON m1.Id=m2.lastmsgId

































              up vote
              2
              down vote













              If you want the last row for each Name, you can assign a row number to each row, partitioned by Name and ordered by Id in descending order.



              QUERY



              SELECT t1.Id, 
              t1.Name,
              t1.Other_Columns
              FROM
              (
              SELECT Id,
              Name,
              Other_Columns,
              (
              CASE Name WHEN @curA
              THEN @curRow := @curRow + 1
              ELSE @curRow := 1 AND @curA := Name END
              ) + 1 AS rn
              FROM messages t,
              (SELECT @curRow := 0, @curA := '') r
              ORDER BY Name,Id DESC
              )t1
              WHERE t1.rn = 1
              ORDER BY t1.Id;


              SQL Fiddle


































                up vote
                2
                down vote













                An approach with considerable speed is as follows.



                SELECT * 
                FROM messages a
                WHERE Id = (SELECT MAX(Id) FROM messages WHERE a.Name = Name)


                Result



                Id  Name    Other_Columns
                3 A A_data_3
                5 B B_data_2
                6 C C_data_1

































                  up vote
                  2
                  down vote













                  Clearly there are lots of different ways of getting the same results. Your question seems to be: what is an efficient way of getting the last result in each group in MySQL? If you are working with huge amounts of data and using InnoDB, even with the latest versions of MySQL (such as 5.7.21 and 8.0.4-rc), there might not be an efficient way of doing this.



                  We sometimes need to do this with tables with even more than 60 million rows.



                  For these examples I will use data with only about 1.5 million rows where the queries would need to find results for all groups in the data. In our actual cases we would often need to return back data from about 2,000 groups (which hypothetically would not require examining very much of the data).



                  I will use the following tables:



                  CREATE TABLE temperature(
                  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
                  groupID INT UNSIGNED NOT NULL,
                  recordedTimestamp TIMESTAMP NOT NULL,
                  recordedValue INT NOT NULL,
                  INDEX groupIndex(groupID, recordedTimestamp),
                  PRIMARY KEY (id)
                  );

                  CREATE TEMPORARY TABLE selected_group(id INT UNSIGNED NOT NULL, PRIMARY KEY(id));


                  The temperature table is populated with about 1.5 million random records, and with 100 different groups.
                  The selected_group is populated with those 100 groups (in our cases this would normally be less than 20% of all the groups).



                  As this data is random it means that multiple rows can have the same recordedTimestamps. What we want is to get a list of all of the selected groups in order of groupID with the last recordedTimestamp for each group, and if the same group has more than one matching row like that then the last matching id of those rows.



                  If hypothetically MySQL had a last() function which returned values from the last row in a special ORDER BY clause then we could simply do:



                  SELECT 
                  last(t1.id) AS id,
                  t1.groupID,
                  last(t1.recordedTimestamp) AS recordedTimestamp,
                  last(t1.recordedValue) AS recordedValue
                  FROM selected_group g
                  INNER JOIN temperature t1 ON t1.groupID = g.id
                  ORDER BY t1.recordedTimestamp, t1.id
                  GROUP BY t1.groupID;


                  which would only need to examine a few hundred rows in this case, as it doesn't use any of the normal GROUP BY functions. It would execute almost instantly and hence be highly efficient.
                  Note that normally in MySQL an ORDER BY clause follows the GROUP BY clause; here, however, the ORDER BY clause determines the order used by the last() function. If it came after the GROUP BY, it would order the groups instead. If no GROUP BY clause were present, the last values would be the same in every returned row.



                  However, MySQL does not have this, so let's look at different ideas using what it does have and show that none of these are efficient.



                  Example 1



                  SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
                  FROM selected_group g
                  INNER JOIN temperature t1 ON t1.id = (
                  SELECT t2.id
                  FROM temperature t2
                  WHERE t2.groupID = g.id
                  ORDER BY t2.recordedTimestamp DESC, t2.id DESC
                  LIMIT 1
                  );


                  This examined 3,009,254 rows and took ~0.859 seconds on 5.7.21 and slightly longer on 8.0.4-rc
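For anyone who wants to check what Example 1 returns without loading 1.5 million rows, here is a small sketch that runs the same query shape on a handful of made-up rows. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, so the column types are simplified and it says nothing about MySQL timings. The trailing ORDER BY is my addition, only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE temperature (
        id INTEGER PRIMARY KEY,
        groupID INTEGER NOT NULL,
        recordedTimestamp INTEGER NOT NULL,
        recordedValue INTEGER NOT NULL
    );
    CREATE INDEX groupIndex ON temperature (groupID, recordedTimestamp);
    CREATE TABLE selected_group (id INTEGER PRIMARY KEY);

    -- Made-up sample data: ids 2 and 3 tie on the timestamp in group 10.
    INSERT INTO temperature VALUES
        (1, 10, 100, 21),
        (2, 10, 200, 22),
        (3, 10, 200, 23),
        (4, 20, 150, 18),
        (5, 30, 300, 30);
    INSERT INTO selected_group VALUES (10), (20);
    """
)

# Same shape as Example 1: a correlated subquery picks the id of the
# last row for each selected group.
example_1 = """
    SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
    FROM selected_group g
    INNER JOIN temperature t1 ON t1.id = (
        SELECT t2.id
        FROM temperature t2
        WHERE t2.groupID = g.id
        ORDER BY t2.recordedTimestamp DESC, t2.id DESC
        LIMIT 1
    )
    ORDER BY t1.groupID
"""
rows = conn.execute(example_1).fetchall()
print(rows)
```

Group 30 is absent because it is not in selected_group, and id 3 is chosen over id 2 because of the id tie-breaker.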



                  Example 2



                  SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 
                  FROM temperature t1
                  INNER JOIN (
                  SELECT max(t2.id) AS id
                  FROM temperature t2
                  INNER JOIN (
                  SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
                  FROM selected_group g
                  INNER JOIN temperature t3 ON t3.groupID = g.id
                  GROUP BY t3.groupID
                  ) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp
                  GROUP BY t2.groupID
                  ) t5 ON t5.id = t1.id;


                  This examined 1,505,331 rows and took ~1.25 seconds on 5.7.21 and slightly longer on 8.0.4-rc



                  Example 3



                  SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 
                  FROM temperature t1
                  WHERE t1.id IN (
                  SELECT max(t2.id) AS id
                  FROM temperature t2
                  INNER JOIN (
                  SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
                  FROM selected_group g
                  INNER JOIN temperature t3 ON t3.groupID = g.id
                  GROUP BY t3.groupID
                  ) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp
                  GROUP BY t2.groupID
                  )
                  ORDER BY t1.groupID;


                  This examined 3,009,685 rows and took ~1.95 seconds on 5.7.21 and slightly longer on 8.0.4-rc



                  Example 4



                  SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
                  FROM selected_group g
                  INNER JOIN temperature t1 ON t1.id = (
                  SELECT max(t2.id)
                  FROM temperature t2
                  WHERE t2.groupID = g.id AND t2.recordedTimestamp = (
                  SELECT max(t3.recordedTimestamp)
                  FROM temperature t3
                  WHERE t3.groupID = g.id
                  )
                  );


                  This examined 6,137,810 rows and took ~2.2 seconds on 5.7.21 and slightly longer on 8.0.4-rc



                  Example 5



                  SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
                  FROM (
                  SELECT
                  t2.id,
                  t2.groupID,
                  t2.recordedTimestamp,
                  t2.recordedValue,
                  row_number() OVER (
                  PARTITION BY t2.groupID ORDER BY t2.recordedTimestamp DESC, t2.id DESC
                  ) AS rowNumber
                  FROM selected_group g
                  INNER JOIN temperature t2 ON t2.groupID = g.id
                  ) t1 WHERE t1.rowNumber = 1;


                  This examined 6,017,808 rows and took ~4.2 seconds on 8.0.4-rc



                  Example 6



                  SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 
                  FROM (
                  SELECT
                  last_value(t2.id) OVER w AS id,
                  t2.groupID,
                  last_value(t2.recordedTimestamp) OVER w AS recordedTimestamp,
                  last_value(t2.recordedValue) OVER w AS recordedValue
                  FROM selected_group g
                  INNER JOIN temperature t2 ON t2.groupID = g.id
                  WINDOW w AS (
                  PARTITION BY t2.groupID
                  ORDER BY t2.recordedTimestamp, t2.id
                  RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
                  )
                  ) t1
                  GROUP BY t1.groupID;


                  This examined 6,017,908 rows and took ~17.5 seconds on 8.0.4-rc



                  Example 7



                  SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 
                  FROM selected_group g
                  INNER JOIN temperature t1 ON t1.groupID = g.id
                  LEFT JOIN temperature t2
                  ON t2.groupID = g.id
                  AND (
                  t2.recordedTimestamp > t1.recordedTimestamp
                  OR (t2.recordedTimestamp = t1.recordedTimestamp AND t2.id > t1.id)
                  )
                  WHERE t2.id IS NULL
                  ORDER BY t1.groupID;


                  This one was taking forever so I had to kill it.
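One plausible reason the self-join is so slow here is the number of row pairs it can generate before the IS NULL filter discards them. The sketch below is only an illustration under assumptions: it uses SQLite through Python's sqlite3 module instead of MySQL, drops the selected_group join for brevity, and uses five made-up rows. The `joined_row_count` helper is hypothetical, and the figure of 15,000 rows per group is just 1.5 million rows divided by 100 groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temperature ("
    " id INTEGER PRIMARY KEY,"
    " groupID INTEGER NOT NULL,"
    " recordedTimestamp INTEGER NOT NULL)"
)
# One group with five rows; ids 4 and 5 share the latest timestamp.
conn.executemany(
    "INSERT INTO temperature VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 200), (3, 10, 300), (4, 10, 400), (5, 10, 400)],
)

join_sql = """
    FROM temperature t1
    LEFT JOIN temperature t2
      ON t2.groupID = t1.groupID
     AND (
          t2.recordedTimestamp > t1.recordedTimestamp
          OR (t2.recordedTimestamp = t1.recordedTimestamp AND t2.id > t1.id)
     )
"""
# Rows the join produces before the IS NULL filter is applied.
joined_rows = conn.execute("SELECT COUNT(*) " + join_sql).fetchone()[0]
# Rows that survive the filter: one per group.
last_rows = conn.execute(
    "SELECT t1.id " + join_sql + " WHERE t2.id IS NULL"
).fetchall()


def joined_row_count(group_size: int) -> int:
    """Join rows for one group whose (timestamp, id) pairs are all distinct."""
    return group_size * (group_size - 1) // 2 + 1


print(joined_rows, last_rows, joined_row_count(15_000))
```

With five rows the join yields 4+3+2+1+1 = 11 rows to keep one. At roughly 15,000 rows per group the same arithmetic gives over 112 million pairs per group, which would be consistent with the query not finishing, although an optimizer may avoid materializing all of them.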




































                    up vote
                    1
                    down vote













                    select * from messages group by name desc




























                    • this works fine! see here also stackoverflow.com/questions/1313120/…
                      – user2241289
                      Feb 12 '17 at 18:45




















                    up vote
                    1
                    down vote













                    How about this:



                    SELECT DISTINCT ON (name) *
                    FROM messages
                    ORDER BY name, id DESC;


                    I had a similar issue (on PostgreSQL, though, where DISTINCT ON is available) with a table of 1M records. This solution takes 1.7s versus 44s for the one with LEFT JOIN.
                    In my case I had to filter the counterpart of your name field against NULL values, which improved performance by a further 0.2 seconds.


































                      up vote
                      0
                      down vote













                      If performance is really your concern you can introduce a new column on the table called IsLastInGroup of type BIT.



                      Set it to true on the rows which are last in their group, and maintain it with every row insert/update/delete. Writes will be slower, but reads will benefit. It depends on your use case, and I recommend it only if your workload is read-heavy.



                      So your query will look like:



                      SELECT * FROM Messages WHERE IsLastInGroup = 1
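As a rough sketch of what "maintain it" could look like for inserts, the example below moves the flag inside a single transaction. It is only an assumption of one possible approach: `insert_message` is a hypothetical helper, SQLite (via Python's sqlite3 module) stands in for MySQL, the flag is stored as an integer rather than BIT, and updates and deletes are not handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Messages ("
    " Id INTEGER PRIMARY KEY,"
    " Name TEXT NOT NULL,"
    " Other_Columns TEXT,"
    " IsLastInGroup INTEGER NOT NULL DEFAULT 0)"
)


def insert_message(conn: sqlite3.Connection, name: str, data: str) -> None:
    """Hypothetical helper: insert a row and move the group's flag to it."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "UPDATE Messages SET IsLastInGroup = 0"
            " WHERE Name = ? AND IsLastInGroup = 1",
            (name,),
        )
        conn.execute(
            "INSERT INTO Messages (Name, Other_Columns, IsLastInGroup)"
            " VALUES (?, ?, 1)",
            (name, data),
        )


# The sample rows from the question, inserted in id order.
for name, data in [
    ("A", "A_data_1"), ("A", "A_data_2"), ("A", "A_data_3"),
    ("B", "B_data_1"), ("B", "B_data_2"),
    ("C", "C_data_1"),
]:
    insert_message(conn, name, data)

rows = conn.execute(
    "SELECT Id, Name, Other_Columns FROM Messages"
    " WHERE IsLastInGroup = 1 ORDER BY Id"
).fetchall()
print(rows)
```

The read side stays a plain filter, which is the point of the trade-off; the cost is the extra UPDATE on every insert and the need to keep the flag correct when rows are deleted or moved between groups.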

































                        up vote
                        750
                        down vote



                        accepted










                        MySQL 8.0 now supports window functions, as almost all popular SQL implementations do. With this standard syntax, we can write greatest-n-per-group queries:



                        WITH ranked_messages AS (
                        SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
                        FROM messages AS m
                        )
                        SELECT * FROM ranked_messages WHERE rn = 1;
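To try the window-function query on the sample data from the question, here is a small sketch. It assumes SQLite 3.25 or later (through Python's sqlite3 module) as a stand-in for MySQL 8.0, since both accept this CTE and ROW_NUMBER() syntax. The outer SELECT names the columns instead of using * so that the helper column rn is left out, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
)
# The sample rows from the question.
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (1, "A", "A_data_1"),
        (2, "A", "A_data_2"),
        (3, "A", "A_data_3"),
        (4, "B", "B_data_1"),
        (5, "B", "B_data_2"),
        (6, "C", "C_data_1"),
    ],
)

query = """
    WITH ranked_messages AS (
        -- rn = 1 marks the row with the highest id within each name.
        SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
        FROM messages AS m
    )
    SELECT id, name, other_columns
    FROM ranked_messages
    WHERE rn = 1
    ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Changing the filter to `rn <= 2` would return the last two rows per group, which is where the greatest-n-per-group name comes from.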


                        Below is the original answer I wrote for this question in 2009:





                        I write the solution this way:



                        SELECT m1.*
                        FROM messages m1 LEFT JOIN messages m2
                        ON (m1.name = m2.name AND m1.id < m2.id)
                        WHERE m2.id IS NULL;
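To see why the `m2.id IS NULL` filter picks the last row, it can help to look at the joined pairs for one group first. This is a sketch on the question's sample data, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL; the ORDER BY clauses are added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
)
# The sample rows from the question.
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (1, "A", "A_data_1"),
        (2, "A", "A_data_2"),
        (3, "A", "A_data_3"),
        (4, "B", "B_data_1"),
        (5, "B", "B_data_2"),
        (6, "C", "C_data_1"),
    ],
)

join_sql = """
    FROM messages m1 LEFT JOIN messages m2
      ON (m1.name = m2.name AND m1.id < m2.id)
"""
# Every row of group A paired with each later row of the same group.
# Only id 3 has no later row, so its m2 side is NULL (None in Python).
pairs = conn.execute(
    "SELECT m1.id, m2.id " + join_sql + " WHERE m1.name = 'A' ORDER BY m1.id, m2.id"
).fetchall()
# Keeping only the unmatched rows leaves the last row of each group.
last_rows = conn.execute(
    "SELECT m1.* " + join_sql + " WHERE m2.id IS NULL ORDER BY m1.id"
).fetchall()
print(pairs)
print(last_rows)
```

The pairs list also shows that the join grows with the size of each group, which matters for the performance discussion.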


                        Regarding performance, one solution or the other can be better, depending on the nature of your data. So you should test both queries and use the one that performs better on your database.



                        For example, I have a copy of the Stack Overflow August data dump. I'll use that for benchmarking. There are 1,114,357 rows in the Posts table. This is running on MySQL 5.0.75 on my 2.40GHz MacBook Pro.



                        I'll write a query to find the most recent post for a given user ID (mine).



                        First using the technique shown by @Eric with the GROUP BY in a subquery:



                        SELECT p1.postid
                        FROM Posts p1
                        INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid
                        FROM Posts pi GROUP BY pi.owneruserid) p2
                        ON (p1.postid = p2.maxpostid)
                        WHERE p1.owneruserid = 20860;

                        1 row in set (1 min 17.89 sec)


                        Even the EXPLAIN analysis takes over 16 seconds:



                        +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
                        | id | select_type | table      | type   | possible_keys              | key         | key_len | ref          | rows    | Extra       |
                        +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
                        |  1 | PRIMARY     | <derived2> | ALL    | NULL                       | NULL        | NULL    | NULL         |   76756 |             |
                        |  1 | PRIMARY     | p1         | eq_ref | PRIMARY,PostId,OwnerUserId | PRIMARY     | 8       | p2.maxpostid |       1 | Using where |
                        |  2 | DERIVED     | pi         | index  | NULL                       | OwnerUserId | 8       | NULL         | 1151268 | Using index |
                        +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
                        3 rows in set (16.09 sec)


                        Now produce the same query result using my technique with LEFT JOIN:



                        SELECT p1.postid
                        FROM Posts p1 LEFT JOIN posts p2
                        ON (p1.owneruserid = p2.owneruserid AND p1.postid < p2.postid)
                        WHERE p2.postid IS NULL AND p1.owneruserid = 20860;

                        1 row in set (0.28 sec)


                        The EXPLAIN analysis shows that both tables are able to use their indexes:



                        +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
                        | id | select_type | table | type | possible_keys              | key         | key_len | ref   | rows | Extra                                |
                        +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
                        |  1 | SIMPLE      | p1    | ref  | OwnerUserId                | OwnerUserId | 8       | const | 1384 | Using index                          |
                        |  1 | SIMPLE      | p2    | ref  | PRIMARY,PostId,OwnerUserId | OwnerUserId | 8       | const | 1384 | Using where; Using index; Not exists |
                        +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
                        2 rows in set (0.00 sec)




                        Here's the DDL for my Posts table:



                        CREATE TABLE `posts` (
                        `PostId` bigint(20) unsigned NOT NULL auto_increment,
                        `PostTypeId` bigint(20) unsigned NOT NULL,
                        `AcceptedAnswerId` bigint(20) unsigned default NULL,
                        `ParentId` bigint(20) unsigned default NULL,
                        `CreationDate` datetime NOT NULL,
                        `Score` int(11) NOT NULL default '0',
                        `ViewCount` int(11) NOT NULL default '0',
                        `Body` text NOT NULL,
                        `OwnerUserId` bigint(20) unsigned NOT NULL,
                        `OwnerDisplayName` varchar(40) default NULL,
                        `LastEditorUserId` bigint(20) unsigned default NULL,
                        `LastEditDate` datetime default NULL,
                        `LastActivityDate` datetime default NULL,
                        `Title` varchar(250) NOT NULL default '',
                        `Tags` varchar(150) NOT NULL default '',
                        `AnswerCount` int(11) NOT NULL default '0',
                        `CommentCount` int(11) NOT NULL default '0',
                        `FavoriteCount` int(11) NOT NULL default '0',
                        `ClosedDate` datetime default NULL,
                        PRIMARY KEY (`PostId`),
                        UNIQUE KEY `PostId` (`PostId`),
                        KEY `PostTypeId` (`PostTypeId`),
                        KEY `AcceptedAnswerId` (`AcceptedAnswerId`),
                        KEY `OwnerUserId` (`OwnerUserId`),
                        KEY `LastEditorUserId` (`LastEditorUserId`),
                        KEY `ParentId` (`ParentId`),
                        CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`PostTypeId`) REFERENCES `posttypes` (`PostTypeId`)
                        ) ENGINE=InnoDB;
























                        • 7




                          Really? What happens if you have a ton of entries? For example, if you're working w/ an in-house version control, say, and you have a ton of versions per file, that join result would be massive. Have you ever benchmarked the subquery method with this one? I'm pretty curious to know which would win, but not curious enough to not ask you first.
                          – Eric
                          Aug 21 '09 at 18:19






                        • 1




                          Did some testing. On a small table (~300k records, ~190k groups, so not massive groups or anything), the queries tied (8 seconds each).
                          – Eric
                          Aug 21 '09 at 18:44






                        • 1




                          @BillKarwin: See meta.stackexchange.com/questions/123017, especially the comments below Adam Rackis' answer. Let me know if you want to reclaim your answer on the new question.
                          – Robert Harvey
                          Feb 21 '12 at 18:06








                        • 2




                          @Tim, no, <= will not help if you have a non-unique column. You must use a unique column as a tiebreaker.
                          – Bill Karwin
                          Jul 3 '15 at 7:13






                        • 1




                          The performance degrades exponentially as the number of rows increases or when groups become larger. For example a group consisting of 5 dates will yield 4+3+2+1+1 = 11 rows via left join out of which one row is filtered in the end. Performance of joining with grouped results is almost linear. Your tests look flawed.
                          – Salman A
                          Oct 16 '15 at 12:12





























                        MySQL 8.0 now supports window functions, as almost all popular SQL implementations do. With this standard syntax, we can write greatest-n-per-group queries:



                        WITH ranked_messages AS (
                        SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
                        FROM messages AS m
                        )
                        SELECT * FROM ranked_messages WHERE rn = 1;
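To try the window-function query without a MySQL 8.0 server, here is a minimal sketch that runs it against the question's sample data. It assumes Python's bundled SQLite (3.25 or newer, which also supports ROW_NUMBER()) as a stand-in for MySQL; the helper name last_per_group_window is made up for illustration, and the outer SELECT lists the columns explicitly so the rn helper column is not returned.

```python
import sqlite3

# Sample rows from the question: (id, name, other_columns).
ROWS = [
    (1, "A", "A_data_1"), (2, "A", "A_data_2"), (3, "A", "A_data_3"),
    (4, "B", "B_data_1"), (5, "B", "B_data_2"), (6, "C", "C_data_1"),
]

# Same CTE as in the answer; only the outer column list and ORDER BY are added.
WINDOW_QUERY = """
WITH ranked_messages AS (
  SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
  FROM messages AS m
)
SELECT id, name, other_columns FROM ranked_messages WHERE rn = 1 ORDER BY id
"""


def last_per_group_window(rows):
    """Return the row with the highest id for each name (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")  # SQLite is an assumption, not MySQL
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)
    result = conn.execute(WINDOW_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in last_per_group_window(ROWS):
        print(row)
```

Rows with rn = 1 are the newest per name, because the window orders each partition by id descending.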


                        Below is the original answer I wrote for this question in 2009:





                        I write the solution this way:



                        SELECT m1.*
                        FROM messages m1 LEFT JOIN messages m2
                        ON (m1.name = m2.name AND m1.id < m2.id)
                        WHERE m2.id IS NULL;
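The LEFT JOIN form can be checked the same way. This is a sketch under the same assumption (SQLite through Python's sqlite3 module instead of MySQL); last_per_group_left_join is a hypothetical helper name. A row of m1 survives the WHERE clause only when no row m2 with the same name and a larger id exists, which makes it the last row of its group.

```python
import sqlite3

# Sample rows from the question: (id, name, other_columns).
ROWS = [
    (1, "A", "A_data_1"), (2, "A", "A_data_2"), (3, "A", "A_data_3"),
    (4, "B", "B_data_1"), (5, "B", "B_data_2"), (6, "C", "C_data_1"),
]

# The answer's query, with an explicit column list and ORDER BY for stable output.
LEFT_JOIN_QUERY = """
SELECT m1.id, m1.name, m1.other_columns
FROM messages m1 LEFT JOIN messages m2
  ON (m1.name = m2.name AND m1.id < m2.id)
WHERE m2.id IS NULL
ORDER BY m1.id
"""


def last_per_group_left_join(rows):
    """Return the last row per name using the self LEFT JOIN (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")  # SQLite is an assumption, not MySQL
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)
    result = conn.execute(LEFT_JOIN_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in last_per_group_left_join(ROWS):
        print(row)
```

The comparison column (id here) has to be unique within a group; with ties, more than one row per group would pass the IS NULL filter.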


                        Regarding performance, either solution can be faster, depending on the nature of your data. So you should test both queries and use the one that performs better on your database.



                        For example, I have a copy of the Stack Overflow August data dump. I'll use that for benchmarking. There are 1,114,357 rows in the Posts table. This is running on MySQL 5.0.75 on my 2.40 GHz MacBook Pro.



                        I'll write a query to find the most recent post for a given user ID (mine).



                        First using the technique shown by @Eric with the GROUP BY in a subquery:



                        SELECT p1.postid
                        FROM Posts p1
                        INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid
                        FROM Posts pi GROUP BY pi.owneruserid) p2
                        ON (p1.postid = p2.maxpostid)
                        WHERE p1.owneruserid = 20860;

                        1 row in set (1 min 17.89 sec)


                        Even the EXPLAIN analysis takes over 16 seconds:



                        +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
                        | id | select_type | table      | type   | possible_keys              | key         | key_len | ref          | rows    | Extra       |
                        +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
                        |  1 | PRIMARY     | <derived2> | ALL    | NULL                       | NULL        | NULL    | NULL         |   76756 |             |
                        |  1 | PRIMARY     | p1         | eq_ref | PRIMARY,PostId,OwnerUserId | PRIMARY     | 8       | p2.maxpostid |       1 | Using where |
                        |  2 | DERIVED     | pi         | index  | NULL                       | OwnerUserId | 8       | NULL         | 1151268 | Using index |
                        +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
                        3 rows in set (16.09 sec)


                        Now produce the same query result using my technique with LEFT JOIN:



                        SELECT p1.postid
                        FROM Posts p1 LEFT JOIN Posts p2
                        ON (p1.owneruserid = p2.owneruserid AND p1.postid < p2.postid)
                        WHERE p2.postid IS NULL AND p1.owneruserid = 20860;

                        1 row in set (0.28 sec)


                        The EXPLAIN analysis shows that both tables are able to use their indexes:



                        +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
                        | id | select_type | table | type | possible_keys              | key         | key_len | ref   | rows | Extra                                |
                        +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
                        |  1 | SIMPLE      | p1    | ref  | OwnerUserId                | OwnerUserId | 8       | const | 1384 | Using index                          |
                        |  1 | SIMPLE      | p2    | ref  | PRIMARY,PostId,OwnerUserId | OwnerUserId | 8       | const | 1384 | Using where; Using index; Not exists |
                        +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
                        2 rows in set (0.00 sec)
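Since the advice is to test both queries on your own data, a small harness like the following may help. It is only a sketch: it uses SQLite through Python's sqlite3 module on synthetic rows (both are assumptions, as are the names build_posts, timed and idx_owneruserid), and it compares the two techniques for all users rather than for a single OwnerUserId. SQLite timings say nothing about MySQL; the point is the shape of the comparison and the check that both queries agree.

```python
import random
import sqlite3
import time

# GROUP BY in a derived table, as in the first benchmarked query.
GROUP_BY_QUERY = """
SELECT p1.postid
FROM posts p1
INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid
            FROM posts pi GROUP BY pi.owneruserid) p2
  ON (p1.postid = p2.maxpostid)
ORDER BY p1.postid
"""

# Self LEFT JOIN, as in the second benchmarked query.
LEFT_JOIN_QUERY = """
SELECT p1.postid
FROM posts p1 LEFT JOIN posts p2
  ON (p1.owneruserid = p2.owneruserid AND p1.postid < p2.postid)
WHERE p2.postid IS NULL
ORDER BY p1.postid
"""


def build_posts(n_rows=2000, n_users=50, seed=0):
    """Create an in-memory posts table filled with synthetic (postid, owneruserid) rows."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (postid INTEGER PRIMARY KEY, owneruserid INTEGER NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_owneruserid ON posts (owneruserid)")
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?)",
        [(i, rng.randrange(n_users)) for i in range(1, n_rows + 1)],
    )
    return conn


def timed(conn, sql):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start


if __name__ == "__main__":
    conn = build_posts()
    for label, sql in (("GROUP BY", GROUP_BY_QUERY), ("LEFT JOIN", LEFT_JOIN_QUERY)):
        rows, seconds = timed(conn, sql)
        print(f"{label}: {len(rows)} rows in {seconds:.4f} s")
```

Varying n_rows and n_users changes the group sizes, which is the property the two techniques are most sensitive to.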




                        Here's the DDL for my Posts table:



                        CREATE TABLE `posts` (
                        `PostId` bigint(20) unsigned NOT NULL auto_increment,
                        `PostTypeId` bigint(20) unsigned NOT NULL,
                        `AcceptedAnswerId` bigint(20) unsigned default NULL,
                        `ParentId` bigint(20) unsigned default NULL,
                        `CreationDate` datetime NOT NULL,
                        `Score` int(11) NOT NULL default '0',
                        `ViewCount` int(11) NOT NULL default '0',
                        `Body` text NOT NULL,
                        `OwnerUserId` bigint(20) unsigned NOT NULL,
                        `OwnerDisplayName` varchar(40) default NULL,
                        `LastEditorUserId` bigint(20) unsigned default NULL,
                        `LastEditDate` datetime default NULL,
                        `LastActivityDate` datetime default NULL,
                        `Title` varchar(250) NOT NULL default '',
                        `Tags` varchar(150) NOT NULL default '',
                        `AnswerCount` int(11) NOT NULL default '0',
                        `CommentCount` int(11) NOT NULL default '0',
                        `FavoriteCount` int(11) NOT NULL default '0',
                        `ClosedDate` datetime default NULL,
                        PRIMARY KEY (`PostId`),
                        UNIQUE KEY `PostId` (`PostId`),
                        KEY `PostTypeId` (`PostTypeId`),
                        KEY `AcceptedAnswerId` (`AcceptedAnswerId`),
                        KEY `OwnerUserId` (`OwnerUserId`),
                        KEY `LastEditorUserId` (`LastEditorUserId`),
                        KEY `ParentId` (`ParentId`),
                        CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`PostTypeId`) REFERENCES `posttypes` (`PostTypeId`)
                        ) ENGINE=InnoDB;














                        edited Dec 26 '17 at 20:38

























                        answered Aug 21 '09 at 17:39









                        Bill Karwin









                        • 7




                          Really? What happens if you have a ton of entries? For example, if you're working w/ an in-house version control, say, and you have a ton of versions per file, that join result would be massive. Have you ever benchmarked the subquery method with this one? I'm pretty curious to know which would win, but not curious enough to not ask you first.
                          – Eric
                          Aug 21 '09 at 18:19






                        • 1




                          Did some testing. On a small table (~300k records, ~190k groups, so not massive groups or anything), the queries tied (8 seconds each).
                          – Eric
                          Aug 21 '09 at 18:44






                        • 1




                          @BillKarwin: See meta.stackexchange.com/questions/123017, especially the comments below Adam Rackis' answer. Let me know if you want to reclaim your answer on the new question.
                          – Robert Harvey
                          Feb 21 '12 at 18:06








                        • 2




                          @Tim, no, <= will not help if you have a non-unique column. You must use a unique column as a tiebreaker.
                          – Bill Karwin
                          Jul 3 '15 at 7:13






                        • 1




                          The performance degrades exponentially as the number of rows increases or when groups become larger. For example a group consisting of 5 dates will yield 4+3+2+1+1 = 11 rows via left join out of which one row is filtered in the end. Performance of joining with grouped results is almost linear. Your tests look flawed.
                          – Salman A
                          Oct 16 '15 at 12:12


























                        up vote
                        125
                        down vote













                        Update (2017-03-31): MySQL 5.7.5 enabled the ONLY_FULL_GROUP_BY mode by default, so non-deterministic GROUP BY queries are now rejected. The GROUP BY implementation was also changed, so the solutions below might not work as expected even with the mode disabled. Check on your version.



                        Bill Karwin's solution above works fine when the item count within groups is rather small, but its performance degrades when the groups are rather large, since for a group of n rows it requires about n*n/2 + n/2 comparisons for the IS NULL check alone.
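To make the quadratic growth concrete, the sketch below counts how many rows the self LEFT JOIN produces for a single group of n rows before the IS NULL filter is applied. It assumes SQLite through Python's sqlite3 module and a hypothetical helper left_join_rows. With unique ids the exact count is n*(n-1)/2 + 1, which is of the same order as the estimate above and matches the 11 rows for a group of 5 mentioned in the comments on Bill's answer.

```python
import sqlite3


def left_join_rows(n):
    """Count rows produced by the self LEFT JOIN for one group of n rows.

    The count is taken before WHERE m2.id IS NULL filters it down to one row.
    """
    conn = sqlite3.connect(":memory:")  # SQLite is an assumption, not MySQL
    conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT)")
    # All n rows share the same name, so they form a single group.
    conn.executemany(
        "INSERT INTO messages VALUES (?, 'A')", [(i,) for i in range(1, n + 1)]
    )
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM messages m1 LEFT JOIN messages m2 "
        "ON (m1.name = m2.name AND m1.id < m2.id)"
    ).fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    for n in (1, 5, 100, 1000):
        print(n, left_join_rows(n))
```

Each row pairs with every later row of its group, and the last row contributes one NULL-extended row, hence the n*(n-1)/2 + 1 total.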



                        I ran my tests on an InnoDB table of 18,684,446 rows with 1,182 groups. The table, testresults, contains the results of functional tests and has (test_id, request_id) as the primary key. Thus, test_id is the group, and I was searching for the last request_id for each test_id.



                        Bill's solution has already been running for several hours on my Dell E4310, and I do not know when it is going to finish, even though it operates on a covering index (hence Using index in EXPLAIN).



                        I have a couple of other solutions that are based on the same ideas:




                        • if the underlying index is a BTREE index (which is usually the case), the largest (group_id, item_value) pair is the last value within each group_id, that is, the first one for each group_id if we walk through the index in descending order;

                        • if we read values that are covered by an index, they are read in the order of the index;

                        • each InnoDB secondary index implicitly contains the primary key columns appended to it (that is, the primary key is part of the covering index). In the solutions below I operate directly on the primary key; in your case, you will just need to add the primary key columns to the result;

                        • in many cases it is much cheaper to collect the required row ids in the required order in a subquery and join the result of the subquery on the id. Since MySQL needs a single primary-key fetch for each row in the subquery result, the subquery is put first in the join and the rows are output in the order of the ids in the subquery (if we omit an explicit ORDER BY for the join).


                        "3 ways MySQL uses indexes" is a great article for understanding some of the details.



                        Solution 1



                        This one is incredibly fast; it takes about 0.8 seconds on my 18M+ rows:



                        SELECT test_id, MAX(request_id), request_id
                        FROM testresults
                        GROUP BY test_id DESC;


                        If you want the result in ASC order, wrap the query in a subquery that returns only the ids, and use that subquery to join to the rest of the columns:



                        SELECT test_id, request_id
                        FROM (
                        SELECT test_id, MAX(request_id), request_id
                        FROM testresults
                        GROUP BY test_id DESC) as ids
                        ORDER BY test_id;


                        This one takes about 1.2 seconds on my data.
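Solution 1 selects a bare request_id next to MAX(request_id), which ONLY_FULL_GROUP_BY rejects, as one of the comments below points out. A deterministic variant of the same "collect the ids in a subquery, then join" idea is sketched here. This is a swapped-in rewrite, not the original query: it assumes SQLite through Python's sqlite3 module, a hypothetical extra column status, and a hypothetical helper latest_results. It has not been timed on the 18M-row table.

```python
import sqlite3

# (test_id, request_id, status); "status" is a hypothetical extra column.
ROWS = [
    (1, 10, "ok"), (1, 11, "fail"), (1, 12, "ok"),
    (2, 20, "ok"), (2, 21, "fail"),
    (3, 30, "ok"),
]

# The subquery returns only the ids (every selected column is either grouped
# or aggregated); the join then fetches the remaining columns by primary key.
LATEST_QUERY = """
SELECT t.test_id, t.request_id, t.status
FROM (SELECT test_id, MAX(request_id) AS request_id
      FROM testresults
      GROUP BY test_id) AS ids
JOIN testresults AS t
  ON t.test_id = ids.test_id AND t.request_id = ids.request_id
ORDER BY t.test_id
"""


def latest_results(rows):
    """Return the row with the largest request_id for each test_id."""
    conn = sqlite3.connect(":memory:")  # SQLite is an assumption, not MySQL
    conn.execute(
        "CREATE TABLE testresults ("
        " test_id INTEGER, request_id INTEGER, status TEXT,"
        " PRIMARY KEY (test_id, request_id))"
    )
    conn.executemany("INSERT INTO testresults VALUES (?, ?, ?)", rows)
    result = conn.execute(LATEST_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_results(ROWS):
        print(row)
```

Changing the final ORDER BY to DESC gives the descending order of the original query without relying on GROUP BY ... DESC.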



                        Solution 2



                        Here is another solution that takes about 19 seconds for my table:



                        SELECT test_id, request_id
                        FROM testresults, (SELECT @group:=NULL) as init
                        WHERE IF(IFNULL(@group, -1)=@group:=test_id, 0, 1)
                        ORDER BY test_id DESC, request_id DESC


                        It returns tests in descending order as well. It is much slower because it does a full index scan, but it is included to give you an idea of how to output the N max rows for each group.



                        The disadvantage of the query is that its result cannot be cached by the query cache.
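For the "N max rows for each group" case, a window function is an alternative to user variables on servers that support it (MySQL 8.0 does). The sketch below uses ROW_NUMBER() instead of the @group technique, so it is a different method with the same goal. It assumes SQLite 3.25 or newer through Python's sqlite3 module and a hypothetical helper top_n_per_group.

```python
import sqlite3

# (test_id, request_id) pairs; illustrative data only.
ROWS = [
    (1, 10), (1, 11), (1, 12),
    (2, 20), (2, 21),
    (3, 30),
]

# Number the rows of each test_id from the largest request_id downwards,
# then keep the first N of every group.
TOP_N_QUERY = """
SELECT test_id, request_id
FROM (SELECT test_id, request_id,
             ROW_NUMBER() OVER (PARTITION BY test_id
                                ORDER BY request_id DESC) AS rn
      FROM testresults) AS ranked
WHERE rn <= ?
ORDER BY test_id DESC, request_id DESC
"""


def top_n_per_group(rows, n):
    """Return up to n rows with the largest request_id for each test_id."""
    conn = sqlite3.connect(":memory:")  # SQLite is an assumption, not MySQL
    conn.execute(
        "CREATE TABLE testresults ("
        " test_id INTEGER, request_id INTEGER,"
        " PRIMARY KEY (test_id, request_id))"
    )
    conn.executemany("INSERT INTO testresults VALUES (?, ?)", rows)
    result = conn.execute(TOP_N_QUERY, (n,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(top_n_per_group(ROWS, 2))
```

With n = 1 this reduces to the last-row-per-group problem from the question.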





























                        • A related answer: stackoverflow.com/a/14836418/68998
                          – newtover
                          Feb 13 '13 at 9:13












                        • Please link to a dump of your tables so that people can test it on their platforms.
                          – Pacerier
                          Feb 3 '15 at 3:44






                        • 2




                          Solution 1 can't work, you can't select request_id without having that in group by clause,
                          – giò
                          Mar 9 '17 at 9:57






                        • 2




                          @giò, this answer is 5 years old. Until MySQL 5.7.5 ONLY_FULL_GROUP_BY was disabled by default and this solution worked out of the box dev.mysql.com/doc/relnotes/mysql/5.7/en/…. Now I'm not sure if the solution still works when you disable the mode, because the implementation of the GROUP BY has been changed.
                          – newtover
                          Mar 31 '17 at 14:58










                        • If you wanted ASC in the first solution, would it work if you turn MAX to MIN?
                          – Jin Izzraeel
                          May 9 '17 at 15:45















                        up vote
                        125
                        down vote













                        UPD: 2017-03-31, the version 5.7.5 of MySQL made the ONLY_FULL_GROUP_BY switch enabled by default (hence, non-deterministic GROUP BY queries became disabled). Moreover, they updated the GROUP BY implementation and the solution might not work as expected anymore even with the disabled switch. One needs to check.



                        Bill Karwin's solution above works fine when item count within groups is rather small, but the performance of the query becomes bad when the groups are rather large, since the solution requires about n*n/2 + n/2 of only IS NULL comparisons.



                        I made my tests on a InnoDB table of 18684446 rows with 1182 groups. The table contains testresults for functional tests and has the (test_id, request_id) as the primary key. Thus, test_id is a group and I was searching for the last request_id for each test_id.



                        Bill's solution has already been running for several hours on my dell e4310 and I do not know when it is going to finish even though it operates on a coverage index (hence using index in EXPLAIN).



                        I have a couple of other solutions that are based on the same ideas:




                        • if the underlying index is BTREE index (which is usually the case), the largest (group_id, item_value) pair is the last value within each group_id, that is the first for each group_id if we walk through the index in descending order;

                        • if we read the values which are covered by an index, the values are read in the order of the index;

                        • each index implicitly contains primary key columns appended to that (that is the primary key is in the coverage index). In solutions below I operate directly on the primary key, in you case, you will just need to add primary key columns in the result.

                        • in many cases it is much cheaper to collect the required row ids in the required order in a subquery and join the result of the subquery on the id. Since for each row in the subquery result MySQL will need a single fetch based on primary key, the subquery will be put first in the join and the rows will be output in the order of the ids in the subquery (if we omit explicit ORDER BY for the join)


                        3 ways MySQL uses indexes is a great article to understand some details.



                        Solution 1



                        This one is incredibly fast, it takes about 0,8 secs on my 18M+ rows:



                        SELECT test_id, MAX(request_id), request_id
                        FROM testresults
                        GROUP BY test_id DESC;


                        If you want to change the order to ASC, put it in a subquery, return the ids only and use that as the subquery to join to the rest of the columns:



                        SELECT test_id, request_id
                        FROM (
                        SELECT test_id, MAX(request_id), request_id
                        FROM testresults
                        GROUP BY test_id DESC) as ids
                        ORDER BY test_id;


                        This one takes about 1,2 secs on my data.



                        Solution 2



                        Here is another solution that takes about 19 seconds for my table:



                        SELECT test_id, request_id
                        FROM testresults, (SELECT @group:=NULL) as init
                        WHERE IF(IFNULL(@group, -1)=@group:=test_id, 0, 1)
                        ORDER BY test_id DESC, request_id DESC


                        It returns tests in descending order as well. It is much slower since it does a full index scan but it is here to give you an idea how to output N max rows for each group.



                        The disadvantage of the query is that its result cannot be cached by the query cache.






                        share|improve this answer























                        • A related answer: stackoverflow.com/a/14836418/68998
                          – newtover
                          Feb 13 '13 at 9:13












                        • Please link to a dump of your tables so that people can test it on their platforms.
                          – Pacerier
                          Feb 3 '15 at 3:44






                        • 2




                          Solution 1 can't work, you can't select request_id without having that in group by clause,
                          – giò
                          Mar 9 '17 at 9:57






                        • 2




                          @giò, this is answer is 5 years old. Until MySQL 5.7.5 ONLY_FULL_GROUP_BY was disabled by default and this solution worked out of the box dev.mysql.com/doc/relnotes/mysql/5.7/en/…. Now I'm not sure if the solution still works when you disable the mode, because the implementation of the GROUP BY has been changed.
                          – newtover
                          Mar 31 '17 at 14:58










                        • If you wanted ASC in the first solution, would it work if you turn MAX to MIN?
                          – Jin Izzraeel
                          May 9 '17 at 15:45













                        up vote
                        125
                        down vote










                        up vote
                        125
                        down vote









                        UPD: 2017-03-31, the version 5.7.5 of MySQL made the ONLY_FULL_GROUP_BY switch enabled by default (hence, non-deterministic GROUP BY queries became disabled). Moreover, they updated the GROUP BY implementation and the solution might not work as expected anymore even with the disabled switch. One needs to check.



                        Bill Karwin's solution above works fine when item count within groups is rather small, but the performance of the query becomes bad when the groups are rather large, since the solution requires about n*n/2 + n/2 of only IS NULL comparisons.



                        I made my tests on a InnoDB table of 18684446 rows with 1182 groups. The table contains testresults for functional tests and has the (test_id, request_id) as the primary key. Thus, test_id is a group and I was searching for the last request_id for each test_id.



                        Bill's solution has already been running for several hours on my dell e4310 and I do not know when it is going to finish even though it operates on a coverage index (hence using index in EXPLAIN).



                        I have a couple of other solutions that are based on the same ideas:




                        • if the underlying index is BTREE index (which is usually the case), the largest (group_id, item_value) pair is the last value within each group_id, that is the first for each group_id if we walk through the index in descending order;

                        • if we read the values which are covered by an index, the values are read in the order of the index;

                        • each index implicitly contains primary key columns appended to that (that is the primary key is in the coverage index). In solutions below I operate directly on the primary key, in you case, you will just need to add primary key columns in the result.

                        • in many cases it is much cheaper to collect the required row ids in the required order in a subquery and join the result of the subquery on the id. Since for each row in the subquery result MySQL will need a single fetch based on primary key, the subquery will be put first in the join and the rows will be output in the order of the ids in the subquery (if we omit explicit ORDER BY for the join)


                        3 ways MySQL uses indexes is a great article to understand some details.



                        Solution 1



                        This one is incredibly fast, it takes about 0,8 secs on my 18M+ rows:



                        SELECT test_id, MAX(request_id), request_id
                        FROM testresults
                        GROUP BY test_id DESC;


                        If you want to change the order to ASC, put it in a subquery, return the ids only and use that as the subquery to join to the rest of the columns:



                        SELECT test_id, request_id
                        FROM (
                        SELECT test_id, MAX(request_id), request_id
                        FROM testresults
                        GROUP BY test_id DESC) as ids
                        ORDER BY test_id;


                        This one takes about 1,2 secs on my data.



                        Solution 2



                        Here is another solution that takes about 19 seconds for my table:



                        SELECT test_id, request_id
                        FROM testresults, (SELECT @group:=NULL) as init
                        WHERE IF(IFNULL(@group, -1)=@group:=test_id, 0, 1)
                        ORDER BY test_id DESC, request_id DESC


                        It returns tests in descending order as well. It is much slower since it does a full index scan but it is here to give you an idea how to output N max rows for each group.



                        The disadvantage of the query is that its result cannot be cached by the query cache.






                        share|improve this answer














                        UPD: 2017-03-31, the version 5.7.5 of MySQL made the ONLY_FULL_GROUP_BY switch enabled by default (hence, non-deterministic GROUP BY queries became disabled). Moreover, they updated the GROUP BY implementation and the solution might not work as expected anymore even with the disabled switch. One needs to check.



                        Bill Karwin's solution above works fine when item count within groups is rather small, but the performance of the query becomes bad when the groups are rather large, since the solution requires about n*n/2 + n/2 of only IS NULL comparisons.



                        I made my tests on a InnoDB table of 18684446 rows with 1182 groups. The table contains testresults for functional tests and has the (test_id, request_id) as the primary key. Thus, test_id is a group and I was searching for the last request_id for each test_id.



                        Bill's solution has already been running for several hours on my dell e4310 and I do not know when it is going to finish even though it operates on a coverage index (hence using index in EXPLAIN).



                        I have a couple of other solutions that are based on the same ideas:




                        • if the underlying index is BTREE index (which is usually the case), the largest (group_id, item_value) pair is the last value within each group_id, that is the first for each group_id if we walk through the index in descending order;

                        • if we read the values which are covered by an index, the values are read in the order of the index;

                        • each index implicitly contains primary key columns appended to that (that is the primary key is in the coverage index). In solutions below I operate directly on the primary key, in you case, you will just need to add primary key columns in the result.

                        • in many cases it is much cheaper to collect the required row ids in the required order in a subquery and join the result of the subquery on the id. Since for each row in the subquery result MySQL will need a single fetch based on primary key, the subquery will be put first in the join and the rows will be output in the order of the ids in the subquery (if we omit explicit ORDER BY for the join)


                        3 ways MySQL uses indexes is a great article to understand some details.



                        Solution 1



                        This one is incredibly fast, it takes about 0,8 secs on my 18M+ rows:



                        SELECT test_id, MAX(request_id), request_id
                        FROM testresults
                        GROUP BY test_id DESC;


                        If you want to change the order to ASC, put it in a subquery, return the ids only and use that as the subquery to join to the rest of the columns:



                        SELECT test_id, request_id
                        FROM (
                            SELECT test_id, MAX(request_id), request_id
                            FROM testresults
                            GROUP BY test_id DESC
                        ) AS ids
                        ORDER BY test_id;


                        This one takes about 1.2 seconds on my data.
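
A note for newer servers: Solution 1 selects a non-aggregated column and uses GROUP BY ... DESC. With ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5) the query is rejected, and as far as I know the GROUP BY ... DESC syntax is no longer accepted in MySQL 8.0. A hedged sketch of an equivalent that uses only standard grouping follows. The table testresults_demo and its rows are assumptions for illustration, and its speed was not measured against the timings above:

```sql
-- Hypothetical demo table (name and rows are assumptions).
DROP TABLE IF EXISTS testresults_demo;
CREATE TABLE testresults_demo (
  test_id    INT NOT NULL,
  request_id INT NOT NULL,
  PRIMARY KEY (test_id, request_id)
) ENGINE=InnoDB;

INSERT INTO testresults_demo (test_id, request_id) VALUES
  (1, 10), (1, 11), (2, 10), (2, 12), (2, 13);

-- Only the grouped column and an aggregate are selected,
-- so the query is valid under ONLY_FULL_GROUP_BY.
-- The ordering is expressed with ORDER BY instead of GROUP BY ... DESC.
SELECT test_id, MAX(request_id) AS request_id
FROM testresults_demo
GROUP BY test_id
ORDER BY test_id DESC;
-- Rows returned: (2, 13) then (1, 11)
```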



                        Solution 2



                        Here is another solution that takes about 19 seconds for my table:



                        SELECT test_id, request_id
                        FROM testresults, (SELECT @group:=NULL) as init
                        WHERE IF(IFNULL(@group, -1)=@group:=test_id, 0, 1)
                        ORDER BY test_id DESC, request_id DESC


                        It returns tests in descending order as well. It is much slower because it does a full index scan, but it is here to give you an idea of how to output the N max rows for each group.



                        The disadvantage of the query is that its result cannot be cached by the query cache.
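
On MySQL 8.0 and later, window functions offer another way to get the N max rows per group without user variables. This is a swapped-in technique, not the query measured above, and the table testresults_demo with its rows is an assumption for illustration:

```sql
-- Hypothetical demo table (name and rows are assumptions).
DROP TABLE IF EXISTS testresults_demo;
CREATE TABLE testresults_demo (
  test_id    INT NOT NULL,
  request_id INT NOT NULL,
  PRIMARY KEY (test_id, request_id)
) ENGINE=InnoDB;

INSERT INTO testresults_demo (test_id, request_id) VALUES
  (1, 10), (1, 11), (2, 10), (2, 12), (2, 13);

-- Number the rows of each test_id from the largest request_id down,
-- then keep the first N of each group (N = 2 here).
SELECT test_id, request_id
FROM (
  SELECT test_id,
         request_id,
         ROW_NUMBER() OVER (PARTITION BY test_id
                            ORDER BY request_id DESC) AS rn
  FROM testresults_demo
) AS ranked
WHERE rn <= 2
ORDER BY test_id DESC, request_id DESC;
-- Rows returned: (2, 13), (2, 12), (1, 11), (1, 10)
```

Changing the condition to rn = 1 returns only the last row of each group.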















                        edited Mar 31 '17 at 15:08

























                        answered Jan 6 '12 at 11:21









                        newtover













                        • A related answer: stackoverflow.com/a/14836418/68998
                          – newtover
                          Feb 13 '13 at 9:13












                        • Please link to a dump of your tables so that people can test it on their platforms.
                          – Pacerier
                          Feb 3 '15 at 3:44






                        • Solution 1 can't work: you can't select request_id without having it in the GROUP BY clause.
                          – giò
                          Mar 9 '17 at 9:57






                        • @giò, this answer is 5 years old. Until MySQL 5.7.5, ONLY_FULL_GROUP_BY was disabled by default and this solution worked out of the box (dev.mysql.com/doc/relnotes/mysql/5.7/en/…). Now I'm not sure whether the solution still works when you disable the mode, because the implementation of GROUP BY has changed.
                          – newtover
                          Mar 31 '17 at 14:58










                        • If you wanted ASC in the first solution, would it work if you turn MAX to MIN?
                          – Jin Izzraeel
                          May 9 '17 at 15:45




























                        up vote
                        82
                        down vote













                        Use your subquery to return the correct grouping, because you're halfway there.



                        Try this:



                        select
                            a.*
                        from
                            messages a
                            inner join
                                (select name, max(id) as maxid from messages group by name) as b on
                                a.id = b.maxid


                        If it's not id you want the max of:



                        select
                            a.*
                        from
                            messages a
                            inner join
                                (select name, max(other_col) as other_col
                                 from messages group by name) as b on
                                a.name = b.name
                                and a.other_col = b.other_col


                        This way, you avoid correlated subqueries and/or ordering in your subqueries, which tend to be very slow/inefficient.
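
To try the first query locally, here is a small sketch that loads the sample rows from the question into a scratch table. The table name messages_demo and the column types are assumptions; only the data comes from the question:

```sql
-- Scratch copy of the question's sample data (table name and types are assumptions).
DROP TABLE IF EXISTS messages_demo;
CREATE TABLE messages_demo (
  id            INT NOT NULL PRIMARY KEY,
  name          VARCHAR(32) NOT NULL,
  other_columns VARCHAR(64)
) ENGINE=InnoDB;

INSERT INTO messages_demo (id, name, other_columns) VALUES
  (1, 'A', 'A_data_1'), (2, 'A', 'A_data_2'), (3, 'A', 'A_data_3'),
  (4, 'B', 'B_data_1'), (5, 'B', 'B_data_2'),
  (6, 'C', 'C_data_1');

-- The derived table finds the largest id per name;
-- the join pulls the full row for each of those ids.
SELECT a.*
FROM messages_demo AS a
INNER JOIN (
  SELECT name, MAX(id) AS maxid
  FROM messages_demo
  GROUP BY name
) AS b ON a.id = b.maxid
ORDER BY a.id;
-- Rows returned: (3, 'A', 'A_data_3'), (5, 'B', 'B_data_2'), (6, 'C', 'C_data_1')
```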














                        edited Aug 21 '09 at 17:14

























                        answered Aug 21 '09 at 17:06









                        Eric









                        • Note a caveat for the solution with other_col: if that column is not unique, you may get multiple records back with the same name if they tie for max(other_col). I found this post that describes a solution for my needs, where I need exactly one record per name.
                          – Eric Simonton
                          Aug 21 '15 at 13:48












                        • In some situations you can only use this solution but not the accepted one.
                          – tom10271
                          Sep 4 '15 at 2:59










                        • In my experience, it is grouping the whole damn messages table that tends to be slow/inefficient! In other words, note that the subquery requires a full table scan, and does a grouping on that to boot... unless your optimizer is doing something that mine is not. So this solution depends heavily on holding the entire table in memory.
                          – Timo
                          Apr 30 at 14:56




























                        up vote
                        35
                        down vote













                        I arrived at a different solution, which is to get the IDs for the last post within each group, then select from the messages table using the result from the first query as the argument for a WHERE x IN construct:



                        SELECT id, name, other_columns
                        FROM messages
                        WHERE id IN (
                            SELECT MAX(id)
                            FROM messages
                            GROUP BY name
                        );


                        I don't know how this performs compared to some of the other solutions, but it worked spectacularly for my table with 3+ million rows. (4 second execution with 1200+ results)



                        This should work both on MySQL and SQL Server.
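
On MySQL, a composite index on (name, id) should let the inner MAX(id) ... GROUP BY name be answered from the index alone. A minimal sketch for checking this on your own server follows. The table messages_demo, its column types and the index name are assumptions for illustration, and the plan that EXPLAIN reports will depend on your version and data:

```sql
-- Scratch table (name, types and index name are assumptions).
DROP TABLE IF EXISTS messages_demo;
CREATE TABLE messages_demo (
  id            INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name          VARCHAR(32) NOT NULL,
  other_columns VARCHAR(64)
) ENGINE=InnoDB;

-- Composite index covering the grouping column and the aggregated column.
CREATE INDEX idx_messages_demo_name_id ON messages_demo (name, id);

-- Inspect how the server plans the query before running it on real data.
EXPLAIN
SELECT id, name, other_columns
FROM messages_demo
WHERE id IN (
  SELECT MAX(id)
  FROM messages_demo
  GROUP BY name
);
```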
















                        answered Feb 20 '12 at 21:46









                        JYelton













                        • Just make sure you have an index on (name, id).
                          – Samuel Åslund
                          Apr 22 '16 at 11:58






                        • Much better than self joins
                          – anwerj
                          Dec 23 '16 at 7:40










                        • I learned something from you. Good job, and this query is faster.
                          – Humphrey
                          Feb 23 at 7:48




























                        up vote
                        25
                        down vote













                        Solution by subquery (fiddle link):



                        select * from messages where id in
                        (select max(id) from messages group by Name)


                        Solution by join condition (fiddle link):



                        select m1.* from messages m1 
                        left outer join messages m2
                        on ( m1.id<m2.id and m1.name=m2.name )
                        where m2.id is null


                        The only reason for this post is to provide the fiddle links.
                        The same SQL is already given in other answers.
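
For readers who cannot open the fiddle, here is a sketch of the join variant that can be run locally. The table name messages_demo and the column types are assumptions; the rows are the sample data from the question. The LEFT JOIN looks for a later row with the same name, and a row for which no such later row exists (m2.id IS NULL) is the last one in its group:

```sql
-- Scratch copy of the question's sample data (table name and types are assumptions).
DROP TABLE IF EXISTS messages_demo;
CREATE TABLE messages_demo (
  id            INT NOT NULL PRIMARY KEY,
  name          VARCHAR(32) NOT NULL,
  other_columns VARCHAR(64)
) ENGINE=InnoDB;

INSERT INTO messages_demo (id, name, other_columns) VALUES
  (1, 'A', 'A_data_1'), (2, 'A', 'A_data_2'), (3, 'A', 'A_data_3'),
  (4, 'B', 'B_data_1'), (5, 'B', 'B_data_2'),
  (6, 'C', 'C_data_1');

-- m2 matches every row of the same name with a larger id.
-- Rows of m1 that have no such match are padded with NULLs by the
-- LEFT JOIN, and the WHERE clause keeps exactly those rows.
SELECT m1.*
FROM messages_demo AS m1
LEFT OUTER JOIN messages_demo AS m2
  ON m1.name = m2.name AND m1.id < m2.id
WHERE m2.id IS NULL
ORDER BY m1.id;
-- Rows returned: (3, 'A', 'A_data_3'), (5, 'B', 'B_data_2'), (6, 'C', 'C_data_1')
```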
















                        answered Dec 25 '13 at 8:36









                        Vipin













                        • What's the point of the 'fiddle' if you can't run it?
                          – Alexander Suraphel
                          Jul 4 at 9:41










                        • @AlexanderSuraphel MySQL 5.5 is no longer available in the fiddle, and the fiddle link was created with it. The fiddle now supports MySQL 5.6; I changed the database to MySQL 5.6 and I am able to build the schema and run the SQL.
                          – Vipin
                          Jul 4 at 17:21




























                        up vote
                        7
                        down vote













                        I haven't tested this on a large database yet, but I think it could be faster than joining tables:



                        SELECT *, Max(Id) FROM messages GROUP BY Name
























                        • 4




                          This returns arbitrary data. In other words, the returned columns might not be from the record with MAX(Id).
                          – harm
                          Jul 3 '14 at 15:05










                        • This is useful for selecting the max Id from a set of records with a WHERE condition: "SELECT Max(Id) FROM Prod WHERE Pn='" + Pn + "'". It returns the max Id from the set of records with the same Pn. In C#, use reader.GetString(0) to get the result.
                          – Nicola
                          Apr 8 '15 at 9:24
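The concern raised in the comments above (the other columns not necessarily coming from the row with MAX(Id)) is commonly avoided by joining each row back to its group's maximum Id. The following is only a sketch of that idea, not the answerer's method: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, the sample rows are the ones from the question, and the helper names last_per_group and max_id_for_name are made up for illustration. The second helper shows the single-group lookup from the last comment with a bound parameter instead of string concatenation; the ? placeholder is sqlite3's style, and MySQL connectors commonly use a different marker.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL so the sketch is runnable anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (Id INTEGER PRIMARY KEY, Name TEXT, Other_Columns TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (1, "A", "A_data_1"), (2, "A", "A_data_2"), (3, "A", "A_data_3"),
        (4, "B", "B_data_1"), (5, "B", "B_data_2"), (6, "C", "C_data_1"),
    ],
)


def last_per_group(conn):
    """Join every row to its group's MAX(Id) so all columns come from that row."""
    sql = """
        SELECT m.Id, m.Name, m.Other_Columns
        FROM messages AS m
        JOIN (SELECT Name, MAX(Id) AS max_id
              FROM messages
              GROUP BY Name) AS g
          ON g.Name = m.Name AND g.max_id = m.Id
        ORDER BY m.Id
    """
    return conn.execute(sql).fetchall()


def max_id_for_name(conn, name):
    """Highest Id for one Name; the value is passed as a bound parameter."""
    row = conn.execute(
        "SELECT MAX(Id) FROM messages WHERE Name = ?", (name,)
    ).fetchone()
    return row[0]  # None when no row has that Name


latest = last_per_group(conn)
print(latest)
print(max_id_for_name(conn, "A"))
```

With the question's sample data, latest should hold the rows with Id 3, 5 and 6, one per Name.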

















                        edited Feb 14 '13 at 7:07
                        Shai

                        answered Mar 31 '12 at 14:44
                        user942821



























                        up vote
                        5
                        down vote













                        Here are two suggestions. First, if MySQL supports ROW_NUMBER(), it's very simple:



                        WITH Ranked AS (
                            SELECT Id, Name, Other_Columns,
                                   ROW_NUMBER() OVER (
                                       PARTITION BY Name
                                       ORDER BY Id DESC
                                   ) AS rk
                            FROM messages
                        )
                        SELECT Id, Name, Other_Columns
                        FROM Ranked
                        WHERE rk = 1;


                        I'm assuming by "last" you mean last in Id order. If not, change the ORDER BY clause of the ROW_NUMBER() window accordingly.



                        Second, if ROW_NUMBER() isn't available, this is often a good way to proceed:



                        SELECT
                            Id, Name, Other_Columns
                        FROM messages
                        WHERE NOT EXISTS (
                            SELECT * FROM messages AS M2
                            WHERE M2.Name = messages.Name
                              AND M2.Id > messages.Id
                        )


                        In other words, select the messages for which there is no message with the same Name and a later Id.
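Both suggestions can be tried side by side. The sketch below is an illustration under stated assumptions rather than a MySQL test: it runs the two queries through Python's sqlite3 module against an in-memory copy of the question's sample rows, with the outer table aliased as m1 for clarity. Window functions and common table expressions arrived in MySQL with version 8.0, so the first form depends on the server version; in SQLite the same feature needs version 3.25 or newer, which is why the script guards that variant.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample rows are the question's data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (Id INTEGER PRIMARY KEY, Name TEXT, Other_Columns TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (1, "A", "A_data_1"), (2, "A", "A_data_2"), (3, "A", "A_data_3"),
        (4, "B", "B_data_1"), (5, "B", "B_data_2"), (6, "C", "C_data_1"),
    ],
)

# Keep a row only if no other row with the same Name has a larger Id.
NOT_EXISTS_SQL = """
    SELECT m1.Id, m1.Name, m1.Other_Columns
    FROM messages AS m1
    WHERE NOT EXISTS (
        SELECT 1
        FROM messages AS m2
        WHERE m2.Name = m1.Name
          AND m2.Id > m1.Id
    )
    ORDER BY m1.Id
"""

# Number the rows of each Name from the highest Id down and keep number 1.
ROW_NUMBER_SQL = """
    WITH Ranked AS (
        SELECT Id, Name, Other_Columns,
               ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Id DESC) AS rk
        FROM messages
    )
    SELECT Id, Name, Other_Columns
    FROM Ranked
    WHERE rk = 1
    ORDER BY Id
"""

via_not_exists = conn.execute(NOT_EXISTS_SQL).fetchall()

# Window functions need SQLite 3.25+; leave the result as None on older builds.
via_row_number = None
if sqlite3.sqlite_version_info >= (3, 25, 0):
    via_row_number = conn.execute(ROW_NUMBER_SQL).fetchall()

print(via_not_exists)
print(via_row_number)
```

When both variants run, they should return the same three rows, which makes it easy to compare their query plans on real data before choosing one.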























                        • 8




                          MySQL doesn't support ROW_NUMBER() or CTE's.
                          – Bill Karwin
                          Aug 21 '09 at 17:37

























                        answered Aug 21 '09 at 17:26
                        Steve Kass


















                        up vote
                        5
                        down vote













                        Here is my solution:



                        SELECT 
                        DISTINCT NAME,
                        MAX(MESSAGES) OVER(PARTITION BY NAME) MESSAGES
                        FROM MESSAGE;








































                            edited Jun 8 '17 at 19:03
                            Paul Roub

                            answered Jun 8 '17 at 18:49
                            Abhishek Yadav






















                                up vote
                                4
                                down vote













                                Here is another way to get the last related record: use GROUP_CONCAT with ORDER BY to build a per-group list, then SUBSTRING_INDEX to pick the first item from that list.



                                SELECT
                                  MAX(`Id`) AS `Id`,
                                  `Name`,
                                  SUBSTRING_INDEX(
                                    GROUP_CONCAT(
                                      `Other_Columns`
                                      ORDER BY `Id` DESC
                                      SEPARATOR '||'
                                    ),
                                    '||',
                                    1
                                  ) Other_Columns
                                FROM
                                  messages
                                GROUP BY `Name`


                                The query above concatenates all the Other_Columns values that share the same Name. Because of ORDER BY Id DESC, each group's values are joined in descending Id order, using the given separator (I have used ||). SUBSTRING_INDEX then picks the first item from that list, which is the value from the row with the highest Id.



                                Fiddle Demo









































                                    edited Mar 30 '14 at 6:01

                                    answered Mar 29 '14 at 14:51
                                    M Khalid Junaid






















                                        up vote
                                        4
                                        down vote













                                        SELECT
                                          column1,
                                          column2
                                        FROM
                                          table_name
                                        WHERE id IN
                                          (SELECT
                                            MAX(id)
                                          FROM
                                            table_name
                                          GROUP BY column1)
                                        ORDER BY column1;
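The answer gives no explanation, so here is a hedged reading of how the query works: the inner SELECT produces one MAX(id) per group, and the outer query keeps only the rows whose id is in that list. The sketch below maps table_name, column1 and column2 onto the question's messages table (Name and Other_Columns); that mapping is an assumption, as is the use of Python's sqlite3 module with an in-memory database in place of MySQL. The approach relies on id being unique across the whole table.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; rows are the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (Id INTEGER PRIMARY KEY, Name TEXT, Other_Columns TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (1, "A", "A_data_1"), (2, "A", "A_data_2"), (3, "A", "A_data_3"),
        (4, "B", "B_data_1"), (5, "B", "B_data_2"), (6, "C", "C_data_1"),
    ],
)

# Step 1: the inner query on its own yields one Id per Name.
max_ids = [
    row[0]
    for row in conn.execute(
        "SELECT MAX(Id) AS max_id FROM messages GROUP BY Name ORDER BY max_id"
    )
]

# Step 2: the full query keeps only the rows whose Id appears in that list.
IN_SUBQUERY_SQL = """
    SELECT Name, Other_Columns
    FROM messages
    WHERE Id IN (SELECT MAX(Id) FROM messages GROUP BY Name)
    ORDER BY Name
"""
rows = conn.execute(IN_SUBQUERY_SQL).fetchall()

print(max_ids)
print(rows)
```

Running the inner query separately, as in step 1, is a quick way to check which Ids the outer filter will accept.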




























                                        • Could you elaborate a bit on your answer? Why is your query preferable to Vijay's original query?
                                          – janfoeh
                                          May 4 '14 at 11:57























                                        edited May 4 '14 at 11:38
                                        M Khalid Junaid

                                        answered Apr 11 '14 at 6:55
                                        jeet singh parmar






















                                        up vote
                                        3
                                        down vote













                                        Try this:



                                        SELECT jos_categories.title AS name,
                                               joined.catid,
                                               joined.title,
                                               joined.introtext
                                        FROM jos_categories
                                        INNER JOIN (SELECT *
                                                    FROM (SELECT `title`,
                                                                 catid,
                                                                 `created`,
                                                                 introtext
                                                          FROM `jos_content`
                                                          WHERE `sectionid` = 6
                                                          ORDER BY `id` DESC) AS yes
                                                    GROUP BY `yes`.`catid` DESC
                                                    ORDER BY `yes`.`created` DESC) AS joined
                                          ON (joined.catid = jos_categories.id)








































answered Jul 15 '11 at 2:05 by Pro Web Design
edited Jul 15 '11 at 13:47 by Brock Adams



































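You can also try both queries here:

http://sqlfiddle.com/#!9/ef42b/9

FIRST SOLUTION

SELECT d1.ID, d1.Name, d1.City FROM Demo_User AS d1
INNER JOIN
(SELECT MAX(ID) AS ID FROM Demo_User GROUP BY Name) AS P ON (d1.ID = P.ID);

SECOND SOLUTION

SELECT * FROM (SELECT * FROM Demo_User ORDER BY ID DESC) AS T GROUP BY Name;

The second solution relies on MySQL picking the first row of each group from the sorted subquery, which is not guaranteed. It is also rejected when ONLY_FULL_GROUP_BY is enabled, which is the default since MySQL 5.7.5.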
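To check the first solution against the sample data from the question, here is a minimal sketch. It is a stand-in under stated assumptions: it runs on Python's built-in sqlite3 module with an in-memory database rather than on MySQL, and the function name last_per_group and the lowercase column names are mine, not part of the answer.

```python
import sqlite3


def last_per_group(conn):
    """Return the row with the highest id for each name (MAX(id) join)."""
    query = """
        SELECT m.id, m.name, m.other_columns
        FROM messages AS m
        INNER JOIN (
            SELECT MAX(id) AS id
            FROM messages
            GROUP BY name
        ) AS p ON m.id = p.id
        ORDER BY m.id
    """
    return conn.execute(query).fetchall()


# Sample data copied from the question; the table layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (1, "A", "A_data_1"),
        (2, "A", "A_data_2"),
        (3, "A", "A_data_3"),
        (4, "B", "B_data_1"),
        (5, "B", "B_data_2"),
        (6, "C", "C_data_1"),
    ],
)
conn.commit()

rows = last_per_group(conn)
for row in rows:
    print(row)
```

The inner query only ever touches id and name, so on a real MySQL table an index on (name, id) should let it be answered from the index alone; I have not measured that here.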


























                                                • Second Solution doesn't work for my case
                                                  – dikirill
                                                  Apr 28 '17 at 18:41

























answered Sep 28 '15 at 9:07 by Shrikant Gupta



































Is there any way to use this method to delete duplicates in a table? The result set is essentially a collection of unique records, so deleting every record that is not in the result set would leave no duplicates. I tried the following, but MySQL returned error 1093 ("You can't specify target table 'messages' for update in FROM clause"):

DELETE FROM messages WHERE id NOT IN
(SELECT m1.id
FROM messages m1 LEFT JOIN messages m2
ON (m1.name = m2.name AND m1.id < m2.id)
WHERE m2.id IS NULL)

Is there a way to save the output to a temporary variable and then delete with NOT IN (temp variable)? @Bill, thanks for a very useful solution.

EDIT: I think I found the solution. Table, Field1 and Field2 are placeholders for your own table and comparison columns:

DROP TABLE IF EXISTS UniqueIDs;
CREATE TEMPORARY TABLE UniqueIDs (id INT(11));

INSERT INTO UniqueIDs
(SELECT T1.ID FROM Table T1 LEFT JOIN Table T2 ON
(T1.Field1 = T2.Field1 AND T1.Field2 = T2.Field2 #Comparison Fields
AND T1.ID < T2.ID)
WHERE T2.ID IS NULL);

DELETE FROM Table WHERE id NOT IN (SELECT ID FROM UniqueIDs);
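The temporary-table approach above can be sketched end to end as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module instead of MySQL, it applies the idea to the messages table from the question with name as the single comparison field, and the names delete_all_but_last and unique_ids are hypothetical.

```python
import sqlite3


def delete_all_but_last(conn):
    """Delete every row except the one with the highest id per name.

    Returns the number of deleted rows.
    """
    conn.execute("DROP TABLE IF EXISTS unique_ids")
    conn.execute("CREATE TEMPORARY TABLE unique_ids (id INTEGER PRIMARY KEY)")
    # A row is kept when no other row with the same name has a larger id.
    conn.execute(
        """
        INSERT INTO unique_ids
        SELECT m1.id
        FROM messages AS m1
        LEFT JOIN messages AS m2
            ON m1.name = m2.name AND m1.id < m2.id
        WHERE m2.id IS NULL
        """
    )
    # The DELETE reads from the temporary table, not from messages itself.
    cur = conn.execute(
        "DELETE FROM messages WHERE id NOT IN (SELECT id FROM unique_ids)"
    )
    conn.commit()
    return cur.rowcount


# Sample data copied from the question; the table layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (1, "A", "A_data_1"),
        (2, "A", "A_data_2"),
        (3, "A", "A_data_3"),
        (4, "B", "B_data_1"),
        (5, "B", "B_data_2"),
        (6, "C", "C_data_1"),
    ],
)
conn.commit()

deleted = delete_all_but_last(conn)
remaining = conn.execute("SELECT id FROM messages ORDER BY id").fetchall()
print(deleted, remaining)
```

Staging the surviving ids in a separate table is what avoids MySQL's error 1093, because the DELETE no longer selects from the table it modifies.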













answered Oct 8 '10 at 1:10 by Simon
edited Oct 8 '10 at 1:57



































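The query below returns the expected rows for the sample data in the question. It works there because every Other_Columns value starts with its group name and the values within a group sort in insertion order.

SELECT M1.*
FROM MESSAGES M1,
(
SELECT SUBSTR(Other_Columns,1,2), MAX(Other_Columns) AS Max_Other_Columns
FROM MESSAGES
GROUP BY 1
) M2
WHERE M1.Other_Columns = M2.Max_Other_Columns
ORDER BY Other_Columns;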













answered Nov 18 '11 at 20:19 by Teja
edited Nov 18 '11 at 20:21 by animuson



































Hi @Vijay Dev, if your messages table has Id as an auto-increment primary key, you can fetch the latest record for each Name based on that key with the query below:

SELECT m1.* FROM messages m1
INNER JOIN (SELECT MAX(Id) AS lastmsgId FROM messages GROUP BY Name) m2 ON m1.Id = m2.lastmsgId;















answered Oct 21 '14 at 14:08 by bikashphp



































If you want the last row for each Name, you can number the rows within each Name group in descending order of Id and keep only the rows numbered 1.

QUERY

SELECT t1.Id,
       t1.Name,
       t1.Other_Columns
FROM
(
    SELECT Id,
           Name,
           Other_Columns,
           (
               CASE Name WHEN @curA
                   THEN @curRow := @curRow + 1
                   ELSE @curRow := 1 AND @curA := Name END
           ) + 1 AS rn  -- row number within the current Name group
    FROM messages t,
         (SELECT @curRow := 0, @curA := '') r
    ORDER BY Name, Id DESC
) t1
WHERE t1.rn = 1  -- the first row of each group has the highest Id
ORDER BY t1.Id;
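On MySQL 8.0 and later the same row-numbering idea can be written with the ROW_NUMBER() window function instead of user variables. The sketch below is a swapped-in technique, not the query from this answer, and it rests on assumptions: it runs on Python's built-in sqlite3 module (which needs SQLite 3.25 or newer for window functions) rather than on MySQL, and the function name last_row_per_name and the lowercase column names are hypothetical.

```python
import sqlite3


def last_row_per_name(conn):
    """Number rows per name by descending id and keep row number 1."""
    query = """
        SELECT id, name, other_columns
        FROM (
            SELECT id, name, other_columns,
                   ROW_NUMBER() OVER (
                       PARTITION BY name
                       ORDER BY id DESC
                   ) AS rn
            FROM messages
        ) AS numbered
        WHERE rn = 1
        ORDER BY id
    """
    return conn.execute(query).fetchall()


# Sample data copied from the question; the table layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (1, "A", "A_data_1"),
        (2, "A", "A_data_2"),
        (3, "A", "A_data_3"),
        (4, "B", "B_data_1"),
        (5, "B", "B_data_2"),
        (6, "C", "C_data_1"),
    ],
)
conn.commit()

latest = last_row_per_name(conn)
for row in latest:
    print(row)
```

PARTITION BY restarts the numbering for each name, so there is no state to carry between rows and no dependence on the order in which the select expressions are evaluated.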








                                                                        share|improve this answer

























                                                                          up vote
                                                                          2
                                                                          down vote













                                                                          If you want the last row for each Name, then you can give a row number to each row group by the Name and order by Id in descending order.



                                                                          QUERY



                                                                          SELECT t1.Id, 
                                                                          t1.Name,
                                                                          t1.Other_Columns
                                                                          FROM
                                                                          (
                                                                          SELECT Id,
                                                                          Name,
                                                                          Other_Columns,
                                                                          (
                                                                          CASE Name WHEN @curA
                                                                          THEN @curRow := @curRow + 1
                                                                          ELSE @curRow := 1 AND @curA := Name END
                                                                          ) + 1 AS rn
                                                                          FROM messages t,
                                                                          (SELECT @curRow := 0, @curA := '') r
                                                                          ORDER BY Name,Id DESC
                                                                          )t1
                                                                          WHERE t1.rn = 1
                                                                          ORDER BY t1.Id;


                                                                          SQL Fiddle






                                                                          share|improve this answer























                                                                            up vote
                                                                            2
                                                                            down vote










                                                                            up vote
                                                                            2
                                                                            down vote









                                                                            If you want the last row for each Name, then you can give a row number to each row group by the Name and order by Id in descending order.



                                                                            QUERY



                                                                            SELECT t1.Id, 
                                                                            t1.Name,
                                                                            t1.Other_Columns
                                                                            FROM
                                                                            (
                                                                            SELECT Id,
                                                                            Name,
                                                                            Other_Columns,
                                                                            (
                                                                            CASE Name WHEN @curA
                                                                            THEN @curRow := @curRow + 1
                                                                            ELSE @curRow := 1 AND @curA := Name END
                                                                            ) + 1 AS rn
                                                                            FROM messages t,
                                                                            (SELECT @curRow := 0, @curA := '') r
                                                                            ORDER BY Name,Id DESC
                                                                            )t1
                                                                            WHERE t1.rn = 1
                                                                            ORDER BY t1.Id;


                                                                            SQL Fiddle






                                                                            share|improve this answer












If you want the last row for each Name, you can number the rows within each Name group in descending order of Id and keep only the rows numbered 1.



                                                                            QUERY



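SELECT t1.Id,
       t1.Name,
       t1.Other_Columns
FROM
(
    SELECT Id,
           Name,
           Other_Columns,
           -- Restart the counter at 1 whenever Name changes; otherwise increment it.
           @curRow := IF(Name = @curA, @curRow + 1, 1) AS rn,
           -- Remember the current Name for the comparison on the next row.
           @curA := Name AS curName
    FROM messages t,
         (SELECT @curRow := 0, @curA := '') r
    -- Note: MySQL does not guarantee the evaluation order of user-variable
    -- assignments within a statement, so treat this pattern with care.
    ORDER BY Name, Id DESC
) t1
WHERE t1.rn = 1
ORDER BY t1.Id;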
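On MySQL 8.0 and later, the same "number the rows per group, keep number 1" idea can be written with the ROW_NUMBER() window function instead of user variables. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example is runnable anywhere, and it needs an SQLite build with window-function support, 3.25 or newer), and the derived-table alias `numbered` is a name chosen here.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (Id INTEGER PRIMARY KEY, Name TEXT, Other_Columns TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (1, "A", "A_data_1"), (2, "A", "A_data_2"), (3, "A", "A_data_3"),
        (4, "B", "B_data_1"), (5, "B", "B_data_2"), (6, "C", "C_data_1"),
    ],
)

# Number rows within each Name from the highest Id down, then keep row 1.
LAST_PER_GROUP = """
SELECT Id, Name, Other_Columns
FROM (
    SELECT Id, Name, Other_Columns,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Id DESC) AS rn
    FROM messages
) AS numbered
WHERE rn = 1
ORDER BY Id
"""

rows = conn.execute(LAST_PER_GROUP).fetchall()
for row in rows:
    print(row)
```

The window-function form avoids relying on the evaluation order of user variables, which is why it is usually preferred where it is available.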












                                                                            answered Nov 19 '15 at 4:36









                                                                            Wanderer




































A reasonably fast approach is a correlated subquery that picks the highest Id for each Name:



                                                                                SELECT * 
                                                                                FROM messages a
                                                                                WHERE Id = (SELECT MAX(Id) FROM messages WHERE a.Name = Name)


                                                                                Result



Id   Name   Other_Columns
-------------------------
3    A      A_data_3
5    B      B_data_2
6    C      C_data_1
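To check the correlated-subquery approach against the sample data from the question, here is a small sketch. It runs on Python's built-in sqlite3 module rather than MySQL (an assumption made purely so the example is self-contained). The helper name `last_message_per_name`, the table aliases `a` and `b`, and the index name `idx_name_id` are all hypothetical names introduced for illustration.

```python
import sqlite3


def last_message_per_name(conn):
    """Return, for each Name, the row with the highest Id."""
    sql = """
        SELECT a.Id, a.Name, a.Other_Columns
        FROM messages AS a
        WHERE a.Id = (SELECT MAX(b.Id) FROM messages AS b WHERE b.Name = a.Name)
        ORDER BY a.Id
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (Id INTEGER PRIMARY KEY, Name TEXT, Other_Columns TEXT)"
)
# A composite index on (Name, Id) lets the inner MAX(Id) lookup use the index.
conn.execute("CREATE INDEX idx_name_id ON messages (Name, Id)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (1, "A", "A_data_1"), (2, "A", "A_data_2"), (3, "A", "A_data_3"),
        (4, "B", "B_data_1"), (5, "B", "B_data_2"), (6, "C", "C_data_1"),
    ],
)

result = last_message_per_name(conn)
for row in result:
    print(row)
```

How fast this is in practice probably depends on having an index that covers the grouping column and Id, so it is worth checking the execution plan on your own data.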















                                                                                    answered Mar 10 at 20:33









                                                                                    Song Zhengyi




































Clearly there are many ways to get the same result; your question seems to be which of them is an efficient way of getting the last row in each group in MySQL. If you are working with huge amounts of data on InnoDB, then even with the latest versions of MySQL (such as 5.7.21 and 8.0.4-rc) there might not be an efficient way of doing this.



We sometimes need to do this on tables with more than 60 million rows.



For these examples I will use only about 1.5 million rows, and the queries will need to find results for all groups in the data. In our actual cases we often need to return data for about 2,000 groups (which, hypothetically, should not require examining very much of the data).



                                                                                        I will use the following tables:



                                                                                        CREATE TABLE temperature(
                                                                                        id INT UNSIGNED NOT NULL AUTO_INCREMENT,
                                                                                        groupID INT UNSIGNED NOT NULL,
                                                                                        recordedTimestamp TIMESTAMP NOT NULL,
                                                                                        recordedValue INT NOT NULL,
                                                                                        INDEX groupIndex(groupID, recordedTimestamp),
                                                                                        PRIMARY KEY (id)
                                                                                        );

                                                                                        CREATE TEMPORARY TABLE selected_group(id INT UNSIGNED NOT NULL, PRIMARY KEY(id));


The temperature table is populated with about 1.5 million random records spread across 100 different groups.
The selected_group table is populated with those 100 groups (in our real cases it would normally hold less than 20% of all groups).



Because the data is random, multiple rows can share the same recordedTimestamp. What we want is a list of all of the selected groups, in order of groupID, with the row holding the last recordedTimestamp for each group; if a group has more than one row with that timestamp, we want the one with the highest id.
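To pin down that tie-breaking rule before looking at any SQL, here is a tiny sketch in plain Python. The sample rows and the helper name `last_row_per_group` are made up for illustration; the rule itself (latest recordedTimestamp first, highest id as tie-breaker) is the one described above.

```python
from typing import Dict, List, Tuple

# (id, groupID, recordedTimestamp, recordedValue)
Row = Tuple[int, int, str, int]


def last_row_per_group(rows: List[Row]) -> List[Row]:
    """Pick, per groupID, the row with the latest timestamp; ties go to the highest id."""
    best: Dict[int, Row] = {}
    for row in rows:
        row_id, group_id, timestamp, _value = row
        current = best.get(group_id)
        # ISO-formatted timestamps compare correctly as strings.
        if current is None or (timestamp, row_id) > (current[2], current[0]):
            best[group_id] = row
    # Return the winners in order of groupID.
    return [best[group_id] for group_id in sorted(best)]


sample = [
    (1, 10, "2018-01-01 00:00:00", 5),
    (2, 10, "2018-01-02 00:00:00", 7),
    (3, 10, "2018-01-02 00:00:00", 9),  # same timestamp as id 2: id 3 wins the tie
    (4, 20, "2018-01-03 00:00:00", 1),
    (5, 20, "2018-01-01 00:00:00", 2),  # higher id but older timestamp: loses
]

latest_rows = last_row_per_group(sample)
print(latest_rows)
```

Note that the highest id in a group is not necessarily the answer (see group 20), which is why a plain MAX(id) per group is not enough for this table.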



If, hypothetically, MySQL had a last() function that returned values from the last row according to a special ORDER BY clause, then we could simply do:



                                                                                        SELECT 
                                                                                        last(t1.id) AS id,
                                                                                        t1.groupID,
                                                                                        last(t1.recordedTimestamp) AS recordedTimestamp,
                                                                                        last(t1.recordedValue) AS recordedValue
                                                                                        FROM selected_group g
                                                                                        INNER JOIN temperature t1 ON t1.groupID = g.id
                                                                                        ORDER BY t1.recordedTimestamp, t1.id
                                                                                        GROUP BY t1.groupID;


which would only need to examine a few hundred rows in this case, as it doesn't use any of the normal GROUP BY functions. This would execute in effectively no time and hence be highly efficient.
Note that in MySQL the ORDER BY clause normally follows the GROUP BY clause. Here, however, the ORDER BY determines the row order seen by the last() function; if it came after the GROUP BY, it would order the groups instead. If no GROUP BY clause were present, the last values would be the same in all of the returned rows.



However, MySQL does not have this, so let's look at what it does offer and show that none of these options is efficient.



                                                                                        Example 1



                                                                                        SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
                                                                                        FROM selected_group g
                                                                                        INNER JOIN temperature t1 ON t1.id = (
                                                                                        SELECT t2.id
                                                                                        FROM temperature t2
                                                                                        WHERE t2.groupID = g.id
                                                                                        ORDER BY t2.recordedTimestamp DESC, t2.id DESC
                                                                                        LIMIT 1
                                                                                        );


                                                                                        This examined 3,009,254 rows and took ~0.859 seconds on 5.7.21 and slightly longer on 8.0.4-rc



                                                                                        Example 2



                                                                                        SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 
                                                                                        FROM temperature t1
                                                                                        INNER JOIN (
                                                                                        SELECT max(t2.id) AS id
                                                                                        FROM temperature t2
                                                                                        INNER JOIN (
                                                                                        SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
                                                                                        FROM selected_group g
                                                                                        INNER JOIN temperature t3 ON t3.groupID = g.id
                                                                                        GROUP BY t3.groupID
                                                                                        ) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp
                                                                                        GROUP BY t2.groupID
                                                                                        ) t5 ON t5.id = t1.id;


                                                                                        This examined 1,505,331 rows and took ~1.25 seconds on 5.7.21 and slightly longer on 8.0.4-rc



                                                                                        Example 3



                                                                                        SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 
                                                                                        FROM temperature t1
                                                                                        WHERE t1.id IN (
                                                                                        SELECT max(t2.id) AS id
                                                                                        FROM temperature t2
                                                                                        INNER JOIN (
                                                                                        SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
                                                                                        FROM selected_group g
                                                                                        INNER JOIN temperature t3 ON t3.groupID = g.id
                                                                                        GROUP BY t3.groupID
                                                                                        ) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp
                                                                                        GROUP BY t2.groupID
                                                                                        )
                                                                                        ORDER BY t1.groupID;


                                                                                        This examined 3,009,685 rows and took ~1.95 seconds on 5.7.21 and slightly longer on 8.0.4-rc



                                                                                        Example 4



                                                                                        SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
                                                                                        FROM selected_group g
                                                                                        INNER JOIN temperature t1 ON t1.id = (
                                                                                        SELECT max(t2.id)
                                                                                        FROM temperature t2
                                                                                        WHERE t2.groupID = g.id AND t2.recordedTimestamp = (
                                                                                        SELECT max(t3.recordedTimestamp)
                                                                                        FROM temperature t3
                                                                                        WHERE t3.groupID = g.id
                                                                                        )
                                                                                        );


                                                                                        This examined 6,137,810 rows and took ~2.2 seconds on 5.7.21 and slightly longer on 8.0.4-rc



                                                                                        Example 5



                                                                                        SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
                                                                                        FROM (
                                                                                        SELECT
                                                                                        t2.id,
                                                                                        t2.groupID,
                                                                                        t2.recordedTimestamp,
                                                                                        t2.recordedValue,
                                                                                        row_number() OVER (
                                                                                        PARTITION BY t2.groupID ORDER BY t2.recordedTimestamp DESC, t2.id DESC
                                                                                        ) AS rowNumber
                                                                                        FROM selected_group g
                                                                                        INNER JOIN temperature t2 ON t2.groupID = g.id
                                                                                        ) t1 WHERE t1.rowNumber = 1;


                                                                                        This examined 6,017,808 rows and took ~4.2 seconds on 8.0.4-rc



                                                                                        Example 6



                                                                                        SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 
                                                                                        FROM (
                                                                                        SELECT
                                                                                        last_value(t2.id) OVER w AS id,
                                                                                        t2.groupID,
                                                                                        last_value(t2.recordedTimestamp) OVER w AS recordedTimestamp,
                                                                                        last_value(t2.recordedValue) OVER w AS recordedValue
                                                                                        FROM selected_group g
                                                                                        INNER JOIN temperature t2 ON t2.groupID = g.id
                                                                                        WINDOW w AS (
                                                                                        PARTITION BY t2.groupID
                                                                                        ORDER BY t2.recordedTimestamp, t2.id
                                                                                        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
                                                                                        )
                                                                                        ) t1
                                                                                        GROUP BY t1.groupID;


                                                                                        This examined 6,017,908 rows and took ~17.5 seconds on 8.0.4-rc



                                                                                        Example 7



                                                                                        SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 
                                                                                        FROM selected_group g
                                                                                        INNER JOIN temperature t1 ON t1.groupID = g.id
                                                                                        LEFT JOIN temperature t2
                                                                                        ON t2.groupID = g.id
                                                                                        AND (
                                                                                        t2.recordedTimestamp > t1.recordedTimestamp
                                                                                        OR (t2.recordedTimestamp = t1.recordedTimestamp AND t2.id > t1.id)
                                                                                        )
                                                                                        WHERE t2.id IS NULL
                                                                                        ORDER BY t1.groupID;


                                                                                        This one was taking forever so I had to kill it.
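The examples above differ a great deal in cost but are meant to return the same rows. A quick way to convince yourself of that is to run a few of them on a tiny dataset. The sketch below does so for the patterns of Examples 1, 5 and 7, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, and it needs an SQLite build with window functions, 3.25 or newer). The sample rows are made up, and an ORDER BY on groupID was added to each query so the results can be compared directly. It says nothing about performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temperature (
        id INTEGER PRIMARY KEY,
        groupID INTEGER NOT NULL,
        recordedTimestamp TEXT NOT NULL,
        recordedValue INTEGER NOT NULL
    );
    CREATE INDEX groupIndex ON temperature (groupID, recordedTimestamp);
    CREATE TABLE selected_group (id INTEGER PRIMARY KEY);
    INSERT INTO temperature VALUES
        (1, 10, '2018-01-01 00:00:00', 5),
        (2, 10, '2018-01-02 00:00:00', 7),
        (3, 10, '2018-01-02 00:00:00', 9),
        (4, 20, '2018-01-03 00:00:00', 1),
        (5, 20, '2018-01-01 00:00:00', 2),
        (6, 30, '2018-01-04 00:00:00', 3);
    INSERT INTO selected_group VALUES (10), (20);  -- group 30 is not selected
""")

QUERIES = {
    "correlated LIMIT 1 (Example 1)": """
        SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
        FROM selected_group g
        INNER JOIN temperature t1 ON t1.id = (
            SELECT t2.id FROM temperature t2
            WHERE t2.groupID = g.id
            ORDER BY t2.recordedTimestamp DESC, t2.id DESC
            LIMIT 1
        )
        ORDER BY t1.groupID
    """,
    "ROW_NUMBER (Example 5)": """
        SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
        FROM (
            SELECT t2.id AS id, t2.groupID AS groupID,
                   t2.recordedTimestamp AS recordedTimestamp,
                   t2.recordedValue AS recordedValue,
                   ROW_NUMBER() OVER (
                       PARTITION BY t2.groupID
                       ORDER BY t2.recordedTimestamp DESC, t2.id DESC
                   ) AS rowNumber
            FROM selected_group g
            INNER JOIN temperature t2 ON t2.groupID = g.id
        ) t1
        WHERE t1.rowNumber = 1
        ORDER BY t1.groupID
    """,
    "anti-join (Example 7)": """
        SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
        FROM selected_group g
        INNER JOIN temperature t1 ON t1.groupID = g.id
        LEFT JOIN temperature t2
            ON t2.groupID = g.id
            AND (
                t2.recordedTimestamp > t1.recordedTimestamp
                OR (t2.recordedTimestamp = t1.recordedTimestamp AND t2.id > t1.id)
            )
        WHERE t2.id IS NULL
        ORDER BY t1.groupID
    """,
}

# Run every variant and collect its rows for comparison.
results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

Once the variants agree on a small dataset like this, the remaining question is purely one of cost, which is what the row counts and timings above are measuring.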






                                                                                        share|improve this answer



























                                                                                          up vote
                                                                                          2
                                                                                          down vote













                                                                                          Clearly there are lots of different ways of getting the same results, your question seems to be what is an efficient way of getting the last results in each group in MySQL. If you are working with huge amounts of data and assuming you are using InnoDB with even the latest versions of MySQL (such as 5.7.21 and 8.0.4-rc) then there might not be an efficient way of doing this.



                                                                                          We sometimes need to do this with tables with even more than 60 million rows.



                                                                                          For these examples I will use data with only about 1.5 million rows where the queries would need to find results for all groups in the data. In our actual cases we would often need to return back data from about 2,000 groups (which hypothetically would not require examining very much of the data).



                                                                                          I will use the following tables:



                                                                                          CREATE TABLE temperature(
                                                                                          id INT UNSIGNED NOT NULL AUTO_INCREMENT,
                                                                                          groupID INT UNSIGNED NOT NULL,
                                                                                          recordedTimestamp TIMESTAMP NOT NULL,
                                                                                          recordedValue INT NOT NULL,
                                                                                          INDEX groupIndex(groupID, recordedTimestamp),
                                                                                          PRIMARY KEY (id)
                                                                                          );

                                                                                          CREATE TEMPORARY TABLE selected_group(id INT UNSIGNED NOT NULL, PRIMARY KEY(id));


                                                                                          The temperature table is populated with about 1.5 million random records, and with 100 different groups.
                                                                                          The selected_group is populated with those 100 groups (in our cases this would normally be less than 20% for all of the groups).



                                                                                          As this data is random it means that multiple rows can have the same recordedTimestamps. What we want is to get a list of all of the selected groups in order of groupID with the last recordedTimestamp for each group, and if the same group has more than one matching row like that then the last matching id of those rows.



                                                                                          If hypothetically MySQL had a last() function which returned values from the last row in a special ORDER BY clause then we could simply do:



                                                                                          SELECT 
                                                                                          last(t1.id) AS id,
                                                                                          t1.groupID,
                                                                                          last(t1.recordedTimestamp) AS recordedTimestamp,
                                                                                          last(t1.recordedValue) AS recordedValue
                                                                                          FROM selected_group g
                                                                                          INNER JOIN temperature t1 ON t1.groupID = g.id
                                                                                          ORDER BY t1.recordedTimestamp, t1.id
                                                                                          GROUP BY t1.groupID;


                                                                                          which would only need to examine a few 100 rows in this case as it doesn't use any of the normal GROUP BY functions. This would execute in 0 seconds and hence be highly efficient.
                                                                                          Note that normally in MySQL we would see an ORDER BY clause following the GROUP BY clause however this ORDER BY clause is used to determine the ORDER for the last() function, if it was after the GROUP BY then it would be ordering the GROUPS. If no GROUP BY clause is present then the last values will be the same in all of the returned rows.



However, MySQL does not have this, so let's look at different ideas using what it does have and show that none of them is efficient.



                                                                                          Example 1



SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.id = (
    SELECT t2.id
    FROM temperature t2
    WHERE t2.groupID = g.id
    ORDER BY t2.recordedTimestamp DESC, t2.id DESC
    LIMIT 1
);


                                                                                          This examined 3,009,254 rows and took ~0.859 seconds on 5.7.21 and slightly longer on 8.0.4-rc
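For readers who want to try the correlated-subquery approach without a MySQL server, the following is a minimal sketch that runs the same query shape on a tiny made-up data set. It uses Python's built-in sqlite3 module as a stand-in, which is an assumption on my part: the column types are simplified, an ORDER BY is added so the output order is deterministic, and the timings above do not carry over to SQLite.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temperature(
        id INTEGER PRIMARY KEY,
        groupID INTEGER NOT NULL,
        recordedTimestamp TEXT NOT NULL,
        recordedValue INTEGER NOT NULL
    );
    CREATE INDEX groupIndex ON temperature(groupID, recordedTimestamp);
    CREATE TABLE selected_group(id INTEGER PRIMARY KEY);
""")
conn.executemany(
    "INSERT INTO temperature VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2018-01-01 00:00:00", 5),
        (2, 10, "2018-01-02 00:00:00", 7),
        (3, 10, "2018-01-02 00:00:00", 9),  # ties with id 2 on timestamp
        (4, 20, "2018-01-03 00:00:00", 1),
        (5, 20, "2018-01-01 00:00:00", 2),
        (6, 30, "2018-01-01 00:00:00", 3),  # group 30 is not selected
    ],
)
conn.executemany("INSERT INTO selected_group VALUES (?)", [(10,), (20,)])

# Same shape as Example 1: per group, pick the id of the newest row.
example1_rows = conn.execute("""
    SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
    FROM selected_group g
    INNER JOIN temperature t1 ON t1.id = (
        SELECT t2.id
        FROM temperature t2
        WHERE t2.groupID = g.id
        ORDER BY t2.recordedTimestamp DESC, t2.id DESC
        LIMIT 1
    )
    ORDER BY t1.groupID
""").fetchall()
print(example1_rows)
```

The subquery runs once per selected group, so the cost grows with the number of groups and with how cheaply each group's newest row can be located.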



                                                                                          Example 2



SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM temperature t1
INNER JOIN (
    SELECT max(t2.id) AS id
    FROM temperature t2
    INNER JOIN (
        SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
        FROM selected_group g
        INNER JOIN temperature t3 ON t3.groupID = g.id
        GROUP BY t3.groupID
    ) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp
    GROUP BY t2.groupID
) t5 ON t5.id = t1.id;


                                                                                          This examined 1,505,331 rows and took ~1.25 seconds on 5.7.21 and slightly longer on 8.0.4-rc
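The two-step structure of this query (latest timestamp per group first, then highest id among rows with that timestamp) may be easier to follow on a small data set. The sketch below is an illustration under assumptions: it uses Python's built-in sqlite3 module rather than MySQL, made-up rows, and an added ORDER BY for deterministic output.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temperature(
        id INTEGER PRIMARY KEY,
        groupID INTEGER NOT NULL,
        recordedTimestamp TEXT NOT NULL,
        recordedValue INTEGER NOT NULL
    );
    CREATE INDEX groupIndex ON temperature(groupID, recordedTimestamp);
    CREATE TABLE selected_group(id INTEGER PRIMARY KEY);
""")
conn.executemany(
    "INSERT INTO temperature VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2018-01-01 00:00:00", 5),
        (2, 10, "2018-01-02 00:00:00", 7),
        (3, 10, "2018-01-02 00:00:00", 9),  # ties with id 2 on timestamp
        (4, 20, "2018-01-03 00:00:00", 1),
        (5, 20, "2018-01-01 00:00:00", 2),
    ],
)
conn.executemany("INSERT INTO selected_group VALUES (?)", [(10,), (20,)])

# Step 1 (the innermost derived table): latest timestamp per selected group.
latest_timestamps = conn.execute("""
    SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
    FROM selected_group g
    INNER JOIN temperature t3 ON t3.groupID = g.id
    GROUP BY t3.groupID
    ORDER BY t3.groupID
""").fetchall()

# Step 2 (the full query): highest id among the rows carrying that timestamp.
example2_rows = conn.execute("""
    SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
    FROM temperature t1
    INNER JOIN (
        SELECT max(t2.id) AS id
        FROM temperature t2
        INNER JOIN (
            SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
            FROM selected_group g
            INNER JOIN temperature t3 ON t3.groupID = g.id
            GROUP BY t3.groupID
        ) t4 ON t4.groupID = t2.groupID
            AND t4.recordedTimestamp = t2.recordedTimestamp
        GROUP BY t2.groupID
    ) t5 ON t5.id = t1.id
    ORDER BY t1.groupID
""").fetchall()
print(latest_timestamps)
print(example2_rows)
```

Group 10 has two rows sharing the latest timestamp, which is the case the outer max(t2.id) exists to resolve.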



                                                                                          Example 3



SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM temperature t1
WHERE t1.id IN (
    SELECT max(t2.id) AS id
    FROM temperature t2
    INNER JOIN (
        SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
        FROM selected_group g
        INNER JOIN temperature t3 ON t3.groupID = g.id
        GROUP BY t3.groupID
    ) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp
    GROUP BY t2.groupID
)
ORDER BY t1.groupID;


                                                                                          This examined 3,009,685 rows and took ~1.95 seconds on 5.7.21 and slightly longer on 8.0.4-rc



                                                                                          Example 4



SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.id = (
    SELECT max(t2.id)
    FROM temperature t2
    WHERE t2.groupID = g.id AND t2.recordedTimestamp = (
        SELECT max(t3.recordedTimestamp)
        FROM temperature t3
        WHERE t3.groupID = g.id
    )
);


                                                                                          This examined 6,137,810 rows and took ~2.2 seconds on 5.7.21 and slightly longer on 8.0.4-rc



                                                                                          Example 5



SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM (
    SELECT
        t2.id,
        t2.groupID,
        t2.recordedTimestamp,
        t2.recordedValue,
        row_number() OVER (
            PARTITION BY t2.groupID ORDER BY t2.recordedTimestamp DESC, t2.id DESC
        ) AS rowNumber
    FROM selected_group g
    INNER JOIN temperature t2 ON t2.groupID = g.id
) t1 WHERE t1.rowNumber = 1;


                                                                                          This examined 6,017,808 rows and took ~4.2 seconds on 8.0.4-rc
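The window-function variant can also be tried on a small data set. This is a sketch under assumptions: it uses Python's built-in sqlite3 module instead of MySQL 8, it needs an SQLite library of version 3.25 or newer for window functions, the rows are made up, and an ORDER BY is added for deterministic output.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL 8 (illustration only).
# Window functions require SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temperature(
        id INTEGER PRIMARY KEY,
        groupID INTEGER NOT NULL,
        recordedTimestamp TEXT NOT NULL,
        recordedValue INTEGER NOT NULL
    );
    CREATE TABLE selected_group(id INTEGER PRIMARY KEY);
""")
conn.executemany(
    "INSERT INTO temperature VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2018-01-01 00:00:00", 5),
        (2, 10, "2018-01-02 00:00:00", 7),
        (3, 10, "2018-01-02 00:00:00", 9),  # ties with id 2 on timestamp
        (4, 20, "2018-01-03 00:00:00", 1),
        (5, 20, "2018-01-01 00:00:00", 2),
    ],
)
conn.executemany("INSERT INTO selected_group VALUES (?)", [(10,), (20,)])

# Number the rows of each group from newest to oldest and keep number 1.
example5_rows = conn.execute("""
    SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
    FROM (
        SELECT
            t2.id,
            t2.groupID,
            t2.recordedTimestamp,
            t2.recordedValue,
            row_number() OVER (
                PARTITION BY t2.groupID
                ORDER BY t2.recordedTimestamp DESC, t2.id DESC
            ) AS rowNumber
        FROM selected_group g
        INNER JOIN temperature t2 ON t2.groupID = g.id
    ) t1
    WHERE t1.rowNumber = 1
    ORDER BY t1.groupID
""").fetchall()
print(example5_rows)
```

Because every row of every selected group gets a row number before the filter is applied, the whole of each group is read, which is consistent with the high examined-row count reported above.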



                                                                                          Example 6



SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM (
    SELECT
        last_value(t2.id) OVER w AS id,
        t2.groupID,
        last_value(t2.recordedTimestamp) OVER w AS recordedTimestamp,
        last_value(t2.recordedValue) OVER w AS recordedValue
    FROM selected_group g
    INNER JOIN temperature t2 ON t2.groupID = g.id
    WINDOW w AS (
        PARTITION BY t2.groupID
        ORDER BY t2.recordedTimestamp, t2.id
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    )
) t1
GROUP BY t1.groupID;


                                                                                          This examined 6,017,908 rows and took ~17.5 seconds on 8.0.4-rc



                                                                                          Example 7



SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.groupID = g.id
LEFT JOIN temperature t2
    ON t2.groupID = g.id
    AND (
        t2.recordedTimestamp > t1.recordedTimestamp
        OR (t2.recordedTimestamp = t1.recordedTimestamp AND t2.id > t1.id)
    )
WHERE t2.id IS NULL
ORDER BY t1.groupID;


This one was taking so long that I had to kill it, so there is no timing for it.
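A rough back-of-the-envelope estimate hints at why the self-join is so slow. The sketch below assumes the roughly 1.5 million rows are spread evenly over the 100 groups (the answer only says the data is random) and that the join compares every row with every other row of the same group. That is an upper bound, not a measurement, since the optimizer may prune some of the work.

```python
# Approximate figures taken from the test setup described in the answer.
total_rows = 1_500_000
group_count = 100

# Assumption: rows are spread evenly across the groups.
rows_per_group = total_rows // group_count

# Worst case: each row of a group is compared with every row of that group.
pairs_per_group = rows_per_group ** 2
total_pairs = pairs_per_group * group_count

print(f"rows per group:  {rows_per_group:,}")
print(f"pairs per group: {pairs_per_group:,}")
print(f"total pairs:     {total_pairs:,}")
```

Under these assumptions the join may have to consider on the order of 22.5 billion row pairs, several thousand times more than the few million rows examined by the other examples.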































































                                                                                            Clearly there are lots of different ways of getting the same results, your question seems to be what is an efficient way of getting the last results in each group in MySQL. If you are working with huge amounts of data and assuming you are using InnoDB with even the latest versions of MySQL (such as 5.7.21 and 8.0.4-rc) then there might not be an efficient way of doing this.



We sometimes need to do this on tables with more than 60 million rows.

For these examples I will use data with only about 1.5 million rows, and the queries need to find results for every group in the data. In our actual cases we would often need to return data for about 2,000 groups (which hypothetically would not require examining very much of the data).



                                                                                            I will use the following tables:



CREATE TABLE temperature(
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    groupID INT UNSIGNED NOT NULL,
    recordedTimestamp TIMESTAMP NOT NULL,
    recordedValue INT NOT NULL,
    INDEX groupIndex(groupID, recordedTimestamp),
    PRIMARY KEY (id)
);

                                                                                            CREATE TEMPORARY TABLE selected_group(id INT UNSIGNED NOT NULL, PRIMARY KEY(id));


The temperature table is populated with about 1.5 million random records spread across 100 different groups.
The selected_group table is populated with those 100 groups (in our real cases this would normally be fewer than 20% of all the groups).



Because this data is random, multiple rows can have the same recordedTimestamp. What we want is a list of all the selected groups, in order of groupID, with the row that has the last recordedTimestamp for each group; if a group has more than one row with that timestamp, we want the one with the highest id.
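To make the tie-breaking rule concrete, here is a minimal sketch in plain Python of the result we are after. The sample rows and the helper name last_row_per_group are made up for illustration; this is only a reference for checking query output on small data, not something you would run against millions of rows.

```python
from datetime import datetime

# Hypothetical sample rows: (id, groupID, recordedTimestamp, recordedValue).
rows = [
    (1, 10, datetime(2018, 4, 1, 12, 0, 0), 21),
    (2, 10, datetime(2018, 4, 1, 12, 5, 0), 22),
    (3, 10, datetime(2018, 4, 1, 12, 5, 0), 23),  # same timestamp as id 2
    (4, 20, datetime(2018, 4, 1, 11, 0, 0), 18),
    (5, 20, datetime(2018, 4, 1, 10, 0, 0), 17),  # higher id, older timestamp
    (6, 30, datetime(2018, 4, 1, 9, 0, 0), 15),   # group 30 is not selected
]
selected_groups = {10, 20}


def last_row_per_group(rows, selected_groups):
    """Return, per selected group, the row with the greatest (timestamp, id)."""
    best = {}
    for row in rows:
        row_id, group_id, recorded_at, _value = row
        if group_id not in selected_groups:
            continue
        current = best.get(group_id)
        # Compare on timestamp first; the id only breaks ties.
        if current is None or (recorded_at, row_id) > (current[2], current[0]):
            best[group_id] = row
    # Emit the groups in order of groupID.
    return [best[group_id] for group_id in sorted(best)]


result = last_row_per_group(rows, selected_groups)
for row in result:
    print(row)
```

Group 10 ends up with id 3 (the tie on timestamp is broken by the higher id) and group 20 with id 4 (the later timestamp wins even though id 5 is higher).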



If, hypothetically, MySQL had a last() function that returned values from the last row according to a special ORDER BY clause, then we could simply do:



SELECT
    last(t1.id) AS id,
    t1.groupID,
    last(t1.recordedTimestamp) AS recordedTimestamp,
    last(t1.recordedValue) AS recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.groupID = g.id
ORDER BY t1.recordedTimestamp, t1.id
GROUP BY t1.groupID;


This would only need to examine a few hundred rows in this case, as it doesn't use any of the normal GROUP BY aggregate functions. It would execute in effectively 0 seconds and hence be highly efficient.
Note that in MySQL the ORDER BY clause normally follows the GROUP BY clause. Here, however, the ORDER BY clause determines the row order for the last() function; if it came after the GROUP BY, it would order the groups instead. If no GROUP BY clause were present, the last values would be the same in every returned row.



However, MySQL does not have this, so let's look at what it does have and show that none of these approaches is efficient.



                                                                                            Example 1



SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.id = (
    SELECT t2.id
    FROM temperature t2
    WHERE t2.groupID = g.id
    ORDER BY t2.recordedTimestamp DESC, t2.id DESC
    LIMIT 1
);


                                                                                            This examined 3,009,254 rows and took ~0.859 seconds on 5.7.21 and slightly longer on 8.0.4-rc



                                                                                            Example 2



SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM temperature t1
INNER JOIN (
    SELECT max(t2.id) AS id
    FROM temperature t2
    INNER JOIN (
        SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
        FROM selected_group g
        INNER JOIN temperature t3 ON t3.groupID = g.id
        GROUP BY t3.groupID
    ) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp
    GROUP BY t2.groupID
) t5 ON t5.id = t1.id;


                                                                                            This examined 1,505,331 rows and took ~1.25 seconds on 5.7.21 and slightly longer on 8.0.4-rc



                                                                                            Example 3



SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM temperature t1
WHERE t1.id IN (
    SELECT max(t2.id) AS id
    FROM temperature t2
    INNER JOIN (
        SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
        FROM selected_group g
        INNER JOIN temperature t3 ON t3.groupID = g.id
        GROUP BY t3.groupID
    ) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp
    GROUP BY t2.groupID
)
ORDER BY t1.groupID;


                                                                                            This examined 3,009,685 rows and took ~1.95 seconds on 5.7.21 and slightly longer on 8.0.4-rc



                                                                                            Example 4



SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.id = (
    SELECT max(t2.id)
    FROM temperature t2
    WHERE t2.groupID = g.id AND t2.recordedTimestamp = (
        SELECT max(t3.recordedTimestamp)
        FROM temperature t3
        WHERE t3.groupID = g.id
    )
);


                                                                                            This examined 6,137,810 rows and took ~2.2 seconds on 5.7.21 and slightly longer on 8.0.4-rc



                                                                                            Example 5



SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM (
    SELECT
        t2.id,
        t2.groupID,
        t2.recordedTimestamp,
        t2.recordedValue,
        row_number() OVER (
            PARTITION BY t2.groupID ORDER BY t2.recordedTimestamp DESC, t2.id DESC
        ) AS rowNumber
    FROM selected_group g
    INNER JOIN temperature t2 ON t2.groupID = g.id
) t1 WHERE t1.rowNumber = 1;


                                                                                            This examined 6,017,808 rows and took ~4.2 seconds on 8.0.4-rc
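If you want to convince yourself that these queries pick the same rows before timing them, a small harness like the following might help. It is only a sketch: SQLite (through Python's sqlite3 module) stands in for MySQL, which is an assumption of this example and not what the timings above were measured on. The six sample rows are made up, timestamps are stored as ISO-8601 text, and the window function in Example 5 needs SQLite 3.25 or later. It checks results only and says nothing about performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite approximation of the schema above (assumption: types simplified).
conn.executescript("""
    CREATE TABLE temperature(
        id INTEGER PRIMARY KEY,
        groupID INTEGER NOT NULL,
        recordedTimestamp TEXT NOT NULL,
        recordedValue INTEGER NOT NULL
    );
    CREATE INDEX groupIndex ON temperature(groupID, recordedTimestamp);
    CREATE TABLE selected_group(id INTEGER PRIMARY KEY);
""")
conn.executemany(
    "INSERT INTO temperature VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2018-04-01 12:00:00", 21),
        (2, 10, "2018-04-01 12:05:00", 22),
        (3, 10, "2018-04-01 12:05:00", 23),  # ties with id 2 on timestamp
        (4, 20, "2018-04-01 11:00:00", 18),
        (5, 20, "2018-04-01 10:00:00", 17),  # higher id, older timestamp
        (6, 30, "2018-04-01 09:00:00", 15),  # group 30 is not selected
    ],
)
conn.executemany("INSERT INTO selected_group VALUES (?)", [(10,), (20,)])

example_1 = """
    SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
    FROM selected_group g
    INNER JOIN temperature t1 ON t1.id = (
        SELECT t2.id
        FROM temperature t2
        WHERE t2.groupID = g.id
        ORDER BY t2.recordedTimestamp DESC, t2.id DESC
        LIMIT 1
    )
"""

example_2 = """
    SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
    FROM temperature t1
    INNER JOIN (
        SELECT max(t2.id) AS id
        FROM temperature t2
        INNER JOIN (
            SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
            FROM selected_group g
            INNER JOIN temperature t3 ON t3.groupID = g.id
            GROUP BY t3.groupID
        ) t4 ON t4.groupID = t2.groupID
            AND t4.recordedTimestamp = t2.recordedTimestamp
        GROUP BY t2.groupID
    ) t5 ON t5.id = t1.id
"""

example_5 = """
    SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
    FROM (
        SELECT t2.id, t2.groupID, t2.recordedTimestamp, t2.recordedValue,
               row_number() OVER (
                   PARTITION BY t2.groupID
                   ORDER BY t2.recordedTimestamp DESC, t2.id DESC
               ) AS rowNumber
        FROM selected_group g
        INNER JOIN temperature t2 ON t2.groupID = g.id
    ) t1
    WHERE t1.rowNumber = 1
"""

# Sort in Python so the comparison does not depend on each query's row order.
rows_1 = sorted(conn.execute(example_1).fetchall())
rows_2 = sorted(conn.execute(example_2).fetchall())
rows_5 = sorted(conn.execute(example_5).fetchall())
print(rows_1)
print(rows_1 == rows_2 == rows_5)
```

On this sample all three return id 3 for group 10 and id 4 for group 20, so the differences between them are in how much work the server does, not in what comes back.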



                                                                                            Example 6



SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM (
    SELECT
        last_value(t2.id) OVER w AS id,
        t2.groupID,
        last_value(t2.recordedTimestamp) OVER w AS recordedTimestamp,
        last_value(t2.recordedValue) OVER w AS recordedValue
    FROM selected_group g
    INNER JOIN temperature t2 ON t2.groupID = g.id
    WINDOW w AS (
        PARTITION BY t2.groupID
        ORDER BY t2.recordedTimestamp, t2.id
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    )
) t1
GROUP BY t1.groupID;


                                                                                            This examined 6,017,908 rows and took ~17.5 seconds on 8.0.4-rc



                                                                                            Example 7



SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.groupID = g.id
LEFT JOIN temperature t2
    ON t2.groupID = g.id
    AND (
        t2.recordedTimestamp > t1.recordedTimestamp
        OR (t2.recordedTimestamp = t1.recordedTimestamp AND t2.id > t1.id)
    )
WHERE t2.id IS NULL
ORDER BY t1.groupID;


                                                                                            This one was taking forever so I had to kill it.





























                                                                                            edited Apr 30 at 6:20

























                                                                                            answered Apr 18 at 7:45









                                                                                            Yoseph























                                                                                                up vote
                                                                                                1
                                                                                                down vote













                                                                                                select * from messages group by name desc




























                                                                                                • this works fine! see here also stackoverflow.com/questions/1313120/…
                                                                                                  – user2241289
                                                                                                  Feb 12 '17 at 18:45

























                                                                                                edited Jun 18 '16 at 14:21









                                                                                                Tunaki











                                                                                                answered Jun 18 '16 at 14:12









                                                                                                huuang

























                                                                                                up vote
                                                                                                1
                                                                                                down vote













How about this (PostgreSQL syntax; MySQL has no DISTINCT ON):



                                                                                                SELECT DISTINCT ON (name) *
                                                                                                FROM messages
                                                                                                ORDER BY name, id DESC;


I had a similar issue (on PostgreSQL, though) with a table of 1M records. This solution takes 1.7 s, versus 44 s for the one with LEFT JOIN.
In my case I also had to exclude NULL values in the column corresponding to your name field, which improved performance by a further 0.2 s.
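For anyone not familiar with DISTINCT ON, here is a rough sketch in plain Python of what the query above does: sort by name, then by id descending, and keep the first row of each run of equal names. The rows are the six from the question; the variable names are only for illustration, and this emulates the semantics, not how PostgreSQL executes it.

```python
from itertools import groupby
from operator import itemgetter

# Rows from the question's messages table: (id, name, other_columns).
messages = [
    (1, "A", "A_data_1"),
    (2, "A", "A_data_2"),
    (3, "A", "A_data_3"),
    (4, "B", "B_data_1"),
    (5, "B", "B_data_2"),
    (6, "C", "C_data_1"),
]

# ORDER BY name, id DESC
ordered = sorted(messages, key=lambda row: (row[1], -row[0]))

# DISTINCT ON (name): keep only the first row of each run of equal names.
last_per_name = [next(group) for _name, group in groupby(ordered, key=itemgetter(1))]

for row in last_per_name:
    print(row)
```

This gives rows 3, 5 and 6, which is the result asked for in the question.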









































                                                                                                    answered Nov 30 '16 at 10:50









                                                                                                    Azathoth























                                                                                                        up vote
                                                                                                        0
                                                                                                        down vote













                                                                                                        If performance is really your concern you can introduce a new column on the table called IsLastInGroup of type BIT.



Set it to true on the rows which are last in their group and maintain it with every row insert/update/delete. Writes will be slower, but you'll benefit on reads. It depends on your use case and I recommend it only if you're read-focused.



                                                                                                        So your query will look like:



                                                                                                        SELECT * FROM Messages WHERE IsLastInGroup = 1
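Keeping the flag correct is the hard part. Below is a minimal sketch of the insert case only, assuming that "last" means the most recently inserted row for a name. It uses SQLite through Python's sqlite3 module purely so that it is runnable; the helper insert_message and the INTEGER flag column are assumptions of this sketch, and updates and deletes would need similar handling (in MySQL you might do the same work in a transaction or a trigger).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Messages(
        Id INTEGER PRIMARY KEY,
        Name TEXT NOT NULL,
        Other_Columns TEXT,
        IsLastInGroup INTEGER NOT NULL DEFAULT 0
    )
""")


def insert_message(conn, name, other_columns):
    """Insert a row as the new last row of its group (hypothetical helper)."""
    with conn:  # one transaction: commit on success, roll back on error
        # Clear the flag on the row that was last in this group until now.
        conn.execute(
            "UPDATE Messages SET IsLastInGroup = 0 "
            "WHERE Name = ? AND IsLastInGroup = 1",
            (name,),
        )
        # The new row becomes the last one in its group.
        conn.execute(
            "INSERT INTO Messages(Name, Other_Columns, IsLastInGroup) "
            "VALUES (?, ?, 1)",
            (name, other_columns),
        )


for name, data in [
    ("A", "A_data_1"), ("A", "A_data_2"), ("A", "A_data_3"),
    ("B", "B_data_1"), ("B", "B_data_2"), ("C", "C_data_1"),
]:
    insert_message(conn, name, data)

last_rows = conn.execute(
    "SELECT Id, Name, Other_Columns FROM Messages "
    "WHERE IsLastInGroup = 1 ORDER BY Id"
).fetchall()
print(last_rows)
```

Doing the UPDATE and the INSERT in one transaction means a reader should never see a group with two flagged rows or with none, which is the invariant the read query depends on.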






























                                                                                                          up vote
                                                                                                          0
                                                                                                          down vote













                                                                                                          If performance is really your concern you can introduce a new column on the table called IsLastInGroup of type BIT.



                                                                                                          Set it to true on the columns which are last and maintain it with every row insert/update/delete. Writes will be slower, but you'll benefit on reads. It depends on your use case and I recommend it only if you're read-focused.



                                                                                                          So your query will look like:



                                                                                                          SELECT * FROM Messages WHERE IsLastInGroup = 1





                                                                                                          share|improve this answer























                                                                                                            up vote
                                                                                                            0
                                                                                                            down vote










                                                                                                            up vote
                                                                                                            0
                                                                                                            down vote









                                                                                                            If performance is really your concern you can introduce a new column on the table called IsLastInGroup of type BIT.



                                                                                                            Set it to true on the columns which are last and maintain it with every row insert/update/delete. Writes will be slower, but you'll benefit on reads. It depends on your use case and I recommend it only if you're read-focused.



                                                                                                            So your query will look like:



                                                                                                            SELECT * FROM Messages WHERE IsLastInGroup = 1





                                                                                                            share|improve this answer












                                                                                                            If performance is really your concern you can introduce a new column on the table called IsLastInGroup of type BIT.



                                                                                                            Set it to true on the columns which are last and maintain it with every row insert/update/delete. Writes will be slower, but you'll benefit on reads. It depends on your use case and I recommend it only if you're read-focused.



                                                                                                            So your query will look like:



                                                                                                            SELECT * FROM Messages WHERE IsLastInGroup = 1






                                                                                                            share|improve this answer












                                                                                                            share|improve this answer



                                                                                                            share|improve this answer










                                                                                                            answered May 2 at 15:05









                                                                                                            jabko87

                                                                                                            1,46011221





















                                                                                                                protected by Community Mar 30 '12 at 9:58


