Finding the Entry with the Largest Value of Another Column in MySQL
This is totally and 100% cheating. If someone catches you doing this, the SQL police will come after you and cite you for not using a self-join. Just so you know.
Let’s say you have a table that keeps track of users, colors, and timestamps, and you want to know the most recent color for each user. It might look something like:
Table: user_colors
user | color | timestamp |
---|---|---|
Sophie | green | 2019-08-12 12:34:56 |
Sophie | blue | 2019-03-28 10:42:23 |
Gwen | red | 2018-12-12 03:23:12 |
Sophie | purple | 2013-09-12 04:52:23 |
Gwen | fuchsia | 2017-07-02 11:12:13 |
There are two standard solutions.
SELECT `user`, `color`
FROM (
SELECT `user`, MAX(`timestamp`) AS `timestamp`
FROM `user_colors`
GROUP BY `user`
) `alias1`
INNER JOIN `user_colors`
USING (`user`, `timestamp`)
Or you could do a left join and handle this with the criterion for joining.
SELECT `user`, `color`
FROM `user_colors` AS `uc1`
LEFT JOIN `user_colors` AS `uc2`
ON (`uc1`.`user` = `uc2`.`user` AND `uc1`.`timestamp` < `uc2`.`timestamp`)
WHERE `uc2`.`user` IS NULL
I can never remember how to do the second one, and full disclosure, it might be wrong.
I don’t have anything easy around to test it on. I frequently get the direction wrong
on the inequality. Sometimes I forget the WHERE
clause until I look at my results and
notice that they are totally not what I wanted.
Don’t read any further if you are worried about the SQL police.
For whatever reason, there is something about my tables and their size and their structure that makes these self-joins really slow. I have come up with a work-around.
Behold:
SELECT `user`, SUBSTRING_INDEX( GROUP_CONCAT( `color`
ORDER BY `timestamp` DESC
SEPARATOR "," ), ",", 1 )
FROM `user_colors`
GROUP BY `user`
The GROUP_CONCAT
makes a comma-separted list of all the colors for the user, in
descending order by timestamp. The SUBSTRING_INDEX
takes the part of the string
before the first comma. To be honest, I have no idea what sort of witchcraft the
query planner is doing to make this work, but on my tables it is faster than the
self-join.
Note that a GROUP_CONCAT
will fail silently and horrifically if you have more than
a thousand characters in a group. There are ways around this, but the finer points
tend to depend on the system that you’re using.