PostgreSQL, unnest, and array_agg respecting order
tl;dr: WITH ORDINALITY
and ORDER BY ordinality
are your friends.
Once upon a time I could write fairly simple SQL queries, pull my data into R, and then manipulate the resulting data in R. It worked really well for me. But now I live in a world where my only two tools are PostgreSQL and Javascript, and I don’t know anyone who reaches for Javascript when she needs to reshape a bunch of tabular data.
The problem that I have right now is that each lesson is associated with an ordered list
of problemID
s. But these id numbers are not meaningful to humans; the problems have
other names that people know about. The lessons
table has a column called
problemIDs
, which contains an array of the problems’ ids (in order). Meanwhile,
the problems
table has a column of id
s and a column of the corresponding
problemName
s.
Something sort of like this:
lessons
id | problemIDs |
---|---|
100 | {1, 2, 3, 4} |
101 | {5, 6, 7, 8} |
problems
id | problemName |
---|---|
1 | 'some name' |
2 | 'whatever' |
3 | 'more stuff' |
4 | 'yadda yadda' |
(OK, not exactly. I have fudged the structure of things to simplify this example. Don’t even get me started on the column full of JSON that I also need to deal with in the real-life version of things.)
My goal is to get the array problemIDs
as well as the corresponding array of
problemName
s—and to have both of these in the same order as the original array
problemIDs
.
That is, I want rows that look something like this:
100, {1, 2, 3, 4}, {'some name', 'whatever', 'more stuff', 'yadda yadda'}
Philosophically, what I need to do is unnest(problemIDs)
, then JOIN
with the problems
table to get the corresponding name for each problem, and then
array_agg()
everything back together into arrays.
The issue, of course, is that goodness knows what order the query planner is going to
do things in, and there is no guarantee that my arrays are going to be in the same
order as the original array problemIDs
. I had been working my way around this by
using subqueries and various functions, but it was all pretty awkward.
But on Friday I learned about the good news of unnest(foo) WITH ORDINALITY
.
Pair that with array_agg(bar ORDER BY ordinality)
, and the problem is solved.
SELECT lessons.id,
array_agg("problemID" ORDER BY ordinality) AS "problemIDs",
array_agg(problems."problemName" ORDER BY ordinality) AS "problemNames"
FROM lessons, unnest("problemIDs") WITH ORDINALITY AS "problemID"
INNER JOIN problems
ON "problemID" = problems.id
GROUP BY lessons.id