I’m sure I’ve mentioned this before, but we have a bunch of database tables that are kind of ginormous and that do a good job at the basic nuts and bolts of keeping things running but that are kind of terrible at answering analytic questions. This particular post is about a PostgreSQL database, but a lot of this applies to MySQL/MariaDB.

People kept telling me that we could make things better by putting more indexes on the tables. That was not true. The metaphor of the database index being like a book’s index is really useful here: If the index tells you that you need to look at every page of the book (read: every row of the database) in order to find what you want, then that does not speed things up for you.

If you want to find the average of all the values in a particular column, this requires reading all the data from the entire table into memory. The standard storage engine in Postgres reads in rows in their entirety. Things just get worse from there. In order to solve my problem, we need to go outside Postgres.

Fortunately, this is where the Foreign Data Wrapper comes in. Postgres has a notion of foreign tables. They are like portals to other worlds. You can have information that is not in a Postgres table but you can query it as if it were in a Postgres table. This is roughly as magical as it sounds. The catch, of course, is that you need software to mediate the communication between Postgres and the non-Postgres information. However, we live in a future where there are a lot of people who write software and who make it available for other people to use.

The Foreign Data Wrapper that I’m using is one where the fundamental organizational structure of the table is the column rather than the row. The term “columnstore” is used to describe this way of organizing the information. It is just about perfect for the sorts of analysis work that I’m doing.

Not only is it theoretically perfect, but it really is amazing in practice. My unremarkable queries sped up a lot, some of them by a factor of 10. But the impossible query that refused to run at all and that crashed the computer? Now runs in five minutes. OK, this is not the sort of query that you can run on demand, but this is better than not running at all.