First off, I am going to state my opinion that most of the advice that you find on the internet for writing SQL queries is meant for people who are doing something involving websites in production. You have some user logged into your site, and you want to show a moderately-sized collection of information relating to that user. Or, the user does something, and you want to update a small amount of information in your database. Because this is happening all the time and at unpredictable times and with no one supervising it, the queries need to run quickly without gumming up the works.

You don’t really see much about the ways to write queries for analysis work. The database was designed to make the website run well, not to do some sort of analysis after the fact. Analysis work often involves grabbing data about a whole bunch of users and then doing a whole bunch of joins with other tables, possibly matching on columns that you were hoping to do calculations on.

There is a certain standard structure for queries. Often people will SELECT a bunch of columns that come from a whole bunch of tables, and the tables will be aliased in ways that make it hard for me to remember which tables they come from. Worse, all of our columns seem to be named things like noun_id and our tables are named by mixing and matching a bunch of the nouns. Things like homework_students_problems_results or students_homework_classes_problems and other things that are easy to get mixed up with each other. So I spend a lot of time asking myself, “Wait, which problem_id is that talking about?” And then the litany of tables is joined in. Finally, there are some WHEREs in which I can’t remember which tables or columns are being restricted.

So I have developed a personal style of query-writing which I call “rat nest of subqueries.” It’s great if you only need to run the query a few times during which time you are there babysitting it (and also it is running on some sort of test server and has no chance of melting a production server).

And then I end up with something of the form

SELECT * 
FROM 
(SELECT some_stuff
FROM some_table
WHERE something) T1 
INNER JOIN
(SELECT some_other_stuff
FROM some_other_table
WHERE something_else) T2
USING(whatever)
INNER JOIN
(SELECT still_more_stuff, possibly_aggregated AS nice_name
FROM yet_another_table
GROUP BY still_more_stuff) T3
USING(nice_name)

and I just keep on chaining things together. I’m sure that the optimizer hates me. Yet, in my defense, the other day I ran a query written in the conventional way by one of the most experienced people in the office, and it took a thousands seconds to run. So does it really matter if my version of it would have taken 1100 seconds to run? You’re still going to need to find some way to entertain yourself while it’s running.

And what I like about my way of doing things is that I can debug each of my subqueries on their own. I am terrible at tracking down the logic in the standard way of writing queries

Now that everyone is super-bored by my yammering on and on about databases, I’d like to take a moment to say that some of the email that I get at work is pretty wild. I’m the person who gets your email if you are applying for most of the full-time jobs that we are hiring for. And some people do not know how to apply for a job. I really do wonder how many of the applicants are trying to stay on unemployment by sending out the required number of applications a week but intentionally sabotaging themselves so that they won’t get hired anywhere.