• January Crafting

In nearly perfect timing, tomorrow evening we’re having the first meeting of the new crafting club at work. A bunch of us are going to stay late and use the good table in the lunch room and work on crafting projects.

We’ve had renewed interest in the business card cuboctahedra that I made out of my old cards when we updated the corporate logos, fonts, and all that roughly two years ago. (Shoutout to Jeannine Mosely’s talk at G4G for giving me the idea in the first place and to Tom Hull for helping me find a link to a model that is within my skill set.) I need to remember to put the good metal ruler and my improvised folding tool (an awl) in my laptop bag so that I can help anyone who wants to make some of these on their own.

The other day I noticed that we also have a lot of cubes made out of business cards around the office as well. According to the business card rumor mill, it looks like at least one order of cards came in with minor flaws, so we might have even more spare cards to make things with. Do we want to make a giant Menger Sponge? Could we put it in the conference room to fill in some of the awkward empty space that was created when we rotated the big table to give everyone a better view of the screen?

It looks like in addition to my owning a lifetime supply of yarn, fabric, and thread, I might also have access to a lifetime supply of $$2’’ \times 3\frac{1}{2}’’$$ cards to fold into geometric shapes. (But I probably will bring too much fabric to the craft club and see if I can come up with a better quilt design based on the fabric that I bought for an idea that I have since rejected.)

• Have You Heard the Good News of the Foreign Data Wrapper?

I’m sure I’ve mentioned this before, but we have a bunch of database tables that are kind of ginormous and that do a good job at the basic nuts and bolts of keeping things running but that are kind of terrible at answering analytic questions. This particular post is about a PostgreSQL database, but a lot of this applies to MySQL/MariaDB.

People kept telling me that we could make things better by putting more indexes on the tables. That was not true. The metaphor of the database index being like a book’s index is really useful here: If the index tells you that you need to look at every page of the book (read: every row of the database) in order to find what you want, then that does not speed things up for you.

If you want to find the average of all the values in a particular column, this requires reading all the data from the entire table into memory. The standard storage engine in Postgres reads in rows in their entirety. Things just get worse from there. In order to solve my problem, we need to go outside Postgres.

Fortunately, this is where the Foreign Data Wrapper comes in. Postgres has a notion of foreign tables. They are like portals to other worlds. You can have information that is not in a Postgres table but you can query it as if it were in a Postgres table. This is roughly as magical as it sounds. The catch, of course, is that you need software to mediate the communication between Postgres and the non-Postgres information. However, we live in a future where there are a lot of people who write software and who make it available for other people to use.

The Foreign Data Wrapper that I’m using is one where the fundamental organizational structure of the table is the column rather than the row. The term “columnstore” is used to describe this way of organizing the information. It is just about perfect for the sorts of analysis work that I’m doing.

Not only is it theoretically perfect, but it really is amazing in practice. My unremarkable queries sped up a lot, some of them by a factor of 10. But the impossible query that refused to run at all and that crashed the computer? Now runs in five minutes. OK, this is not the sort of query that you can run on demand, but this is better than not running at all.

• More Tales from the BugMaster

1. The answer to the problem is $$\frac{22}{3}$$, and the student submitted 7.3 as a response. The homework system automatically told the student to submit the exact value of the answer and not an estimate. The student tried again with a different decimal approximation. Bug report complains that the system keeps rejecting correct answers.

2. Different problem with thirds in it. I queried the database to get the list of bug reports on the problem from so far this calendar year and sent it to the curriculum team. These young students who can’t wrap their minds around thirds will be the ones to haunt future classes with their assumptions that exponentiation distributes over addition.

3. From the department of “not my department,” one of the language arts problems tells the students to “adhere” to the guidelines or whatever. A child has submitting a bug report alerting us that this is a typo and should be “ad here.”

4. This one is novel! I have not seen it before! And since the student did this in two distinct homework submissions and in the text of the bug report, it is not just an isolated typo. Complex numbers. Instead of using the notation i for the imaginary unit, the student is using an exclamation point. Legit, the student says that the answer to the problem is $$4 + 3!$$ instead of the standard notation of $$4 + 3i$$.

• Things in the Sky

Recently I’ve changed up my FlightAware alerts. I’m receiving alerts about more planes, but I’ve cut down on the number of alerts I get for each plane. Crucially, I no longer get alerts when the planes are diverted.

We get one 747 a day here in San Diego, the daily British Airways flight from LHR. It tends to arrive around 5pm, give or take. Thursday evening I checked my phone while I was eating dinner with some knitting friends and I saw that this plane had just filed a flight plan to San Diego roughly around the time when it was usually landing. Super-surprising? Had it been diverted?

A quick check of FlightAware showed that it flew in a circle over the airport, went to LAX, landed, and then flew to SAN.

I listened to the LiveATC archives, and this is what happened over the course of roughly 15 minutes.

1. Pilot contacted SoCal approach, like normal.

2. Pilot told SoCal approach that they had a problem and wanted to check it out before landing. Started to circle.

3. Pilot requested a diversion to LAX due to a problem with a flap. ATC provided directions on how to get to LAX.

4. Pilot asked for a short cut because they did not have a lot of fuel left.

I am not an aviation expert, but I am wondering about the motivation for requesting the diversion to LAX. There are two possibilites here:

1. When you have this sort of flap problem, it is safer to land at LAX than at SAN, even if you are using terms like “minimum fuel.”

2. If a plane has a flap problem, as soon as it lands it will be stuck to the ground until a mechanic can take appropriate action and this action is properly blessed by the correct paperwork. Do we have the right mechanics and parts here in San Diego? Maybe not. If the plane landed here as originally scheduled, would it have been able to fly back to London later that evening?

In unrelated plane-watching news, every Saturday I read more and more about this ridiculous paparazzi-style lens that I want to buy. Since B&H does not do transactions on Saturdays, I am less tempted to actually buy it.

• Friday Omens

1. Cat in cone. She had minor surgery on Wednesday afternoon. She also had her teeth cleaned while she was under anesthesia. She had been on a hunger strike since returning home after the procedures, but she appears to be eating again. I worry about the water situation because I’m not sure if she is smart enough to get her head in the water dish. When I offer her the dish inside the cone region, she just tries to rub her incision on the side of the dish.

2. Speaking of water, the hot water is out in my building today. I have lost track of how many slab leaks in the downstairs apartments have led to plumbing emergencies throughout the whole building.

3. I am writing code that is impossible to test. It will hit an external API many tens of thousands of times and then record the responses. We only need to run the code once in order to get the information that we need. Each call to the API costs real money. It costs enough real money that it is worth spending a decent number of hours of my time writing the code in a way that cuts down on the number of calls we need to make in order to get all the information. The steps of preparing the payload, querying the API, and then recording the response are slow enough that there are a lot of ways that this script could time out. And goodness knows how many ways the API could return an error response.

4. We have a utility function that we run on a somewhat regular basis on our test servers in order to get things all freshened up from time to time. This had been a shell script, as nature intended. Now we are rewriting them in JavaScript.

• I Hear There is a New Star Wars Movie

1. I have not seen the new Stars Wars movie, and I know nothing more about it than its title. And I sometimes get the title wrong and call it “The Revenge of Skywalker.”

2. I joke that I have movie amnesia. This has been a thing since well before any of my head injuries or weird neurological events. I see a movie, and roughly immediately afterwards, I forget most of what it was about. This is one of the things that has been keeping me from watching Season 4 of The Man in the High Castle because I am going to need to remind myself what had happened up to this point (this applies to TV shows as well as to movies).

3. OK, so there have been some Star Wars movies in the past.
• In original Star Wars, they had to put together a team to destroy the Death Star. In some ways it was like The A-Team, but without the plagiarism.
• Due to my generational alligiences, I am required to insist that Empire Strikes Back is the best Star Wars movie of all time, even though I don’t remember anything about it other than Luke wore black.
• Then there was Return of the Jedi which I don’t remember either, but maybe that was the one with the Ewoks?
• Many years elapsed, and I was probably in graduate school when they released more Star Wars movies. There must have been three of them? One of them had Jar-Jar. Was that the same one that had Natalie Portman?
• That clump of movies that I can not remember is making me nervous about seeing the new one because the name “Skywalker” is in the title. Wasn’t there some sort of weird quasi-supernatural thing going on with Skywalkers in those movies? Like Mrs. Skywalker was magical? Do I need to know anything about Mrs. Skywalker’s arcane backstory to understand the new one?
• There have also been a whole bunch of Star Wars movies recently. Even worse for me, I can’t keep track of which ones were from the main series and which ones were just there to bring in more money. There is the one where Han shot first, which I think was a side story. And I think that there was one with a girl who is different from the girl in the other movies?
4. Jim is out of town right now, so I will not be able to watch Star Wars with him until he gets back.

5. I am not particularly worried about hearing spoilers between now and when he gets back because I think that almost every character whose name I know is dead. The only two living characters that I am aware of are the girl who doesn’t dress warmly enough for the ice planet, and Adam from Girls. And since they are frenemies, they are probably going to have to battle it out. Plus, I’m just going to forget anyway.

6. I mean, it’s been many, many hours of good vs. evil with light saber battles and space ship battles and other ancillary battles, so they are probably going to stick to that because it has made them good money so far. Cute critters have also worked well for them (as long as they are small and don’t speak).

7. Perhaps we will learn that Alderaan shot first.

• Finding the Entry with the Largest Value of Another Column in MySQL

This is totally and 100% cheating. If someone catches you doing this, the SQL police will come after you and cite you for not using a self-join. Just so you know.

Let’s say you have a table that keeps track of users, colors, and timestamps, and you want to know the most recent color for each user. It might look something like:

Table: user_colors

user color timestamp
Sophie green 2019-08-12 12:34:56
Sophie blue 2019-03-28 10:42:23
Gwen red 2018-12-12 03:23:12
Sophie purple 2013-09-12 04:52:23
Gwen fuchsia 2017-07-02 11:12:13

There are two standard solutions.

SELECT user, color
FROM (
SELECT user, MAX(timestamp) AS timestamp
FROM user_colors
GROUP BY user
) alias1
INNER JOIN user_colors
USING (user, timestamp)


Or you could do a left join and handle this with the criterion for joining.

SELECT user, color
FROM user_colors AS uc1
LEFT JOIN user_colors AS uc2
ON (uc1.user = uc2.user AND uc1.timestamp < uc2.timestamp)
WHERE uc2.user IS NULL


I can never remember how to do the second one, and full disclosure, it might be wrong. I don’t have anything easy around to test it on. I frequently get the direction wrong on the inequality. Sometimes I forget the WHERE clause until I look at my results and notice that they are totally not what I wanted.

Don’t read any further if you are worried about the SQL police.

For whatever reason, there is something about my tables and their size and their structure that makes these self-joins really slow. I have come up with a work-around.

Behold:

SELECT user, SUBSTRING_INDEX( GROUP_CONCAT( color
ORDER BY timestamp DESC
SEPARATOR "," ), ",", 1 )
FROM user_colors
GROUP BY user


The GROUP_CONCAT makes a comma-separted list of all the colors for the user, in descending order by timestamp. The SUBSTRING_INDEX takes the part of the string before the first comma. To be honest, I have no idea what sort of witchcraft the query planner is doing to make this work, but on my tables it is faster than the self-join.

Note that a GROUP_CONCAT will fail silently and horrifically if you have more than a thousand characters in a group. There are ways around this, but the finer points tend to depend on the system that you’re using.