• My Questions about ORNL Summit Answered

    As you might know, before I got back into the ed biz and working with small data, I used to do big data with a high performance computing (HPC) project that was a joint effort between the University of Tennessee and the Department of Energy’s Oak Ridge National Laboratory (ORNL). That was long enough ago that my t-shirt from “been there, done that, got the t-shirt” features Jaguar, which at a peak performance of under 2 petaflops is not only retro but also remarkably quaint. Soon we will have petascale performance in our phones! (OK, probably not.)

    Aside: When Titan (the previous system) was being provisioned, some people in HPC were super-enthusiastic about GPUs and other accelerator technologies (such as the Intel Xeon Phi). Others referred to any capabilities provided by technologies other than traditional x86 cores as “crap flops.” This reminded me of discussions that I heard in the 1990s when I worked for a different government department with an interest in HPC, and people were freaking out about having to take the work that they had been doing on vector processor machines (like the Cray C90) and figure out how to get work done on a parallel machine (at that time, the Cray T3D).

    But now we are on the road to the exascale. If you read anything about the exascale 5–10 years ago, you would have learned that this was an insurmountable challenge. Exascale systems would use more energy than a thousand suns. Exascale systems would need a constant tsunami of ice water to keep the energy of these thousand suns from causing them to burst into flames. There would be so many bits traveling through an exascale system that the mysterious subatomic particles emitted by the one regular sun would randomly and mischievously flip some of them, requiring sophisticated on-chip error-correcting hardware that requires the power of still a few more suns.

    Apparently we have solved many of these problems, as Summit has been deployed to production, and from what I can tell, it lives in the same machine room where Titan, and Jaguar before it, stood. This is a room about the size of a grocery store, and you can comfortably fit over 300 cabinets in it. I haven’t checked a Google Maps satellite view of ORNL lately (nor do I know if there is a recent picture), but you can get a decent sense of the amount of cooling that is being used by the size and number of air conditioner compressors outside of building 5400.

    My Facebook feed has lit up with articles about Summit. My relatives are emailing me articles about Summit. ORNL PR people have been putting a spin on things, and regular science journalists do not understand HPC as well as I do, so there was a lot about the story that seemed confusing and did not make sense to me.

    But here is what I have discovered:

    1. Summit is an IBM system and not a Cray system! This is mostly interesting for people who are watching the industry, as ORNL (and what I will subtly and euphemistically refer to as its collaborators) have been Cray shops for a really long time.

    2. The current version of Summit has a peak performance of 200 petaflops. That is, one-fifth of an exaflop.

    3. One of the press releases talks about how some genomics code achieved performance of 1.88 exaops. You will note that these are exaops, not exaflops. This calculation was not exclusively done with floating point numbers! From what I remember about GPUs and my limited experience with CUDA, this now makes perfect sense. So they haven’t yet been able to get LINPACK to achieve \(10^{18}\) floating point operations per second, but the genomics code doesn’t need that many floating point operations. Genomic data is fairly discrete. There are only four DNA bases, 64 possible codons, and 20 relevant amino acids. Probably the most likely place you would need floating point numbers is when calculating scores or probabilities after you’ve compared various DNA sequences. (A toy sketch of that mostly-integer style of comparison appears just after this list.)

    4. This also explains why the first application is in the biological sciences and not a simulation of an exploding star. The Department of Energy is really the Department of Nuclear Energy and Nuclear Bombs. A large fraction of its leadership computing power is devoted to simulating things blowing up. In public, they talk about this as astrophysics research, but a lot of the math and computing that goes into exploding stars also applies to other nuclear explosions. But simulating exploding stars requires floating point precision, so they were not going to get the most press-release-worthy results. I think that the “energy” angle here is that we are pretending that bio-energy can help protect us from having to buy oil from people who we think are icky.

    5. I was also wondering why Rick Perry thought that it was OK to spend a very large number of millions of dollars on a computer whose debut calculation was in the biological sciences. This seems counter to the current trend of cabinet-level officials destroying their departments from within. Also, Rick Perry doesn’t seem like the kind of guy who is interested in understanding hidden Markov models and multiple sequence alignments. But then I remembered that Chinese machines have been dominating the Top 500 for a while. It looks like the top Chinese system from November 2017 was at about 125 petaflops—or roughly half of Summit’s current capabilities on LINPACK. Odds are that Summit will take the top spot on the Top 500 list coming out later this month. And “USA NUMBER ONE! GO USA!” is certainly one of the current administration’s priorities.
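    Here is the toy sketch promised in item 3, in plain Python rather than CUDA; the two-bit encoding and every name in it are my own inventions for illustration, not anything from the actual Summit genomics code. The point is just that all of the comparison work is integer work, and a float only appears at the end when a score gets computed.

        # Toy sketch: integer-only comparison of two DNA sequences.
        BASE_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

        def pack(seq):
            """Pack a DNA string into one big integer, two bits per base."""
            value = 0
            for base in seq:
                value = (value << 2) | BASE_BITS[base]
            return value

        def mismatches(seq1, seq2):
            """Hamming distance between two equal-length sequences, via XOR."""
            assert len(seq1) == len(seq2)
            diff = pack(seq1) ^ pack(seq2)   # a nonzero two-bit group marks a mismatch
            count = 0
            while diff:
                if diff & 0b11:
                    count += 1
                diff >>= 2
            return count

        a, b = "GATTACA", "GACTACA"
        n = mismatches(a, b)                 # all integer operations
        print(n, n / len(a))                 # the only float: a crude similarity score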


  • Predicting the Future

    Up to this point, most of the data projects that I’ve been working on have been more along the lines of describing the past than of predicting the future. I’ve been doing a lot of stuff where I’ve been identifying which students demonstrate certain behaviors. I’ve been checking to see which of the small changes that we made in the homework system have made a difference in student performance. I parsed a whole bunch of JSON and wrote some regexes to see what sorts of things users were typing into a particular free-response box.

    But my next project is predicting the future! (Based on what happened in the past.)

    This is actually kind of terrifying because someone eyeballed the data once and very quickly came up with a very simple one-parameter model. And this model works pretty well most of the time! Later, I fed the data into something that actually calculates the parameter (instead of just guessing), and the calculated value was pretty close to the guess. In almost all cases!

    The trouble is that when it doesn’t work, it fails in spectacular ways.

    Also, making the model work better does not involve using a flashy algorithm. Making the model work better will require lots and lots of fussy details to improve the quality of the data being fed into the model. There are a lot of indirect measurements that a better model would need to know about. And since this is all data about humans in a system created by humans, if it looks like the future is evolving in a particular way and the old model is not working, then someone will change some aspect of reality, and all the future data lives in the new reality. The database might not even know how we were changing the world, and nobody may have remembered to write down what we did and when. So it is hard to know which discontinuities are of our own making, and which are interesting and unexplained.

    So this project is big enough to be meaningful, important, and interesting. But it is also big enough to be scary. I got nothing done on it today. The morning was spent doing administrative stuff and helping out with an interview, and I took the afternoon off. Next week is a new week, and it will take all of my good sense to chip away at writing tiny little functions that will extract sparkling clean data rather than building a road map that resembles a crazy wall of newspaper clippings, photographs, and string (like you would see in a detective show).


  • Skills Learned on the Job

    1. Can now recognize on sight the IP addresses of several VPNs.

    2. The most important fact about the number 37 is that it is a factor of 111, so any time a problem involves the number 37, you are likely to see repeated digits or some other nice feature that comes from multiplying by 111. (The arithmetic is worked out just after this list.)

    3. My reading knowledge of mathematical French has been extended into a reading knowledge of mathematical Romanian in response to many accusations of students cheating on Romanian mail-in math contests.
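    To spell out the arithmetic behind item 2: since \(37 \times 3 = 111\), multiplying 37 by three times any digit gives

    \[
    37 \times 3k = 111 \times k \qquad (k = 1, 2, \ldots, 9),
    \]

    which is the three-digit repdigit \(kkk\); for example, \(37 \times 21 = 111 \times 7 = 777\) and \(37 \times 27 = 999\).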


  • Tales from the Office

    1. Clearly we must not use the results of a particular piece of code too often or for anything too important because today I found a bug that has been there for weeks. (Not my code. My bugs are far more catastrophic.)

    2. We ran out of cake. Lots of people baked over the weekend and brought cakes in on Monday. But it was barely enough cake for a day and a half.

    3. The candy jar is full of the sort of candy that destroys dental work. Normally I love candy, but I will be rocking this temporary crown for at least another week, so I had to pass on the candy.

    4. One of our homework problems involves a character named Pythagoras making lentil soup. A student used the bug report feature to inform us: “In Pythogoras’s diet he was not allowed to eat beans.”

    5. Another bug report ended with: “I am very mad at you! From, parents.” We suspect that it was not actually the parents who sent this.

    6. Starting on a new project today. The previous project could have ended up becoming a deep dive into sophisticated machine learning techniques, but instead I was able to get things good enough for now with straightforward heuristics. I don’t want to talk too much about what we were working on, but this gives you a sense of the flavor of our solution: If a new user has the characters “69” in his username and then makes more than one post on the message board within the first hour of creating the account, then it is worth checking to see if the account is violating any of the rules of our site. (A sketch of what a rule like that looks like in code is just below.)
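    Here is that sketch; the function name, the field names, and the example account are all invented for illustration and are not our production check.

        from datetime import datetime, timedelta

        # Hypothetical "worth a second look" rule, not an actual production check.
        def worth_reviewing(username, created_at, post_times):
            """Flag brand-new accounts that match the pattern described above."""
            suspicious_name = "69" in username
            posts_in_first_hour = sum(
                1 for t in post_times if t - created_at <= timedelta(hours=1)
            )
            return suspicious_name and posts_in_first_hour > 1

        # Example: a new account that posts twice within ten minutes of signing up.
        created = datetime(2018, 6, 8, 12, 0)
        posts = [created + timedelta(minutes=3), created + timedelta(minutes=9)]
        print(worth_reviewing("xxcooldude69xx", created, posts))  # True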


  • Terrible Ideas for Modern Times

    There is a frooferall happening on Twitter about a conference where all the organizers and all the speakers are male.

    I would like to make an offer to all men doing mediocre research in STEM fields: If you pay for my registration and travel, I will be your conference beard. I am smart enough that with a few months of effort, I should be able to contribute to your mediocre research in a sufficiently substantive way to be considered a co-author. And then you can submit the work as joint with me and tell the organizers that I would be giving the talk. And since I am a mathematician, I expect the names on the paper to be listed in alphabetical order, so if your last name begins with A - SY, then you would be first author. And you would get to be a hero for the organization by sparing them from the embarrassment of having an all-male speaker line-up.


  • Forgotten

    Today on the message boards at work, I saw someone asking a question about describing \(\text{Spec}(\mathbb{Z}[x])\) as a topological space. The person asking the question either had no idea what he was talking about, or else I have forgotten way too much about algebra, because he was alluding to the orbits of an action of a Galois group. I was thinking that this could not possibly be the right way to go about it, because Galois groups are nice when you have groups and/or fields, and this does not sound at all like a problem about groups and/or fields. It doesn’t even seem like a problem about modules, so there really isn’t anything groupy about it.

    While I may have forgotten a lot about ring theory, I am pretty sure that when I am 90 years old and living in some sort of institution and no longer remember the names of my relatives, I will still remember that the most important reason that \(\mathbb{Z}[x]\) shows up in problems is because it is not a principal ideal domain, which means that its nonzero prime ideals are not necessarily maximal ideals.

    If this question had been about the Zariski topology of the prime spectrum of polynomials over an algebraically closed field, then it would have been trivial. Nonzero prime ideals are maximal, irreducible polynomials are linear, yawn.

    But since I had actual work to do and since we fastidiously avoid giving away the answer when we help students out on the message boards (also I think that this student was possibly cheating on homework), I did not bother to remind myself of what happens when you construct this topology. It’s pretty easy to classify the prime ideals of \(\mathbb{Z}[x]\). And I decided that I no longer care what exactly the closed sets are. Would one want to define an affine variety? Do something with Hilbert’s Nullstellensatz? Not me.
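    For the record (and for 90-year-old future me), the classification is standard commutative algebra and fits on one line:

    \[
    \text{Spec}(\mathbb{Z}[x]) = \{(0)\} \;\cup\; \{(p) : p \text{ prime}\} \;\cup\; \{(f) : f \text{ primitive, irreducible over } \mathbb{Q}\} \;\cup\; \{(p, f) : f \text{ irreducible mod } p\},
    \]

    and only the ideals of the last type are maximal.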

    So I steered the student away from his (likely) ill-fated voyage through Galois groups and his potential detours through something scary and cohomological. Someone else posted something that was definitely related but was scheme-y enough that I decided to ignore it.

    Good luck, student who is possibly cheating on homework! If you are not cheating on homework, then you are going to have a lot better luck getting a good answer on StackExchange, where the people answering questions remember a lot more algebra than I do!


  • Randomness

    Despite what I threatened, I did not just use copy-paste to make the data match the code. I have a special, ad hoc part of the code that is commented out, and I can reanimate it to make it deal with the annoying data files. Also, I used the cut function on my 0-1-2 scale and named the levels after the desired strings. I still need to turn my fudge factor into a variable that is defined near the beginning of the code, and from there I am good to go. I think.
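    The binning step is a one-liner whether you do it with R’s cut() or the pandas version; here is a sketch with pandas, where the sample values and the label strings are placeholders rather than anything from the real data.

        import pandas as pd

        # Placeholder stand-in for the real column: a 0-1-2 scale.
        scale = pd.Series([0, 1, 2, 1, 0, 2])

        labeled = pd.cut(
            scale,
            bins=[-0.5, 0.5, 1.5, 2.5],               # one bin per value on the 0-1-2 scale
            labels=["never", "sometimes", "always"],  # placeholder names for the desired strings
        )
        print(labeled.value_counts())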

    Today my mind wandered onto the topic of random sequences of bits that are controlled by a Bernoulli process. What if all of my complicated models of headaches are wrong and my headache frequency is indistinguishable from random? Maybe it is just so simple that there is some parameter, say \(p = \frac{15}{31}\), and on any given day I will have a headache with probability \(p\). My internal narrative about headache states and triggers and Markov processes might just be a crazy story that I tell myself to make sense of the world. One of the standard demos in introductory statistics classes has the students assigned to either flip a coin or pretend to flip a coin, and the teacher can always tell which group pretended to flip the coin because the pretend sequences never have a long enough string of heads in a row. Maybe when I have a bunch of headache-free days, it is totally random. Maybe when I have a headache every day for a week, it is also random.
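    Out of curiosity, that classroom demo translates directly into a simulation. Assuming fully independent days with the made-up \(p = \frac{15}{31}\) from above (an assumption, not a fitted value), this estimates how long a headache streak plain randomness tends to produce in a 31-day month:

        import random

        # Longest run of headache days in a month of independent Bernoulli(p) days.
        P = 15 / 31          # made-up parameter from above, not a fitted value
        DAYS = 31
        TRIALS = 100_000

        def longest_run(days):
            """Length of the longest consecutive stretch of True values."""
            best = current = 0
            for day in days:
                current = current + 1 if day else 0
                best = max(best, current)
            return best

        random.seed(0)
        runs = [longest_run([random.random() < P for _ in range(DAYS)])
                for _ in range(TRIALS)]
        print(sum(runs) / TRIALS)                  # average longest streak in a month
        print(sum(r >= 5 for r in runs) / TRIALS)  # how often a streak of 5+ days shows up

    Real randomness produces longer streaks than the pretend coin flippers (or my internal narrator) would ever write down.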

    You can’t prove that a sequence is random. Kolmogorov complexity measures the randomness of a sequence, but it can’t be computed. At best you can estimate upper bounds, for instance from how well the sequence compresses.
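    A toy illustration of that last point: the compressed size of a sequence (plus the fixed size of the decompressor) is an upper bound on its Kolmogorov complexity, so a sequence that compresses well is certifiably not random, while one that refuses to compress proves nothing either way.

        import random
        import zlib

        # Compressed size gives a crude upper bound on Kolmogorov complexity:
        # compresses a lot => certifiably not random; doesn't compress => no conclusion.
        def compressed_size(bits):
            """Bytes zlib needs to store a string of '0'/'1' characters."""
            return len(zlib.compress(bits.encode()))

        structured = "01" * 500                                    # very regular
        random.seed(0)
        noisy = "".join(random.choice("01") for _ in range(1000))  # pseudo-random

        print(compressed_size(structured))  # small: plenty of structure to exploit
        print(compressed_size(noisy))       # much larger: nothing obvious to exploit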

    I had 12 headache days in March and 13 headache days in April. I had a headache every day the entire last week of March. The chances of that happening just by chance are roughly 1 in 49. Unusual. But not unthinkably so. April looks a lot more random; no streaks longer than four days. Yet, wouldn’t randomness predict that I would have either a streak of headache days or a streak of headache-free days?
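    For what it’s worth, a quick simulation under the simplest possible assumption (31 independent days, each a headache day with probability 12/31, which is my assumption and not necessarily how the 1-in-49 figure was computed) lands in the neighborhood of 2 percent, so the back-of-the-envelope number holds up:

        import random

        # Monte Carlo check: chance of a 7-day headache streak somewhere in a
        # 31-day month of independent days with P(headache) = 12/31.
        P = 12 / 31
        DAYS = 31
        TRIALS = 200_000

        def has_week_long_streak(days):
            """True if at least 7 consecutive headache days occur."""
            streak = 0
            for day in days:
                streak = streak + 1 if day else 0
                if streak >= 7:
                    return True
            return False

        random.seed(1)
        hits = sum(has_week_long_streak([random.random() < P for _ in range(DAYS)])
                   for _ in range(TRIALS))
        print(hits / TRIALS)   # roughly 0.02, i.e. about 1 in 50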

    (Even though I am pretending to entertain the assumption of true randomness, I know that flying on airplanes gives me a headache. Some people tell me that the low humidity on airplanes causes me to get dehydrated, which is what causes the headache. I do not believe them because this happens on one-hour flights. If I start out reasonably non-dehydrated and drink the complimentary beverage, I don’t believe that I can get sufficiently dehydrated on a one-hour flight. I blame the change in air pressure.)

    And was the levetiracetam working in January and February? Is that why I had so few headaches during those months? If I stop taking levetiracetam once I get the taper-off instructions from the doctor, will my number of headaches increase?

    Yesterday I got a letter from the insurance company telling me that they will not pay for Botox unless I can provide them with headache diaries showing that I am having at least 15 headache days per month. In May 2017, I had 16 headache days. In June 2017, I had 18 headache days. Would I expect this high of a variance between last year and this year under assumptions of true randomness?

    And maybe the magical new medicine (the one with the coupon from the drug company) will work, and I will not have to worry about any of these calculations.

