Amy Szczepanski

Aug 6, 2018
Just a Few More Weeks with the Children of the Alt-Right

We have so many new people at work! Someone started today. Someone else is starting in a few weeks, and another someone else is also starting in a few weeks! We are trying to hire at least four more people by the end of the year! (DO YOU KNOW ANYONE WHO WANTS A JOB!?!?!?)

What this means for me is that I will get to hand off my duties dealing with the children of the alt-right. There are not a lot of them, but they are just so disappointing. They seem to fall into a few categories.
1. Some of them probably aren’t even from the alt-right. They are just garden-variety racists who don’t realize that they are being unacceptably racist. In a few years they will probably know enough to keep these thoughts to themselves and will do the sorts of subtle discrimination that come out in those various social science studies where researchers send out a bunch of identical resumes and find that “Chip” gets more call-backs than “Jamal” does. The future version of these students will really believe their excuses about why they didn’t reply to “Jamal.” Disappointing.
2. A very small number of the children have actual horrific views and are quite vocal about it. Not tolerated.
3. The ones that are most annoying to me are the ones whose online profiles are strewn with the symbology of the alt-right but who are trying to avoid explicitly violating any rules. Of course there is that stupid frog avatar. There are various code words and special numbers. References to some of the terrible things found on the terrible parts of Reddit. But they are not actually saying the terrible things. How much do they believe these things? I expect that they spend most of their time in online echo-chambers, so I don’t think they hear many contrary views. They probably wouldn’t go to a march, but if they ended up working at a large tech company that had employee affinity groups, they might very well end up eating lunch with people who are trying to peddle racism and sexism as “the truth.”
4. Most chilling to me: the historical Nazis. It has really been only a very small number of years since the symbols of white supremacy have gotten a lot of play in the popular media and on the mainstream parts of the internet. When I search the database for users who are connected to certain words, phrases, numbers, or symbols that have gotten a lot of press lately, there are some users who haven’t logged in in over a decade that come up in the results.
Aug 5, 2018
Scenes from Cross Validated
1. Question: I am a student taking Introduction to Statistics for Non-Majors. How do you interpret the standard deviation in this problem that I am supposed to do for homework?
  
  Answer: Lots and lots of answers from novice users who are trying to gain reputation and who are willing to risk taking the time to answer a question from an undergrad who may never come back to accept an answer.
2. Question: I have found a whole bunch of numbers lying around in the street, and I decided to put them in a table and apply Fisher’s Exact Test to them. SPSS tells me that $p = 0.0328473824$. What is the significance of these numbers?
  
  Answer: Crickets.
3. Question: Do I really have to use the Bonferroni correction?
  
  Answer: Yes.
4. Question: I am applying for a job as a machine learning engineer, and I lied on my resume about my ability to construct recurrent neural networks. How would you implement a genetic algorithm to minimize the error terms when building an RNN to solve this specific problem that the hiring manager gave me as a take-home test? Oh, and this needs to be in Python because I told them that I was an expert at Python.
  
  Answer: Crickets.
5. Question: I do not speak English, and I don’t want to give away the business applications of this situation. Therefore, I will describe it in a very unclear way and also use some jargon incorrectly. When asked for clarification, I might include an equation, none of whose variables are defined.
  
  Answer: Crickets.
6. Question: I fit my data with a linear model, and some of my variables were three-star variables. Then I built a random forest out of the same data, and the computer said that some other variables were the good ones. Which of my variables are important?
  
  Answer: Crickets.
7. Meta: Why are there so many unanswered questions on Cross Validated?
Aug 3, 2018
The Shape of the Future

I am still trying to predict the future. It seems like this is taking me a long time. Am I bad at predicting the future? Am I secretly waiting until the future becomes the present so that I don’t have to predict it anymore? Or perhaps too much of my time is spent yelling at children who think that the excuse “It’s a dank meme from Reddit” is a good reason to post inappropriate content on our message boards. I should point out to these children that the Fields Medalist who used to post on our site when he was in high school did not post dank memes.

For now I have given up on predicting a numeric future. There are just so many numbers out there. How do we define a reasonable numerology of the future? It’s just so overwhelming, especially since I am forced to pick between a linear future and an exponential future. Won’t anyone think about the polynomials?

So the next plan is to move over to a classification problem: Do we have an elliptical future, a parabolic future, or a hyperbolic future?

Mostly what I’m doing here is trading in one set of challenges for another. The real issue here is that most of the past was parabolic. Most of the present is parabolic. I expect most of the future will be parabolic. So it’s hard to find a training set for the other shapes of futures.

I don’t worry too much about the hyperbolic futures; perturbing the plane so that we cut off a hyperbola instead of a parabola is pretty well understood in the data. When I make the graphs, you can see the point where the past clearly revealed its hyperbolic nature; it shouldn’t be too hard to teach the computer how to see that. Or how to read that directly out of the database; these sorts of changes are controlled by the front end and logged in the change log.

It’s the elliptical futures that I worry about. Things are not elliptical very often, but when the future is going to be elliptical, it’s important to know about this as early as possible. The elliptical past is rare, but messy. Instead of using nice tools to set the eccentricity of the ellipse, someone changes cells directly in the live database. None of this is recorded in the change log; none of the original data remains. Everything is regenerated using the new values. The past has been erased.
Aug 2, 2018
Artifacts
1. There are so many things that you can find on the internet. We live in such a vast and amazing internet.
2. Things keep changing at work. Desks keep moving to other places. Yesterday my desk moved to a different place. I spent some time cleaning it out before it moved. I found a piece of scrap paper that had a crude drawing of a snowflake on one side and some math on the other side.
  
  I am pretty sure that this was a page from my bootleg copy of Goodearl and Warfield that I downloaded from the internet roughly 20 years ago. Long enough ago that one would have downloaded a DVI file and not a PDF.
3. I had been keeping my used auto-injector in a Nalgene water bottle. The other day I received an official 8 quart sharps container from CVS. This sharps container is large enough to contain the smaller cat; the front of the cat can be very sharp. The noun “sharps” makes me think of Michelle because after that time she spent in the mental hospital, the term “sharps restriction” became part of the shared vocabulary of a subset of her friends.
4. The other day one of my Facebook friends started his new very-important big deal job at the college that Michelle had attended. That, of course, made me think of Michelle. I realized that the internet has changed so much over the years that I could perhaps find more information. The obituary in the Schenectady Gazette said that she died in Auburn, Cayuga County. My mom found out through the rumor mill that she killed herself in a hotel. The public library in Cayuga County has a very nice digitized archive of local newspapers! I was able to search for the word “police” (because any relevant article was going to have some sort of reference to the police) for the week after she died. Success! She died in Weedsport, not in Auburn.
5. Also from the internet: I was able to buy a postcard featuring a picture of the motel.
Jul 28, 2018
Noah, Brad, and the Dvorak Keyboard

On Tuesday at work not only did I stab myself with my new medicine, but we had free pizza for lunch for secret reasons. And then on Thursday after work we had free happy hour for the same secret reasons. I am terrible at keeping secrets; we soft-launched a new product. At the Thursday event, I was formally introduced to a new colleague.

Upon hearing my name, he immediately recognized me as: “the other person who uses the Dvorak keyboard.”

Once upon a time, I was at That Conference in Atlanta, and I was sitting near That Math YouTuber as she was putting the finishing touches on the slides for her talk. She was going to talk about some sort of 3D printed monkeys, and she was going to name the Keynote file mon.key. However, she was in the early stages of learning the Dvorak key layout, so she flubbed the punchline by starting to type mrb (relying on QWERTY muscle memory).

On the flight home from that conference, I wrote a Python script that searches for words that when typed on a Dvorak keyboard by a QWERTY typist are still words. For example, if you were to use my computer and press the keys labeled NOAH, you would have really typed BRAD.

Since I do an exhaustive search, I pre-trimmed the system dictionary to remove every word containing a letter whose key maps to a punctuation mark on the Dvorak layout (q, z, e, and w).
```
grep -v -i -e 'q' -e 'z' -e 'e' -e 'w' /usr/share/dict/words > wordlist.txt
```
From there, I applied the permutation to all the words and then checked to see if they were still words.
```
# Requires Python 3 (str.maketrans and print() replace the Python 2 forms).
# Load the pre-trimmed word list, one word per line.
with open('wordlist.txt') as dictfile:
    wordlist = [line.rstrip('\n') for line in dictfile]
words = set(wordlist)  # set membership is much faster than scanning the list

# Setting up the permutation that maps alphabet to dvorak
alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
dv_permutation = "axje.uidchtnmbrl'poygk,qf;AXJE.UIDCHTNMBRL'POYGK,QF;"
dv_trans = str.maketrans(alphabet, dv_permutation)

# For each word in the dictionary, is its image also in the dictionary?
for theword in wordlist:
    newword = theword.translate(dv_trans)
    if newword in words and len(newword) > 1:
        print(theword, newword)
The longest words are flossy mapping to unroof. My favorite that does not involve proper nouns is hot mapping to dry.
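As a quick sanity check on the examples in this post, here is a minimal Python 3 sketch that applies the same permutation to a handful of words. The helper name `qwerty_to_dvorak` is my own invention for illustration and is not part of the original script.

```python
# Same permutation as in the script above: position i of `alphabet` is the
# key label a QWERTY typist presses, and position i of `dv_permutation` is
# the character a Dvorak-mapped keyboard actually produces.
alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
dv_permutation = "axje.uidchtnmbrl'poygk,qf;AXJE.UIDCHTNMBRL'POYGK,QF;"
dv_trans = str.maketrans(alphabet, dv_permutation)


def qwerty_to_dvorak(word):
    """Hypothetical helper: what you get when you press the keys labeled `word`."""
    return word.translate(dv_trans)


# The examples mentioned in this post.
for word in ["NOAH", "hot", "flossy", "mon"]:
    print(word, "->", qwerty_to_dvorak(word))
```

This should print BRAD, dry, unroof, and mrb, matching the examples above.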

Tangential Update: Guys, I’ve been talking to still more people about pharmacy stuff. It’s gotten to a point where even I am bored by it. I struggle to add each new interaction to my phone log. Also, eventually this medicine is going to cost me $307.15 a month, so I’m focusing my efforts on keeping good records of how well it works so that I can determine if it’s worth paying over $1000 a year for. (NB: this is actually a good price for a “specialty” medicine, as one of my colleagues is paying over $400 a month for Advair Diskus, which is a non-specialty medicine whose active ingredients have gone off-patent.) Based on my last self-imposed medication trial, I’m thinking that I’m going to go with a sixty day trial and evaluate with Barnard’s exact test.
Jul 24, 2018
Thank You, Lady Chinese Hamsters

The first round of calling up the drug company and the specialty pharmacy is over! Earlier this afternoon I injected myself with a monoclonal antibody that was brewed in Chinese hamster ovary cells. According to my doctor’s nurse, a hundred thousand people are clamoring to get this medicine. Do they need to sacrifice a new batch of Chinese hamsters every time they make more of this antibody? Or is there just some Chinese hamster cell line that keeps replicating in a lab, and they just keep feeding it? I’m hoping that blocking my CGRP receptors with this antibody is helpful for me. I’m optimistic because there is some sort of science thing linking the way this antibody works with the mechanism that’s believed to be behind the rescue medication that works really well.

Currently I am experiencing what I believe is the #1 most common side effect: It hurts where I stabbed myself with the needle. My diabetic colleague provided advice on better spots to stab yourself that don’t hurt as much.

In other news, the future is still hard to predict except when it isn’t. In the cases where our simple one-parameter model works, it works really, really well. Still haven’t been able to identify ahead of time when it is going to work. Also, secretly I know that all of my data demonstrates survivorship bias, so one of these days I need to get up close and personal with the database in order to reconstruct all of the points that don’t show up in the easy data. Let’s hope that these new antibodies that I’ve acquired will allow me to put in enough full days of work to figure this out soon.
Jul 23, 2018
The Future is Either Linear or Exponential

Well, that is not entirely true. The future-prediction experts whose works I am carefully perusing note that there is also a family of future prediction procedures that expect the future to be pretty much exactly like the past and the present.

Other than that there are only two methods in play for predicting the future: linear and exponential.

NB: I am ignoring futures with seasonality. The futures that I am predicting do not need to worry about Christmas or whatever.

OK, well this paper that I’m reading actually has five types of non-seasonal futures. But it didn’t really come out and say that early enough in the paper for me. It starts out with things like “Automatic forecasts of large numbers of univariate time series are often needed in business” (yes, I believe that). From there it goes on to give a full taxonomy of the models, describe the related literature, and define a lot of notation. But it makes you read through all the equations in order to realize how primitive our understanding of future prediction really is.

It’s not until you read through all of the equations that you realize that there are only five (or two, or three) types of futures.
1. There is the “San Diego weather” future. Whatever is happening today will happen tomorrow. This is actually not a great example because we do have seasonality! We have many, many seasons in San Diego. Not too long ago, we had First Summer. Then we had Ant Season. Now we are in Second Summer. Next is Fire Season. After that is Second Spring. That’s followed by Rain. Then we have First Spring. After that comes May Gray and June Gloom (this is one season whose name changes partway through). And then we cycle back to First Summer.
2. The linear future. In the linear future, you do a lot of calculations in order to find Just The Right Slope. A whole bunch of parameters are optimized to figure out how much of the past goes into brewing your magical Goldilocks slope. Then you take the last of your real data points and you just keep adding this same magical slope to it for however many steps you want to predict your future. Seriously, that is the world-class, state-of-the-art in future prediction. Just set your future loose on a linear trajectory.
3. Well, there is also the damped linear model. You have both a slope $b_t$ and a constant parameter $\phi \in (0, 1)$, and you predict the value $h$ steps in the future by $\hat{y}_{t+h\vert t} = \ell_t + \phi_h b_t$, where $\phi_h = \phi + \phi^2 + \phi^3 + \cdots + \phi^h$. In this case $\ell_t$ is the level, meaning the smoothed estimate of where the series sits at step $t$.
4. Exponential future. Just like the linear future, except you forecast with $\hat{y}_{t+h\vert t} = \ell_t b_t^h$.
5. Damped exponential future. Like above. Instead of raising $b_t$ to the $h$ power, you raise it to the $\phi_h$ power.
According to the people who make their academic livelihood predicting the future and assessing different methods of predicting the future, these five models (and the 10 variants that come up if your future cares about Christmas: five of them for an affine Christmas and five of them for a multiplicative Christmas) are the world’s best future predicting methods.
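To see how little machinery is involved, here is a minimal Python sketch of the five non-seasonal forecast formulas. It assumes that the level $\ell_t$ and the slope (or growth factor) $b_t$ have already been estimated somehow; the parameter fitting is not shown. The function names, the `kind` labels, and the default `phi=0.9` are all my own illustrative choices, not anything from the paper.

```python
def damping_sum(phi, h):
    """phi_h = phi + phi^2 + ... + phi^h, as used by the damped models."""
    return sum(phi ** k for k in range(1, h + 1))


def forecast(level, slope, h, kind, phi=0.9):
    """Predict the value h steps ahead from a given level and slope.

    `level` plays the role of l_t and `slope` the role of b_t; both are
    taken as given here. The `kind` labels are hypothetical names.
    """
    if kind == "none":                # tomorrow looks like today
        return level
    if kind == "linear":              # keep adding the same slope
        return level + h * slope
    if kind == "damped_linear":       # the slope's effect fades over time
        return level + damping_sum(phi, h) * slope
    if kind == "exponential":         # keep multiplying by the same factor
        return level * slope ** h
    if kind == "damped_exponential":  # the growth factor's effect fades
        return level * slope ** damping_sum(phi, h)
    raise ValueError("unknown kind of future: " + kind)


# Made-up inputs: level 100, three steps ahead.
for kind in ["none", "linear", "damped_linear"]:
    print(kind, forecast(100, 2, 3, kind, phi=0.5))
for kind in ["exponential", "damped_exponential"]:
    print(kind, forecast(100, 1.1, 3, kind, phi=0.5))
```

Note that for the two exponential models the second argument is a growth factor (something near 1) rather than an additive slope.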

This is a little bit disappointing and a little bit underwhelming. How can you impress people if you are predicting a future based on one (or two) simple-to-understand parameters?

And that must be why people have developed very sophisticated future-predicting neural networks. No one knows how the algorithm is predicting the future, so it cannot be belittled for being purely linear. (Instead, it is likely an overfit polynomial, but, whatever.)
