Bad Experiments

Like so many middle-aged ladies with poorly-managed health problems, I spend more time than is strictly necessary browsing PubMed. My genre of choice is studies trying to determine if a drug that is already approved for something (anything) might possibly be effective at preventing migraines.

Like any sort of genre writing, these all fit a pattern. Patients are recruited (tbh, the exclusion criteria are pretty darn interesting most of the time). Their baseline rates of migraine are measured; they are excluded from the study if they are having too many headaches per month. The patients then spend a few weeks in a wash-out phase, where they go off whatever migraine preventative medication they were previously taking. Next, they take several weeks (sometimes months) reaching the assigned dose of the medication being studied. Even though these studies are double-blinded, a lot of these medications have side effects that you cannot ignore (for example, tingling in hands and feet and everything tasting like metal). Finally, they spend a few months recording how many migraines they are having. And then the researchers apply statistical tests that assume normality to this (bounded) count data.

I do not have access to hundreds of patients, an IRB, or any of the other infrastructure that you need to run a medical study. What I do have is a leftover bottle of a medication that is not FDA-approved to say that it prevents migraines – but which did a remarkably good job of making my mysterious neurological symptoms from this summer, which were probably weird migraine auras, go away entirely. I also have an appointment with my regular neurologist tomorrow. And I keep good records.

At the end of August I went through a wash-out phase. Two of my neurologists (my regular neurologist and one of the vascular neurologists) told me to stop taking the medication that seemed to be working. I grudgingly agreed. And then I was scheduled for a few EEGs, so I stayed off this medication so that it would not interfere with the results of these tests (they came back normal). After the two-week wash-out phase, I was off the medication for an additional 91 days. During these 91 days, I had 33 headache days and 58 headache-free days.

Then I started taking the medication again. Today is the 22nd day that I have been on it, so I’ve been on it for 21 full days. During this time I have had two headache days. And so the question is whether \(\frac{2}{21}\) is sufficiently different from \(\frac{33}{91}\) for me to continue taking this medication. Really the question is whether I can convince my regular neurologist that these numbers are sufficiently different from one another for him to write me a prescription for this medication (on which I feel much better and experience no side effects).
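To put the two fractions side by side, here is a small sketch in R. The object names (`days`, `rates`) and the row and column labels are my own; the counts are the ones given above.

```r
# Rows are the two periods, columns are the two kinds of day.
days <- matrix(c(33, 2, 58, 19), nrow = 2,
               dimnames = list(period = c("off", "on"),
                               day = c("headache", "headache_free")))

# Share of headache days within each period (row-wise proportions).
rates <- prop.table(days, margin = 1)[, "headache"]
round(rates, 3)
#   off    on
# 0.363 0.095
```

The same column-major layout, `c(33, 2, 58, 19)`, is what the tests below are fed.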

One of the few things that I remember really clearly from the graduate courses I took in statistics is that count data is described by a Poisson distribution and not a normal distribution. So I’m wary of using any tests that rely on assumptions of normality.

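The chi-square test is one of my favorite tests, but with only two headache days while taking this medication, is it even appropriate here? (Strictly speaking, the usual rule of thumb asks for an expected count of at least five in every cell, not an observed count.)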

> chisq.test(matrix(c(33, 2, 58, 19), nrow=2))

	Pearson's Chi-squared test with Yates'
	continuity correction

data:  matrix(c(33, 2, 58, 19), nrow = 2)
X-squared = 4.5022, df = 1, p-value = 0.03385
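One way to check the small-count worry is to look at the expected counts, since that is what the rule of thumb is about. This is only a sketch: `tab` and its labels are names I made up, and `$expected` is the component that `chisq.test()` returns.

```r
# Same table as above, with labels added for readability.
tab <- matrix(c(33, 2, 58, 19), nrow = 2,
              dimnames = list(period = c("off", "on"),
                              day = c("headache", "headache_free")))

# Expected counts under independence: row total * column total / grand total.
expected <- chisq.test(tab)$expected
expected

# The smallest expected count is the on-medication headache cell:
# 21 * 35 / 112 = 6.5625
min(expected)
```

So the observed count of two is small, but every expected count clears five.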

And what about Fisher’s Exact Test? It’s great for count data, even when the counts are small (as we hope they are when we are trying to count the number of headache days with the experimental treatment). Here I worry that the test expects the margins of the table to be fixed. That won’t happen unless I can get enough of this medication to stay on it for 91 days, so I shouldn’t expect my data to follow a hypergeometric distribution.

> fisher.test(matrix(c(33, 2, 58, 19), nrow=2))

	Fisher's Exact Test for Count Data

data:  matrix(c(33, 2, 58, 19), nrow = 2)
p-value = 0.01853
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  1.167942 50.180692
sample estimates:
odds ratio 
  5.342414 

And what if I look at the 21 days before I started taking this medication? Then I could have fixed margins. Ten headache days, 11 headache-free days (which is kind of what motivated me to rummage through my medicine cabinet looking for something that works).

> fisher.test(matrix(c(10, 2, 11, 19), nrow=2))

	Fisher's Exact Test for Count Data

data:  matrix(c(10, 2, 11, 19), nrow = 2)
p-value = 0.01479
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  1.38263 90.38692
sample estimates:
odds ratio 
  8.185283 

Back when I was first learning about statistics, no one talked very much about Barnard’s test. I’m not entirely sure when it applies. (Or, if we’re being entirely honest here, I’m not sure what sort of underlying probability distribution describes the “let’s try taking some leftover medicine until we run out of it” experimental design. And, to continue with the real talk, I cannot promise you that headache days and non-headache days are independent. There might be some sort of underlying Markov process where the probability of having a headache tomorrow depends on whether or not I have a headache today.)

> barnard.test(33, 2, 58, 19)

Barnard's Unconditional Test

           Treatment I Treatment II
Outcome I           33            2
Outcome II          58           19

Null hypothesis: Treatments have no effect on the outcomes
Score statistic = -2.38298
Nuisance parameter = 0.957 (One sided), 0.957 (Two sided)
P-value = 0.0206766 (One sided), 0.0206766 (Two sided)
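As for the independence worry, one rough way to look at it would be to tabulate day-to-day transitions from the diary. The sketch below is only an illustration: `transition_table` is a helper I made up, and the ten-day `diary` vector is invented to show the call, not my real records.

```r
# Count day-to-day transitions in a 0/1 headache diary (1 = headache day).
transition_table <- function(days) {
  today    <- factor(head(days, -1), levels = c(0, 1))  # every day but the last
  tomorrow <- factor(tail(days, -1), levels = c(0, 1))  # every day but the first
  table(today = today, tomorrow = tomorrow)
}

# Hypothetical ten-day diary, made up purely to show how the helper is used.
diary <- c(0, 1, 1, 0, 0, 0, 1, 1, 1, 0)
tt <- transition_table(diary)
tt

# Row-wise proportions: estimated chance of each kind of tomorrow, given today.
prop.table(tt, margin = 1)
```

If the two rows of proportions looked very different on the real diary, that would suggest headache days cluster, and all of the p-values here would deserve extra suspicion.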

And if you accept that my baseline headache rate is \(\frac{33}{91}\), you can calculate the binomial probability of having at most two headache days in a 21-day period. I calculated \[\binom{21}{2} \left(\frac{33}{91}\right)^2 \left(\frac{58}{91} \right)^{19} + \binom{21}{1} \left(\frac{33}{91}\right)^1 \left(\frac{58}{91} \right)^{20} + \binom{21}{0} \left(\frac{33}{91}\right)^0 \left(\frac{58}{91} \right)^{21} ,\] which is roughly 0.0063.
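The same binomial tail can be checked in R with `pbinom()`, or by adding up the three terms with `dbinom()`. The variable names are mine, and the calculation treats the days as independent with a fixed headache probability, which is exactly the assumption questioned above.

```r
# Baseline headache rate from the 91 days off the medication.
p_baseline <- 33 / 91

# P(X <= 2) for X ~ Binomial(21, p_baseline), straight from the CDF.
p_tail <- pbinom(2, size = 21, prob = p_baseline)

# The same probability as the sum of the three individual terms.
terms <- dbinom(0:2, size = 21, prob = p_baseline)

p_tail      # roughly 0.0063
sum(terms)  # agrees with p_tail
```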

I really do feel like it is working.