The File Drawer Effect
I was so optimistic that this medication was going to help my migraines. My neurologist pointed out that it is not FDA approved for migraine, and there are no good studies showing that it works. But it did a remarkably good job of making those weird auras go away back over the summer, and I had a remarkably long string of headache-free days when I was taking it, so I wanted to give it another try.
Not counting a two-week washout period, as of today I have spent 90 days off the medication followed by 90 days on it. I keep meticulous records.
My initial enthusiasm was not particularly well-measured because I didn’t have enough data to meet the assumptions of either the chi-squared test (one cell had a value of 2) or Fisher’s exact test (I didn’t have fixed margins). I then found Barnard’s unconditional test, which seemed to apply.
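One way to check the chi-squared rule of thumb is to look at the expected counts rather than the observed ones. A minimal sketch in R, assuming the final 90-day counts shown in the Barnard table in this post (the early counts, including the cell with a value of 2, are not given here). The name `headache_table` is introduced only for illustration.

```r
# Final counts from this post: rows are study arms, columns are outcomes.
headache_table <- matrix(
  c(33, 21, 57, 69), nrow = 2,
  dimnames = list(arm = c("control", "treatment"),
                  outcome = c("headache", "no_headache"))
)

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(headache_table), colSums(headache_table)) /
  sum(headache_table)
print(expected)

# Common rule of thumb: every expected count should be at least 5.
all(expected >= 5)
```

With 90 days per arm the smallest expected count is 27, so the small-cell problem from the early data no longer applies to the final table.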
Qualitatively, I really feel like I’m doing better. Looking at my data, I have much longer runs of headache-free days. When I do have headaches, they don’t seem quite so intense.
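Run lengths are easy to pull out of a daily log. A small sketch, assuming a hypothetical 0/1 vector `headache` with one entry per day. The values below are made up for illustration and are not the actual records.

```r
# Hypothetical daily log: 1 = headache day, 0 = headache-free day.
headache <- c(0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0)

# rle() collapses the vector into consecutive runs of equal values.
runs <- rle(headache)

# Keep only the lengths of the headache-free runs.
free_runs <- runs$lengths[runs$values == 0]
print(free_runs)
longest_free_run <- max(free_runs)
```

Comparing the distribution of `free_runs` between the off and on periods would put a number on the "longer runs" impression.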
> chisq.test(matrix(c(control_headache, treatment_headache, control_no_headache, treatment_no_headache), nrow=2))
Pearson's Chi-squared test with Yates' continuity correction
data: matrix(c(control_headache, treatment_headache, control_no_headache, treatment_no_headache), nrow = 2)
X-squared = 3.2011, df = 1, p-value = 0.07359
> fisher.test(matrix(c(control_headache, treatment_headache, control_no_headache, treatment_no_headache), nrow=2))
Fisher's Exact Test for Count Data
data:
p-value = 0.07308
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.9472181 3.8570642
sample estimates:
odds ratio
1.89544
> barnard.test(control_headache, treatment_headache, control_no_headache, treatment_no_headache)
Barnard's Unconditional Test
           Treatment I Treatment II
Outcome I           33           21
Outcome II          57           69
Null hypothesis: Treatments have no effect on the outcomes
Score statistic = -1.9518
Nuisance parameter = 0.4 (One sided), 0.4 (Two sided)
P-value = 0.0274702 (One sided), 0.0549405 (Two sided)
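These results are closer relatives than they look. As a sketch using only base R and the counts in the table above, the score statistic that Barnard's test reports is the pooled two-proportion z statistic, and its square is the chi-squared statistic without the Yates correction. The variable names other than the four counts are assumptions made for this example.

```r
# Counts from the table above.
control_headache <- 33;   control_no_headache <- 57
treatment_headache <- 21; treatment_no_headache <- 69

n_control   <- control_headache + control_no_headache
n_treatment <- treatment_headache + treatment_no_headache

# Headache rate in each arm and pooled across both arms.
p_control   <- control_headache / n_control
p_treatment <- treatment_headache / n_treatment
p_pooled    <- (control_headache + treatment_headache) / (n_control + n_treatment)

# Pooled two-proportion z statistic.
z <- (p_treatment - p_control) /
  sqrt(p_pooled * (1 - p_pooled) * (1 / n_control + 1 / n_treatment))

# Chi-squared with and without the continuity correction.
m <- matrix(c(control_headache, treatment_headache,
              control_no_headache, treatment_no_headache), nrow = 2)
x2_plain <- unname(chisq.test(m, correct = FALSE)$statistic)
x2_yates <- unname(chisq.test(m, correct = TRUE)$statistic)

print(c(z = z, z_squared = z^2, x2_plain = x2_plain, x2_yates = x2_yates))
```

Here `z` comes out at about -1.9518, matching the score statistic above, and `x2_yates` matches the 3.2011 from chisq.test. The gap between `x2_plain` and `x2_yates` is the price of the continuity correction.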
Maybe there is a bug in my code? Maybe I shouldn’t be treating headache days as count data, because that overlooks the fact that this is really a time series? Does Barnard’s Unconditional Test have more power than these other tests? If I chose Barnard’s test when setting up this ridiculous study, is it still cherry-picking to report its favorable result now? What is the underlying distribution here? Maybe a higher dose would work better? How many of the physicists who can’t find academic jobs and who are currently working as “data scientists” know more about statistics than I do?
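On the power question, base R can at least give a rough sense of scale. A sketch, assuming the observed rates (33/90 and 21/90) are the true rates and that days are independent, which the time-series worry says they probably are not. It uses the normal-approximation `power.prop.test` from the stats package, not Barnard's test.

```r
# Observed headache rates, treated here as if they were the true rates.
p_control   <- 33 / 90
p_treatment <- 21 / 90

# Days per arm needed for 80% power at a two-sided 5% level.
needed <- power.prop.test(p1 = p_control, p2 = p_treatment,
                          power = 0.80, sig.level = 0.05)
days_per_arm <- ceiling(needed$n)

# Approximate power of the 90-day arms actually used.
achieved <- power.prop.test(n = 90, p1 = p_control, p2 = p_treatment,
                            sig.level = 0.05)
print(c(days_per_arm = days_per_arm, power_at_90 = achieved$power))
```

If `days_per_arm` comes out well above 90, a borderline p-value is about what a real effect of this size would be expected to produce in a study this short.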