Yale Masking Study Scientist Quits Rational Discourse After Debate
Logical contradictions lead to an inelegant end to otherwise mostly civil discourse.
I have to give credit to Dr. Abaluck for showing up for the debate, and for being willing to address criticism of his study of the effects of masking in Bangladesh.
I’m sorely disappointed in Dr. Abaluck’s inexplicable inability to follow his own illogic.
During a post-debate email exchange, Dr. Jason Abaluck quit rational discourse with me after failing to address the concerns I raised. The email thread involved Steve Kirsch and others thanking him for meeting with me and debating his masking study. He first informed Steve that he was done with him, and then, after a barren reply to my correction of his misrepresentation of my side of the debate, he rage-quit on me, too.
In one of the emails from Dr. Abaluck, he made the claim that the only issue I raised in the debate was the problem of multiple hypothesis testing, to which I objected. In my reply, I noted a few of my other concerns and added a couple more.
Before we start, I should point out that I had also raised the issue of generalizability: differences between the respiratory hygiene practices of the population in Bangladesh and, say, the US. But you’ll see that he failed to represent the breadth and depth of the concerns I raised, and would like to sweep away any and all concerns without directly addressing them in detail.
Let’s start with his email message that prompted my objection. He is referencing Steve Kirsch’s blog post about purple masks:
Abaluck: I have never once looked at the "colleagues listed on [your] substack" and so can't comment on them. I am saying that *you specifically* do not realize how little you know about the relevant topics. You have the dangerous combination of having spent a great deal of time studying them without having the relevant background and knowledge to synthesize and evaluate what you read.
Pleased stop with this "flawed" bullshit. I spoke with James for two hours and he didn't even try to defend your blog post, which makes elementary statistical errors. I think the only "critique" he made is that he would have preferred we conduct multiple inference adjustments in our subgroup analyses. We discussed this at length -- I don't think his critique is right, but even if it were, it would apply equally well to almost all published RCTs.
I've no interest in corresponding with you further -- I don't expect that most of your readers are at least as ignorant as you, but perhaps our interview will give a few of them pause, which was my goal.
(JLW: Again, that was directed at Steve Kirsch, not me. I never intended to directly defend Kirsch’s observations about purple masks, specifically, but instead focused on how Dr. Abaluck interprets his own subgroup analyses in general. He is inconsistent in wanting to stand by them if they are positive, but claims that negative results do not count. That means they fail, as executed and interpreted by him, as a critical Popperian test of the hypothesis they are testing, and fall outside the realm of science.)
Lyons-Weiler: Hi, Jason,
Thank you for the reference to our discussion.
I have to disagree with your characterization that the only issue I had was multiple comparisons.
In addition to correction for multiple comparisons (which some statisticians consider to be a study-wide risk, not parseable into "main analysis" and "subgroup analyses", with much formal methodological development; see Storey, Benjamini/Hochberg, and others)...
There was also the issue of not being able to screen for IgG evidence of Immunity first, and the difference of opinion on whether the main effect of "masks" was strong enough to carry noise from cloth masks.
I didn't get to it but I could not tell what the degrees of freedom were based on for your GLM. If it accommodated village-level independence, ok. But if you used N (# of individuals), that's pseudoreplication.
Here's a relevant Google search on same: https://bit.ly/3Jd44qi
If your study is pseudoreplicated, you should retract it due to methodological flaw. I can't tell from the paper nor from the appendices (you're supposed to report the df associated w/your test statistic and p-value, that would fall under the category of basic required info for studies I handle as Editor-in-Chief or as a reviewer).
Re: The overgeneralization, in plain terms you seem to want to have your cake & eat it too; you say the subgroup analyses don't count as much, because they are subgroup analyses. But only the negative subgroup analysis results don't count?
To statisticians, the mass of "Jupiter/Pluto" is 99.9999% gas... yet we know Pluto is not gas, just like all of medicine and engineering that bothers to be involved knows that cloth masks do not prevent transmission.
It's not so much a flaw in the study as a flaw in the application of logic. Had the study only published "masks", and later on heterogeneity of effect was found between surgical and cloth masks, that would reflect unaccounted heterogeneity - and the very basic assumption of "x is a random sample from a uniform population" would be known to have been violated.
And of course I also pointed out it's only a single study, and that anyone who says it's proof that masks work (i.e, the press) would be over-interpreting, a point to which you agreed.
I'm delighted our interaction did not devolve into ad hominem, I thank you for that.
Staying focused on the attributes of the study is far more important than credentials.
As far as purple masks go (Steve Kirsch’s concern), the concern rests over how much one can even bother to interpret subgroup analyses, if they are at risk of uncontrolled Type 1 error inflation due to multiple comparisons.
I would like to see a practicing statistician's opinion on these points, but frankly, I expect most will be too sheepish to weigh in objectively. Given false confidence of protection from cloth masks, the issue of prior knowledge (before the study) is important.
Specifically, CDC published on their website in early 2020 that 20 layers of cloth were equivalent to N95; they lowered that for a few days to 16, and then Fauci announced a single cloth layer was sufficient.
We all know that's false.
We can debate "sufficient" and "stops transmission"; even a 1% reduction is "stopping some transmission" but failure to stop 100% transmission is a failure to stop transmission.
But I think we can agree that public health policy has been a mess since early 2020 on every important topic, and that the public's skepticism, including Steve's, is neither unwarranted nor unexpected given the mishandling of the public trust on testing, masking, early treatment, pharmaceuticals, the lockdown, the vaccine studies, and so on.
JLW
PS Yes, all RCTs should perform p-value adjustment to control the FDR. And one could argue your hypothesis test was 2-tailed, e.g., in the event that masking could make infections more likely, for example, due to false confidence of protection. Then your alpha would not be 5%, but rather 2.5% and your p = 0.03 would be in jeopardy even without study-wide FDR.
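To make the two statistical points in the email above concrete, here is a minimal sketch in Python. It is an illustration only, not a reanalysis of the study. The list `hypothetical_p` is invented for the example and does not come from the paper, and the helper `passes_two_tailed` assumes that a reported p-value is one-sided, which the paper may or may not support.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the hypothesis is
    rejected while controlling the false discovery rate at q.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * q.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


def passes_two_tailed(one_sided_p, alpha=0.05):
    """Compare a one-sided p-value with the per-tail share of alpha.

    Assumption: the p-value supplied is one-sided.
    """
    return one_sided_p <= alpha / 2


# Hypothetical subgroup p-values, made up for illustration only.
hypothetical_p = [0.03, 0.20, 0.45, 0.04, 0.60, 0.01]

unadjusted_hits = sum(p < 0.05 for p in hypothetical_p)
adjusted_hits = sum(benjamini_hochberg(hypothetical_p))

print("significant without adjustment:", unadjusted_hits)
print("significant after BH at q=0.05:", adjusted_hits)
print("one-sided p=0.03 passes a two-tailed 5% test:", passes_two_tailed(0.03))
```

With these made-up values, three of the six tests fall below 0.05 on their own, yet none survives the Benjamini-Hochberg adjustment. Which comparisons belong in the family being adjusted is exactly the point in dispute in the exchange, so the sketch shows the mechanics and not the verdict.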
Abaluck:
There is simply no issue with “not being able to screen for IgG Immunity first”. As I explained, we showed that symptomatic seropositivity was balanced at baseline. But even if we had not collected any baseline data, this would not compromise our analysis in a randomized experiment.
I explained in detail that our main effect included cloth masks. You have an issue with some newspaper headlines about this. You tried to give an analogy with chemotherapy treatments, and I explained that the headline, “All types of masks worked!” or “All three chemotherapy treatments work” would indeed be misleading, but neither of the headlines you referred to said that. This is in any case not a critique of the study, but with the media reporting. I certainly agree with you that we should encourage people to wear higher quality masks rather than cloth masks given the totality of the existing evidence, as I have repeatedly done.
RE: the subgroup analysis, I don’t want to “have my cake and eat it too” – multiple inference correction answers a very specific question that is not the one we were asking in our subgroup analysis.
I don’t have time to respond to additional critiques that you couldn’t bother to bring up during our 2 hr discussion (needless to say, they are not persuasive). Your remarks below about purple masks don’t engage at all with Steve’s absurd claim that failing to find a significant result after dropping 5/6th of the data in the treatment group invalidates the study.
I’m not interested in discussing this further with you. I wish that if you had further critiques of the study you had raised them during our lengthy discussion to give me a chance to reply.
—
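The pseudoreplication concern raised in my email above can also be sketched numerically. The example below uses the standard design-effect approximation for equal-sized clusters, 1 + (m - 1) × ICC, to show how far apart individual-level and village-level degrees of freedom can sit. Every number here (`n_clusters`, `cluster_size`, `icc`) is hypothetical and chosen for illustration; none is taken from the Bangladesh study, and the sketch says nothing about which approach the authors actually used.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc


# Hypothetical two-arm cluster-randomized design (illustrative values only).
n_clusters = 100      # villages, split across two arms
cluster_size = 50     # individuals sampled per village
icc = 0.05            # assumed intracluster correlation

n_individuals = n_clusters * cluster_size
deff = design_effect(cluster_size, icc)

# Degrees of freedom if individuals are wrongly treated as independent units.
naive_df = n_individuals - 2
# Degrees of freedom if the village is treated as the unit of analysis.
cluster_df = n_clusters - 2
# Number of independent observations the clustered sample is roughly worth.
effective_n = n_individuals / deff
# Factor by which naive standard errors are too small.
se_inflation = math.sqrt(deff)

print("naive df:", naive_df)
print("cluster-level df:", cluster_df)
print("design effect:", round(deff, 2))
print("effective sample size:", round(effective_n))
print("standard error inflation:", round(se_inflation, 2))
```

Under these assumed values, treating 5,000 individuals as independent yields 4,998 degrees of freedom, whereas analysis at the village level yields 98. This is why reporting the df alongside the test statistic matters for a reader trying to tell which analysis was run.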
Here’s the Atlantic article title:
There are other articles I pointed to that make the same unwarranted generalization; I noted to him that the public gets almost all of their information about science from the press, and that many will only ever read the headline.
Note that Abaluck started the debate by interrupting my introduction, saying that he disagreed with my criticism of headlines claiming that his study proved that “masks” work, and he even admitted contacting the New York Times to object to their article, which made the same unwarranted generalization.
I don’t see any qualifiers on cloth vs. surgical in the headlines that I read to him, and in the debate, Abaluck admitted that he had a problem with the press getting science wrong. I even told him these titles were not his responsibility, and he said he had contacted the NY Times to have them correct a title that committed the same logical error. But now he claims there is nothing wrong with the headlines. What’s going on?
He’s contradicting himself, and he doesn’t even see it. Or he sees it, but is not allowed, or is not allowing himself, to apply logic in this case.
There are no credible data that show that cloth masks prevent transmission or infection with the SARS-CoV-2 virus.
In the debate, and in the paper itself, Dr. Abaluck admitted that they could not perform IgG screening on asymptomatic participants due to a lack of consent. Yet he sweeps this away as a non-issue. Again, he has contradicted himself.
I cannot explain the absence of clear logic, but I am glad to have had the chance to bring these issues to light.
We need far better science on matters of public health.
What do you think, did you watch the debate? (If you missed it, it is offered again, below). Feel free to share this with your colleagues, family and friends.
Also, consider taking my course, “How to Read and Interpret a Scientific Study” this fall at IPAK-EDU. There is a lot of study design in that course. I’m also putting together a course on formal design of research studies to empower the public to know how to call out bad science when it’s published.
Have a rational day!
One thing that occurred to me listening to your discussion was the question around reporting symptoms vs. an antibody test. Symptom reports are highly subjective and easily confounded by wanting to please the researcher, etc. Dr. Abaluck seemed to imply that symptom reports were just as good and this was no big deal.
"here's a flawed study proving what I want proven. it wasn't paid for by any nefarious funders. follow the sickness, uhh I mean science."
Welcome to the dark ages, we've got fun and games. We've got everything you'd want, NPCs know the name. In the dark ages, welcome to the dark ages, na na na na na-----