Prevalence, FP, TP and Biased FP:TP When Prevalence is Low
Here are the charts showing that, out of mathematical necessity, false positives greatly outnumber true positives (FP >> TP) when prevalence is low.
When CDC launched and FDA approved of PCR testing with no way to estimate the baseline cycle threshold (Ct; for every patient, every time), they set us up for a future with unacceptably high false discovery rates.
In my “most important” article, I goofed in my calculation in a way that led to an 80:1 FP:TP bias when prevalence was at 5%. The minute it was pointed out (BenBongo), I fixed the problem and updated the article:
The relatively trivial error actually opens up the opportunity to underscore the issue revealed by Dr. Lee’s study quite well and teach more about the problem of high FPs with low prevalence.
At low prevalence, mass testing with any false discovery rate that is not close to zero will be a disaster, because false positives will greatly outnumber true positives.
Obviously, the higher the false discovery rate at a given prevalence, the worse the problem will be.
So, some definitions:
TP = people w/the condition you’re diagnosing who test positive (let’s assume the test detects all of them for our purposes).
FP = people w/out the virus who test positive.
Total # of test positives = TP+FP
PPV = Positive predictive value = TP/(TP+FP)
FDR = False discovery rate = FP/(TP+FP)
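The definitions above can be turned into a small calculation. The sketch below is illustrative only. It assumes a hypothetical cohort of 100,000 people, a test that detects every true case (as assumed above), and a made-up parameter `fp_rate_uninfected` for the fraction of people without the condition who still test positive. None of these names or numbers come from any study.

```python
def expected_counts(n_tested, prevalence, fp_rate_uninfected, sensitivity=1.0):
    """Return expected (TP, FP, FDR) for a tested population.

    fp_rate_uninfected is an assumed input: the share of people WITHOUT
    the condition who nevertheless test positive.
    """
    infected = n_tested * prevalence
    uninfected = n_tested - infected
    tp = sensitivity * infected           # people with the condition who test positive
    fp = fp_rate_uninfected * uninfected  # people without it who test positive
    total_positives = tp + fp
    # FDR = FP / (TP + FP); defined as 0 when nobody tests positive
    fdr = fp / total_positives if total_positives else 0.0
    return tp, fp, fdr


# Hypothetical inputs: 1% prevalence, 5% of uninfected people test positive
tp, fp, fdr = expected_counts(100_000, 0.01, 0.05)
print(f"TP={tp:.0f}  FP={fp:.0f}  FDR={fdr:.1%}")
```

Raising `prevalence` while holding the other inputs fixed shows how quickly the share of positives that are false shrinks as the condition becomes more common.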
Effect of Prevalence (the % of people w/the condition you’re diagnosing)
When prevalence is varied, we can see that FP >> TP at the low end. Remember, the test in this scenario is detecting all of the people with the condition.
The bias in FP:TP is
I chose 5% arbitrarily; I could have chosen any value. The bias is about 80:1 at 0.5%, about 17:1 at 2.5%, and about 10:1 at 4%.
This is true for any diagnostic that has a false discovery rate of 42%: FP >> TP when the prevalence is low.
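The ratios quoted above can be checked with a short calculation. This is a sketch under one stated assumption: that the 42% figure is applied as the fraction of uninfected people who test positive, with every true case detected. Under that reading, FP:TP = 0.42 × (1 − p) / p for prevalence p. The function name `fp_to_tp_ratio` is made up for this example.

```python
def fp_to_tp_ratio(prevalence, fp_rate_uninfected=0.42, sensitivity=1.0):
    """FP:TP ratio at a given prevalence.

    Assumption: fp_rate_uninfected (42% here) is the share of uninfected
    people who test positive; sensitivity=1.0 means every true case is found.
    """
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    fp_share = fp_rate_uninfected * (1 - prevalence)  # FP per person tested
    tp_share = sensitivity * prevalence               # TP per person tested
    return fp_share / tp_share


for p in (0.005, 0.025, 0.04, 0.05, 0.20, 0.50):
    print(f"prevalence {p:6.1%}:  FP:TP = {fp_to_tp_ratio(p):5.1f}:1")
```

Under this assumption the ratio is roughly 16:1 at 2.5%, 10:1 at 4% and 8:1 at 5%, and it drops below 1:1 only once prevalence passes about 30%.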
And this is why Dr. Lee’s study underscores the point: the FP:TP bias has varied from 2020 to today with the actual prevalence of infection, and the trials used a test on populations that had very low prevalence. Thus, the outcomes of the trials in terms of number of cases in the vaccinated groups and number of cases in the unvaccinated groups are bogus.
This truth is fundamental.
So, what rate applies in COVID-19 depends on the prevalence.
The Marines study had a 37% FDR, but the investigators glossed over that.
And this is a separate issue from the biases induced by not counting people as vaccinated until 5 weeks after the first jab, and separate from the bias induced via the drop in the Ct threshold just for the vaccinated (by CDC for reporting).
In fact, CDC was using this well-known problem to their advantage to make the vaccine appear more effective than it was.
I hope this clarifies this particular point: the results of Dr. Lee’s study do not depend on my simple math example at all.
Thank you BongoBen for finding the error in my toy math example.
I'm confused.
How can you know prevalence without relying on PCR? What if the estimated percent of false positives is large? Won't that undermine your confidence in prevalence estimates?
Is there some calculation of the change of rate of total positives that can give you a reasonable estimate of when actual prevalence peaks? It seems to me that the peak of total positives will be larger than the peak of actual positives because the false positive component will inflate the total positive peak and the total positive peak will occur _after_ the actual positive peak.
Am I being clear?
"Thus, the outcomes of the trials in terms of number of cases in the vaccinated groups and number of cases in the unvaccinated groups are bogus."
Isn't there an even bigger problem wrt vaccine efficacy trials due to false negatives (true positives) being ignored? If a mere 20 false negatives occurred in each arm, vaccine efficacy would approach 50%. The NEJM article about Pfizer vaccine efficacy ignored this little problem totally.
If we knew the total number of PCR tests that were conducted in the Pfizer trial, we could apply an estimate of the false negative rate for this period of time to the trial's number of PCR tests to estimate the number of false negatives (true positives) for each arm.
I would appreciate seeing a post about this.