Critique of Gabrielsen et al. (2022): “Infralow Neurofeedback in the Treatment of Substance Use Disorders: A Randomized Controlled Trial”
This is a textbook case of a probable Type II error, driven by heterogeneity and endpoint misalignment, being misread as evidence of no effect. The trial seems to have been designed to fail.
J Psychiatry Neurosci 2022;47(3):E222–E229. DOI: 10.1503/jpn.210202
Executive Critique:
This trial was structurally underpowered for its stated endpoint, contaminated by control group inequity, executed under diffuse inclusion criteria, and designed with a mismatch between intervention mechanism and outcome metric. The study appears to have been set up to fail: its primary endpoint (global QoL) was misaligned with the theorized and previously observed effects of neurofeedback (targeted symptom regulation). It also actively masked potential benefit by (1) failing to stratify by impulsivity/restlessness, (2) excluding participants on MOUD, and (3) choosing an unblinded, non-equivalent comparator (TAU of variable dose and content).
I. Design and Outcome Misalignment
Primary Endpoint: Quality of Life (QoL‑5)
QoL-5 is a blunt, global composite encompassing physical, emotional, relational, and existential well-being. The authors selected this as the primary endpoint despite the mechanistic theory of ILF-NF suggesting proximal regulation of restlessness, arousal, and craving. This mismatch guaranteed low sensitivity to change: no trial—not even pharmacologic—shows robust QoL shifts in SUD from one adjunct modality over 20 sessions.
QoL is driven by social capital, economic status, and long-term abstinence, not by 10 weeks of 30-minute neurofeedback in outpatient settings. The authors’ own text acknowledges this misalignment post hoc, suggesting the scale lacked “sensitivity to detect the more specific effects” of ILF-NF.
No Mechanistic or Intermediate Outcomes Tracked
The trial collected only superficial self-reported endpoints: sleep, distress (SCL‑10), restlessness (VAS), and substance use (EuropASI composite), with no EEG metrics, no neurophysiological change, and no relapse latency. This omits mechanistic validation, particularly since ILF-NF is explicitly proposed to modulate infralow connectivity and bottom-up cue reactivity.
No fMRI, no spectral EEG analysis, no cue-reactivity paradigms, no impulsivity scales. No mechanism, no signal.
II. Sampling and Stratification Failures
No Stratification by SUD Subtype, Trait Impulsivity, or Comorbidity
The study included a mixed SUD outpatient population (alcohol, stimulants, cannabis, polysubstance)—but stratified none of it. The intervention might plausibly benefit stimulant users with high restlessness or low inhibitory control—but the authors lumped the cohort, tested no interaction effects, and reported no subgroup outcomes.
Further, they excluded individuals on MOUD (methadone or buprenorphine)—paradoxically eliminating opioid-dependent individuals most relevant to U.S. overdose policy. As such, the sample cannot inform ILF-NF use in opioid-related SUD, limiting translational relevance.
Baseline Use Scores Low (Floor Effect)
At baseline, mean EuropASI composite scores for substance use were near zero: 0.04 (drugs) and 0.15 (alcohol). The study population was barely using substances at enrollment, leaving almost no room to observe reductions. This introduces a floor effect on the most policy-relevant secondary outcome (relapse/use). The authors failed to account for this in power calculations, inclusion criteria, or interpretation.
High Functional Impairment but No Targeted Outcomes
Subjects scored below population norms on QoL and ORS, indicating high baseline impairment. Yet the study failed to tie ILF-NF targeting to the symptoms that were actually dysregulated. The only statistically significant group difference was in restlessness (p = 0.006), a plausible target of ILF-NF. This signal was buried under an inappropriate primary outcome and not followed up with moderation analyses.
III. Comparator Arm Problems: TAU as a Moving Target
Control Arm: Non-equivalent “Usual Care” Dose
Patients in the ILF-NF arm received roughly twice as many sessions (14 NF sessions + 7 TAU sessions = 21 sessions) as those receiving TAU alone (10.8 sessions), but no attempt was made to equalize contact time, use a sham neurofeedback comparator, or otherwise control for therapist engagement.
The TAU arm was highly variable, combining CBT, MI, and “psychosocial approaches” at provider discretion, creating noise. That ILF-NF achieved any significant signal on restlessness in this uncontrolled comparator landscape is notable, and should have prompted follow-up, not dismissal.
Blinding Absent, No Expectancy Control
There was no participant blinding, no attempt at sham feedback, no control for visual/tactile stimulation, and no measurement of expectancy or engagement. This leaves open the possibility that subjective effects (on sleep, restlessness, distress) are contaminated by expectancy bias, which a proper sham condition might have resolved.
IV. Statistical and Power Limitations
Underpowered Post-Attrition
Although the sample size was intended to detect a 0.10 shift in QoL (based on a prior mean of 0.57 ± 0.17), only 67 of 93 randomized participants (72%) contributed analyzable primary outcome data, so power to detect the planned difference fell accordingly. The authors neither re-estimated power for the final analytic sample nor corrected for multiple comparisons across 8 secondary outcomes.
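The size of the power loss, and the multiplicity threshold, can be roughed out from the figures quoted above. The sketch below is illustrative only and is not the authors' calculation. It assumes a two-sided two-sample z-test at alpha = 0.05, equal allocation between arms, and that the quoted 0.17 is the relevant outcome standard deviation. None of these is confirmed by the critique, and the function names are hypothetical.

```python
from statistics import NormalDist


def two_arm_power(n_total, delta, sd, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Assumes 1:1 allocation (n_total / 2 per arm) and a common SD;
    both are assumptions of this sketch, not facts from the trial.
    """
    z = NormalDist()
    n_per_arm = n_total / 2
    se = sd * (2.0 / n_per_arm) ** 0.5      # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)       # two-sided critical value
    shift = abs(delta) / se                 # standardized true difference
    # Probability of rejecting in either tail under the alternative
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Figures quoted in this critique: 0.10 shift, SD 0.17, 93 randomized, 67 analyzed
power_randomized = two_arm_power(93, delta=0.10, sd=0.17)
power_analyzed = two_arm_power(67, delta=0.10, sd=0.17)

# Eight secondary outcomes, as counted in this critique
threshold = bonferroni_alpha(0.05, 8)
restlessness_survives = 0.006 < threshold

print(f"power with 93 randomized: {power_randomized:.2f}")
print(f"power with 67 analyzed:   {power_analyzed:.2f}")
print(f"Bonferroni threshold:     {threshold:.5f}")
print(f"p = 0.006 below it:       {restlessness_survives}")
```

Under these assumptions, power drops from roughly 0.8 to roughly two-thirds. The reported restlessness p-value of 0.006 also sits just under the Bonferroni threshold of 0.00625, although a rounded p-value this close to the cutoff should be read with caution.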
Effect Sizes Not Reported Transparently
The reported restlessness difference (−1.8 on a 10 cm VAS) is given without a Cohen’s d or any other standardized effect size. Yet a shift equal to 18% of the scale’s range could be clinically meaningful. The lack of follow-up on this signal, whether it predicts longer retention, reduced relapse, or improved abstinence, is a missed opportunity.
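To show what a standardized estimate would involve, here is a minimal sketch of Cohen's d for the −1.8 cm difference. The restlessness standard deviation is not given in this critique, so the SD values below are hypothetical placeholders chosen only to show how strongly d depends on them. They are not trial data, and the function names are illustrative.

```python
def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Pooled standard deviation of two independent groups."""
    numerator = (n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2
    return (numerator / (n_a + n_b - 2)) ** 0.5


def cohens_d(mean_difference, sd):
    """Standardized mean difference: raw difference divided by the (pooled) SD."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return mean_difference / sd


# HYPOTHETICAL SDs in cm on the 10 cm VAS; the real value must come from the paper
HYPOTHETICAL_SDS_CM = (2.0, 2.5, 3.0)

effect_sizes = {sd: cohens_d(-1.8, sd) for sd in HYPOTHETICAL_SDS_CM}

for sd, d in effect_sizes.items():
    print(f"assumed SD = {sd:.1f} cm -> d = {d:.2f}")
```

With the actual group SDs and sample sizes, `pooled_sd` would supply the denominator. Until then, these values only bracket what the effect size might be.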
V. Interpretation and Framing Problems
Minimizing a Positive Finding
The only statistically significant between-group difference, reduced restlessness in the ILF-NF arm (p = 0.006), was framed as a footnote rather than as a possible mechanistic foothold for relapse prevention. The authors mention restlessness as a cue-triggered relapse precursor yet make no effort to link it to craving suppression or abstinence extension. Nor should this finding be used to justify “adjusting for” restlessness in future studies: restlessness is a post-randomization outcome and a plausible mediator, so conditioning on it would adjust away part of the treatment effect and could induce collider bias.
Failure to Generate Hypotheses for Targeted Use
Rather than viewing ILF-NF as likely to work selectively in restless, dysregulated, high-impulsivity subgroups, the authors dismiss the method as failing “to show effectiveness” in this broad population. This is a textbook case of a probable Type II error, driven by heterogeneity and endpoint misalignment, being misread as evidence for the null: a failure of trial design, not of the intervention.
VI. Recommendations Going Forward
For ILF-NF:
Do not abandon ILF-NF; test it in subpopulations with high restlessness, craving, trauma, and/or impulsivity.
Given strong evidence of interaction, study ILF-NF in combined therapy settings (family-level counseling/therapy, ketogenic diet).
Use targeted endpoints: cue-reactivity, restlessness, latency-to-relapse, EEG connectivity, other EEG-based shifts.
Require sham comparators and matched contact control arms.
Stratify by SUD subtype and medication status/history; include MOUD patients in U.S.-relevant trials.
For Policy:
The study does not justify de-implementation of NF, nor does it prove no effect. Its flawed design strips it of policy implications. It shows that the authors could not detect global QoL change in a heterogeneous outpatient cohort receiving short-term ILF-NF, not that global QoL does not change in a significant percentage of patients.
Funding should support outcome-anchored trials (e.g. restlessness and craving suppression as primary endpoints), not blanket trials against incomplete measures like QoL in a heterogeneous population.
Conclusion:
This trial was designed with endpoints unlikely to reflect ILF-NF’s true mechanism of action or its efficacy, which is known to be limited to about 50% of patients who make it to 16 sessions. It used a sample unlikely to change and a control arm that could only obscure signal. That restlessness was reduced despite all of this should guide future, better-designed trials, not be dismissed as “disappointing.”
This is a trial that should have been a hypothesis-generator, not a eulogy. Studies on infralow neurofeedback need to be conducted in an epistemic-aware manner, subject to thorough scrutiny. Journals need to do a better job at epistemics. This study wasted everyone’s time and left a blemish on the record for future meta-analyses, which should exclude it for cause.



