HHS Responds to Community and IPAK Calls for a Meaningful Plan for the Use of Artificial Intelligence
HHS Puts Machine Learning at the Core of Its New Outcomes-Focused AI Program. We predict innovation supported by shared AI infrastructure.
Despite overhyped messaging that artificial intelligence will destroy or enslave us all, the U.S. Department of Health and Human Services (HHS) just published the most aggressive modernization blueprint in its history: a department-wide Artificial Intelligence (AI) Strategy and a companion Compliance Plan for OMB Memorandum M-25-21.
Thank goodness.
First, the Compliance Plan is not what you might think it is. It’s about interoperability, not submission. But both documents do lay out master plans for the ethical use of AI, plans meant to help overturn decades of tomfoolery (and other words we won’t use in this article) that have kept citizens, physicians, lawmakers, and regulators in the dark about effective pathways to health.
Both documents revolve around a single fact: machine learning (ML) will sit in the critical path of how the federal health system discovers science, regulates products, pays for care, runs public-health surveillance, conducts research and research administration, and manages its own internal work.
The Strategy and Compliance Plan treat these models as first-class, prioritized infrastructure: HHS will fund them, train them, share them, regulate them, audit them, and shut them down when they fail. One of the main goals in the use of AI is to restore predictive validity, empirical reproducibility, and actionable clinical insight—principles long advocated by IPAK in its 2015 white paper on Machine Learning Prediction Modeling: “Statistical significance in predictive modeling is something, but generalizability of predictions is everything”, and in our recent Substack (see below).
Critically, the HHS AI Strategy explicitly incorporates these methodological priorities. HHS pledges to “facilitate reproducible pipelines (pre-registration, protocol standardization, data standardization, code and/or data release when feasible) to expedite validation”. In practice, this means the use of independent test sets, real-world validation environments, and performance monitoring over time. HHS reinforces this commitment by stating it will promote “designing and deploying tools to evaluate AI models for interpretability and generalizability of results, output quality, and usability over time”.
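What a "reproducible pipeline" looks like in practice can be sketched simply: released code and data travel with enough metadata for an independent team to verify it is re-running the same analysis on the same inputs. The field names below are illustrative assumptions, not an HHS schema:

```python
import hashlib
import json

# Sketch of one reproducibility practice: record a manifest (data hash,
# code version, random seed) alongside published results so an independent
# team can confirm it is re-running the same pipeline on the same data.
# Field names are invented for illustration, not an HHS schema.

def build_manifest(data_bytes: bytes, code_version: str, seed: int) -> dict:
    return {
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "code_version": code_version,
        "seed": seed,
    }

m = build_manifest(b"patient_id,outcome\n1,0\n2,1\n", "v1.2.0", 42)
print(json.dumps(m, indent=2))
```

On re-analysis, recomputing the hash over the shared data file and comparing it to the published manifest confirms both teams used identical inputs before any validation results are compared.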
From the report:
WHAT HHS SAYS IT WILL BUILD
The AI Strategy organizes HHS’s plan into five pillars: governance, infrastructure, workforce, research, and care delivery. Machine learning runs through every pillar, not as a support tool, but as a structural operating system. Most notably, clinical outcomes—not theoretical capability—are the clear and stated priority.
GOVERNANCE WITH REAL TEETH. Pillar 1 commits HHS to “forward-leaning AI governance” that increases innovation speed while protecting the public. It requires an inventory of AI systems, pre-deployment testing, and risk assessments for all “high-impact AI.” Systems must pass independent review or be paused or shut down. The HHS is clearly being responsive to long-standing calls from IPAK and others to ensure that predictive models used in public health meet minimum performance (think accuracy, generalizability) and safety thresholds before they are deployed.
THE “ONEHHS AI COMMONS”: MODELS AS INFRASTRUCTURE. Pillar 2 proposes a shared computational and data layer, the OneHHS AI-integrated Commons. It includes FAIR-aligned data repositories, model hosting, evaluation testbeds, and cost-effectiveness tracking. Rather than allowing duplication across agencies like NIH and FDA, HHS will centralize the model development lifecycle, ensuring reuse, transparency, and real-world relevance. One of the main goals in the use of AI is to ensure that machine learning models are not just black-box solutions but tools built and evaluated with the same rigor as biomedical instrumentation. As IPAK argued, “a one-time feature selection explains why results from such studies show low reproducibility; generalizability must be enforced empirically.” Being able to find predictors across data sources requires integration and open weighting.
OPEN WEIGHTING. For those not familiar with the jargon, this means that the machine-learning-optimized weights of predictors will be made explicit, so we can all see with full transparency which predictors matter. So much can be hidden behind relative risk ratios and p-values. It also means an end to subjective model selection: the criteria for weighting competing models will be made explicit as well. Epidemiologists and public health data analysts who want to succeed in this area should contact their Computer Science and Bioinformatics departments, ask for a course such as “Machine Learning for Epidemiologists and Clinical Prediction Science,” and learn evaluation science, ROC curves, weighted accuracy optimization, and the bedrock principles of machine learning. They need to understand that clinical populations are heterogeneous: they have been living in a Simpson’s-paradox world in which whole-population studies are treated as best-in-class, when what is needed are generalizable predictions about how factors affect different categories of patients. “Generalizable” here does not mean “useful to many people”; it means “accurate to the same degree on the same populations or subpopulations when the data are independent of the model.”
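The Simpson’s-paradox point can be made concrete with the classic kidney-stone treatment data (Charig et al., 1986), in which one treatment wins within every subgroup yet loses in the pooled totals. A minimal pure-Python sketch:

```python
# Classic kidney-stone data (Charig et al., 1986): (recoveries, patients).
# Treatment A beats B within BOTH severity strata, yet B looks better pooled,
# because A was given disproportionately to the harder (large-stone) cases.
data = {
    "small_stones": {"A": (81, 87),   "B": (234, 270)},
    "large_stones": {"A": (192, 263), "B": (55, 80)},
}

def rate(recovered, total):
    return recovered / total

for stratum, arms in data.items():
    a, b = rate(*arms["A"]), rate(*arms["B"])
    print(f"{stratum}: A={a:.3f} B={b:.3f} -> A wins: {a > b}")

# Pool the strata: the direction of the comparison reverses.
pooled = {
    arm: rate(sum(data[s][arm][0] for s in data),
              sum(data[s][arm][1] for s in data))
    for arm in ("A", "B")
}
print(f"pooled: A={pooled['A']:.3f} B={pooled['B']:.3f} "
      f"-> A wins: {pooled['A'] > pooled['B']}")
```

A whole-population analysis would pick B; stratified, generalizable prediction picks A for every category of patient. This is exactly why subgroup-aware evaluation matters.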
WORKFORCE MODERNIZATION: COPILOTS FOR CIVIL SERVANTS. Pillar 3 focuses on people. HHS pledges to train all employees with role-specific AI education, deploy secure AI copilots, and support AI champions inside the Department. The goal is to make ML a familiar and trustworthy asset, not a mysterious add-on. This reflects IPAK’s position that effective AI integration depends not only on data infrastructure but on analytical fluency. “The broader clinical value of prediction must not be lost due to concern over liabilities... generalizability of models must be a core skill set,” IPAK wrote.
HHS takes a hard line on this in its report: “HHS aims to increase the use of AI so it can be used to deliver measurable improvements in both population health and individual patient outcomes. Importantly, AI will augment clinicians, caseworkers, and public health professionals, serving as a supportive tool that enhances human decision-making and efficiency without compromising the essential human touch (i.e., to build trust and to validate use of the tool) in health care and human services delivery.” Sound familiar?
GOLD-STANDARD SCIENCE WITH MACHINE LEARNING INSIDE IT. Pillar 4 integrates ML into scientific workflows, fostering new mandates for pre-registered, reproducible, open-weight models (when lawful) and validated code pipelines. This is not window dressing. The HHS is clearly being responsive to these calls by making empirical generalizability a first-class requirement. IPAK’s recommendation to shift from static statistical significance toward dynamic predictive accuracy is effectively embedded in this structure. In fact, IPAK stated, “Only [machine learning prediction modeling] results in clinically actionable prediction models.” As policy and practice will be based on science, not the other way around, Pillar 3 requires Pillar 4.
CARE AND PUBLIC HEALTH MODERNIZATION. Pillar 5 turns ML toward delivery: early warning, risk stratification, clinical decision support, and outbreak detection. These systems will be measured by impact: fewer readmissions, improved maternal outcomes, lives saved. Real outcomes. One of the main goals in the use of AI is to allow health interventions to anticipate rather than react—to optimize clinical care toward success, not merely measure failure rates. This pillar operationalizes that ideal, making optimized clinical outcomes—not 1980s-style epidemiological associations—the north star of HHS AI integration.
THE COMPLIANCE PLAN: HOW HHS MAKES THIS STRATEGY WORKABLE
While the Strategy lays out the vision, the Compliance Plan creates enforceable structure. It requires model weight sharing under the SHARE-IT Act, accelerates Authority to Operate approvals, and establishes deadlines to sunset non-compliant tools. IPAK has long argued that AI use without independent validation is a recipe for harm. The HHS is clearly being responsive to these calls by defining risk-managed review cycles focused on generalizability and public transparency requirements.
WHAT THIS MEANS FOR PATIENTS AND THE PUBLIC
Patients will benefit from faster research, earlier detection, and predictive models that enable prevention. Clinical staff will see reduced administrative burden. The public will gain transparency, as open-weight models and plain-language summaries allow for oversight. One of the main goals in the use of AI is to make public-sector decisions more explainable, auditable, and reviewable by citizens and watchdogs alike. Just as IPAK emphasized, “reliable prediction models must be trained, evaluated, and re-evaluated with real data—not left to inference from assumed statistical properties.”
WHAT THIS MEANS FOR DEVELOPERS
Machine learning model builders will find a serious ecosystem at HHS. Models must be portable and explainable. Infrastructure will allow for shared testing and rapid iteration. Public-facing release requirements will raise the bar for rigor. The HHS is clearly being responsive to calls for a common model spine, where code and performance travel together—not disjointed pilots and hidden rule engines. As IPAK noted in its guidance to federal funders, “Ideal studies... specify that all analyses—exploratory or not—be published, at least online, regardless of whether the team considers them final.”
CONCLUSION
HHS has not merely endorsed AI—it has embedded machine learning into the core logic of how the Department will function. The Strategy and Compliance Plan represent the first large-scale realization of IPAK’s long-articulated framework: prediction over association, generalizability over single-study significance, and accountable public deployment over black-box experiments.
One of the main goals in the use of AI is to shift from plausible results to reliable, operationally useful knowledge. The HHS is clearly being responsive to that shift. For once, government isn’t trailing innovation. It’s institutionalizing the right parts.
And this time, it starts with the code.
If you have supported IPAK in the past, you have helped make this possible.
POST NOTE:
How Builders Can Engage With the HHS AI Ecosystem
The U.S. Department of Health and Human Services (HHS) AI Strategy outlines several actionable opportunities for developers, researchers, and innovators to contribute to the federal AI modernization effort. This section offers a practical guide for engaging with HHS’s AI ecosystem, strictly based on published federal sources and directives.
This engagement framework reflects a sea change in how HHS expects to interact with external partners: through transparent infrastructure, reproducibility, and clinical value. Builders who meet these standards will find a welcoming—if exacting—federal AI ecosystem to join.
1. Align with the OneHHS AI Commons Infrastructure
HHS is building a shared infrastructure called the “OneHHS AI-integrated Commons” that includes scalable data repositories, model hosting, evaluation testbeds, and orchestration tools. This ecosystem is designed to promote reuse and cross-agency model adoption.
“By building this OneHHS AI-integrated Commons, the Department will ensure that new AI solutions can be developed, tested, and deployed rapidly, with the ability to operate in different environments and across different systems.”
HHS AI Strategy, p. 12
What to do: Design your models for transparency, portability, and FAIR (Findable, Accessible, Interoperable, and Reusable) data standards. Prepare them for hosting and evaluation.
2. Prioritize Generalizability and Reproducibility
HHS requires that AI projects meet strict reproducibility and external validation criteria.
“Facilitate reproducible pipelines (pre-registration, protocol standardization, data standardization, code and/or data release when feasible) to expedite validation.”
“Designing and deploying tools to evaluate AI models for interpretability and generalizability of results, output quality, and usability over time.”
HHS AI Strategy, p. 16 & p. 18
What to do: Build and document models using reproducible research frameworks. Include clear train/validation/test splits, source provenance, performance metrics (e.g., ROC AUC, sensitivity/specificity), and methods for generalizability assessment.
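The metrics named above can be sketched without any libraries. In this illustration the labels and scores are invented; in a real pipeline they would come from a model applied to an independent test set never seen during training:

```python
# Dependency-free sketch of hold-out evaluation metrics.

def roc_auc(labels, scores):
    """Probability a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sens_spec(labels, scores, threshold=0.5):
    """Sensitivity and specificity at a fixed decision threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Invented hold-out labels and model scores, purely for illustration.
y_test = [1, 1, 1, 0, 0, 0, 1, 0]
p_test = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

print(roc_auc(y_test, p_test))    # 0.9375
print(sens_spec(y_test, p_test))  # (0.75, 0.75)
```

Reporting the threshold alongside sensitivity/specificity, and the full ROC AUC, is what makes performance claims auditable across sites.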
3. Focus on Health Outcomes and Clinical Utility
HHS AI priorities are organized around measurable improvements in population and individual health outcomes. Tools are expected to demonstrate impact, not just accuracy.
“HHS aims to increase the use of AI so it can be used to deliver measurable improvements in both population health and individual patient outcomes.”
What to do: Frame your tool’s utility in terms of actual health impacts—such as reducing readmissions, preventing adverse events, improving maternal or neonatal care, or optimizing staffing and workflows.
4. Prepare for Risk Management and Compliance Requirements
HHS governance enforces pre-deployment testing, impact assessments, and transparent risk controls for “high-impact AI.”
“Establish standardized minimum risk practices for high-impact AI including pre-deployment testing, AI impact assessments, independent review, monitoring, and safe termination if non-compliant.”
What to do: Develop audit-ready documentation. Include explainability methods (e.g., SHAP/LIME), fallback procedures, compliance statements, and alignment with the NIST AI Risk Management Framework.
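SHAP and LIME are third-party libraries; the underlying idea of model-agnostic explanation can be sketched without them using permutation importance: shuffle one feature and measure how much accuracy drops. The fixed-weight "model" below is a hypothetical stand-in, not a trained HHS system:

```python
import random

# Permutation importance: a model-agnostic explainability check.
# Shuffling an important feature should hurt accuracy; shuffling an
# irrelevant one should not.

def model(x):
    # Hypothetical risk scorer: feature 0 dominates (weight 0.9),
    # feature 1 is nearly irrelevant (weight 0.1).
    return 1 if 0.9 * x[0] + 0.1 * x[1] > 0.5 else 0

def accuracy(X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(X, y, feature, seed=0):
    rng = random.Random(seed)          # fixed seed for reproducibility
    col = [x[feature] for x in X]
    rng.shuffle(col)                   # break the feature-label link
    X_perm = [list(x) for x in X]
    for row, v in zip(X_perm, col):
        row[feature] = v
    return accuracy(X, y) - accuracy(X_perm, y)  # accuracy drop = importance

# Toy data: labels follow feature 0 exactly; feature 1 is noise.
X = [[1, 0], [1, 1], [0, 0], [0, 1]] * 10
y = [x[0] for x in X]

print("feature 0 importance:", permutation_importance(X, y, 0))
print("feature 1 importance:", permutation_importance(X, y, 1))
```

The interpretable quantity is the ranking of features, not the raw numbers; in audit-ready documentation that ranking is reported alongside the seed and data split used to compute it.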
5. Track HHS AI Use Case Inventory
HHS will publish and maintain an AI Use Case Inventory. Builders can identify gaps or trends and tailor proposals accordingly.
“The annual HHS AI Use Case Inventory is the starting point for this effort.”
What to do: Regularly monitor this inventory for unmet needs or replicable models that can be improved or adapted.
6. Build Tools that Support Workforce Augmentation
HHS plans widespread deployment of internal AI copilots and domain-specific automation.
“Deploy secure, approved AI tools integrated with HHS systems and data.”
“Encouraging development of reliable and secure AI-assistants and conversational tools... that deliver guidance to patients with appropriate disclaimers.”
HHS AI Strategy, p. 15 & p. 18
What to do: Propose AI solutions that improve workflows, automate administrative tasks, and augment clinician decision-making. Be prepared to show compliance with PHI protections and accessibility standards.
Watch Artificial Intelligence (AI) | HHS.gov for more details and funding announcements.
Popular Rationalism and IPAK are independent entities not affiliated with The U.S. Department of Health and Human Services. We are responsible for this content. All interpretations are ours. This article may be reproduced partially or in its entirety without restriction.
The deep problem with any of this is transparency. Open-source code is an absolute necessity, if it is even possible. Not even a mention of this is made. Without it, all of this becomes corruptible hand-waving and promises veiling what will, in all likelihood, inevitably become a black-box sausage machine without real and serious commitment.
Skepticism is not only warranted, but mandatory.
Can you clarify what kind of "AI" is being implemented by HHS? I believe that only Machine Learning (ML), which is the use of digital networks to identify patterns in data, will be implemented. Large Language Models (LLMs), which are trained on natural language and are used to imitate human speech, will not be implemented in this program. Is that correct?