What diagnostic tests can teach about spotting public fraud

There’s a thought exercise I like to do with my statistics students. Imagine a doctor who throws away all the swabs and just diagnoses every patient as sick. In one sense, the approach is accurate: No sick patient goes undiagnosed. If it’s harmless to tell people they’re sick, no problem.

But it’s not harmless: Patients and families may face undue stress, isolate themselves, or spend a lot of time and money on unnecessary tests and treatments. This can become a big problem, especially if not that many people are truly sick. I use this example because it is so extreme as to be outrageous: No doctor would do such a thing! I never thought a president and his advisors would use that exact approach.

As a public health researcher, I am deeply alarmed by the drastic and sudden cuts that President Trump, Elon Musk, and their new staff are making to the federal government, including in the health agencies. These cuts affect programs that are crucial for health, international aid, education, and many other important areas. They have claimed that many of these cuts, freezes, and firings are to reduce waste and fraud in public spending, without credible evidence to back that up. This claim has gained some sympathy even from those who generally do not support Trump’s policies. But if they truly want to strengthen government functions and cut spending by rooting out fraud and waste, the administration needs a different approach, informed by some basic lessons from probability, statistics, and medical tests.

I have taught many students about the concepts of sensitivity and specificity of medical tests, spam filters, and other methods of classification. The core probability idea is that understanding the performance of a test or algorithm requires knowing how well it performs two functions: correctly identifying true positives (sensitivity) and correctly identifying true negatives (specificity). For a lot of tests, including nasal swab PCR tests for Covid-19 and flu, the test produces a quantitative measure that then gets turned into a positive or negative result based on some threshold. Some of my research is on the value of this information.
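For readers who like to see the arithmetic, here is a minimal sketch of how sensitivity and specificity fall out of a threshold. The scores, the true statuses, and the function name are all made up for illustration; they do not come from any real test.

```python
def sensitivity_specificity(scores, truly_positive, threshold):
    """Call a case positive when its score is at or above the threshold,
    then compare those calls against the known true status."""
    tp = fn = tn = fp = 0
    for score, is_positive in zip(scores, truly_positive):
        called_positive = score >= threshold
        if is_positive and called_positive:
            tp += 1  # sick and correctly flagged
        elif is_positive:
            fn += 1  # sick but missed
        elif called_positive:
            fp += 1  # healthy but flagged anyway
        else:
            tn += 1  # healthy and correctly cleared
    sensitivity = tp / (tp + fn)  # share of true positives that are caught
    specificity = tn / (tn + fp)  # share of true negatives that are cleared
    return sensitivity, specificity


# Hypothetical quantitative test results for eight patients (assumed values).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
truly_positive = [True, True, True, False, False, False, False, False]

sens, spec = sensitivity_specificity(scores, truly_positive, threshold=0.5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these invented numbers, a threshold of 0.5 catches two of the three truly positive patients and clears three of the five truly negative ones.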

There are two major ways to change the performance of a test in a population: We can change the threshold for classifying positives vs. negatives, or we can improve the underlying accuracy of the test. Trump, Musk, and company have set their threshold vanishingly low and made these “determinations” without proof or evidence. They seem to start with the presumption that every dollar spent that does not benefit them or their friends is a waste. If we think of their cuts as classifying spending as fraudulent, this approach has high sensitivity: It will stop any fraud that’s out there. But it has incredibly low specificity and throws out a lot of important, justified, and legally mandated spending with it. It’s like that doctor assuming every test is positive.
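To make the "every test is positive" rule concrete, here is a rough sketch of what happens when everything is flagged. The payment counts are hypothetical, chosen only to illustrate the pattern, and the function name is my own.

```python
def flag_everything_outcomes(total_payments, fraudulent_payments):
    """Outcomes of a rule that labels every payment as fraudulent.
    Assumes at least one fraudulent and at least one valid payment."""
    valid_payments = total_payments - fraudulent_payments
    true_positives = fraudulent_payments  # all real fraud is flagged
    false_positives = valid_payments      # so is every valid payment
    false_negatives = 0                   # nothing fraudulent slips through
    true_negatives = 0                    # nothing valid is cleared
    return {
        "sensitivity": true_positives / (true_positives + false_negatives),
        "specificity": true_negatives / (true_negatives + false_positives),
        # Share of flagged payments that really are fraudulent.
        "positive_predictive_value": true_positives / (true_positives + false_positives),
        "valid_payments_stopped": false_positives,
    }


# Assumed, illustrative numbers: 1,000 payments, of which 20 are fraudulent.
outcomes = flag_everything_outcomes(total_payments=1000, fraudulent_payments=20)
print(outcomes)
```

In this made-up scenario, sensitivity is perfect and specificity is zero: 980 valid payments are stopped to catch 20 fraudulent ones, so only 2% of the flags are correct.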

This brings us to another key feature of assessing test performance: the cost function. Each wrong classification has some cost associated with it. Evaluating how well a test works depends on the balance of right and wrong decisions. Musk has claimed that any mistake can be undone by restoring funding later. But the federal government is not like Twitter: Mistakes have real-world consequences. While the cost of allowing a wasteful payment to go out is measured in dollars, the cost of stopping a valid payment can be long-term damage to organizations, local economies, and trust in the government. It also can cause severe and immediate damage to the public’s health and harms to people around the world. These asymmetric risks make this an incredibly dangerous plan.
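A cost function of this kind can be sketched in a few lines. Every number below is an assumption made up for illustration, including the relative costs of the two kinds of error and the performance of the hypothetical "targeted review"; none of it is an estimate of real federal spending.

```python
def total_cost(false_positives, false_negatives,
               cost_per_false_positive, cost_per_false_negative):
    """Add up the cost of both kinds of mistake for one classification rule."""
    return (false_positives * cost_per_false_positive
            + false_negatives * cost_per_false_negative)


# Assumed costs, in arbitrary units: wrongly stopping a valid payment is
# taken to be five times as costly as letting one wasteful payment go out.
COST_STOP_VALID = 5
COST_MISS_FRAUD = 1

# Hypothetical population: 1,000 payments, 20 of them fraudulent.
# Rule A flags everything: 980 valid payments stopped, no fraud missed.
flag_all_cost = total_cost(980, 0, COST_STOP_VALID, COST_MISS_FRAUD)

# Rule B is an imagined careful review: 10 valid payments stopped, 5 frauds missed.
targeted_cost = total_cost(10, 5, COST_STOP_VALID, COST_MISS_FRAUD)

print(f"flag everything: {flag_all_cost}, targeted review: {targeted_cost}")
```

Under these assumptions the flag-everything rule costs 4,900 units against 55 for the targeted review, even though the targeted review misses some fraud. The gap widens as the cost of stopping a valid payment grows.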

Actually reducing waste and fraud in government programs requires improving both the sensitivity and specificity of detecting them, not just declaring everything fraudulent. This requires careful research, dedicated staff monitoring their areas of expertise, and stable and consistent processes. Unfortunately, Trump and Musk have undermined all three of those, right from the beginning. One of their first acts was to dismiss at least 17 inspectors general, the leaders of the offices often responsible for identifying fraud and misconduct within government agencies. They have also cut down statistical agencies, oversight agencies, and evaluation grants that can assess the impact of various spending programs, and wreaked havoc on federal employees (those responsible for this oversight) more generally.

Well before this year, various government agencies had begun implementing machine learning algorithms for fraud detection. The human investigators who train and use these algorithms assess these very trade-offs. But the employees with that skill set, along with other dedicated public servants, now face mass firings, forced moves to different locations (at odds with recommendations for developing this workforce), and Musk’s team frequently interrupting their actual work.

If the fictional doctor ordered a test with zero specificity for their patient, they would be committing malpractice. That is exactly what Trump and Musk are doing to our federal government.

Lee Kennedy-Shaffer is a biostatistics educator and researcher at the Yale School of Public Health.