How skeptical should you be of an after-the-fact subgroup analysis in a failed clinical trial?

This article was adapted from STAT’s latest report, “Subgroup analysis: how to evaluate post hoc tests for significance in failed clinical trials.”

Clinical trials of newly developed drugs often don’t work out.


When that happens, it’s common practice for biotech and pharma companies to look for ways to salvage their financial investment, often by conducting further analyses of the data after the fact in particular types of patients to see whether the compound might have been effective in those smaller groups.

If one of these analyses finds a hint of an effect, the question for investors, patients, and the companies themselves then becomes: How skeptical should they be of this result?

In the latest STAT Report, “Subgroup analysis: how to evaluate post hoc tests for significance in failed clinical trials,” Frank David, founder and managing director of Pharmagellan, a firm that consults on R&D strategy, and the author of “The Pharmagellan Guide to Analyzing Biotech Clinical Trials,” provides in-depth answers to this question. He points out the statistical pitfalls in after-the-fact analyses of subgroups and offers readers valuable advice about what to look for in evaluating their potential.


STAT sat down with David recently to talk about the new STAT Report.

Frank, why is so-called post hoc subgroup analysis worth explaining in detail?

When a biotech’s clinical trial fails, the sponsor often says, “Sure, the drug didn’t work overall, but look at this subset of patients. It seemed to work in that group!” The problem for readers of these so-called post hoc (after the fact) subgroup analyses is that it can be hard to decide what to think. Does the drug still have a chance, or should it be written off? That’s the problem I tried to address in this report.

Why do companies do after-the-fact analyses of patient data after their trial of a new drug fails? Isn’t it a waste of money?

From a scientific perspective, looking at different cuts of the data from a failed study can sometimes identify hypotheses about the patients in whom the drug might actually work. So for sponsors, the issue isn’t so much whether or not to evaluate post hoc subsets. It’s really about what to do with the results, because it’s very easy to get a positive post hoc result that arose purely by chance. So sponsors have to decide: Are the subgroup results convincing enough to justify running a follow-up trial? That’s where the potential to waste money comes in.
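To see how quickly chance findings pile up, consider a back-of-the-envelope sketch (illustrative only, not a calculation from the report): suppose a sponsor tests several subgroups that are independent of one another, each at the conventional 0.05 significance threshold, in a trial where the drug truly does nothing.

```python
# Back-of-the-envelope: probability of at least one false-positive subgroup
# when the drug has no effect, assuming independent tests at alpha = 0.05.
alpha = 0.05
for k in (3, 10, 30):
    p_any_false_positive = 1 - (1 - alpha) ** k
    print(f"{k:>2} subgroups tested -> {p_any_false_positive:.0%} chance of a spurious 'hit'")
```

With just 10 independent subgroups, the odds of at least one chance “positive” are roughly 40 percent, and with 30 they approach 80 percent.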

You note in the report that it’s important for investors to understand how to evaluate companies’ reports of planned or already conducted post hoc analyses after failed trials. Why is that?

Small biotechs are under enormous pressure to try to find reasons to keep hope alive when a drug fails a clinical trial. That’s why it’s fairly common to see companies point to a post hoc analysis as a reason to test the drug just once more, this time in a patient population where it seemed to have a glimmer of efficacy. But for investors, it may be quite risky to bet on the redo study, particularly if the post hoc analysis has obvious red flags that make it highly unlikely that the results will be reproduced.

When an after-the-fact analysis finds a statistically significant result in a subgroup, it’s often a good bet that if the company takes that result and runs with it, launching a larger trial, it will end up being a bust. But is that always the case?

Not always — but it’s important to be realistic about the odds of success. In my report, I did find a handful of examples of drugs that were “salvaged” after a failed trial by a post hoc analysis. But there were many more cases where the follow-up study didn’t pan out. So even in the best-looking cases, the chances are high that the post hoc results won’t be repeated.

What are three things to look out for when a company says that they have found a “hint” of an effect in a subgroup of patients who participated in a larger trial that failed?

When I see a report from a post hoc subgroup analysis, I look closely at three things. First, how many subgroups did they examine? That’s important because the odds that a positive finding arose by chance go up as you run more tests. In the best case, the company will have pre-specified which subgroups they planned to look at, but in other scenarios it may be hard to tell if they looked at three, or 30, or 300 patient subsets to find one that came out positive.
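To put some numbers on that point, here is a small simulation (the trial size and number of subgroups are invented for illustration, not drawn from the report): a “failed” trial in which the drug has no effect at all is re-analyzed in 20 baseline subgroups, and we ask how often at least one of them looks statistically significant purely by chance.

```python
# Illustrative simulation: a trial with no true drug effect, re-analyzed post hoc
# in 20 baseline subgroups. How often does at least one look "significant"?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_patients, n_subgroups = 2000, 400, 20
hit_at_least_once = 0

for _ in range(n_trials):
    treated = rng.integers(0, 2, n_patients).astype(bool)     # 1:1 randomization
    outcome = rng.normal(size=n_patients)                      # no true treatment effect
    covariates = rng.normal(size=(n_patients, n_subgroups))    # baseline characteristics
    for j in range(n_subgroups):
        subgroup = covariates[:, j] > 0                        # e.g. "above-median" patients
        p = stats.ttest_ind(outcome[subgroup & treated],
                            outcome[subgroup & ~treated]).pvalue
        if p < 0.05:
            hit_at_least_once += 1
            break

print(f"Null trials with at least one 'significant' subgroup: {hit_at_least_once / n_trials:.0%}")
```

Even with no true effect anywhere, the share of simulated trials that turn up at least one “significant” subgroup lands far above the nominal 5 percent.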

Second, if the subgroup is defined by a continuous variable, such as age or a lab value (something that is measured rather than counted), I check to see whether the cut point the sponsor used to define the subgroup makes sense. Even when the cutoff is a nice round number, like patients older than 55, I always check to see if it’s bolstered by guidelines or papers. If not, maybe the company looked at subgroups older than 45, 50, 55, and so on, and just picked the one that worked. Going back to the earlier point, that means the findings are more likely to be a red herring. And if you see a weird cutoff point, like patients older than 57, that’s a really bad sign that a sponsor ran a bunch of tests and just cherry-picked the one that yielded a p value under the significance threshold.
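Here is a companion sketch of that cutoff-shopping problem (again with invented numbers, not data from any real trial): the drug has no effect, but the analysis scans every age cutoff from 45 to 70 and keeps the one with the best-looking p value.

```python
# Illustrative simulation of "cutoff shopping": the drug has no effect, but the
# analysis scans many age cutoffs and keeps the smallest p value it finds.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_trials, n_patients = 2000, 400
cutoffs = range(45, 71)            # "older than 45", "older than 46", ..., "older than 70"
cherry_picked_hits = 0

for _ in range(n_trials):
    age = rng.uniform(30, 85, n_patients)
    treated = rng.integers(0, 2, n_patients).astype(bool)
    outcome = rng.normal(size=n_patients)                 # no true drug effect at any age
    best_p = min(
        stats.ttest_ind(outcome[(age > c) & treated],
                        outcome[(age > c) & ~treated]).pvalue
        for c in cutoffs
    )
    cherry_picked_hits += best_p < 0.05

print(f"Null trials 'rescued' by cherry-picking a cutoff: {cherry_picked_hits / n_trials:.0%}")
```

Because the scanned subgroups overlap heavily, the inflation is milder than with fully independent tests, but the false-positive rate still comes out above the 5 percent you would expect from a single pre-specified cutoff, and scanning additional variables only widens the gap.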

And finally, I take a step back and ask myself whether the subgroup makes scientific and clinical sense. There’s a great paper the FDA often cites in which the investigators showed that a drug’s effects seemed to depend on whether the patients were born under certain astrological signs. It’s rare to see a subgroup analysis that’s so clearly absurd, but there are plenty more where the company’s arguments about why the drug should work in those particular patients but not in others don’t make much sense. A positive finding in an implausible subset is very likely to be a chance finding.