Scandinavian Prostatectomy Study Illustrates Two Common Errors

In 2002, Holmberg and colleagues presented interim results from a large, randomized trial comparing radical prostatectomy to watchful waiting for the treatment of localized prostate cancer. The trial is virtually without precedent and provides, by far, the best evidence as to the effectiveness of one of the most common surgical operations for cancer.

Interpretation of the trial in the scientific literature appears to have been unequivocal. The journal Evidence-Based Medicine summarized the results thus: "Radical prostatectomy reduced death from prostate cancer but not all cause mortality." Similarly, a clinical guidelines paper for the US Preventive Services Task Force stated that though surgery reduced "prostate cancer mortality ... the groups did not differ in all-cause mortality". The British Medical Journal went further: "Watchful waiting as good as surgery for prostate cancer," trumpeted one headline. Another article, entitled "The operation was a success (but the patients died)," berated the lay press for reporting favorably on surgery ("How media spin distorted the outcomes of a study comparing radical prostatectomy with watchful waiting"). Even the editorial accompanying the original paper, whilst broadly supportive of prostatectomy, stated that there was "no difference between the two groups in overall mortality".

At first glance, the results seem to support the value of prostatectomy: 53 of 347 patients in the prostatectomy group died, 16 of them from prostate cancer; of 348 patients in the control group, 62 died, including 31 deaths from prostate cancer. If around 60,000 prostatectomies are conducted each year in the US, and if overall survival is improved by the 2-3% found in this trial, prostatectomy would extend approximately 1,500 American lives each year.
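A quick check of that arithmetic (a back-of-envelope sketch; the 2.5% is simply the midpoint of the 2-3% range quoted above):

```python
prostatectomies_per_year = 60_000  # approximate annual US figure quoted above
survival_benefit = 0.025           # midpoint of the 2-3% absolute improvement

print(f"Lives extended per year: ~{prostatectomies_per_year * survival_benefit:,.0f}")
# -> Lives extended per year: ~1,500
```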

The claim that prostatectomy had no effect on overall survival seems to be based on consideration of P values: P = 0.02 for disease-specific survival ('statistically significant') versus P = 0.31 for overall survival ('non-significant'). Equating P = 0.31 with 'no difference' is, however, the most basic of statistical errors: failing to prove that surgery is effective is not the same as proving that surgery is ineffective.

We can put this in more formal terms. In classical statistical theory, one establishes a null hypothesis, such as 'there is no difference between groups'. If a test comparing the two groups yields a P value below a preset threshold called α (typically 0.05), one rejects the null hypothesis and concludes 'there is a difference between the groups'. If P is greater than α, this is a 'failure to reject the null hypothesis', leading to a conclusion such as 'we were unable to demonstrate a difference between groups'. Concluding that there is no difference between groups on the basis of a high P value is known as 'accepting the null hypothesis' and is unsupported by statistical theory.
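A minimal sketch of this decision rule (the P values are those reported for the trial; the wording of each branch is the point):

```python
def interpret(p_value: float, alpha: float = 0.05) -> str:
    """Classical decision rule. Note the asymmetry: a high P value
    licenses only a 'failure to reject', never 'the groups do not differ'."""
    if p_value < alpha:
        return "Reject the null hypothesis: evidence of a difference."
    return "Fail to reject: unable to demonstrate a difference (NOT 'no difference')."

print(interpret(0.02))  # disease-specific survival in the trial
print(interpret(0.31))  # overall survival in the trial
```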

The philosophical basis for never accepting the null hypothesis is the difficulty of proving a negative. We tend to make statements such as 'extensive searches have failed to find the Loch Ness Monster' rather than 'there is no Loch Ness Monster' because we cannot rule out the possibility that the monster is hiding somewhere we have yet to look. A more practical argument against accepting the null hypothesis relates to sample size. Take a trial of 8 patients randomized to streptomycin or placebo for a severe infection, with death rates of 25% for streptomycin and 100% for placebo. It would be unwise to conclude that streptomycin was ineffective merely because the P value of this small trial was 0.14.
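Assuming an even 4-and-4 randomization (an assumption; the text gives only the death rates), Fisher's exact test reproduces the quoted P value:

```python
from scipy.stats import fisher_exact

# Rows: streptomycin, placebo; columns: died, survived.
# The 4-per-arm split is an assumption; the text gives only the rates.
table = [[1, 3],   # streptomycin: 1/4 died (25%)
         [4, 0]]   # placebo: 4/4 died (100%)

_, p_value = fisher_exact(table)
print(f"P = {p_value:.2f}")  # -> P = 0.14
```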

Any conclusion of 'no effect' should be based on consideration of confidence intervals (CIs) for the difference between groups, and whether they include clinically relevant benefit. In the Scandinavian prostatectomy trial, the 95% CI for the hazard ratio was 0.57 to 1.2. A near halving of hazard is of obvious clinical relevance, undermining the claim that 'the groups did not differ'.
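The reported interval is also internally consistent with the reported P = 0.31. A sketch, assuming a symmetric Wald interval on the log-hazard scale (a common construction, though the paper's exact method is not stated here):

```python
import math

lower, upper = 0.57, 1.2  # reported 95% CI for the hazard ratio

log_hr = (math.log(lower) + math.log(upper)) / 2     # implied point estimate
se = (math.log(upper) - math.log(lower)) / (2 * 1.96)

z = log_hr / se
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P from a standard normal

print(f"implied HR ≈ {math.exp(log_hr):.2f}, P ≈ {p:.2f}")
# -> implied HR ≈ 0.83, P ≈ 0.32 (close to the reported 0.31)
```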

Why were the P values for cancer-specific and overall mortality so different? One reason, given particular prominence in the British Medical Journal articles, was that prostatectomy might lower the risk of death from prostate cancer but increase the risk of death from other causes. This is not uncommon for treatments (such as chemotherapy) that are associated with important toxicities, but would be unusual for a relatively low-risk procedure such as prostatectomy. Indeed, cause of death is carefully described in the study report: although slightly more men in the prostatectomy group died of other causes, a large proportion of the difference is explained by deaths from other cancers.

We would not expect prostatectomy to cause cancer, certainly not within a few years. The more cogent explanation for the difference in P values is that, for most early cancers, analyses of overall mortality have less statistical power than analyses of cancer-specific mortality. This is partly because overall mortality includes the statistical 'noise' of deaths from other causes. But even if non-cancer deaths are identical in each group, analyses of overall mortality have less power, due to the shape of some underlying statistical distributions. Put simply, it is easier to see a difference between a 10% and 5% rate of cancer-specific death than between a 25% and 20% overall death rate because, whereas 5% is half of 10%, a change from 25% to 20% is a reduction of one-fifth. In the prostatectomy trial, there were 347 surgical patients and 348 controls. From a χ² test we get a P value of 0.02 for comparing 16 and 31 deaths from prostate cancer in each group. If we divide the 68 non-prostate-cancer deaths equally between the groups, there would be 50 deaths in the surgery group and 65 deaths in the control group. This gives a P value for total deaths of 0.13 by χ². So even if prostatectomy has no effect on non-prostate-cancer deaths, changing the endpoint from prostate-cancer-specific to overall survival can change our conclusions about its effectiveness.
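A sketch reproducing both χ² tests described above (using the uncorrected χ², i.e. no Yates continuity correction, which matches the quoted P values):

```python
from scipy.stats import chi2_contingency

n_surgery, n_control = 347, 348

def chi2_p(deaths_surgery: int, deaths_control: int) -> float:
    # 2x2 table: rows = treatment arm, columns = (died, alive)
    table = [[deaths_surgery, n_surgery - deaths_surgery],
             [deaths_control, n_control - deaths_control]]
    _, p, _, _ = chi2_contingency(table, correction=False)
    return p

# Observed prostate cancer deaths: 16 versus 31
print(f"Cancer-specific deaths: P = {chi2_p(16, 31):.2f}")  # -> 0.02
# Hypothetical totals with the 68 other deaths split evenly: 50 versus 65
print(f"Total deaths:           P = {chi2_p(50, 65):.2f}")  # -> 0.13
```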

This article was conceived before the most recent update of the trial was published. The latest findings confirm the arguments I present here. The authors reported a statistically significant difference between groups for overall survival. Therefore, commentators who claimed 'no difference' on the basis of 'no statistically significant difference' were in error. The results also illustrate the lower statistical power of analyzing overall survival. Although there was an absolute reduction of about 5% in both disease-specific mortality (10-year probability 14.9% versus 9.6%) and overall mortality (32.0% versus 27.0%), the P value for overall mortality (P = 0.04) was considerably higher than that for cancer-specific mortality (P = 0.01).

Nearly 3 years separate the publication of the initial and the updated study results. Approximately 1.5 million men were diagnosed with prostate cancer during that period. If we very conservatively estimate that only 1% of those men decided against prostatectomy on the basis of the claim that it has 'no effect on overall survival', 15,000 fewer surgeries would have been conducted. Using the updated estimate of a 5% decrease in 10-year survival with watchful waiting, 750 men might have died prematurely as a result.
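The arithmetic behind those estimates, for transparency (all inputs are the assumptions stated above):

```python
diagnosed = 1_500_000      # men diagnosed with prostate cancer over ~3 years
declined = 0.01            # conservative fraction deciding against surgery
excess_mortality = 0.05    # 10-year survival decrement with watchful waiting

fewer_surgeries = diagnosed * declined
premature_deaths = fewer_surgeries * excess_mortality
print(f"{fewer_surgeries:,.0f} fewer surgeries; ~{premature_deaths:,.0f} premature deaths")
# -> 15,000 fewer surgeries; ~750 premature deaths
```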

A mistake in the operating room can threaten the life of one patient; a mistake in statistical analysis or interpretation can lead to hundreds of early deaths. So it is perhaps odd that, while we allow a doctor to conduct surgery only after years of training, we give SPSS (SPSS, Chicago, IL) to almost anyone. Moreover, whilst only a surgeon would comment on surgical technique, it seems that anybody, regardless of statistical training, feels confident about commenting on statistical data. If we are to bring the vast efforts of research to fruition, and truly practice evidence-based medicine, we must learn to interpret the results of randomized trials appropriately. This will require greater awareness of statistical methods, including the dangers of accepting the null hypothesis and the relative statistical power of overall and cancer-specific survival. Getting these fundamentals right is the very least that we can do for those affected by cancer.

