
Lies, Damn Lies, and Statistics

Mark Twain is usually credited with popularizing the phrase, "lies, damn lies, and statistics." Though doctors and scientific researchers are rarely guilty of lying, the phrase does describe the dangers of misunderstood or badly explained medical statistics. When people with a diagnosis of breast or ovarian cancer are given information about the "odds" for their prognosis, they should have at least a basic understanding of what the numbers mean.

Why We Need Statistics to Understand Treatment Choices

Statistics measure the size and likelihood of specific treatment benefits in similarly situated patients. Using mathematical techniques, the conclusions drawn from studying one set of patients with a particular disease can be extended to estimate the probable impact on all patients with the same disease. Without statistics, doctors have only their own experience to tell them how likely a treatment is to succeed for each individual. Their experience of a few dozen or even hundreds of similar cases provides much less information than scientific research that collects data on thousands of patients.

But statistics don't create certainty about treatment benefits, either for a population as a whole or for an individual. Any statistic has an associated level of significance, or confidence that it represents reality, but that confidence is seldom 100%. Statistics therefore are indicators which must be considered in the light of various factors, such as each patient's baseline risk of further disease and the toxicity of the treatment.

In this discussion I will deal generally with the use of statistics for breast and ovarian cancer treatment. 

How Are Statistics about Treatments Derived?

What most patients are most eager to know is the effect of a specific intervention on the course of a disease. The intervention is most likely a drug, but it could be something else, for example, a radical change in diet. Let's assume that a drug is the intervention. To estimate the effectiveness of the drug, researchers compare the intervention to no intervention. Or they measure the effect of the new intervention and compare it to an established, successful intervention.

In both situations, randomized double-blinded clinical trials generate effectiveness measurements. For such trials, a Treatment Group and a Control Group of individuals who do not get the treatment are established. Taken together, the groups are known as the Intent to Treat Group. No one participating in the research, neither the patients nor the researchers who assess them, knows who in the Intent to Treat Group actually gets the treatment. Those who don't get the drug being tested either get nothing new, or they get a placebo.

The researchers set a time period for the trial and define the adverse events that will be looked for during this time. In breast cancer, for example, an event might be a new tumor within one year. It is expected that there will be more adverse events in the control group than in the treatment group.

If at the end of the trial period the proportion of adverse events is the same in each group, the trial has shown no beneficial effect from the drug. But if the results are different in each group, that is, if different proportions of patients have adverse events, then the difference between those two proportions, in the context of the size of the two groups, describes the effectiveness of the drug.

Effectiveness can be expressed as one of three statistics. These are Absolute Risk Reduction, Relative Risk Reduction and Number Needed to Treat Before Benefit.  These three are related in some ways but have somewhat different meanings for treatment decisions in a doctor's office. For now I'm going to label them Effectiveness Statistics, and deal with them individually in my next column.
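
As a rough illustration of how these three statistics come out of the two proportions described above, here is a minimal Python sketch. The trial numbers are invented for illustration, the function name is hypothetical, and the formulas are the standard textbook definitions rather than anything taken from a particular study.

```python
def effectiveness_statistics(control_events, control_size, treated_events, treated_size):
    """Return (ARR, RRR, NNT) from the event counts in each group."""
    control_risk = control_events / control_size   # proportion with an adverse event, no drug
    treated_risk = treated_events / treated_size   # proportion with an adverse event, drug
    arr = control_risk - treated_risk              # Absolute Risk Reduction
    if arr <= 0:
        # No measured benefit: the other two statistics are not meaningful.
        return arr, 0.0, float("inf")
    rrr = arr / control_risk                       # Relative Risk Reduction
    nnt = 1 / arr                                  # Number Needed to Treat
    return arr, rrr, nnt


# Hypothetical trial: 1,000 patients per group,
# 200 adverse events among controls and 150 among treated patients.
arr, rrr, nnt = effectiveness_statistics(200, 1000, 150, 1000)
print(f"ARR = {arr:.1%}, RRR = {rrr:.1%}, NNT = {nnt:.0f}")
```

With these made-up numbers the same result can be described as a 5% absolute reduction, a 25% relative reduction, or about 20 patients treated for one to benefit, which is why it matters which statistic is being quoted.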

Statistical Significance and Confidence

The Effectiveness Statistics from the trial derive from a small portion of the entire population that has the disease for which treatment is being tested. To know whether these sample results have a very good chance of occurring the same way if all people with the disease got the drug, mathematical tests of significance are applied. We can think of statistical significance as confidence that the results would be obtained in the real world. A confidence level of 95% is the general target for treatment research. From a layman's perspective, it can be understood to mean there is a 95% chance that the same effectiveness results would be found in the entire population having the disease and being treated by the drug being evaluated.*
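
One common test of significance for comparing two proportions is the two-proportion z-test. The sketch below is only an illustration under assumptions: this column does not say which test any given trial uses, the function name is hypothetical, and the counts are invented. A p-value below 0.05 is the usual counterpart of the 95% level mentioned above.

```python
import math


def two_proportion_p_value(events_a, size_a, events_b, size_b):
    """Two-sided p-value for the difference between two event proportions."""
    p_a = events_a / size_a
    p_b = events_b / size_b
    # Pooled proportion: the event rate if the drug made no difference.
    pooled = (events_a + events_b) / (size_a + size_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / size_a + 1 / size_b))
    z = (p_a - p_b) / standard_error
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Same event rates (20% vs. 15%) in a large and a small hypothetical trial.
p_large = two_proportion_p_value(200, 1000, 150, 1000)
p_small = two_proportion_p_value(20, 100, 15, 100)
print(f"1,000 per group: p = {p_large:.4f}")
print(f"  100 per group: p = {p_small:.4f}")
```

In this made-up example the larger trial falls under the 0.05 threshold and the smaller one does not, even though the proportions are identical.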

Confidence in a statistic is determined by three main factors: 

  • The number of patients in the two groups being compared; this is sample size.
  • The number of events found in each group.
  • And the difficulties of measuring those events--called Noise.  

 

Sample size, number of events, and noise influence the outcomes interactively.

Factor              When Factor Increases
Sample size         Confidence increases
Number of events    Confidence increases
Noise               Confidence decreases
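
The effect of sample size can be sketched with a 95% confidence interval around the difference between the two groups' event proportions. This is a minimal sketch using the simple normal-approximation (Wald) interval, chosen here for brevity; the function name and the trial counts are hypothetical, and noise in measuring events is not modeled.

```python
import math


def risk_difference_interval(control_events, control_size,
                             treated_events, treated_size, z=1.96):
    """Approximate 95% interval (z = 1.96) for control risk minus treated risk."""
    p_c = control_events / control_size
    p_t = treated_events / treated_size
    difference = p_c - p_t
    standard_error = math.sqrt(p_c * (1 - p_c) / control_size
                               + p_t * (1 - p_t) / treated_size)
    return difference - z * standard_error, difference + z * standard_error


# Same event rates (20% of controls, 15% of treated) at three trial sizes.
for size in (100, 1000, 10000):
    low, high = risk_difference_interval(size * 20 // 100, size,
                                         size * 15 // 100, size)
    print(f"{size:>6} per group: {low:+.3f} to {high:+.3f}")
```

The interval narrows as the groups grow. At 100 patients per group it still includes zero, meaning no benefit cannot be ruled out, while at 1,000 per group it lies entirely above zero.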

Of the three factors, sample size is the most important, but it is often difficult to control. Many trials have been unable to attract enough participants to yield a very high statistical significance or confidence level. Events must also be measured and recorded extremely carefully so that noise does not impair confidence. Theoretically a patient could read about trials in medical journals to learn about these individual factors, or ask her doctor for an opinion about the validity of a trial she is interested in, but these are not easy things to do. Fortunately they are not necessary, because the confidence level sums it up. So patients would do well to ask what the confidence level is before relying on research results to choose a treatment.

Statistical Significance vs. Clinical Significance

Often patients and activists interested in what research can tell them are frustrated by the difference between statistical significance and clinical significance. It is confusing to be told both that research results may be statistically significant without having much clinical merit, and that there may be clinical value in treatments with relatively small statistical significance. Confidence is responsible for these two statements being true. Small effect sizes are usually considered clinically relevant if there is great confidence in them. And conversely, a very large measured effect would hardly guarantee clinical benefits if the confidence level is low.

For any individual, baseline risk also plays a role in determining the potential clinical benefit of a treatment shown to be statistically significant. For a disease in which the risk is roughly the same across an entire population, an effectiveness statistic is a good guide to clinical practice because it would apply to a large class of patients. But if there is a treatment risk that varies with a patient's previous treatment, or current general health, or biological type of disease, then effectiveness statistics, even if strong, might not indicate that the treatment will be best for her.
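
A small sketch shows why baseline risk matters. It assumes, purely for illustration, that a treatment cuts risk by the same relative amount (25%) whatever the patient's starting risk; the baseline risks and the function name are hypothetical.

```python
def absolute_benefit(baseline_risk, relative_risk_reduction):
    """Return (absolute risk reduction, number needed to treat) for one patient group."""
    arr = baseline_risk * relative_risk_reduction
    nnt = 1 / arr
    return arr, nnt


# Assumed: the same 25% relative risk reduction applies at every baseline risk.
for baseline in (0.40, 0.10, 0.02):
    arr, nnt = absolute_benefit(baseline, 0.25)
    print(f"baseline risk {baseline:.0%}: absolute reduction {arr:.1%}, "
          f"about {nnt:.0f} patients treated per event avoided")
```

Under that assumption the absolute benefit shrinks in step with the baseline risk, so a treatment that clearly helps a high-risk patient may offer a low-risk patient very little to weigh against its toxicity.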

These concerns will be pursued in the future discussion of the three effectiveness statistics.

------

*Alexandra Barratt, Peter C. Wyer, et al. Tips for learners of evidence-based medicine: 1. Relative risk reduction, absolute risk reduction and number needed to treat. CMAJ. August 17, 2004; 171(4). doi:10.1503/cmaj.1021197

Canadian Medical Association



Posted April 28, 2011.
