Evaluating Biomarkers for Cancer Screening

Although the path from biomarker discovery to cancer screening evaluation is long and arduous, it is important to start the journey in the right direction and not take shortcuts that lead to the wrong destination.

Researchers may wish to design a study either (a) to evaluate the classification performance of a candidate biomarker or (b) to discover a promising biomarker and then evaluate its classification performance. Option (a) requires a smaller sample size than option (b), simply because it excludes the discovery phase. However, relying on a candidate biomarker as an indicator of the full carcinogenesis pathway may be a false economy. The process of carcinogenesis is less well understood than many researchers appreciate [1]. Many candidate biomarkers are identified using clinical specimens rather than the preclinical specimens that are the target of screening. For biological reasons, biomarkers identified from clinical specimens may not be relevant for early detection. Many reports of promising candidate biomarkers are therefore overly optimistic.

We recommend that researchers conduct both discovery and evaluation of biomarkers for cancer screening using a nested case-control study from a biorepository of prospectively stored specimens. Some specimens would come from persons who later developed cancer and some from persons who did not develop cancer. To avoid statistical bias, different specimens should be used for discovery and evaluation. Based on desirable classification performances, we suggest 200 specimens for discovery (100 from persons who developed cancer and 100 from persons who did not develop cancer) and 180 specimens for evaluation (70 from persons who developed cancer and 110 from persons who did not develop cancer) [2]. This sample size is admittedly large, but we believe it is well worth the effort, as it should discover promising biomarkers even with high-throughput methods, and it should detect meaningful changes in classification performance [2,3]. Smaller sample sizes are prone to false positive discoveries and a consequent net waste of resources to validate them. If high-throughput methods are used for discovery, we suggest simple classification rules involving only a few biomarkers [4]. To avoid bias, investigators should use the same blinded techniques to collect and handle all specimens [5].
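
The requirement that discovery and evaluation use different specimens can be illustrated with a minimal sketch. It assumes each stored specimen has a unique identifier and that specimens are assigned at random; the function name, parameter names, and the random assignment itself are illustrative assumptions, not a procedure taken from the cited references. The default counts are the sample sizes suggested above.

```python
import random


def split_specimens(case_ids, control_ids,
                    n_disc_cases=100, n_disc_controls=100,
                    n_eval_cases=70, n_eval_controls=110,
                    seed=0):
    """Assign specimens to disjoint discovery and evaluation sets.

    case_ids:    identifiers of specimens from persons who later developed cancer
    control_ids: identifiers of specimens from persons who did not
    (Hypothetical helper; identifiers are assumed to be unique.)
    """
    if len(case_ids) < n_disc_cases + n_eval_cases:
        raise ValueError("not enough case specimens in the biorepository")
    if len(control_ids) < n_disc_controls + n_eval_controls:
        raise ValueError("not enough control specimens in the biorepository")

    rng = random.Random(seed)
    # Draw all needed specimens without replacement, in random order,
    # then cut the draw in two so that no specimen serves both purposes.
    cases = rng.sample(list(case_ids), n_disc_cases + n_eval_cases)
    controls = rng.sample(list(control_ids), n_disc_controls + n_eval_controls)

    discovery = {"cases": cases[:n_disc_cases],
                 "controls": controls[:n_disc_controls]}
    evaluation = {"cases": cases[n_disc_cases:],
                  "controls": controls[n_disc_controls:]}
    return discovery, evaluation
```

Fixing the seed makes the assignment reproducible, so the split can be documented before any biomarker measurements are examined.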

In the nested case-control study, one should take great care in assessing the clinical importance of a cancer associated with a particular biomarker. The cancer should be a life-threatening one. Cancer should not be defined as any tumor detected on screening, because non-aggressive cancers may not cause medical problems in the absence of screening, a phenomenon known as “overdiagnosis”. For that reason, specimens collected from a study that does not include screening for the specific cancer of interest are most useful. In other words, although the goal of the biomarker is to identify preclinical cancer (hence the use of stored specimens, so that the cancers detected are preclinical), the final evaluation should be based on clinical determination of cancer to avoid “overdiagnosis” bias.

The evaluation of biomarkers should involve a receiver-operating characteristic (ROC) curve, which plots the true positive rate (the fraction of persons who developed cancer who were classified as positive) against the false positive rate (the fraction of persons who did not develop cancer who were classified as positive). A promising biomarker would have a high true positive rate (greater than 0.80) and a very low false positive rate (less than 0.01), because a low false positive rate is needed to avoid harm to large numbers of asymptomatic persons, the usual target of screening programs. The precise cutpoints for desirable true and false positive rates are a matter of clinical judgment that depends on the invasiveness of follow-up for positive tests and the aggressiveness of the target cancer.
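
As a rough illustration of the ROC calculation described above, the following sketch computes the (false positive rate, true positive rate) pair at every observed cutpoint and checks whether any cutpoint falls in the target region. It assumes a single continuous biomarker for which higher values indicate cancer; the function names and that direction of the rule are assumptions made for the example.

```python
def roc_points(case_values, control_values):
    """Return (fpr, tpr) pairs for the rule 'positive if value >= cutpoint'.

    case_values:    biomarker values from persons who developed cancer
    control_values: biomarker values from persons who did not
    Assumes higher values indicate cancer (an assumption of this sketch).
    """
    cutpoints = sorted(set(case_values) | set(control_values), reverse=True)
    points = [(0.0, 0.0)]  # cutpoint above every value: nobody is positive
    for c in cutpoints:
        tpr = sum(v >= c for v in case_values) / len(case_values)
        fpr = sum(v >= c for v in control_values) / len(control_values)
        points.append((fpr, tpr))
    return points


def meets_target(points, min_tpr=0.80, max_fpr=0.01):
    """True if some cutpoint has tpr > min_tpr and fpr < max_fpr."""
    return any(tpr > min_tpr and fpr < max_fpr for fpr, tpr in points)
```

The defaults of 0.80 and 0.01 are the thresholds mentioned in the text; they would be changed to reflect the clinical judgment described above.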

Once a promising biomarker is identified by the above methods, one can evaluate its effect on the rate of interval cancers (symptomatic cancers detected between screenings) using a variation of the paired availability design [2,6]. A new biomarker test is administered at various screening centers where an established screening test has been given. Investigators collect data on the number of interval cancers associated with screening in time periods before and after the introduction of the new biomarker test. The estimated effect of the biomarker test on interval cancers equals the average, over centers, of the difference in the fraction of interval cancers between the time periods divided by the difference in the fraction of persons receiving the new biomarker test between the time periods.
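
The arithmetic of this estimate can be sketched as follows. The sketch reads the description as a ratio computed within each center and then averaged with equal weights; that reading, the equal weighting, and all class, field, and function names are assumptions of the example and are not taken from the cited papers, which may use a weighted average.

```python
from dataclasses import dataclass


@dataclass
class CenterPeriod:
    """Counts for one screening center in one time period (hypothetical layout)."""
    screened: int           # persons screened in the period
    interval_cancers: int   # interval cancers among those screened
    received_new_test: int  # persons who received the new biomarker test


def center_effect(before, after):
    """Change in interval-cancer fraction per unit change in test receipt."""
    d_outcome = (after.interval_cancers / after.screened
                 - before.interval_cancers / before.screened)
    d_receipt = (after.received_new_test / after.screened
                 - before.received_new_test / before.screened)
    if d_receipt == 0:
        raise ValueError("receipt of the new test did not change at this center")
    return d_outcome / d_receipt


def estimated_effect(centers):
    """Unweighted average of per-center effects.

    centers: list of (before, after) CenterPeriod pairs, one pair per center.
    """
    effects = [center_effect(before, after) for before, after in centers]
    return sum(effects) / len(effects)
```

For example, a center where the interval-cancer fraction falls from 0.02 to 0.01 while receipt of the new test rises from 0 to 0.5 contributes an effect of -0.02, that is, two fewer interval cancers per hundred persons given the new test.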

The final stage in the path from biomarker discovery to cancer screening is the evaluation of the biomarker as a trigger of early intervention in a randomised trial with a cancer mortality endpoint, a major undertaking typically involving fifty to one hundred thousand participants [2,7]. Despite the size and duration of such a trial, it is the most efficient and reliable way to directly assess the balance of benefits and harms of an investigational new intervention.


  1. Baker SG, Kramer BS. Paradoxes in carcinogenesis: New opportunities for research directions. BMC Cancer 7, 151 (2007).
  2. Baker SG. Improving the biomarker pipeline to develop and evaluate cancer screening tests. Journal of the National Cancer Institute 101, 1116-1119 (2009).
  3. Baker SG, Kramer BS. Using microarrays to study the microenvironment in tumor biology: The crucial role of statistics. Seminars in Cancer Biology 18, 305-310 (2008).
  4. Baker SG, Kramer BS. Identifying genes that contribute most to good classification in microarrays. BMC Bioinformatics 7, 407 (2006).
  5. Ransohoff DF. Bias as a threat to the validity of cancer molecular marker research. Nature Reviews Cancer 5, 142-149 (2005).
  6. Baker SG, Lindeman KS, Kramer BS. The paired availability design for historical controls. BMC Medical Research Methodology 1, 9 (2001).
  7. Baker SG, Kramer BS, Prorok PC. Statistical issues in randomized trials of cancer screening. BMC Medical Research Methodology 2, 11 (2002).