Observational Studies - Study Design and Statistics

AMA Manual of Style - Stacy L. Christiansen, Cheryl Iverson 2020


In an observational study, the researcher identifies a condition or outcome of interest and then measures factors that may be related to that outcome (Table 19.3-1). Although observational studies cannot lead to causal inferences, they may nonetheless generate hypotheses that may be tested (eg, if event A generally precedes event B, then it is possible that A may be responsible for causing B). Such studies may be either retrospective (the investigator tries to reconstruct what happened in the past) or prospective (the investigator identifies a group of individuals and then observes them for a specified period). Prospective studies generally yield more reliable conclusions than retrospective studies. Cause and effect cannot be established by observational studies; consequently, study findings from observational studies should not be presented using causal language. The term association should be used instead of causal terms, such as effect or relationship, when reporting and discussing variables in observational studies (see 2.1, Manuscript Preparation for Submission and Publication, Titles and Subtitles).

Example: Change the title “Effects of Indulgent Descriptions on Vegetable Consumption” to “Association Between Indulgent Descriptions and Vegetable Consumption” because the study was an observational study and not a randomized trial.

Because individuals in observational studies are not randomly assigned to conditions, there are often large baseline differences between groups in such studies. For instance, individuals with better exercise habits often differ in several important ways (eg, educational level, income, diet, smoking) from those who do not exercise regularly. Because exercise and all these other factors influence health outcomes, exercise is confounded with these variables, and it is difficult to know whether exercise is responsible for any differences in health outcomes. Researchers may use several different statistical techniques to minimize the effects of confounding, including matching, stratification, multivariable analysis, instrumental variable analysis, and propensity analysis.
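As a rough illustration of stratification, the sketch below compares a crude odds ratio with a Mantel-Haenszel summary odds ratio computed across strata of a confounder. The counts are hypothetical (exposure is regular exercise, outcome is heart disease, confounder is smoking), and the Mantel-Haenszel estimator is only one of several ways to carry out a stratified analysis; the text above does not prescribe a specific method.

```python
from typing import List, Tuple

# One 2x2 table per stratum of the confounder:
# (exposed with outcome, exposed without outcome,
#  unexposed with outcome, unexposed without outcome)
Table = Tuple[int, int, int, int]


def odds_ratio(table: Table) -> float:
    """Odds ratio for a single 2x2 table."""
    a, b, c, d = table
    return (a * d) / (b * c)


def collapse(strata: List[Table]) -> Table:
    """Pool all strata into one table, ignoring the confounder."""
    a, b, c, d = (sum(cells) for cells in zip(*strata))
    return (a, b, c, d)


def mantel_haenszel_or(strata: List[Table]) -> float:
    """Mantel-Haenszel summary odds ratio across strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts. Within each smoking stratum the risk of heart
# disease is the same for exercisers and non-exercisers (5% and 20%).
nonsmokers: Table = (40, 760, 10, 190)
smokers: Table = (40, 160, 160, 640)
strata = [nonsmokers, smokers]

crude = odds_ratio(collapse(strata))    # ignores smoking
adjusted = mantel_haenszel_or(strata)   # stratifies by smoking
print(f"crude OR = {crude:.2f}, Mantel-Haenszel OR = {adjusted:.2f}")
```

In this made-up data set the crude odds ratio suggests that exercise is protective, whereas the stratified estimate shows no association once smoking is accounted for, which is the pattern that confounding produces.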

Even with the most extensive attempts to minimize confounding, it is always possible that results of observational studies may be attributable to variables that the authors did not measure. Because residual confounding is unavoidable in observational studies, findings from observational research are not as reliable as those from RCTs. Sometimes the results of observational studies may differ significantly from those of RCTs.31 On the other hand, because observational studies are more often based on the outcomes of a large range of people in realistic situations, they may add useful insights into disease processes as they occur beyond the limited conditions of RCTs. Furthermore, observational studies may be the only way to investigate certain problems (eg, automobile crashes, exposure to toxic chemicals) for which it would be unethical to perform RCTs.

The EQUATOR Network’s STROBE guidelines should be used to report most observational studies (http://www.equator-network.org/reporting-guidelines/strobe/)32 (see 19.3.1, Cohort Studies).

Table 19.3-1. Summary Description of Common Observational Study Designs

| Type of study | Brief description | Starting point | Assessment | Strengths | Weaknesses | Reporting guideline |
| --- | --- | --- | --- | --- | --- | --- |
| Case-control | A case-control study, which is always retrospective, compares those who have had an outcome or event (cases) with those who have not (controls) | Outcome event status | | Overcomes temporal delays and the need for large sample sizes to accumulate rare events | Susceptible to recall bias | |
| Case series | Case series describe characteristics of a group of patients or participants with a particular disease, disorder, signs, or symptoms or a group of patients or participants who have undergone a particular procedure or experienced a specific exposure or event | Consecutive series of patients or participants with similar characteristics receiving the same intervention or experiencing the same exposure | Characterizes a disease entity, treatment response, or exposure risk | Characterizes a disease or its treatment | Subject to selection bias if patients are not enrolled consecutively; no control group | Case series reporting guideline33 |
| Cohort | A prospective cohort study follows a group, or cohort, of individuals who are initially free of the outcome of interest; a retrospective cohort study is a weaker design and investigators should be blinded to study outcomes when formulating the hypothesis and determining the dependent and independent variables | Exposure status | Outcome event status | Feasible when randomization of exposure not possible; generalizability | Susceptible to bias | |
| Comparative effectiveness | Research attempting to understand how effective various interventions are when applied to patients in real-world settings | Groups of patients receiving different treatments | Treatment outcomes | Facilitates comparison of outcomes for different treatments of the same disease | Potential for selection bias | |
| Cost-effectiveness analysis, cost-benefit analysis | Cost-effectiveness analysis determines all costs associated with an intervention relative to the benefits of that intervention; cost-benefit analysis determines the costs incurred to derive an effect of an intervention | Groups of patients receiving treatments | All costs associated with achieving the treatment outcomes | Facilitates awareness of the cost of treatments to achieve desired outcomes. This adds important context to clinical outcomes enabling clinicians to know if the costs associated with achieving those outcomes are worth it | Cost-effectiveness is an inherently subjective determination; how much should be expended to achieve a certain therapeutic goal is a matter of opinion; rarely do cost-effectiveness studies completely capture all costs associated with delivering an intervention | |
| Cross-sectional | Examination of an entire population of participants at a single point in time or during a specific interval, in which exposure and outcome are ascertained simultaneously; study of groups or patients within the cross-section facilitates understanding of a risk factor’s relationship with a disease | An entire population at one point in time | Presence of disease, disorder, or risk | Facilitates determination of the prevalence of disease, disorders, or risks; large samples of patients can be studied; can be used to examine the association between an exposure and disease when subpopulations are examined | Does not allow for observations at different times; length time bias: diseases with a long course will be overrepresented in the population at any given time | |
| Diagnostic/prognostic studies | Research describing the ability of diagnostic studies to confirm the presence of a disease; prognostic studies are those that examine the ability of some model to predict the presence or course of the disease | The point in the course of the disease where a test is obtained to establish a diagnosis; for prognostic studies, it is that point along the course of the disease where a prediction is made regarding some future outcome | Test the ability of a test to establish a diagnosis made by a reference standard; usually the sensitivity or specificity is used to determine how well the test performs; for prognostic models, various approaches are used such as the C index (area under the curve) or Akaike or Bayes Information Criteria; prognostic models should be validated in populations in which they will be used and should be different from the populations from which the models were derived | Diagnostic studies can be easily validated if high-quality reference standards are available; prognostic studies are useful for counseling patients about risks of various outcomes given certain circumstances that, hopefully if modified, can result in better clinical outcomes | Many diagnostic studies have suboptimal sensitivity and specificity resulting in uncertain or inaccurate diagnoses; prognostic studies rarely account for all the factors that contribute to an outcome resulting in them being less than ideal for predicting outcomes in many circumstances | |
| Ecologic | Examine groups or populations of patients but not individual patients. Useful in understanding disease prevalence or incidence | Groups of people exposed to some factor | Outcome event status in the group | Facilitates analysis of large numbers of people. May uncover relationships between exposure factors and diseases. | Ecologic fallacy; group-level exposure and outcome may not relate well to individuals within that group; highly susceptible to bias | |
| Meta-analysis | Aggregation of results from similar studies to increase the power to make conclusions about the effectiveness of interventions | Published studies; the studies being compared should all begin at the same time point in a patient's course of disease or relative to when an intervention occurs | Summary point estimate for an effect of an intervention derived from several studies of that intervention; CIs should be included | Increases the ability to establish the efficacy of an effect when individual studies are too small to show that effect on their own | Methodology frequently abused; studies can only be aggregated if they have similar research designs and outcomes; conclusions may be misleading if the studies are heterogeneous | |
| Qualitative research | Research about social interactions and personal experiences; intended to better understand people's perspectives about a topic | The time when someone experiences an event that the investigators want to better understand and how someone reacted to the event | Typically derived from interviews or surveys of individuals who were exposed to some intervention; these data cannot generally be assessed by statistical methodologies; they are summarized and reported as a subjective assessment of events. | Provides important information about how people perceive events/interventions, which may be as or more important than the physiological effects of the intervention | There may be incomplete reporting of key information; reporting of events might be biased by a person’s perception of events | |
| Quality improvement | Research on how systems work with the intent to improve their efficiency or safety | The time when an event occurs that the investigators want to study to improve processes that led to the event | Investigation of events occurring from some process is collected; this includes information about a particular scenario that is collected (medical record information, incident reports, interviews with participants, etc) and analyzed | When data are obtained from actual events, it may be high quality, actionable information facilitating interventions that can improve processes; the highest-quality information tends to be local and not multi-institutional | Findings may only relate to an individual institution or process and may not be generalizable to others; data obtained from administrative or other sources may not reliably represent the events being studied | |
| Survey | A sample of a larger population is queried about a topic an investigator wants to understand about the larger population | The time when the survey is conducted or queries may be made about events that occurred in the past | Basic summary statistics of the answers acquired from the survey; population estimates with CIs representing the likelihood of the survey data being representative of the larger population are calculated | Obtain information from a large number of people, enhancing the generalizability of the findings; low cost to implement | Findings may be influenced by how questions are asked; difficult to get an adequate response rate to retain statistical validity; responders may be biased, resulting in their willingness to participate | |

19.3.1 Cohort Studies.

In a cohort study, a defined group of people (the cohort) is followed up over time to examine associations between different interventions and subsequent outcomes. Cohort studies may be concurrent (prospective) or nonconcurrent (retrospective). A prospective cohort study follows up a group, or cohort, of individuals who are initially free of the outcome of interest. Individuals in a cohort generally share some underlying characteristic, such as age, sex, or exposure to a risk factor. Some studies may comprise several different cohorts. The study is usually conducted for a predetermined period, long enough for some members of the cohort to develop the outcome of interest. Individuals who developed the outcome are compared with those who did not. The report of the study should include a description of the cohort and the length of follow-up, what independent variables were measured and how, and what outcomes were measured and how. The number of individuals lost to or unavailable for follow-up and whether they differed from those with complete follow-up should also be included. All adverse events should be reported.
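A comparison of outcome occurrence between exposure groups in a cohort is often summarized as a risk ratio. The sketch below is a minimal example that uses hypothetical counts and the standard large-sample confidence interval on the log scale; the function name and the numbers are illustrative assumptions and do not come from the manual.

```python
import math
from typing import Tuple


def risk_ratio(exposed_events: int, exposed_total: int,
               unexposed_events: int, unexposed_total: int,
               z: float = 1.96) -> Tuple[float, float, float]:
    """Return (risk ratio, lower 95% limit, upper 95% limit)."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) from the usual large-sample formula.
    se_log_rr = math.sqrt(
        1 / exposed_events - 1 / exposed_total
        + 1 / unexposed_events - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical cohort: 30 of 100 exposed and 15 of 100 unexposed
# participants developed the outcome during follow-up.
rr, lower, upper = risk_ratio(30, 100, 15, 100)
print(f"RR = {rr:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

Because the design is observational, a result of this kind would be reported as an association between exposure and outcome, not as an effect of the exposure.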

Any previous published reports of closely related studies from the same cohort should be cited in the text or should be clear from the study name (eg, the Framingham Study). All previous reports on the same or similar outcomes should be cited.

Retrospective cohort studies may be appropriate if investigators are blinded to study outcomes when formulating the hypothesis and determining the dependent and independent variables, but many of the strengths of prospective cohort studies are lost with retrospective studies, such as identifying the population to study and defining the variables and outcomes before the events occur.

Reports of cohort studies should follow the STROBE reporting guidelines (http://www.equator-network.org/reporting-guidelines/strobe/).32

19.3.2 Case-Control Studies.

Case-control studies, which are always retrospective, compare those who have had an outcome or event (cases) with those who have not (controls). Cases and controls are then evaluated for exposure to various risk factors and thus should not be selected on the basis of their exposure to the risk factors under investigation. Cases and controls generally are matched according to specific characteristics (eg, age, sex, duration of disease) to reduce confounding by these variables. However, if the matched variables are inextricably linked with the exposure of interest (not necessarily with the disease or outcome of interest), matching may confound the analysis (eg, matching on the consumption of cream substitutes instead of coffee drinking itself)46 (see also overmatching). The independent variable is exposure to an item of interest (eg, a drug or disease). Information about the source of both cases and controls must be included, and inclusion and exclusion criteria must be listed for each. Cases and controls should be drawn from the same or similar populations to avoid selection bias. Pairs (1:1 match) or groups (eg, 1:2 or 1:3 match) of cases and controls may be matched on 1 or more variables. The analysis generally is unpaired, however, because of the difficulty in matching every important characteristic. Nonetheless, paired analysis reduces the necessary sample size to detect a difference and may be justified if individuals are well matched. Recall bias is common in all retrospective studies and is especially a concern when participants perceive that a factor related to the independent variable may be associated with the outcome. If recall bias may have occurred, the authors should discuss how they addressed this possibility.

In a nested case-control study, the cases and controls are drawn from some larger population or cohort that may have been convened for some other purpose. In these instances, authors should clearly indicate how the original sample was defined, the size of the original sample, and how the cases and controls were selected from it.

Box 19.3-1. Distinguishing Case Series From Cohort Studies

It is common to confuse cohort and case series studies. For example, a study of 20 consecutive patients with a certain disease can be handled in 2 different ways. A study that divides the 20 patients into 2 groups according to the treatment received and compares the outcomes of these groups (eg, provides aggregated absolute risks per group or a risk ratio) would probably be classified as a cohort study. In contrast, a publication that describes the interventions received and outcomes for each patient/case separately would probably be classified as a case series.

Summary of the distinction between cohort and case series studies proposed by Dekkers et al.47

Cohort study: Patients are sampled on the basis of exposure. The occurrence of outcomes is assessed during a specified follow-up period.

Case series: Patients with a particular disease or disease-related outcome are sampled. Case series exist in 2 types:

1. Sampling is based on a specific outcome and presence of a specific exposure.

2. Selection is based only on a specific outcome, and data are collected on previous exposures. Cases are reported regardless of whether they have specific exposures. This type of case series can be seen as the case group from a case-control study.

Reports of case-control studies should follow the STROBE reporting guidelines (http://www.equator-network.org/reporting-guidelines/strobe/).32

19.3.3 Cross-sectional Studies.

Cross-sectional studies observe individuals at a single point or during a specific interval, in which exposure and outcome are ascertained simultaneously. Such studies may be helpful for suggesting associations among variables but cannot address whether one condition may precede or follow another. Thus, cross-sectional studies cannot establish causation, but they may nonetheless be helpful for suggesting hypotheses to guide more rigorous studies.

19.3.4 Case Series.

In a case series study, observations are made on a series of individuals before and after they have received the same intervention, exposure, or diagnosis, but there is no control group. Case series describe characteristics of a group of patients or participants with a particular disease, disorder, signs, or symptoms or a group of patients or participants who have undergone a particular procedure or experienced a specific exposure or event. A case series may also examine larger units, such as groups of hospitals or municipalities. Case series can be useful to formulate a case definition of a disease or describe the experience of an individual or institution in treating a disease or performing a type of procedure. Case series should comprise consecutive patients or observations seen by the individual or institution to minimize selection bias. Case series are not used to test a hypothesis because there is no comparison group. (Occasionally, comparisons are made with historical controls or published studies, but these comparisons are informal and should not include a formal statistical analysis.) A report of a case series should include the rationale for publishing the population described and inclusion and exclusion criteria. Case series are subject to several types of biases. Authors should be conservative regarding the conclusions drawn from case series analysis.

Box 19.3-1 describes how case series and cohort studies differ. A guideline for the appropriate reporting of uncontrolled case series is available and should be followed.33

19.3.5 Comparative Effectiveness Studies.

A comparative effectiveness study compares different interventions or strategies to prevent, diagnose, treat, and monitor health conditions to determine which work best for which patients and under what circumstances and which are associated with the greatest benefits and harms. Comparative effectiveness studies evaluate how effective existing therapies are in achieving various clinical outcomes. The outcomes may be tested by conducting RCTs or by observational analysis of existing data. Thus, from a study design perspective, they differ little from conventional studies of clinical efficacy.48 The intent of comparative effectiveness studies is to inform patients, clinicians, and policy makers of the relative value of individual therapies when applied to certain groups of patients.49

Follow the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) reporting guidelines for reports of comparative-effectiveness analyses.34

19.3.6 Meta-analyses.

A meta-analysis is a systematic, statistical pooling of the results of 2 or more similar studies to address a question of interest or hypothesis. According to Moher and Olkin,50

[Meta-analyses] provide a systematic and explicit method for synthesizing evidence, a quantitative overall estimate (and CIs) derived from the individual studies, and early evidence as to the effectiveness of treatments, thus reducing the need for continued study. They also can address questions in specific subgroups that individual studies may not have examined.

A meta-analysis quantitatively summarizes the evidence regarding a treatment, procedure, or association. It is a more statistically powerful test of the null hypothesis than is provided by the separate studies themselves because the sample size is substantially larger than those in the individual studies.51 However, as detailed herein, there are controversies associated with meta-analyses.52,53,54,55,56,57 Meta-analyses of RCTs should follow the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting guidelines, and meta-analyses of observational studies should follow the Meta-analysis of Observational Studies in Epidemiology (MOOSE) reporting guidelines; both guidelines include recommendations for including flow diagrams and checklists.39,40

To ensure that a meta-analysis accurately reflects all the available evidence, the methods of identifying possible studies for inclusion should be explicitly stated (eg, literature search, reference search, contacting authors regarding other or unpublished work). Authors should state the dates covered by their search and the search terms used (see 21.9.4 Italics). A search strategy that includes several approaches to identify articles is preferable to a single database search.58 Authors should make all attempts to include results of non–English-language articles. Collaborating with a medical librarian can greatly facilitate this process.59

Meta-analyses are considered observational such that causation cannot be implied from the results of a meta-analysis, including meta-analyses of RCTs; only associations among various risk factors, interventions, and outcomes can be determined. There are conflicting views on whether individual RCTs provide better evidence of the effect of treatments relative to a well-conducted meta-analysis. The statistical power of a meta-analysis is increased by aggregating results, but heterogeneity among studies can result in misleading conclusions, as can varying, subjective interpretations of study bias.60,61 One particularly powerful method of aggregating data is to obtain original trial data on original patients from various studies and aggregate them into a single analysis: an individual patient meta-analysis. This requires use of reporting techniques that account for this type of study design: the PRISMA-IPD (individual patient data).62,63

Publication bias, or the tendency of authors and journals to publish articles with positive results, is a potential limitation of any systematic review of the literature.64 Unpublished studies may be included in meta-analyses if they meet predefined inclusion criteria. One approach to addressing whether publication bias might affect the results is to define the number of negative studies that would be needed to change the results of a meta-analysis from positive to negative. Authors may also provide funnel plots, which can reveal publication bias (see Funnel Plot).
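The idea of counting how many negative studies would overturn a result can be sketched with Rosenthal's fail-safe N. The text does not name a specific method, so the choice of this formula is an assumption, as are the z scores and the 1-sided critical value of 1.645 used below.

```python
import math
from typing import Sequence


def rosenthal_fail_safe_n(z_scores: Sequence[float],
                          z_crit: float = 1.645) -> float:
    """Number of unpublished null (z = 0) studies needed to pull the
    Stouffer combined z score down to the critical value."""
    k = len(z_scores)
    total = sum(z_scores)
    return max(0.0, total ** 2 / z_crit ** 2 - k)


def stouffer_z(z_scores: Sequence[float], extra_null_studies: float = 0.0) -> float:
    """Combined z score after adding studies that each contribute z = 0."""
    return sum(z_scores) / math.sqrt(len(z_scores) + extra_null_studies)


# Hypothetical z scores from 5 published studies.
z_scores = [2.1, 1.8, 2.5, 1.2, 2.9]
n_fs = rosenthal_fail_safe_n(z_scores)
print(f"fail-safe N = {n_fs:.1f}")
print(f"combined z with that many null studies = {stouffer_z(z_scores, n_fs):.3f}")
```

A small fail-safe N relative to the number of included studies suggests that a modest number of unpublished negative studies could change the conclusion.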

Other controversial issues include which study designs are acceptable for inclusion, whether and how studies should be rated for quality,65 and whether and how to combine results from studies with disparate study characteristics. When data from the same study are reported in several publications, the various publications should be assessed to determine how to include the information in the meta-analysis. Options include linking information from several reports or assessing the individual reports to determine which should be included or not included in the meta-analysis.66 Although few would disagree that meta-analysis of RCTs is most appropriate when possible, many topics include too few RCTs to permit meta-analysis or cannot be studied in a trial.

Gerbarg and Horwitz67 have suggested that criteria for combining studies should be similar to those for multicenter trials and should include similar prognostic factors, which would justify combining them. Whether studies can be appropriately combined can be determined statistically by analyzing the degree of heterogeneity (ie, the variability in outcomes across studies). Assessment of heterogeneity includes examining the effect size, the sample size in each group, and whether the effect sizes from different studies are homogeneous. A commonly used test for the degree of heterogeneity is the Cochran Q statistic.7 This is calculated by summing the squared differences between the effect size of each individual study and the mean effect size for all the studies. Each squared difference is multiplied by the inverse of the variance of the individual study to minimize the influence of small studies, which will have a large variance. Conceptually, it is easier to understand the amount of heterogeneity by calculating the I² statistic, and the I² statistic is preferred because it focuses on the magnitude of variability rather than the statistical significance of the variability.68 I² is calculated as follows: $I^2 = 100\% \times (Q - df)/Q$. The degrees of freedom, df (see 21.9.4, Editing, Proofreading, Tagging, and Display, Specific Uses of Fonts and Styles, Italics), equal the number of included studies minus 1. Negative values of I² are considered equal to 0. I² ranges from 0% (no heterogeneity) to 100% (complete heterogeneity).
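The Q and I² calculations described above can be sketched as follows. The effect sizes and variances are hypothetical, and the mean effect size is taken to be the inverse-variance weighted mean, which is the usual convention but is stated here as an assumption.

```python
from typing import Sequence, Tuple


def cochran_q(effects: Sequence[float],
              variances: Sequence[float]) -> Tuple[float, float]:
    """Return (Q, inverse-variance weighted mean effect size)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Weighted sum of squared deviations from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, pooled


def i_squared(q: float, n_studies: int) -> float:
    """I-squared as a percentage; negative values are set to 0."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


# Hypothetical effect sizes (eg, log odds ratios) and their variances.
effects = [0.10, 0.30, 0.50]
variances = [0.01, 0.01, 0.01]

q, pooled = cochran_q(effects, variances)
print(f"pooled = {pooled:.2f}, Q = {q:.2f}, I2 = {i_squared(q, len(effects)):.0f}%")
```

With these made-up inputs, Q is about 8 on 2 degrees of freedom, which gives an I² of about 75%.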

If statistically significant heterogeneity is found, then combining the studies into a single analysis may not be valid.69 Another concern is the influence a small number of large trials may have on the results; large trials in a small pool of studies can dominate the analysis, and the meta-analysis may reflect little more than the individual large trial. In such cases, it may be appropriate to perform sensitivity analyses comparing results with and without inclusion of the large trial(s).

Meta-analyses are often analyzed by means of both fixed-effects and random-effects models to determine how different assumptions affect the results. An example of how results of a meta-analysis may be depicted graphically is shown in Figure 4.2-17 (see Forest Plots). The more conservative random-effects model is generally preferred.
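To show why the random-effects model is the more conservative of the two, the sketch below pools the same hypothetical studies under both models. The random-effects version uses the DerSimonian-Laird estimate of between-study variance, which is one common choice and an assumption here, because the text does not specify an estimator.

```python
import math
from typing import Sequence, Tuple


def pool_fixed(effects: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance fixed-effects estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights))


def dersimonian_laird_tau2(effects: Sequence[float],
                           variances: Sequence[float]) -> float:
    """Between-study variance (tau squared), truncated at 0."""
    weights = [1.0 / v for v in variances]
    fixed, _ = pool_fixed(effects, variances)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    return max(0.0, (q - df) / c)


def pool_random(effects: Sequence[float],
                variances: Sequence[float]) -> Tuple[float, float]:
    """Random-effects estimate and standard error (DerSimonian-Laird)."""
    tau2 = dersimonian_laird_tau2(effects, variances)
    # Each study's variance is inflated by the between-study variance.
    weights = [1.0 / (v + tau2) for v in variances]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights))


# Hypothetical effect sizes and within-study variances.
effects = [0.10, 0.30, 0.50]
variances = [0.01, 0.01, 0.01]

for label, (est, se) in (("fixed", pool_fixed(effects, variances)),
                         ("random", pool_random(effects, variances))):
    print(f"{label}: {est:.2f} (95% CI, {est - 1.96 * se:.2f} to {est + 1.96 * se:.2f})")
```

Both models return the same point estimate for these symmetric inputs, but the random-effects standard error is larger, so its confidence interval is wider.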

A network meta-analysis provides a mechanism for assessment of the relative efficacy of an intervention compared with another, neither of which was directly tested against each other in clinical trials.70 For example, a study might have compared drug A with placebo and a different study compared drug B with placebo. A network meta-analysis facilitates comparison of the relative efficacy of drug A with drug B.
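The drug A vs drug B example can be sketched with the simplest form of indirect comparison, in which the two placebo-controlled estimates are subtracted on the log scale (often called the Bucher method). This is a simplification chosen for illustration; a full network meta-analysis typically fits a single model to the whole network of trials, and the odds ratios and standard errors below are hypothetical.

```python
import math
from typing import Tuple


def indirect_comparison(or_a_vs_placebo: float, se_log_a: float,
                        or_b_vs_placebo: float, se_log_b: float,
                        z: float = 1.96) -> Tuple[float, float, float]:
    """Indirect odds ratio for A vs B through the common placebo comparator.

    Returns (odds ratio, lower 95% limit, upper 95% limit).
    """
    log_or_ab = math.log(or_a_vs_placebo) - math.log(or_b_vs_placebo)
    # Variances add because the two estimates come from separate trials.
    se_ab = math.sqrt(se_log_a ** 2 + se_log_b ** 2)
    return (math.exp(log_or_ab),
            math.exp(log_or_ab - z * se_ab),
            math.exp(log_or_ab + z * se_ab))


# Hypothetical trial results: drug A vs placebo and drug B vs placebo.
or_ab, lower, upper = indirect_comparison(0.60, 0.10, 0.80, 0.12)
print(f"A vs B: OR = {or_ab:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

The indirect estimate is less precise than either direct comparison because it carries the uncertainty of both trials.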

A meta-analysis is useful only as long as it reflects current literature. Thus, a concern of meta-analysts and clinicians is that the meta-analysis should be updated as new studies are published. One international effort, the Cochrane Collaboration, publishes and frequently updates a large number of systematic reviews and meta-analyses on a variety of topics,71 as does the US Preventive Services Task Force.72

19.3.7 Economic Analyses.

Although a treatment or screening technique may be proven effective in an RCT, it still may not be clinically useful. Some interventions are prohibitively expensive, may benefit only a small fraction of a population, or may lead to significant downstream costs that preclude short-term savings or benefits.

Cost-effectiveness analyses and cost-benefit analyses comprise a set of mathematical techniques to model these complex consequences of medical interventions.73,74 A cost-effectiveness analysis “compares the net monetary costs of a health care intervention with some measure of clinical outcome or effectiveness such as mortality rates or life-years saved.”75 A cost-benefit analysis is similar but converts clinical measures of outcomes into monetary units, allowing both costs and benefits to be expressed on a single scale. This use of a common metric thus enables comparisons between different treatment or screening strategies.

The results of a cost-effectiveness analysis are usually expressed in terms of a cost-effectiveness ratio, for example, the cost per year of life gained. The use of quality-adjusted life-years (QALYs) or disability-adjusted life-years (DALYs) permits direct comparison of different types of interventions using the same measure for outcomes. The use of such composite measures allows researchers to weigh the relative benefits of length and quality of life.
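A cost-effectiveness ratio of this kind is commonly computed as an incremental ratio: the difference in cost between 2 strategies divided by the difference in their effectiveness. The sketch below assumes that formulation and uses hypothetical costs and QALYs; the function name is illustrative.

```python
def incremental_cost_effectiveness_ratio(cost_new: float, effect_new: float,
                                         cost_old: float, effect_old: float) -> float:
    """Extra cost per extra unit of effect (eg, per QALY gained)."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("The strategies have identical effectiveness; the ratio is undefined.")
    return (cost_new - cost_old) / delta_effect


# Hypothetical per-patient values: the new treatment costs $50 000 and
# yields 6.0 QALYs; the comparator costs $30 000 and yields 5.5 QALYs.
icer = incremental_cost_effectiveness_ratio(50_000, 6.0, 30_000, 5.5)
print(f"ICER = ${icer:,.0f} per QALY gained")
```

Whether a given ratio represents good value remains a judgment that depends on the threshold the decision maker is willing to accept.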

The complexity of these analyses and the many decisions required when selecting data and choosing assumptions may be of particular concern when the analysis is performed by an investigator or company with financial interest in the treatment being evaluated.76 Such analyses may have biases that are difficult to detect even with the most rigorous peer review process.77

One approach frequently used by cost-effectiveness analysts is to define a base case that represents the choices to be considered, perform an analysis for the base case, and then perform sensitivity analyses to determine how varying the data used and assumptions made for the base case affects the results. Sometimes authors test their conclusions by performing bootstrap method or jackknife test analyses. This involves taking a large number of repeated random samples from the data and then observing whether this procedure generally replicates the previous analytic conclusions. Several journals have published guidelines and approaches to cost-effectiveness analyses, but consensus has yet to emerge on their reporting78,79,80,81 or interpretation.74 Nonetheless, authors should clearly indicate all sources of data for both treatment effects and costs. Graphical approaches may help readers better understand the basic conclusions of the analysis.73 The JAMA Network journals require authors of cost-effectiveness analyses and decision analyses to submit a copy of the decision tree comprising their model. Although this need not be included in the body of the published article, such information is necessary for reviewers and editors to assess the details of the model and its analysis.
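The resampling idea behind the bootstrap can be sketched as a percentile confidence interval. The per-patient costs are hypothetical, and the number of resamples, the seed, and the percentile method are illustrative choices, not requirements stated in the text.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def bootstrap_ci(data: Sequence[float],
                 statistic: Callable[[Sequence[float]], float] = statistics.mean,
                 n_resamples: int = 2000,
                 alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    estimates = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_resamples))
    lower_idx = int(alpha / 2 * n_resamples)
    upper_idx = n_resamples - 1 - lower_idx
    return estimates[lower_idx], estimates[upper_idx]


# Hypothetical per-patient treatment costs.
costs = [1200, 950, 3100, 800, 1500, 2200, 700, 1800, 2600, 1100, 900, 4000]
lower, upper = bootstrap_ci(costs)
print(f"mean cost = {statistics.mean(costs):.1f}, 95% CI about {lower:.1f} to {upper:.1f}")
```

If the interval from resampling is consistent with the original conclusion, that conclusion is less likely to depend on a few influential observations.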

Standards for reporting economic evaluations are available from the EQUATOR Network and are known as Consolidated Health Economic Evaluation Reporting Standards (CHEERS).82

19.3.8 Studies of Diagnostic and Prognostic Tests.

Diagnostic and prognostic studies are designed to develop, validate, or update the diagnostic or prognostic accuracy of a test or model. Correct treatment depends on accurate diagnosis. Diagnoses may be made based on a patient’s history, physical signs or physical examination findings, or procedures such as blood tests and radiologic imaging. Few diagnostic tests, however, can be relied on to yield accurate diagnoses 100% of the time. Thus, it is important to study the performance of diagnostic tests.83

Studies to determine the diagnostic accuracy of a test are a vital part of this evaluation process. The EQUATOR Network recommends that authors use the Standards for Reporting of Diagnostic Accuracy (STARD) guideline or the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guideline for studies of diagnostic and prognostic tests.36,37 As an example, The Rational Clinical Examination series in JAMA provides detailed information about the usefulness of many clinical findings and diagnostic tests and has information about how to best assess the utility of diagnostic tests.84

Studies of diagnostic and prognostic tests generally yield estimates of likelihood ratios, sensitivity, specificity, positive predictive values, and negative predictive values. Authors should report confidence intervals associated with these statistics. It is also common for these studies to report receiver operating characteristic curves and summary measures derived from them, such as the area under the curve.
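The statistics named above can all be derived from a 2 × 2 table of test results against a reference standard. The sketch below is illustrative only: the counts are invented, the function names are hypothetical, and the interval methods (Wilson score intervals for proportions, a log-scale interval for likelihood ratios) are assumptions chosen for the example. The manual does not prescribe a particular interval method.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wilson_ci(successes, total, z=Z_95):
    """Wilson score interval for a proportion."""
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half, center + half


def log_ratio_ci(ratio, se_log, z=Z_95):
    """Interval for a ratio, computed on the log scale."""
    return ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)


def diagnostic_accuracy(tp, fp, fn, tn):
    """Return {statistic: (estimate, (lower, upper))} for a 2 x 2 table.

    Assumes every cell is nonzero; a zero cell would need a continuity
    correction, which this sketch does not attempt.
    """
    diseased, healthy = tp + fn, fp + tn
    sens, spec = tp / diseased, tn / healthy
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    se_pos = math.sqrt(1 / tp - 1 / diseased + 1 / fp - 1 / healthy)
    se_neg = math.sqrt(1 / fn - 1 / diseased + 1 / tn - 1 / healthy)
    return {
        "sensitivity": (sens, wilson_ci(tp, diseased)),
        "specificity": (spec, wilson_ci(tn, healthy)),
        "ppv": (tp / (tp + fp), wilson_ci(tp, tp + fp)),
        "npv": (tn / (tn + fn), wilson_ci(tn, tn + fn)),
        "lr_positive": (lr_pos, log_ratio_ci(lr_pos, se_pos)),
        "lr_negative": (lr_neg, log_ratio_ci(lr_neg, se_neg)),
    }


# Hypothetical counts: 100 patients with the disease, 200 without.
results = diagnostic_accuracy(tp=90, fp=20, fn=10, tn=180)
for name, (estimate, (low, high)) in results.items():
    print(f"{name}: {estimate:.2f} (95% CI, {low:.2f}-{high:.2f})")
```

Note that the predictive values computed this way depend on the disease prevalence in the study sample, whereas sensitivity, specificity, and the likelihood ratios do not.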

Table 19.3-2 shows how aggregated diagnostic studies can be displayed.85

Table 19.3-2. Accuracy of the White Blood Cell Count for the Diagnosis of Infectious Mononucleosis

| | No. of studies and reference No.: only sensitivitya | No. of studies and reference No.: sensitivity and specificity | Sensitivity (95% CI)b | Specificity (95% CI)b | Positive LR (95% CI)b | Negative LR (95% CI)b |
| --- | --- | --- | --- | --- | --- | --- |
| Atypical lymphocytosis | | | | | | |
| | | | 0.25 (0.19-0.32) | 1.0 (0.98-1.0) | 50 (38-64) | 0.75 (0.68-0.81) |
| | | | 0.56 (0.49-0.64) | 0.98 (0.94-0.99) | 26 (9.6-68) | 0.45 (0.38-0.53) |
| | | | 0.66 (0.52-0.78) | 0.92 (0.71-0.98) | 11 (2.7-35)c | 0.37 (0.26-0.51)d |
| ≥50% Lymphocytes and ≥10% atypical lymphocytes | | | 0.43 (0.23-0.65) | 0.99 (0.92-1.0) | 54 (8.4-189)e | 0.58 (0.39-0.77)f |
| Lymphocytosis (≥4 × 10⁹/L lymphocytes) by age group, y | | | | | | |
| | | | 0.84 (0.71-0.93) | 0.94 (0.92-0.96) | 15 (11-21) | 0.17 (0.09-0.32) |
| | | | 0.97 (0.82-0.99) | 0.96 (0.84-0.98) | 26 (17-42) | 0.04 (0.01-0.25) |
| | | | 0.65 (0.43-0.84) | 0.88 (0.83-0.93) | 5.6 (3.4-9.2) | 0.39 (0.22-0.69) |
| Ratio of lymphocytes to WBC count | | | | | | |
| | | | 0.55 (0.44-0.67) | 0.92 (0.81-0.97) | 8.5 (2.8-20)g | 0.49 (0.36-0.64)g |
| | | | 0.65 (0.61-0.69) | 0.93 (0.90-0.95) | 9.3 (6.7-13) | 0.38 (0.33-0.43) |
| | | | 0.74 (0.70-0.78) | 0.86 (0.83-0.89) | 5.3 (4.2-6.6) | 0.30 (0.26-0.35) |
| | | | 0.84 (0.80-0.87) | 0.72 (0.68-0.76) | 3.0 (2.6-3.5) | 0.22 (0.18-0.27) |
| Monocytosis (>1 × 10⁹/L monocytes) | | | | | | |
| Leukocytosis (>10 × 10⁹/L WBC count) | | | 0.40 (0.28-0.53) | 0.87 (0.62-0.96) | 2.7 (1.2-5.7)c | 0.79 (0.73-0.85)h |

Abbreviations: LR, likelihood ratio; WBC, white blood cell.

a Some of the studies were case series and were not studies of diagnostic accuracy; therefore, the data could only be used to calculate sensitivity. Heterogeneity (I² statistic) is only reported for LRs when there are at least 3 studies providing data.

b If there are only data from a single study, the point estimate and a 95% CI are presented. If there are data from 2 studies, ranges are presented. For 3 studies, data from a univariate meta-analysis (calculated using data from Comprehensive Meta-Analysis) are presented. For 4 or more studies, data from a bivariate meta-analysis (using the metandi procedure in Stata version 13.1) are presented.

c I² = 88%.

d I² = 80%.

e I² = 71%.

f I² = 76%.

g I² = 100%.

h I² = 0%.

Receiver operating characteristic curves show the association between a test’s cutoff point for being positive or negative and its sensitivity and specificity (Figure 19.3-1). The test’s sensitivity (reflecting the true positive rate) is plotted against 1 − specificity (the false-positive rate). When results are pooled, these curves are generated by regression methods.

Figure 19.3-1. Receiver Operating Characteristic Curves
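
The construction of a receiver operating characteristic curve can be sketched in a few lines. The example below is illustrative only: the test scores are invented, the function names are hypothetical, and a result is assumed to count as positive when the score is at or above the cutoff. Each candidate cutoff yields one point (1 − specificity, sensitivity), and the area under the curve is approximated with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, sensitivity) pairs, one per cutoff.

    labels: 1 = condition present, 0 = condition absent.
    A result is called positive when score >= cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # cutoff above every score: nothing is positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical test scores: 4 patients with the condition, 4 without.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

curve = roc_points(scores, labels)
auc = area_under_curve(curve)
print(curve)
print(f"AUC = {auc:.4f}")
```

Lowering the cutoff moves along the curve toward the upper right: sensitivity rises, but so does the false-positive rate. An area of 0.5 corresponds to a test that discriminates no better than chance, and an area of 1.0 to perfect discrimination.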


19.3.9 Survey Studies.

In a survey study, a representative sample of individuals is asked to describe their opinions, values, or behaviors.87 For surveys of behavior (eg, diet, exercise, smoking), authors should provide evidence that the survey correlates with the actual, observed behaviors of a similar sample of individuals. That is, the survey should have been shown to have validity. If the survey instrument is different in any way from that given to the previous validation sample (eg, wording, order, omission of questions), then it may no longer be a valid measure of those behaviors.

For surveys, as for other studies, it is critical to provide detailed inclusion and exclusion criteria and describe how and when individuals no longer participated in the survey once they were initially identified. Flow diagrams can be a useful way of presenting this information. There is currently no standard reporting format for survey studies, however, and authors have usually reported no more than a single response rate for their survey. To address this situation, the American Association for Public Opinion Research (AAPOR) has published a set of expanded definitions.44 The AAPOR document defines response rate as “the number of complete interviews with reporting units divided by the number of eligible reporting units in the sample.” The document points out that this general definition allows for at least 6 different ways of actually computing this statistic, depending on how the numbers of “complete interviews” and the “number of eligible reporting units” are defined. There are also several ways to define cooperation rates (the proportion of all cases interviewed of all eligible units ever contacted), refusal rates (the proportion of all cases in which a housing unit or respondent refuses to do an interview), and contact rates (the proportion of all cases in which some responsible member of the housing unit was reached by the survey). Thus, authors should be clear about how they assigned individuals to categories and which categories they used to compute these statistics.
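
Because each of these rates depends on how cases are categorized, it can help to make the computation explicit. The sketch below assumes the simplest variant of each rate and a set of final disposition categories (complete interviews, partial interviews, refusals, noncontacts, other eligible nonrespondents, and cases of unknown eligibility). The category names, the function name, and the counts are assumptions for illustration; the AAPOR document itself should be consulted for the exact definitions and for the alternative variants.

```python
def survey_outcome_rates(complete, partial, refusal, noncontact, other, unknown):
    """Simplest variants of the 4 outcome rates, as fractions.

    complete, partial: interviews obtained
    refusal, noncontact, other: eligible cases with no interview
    unknown: cases whose eligibility could not be determined
    """
    interviewed_or_partial = complete + partial
    # Most conservative denominator: every unknown case is treated as eligible.
    all_cases = interviewed_or_partial + refusal + noncontact + other + unknown
    contacted = interviewed_or_partial + refusal + other
    return {
        "response_rate": complete / all_cases,
        "cooperation_rate": complete / contacted,
        "refusal_rate": refusal / all_cases,
        "contact_rate": contacted / all_cases,
    }


# Hypothetical final dispositions for a sample of 1000 cases.
rates = survey_outcome_rates(
    complete=600, partial=50, refusal=150, noncontact=100, other=20, unknown=80
)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

Changing a single choice, such as counting partial interviews as respondents or excluding some unknown-eligibility cases from the denominator, changes the reported response rate, which is why the categories and formula used should be stated.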

The AAPOR document defines specific reporting procedures for the 3 most common survey designs: random-digit-dial telephone surveys, in-person surveys, and mail surveys. The AAPOR recently reviewed online surveys and highlighted challenges posed by low response rates, systematic bias in responses (certain types of individuals may not have internet access), and respondents not paying careful attention to the questions.

Survey studies may be longitudinal (the same respondents are surveyed at several time points) or cross-sectional. Causality may be cautiously inferred from longitudinal surveys but never from cross-sectional surveys. Case-control studies (see 19.3.2, Case-Control Studies) and cohort studies (see 19.3.1, Cohort Studies) may exclusively use survey methods to obtain their dependent variables, and thus in practice, the distinction between observational studies and survey studies may be nuanced.

As a general rule, survey response rates should exceed 60% to ensure an adequate sampling of the study population.88

19.3.10 Qualitative Research.

Qualitative studies are based on observations and interviews with individuals. Qualitative studies discover, interpret, and describe rather than test and evaluate. Mixed-methods studies, which combine quantitative and qualitative designs in a sequential or concurrent manner, are included in this category.

Qualitative studies should be reported using Standards for Reporting Qualitative Research (SRQR) reporting guidelines or Consolidated Criteria for Reporting Qualitative Research (COREQ) reporting guidelines.41,42

19.3.11 Quality Improvement Studies.

Quality improvement studies involve examination of health care systems and how specific changes in processes or procedures can improve the quality of health care delivery. These studies use methods that differ from other medically related studies, such as iterative changes using plan-do-study-act cycles in a single health care system, randomized trials (usually cluster randomized trials), and retrospective observational analyses of quality improvement interventions in various health care systems.

Standards for reporting quality improvement studies are available from the EQUATOR Network and are known as the Standards for Quality Improvement Reporting Excellence (SQUIRE).43

19.3.12 Ecologic Studies.

Ecologic studies examine groups or populations of patients but not the individuals themselves. They are useful for understanding disease prevalence or incidence and facilitating analysis of large numbers of people. These studies may uncover associations between exposure factors and diseases. Ecologic studies are limited by the ecologic fallacy: an association between exposure and outcome observed at the group level may not hold for individuals within the groups. These studies are highly susceptible to bias.
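
A small constructed example may make the ecologic fallacy concrete. The data below are invented so that the group averages of exposure and outcome rise together, whereas within every group individuals with higher exposure have lower outcome values. The function and variable names are hypothetical.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (exposure, outcome) pairs for individuals in 3 groups.
groups = {
    "A": [(1, 5), (2, 4), (3, 3)],
    "B": [(4, 8), (5, 7), (6, 6)],
    "C": [(7, 11), (8, 10), (9, 9)],
}

# Group-level analysis: only the group averages are used.
group_exposure = [mean(x for x, _ in members) for members in groups.values()]
group_outcome = [mean(y for _, y in members) for members in groups.values()]
group_r = pearson_r(group_exposure, group_outcome)

# Individual-level analysis within each group.
within_r = {
    name: pearson_r([x for x, _ in members], [y for _, y in members])
    for name, members in groups.items()
}

print(f"Correlation of group averages: {group_r:+.2f}")
for name, r in within_r.items():
    print(f"Correlation within group {name}: {r:+.2f}")
```

Here the group-level correlation is strongly positive while the correlation within each group is strongly negative, so a conclusion about individuals drawn from the group averages would have the wrong sign.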

19.3.13 Mendelian Randomization Studies.

Mendelian randomization uses genetic variants to determine whether an observational association between a risk factor and an outcome is consistent with a potential causal effect.90,91 Mendelian randomization relies on the natural, random assortment of genetic variants during meiosis, yielding a random distribution of genetic variants in a population. Individuals are naturally assigned at birth to inherit a genetic variant that affects a risk factor (eg, a gene variant that raises low-density lipoprotein cholesterol [LDL-C] levels) or not to inherit such a variant. Individuals who carry the variant and those who do not are then followed up for the development of an outcome of interest. Because these genetic variants may be unassociated with confounders, differences in the outcome between those who carry the variant and those who do not can be attributed to the difference in the risk factor. For example, a genetic variant associated with higher LDL-C levels that also is associated with a higher risk of coronary heart disease may provide supportive evidence to infer a potential causal effect of LDL-C on coronary heart disease.

Mendelian randomization rests on 3 assumptions: (1) the genetic variant is associated with the risk factor, (2) the genetic variant is not associated with confounders, and (3) the genetic variant influences the outcome only through the risk factor. The second and third assumptions are collectively known as independence from pleiotropy. Pleiotropy refers to a genetic variant that influences the outcome through pathways independent of the risk factor. The first assumption can be evaluated directly by examining the strength of association of the genetic variant with the risk factor. The second and third assumptions, however, cannot be empirically proven and require judgment by the investigators and the performance of various sensitivity analyses.
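
To show how the pieces fit together numerically, the sketch below uses the ratio (Wald) estimator for a single genetic variant, which is one common estimator and is an assumption of this example; the section above does not name a specific estimation method. The summary statistics are invented and the function name is hypothetical. The F statistic is a conventional check on the first assumption (strength of the variant-risk factor association); nothing in the code can verify the second or third assumption.

```python
def wald_ratio(beta_exposure, se_exposure, beta_outcome, se_outcome, z=1.959964):
    """Ratio estimate of the effect of a risk factor on an outcome.

    beta_exposure: association of the variant with the risk factor
    beta_outcome:  association of the same variant with the outcome
    The standard error is a first-order approximation that ignores
    uncertainty in beta_exposure.
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return {
        "estimate": estimate,
        "se": se,
        "ci": (estimate - z * se, estimate + z * se),
        # Approximate F statistic for the variant-risk factor association.
        "f_statistic": (beta_exposure / se_exposure) ** 2,
    }


# Hypothetical summary statistics for one variant.
result = wald_ratio(
    beta_exposure=0.5, se_exposure=0.05,  # variant vs risk factor (eg, LDL-C)
    beta_outcome=0.2, se_outcome=0.04,    # variant vs outcome (log odds scale)
)
print(result)
```

With these invented inputs, each unit increase in the risk factor attributable to the variant corresponds to a 0.4 increase in the log odds of the outcome. The estimate is interpretable as a potential causal effect only if the variant is unrelated to confounders and has no pleiotropic pathway to the outcome.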

19.3.14 Mediation Analysis.

In studies that use mediation analysis, the relationship between intervention and outcome is partitioned into indirect and direct effects or associations. These relationships are often shown in a diagram89 (see Figure 19.3-2). Mediation analysis can estimate indirect and direct relationships and the proportion mediated, a statistical measure estimating how much of the total intervention effect works through a particular mediator.

The explicit objective of mediation analyses is to demonstrate potential causal relationships; however, this may not be possible and requires that specific assumptions be met. In a mediation analysis, the intervention-outcome, intervention-mediator, and mediator-outcome relationships must be unconfounded to permit valid causal inferences. In a randomized trial, participants are randomly assigned to intervention groups, so the intervention-outcome and intervention-mediator effects can be assumed to be unconfounded. However, trial participants are not usually randomly assigned to receive or not receive the mediator, so the mediator-outcome relationship may be confounded, even in randomized trials. To overcome this potential source of bias, investigators can control for known confounders of the mediator-outcome effect by using techniques such as regression adjustment. However, unmeasured confounding may still introduce bias even if known confounders have been adjusted for. Sensitivity analyses should be used to assess the potential bias caused by unmeasured confounding in mediation analyses. The risk of confounding in mediation analyses is greater in observational studies than in randomized trials; in observational studies, caution is required when interpreting findings, which are best reported as estimates of indirect and direct associations.89

Figure 19.3-2. Pathways of Relationships in a Mediation Analysis
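
The partition into direct and indirect components can be illustrated with a small numerical sketch. The example assumes linear models with no intervention-mediator interaction and uses the product-of-coefficients approach, which is one of several possible estimation methods and is an assumption of this example, not something specified above. The data are invented and noise-free, and the function names are hypothetical. The arithmetic says nothing about whether the mediator-outcome relationship is unconfounded.

```python
from statistics import mean


def _cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def mediation_estimates(x, m, y):
    """Direct, indirect, and total estimates from two least-squares fits.

    x: intervention (0/1), m: mediator, y: outcome.
    Fit 1: m on x gives the intervention-mediator slope.
    Fit 2: y on x and m gives the direct slope (x) and the
           mediator-outcome slope (m).
    """
    sxx, smm, sxm = _cross(x, x), _cross(m, m), _cross(x, m)
    sxy, smy = _cross(x, y), _cross(m, y)
    x_to_m = sxm / sxx
    det = sxx * smm - sxm ** 2
    direct = (smm * sxy - sxm * smy) / det
    m_to_y = (sxx * smy - sxm * sxy) / det
    indirect = x_to_m * m_to_y
    total = direct + indirect
    return {
        "direct": direct,
        "indirect": indirect,
        "total": total,
        "proportion_mediated": indirect / total,
    }


# Hypothetical data constructed so that y = 1.0 * x + 2.0 * m exactly.
x = [0, 0, 0, 0, 1, 1, 1, 1]
m = [1.0, 2.0, 3.0, 2.0, 1.5, 2.5, 3.5, 2.5]
y = [1.0 * xi + 2.0 * mi for xi, mi in zip(x, m)]

estimates = mediation_estimates(x, m, y)
print(estimates)
```

In this constructed example the intervention raises the mediator by 0.5 on average and each unit of the mediator raises the outcome by 2.0, so the indirect component is 1.0; with a direct component of 1.0, the total is 2.0 and the proportion mediated is 0.5.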