Clinical Trials - Study Design and Statistics

AMA Manual of Style - Stacy L. Christiansen, Cheryl Iverson 2020


The International Committee of Medical Journal Editors (ICMJE) defines a clinical trial as “any research project that prospectively assigns human participants to intervention, with or without concurrent comparison or control groups, to study the relationship between a health-related intervention and a health outcome.”6 All clinical trials must be registered at an appropriate online public registry. Interventions include but are not limited to drugs, surgical procedures, devices, behavioral treatments, educational programs, dietary interventions, quality improvement interventions, process-of-care changes, and the like.

Randomized clinical trials (RCTs) generally yield the strongest inferences about the effects of medical treatments.7,8 RCTs assess the efficacy of the treatment intervention in controlled, standardized, and highly monitored settings and usually among highly selected samples of patients. Thus, their results might not reflect the effects of the treatment in real-world settings or in other groups of individuals who were not enrolled in the trial. Treatment decisions will by necessity be made from a combination of information from RCTs and observational studies (see 19.3, Observational Studies).

As with any research, it is important to provide a detailed summary of an RCT’s methods to facilitate a reader’s understanding of the study’s quality, replication of the study intervention, comparison of the study with other, similar studies, and identification of the population to which the study relates. At a minimum, reports of RCTs should follow the CONSORT reporting guideline. The EQUATOR Network’s CONSORT statement9,10 has a checklist (Table 19.2-1) to facilitate complete reporting of RCT methods and results. The ICMJE recommends that authors complete the CONSORT checklist. Although completing the checklist does not guarantee that a study is high quality, it ensures that information critical to interpretation of the study and its limitations is accessible to readers, editors, and reviewers. The registration number should be reported in the manuscript’s Abstract and/or Methods section. Journal editors may ask authors to provide a more detailed description of the study protocol. Many journals require that the original trial protocol and statistical analysis plan accompany the manuscript when it is submitted for publication. The ICMJE recommends that protocols be published with reports of clinical trials. For example, the JAMA Network journals and many other journals publish trial protocols and statistical analysis plans in an online supplement to a published article.11

Reporting in the manuscript should be consistent with a prespecified outcome and prespecified analytic plan in the protocol and statistical analysis plan.

Table 19.2-1. CONSORT Checklist of Items to Include When Reporting a Randomized Trial^a

Item No. / Checklist item / Reported on page No.

Title and abstract
1a. Identification as a randomized trial in the title
1b. Structured summary of trial design, methods, results, and conclusions (for specific guidance see CONSORT for abstracts)

Background and objectives
2a. Scientific background and explanation of rationale
2b. Specific objectives or hypotheses

Trial design
3a. Description of trial design (such as parallel, factorial) including allocation ratio
3b. Important changes to methods after trial commencement (such as eligibility criteria), with reasons

Participants
4a. Eligibility criteria for participants
4b. Settings and locations where the data were collected

Interventions
5. The interventions for each group with sufficient details to allow replication, including how and when they were actually administered

Outcomes
6a. Completely defined prespecified primary and secondary outcome measures, including how and when they were assessed
6b. Any changes to trial outcomes after the trial commenced, with reasons

Sample size
7a. How sample size was determined
7b. When applicable, explanation of any interim analyses and stopping guidelines

Sequence generation
8a. Method used to generate the random allocation sequence
8b. Type of randomization; details of any restriction (such as blocking and block size)

Allocation concealment mechanism
9. Mechanism used to implement the random allocation sequence (such as sequentially numbered containers), describing any steps taken to conceal the sequence until interventions were assigned

Implementation
10. Who generated the random allocation sequence, who enrolled participants, and who assigned participants to interventions

Blinding
11a. If done, who was blinded after assignment to interventions (for example, participants, care providers, those assessing outcomes) and how
11b. If relevant, description of the similarity of interventions

Statistical methods
12a. Statistical methods used to compare groups for primary and secondary outcomes
12b. Methods for additional analyses, such as subgroup analyses and adjusted analyses

Participant flow (a diagram is strongly recommended)
13a. For each group, the numbers of participants who were randomly assigned, received intended treatment, and were analyzed for the primary outcome
13b. For each group, losses and exclusions after randomization, together with reasons

Recruitment
14a. Dates defining the periods of recruitment and follow-up
14b. Why the trial ended or was stopped

Baseline data
15. A table showing baseline demographic and clinical characteristics for each group

Numbers analyzed
16. For each group, number of participants (denominator) included in each analysis and whether the analysis was by original assigned groups

Outcomes and estimation
17a. For each primary and secondary outcome, results for each group, and the estimated effect size and its precision (such as 95% confidence interval)
17b. For binary outcomes, presentation of both absolute and relative effect sizes is recommended

Ancillary analyses
18. Results of any other analyses performed, including subgroup analyses and adjusted analyses, distinguishing prespecified from exploratory

Harms
19. All important harms or unintended effects in each group (for specific guidance see CONSORT for harms)

Limitations
20. Trial limitations, addressing sources of potential bias, imprecision, and, if relevant, multiplicity of analyses

Generalizability
21. Generalizability (external validity, applicability) of the trial findings

Interpretation
22. Interpretation consistent with results, balancing benefits and harms, and considering other relevant evidence

Other information

Registration
23. Registration number and name of trial registry

Protocol
24. Where the full trial protocol can be accessed, if available

Funding
25. Sources of funding and other support (such as supply of drugs), role of funders

a From the JAMA Network Instructions for Authors.11 Check the EQUATOR website for updates.

Flow diagrams provide an easy way for readers to understand how study participants flowed through the study, including when and why they dropped out or were lost to or unavailable for follow-up and how many participants were evaluated for the study end points. These diagrams are also useful for summarizing the sample selection process for both clinical studies and systematic reviews. Authors should include a flow diagram (typically as the first figure) in their manuscript, and, if the manuscript is accepted for publication, the CONSORT flow diagram (Figure 19.2-1) should be included in the published article (see 4.2.2, Diagrams). CONSORT is frequently updated to account for changes in how RCTs are performed.10,12 Current information is available from the EQUATOR Network website.4

Figure 19.2-1. CONSORT Flow Diagram Showing the Progress of Patients Throughout the Trial


The report of an RCT should include a comparison of the participants’ characteristics in the trial’s different groups, usually as a table. Performing significance testing on the baseline differences between groups is controversial. Even with perfect random assignment, a mean of 1 in every 20 comparisons will appear to be “significant” at the .05 level by chance alone; such random findings illustrate the dangers of post hoc analyses (or ad hoc analyses). For this reason, reporting of statistical tests comparing the baseline characteristics of participants in RCTs is not recommended. Nevertheless, in randomized trials, baseline comparisons should be examined for statistical or clinical imbalances that may need to be addressed in the analysis.
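The "1 in every 20" figure can be checked with a short calculation. The following Python sketch is illustrative only and assumes that the comparisons are statistically independent, which baseline characteristics in a real trial often are not; the function names are hypothetical.

```python
def expected_false_positives(n_comparisons: int, alpha: float = 0.05) -> float:
    """Expected number of comparisons that reach P < alpha by chance alone."""
    return n_comparisons * alpha


def prob_at_least_one_false_positive(n_comparisons: int, alpha: float = 0.05) -> float:
    """Probability that at least 1 of n independent comparisons is 'significant'
    when no true difference exists (assumes independence)."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


if __name__ == "__main__":
    # With 20 baseline variables, 1 spurious 'significant' difference is expected on average.
    print(expected_false_positives(20))
    print(round(prob_at_least_one_false_positive(20), 3))
```

Under these assumptions, a baseline table with 20 variables is more likely than not to contain at least 1 chance finding at the .05 level.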

In small studies, large differences may not be statistically significant because of small sample sizes and limited statistical power. Nonetheless, it may be helpful for authors to report statistical comparisons between groups. Such information should not be interpreted as a null hypothesis test of baseline differences between groups but rather as an estimate of the magnitude of any baseline differences that may cause difficulty in interpreting the intervention’s true effect. These results should be reported in a table or in running text. Information about baseline differences can help readers decide if the authors should have accounted for baseline differences in their statistical analysis of the prespecified outcomes.
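One way to express the magnitude of a baseline difference without a hypothesis test is the standardized mean difference. This measure is not named in the text above and is offered here only as an assumed example of a magnitude estimate; the sketch below uses the pooled standard deviation of the 2 groups, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a: list[float], group_b: list[float]) -> float:
    """Difference in means divided by the pooled standard deviation.

    Assumption: a continuous baseline characteristic measured in both groups,
    with at least 2 participants per group.
    """
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    if pooled_sd == 0:
        raise ValueError("no variability in either group; the measure is undefined")
    return (mean(group_a) - mean(group_b)) / pooled_sd


if __name__ == "__main__":
    # Hypothetical baseline ages in 2 small trial groups.
    intervention_ages = [61, 58, 70, 66, 63]
    control_ages = [59, 64, 72, 68, 71]
    print(round(standardized_mean_difference(intervention_ages, control_ages), 2))
```

Because the measure does not depend on sample size in the way a P value does, it can flag a large imbalance in a small trial that a significance test would miss.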

Intention-to-treat (ITT) analysis is the preferred way to report randomized trial results.13 Final results are based on analysis of data from all the participants who were originally randomly assigned, whether or not they completed the trial. Participants may have varying degrees of missing data, requiring some method for imputation to estimate the effect of missing results. Because patients in the standard treatment group of an ITT analysis may not adhere to the treatment regimen and have worse than expected outcomes, ITT analyses may overstate the equivalence of experimental conditions for noninferiority and equivalence trials.13 Noninferiority and equivalence trial designs should report the outcomes for participants who completed the trial (known as per-protocol, as-treated analysis, or completers’ analysis) (see 19.2.3, Equivalence Trials and Noninferiority Trials). A per-protocol analysis reports a study’s results by the treatment received and not by the group into which the participants were randomly assigned. In general, per-protocol analyses are not advisable because the balance between groups achieved by random assignment may be lost (see 19.5, Glossary of Statistical Terms).
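The distinction between the 2 analysis populations can be sketched in Python. This is a simplified illustration with hypothetical data and names: it assumes complete outcome data (no imputation) and treats "per-protocol" as restricting the analysis to participants who received the treatment to which they were assigned.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    assigned: str   # group assigned at randomization
    received: str   # treatment actually received
    event: bool     # whether the outcome event occurred


def event_rates(participants: list[Participant], analysis: str = "itt") -> dict[str, float]:
    """Event rate per group under an ITT or a per-protocol analysis."""
    counts: dict[str, tuple[int, int]] = {}
    for p in participants:
        if analysis == "itt":
            group = p.assigned              # analyze as randomized
        elif analysis == "per_protocol":
            if p.received != p.assigned:    # exclude participants who deviated
                continue
            group = p.assigned
        else:
            raise ValueError("analysis must be 'itt' or 'per_protocol'")
        events, total = counts.get(group, (0, 0))
        counts[group] = (events + int(p.event), total + 1)
    return {group: events / total for group, (events, total) in counts.items()}


if __name__ == "__main__":
    trial = [
        Participant("new", "new", False),
        Participant("new", "new", False),
        Participant("new", "new", True),
        Participant("new", "standard", True),   # did not receive the assigned treatment
        Participant("standard", "standard", True),
        Participant("standard", "standard", False),
        Participant("standard", "standard", True),
        Participant("standard", "standard", False),
    ]
    print(event_rates(trial, "itt"))
    print(event_rates(trial, "per_protocol"))
```

In this toy data set the per-protocol analysis drops 1 participant from the new-treatment group, so the groups being compared are no longer exactly those created by randomization.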

There is ongoing debate about when performance of an RCT may be unethical.14,15 There is general agreement, however, that RCTs are unethical if the intervention is already known to be better than the control treatment received by the population under investigation, if there is an accepted standard of care that will not be provided to patients, or if participants could be unduly harmed by any condition in the experiment.

The decision to perform an interim analysis should be made before the study begins.16 (Data and safety monitoring boards, however, may monitor adverse events continually throughout the study.) Investigators also usually define prospective stopping rules for such analyses; if the stopping rule is met, collection of additional data is not likely to change the interpretation of the study. If the criteria for the stopping rules have not been met, the results of interim analyses should not be reported unless the treatment has important adverse effects and reporting is necessary for patient safety. When a manuscript provides the results of an interim analysis, it should be clearly stated why the interim results are being reported. The a priori plans for an interim analysis and process for doing the interim analysis as described in the original study protocol should be reported in the manuscript. If the interim analysis deviates from the study protocol, the reasons for the change should be justified. If a manuscript reports the final results of a study for which an interim analysis was previously published, the reason for publishing both reports should be stated and the interim analysis referenced.

The number needed to treat (NNT) and number needed to harm (NNH) should be provided to make study results more accessible to clinicians and patients7,8 (see 19.5, Glossary of Statistical Terms, for definitions of these terms). The NNT adds an easily understood perspective on the usefulness of a treatment by providing a number of patients who must receive treatment for 1 to benefit from it. Similarly, the NNH is the number of patients who will be exposed to a treatment or risk factor for 1 to be harmed by it.
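Both quantities are the reciprocal of an absolute risk difference. The Python sketch below is a minimal illustration with hypothetical function names; it works from event counts using exact fractions to avoid floating-point rounding and follows the common convention of rounding up to the next whole patient, which is an assumption rather than a rule stated above.

```python
from fractions import Fraction
from math import ceil


def number_needed_to_treat(events_control: int, n_control: int,
                           events_treated: int, n_treated: int) -> int:
    """NNT = 1 / absolute risk reduction, rounded up to a whole patient."""
    risk_reduction = Fraction(events_control, n_control) - Fraction(events_treated, n_treated)
    if risk_reduction <= 0:
        raise ValueError("no absolute risk reduction; NNT is undefined")
    return ceil(1 / risk_reduction)


def number_needed_to_harm(harms_treated: int, n_treated: int,
                          harms_control: int, n_control: int) -> int:
    """NNH = 1 / absolute risk increase, rounded up to a whole patient."""
    risk_increase = Fraction(harms_treated, n_treated) - Fraction(harms_control, n_control)
    if risk_increase <= 0:
        raise ValueError("no absolute risk increase; NNH is undefined")
    return ceil(1 / risk_increase)


if __name__ == "__main__":
    # Event rates of 42% (control) and 37% (treated): risk reduction of 5 percentage points.
    print(number_needed_to_treat(42, 100, 37, 100))  # 20
    # Hypothetical adverse event rates of 6% (treated) and 2% (control).
    print(number_needed_to_harm(6, 100, 2, 100))     # 25
```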

Publication bias is the tendency of authors to submit and journals to preferentially publish studies with statistically significant results (see 19.3.6, Meta-analyses). To address the problem of publication bias, the ICMJE has required since July 1, 2005, that a clinical trial be registered in a public trials registry as a condition of publication.6,17

19.2.1 Parallel-Design, Double-blind Trials.

In parallel-design, double-blind trials, participants are assigned to only 1 treatment group of the study. These trials are generally designed to assess whether 1 or more treatments are more effective than the others. Participants and those administering the intervention should all be unaware of which intervention individual participants are receiving (double-blinding). Ideally, those rating the outcomes should also be blinded to treatment assignment (triple-blinding). Blinded parallel-design trials are often the optimal design to compare 2 or more types of drugs or other therapies because known and unknown potentially confounding factors should be randomly distributed between intervention and control groups. Reports of these types of trials should follow the CONSORT reporting guideline. The CONSORT participant flow diagram should clearly indicate how many participants were assigned to each treatment group, how many were lost or unavailable at various stages of the trial, and the reasons that individuals did not complete the trial.10 Methods of random assignment, allocation concealment, and assessment of the success of blinding should be reported. If there is no significant difference between groups, authors cannot claim that the treatments are equivalent; such a conclusion would require an equivalence or noninferiority trial design (see 19.2.3, Equivalence Trials and Noninferiority Trials). If the trial was not specifically designed as an equivalence trial, the absence of a difference between groups should be viewed as an inability to detect a difference, not as an indication that a difference does not exist.

19.2.2 Crossover Trials.

In a crossover trial, participants receive more than 1 of the treatments under investigation, usually in a randomly determined sequence and with a prespecified amount of time (a washout period) between sequential treatments. The participants and the investigators are generally blinded to the treatment assignment (double-blinded). This experimental design is often used for evaluating drug treatments. Each participant serves as his or her own control, thereby eliminating between-participant variability when comparing treatment effects and reducing the sample size needed to detect a statistically significant effect. Most considerations of parallel-design randomized trials apply. Reports of these types of trials should follow the CONSORT reporting guideline. Rather than indicating which participants were assigned to which condition, the CONSORT participant flow diagram should indicate how many were assigned to each sequence of conditions (see Figure 4.2-22, Flowchart). Flow diagrams in the CONSORT recommendations are intended to be a flexible reporting device. The concept of flow diagrams (or visual summaries) is more important than using the exact diagram in the CONSORT statement.19 Other information important to this study design includes possible carryover effects (ie, effect of intervention persists after completion of the intervention) and length of washout period (intervention effects should have ended completely before crossover to the other treatment). If the actual period of crossover differs from the original study protocol, how and why decisions were made to cross over to the alternate treatment and when the crossover occurred should be stated. The treatment sequence should be randomized to ensure that investigators remain blinded and that no systematic differences arise because of treatment order. Otherwise, unblinding is likely, treatment order may confound the analysis, and carryover effects will be more difficult to assess. If carryover effects are significant, or if a washout period with no treatment is undesirable or unethical, a parallel-group design (possibly with a larger sample size) may be necessary.
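The gain in precision from having each participant serve as his or her own control can be illustrated with a short Python sketch. It uses hypothetical data and function names, and it is deliberately simplified: it assumes no period or carryover effects and compares only the standard error of the mean within-participant difference with the standard error obtained if the same measurements were treated as 2 independent groups.

```python
from math import sqrt
from statistics import mean, stdev


def paired_summary(on_a: list[float], on_b: list[float]) -> tuple[float, float]:
    """Mean within-participant difference (A minus B) and its standard error."""
    if len(on_a) != len(on_b):
        raise ValueError("each participant needs a measurement under both treatments")
    differences = [a - b for a, b in zip(on_a, on_b)]
    return mean(differences), stdev(differences) / sqrt(len(differences))


def unpaired_standard_error(on_a: list[float], on_b: list[float]) -> float:
    """Standard error of the difference if the groups were independent."""
    return sqrt(stdev(on_a) ** 2 / len(on_a) + stdev(on_b) ** 2 / len(on_b))


if __name__ == "__main__":
    # Hypothetical outcomes for 4 participants measured under treatment A and treatment B.
    outcome_on_a = [10, 20, 30, 40]
    outcome_on_b = [8, 19, 27, 38]
    mean_difference, paired_se = paired_summary(outcome_on_a, outcome_on_b)
    print(mean_difference, round(paired_se, 3))
    print(round(unpaired_standard_error(outcome_on_a, outcome_on_b), 3))
```

When participants differ widely from one another but respond similarly to treatment, as in this toy example, the paired standard error is much smaller, which is why a crossover design can require fewer participants.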

19.2.3 Equivalence Trials and Noninferiority Trials.

It is sometimes desirable to compare a treatment or intervention that is already known to be effective with a treatment or intervention that is less expensive or has other advantages (eg, easier administration such as oral dosing).20 In these cases, it would be unethical to expose participants to an inactive placebo. Thus, these trial designs assess whether the treatment or intervention under study (the “new intervention”) is the same as (for equivalence trials) or no worse than (for noninferiority trials) an existing alternative or “active control.”8,13,21,22,23 Reports of equivalence trials and noninferiority trials should follow the CONSORT reporting guideline.

In equivalence and noninferiority trials, authors must prespecify a margin of noninferiority (usually represented by the difference symbol Δ [delta]) within which the new intervention can be assumed to be no worse than the active control. There are a number of methods for arriving at the value Δ. Because different methods of estimating Δ may be more defensible in some situations than others, authors should provide clear explanations of their method and rationale for arriving at their value for Δ. Noninferiority trials test the 1-sided hypothesis that the effect of the new intervention is no more than Δ units less than the active control. Equivalence trials, which are less common than noninferiority trials, test the 2-sided hypothesis that the effect of the new treatment lies within the range of −Δ to Δ. For these trials, P values for noninferiority or P values for equivalence, respectively, should be calculated.
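The 2 hypotheses can be written formally. The notation below is an assumption introduced for illustration: θ denotes the effect of the new intervention minus the effect of the active control, on a scale where larger values are better, and Δ > 0 is the prespecified margin.

```latex
% Noninferiority (1-sided): reject H_0 to conclude the new intervention
% is no more than \Delta worse than the active control.
H_0\colon \theta \le -\Delta
\qquad \text{vs} \qquad
H_1\colon \theta > -\Delta

% Equivalence (2-sided): reject H_0 to conclude the difference
% lies within the margin in both directions.
H_0\colon |\theta| \ge \Delta
\qquad \text{vs} \qquad
H_1\colon -\Delta < \theta < \Delta
```

Note that, in contrast to a superiority trial, the null hypothesis here states that a meaningful difference exists, so failing to reject it does not demonstrate noninferiority or equivalence.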

Although use of ITT analysis is optimal in trials that test whether one treatment is superior to another, use of such analysis can bias the results of equivalence and noninferiority trials. Analyzing a noninferiority trial by ITT could make an inferior treatment appear to be noninferior if poor patient adherence resulted in both treatments being similarly ineffective. Thus, when analyzing a noninferiority trial, both ITT and per-protocol analyses should be conducted. The results are most meaningful when both approaches demonstrate noninferiority.21

Interpretation of the results depends on the confidence interval for the difference between the new intervention and the active control, and whether this CI crosses Δ, −Δ, and 0. See the following examples and Table 19.2-2.22,24

Example of how to report the results of an equivalence study:

The event rate in the new treatment group was 37%, and the event rate in the standard treatment group was 42%, constituting a difference of 5% (95% CI, 2%-9%), which was within the equivalence margin of ±10%, meeting criteria for equivalence.

Example of how to report the results of a noninferiority study:

The event rate in the new treatment group was 37%, and the event rate in the standard treatment group was 42%, constituting a difference of −5% (1-sided 97.5% CI, −∞ to 9%; P < .001 for noninferiority), which was within the noninferiority margin of 10%, meeting criteria for noninferiority.
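The decision rule behind both examples can be sketched in Python. The sketch assumes that the difference is expressed as the new treatment minus the standard treatment for an undesirable event, in percentage points, so that larger values are worse for the new treatment; the function names are hypothetical.

```python
def is_equivalent(ci_lower: float, ci_upper: float, margin: float) -> bool:
    """Equivalence: the whole 2-sided CI lies strictly inside (-margin, +margin)."""
    return -margin < ci_lower and ci_upper < margin


def is_noninferior(ci_upper: float, margin: float) -> bool:
    """Noninferiority: the upper bound of the CI lies below the margin.

    Assumes larger differences are worse for the new treatment.
    """
    return ci_upper < margin


if __name__ == "__main__":
    # Equivalence example above: 95% CI of 2% to 9%, margin of +/-10%.
    print(is_equivalent(2, 9, 10))
    # Noninferiority example above: 1-sided 97.5% CI upper bound of 9%, margin of 10%.
    print(is_noninferior(9, 10))
```

The point estimate alone does not settle either question; a confidence interval that crosses the margin leaves the result inconclusive even when the observed difference is small.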

Table 19.2-2. Checklist of Items for Reporting Noninferiority or Equivalence Trials (Additions or Modifications to the CONSORT Checklist Are Indicated in Footnotes)^a

Paper section and topic / Item No. / Noninferiority or equivalence trials

Title and abstract
1. How participants were allocated to interventions (eg, “random allocation,” “randomized,” or “randomly assigned”), specifying that the trial is a noninferiority or equivalence trial.

Introduction: Background
2. Scientific background and explanation of rationale, including the rationale for using a noninferiority or equivalence design.

Methods: Participants
3. Eligibility criteria for participants (detailing whether participants in the noninferiority or equivalence trial are similar to those in any trial[s] that established efficacy of the reference treatment) and the settings and locations where the data were collected.

Methods: Interventions
4. Precise details of the intervention intended for each group, detailing whether the reference treatment in the noninferiority or equivalence trial is identical (or very similar) to that in any trial(s) that established efficacy, and how and when they were actually administered.

Methods: Objectives
5. Specific objectives and hypotheses, including the hypothesis concerning noninferiority or equivalence.

Methods: Outcomes
6. Clearly defined primary and secondary outcome measures, detailing whether the outcomes in the noninferiority or equivalence trial are identical (or very similar) to those in any trial(s) that established efficacy of the reference treatment and, when applicable, any methods used to enhance the quality of measurements (eg, multiple observations, training of assessors).

Methods: Sample size
7. How sample size was determined, detailing whether it was calculated using a noninferiority or equivalence criterion and specifying the margin of equivalence with the rationale for its choice. When applicable, explanation of any interim analysis and stopping rules (and whether related to a noninferiority or equivalence hypothesis).

Methods: Randomization, sequence generation
8. Method used to generate the random allocation sequence, including details of any restriction (eg, blocking, stratification).

Methods: Randomization, allocation concealment
9. Method used to implement the random allocation sequence (eg, numbered containers or central telephone), clarifying whether the sequence was concealed until interventions were assigned.

Methods: Randomization, implementation
10. Who generated the allocation sequence, who enrolled participants, and who assigned participants to their groups.

Methods: Blinding (masking)
11. Whether participants, those administering the interventions, and those assessing the outcomes were blinded to group assignment. When relevant, how the success of blinding was evaluated.

Methods: Statistical methods
12. Statistical methods used to compare groups for primary outcome(s), specifying whether a 1- or 2-sided confidence interval approach was used. Methods for additional analyses, such as subgroup analyses and adjusted analyses.

Results: Participant flow
13. Flow of participants through each stage (a diagram is strongly recommended). Specifically, for each group report the numbers of participants randomly assigned, receiving intended treatment, completing the trial protocol, and analyzed for the primary outcome. Describe protocol deviations from trial as planned, together with reasons.

Results: Recruitment
14. Dates defining the periods of recruitment and follow-up.

Results: Baseline data
15. Baseline demographic and clinical characteristics of each group.

Results: Numbers analyzed
16. Number of participants (denominator) in each group included in each analysis and whether “intention-to-treat” and/or alternative analyses were conducted. State the results in absolute numbers when feasible (eg, 10 of 20, not 50%).

Results: Outcomes and estimation
17. For each primary and secondary outcome, a summary of results for each group and the estimated effect size and its precision (eg, 95% CI). For the outcome(s) for which noninferiority or equivalence is hypothesized, a figure showing confidence intervals and margins of equivalence may be useful.

Results: Ancillary analyses
18. Address multiplicity by reporting any other analyses performed, including subgroup analyses and adjusted analyses, indicating those prespecified and those exploratory.

Results: Adverse events
19. All important adverse events or side effects in each intervention group.

Comment: Interpretation
20. Interpretation of the results, taking into account the noninferiority or equivalence hypothesis and any other trial hypotheses, sources of potential bias or imprecision, and the dangers associated with multiplicity of analyses and outcomes.

Comment: Generalizability
21. Generalizability (external validity) of the trial findings.

Comment: Overall evidence
22. General interpretation of the results in the context of current evidence.

a From the EQUATOR website.22

b Expansion of corresponding item on CONSORT checklist.10,24 Authors should refer to specific CONSORT guidelines for reporting the design and results of equivalence and noninferiority trials.4

19.2.4 Cluster Trials.

Cluster randomization is undertaken when performance of the intervention risks contamination of the control group. Imagine a multifaceted intervention that involves sedation protocols and measures of arousal levels and readiness for weaning from a mechanical ventilator in a trial of extubation.25 In this scenario, intensive care unit (ICU) personnel performing these functions may be influenced by the effectiveness of the interventions and consciously or unconsciously use them on patients assigned to the control group. In cases such as this, it is best to perform the intervention in one ICU and apply the control intervention in a separate ICU. Instead of randomizing individual patients to intervention or control groups, ICUs are randomized. Each ICU is considered a cluster of patients.

Cluster randomized trials cannot be truly blinded, so their use risks introduction of bias into the study.26 They also violate one of the most important assumptions for most statistical tests: that the individuals in the study are independent of one another. For these reasons, studies using cluster techniques should specify why a cluster approach was used. They also should use analytic techniques that account for clustering, such as generalized estimating equations, mixed linear models, and hierarchical models. Studies that report the results of cluster randomized trials should explicitly state how clustering was accounted for in the statistical analysis.26,27 The EQUATOR Network has guidance for the reporting of cluster trials, and reports of these types of trials should follow the CONSORT reporting guideline and its extension for cluster trials.
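The cost of ignoring the lack of independence can be illustrated with the design effect, a standard quantity that is not named in the text above and is introduced here as an assumption. For clusters of equal size m and an intracluster correlation coefficient (ICC), the design effect is 1 + (m − 1) × ICC. The Python sketch below uses hypothetical function names and numbers.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation due to clustering: 1 + (m - 1) * ICC.

    Assumes equal cluster sizes and an ICC between 0 and 1.
    """
    if not 0 <= icc <= 1:
        raise ValueError("icc must be between 0 and 1")
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_total: int, cluster_size: float, icc: float) -> float:
    """Number of independent participants that would carry the same information."""
    return n_total / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical trial: 1000 patients in 20 ICUs of 50 patients each, ICC of 0.05.
    print(round(design_effect(50, 0.05), 2))
    print(round(effective_sample_size(1000, 50, 0.05)))
```

Even a small ICC can shrink the effective sample size substantially when clusters are large, which is why an analysis that treats clustered patients as independent tends to overstate precision.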

Stepped-wedge cluster trials are a special class of cluster trials.28 They are used when resources are limited and it is not feasible to apply an intervention at an individual patient level. Examples include a quality improvement intervention or implementation of a hospital-wide protocol, such as implementation of a new cleaning method. In a stepped-wedge design, all the clusters will eventually receive the intervention, and the randomization is based on the order in which clusters are entered into the study. Reports of these types of trials should follow the CONSORT reporting guideline and its extension for cluster trials and adopt specific reporting elements for stepped-wedge designs.27,29

19.2.5 Nonrandomized Trials.

A nonrandomized trial prospectively assigns groups or populations to study the efficacy or effectiveness of an intervention, but assignment to the intervention occurs through self-selection or administrator selection rather than through randomization. Control groups can be historical, concurrent, or both. This design is sometimes called a quasi-experimental design. Reports of these trials should follow the Transparent Reporting of Evaluations With Nonrandomized Designs (TREND) reporting guideline.