Analysis population and missing data

Item 20c: Definition of analysis population relating to protocol non-adherence (e.g., as randomised analysis), and any statistical methods to handle missing data (e.g., multiple imputation).


“Nevertheless, we propose to test non-inferiority using two analysis sets; the intention-to-treat set, considering all patients as randomized regardless of whether they received the randomized treatment, and the “per protocol” analysis set. Criteria for determining the “per protocol” group assignment would be established by the Steering Committee and approved by the PSMB [Performance and Safety Monitoring Board] before the trial begins. Given our expectation that very few patients will crossover or be lost to follow-up, these analyses should agree very closely. We propose declaring medical management non-inferior to interventional therapy, only if shown to be non-inferior using both the “intention to treat” and “per protocol” analysis sets.
. . .
10.4.7 Imputation Procedure for Missing Data

While the analysis of the primary endpoint (death or stroke) will be based on a log-rank test and, therefore, not affected by patient withdrawals (as they will be censored) provided that dropping out is unrelated to prognosis; other outcomes, such as the Rankin Score at five years post-randomization, could be missing for patients who withdraw from the trial. We will report reasons for withdrawal for each randomization group and compare the reasons qualitatively . . . The effect that any missing data might have on results will be assessed via sensitivity analysis of augmented data sets. Dropouts (essentially, participants who withdraw consent for continued follow-up) will be included in the analysis by modern imputation methods for missing data.

The main feature of the approach is the creation of a set of clinically reasonable imputations for the respective outcome for each dropout. This will be accomplished using a set of repeated imputations created by predictive models based on the majority of participants with complete data. The imputation models will reflect uncertainty in the modeling process and inherent variability in patient outcomes, as reflected in the complete data.

After the imputations are completed, all of the data (complete and imputed) will be combined and the analysis performed for each imputed-and-completed dataset. Rubin’s method of multiple (i.e., repeated) imputation will be used to estimate treatment effect. We propose to use 15 datasets (an odd number to allow use of one of the datasets to represent the median analytic result).

These methods are preferable to simple mean imputation, or simple “best-worst” or “worst-worst” imputation, because the categorization of patients into clinically meaningful subgroups, and the imputation of their missing data by appropriately different models, accords well with best clinical judgment concerning the likely outcomes of the dropouts, and therefore will enhance the trial’s results.”313


To preserve the unique benefit of randomisation as a mechanism to avoid selection bias, an “as randomised” analysis retains participants in the group to which they were originally allocated. To prevent attrition bias, outcome data obtained from all participants are included in the data analysis, regardless of protocol adherence (Items 11c and 18b).249;250 These two conditions (i.e., all participants, as randomised) define an “intention to treat” analysis, which is widely recommended as the preferred analysis strategy.17

Some trialists use other types of data analyses (commonly labelled as “modified intention to treat” or “per protocol”) that exclude data from certain participants – such as those who are found to be ineligible after randomisation or who deviate from the intervention or follow-up protocols. This exclusion of data from protocol non-adherers can introduce bias, particularly if the frequency of and the reasons for non-adherence vary between the study groups.314;315 In some trials, the participants to be included in the analysis will vary by outcome – for example, analysis of harms (adverse events) is sometimes restricted to participants who received the intervention, so that absence or occurrence of harm is not attributed to a treatment that was never received.

Protocols should explicitly describe which participants will be included in the main analyses (e.g., all randomised participants, regardless of protocol adherence) and define the study group in which they will be analysed (e.g., as-randomised). In one cohort of randomised trials approved in 1994-5, this information was missing in half of the protocols.6 The ambiguous use of labels such as “intention to treat” or “per protocol” should be avoided unless they are fully defined in the protocol.6;314 Most analyses labelled as “intention to treat” do not actually adhere to its definition because of missing data or exclusion of participants who do not meet certain post-randomisation criteria (e.g., specific level of adherence to intervention).6;316 Other ambiguous labels such as “modified intention to treat” are also variably defined from one trial to another.314
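
As a rough illustration of how explicit definitions translate into analysis sets, the Python sketch below derives an “as randomised” set and one possible “per protocol” set from participant records. The `Participant` fields, the `adherent` flag, the example records, and the per protocol rule used here (adherent and received the allocated arm) are hypothetical assumptions made for this example only; an actual protocol would need to state its own criteria.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Participant:
    pid: str
    allocated_arm: str           # arm assigned at randomisation
    received_arm: Optional[str]  # arm actually received; None if none received
    adherent: bool               # met the (hypothetical) adherence criteria


def as_randomised_set(participants: List[Participant]) -> Dict[str, List[str]]:
    """All randomised participants, grouped by the arm they were allocated to."""
    groups: Dict[str, List[str]] = {}
    for p in participants:
        groups.setdefault(p.allocated_arm, []).append(p.pid)
    return groups


def per_protocol_set(participants: List[Participant]) -> Dict[str, List[str]]:
    """Assumed rule: keep only adherent participants who received their allocated arm."""
    groups: Dict[str, List[str]] = {}
    for p in participants:
        if p.adherent and p.received_arm == p.allocated_arm:
            groups.setdefault(p.allocated_arm, []).append(p.pid)
    return groups


# Hypothetical records: P2 crossed over, P4 never received an intervention.
participants = [
    Participant("P1", "medical", "medical", True),
    Participant("P2", "medical", "interventional", False),
    Participant("P3", "interventional", "interventional", True),
    Participant("P4", "interventional", None, False),
]

print("as randomised:", as_randomised_set(participants))
print("per protocol: ", per_protocol_set(participants))
```

In this sketch the per protocol set drops P2 and P4, which shows how excluding non-adherers changes the groups being compared and why the rule for exclusion needs to be defined in advance.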

In addition to defining the analysis population, it is necessary to address the issue of missing data in the protocol. Most trials have some degree of missing data,317;318 which can introduce bias depending on the pattern of missingness (e.g., not missing at random). Strategies to maximise follow-up and prevent missing data, as well as the recording of reasons for missing data, are thus important to develop and document (Item 18b).152

The protocol should also state how missing data will be handled in the analysis and detail any planned methods to impute (estimate) missing outcome data, including which variables will be used in the imputation process (if applicable).152 Different statistical approaches can lead to different results and conclusions,317;319 but one study found that only 23% of trial protocols specified the planned statistical methods to account for missing data.6

Imputation of missing data allows the analysis to conform to an intention to treat analysis but requires strong assumptions that are untestable and may be hard to justify.152;318;320;321 Methods of multiple imputation are more complex but are widely preferred to single imputation methods (e.g., last observation carried forward; baseline observation carried forward), as the latter introduce greater bias and produce confidence intervals that are too narrow.152;320-322 Specific issues arise when outcome data are missing for crossover or cluster randomised trials.323 Finally, sensitivity analyses are highly recommended to assess the robustness of trial results under different methods of handling missing data.152;324
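
To make the contrast between single and multiple imputation more concrete, the following Python sketch pools treatment effect estimates from several imputed datasets using Rubin's rules: the pooled estimate is the mean of the per-dataset estimates, and the total variance is the average within-imputation variance plus the between-imputation variance inflated by (1 + 1/m). The function name `pool_rubin` and the example numbers are hypothetical, and the sketch assumes the imputation models and per-dataset analyses have already been run.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float, float]:
    """Pool per-imputation estimates and their variances with Rubin's rules.

    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations, one variance per estimate")
    pooled = mean(estimates)                   # average of the m estimates
    within = mean(variances)                   # average within-imputation variance
    between = variance(estimates)              # sample variance across imputations
    total = within + (1 + 1 / m) * between     # Rubin's total variance
    return pooled, total, math.sqrt(total)


# Hypothetical treatment effect estimates from three imputed datasets,
# each with a squared standard error of 0.0025.
estimates = [0.10, 0.14, 0.12]
variances = [0.0025, 0.0025, 0.0025]

pooled, total, se = pool_rubin(estimates, variances)
print("pooled estimate:", pooled)
print("pooled standard error:", se)
print("standard error from one imputed dataset:", math.sqrt(variances[0]))
```

The pooled standard error exceeds that of any single imputed dataset because the between-imputation term carries the uncertainty about the missing values; single imputation omits this term, which is one reason its confidence intervals tend to be too narrow.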
