Item 14: Estimated number of participants needed to achieve study objectives and how it was determined, including clinical and statistical assumptions supporting any sample size calculations.
Example 1
“The sample size was calculated on the basis of the primary hypothesis. In the exploratory study [Reference X], those referred to PEPS [Psychoeducation with problem solving] had a greater improvement in social functioning at 6-month follow-up, equivalent to 1.05 points on the SFQ [Social Functioning Questionnaire]. However, a number of people received PEPS who were not included in the trial (e.g., the waitlist control) and, for this larger sample (N=93), the mean pre–post treatment difference was 1.79 (pre-treatment mean=13.85, SD=4.21; post-treatment mean=12.06, SD=4.21). (Note: a lower SFQ score is more desirable). This difference of almost 2 points accords with other evidence that this is a clinically significant and important difference [Reference X]. A reduction of 2 points or more on the SFQ at 1-year follow-up in an RCT of cognitive behaviour therapy in health anxiety was associated with a halving of secondary care appointments (1.24 vs 0.65), a clinically significant reduction in the Hospital Anxiety and Depression Scale (HADS [Reference X]) Anxiety score of 2.5 (9.9 vs 7.45), and a reduction in health anxiety (the main outcome) of 5.6 points (17.8 vs 12.2) (11 is a normal population score and 18 is pathological) [Reference X]. These findings suggest that improvements in social functioning may accrue over 1 year, [sic] hence we expect to find a greater magnitude of response at the 72-week follow-up than we did in the exploratory trial. Therefore, we have powered this trial to be able to detect a difference in SFQ score of 2 points. SFQ standard deviations vary between treatment, control, and the waitlist samples, ranging from 3.78 to 4.53. We have based our sample size estimate on the most conservative (i.e., largest) SD [Standard deviation]. To detect a mean difference in SFQ score of 2 points (SD = 4.53) at 72 weeks with a two-sided significance level of 1% and power of 80% with equal allocation to two arms would require 120 patients in each arm of the trial.
To allow for 30% drop out, 170 will be recruited per arm, i.e., 340 in total.” ^{183}
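The per-arm figure in Example 1 can be reproduced with the standard normal-approximation formula for comparing two means. A minimal sketch in Python (the function name is ours; the protocol does not specify its software):

```python
import math
from statistics import NormalDist

def n_per_arm_means(delta, sd, alpha_two_sided, power):
    """Per-arm sample size for a two-sample comparison of means
    (normal approximation, equal allocation, equal SDs)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha_two_sided / 2)  # critical value for two-sided alpha
    z_beta = z(power)                     # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# Example 1: detect a 2-point SFQ difference (SD = 4.53),
# two-sided alpha = 1%, power = 80%
n = n_per_arm_means(delta=2, sd=4.53, alpha_two_sided=0.01, power=0.80)
print(n)  # 120 per arm, matching the protocol
```

Note that the protocol's dropout inflation (170 per arm) involves rounding beyond the raw arithmetic, so only the unadjusted per-arm figure is reproduced here.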
Example 2
“Superficial and deep incisional surgical site infection rates for patients in the PDS II® [polydioxanone suture] group are estimated to occur at a rate of 0.12 [Reference X]. The trials by [Reference X] have shown a reduction of SSI [surgical site infections] of more than 50% (from 10.8% to 4.9% and from 9.2% to 4.3% respectively). Therefore, we estimate a rate of 0.06 for PDS Plus® [triclosan-coated continuous polydioxanone suture].
For a fixed sample size design, the sample size required to achieve a power of 1 − β = 0.80 for the one-sided chi-square test at level α = 0.025 under these assumptions amounts to 2 × 356 = 712 (nQuery Advisor®, version 7.0). It can be expected that including covariates of prognostic importance in the logistic regression model as defined for the confirmatory analysis will increase the power as compared to the chi-square test. As the individual results for the primary endpoint are available within 30 days after surgery, the dropout rate is expected to be small. Nevertheless, a potential dilution of the treatment effect due to dropouts is taken into account (e.g. no photographs available, loss to follow-up); it is assumed that this can be compensated by additional 5% of patients to be randomized, and therefore the total sample size required for a fixed sample size design amounts to n = 712 + 38 = 750 patients.
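The figures quoted for Example 2 can be reproduced with the pooled-variance normal approximation for the one-sided test of two proportions (nQuery may use a slightly different internal formula, but this form recovers the same numbers; the function name is ours):

```python
import math
from statistics import NormalDist

def n_per_arm_props(p1, p2, alpha_one_sided, power):
    """Per-arm sample size for the one-sided chi-square/z test comparing
    two proportions, using the pooled (null-hypothesis) variance form."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha_one_sided)
    z_beta = z(power)
    p_bar = (p1 + p2) / 2  # pooled proportion under H0
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

n = n_per_arm_props(0.12, 0.06, alpha_one_sided=0.025, power=0.80)
print(n)                               # 356 per arm -> 2 x 356 = 712
print(math.ceil(2 * n / (1 - 0.05)))   # 750 after the 5% dropout allowance
```

Dividing by (1 − 0.05), rather than multiplying by 1.05, is what yields the protocol's 38 additional patients.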
. . .
An adaptive interim analysis [Reference X] will be performed after availability of the results for the primary endpoint for a total of 375 randomized patients (i.e., 50% of the number of patients required in a fixed sample size design). The following type I error rates and decision boundaries for the interim and the final analysis are specified:
• overall one-sided type I error rate: 0.025
• boundary for the one-sided p-value of the first stage for accepting the null hypothesis within the interim analysis: α_{0} = 0.5
• one-sided local type I error rate for testing the null hypothesis within the interim analysis: α_{1} = 0.0102
• boundary for the product of the one-sided p-values of both stages for the rejection of the null hypothesis in the final analysis: c_{α} = 0.0038
If the trial is continued with a second stage after the interim analysis (this is possible if the one-sided p-value p1 of the interim analysis satisfies p1 ∈ ]0.0102, 0.5[, i.e., 0.0102 < p1 < 0.5), the results of the interim analysis can be taken into account for a recalculation of the required sample size. If the sample size recalculation leads to the conclusion that more than 1200 patients are required, the study is stopped, because the related treatment group difference is judged to be of minor clinical importance.
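The stopping rules in Example 2 follow a two-stage combination test (Bauer–Köhne type), and can be written out as a small decision sketch using the protocol's boundary values. The edge-case handling at exactly p1 = 0.5 is our reading of the open interval, not something the protocol spells out:

```python
import math

ALPHA_0 = 0.5      # futility boundary for the stage-1 p-value
ALPHA_1 = 0.0102   # local level for early rejection at the interim
C_ALPHA = 0.0038   # boundary for the product p1 * p2 at the final analysis

def interim_decision(p1):
    """Decision after the interim analysis, per the stated boundaries."""
    if p1 >= ALPHA_0:
        return "stop: accept H0"
    if p1 <= ALPHA_1:
        return "stop: reject H0"
    return "continue to second stage"   # 0.0102 < p1 < 0.5

def final_decision(p1, p2):
    """Final analysis: reject H0 if the product of stage p-values
    falls at or below c_alpha."""
    return "reject H0" if p1 * p2 <= C_ALPHA else "accept H0"

# Consistency check: for this design the boundaries jointly preserve the
# overall one-sided level alpha = alpha_1 + c_alpha * ln(alpha_0 / alpha_1)
overall = ALPHA_1 + C_ALPHA * math.log(ALPHA_0 / ALPHA_1)
print(round(overall, 4))  # ~0.025
```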
. . .
The actually achieved sample size is then not fixed but random, and a variety of scenarios can be considered. If the sample size is calculated under the same assumptions with respect to the SSI rates for the two groups, applying the same overall significance level of α = 0.025 (one-sided) but additionally employing the defined stopping boundaries and recalculating the sample size for the second stage at a conditional power of 80% on the basis of the SSI rates observed in the interim analysis, this results in an average total sample size of n = 766 patients; the overall power of the study is then 90% (ADDPLAN®, version 5.0).” ^{100}
Explanation
The planned number of trial participants is a key aspect of study design, budgeting, and feasibility that is usually determined using a formal sample size calculation. If the planned sample size is not derived statistically, then this should be explicitly stated along with a rationale for the intended sample size (e.g., exploratory nature of pilot studies; pragmatic considerations for trials in rare diseases).^{17;184}
For trials that involve a formal sample size calculation, the guiding principle is that the planned sample size should be large enough to have a high probability (power) of detecting a true effect of a given magnitude, should it exist. Sample size calculations are generally based on one primary outcome; however, it may also be worthwhile to plan for adequate study power or report the power that will be available (given the proposed sample size) for other important outcomes or analyses because trials are often underpowered to detect harms or subgroup effects.^{185;186}
Among randomised trial protocols that describe a sample size calculation, 4–40% do not state all components of the calculation.^{6;11} The protocol should generally include the following: the outcome (Item 12); the values assumed for the outcome in each study group (e.g., proportion with event, or mean and standard deviation) (Table 2); the statistical test (Item 20a); alpha (type 1 error) level; power; and the calculated sample size per group – both assuming no loss of data and, if relevant, after any inflation for anticipated missing data (Item 20c). Trial investigators are also encouraged to provide a rationale or reference for the outcome values assumed for each study group.^{187} The values of certain prespecified variables tend to be inappropriately inflated (e.g., clinically important treatment effect size)^{188;189} or underestimated (e.g., standard deviation for continuous outcomes),^{190} leading to trials having less power in the end than what was originally calculated. Finally, when uncertainty of a sample size estimate is acknowledged, methods exist for sample size re-estimation.^{191} The intended use of such an adaptive design approach should be stated in the protocol.
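The cost of an underestimated standard deviation can be made concrete. Under the normal approximation, the power actually attained at the planned sample size falls quickly as the true SD grows; the sketch below reuses the design values from Example 1 and an assumed (hypothetical) true SD of 5.5:

```python
import math
from statistics import NormalDist

def attained_power(delta, true_sd, n_per_arm, alpha_two_sided):
    """Power of the two-sample z test at the planned per-arm n
    when the true SD differs from the design-stage assumption."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha_two_sided / 2)
    # standardised effect at the planned sample size
    ncp = delta / (true_sd * math.sqrt(2 / n_per_arm))
    return nd.cdf(ncp - z_alpha)

# Planned as in Example 1: delta = 2, SD = 4.53 -> 120/arm at 80% power.
print(round(attained_power(2, 4.53, 120, 0.01), 2))  # ~0.80 as designed
print(round(attained_power(2, 5.50, 120, 0.01), 2))  # ~0.60: power erodes
```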
Table 2. Outcome values to report in sample size calculation.

| Element | Binary | Continuous | Time-to-event |
| --- | --- | --- | --- |
| Assumed result for each study group | Proportion (%) with event | Mean and standard deviation | Proportion (%) with event at a given time point |
| Effect measure | Relative risk, odds ratio | Difference in means | Hazard ratio |

Note: Although the sample size calculation uses the anticipated outcome value for each group, it is also recommended to report the corresponding contrast between groups (estimated effect).
For designs and frameworks other than parallel group superiority trials, additional elements are required in the sample size calculation. For example, an estimate of the standard deviation of within-person changes from baseline should be included for crossover trials^{192}; the intracluster correlation coefficient for cluster randomised trials^{193}; and the equivalence or non-inferiority margin for equivalence or non-inferiority trials respectively.^{108;194} Such elements are often not described in final trial reports,^{110;195–198} and it is unclear how often they are specified in the protocol.
Complete description of sample size calculations in the protocol enables an assessment of whether the trial will be adequately powered to detect a clinically important difference.^{189;199206} It also promotes transparency and discourages inappropriate post hoc revision that is intended to support a favourable interpretation of results or portray consistency between planned and achieved sample sizes.^{6}^{;207}