Item 14: Estimated number of participants needed to achieve study objectives and how it was determined, including clinical and statistical assumptions supporting any sample size calculations.
Example 1
“The sample size was calculated on the basis of the primary hypothesis. In the exploratory study [Reference X], those referred to PEPS [Psychoeducation with problem solving] had a greater improvement in social functioning at 6-month follow-up, equivalent to 1.05 points on the SFQ [Social Functioning Questionnaire]. However, a number of people received PEPS who were not included in the trial (e.g., the waitlist control) and, for this larger sample (N=93), the mean pre–post treatment difference was 1.79 (pre-treatment mean=13.85, SD=4.21; post-treatment mean=12.06, SD=4.21). (Note: a lower SFQ score is more desirable). This difference of almost 2 points accords with other evidence that this is a clinically significant and important difference [Reference X]. A reduction of 2 points or more on the SFQ at 1-year follow-up in an RCT of cognitive behaviour therapy in health anxiety was associated with a halving of secondary care appointments (1.24 vs 0.65), a clinically significant reduction in the Hospital Anxiety and Depression Scale (HADS [Reference X]) Anxiety score of 2.5 (9.9 vs 7.45), and a reduction in health anxiety (the main outcome) of 5.6 points (17.8 vs 12.2) (11 is a normal population score and 18 is pathological) [Reference X]. These findings suggest that improvements in social functioning may accrue over 1 year, [sic] hence we expect to find a greater magnitude of response at the 72-week follow-up than we did in the exploratory trial. Therefore, we have powered this trial to be able to detect a difference in SFQ score of 2 points. SFQ standard deviations vary between treatment, control, and the waitlist samples, ranging from 3.78 to 4.53. We have based our sample size estimate on the most conservative (i.e., largest) SD [Standard deviation]. To detect a mean difference in SFQ score of 2 points (SD = 4.53) at 72 weeks with a two-sided significance level of 1% and power of 80% with equal allocation to two arms would require 120 patients in each arm of the trial.
To allow for 30% drop out, 170 will be recruited per arm, i.e., 340 in total.” ^{183}
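The per-arm figure in Example 1 can be reproduced with the standard normal-approximation formula for comparing two means. A minimal sketch in Python (the function name is ours; the protocol does not specify its software):

```python
import math
from statistics import NormalDist

def n_per_arm_means(delta, sd, alpha_two_sided, power):
    """Per-arm sample size for a two-sample comparison of means
    (normal approximation, equal allocation, equal SDs)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha_two_sided / 2)  # critical value for two-sided alpha
    z_beta = z(power)                     # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# Example 1: detect a 2-point SFQ difference (SD = 4.53),
# two-sided alpha = 1%, power = 80%
n = n_per_arm_means(delta=2, sd=4.53, alpha_two_sided=0.01, power=0.80)
print(n)  # 120 per arm, matching the protocol
```

Note that the protocol's dropout inflation (170 per arm) involves rounding beyond the raw arithmetic, so only the unadjusted per-arm figure is reproduced here.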
Example 2
“Superficial and deep incisional surgical site infection rates for patients in the PDS II® [polydioxanone suture] group are estimated to occur at a rate of 0.12 [Reference X]. The trials by [Reference X] have shown a reduction of SSI [surgical site infections] of more than 50% (from 10.8% to 4.9% and from 9.2% to 4.3% respectively). Therefore, we estimate a rate of 0.06 for PDS Plus® [triclosan-coated continuous polydioxanone suture].
For a fixed sample size design, the sample size required to achieve a power of 1 − β = 0.80 for the one-sided chi-square test at level α = 0.025 under these assumptions amounts to 2 × 356 = 712 (nQuery Advisor®, version 7.0). It can be expected that including covariates of prognostic importance in the logistic regression model as defined for the confirmatory analysis will increase the power as compared to the chi-square test. As the individual results for the primary endpoint are available within 30 days after surgery, the dropout rate is expected to be small. Nevertheless, a potential dilution of the treatment effect due to dropouts is taken into account (e.g. no photographs available, loss to follow-up); it is assumed that this can be compensated by additional 5% of patients to be randomized, and therefore the total sample size required for a fixed sample size design amounts to n = 712 + 38 = 750 patients.
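The figures quoted for Example 2 can be reproduced with the pooled-variance normal approximation for the one-sided test of two proportions (nQuery may use a slightly different internal formula, but this form recovers the same numbers; the function name is ours):

```python
import math
from statistics import NormalDist

def n_per_arm_props(p1, p2, alpha_one_sided, power):
    """Per-arm sample size for the one-sided chi-square/z test comparing
    two proportions, using the pooled (null-hypothesis) variance form."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha_one_sided)
    z_beta = z(power)
    p_bar = (p1 + p2) / 2  # pooled proportion under H0
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

n = n_per_arm_props(0.12, 0.06, alpha_one_sided=0.025, power=0.80)
print(n)                               # 356 per arm -> 2 x 356 = 712
print(math.ceil(2 * n / (1 - 0.05)))   # 750 after the 5% dropout allowance
```

Dividing by (1 − 0.05), rather than multiplying by 1.05, is what yields the protocol's 38 additional patients.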
. . .
An adaptive interim analysis [Reference X] will be performed after availability of the results for the primary endpoint for a total of 375 randomized patients (i.e., 50% of the number of patients required in a fixed sample size design). The following type I error rates and decision boundaries for the interim and the final analysis are specified:
• overall one-sided type I error rate: 0.025
• boundary for the one-sided p-value of the first stage for accepting the null hypothesis within the interim analysis: α_{0} = 0.5
• one-sided local type I error rate for testing the null hypothesis within the interim analysis: α_{1} = 0.0102
• boundary for the product of the one-sided p-values of both stages for the rejection of the null hypothesis in the final analysis: c_{α} = 0.0038
If the trial is continued with a second stage after the interim analysis (this is possible if the one-sided p-value p1 of the interim analysis satisfies p1 ∈ ]0.0102, 0.5[, i.e., 0.0102 < p1 < 0.5), the results of the interim analysis can be taken into account for a recalculation of the required sample size. If the sample size recalculation leads to the conclusion that more than 1200 patients are required, the study is stopped, because the related treatment group difference is judged to be of minor clinical importance.
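The stopping rules in Example 2 follow a two-stage combination test (Bauer–Köhne type), and can be written out as a small decision sketch using the protocol's boundary values. The edge-case handling at exactly p1 = 0.5 is our reading of the open interval, not something the protocol spells out:

```python
import math

ALPHA_0 = 0.5      # futility boundary for the stage-1 p-value
ALPHA_1 = 0.0102   # local level for early rejection at the interim
C_ALPHA = 0.0038   # boundary for the product p1 * p2 at the final analysis

def interim_decision(p1):
    """Decision after the interim analysis, per the stated boundaries."""
    if p1 >= ALPHA_0:
        return "stop: accept H0"
    if p1 <= ALPHA_1:
        return "stop: reject H0"
    return "continue to second stage"   # 0.0102 < p1 < 0.5

def final_decision(p1, p2):
    """Final analysis: reject H0 if the product of stage p-values
    falls at or below c_alpha."""
    return "reject H0" if p1 * p2 <= C_ALPHA else "accept H0"

# Consistency check: for this design the boundaries jointly preserve the
# overall one-sided level alpha = alpha_1 + c_alpha * ln(alpha_0 / alpha_1)
overall = ALPHA_1 + C_ALPHA * math.log(ALPHA_0 / ALPHA_1)
print(round(overall, 4))  # ~0.025
```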
. . .
The actually achieved sample size is then not fixed but random, and a variety of scenarios can be considered. If the sample size is calculated under the same assumptions with respect to the SSI rates for the two groups, applying the same overall significance level of α = 0.025 (one-sided) but additionally employing the defined stopping boundaries and recalculating the sample size for the second stage at a conditional power of 80% on the basis of the SSI rates observed in the interim analysis, this results in an average total sample size of n = 766 patients; the overall power of the study is then 90% (ADDPLAN®, version 5.0).” ^{100}
Explanation
The planned number of trial participants is a key aspect of study design, budgeting, and feasibility that is usually determined using a formal sample size calculation. If the planned sample size is not derived statistically, then this should be explicitly stated along with a rationale for the intended sample size (e.g., exploratory nature of pilot studies; pragmatic considerations for trials in rare diseases).^{17;184}
For trials that involve a formal sample size calculation, the guiding principle is that the planned sample size should be large enough to have a high probability (power) of detecting a true effect of a given magnitude, should it exist. Sample size calculations are generally based on one primary outcome; however, it may also be worthwhile to plan for adequate study power or report the power that will be available (given the proposed sample size) for other important outcomes or analyses because trials are often underpowered to detect harms or subgroup effects.^{185;186}
Among randomised trial protocols that describe a sample size calculation, 4–40% do not state all components of the calculation.^{6;11} The protocol should generally include the following: the outcome (Item 12); the values assumed for the outcome in each study group (e.g., proportion with event, or mean and standard deviation) (Table 2); the statistical test (Item 20a); alpha (type 1 error) level; power; and the calculated sample size per group – both assuming no loss of data and, if relevant, after any inflation for anticipated missing data (Item 20c). Trial investigators are also encouraged to provide a rationale or reference for the outcome values assumed for each study group.^{187} The values of certain prespecified variables tend to be inappropriately inflated (e.g., clinically important treatment effect size)^{188;189} or underestimated (e.g., standard deviation for continuous outcomes),^{190} leading to trials having less power in the end than what was originally calculated. Finally, when uncertainty of a sample size estimate is acknowledged, methods exist for sample size re-estimation.^{191} The intended use of such an adaptive design approach should be stated in the protocol.
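The cost of an underestimated standard deviation can be made concrete. Under the normal approximation, the power actually attained at the planned sample size falls quickly as the true SD grows; the sketch below reuses the design values from Example 1 and an assumed (hypothetical) true SD of 5.5:

```python
import math
from statistics import NormalDist

def attained_power(delta, true_sd, n_per_arm, alpha_two_sided):
    """Power of the two-sample z test at the planned per-arm n
    when the true SD differs from the design-stage assumption."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha_two_sided / 2)
    # standardised effect at the planned sample size
    ncp = delta / (true_sd * math.sqrt(2 / n_per_arm))
    return nd.cdf(ncp - z_alpha)

# Planned as in Example 1: delta = 2, SD = 4.53 -> 120/arm at 80% power.
print(round(attained_power(2, 4.53, 120, 0.01), 2))  # ~0.80 as designed
print(round(attained_power(2, 5.50, 120, 0.01), 2))  # ~0.60: power erodes
```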
Table 2. Outcome values to report in sample size calculation.

| Element | Binary | Continuous | Time-to-event |
| --- | --- | --- | --- |
| Assumed result for each study group | Proportion (%) with event | Mean and standard deviation | Proportion (%) with event at a given time point |
| Effect measure | Relative risk, odds ratio | Difference in means | Hazard ratio |

Note: Although the sample size calculation uses the anticipated outcome value for each group, it is also recommended to report the corresponding contrast between groups (estimated effect).
For designs and frameworks other than parallel group superiority trials, additional elements are required in the sample size calculation. For example, an estimate of the standard deviation of within-person changes from baseline should be included for crossover trials^{192}; the intracluster correlation coefficient for cluster randomised trials^{193}; and the equivalence or non-inferiority margin for equivalence or non-inferiority trials respectively.^{108;194} Such elements are often not described in final trial reports,^{110;195–198} and it is unclear how often they are specified in the protocol.
Complete description of sample size calculations in the protocol enables an assessment of whether the trial will be adequately powered to detect a clinically important difference.^{189;199206} It also promotes transparency and discourages inappropriate post hoc revision that is intended to support a favourable interpretation of results or portray consistency between planned and achieved sample sizes.^{6}^{;207}