The number of animals requested in an animal use protocol should be “sufficient but not excessive” for the purposes of a study1, and be governed by what is justifiable, appropriate, reasonable, and ethical. Statistical power analysis is considered to be the gold standard for determining animal numbers. For example, the Guide for the Care and Use of Laboratory Animals states that “whenever possible, the number of animals and experimental group sizes should be statistically justified (e.g., provision of a power analysis)”2. Unfortunately, it is not unusual for investigators to request approval for numbers that defy common sense, even when ‘justified’ by a power calculation. Investigators might defend requests for tens or even hundreds of thousands of animals for a 3-year proposal because “unforeseen problems might happen” or “it is unknown how many animals we will require because this is an exploratory study,” for example. Conversely, many investigators simply choose a sample size that “worked” in previous studies “to obtain significance”, and use a power calculation as an ad hoc justification3.

The goal for any method used to estimate animal numbers is to obtain reasonable approximations of the number of animals necessary to meet study objectives. However, power analyses cannot, and should not, be the only method used. First, power calculations are only appropriate for studies whose primary focus is null hypothesis significance testing (NHST) and statistical estimation (concerned with sampling distributions, and point and interval estimation). For those studies, the goal is to make inferences, compare groups, or assess how compatible the observed results are with a stated null hypothesis. By contrast, exploratory studies, demand–supply protocols, feasibility assessments, and pilot studies do not test hypotheses; therefore, numbers based on NHST and P-values are meaningless in these contexts. Second, all projected animal use numbers (whether for NHST or not) require cross-validation to ensure that they are operationally feasible. Supporting evidence should be provided to confirm that the proposed work practices, procedures, and personnel are sufficient to support the project, and that the project can be completed in the allotted time and within budget. It is clearly unethical to propose or approve studies that require numbers of animals far exceeding processing capacity, and that therefore have no realistic chance of completion. In all cases, reasonable approximations of numbers are required to solve the practical problems determined by study objectives.

Quantitative, verifiable approximations of animal numbers for a given project can be obtained by structured back-of-the-envelope calculations4. The process, known as Fermi estimation5, consists of a logical formulation of the approximation problem, systematically broken down into smaller subproblems or components, each of which is solved by basic arithmetic. Showing these calculations enables reviewers to understand and verify the proposed animal numbers. The method is named after the physicist Enrico Fermi, who had an extraordinary aptitude for finding quick and accurate answers to complex practical problems using simple calculations5,6.
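
To make the decomposition concrete, the following Python sketch breaks a hypothetical protocol into components, gives each a plausible lower and upper value, and adds them so that every step of the arithmetic is visible to a reviewer. The component names and numbers are illustrative assumptions, not values from any real protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One subproblem of a Fermi estimate, bracketed by lower and upper bounds."""
    name: str
    low: int
    high: int


def bracket_total(components: list[Component]) -> tuple[int, int]:
    """Sum the component bounds to give the bracketed total (lower, upper)."""
    return sum(c.low for c in components), sum(c.high for c in components)


def show_work(components: list[Component]) -> str:
    """Format each component and the total as a table a reviewer can check by hand."""
    lines = [f"{c.name:<28}{c.low:>6}{c.high:>6}" for c in components]
    low, high = bracket_total(components)
    lines.append(f"{'total':<28}{low:>6}{high:>6}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical components; replace with study-specific values.
    plan = [
        Component("pilot study", 6, 10),
        Component("main experiment", 48, 64),
        Component("replacement for attrition", 5, 10),
    ]
    print(show_work(plan))
```

Reporting both bounds rather than a single number makes the uncertainty in each assumption explicit; here the hypothetical total is bracketed between 59 and 84 animals.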

The Fermi estimation method

Fermi estimation is a four-step cyclical process consisting of the identification of the study focus, structured calculations, a reality check, and a refinement step (Fig. 1). Study focus provides the initial framework for subsequent calculations. If the study requires NHST, the objective is to obtain a sample size likely to detect a biologically or clinically meaningful effect size if it exists, and avoid false positives7. In this situation, the investigator specifies the hypothesis and all relevant statistical components (such as primary outcome measures, clinically relevant differences to be detected, measures of variability, settings for α and β), and then performs the power calculations appropriate to the statistical hypothesis and type of analyses required8. On the other hand, if the problem type is operational—that is, numbers are to be determined by the demand and/or supply needs of the study—NHST and power calculations are not appropriate. Unlike power calculations, which depend on mathematical relationships between statistical components, operational approximations are derived from pragmatic information specific to the problem to be solved. Simple arithmetic is then used to compute numbers for each component, and the total is obtained by adding all component parts together. Because precise values for each component are usually unknown, or different background information sources might present very different values, approximations should be bracketed by setting biologically reasonable upper and lower bounds.
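
For the NHST branch, a common textbook approximation for comparing two group means with a two-sided test is n per group = 2 × (z(1 − α/2) + z(1 − β))² × (σ/Δ)², where Δ is the clinically relevant difference and σ the assumed standard deviation. The sketch below uses only the Python standard library. It is a normal approximation offered for illustration, not a substitute for the calculation appropriate to the planned analysis; exact t-test calculations typically give a slightly larger n.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sigma: float, alpha: float = 0.05,
                power: float = 0.80) -> int:
    """Approximate animals per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{1-beta})**2 * (sigma / delta)**2,
    rounded up. delta is the smallest biologically relevant difference and sigma the
    assumed standard deviation; an exact t-test calculation gives a slightly larger n.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = z.inv_cdf(power)           # power = 1 - beta
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2)


if __name__ == "__main__":
    print(n_per_group(delta=10, sigma=10))  # effect size of one standard deviation
```

With an effect size of one standard deviation, α = 0.05 and power 0.80, `n_per_group(delta=10, sigma=10)` returns 16 animals per group; halving Δ to 5 raises this to 63, which shows how strongly the assumed difference drives the total.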

Fig. 1: Animal numbers decision tree.

NHST; null hypothesis significance testing.

Reality checks, or feasibility assessments, using Fermi calculations should be performed for both NHST and operational studies. These checks consist of comparing the proposed numbers with approximations based on laboratory-specific processing capacity and operational constraints, such as number of personnel and personnel-hours available, time required to perform important procedures, resource availability, estimated costs, total available budget, deadlines and milestones. If the proposed number of animals is too large to be feasible, the approximation problem will need to be reformulated or refined to meet operational capability.
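
A reality check of this kind can be reduced to one division: the personnel-hours available over the project, divided by the hands-on time per animal, gives a ceiling on how many animals can actually be processed. The sketch assumes a single averaged "hours per animal" figure, which is a simplification, and all inputs are hypothetical.

```python
import math


def processing_capacity(staff: int, hours_per_staff_per_week: float,
                        weeks: int, hours_per_animal: float) -> int:
    """Ceiling on animals that can be processed with the personnel-hours available."""
    total_hours = staff * hours_per_staff_per_week * weeks
    return math.floor(total_hours / hours_per_animal)


def reality_check(proposed: int, capacity: int) -> str:
    """Compare a proposed number of animals with processing capacity."""
    if proposed <= capacity:
        return f"feasible: {proposed} proposed, capacity {capacity}"
    return (f"not feasible: {proposed} proposed exceeds capacity {capacity}; "
            "reformulate or refine the estimate")


if __name__ == "__main__":
    # Hypothetical lab: 2 technicians x 10 h/week x 150 working weeks, 2 h hands-on per animal.
    capacity = processing_capacity(2, 10, 150, 2.0)
    print(reality_check(20000, capacity))
```

With these hypothetical inputs, capacity is 1,500 animals, so a request for 20,000 animals over the same period fails the check and the approximation problem would need to be reformulated.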

Examples of Fermi estimation

Tissue-based studies

Power-based sample size calculation algorithms are available for high-dimensionality exploratory studies (such as multiple DNA/RNA microarrays, biochemistry assays, biomarker studies, proteomics, and inflammasome profiles). However, sample sizes based on these algorithms are determined not by treating the animal as the replicate but by array-specific characteristics, such as the number of arrays, investigator-specified sensitivity (the proportion of differentially expressed genes detected), the number of expected false positives, the correlation between expression levels of different genes, and the number of sampling time points9. Therefore, a separate set of calculations is needed to approximate the number of animals required for tissue harvest. That number will depend on the amount of tissue, cells, or antibodies that can be harvested from a single animal and on the total amount of tissue required for the microarrays (Table 1).
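
The core tissue-harvest calculation can be written as N = (number of samples × amount required per sample) ÷ usable yield per animal, rounded up. The sketch below assumes that tissue from different animals can be pooled; if each sample must come from a single animal, the calculation has to be done per sample instead. Quantities (in mg) are hypothetical.

```python
import math


def animals_for_harvest(n_samples: int, amount_per_sample_mg: float,
                        yield_per_animal_mg: float) -> int:
    """Animals needed when harvested tissue can be pooled across animals (rounded up)."""
    total_required_mg = n_samples * amount_per_sample_mg
    return math.ceil(total_required_mg / yield_per_animal_mg)


def bracket_harvest(n_samples: int, amount_per_sample_mg: float,
                    yield_low_mg: float, yield_high_mg: float) -> tuple[int, int]:
    """A high yield per animal gives the lower bound on N; a low yield gives the upper bound."""
    return (animals_for_harvest(n_samples, amount_per_sample_mg, yield_high_mg),
            animals_for_harvest(n_samples, amount_per_sample_mg, yield_low_mg))


if __name__ == "__main__":
    # Hypothetical: 12 arrays, 50 mg tissue per array, 60-100 mg usable tissue per animal.
    print(bracket_harvest(12, 50.0, 60.0, 100.0))  # (6, 10)
```

Bracketing on yield per animal is one natural choice because yield is usually the least certain input for a harvest study.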

Table 1 Estimating numbers of animals required (N) for tissue harvest studies

Wildlife studies

Wildlife studies that include capture, handling, marking, non-invasive and invasive procedures, or lethal take, or that do not meet the regulatory definition of a field study, are subject to IACUC review; as such, a rationale for animal numbers is required. Projected take numbers must be subject to oversight because the number of captured animals cannot exceed the number authorized on issued permits, and trap-removal procedures must be evaluated for their potential effect on wild populations. Although many field studies involve NHST, it has been argued that NHST is often misapplied in wildlife science and is seldom useful10,11. In any case, an NHST rationale might not be appropriate for numbers proposed for permit applications, species surveys, and inventories. In these cases, projected animal numbers are determined by ecological and investigator-sourced information, including prior census information, expected species density, proposed census and/or capture methods, census coverage and intensity, and anticipated capture ratios. Potential adverse events, estimated trap attrition or mortality, methods to reduce loss, and humane endpoints should also be documented. This information can also be used to assess the feasibility of hypothesis-testing studies (Table 2).
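
A minimal sketch of a capture estimate, assuming that expected captures can be approximated as species density × area surveyed × capture ratio (the proportion of animals present that are captured), with bounds taken from low and high values of density and capture ratio. The upper bound is then compared with a permit limit, and anticipated trap mortality is derived from it. All values, including the permit limit, are hypothetical.

```python
import math


def expected_captures(density_per_ha: float, area_ha: float, capture_ratio: float) -> float:
    """Animals expected to be captured: density x area surveyed x proportion captured."""
    return density_per_ha * area_ha * capture_ratio


def bracket_captures(density_range: tuple[float, float], area_ha: float,
                     ratio_range: tuple[float, float]) -> tuple[int, int]:
    """Lower bound from low density and low capture ratio; upper bound from the high values."""
    low = math.floor(expected_captures(density_range[0], area_ha, ratio_range[0]))
    high = math.ceil(expected_captures(density_range[1], area_ha, ratio_range[1]))
    return low, high


def expected_losses(captures: int, mortality_rate: float) -> int:
    """Anticipated trap mortality or attrition, rounded up so it is not understated."""
    return math.ceil(captures * mortality_rate)


if __name__ == "__main__":
    # Hypothetical survey: 5-15 animals/ha over 20 ha, 30-50% of animals present captured.
    low, high = bracket_captures((5.0, 15.0), 20.0, (0.3, 0.5))
    permit_limit = 200  # hypothetical number authorized on the permit
    print(f"captures: {low} to {high}; within permit: {high <= permit_limit}")
    print(f"anticipated losses at 4% mortality: up to {expected_losses(high, 0.04)}")
```

Using the upper bound for the permit comparison and the loss estimate errs on the side of not exceeding the authorized take.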

Table 2 Estimating numbers of animals captured (N) for wildlife species inventories or surveys

Mouse breeding colony

Breeding protocols are required when requesting production of research animals that cannot be obtained commercially. Examples include the creation of new transgenic, knockout, or other genetically modified animals; back-crossing of genetically modified lines; and production of prenatal or early neonatal subjects12. For some applications, such as genome-wide association studies, total numbers might be approximated by simple rules of thumb12. However, protocols requesting production of a target genotype will require an estimate of the total numbers of adults and pups (or embryos, if necessary). Because pup production is difficult to predict12, approximations can be based on projections from past demand, past protocols, breeding colony records, current breeding stock numbers, and facility and personnel capability. Other information to be reported includes methods for genotype confirmation or other quality assurance methods, plans for the disposition of unused animals, and contingency plans to reduce the number of animals with unwanted genotypes that will be euthanized (Table 3).
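
A minimal sketch for a target genotype, assuming simple Mendelian inheritance (for example, heterozygous × heterozygous matings, in which about one quarter of offspring are expected to be homozygous), a fixed average number of weaned pups per litter, and a fixed number of litters per breeding pair over the protocol period. Because pup production is hard to predict, running the calculation with low and high litter sizes gives a bracket. All values are hypothetical; exact fractions avoid rounding surprises.

```python
import math
from fractions import Fraction


def breeding_plan(target_animals: int, genotype_fraction: Fraction,
                  pups_per_litter: int, litters_per_pair: int) -> dict[str, int]:
    """Work backward from the number of target-genotype animals to breeders needed."""
    pups = math.ceil(target_animals / genotype_fraction)       # all pups to be genotyped
    litters = math.ceil(Fraction(pups, pups_per_litter))        # litters to produce them
    pairs = math.ceil(Fraction(litters, litters_per_pair))      # breeding pairs required
    return {
        "pups": pups,
        "litters": litters,
        "breeding_pairs": pairs,
        "breeders": 2 * pairs,
        "non_target_pups": pups - target_animals,  # animals with unwanted genotypes
    }


if __name__ == "__main__":
    # Hypothetical: 40 homozygous mice needed from het x het matings,
    # 6 weaned pups per litter, 4 litters per pair during the protocol.
    print(breeding_plan(40, Fraction(1, 4), 6, 4))
```

With these inputs the plan calls for 160 pups, 27 litters and 7 breeding pairs, leaving 120 pups of non-target genotypes, the number that the contingency plans described above aim to reduce. With 8 weaned pups per litter instead, the same target needs 20 litters and 5 pairs; with 5 per litter, 32 litters and 8 pairs.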

Table 3 Estimating pup production or target genotype numbers in breeding colony protocols

Feasibility assessments

Feasibility assessments are performed to confirm that the study is possible given the available resources13. Calculations should be used to support projected numbers of animals (“Does the estimated number of animals make sense?”) in the context of operations (“Are current work practices, procedures, resources, and trained personnel sufficient to support the project with the estimated numbers of animals?”) and time-related or budget-related criteria (“Can the project be completed with the estimated numbers of animals in the allotted time and with the allotted amount of funding?”, Table 4)14,15.
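
The time and budget questions can be checked with two small calculations: an approximate cost per animal (purchase price, housing per diem over the holding period, and procedure costs) multiplied by the number of animals, and the number of weeks needed at an assumed constant processing rate. The cost structure, the constant throughput, and all figures below are hypothetical simplifications.

```python
import math


def cost_per_animal(purchase: float, per_diem: float, days_housed: int,
                    procedure_costs: float) -> float:
    """Purchase price plus housing per diem over the holding period plus procedure costs."""
    return purchase + per_diem * days_housed + procedure_costs


def feasibility(n_animals: int, unit_cost: float, budget: float,
                animals_per_week: int, weeks_available: int) -> dict[str, bool]:
    """Check the proposed number of animals against budget and timeline."""
    weeks_needed = math.ceil(n_animals / animals_per_week)
    return {
        "within_budget": n_animals * unit_cost <= budget,
        "within_time": weeks_needed <= weeks_available,
    }


if __name__ == "__main__":
    # Hypothetical costs (USD) and throughput; substitute institutional figures.
    unit = cost_per_animal(purchase=30.0, per_diem=1.50, days_housed=60, procedure_costs=100.0)
    print(unit, feasibility(200, unit, budget=50_000, animals_per_week=10, weeks_available=52))
```

Here 200 animals at about USD 220 each (USD 44,000) fit a USD 50,000 budget and need 20 of the 52 weeks available; 2,000 animals would fail both checks.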

Table 4 Feasibility calculations (proposed animal numbers versus processing capacity)

Pilot studies: a special case

Preclinical pilot or feasibility studies are essentially mini-rehearsals of the proposed definitive experiment. They are conducted to assess practical logistics and inform the design and conduct of the subsequent large trial16,17,18,19. A well-designed pilot can thus improve translation and reduce animal use by allowing early identification of problems that contribute to study futility and subsequent waste of animals12. However, criteria for determining and justifying the best size for a pilot study are problematic because they might not be directly quantifiable. Because pilot trials do not test hypotheses about the effects of an intervention, power-based sample size estimations are generally not applicable (although sample sizes for human clinical pilots can be justified statistically under certain conditions)18,19,20. Preclinical pilot studies are usually deliberately kept small (≤ 10 subjects), with numbers derived from subjective criteria such as the investigator’s experience and guesswork21. Unfortunately, pilot studies can be poorly thought out and lack a discernible purpose18. Without clear-cut objectives, pilot studies cannot fulfill their intended purpose of improving the development and design of later experiments, and the data will be insufficient either to support power-based sample size calculations for later trials or to provide preliminary assessments of the efficacy and safety of the treatment intervention14,22.

Pilot studies should have one or more specific objectives with quantifiable success metrics (that is, how will the investigator know the pilot trial works?). Pilot objectives can be utilitarian (“Can it work?”), empirical (“Does it work?”), or translational (“Will it work?”)15. Utilitarian pilots are designed to assess study logistics and enable standardization, correction, and refinement of experimental procedures. Simple assessment metrics can include the percent success of subject recruitment and enrollment procedures19 (for example, in veterinary clinical trials), procedure times, target performance measures, and human performance errors23. Empirical pilots determine whether study endpoints, especially novel ones, are related to study objectives, can be measured or are sufficiently expressed to be useful, and have a clear relationship to the independent or treatment variable; they also determine whether data can be collected correctly and completely, and whether results are accurate and reliable19. Other useful information might be obtained, such as response rates of controls and occurrence of adverse events. Translational pilots are designed to assess whether the experimental intervention can be made more broadly applicable to other animal strains, species, or operating conditions. In these cases, standardization of, and strict adherence to, protocols will enable comparisons (“Does the process work the same way in B as it did in A?”). Depending on pilot objectives, sample sizes can be approximated with information from the literature and from previously described studies, or by using simple rules of thumb if estimates of study endpoints are required24.
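
When a pilot is meant to show whether an event (for example, an adverse event or a procedural failure) occurs at all, one widely used rule of thumb, not necessarily the one proposed in the cited source, is to choose n so that the probability of observing at least one occurrence is high: 1 − (1 − p)ⁿ ≥ confidence, which gives n = ceil(ln(1 − confidence) / ln(1 − p)). Here p is an assumed per-animal event probability that the investigator must supply, and animals are assumed to be independent.

```python
import math


def n_to_observe_event(p: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one event among n animals) >= confidence,
    assuming independent animals and a per-animal event probability p."""
    if not (0.0 < p < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    for p in (0.1, 0.2, 0.3):
        print(f"event rate {p:.0%}: n = {n_to_observe_event(p)}")
```

Under these assumptions, an event expected in about 20% of animals needs 14 animals to be seen at least once with 95% probability, and an event in about 10% needs 29. Rarer events quickly push n beyond the ≤ 10 subjects typical of preclinical pilots, which is itself useful information when setting pilot objectives.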

Conclusions

Fermi estimation is a formalized method of approximating animal numbers using a few simple calculations and basic arithmetic, supported by operationally sensible assumptions about the items and resources involved. Such approximations are appropriate for exploratory and demand-based studies, and can be used as a feasibility assessment for any proposed study, whether or not hypothesis testing is required. At a minimum, adequate justification of animal numbers should be based on four main reporting requirements: study focus, a description of one or more assessment variables, a clear description of any calculations performed (“show your work”), and a feasibility assessment. For complex studies and/or multiple experiments, a flow chart showing group sizes, approximate numbers per group, timeframes, key interventions, primary outcomes, experimental and/or humane endpoints, and other information can help explain how the total number of animals was determined and how the animals were disposed of. Investigators should be expected to implement the 3Rs (Replacement, Reduction, and Refinement) in practice, especially for protocols involving unrelieved pain and distress. Submission of protocols with infeasible numbers of animals should always be discouraged.
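
The arithmetic behind such a flow chart is a simple tally of groups × animals per group for each experiment, added up to the protocol total. A sketch, with hypothetical experiment names and group sizes; a real flow chart would also carry the timeframes, interventions and endpoints listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    """One box of a study flow chart (hypothetical structure)."""
    name: str
    groups: int
    animals_per_group: int

    @property
    def total(self) -> int:
        return self.groups * self.animals_per_group


def summarize(experiments: list[Experiment]) -> tuple[list[str], int]:
    """Return one 'show your work' line per experiment plus the protocol total."""
    lines = [f"{e.name}: {e.groups} groups x {e.animals_per_group} = {e.total}"
             for e in experiments]
    return lines, sum(e.total for e in experiments)


if __name__ == "__main__":
    # Hypothetical protocol with a pilot and two experiments.
    plan = [
        Experiment("pilot", 2, 4),
        Experiment("experiment 1", 4, 10),
        Experiment("experiment 2", 3, 8),
    ]
    lines, total = summarize(plan)
    print("\n".join(lines))
    print(f"total animals requested: {total}")
```

For this hypothetical plan the tally is 8 + 40 + 24 = 72 animals, and each line can be checked by hand against the flow chart.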

Justification of animal numbers is related explicitly to considerations of animal use and welfare, and is thus an ethical obligation as well as a requirement of federal funding agencies25. However, due diligence must be exercised by investigators, oversight committees, and reviewers to confirm that the numbers proposed are scientifically and operationally valid. Approximation and justification of animal numbers should be based on more objective and verifiable criteria than unsubstantiated assertions based on experience and guesswork. By providing an evidence-based and fact-based structure for subsequent calculations, Fermi estimation methods can enhance the rigor and reproducibility of the proposed research.