Introduction

Trypanosomiasis transmitted by the tsetse fly occurs in 37 countries in Africa, and there is evidence of genetic resistance to trypanosomiasis in some cattle breeds (Murray & Trail, 1984). A programme has commenced at the International Livestock Research Institute in Nairobi, Kenya, with the aim of identifying genes for trypanotolerance in N'Dama cattle (Teale, 1993). The intention is to use marker-assisted introgression (MAI) to develop trypanotolerant cattle types. Trypanoresistance is well known in some laboratory mouse strains (Morrison et al., 1978), and three autosomal chromosomal regions have been identified (Kemp et al., 1997). Consequently, mice are being used as an animal model in a pilot study to assess the effectiveness of MAI of trypanotolerance genes for cattle.

In the design stage of the experiment, a literature survey was undertaken to identify how many founder donor animals would be needed for the experiment. Although there is a considerable volume of material dealing with MAI, relatively little deals with experimental design. Gama et al. (1992) studied introgression strategies of transgenes in pigs and considered the effect of various strategies (number of male and female donor animals) on the number to be genotyped to detect some deleterious transgene effect. Groen & Smith (1995) conducted a stochastic simulation study of the efficiency of marker-assisted introgression in livestock, and investigated the number of animals with the desired genotype. Several studies have reported methods for reducing the number of backcross generations required for the introgression process (e.g. Hospital et al., 1992; Hillel et al., 1993; Yancovich et al., 1996). Hillel et al. (1990), using genomic selection based on DNA fingerprints for gene introgression in a breeding programme, investigated the theoretical distribution of the proportion of the donor's genome in successive backcross generations. Visscher et al. (1996) focussed more on the proportion of the recipient genotypes (background genotypes) existing after specified numbers of backcrossing generations. They studied the simultaneous effect of introgressing an allele and selecting for a desired genomic background. They also looked at a selection index combining markers and phenotypic information. However, none of these studies provides strategies for determining the number of animals required for the experiment. Determining this number is important both for planning the logistics of such an experiment and for estimating its cost.

Consequently, the present paper intends to provide a mathematical model for estimating the mean and variance of the number of animals at each backcross generation resulting in the targeted number of favourable animals, i.e. those with the desired chromosomal region(s), at the end of the backcross process. The ultimate goal is to intercross those animals to produce sufficient animals that are homozygous for the desired loci in one breeding cycle if possible. Opportunities might exist for the breeder to repeat the intercross process from heterozygous individuals to obtain the required number of animals.

The objective of the paper is to predict the amount of genotyping required during a backcross experiment in order to produce a specified number of individuals with desired genotypes. In addition, the consequences of reducing the risk of not obtaining enough animals of the desired genotype to a set level are studied. Aspects considered here are: the number of founder animals; the number of times each male is used with the objective of minimizing the genotyping workload; the estimated number of animals to be genotyped for the whole experiment and at each generation, as well as the expected number of animals with a favourable genotype. The effect of the number of backcross generations, the number and the size of the regions to introgress as well as the influence of risk limitation will be studied.

Theory

Model of population growth

Some assumptions have been made: the donor and recipient lines used in the experiment are fully inbred; the introgressed alleles or regions are unambiguously identifiable in founder donor as well as in crossbred animals, and they lie on autosomal chromosomes. Each of a fixed number of males and females selected from the donor line is mated only once with an animal from the recipient line to produce F1 offspring. The first backcross generation is produced by mating F1 animals to animals of the recipient line. Selection of animals for the production of subsequent backcross generations is based solely on the presence of all of the desired chromosomal regions from the donor strain, each in its entirety. Animals which did not inherit all regions, or animals carrying one or more recombinations in any of the regions, are not selected. For the development of the model, it is also assumed that all the favourable animals will be used to produce the next generation; in practice, however, this need not be the case, depending on the number of animals available. Because of differences in reproductive capacities, favourable females are mated with only one male in the recipient line, whereas selected males can be mated to more than one recipient line female. The number of times that males are used will be treated as a design parameter. Attention will also be paid to the situation in which only males are genotyped and selected during the backcross generations. It will also be assumed that there are always sufficient recipient animals available for mating.

A further assumption is that litter size varies stochastically, and independently, from female to female. As a working approximation, a Poisson distribution for litter size is assumed (see, e.g. Foulley et al., 1987; Matos et al., 1997), the major consequence being that the variance of litter size is taken as being equal to the mean.

In developing the theory, the mean and variance of the number of animals with the desired genotype at each generation are provided. Derivation of the full probability distribution of the number of animals at each generation is a more intractable problem. These issues are considered in mathematical theories of population dynamics (e.g. Renshaw, 1991). However, it is believed that use of summary statistics will characterize the population sufficiently well, and also allow a relatively simple development of the theory.

The following notation will be used here:

t=generation number (0=F1; 1, 2, 3, ...=backcross generation 1, 2, 3, ...);

Mt=number of favourable males at backcross generation t;

Ft=number of favourable females at backcross generation t;

Nt=Mt+Ft, the total number of favourable animals at generation t;

n=number of males and number of females initially selected from each strain to start the introgression process (i.e. 2n donor animals and 2n recipient animals);

p=probability of inheriting the desired chromosomal region(s), i.e. favourable;

r=number of times each favourable male is used for breeding;

s=sex ratio in offspring at breeding age;

λ=mean number of animals produced in a litter and surviving to breeding age.

Although it would usually be assumed that the sex ratio s would equal ½, other values might occur as a result of differential mortality of males and females up to the time of breeding. Note that the probability of inheriting the desired chromosomal region(s) is a function of the number and the size of the region(s) to be introgressed. Let \(d_i\) be the length of region i (in Morgans) and z the number of independent regions. With unequal chromosomal region lengths and applying Haldane's (1919)(1986) Poisson model for crossing-over, p is calculated as \(p = (\tfrac{1}{2})^{z} \prod_{i=1}^{z} e^{-d_i}\), which reduces to \(p = (\tfrac{1}{2})^{z} e^{-zd}\) when all regions are of equal length d.
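As a quick numerical check of this probability, the following minimal Python sketch evaluates the product form for a list of region lengths. The function name `prob_favourable` is illustrative only and does not come from the paper.

```python
import math

def prob_favourable(lengths_morgans):
    """Probability that an offspring inherits every donor region intact.

    Each region is transmitted with probability 1/2 and is free of
    crossovers with probability exp(-d_i) under Haldane's Poisson model.
    """
    p = 1.0
    for d in lengths_morgans:
        p *= 0.5 * math.exp(-d)
    return p

# Three independent regions of 20 cM (0.2 Morgans) each; the text quotes
# p = 0.0686 for this configuration.
p_example = prob_favourable([0.2, 0.2, 0.2])
print(round(p_example, 4))
```

Passing unequal lengths, for example `prob_favourable([0.1, 0.3])`, covers the general case in which the regions differ in size.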

Assuming that the parameters r, p, s and λ are constant over generations, it may also be shown that there are explicit solutions for the mean and variance of Nt, namely \(E(N_t) = 2n\lambda[p\lambda(rs+1-s)]^{t}\) and \(\mathrm{var}(N_t) = E(N_t)[\beta + \gamma E(N_t)]\) where

and

As an illustration (Fig. 1), 20 simulations were performed, each starting from 2n=36 donor animals with r=4; λ=5; s=½; p=0.0686 (based on z=3 and d=20 cM). The figure also shows the means ± 2 standard deviations, \(E(N_t) \pm 2[\mathrm{var}(N_t)]^{1/2}\). As expected, relatively few simulated values lie outside these limits.

Fig. 1 Simulation of the number of favourable animals at each generation, Nt. Twenty simulations are shown all starting with 2n=36 founder animals; other parameters are r=4; λ=5; s=½; p=0.0686 (based on z=3 and d=20 cM). Also shown are the theoretical mean (squares)±2 standard deviations (triangles). Generation ‘−1’ refers to the founders, ‘0’ to the F1 generation, and subsequent generations are backcrosses.
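A simulation of this kind can be sketched in a few lines of Python. This is a minimal illustration under the stated model assumptions (Poisson litter sizes, each offspring favourable with probability p and male with probability s), with the added assumption that every F1 animal counts as favourable, which is consistent with E(N0)=2nλ. The function names and the choice of random number generator are illustrative; this is not the authors' simulation code.

```python
import math
import random

def poisson(rng, lam):
    """One Poisson(lam) draw by Knuth's multiplication method (suitable for small lam)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def binomial(rng, trials, prob):
    """One Binomial(trials, prob) draw by direct counting."""
    return sum(1 for _ in range(trials) if rng.random() < prob)

def simulate(n, r, lam, s, p, k, rng):
    """Return [N_0, ..., N_k], the favourable animals per generation, for one replicate."""
    # F1: each of the 2n donor animals is mated once; all F1 offspring are
    # assumed to carry the intact donor regions.
    total = sum(poisson(rng, lam) for _ in range(2 * n))
    males = binomial(rng, total, s)
    females = total - males
    counts = [total]
    for _ in range(k):
        matings = r * males + females          # males used r times, females once
        offspring = sum(poisson(rng, lam) for _ in range(matings))
        favourable = binomial(rng, offspring, p)
        males = binomial(rng, favourable, s)
        females = favourable - males
        counts.append(favourable)
    return counts

def expected(n, r, lam, s, p, t):
    """Theoretical mean E(N_t) = 2 n lam [p lam (r s + 1 - s)]^t."""
    return 2 * n * lam * (p * lam * (r * s + 1 - s)) ** t

rng = random.Random(2024)
trajectory = simulate(n=18, r=4, lam=5, s=0.5, p=0.0686, k=5, rng=rng)
means = [round(expected(18, 4, 5, 0.5, 0.0686, t), 1) for t in range(6)]
print(trajectory)
print(means)
```

Repeating `simulate` many times and averaging each generation should give values close to `expected`, which is a convenient way to check the mean formula.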

Constant breeding strategy

Two parameters (n and r) are under control of the breeder. When specifying that a certain mean number (say Nmean) of favourable animals are to be available after k backcross generations and for a specified value of r, the initial number of animals required is clearly \(n = N_{\mathrm{mean}}/\{2\lambda[p\lambda(rs+1-s)]^{k}\}\). For this strategy, however, there will be (approximately) a 50% risk of obtaining fewer animals than this at the end of the backcross process. An alternative strategy would be to specify the minimum number of favourable animals, Nmin, and the nominated risk of obtaining fewer animals than this, α.
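The founder requirement for a specified mean can be evaluated directly from the expression for E(Nk). The short Python sketch below does this; the function names are illustrative and the parameter values are those used for Fig. 1.

```python
def growth_rate(p, lam, r, s):
    """Expected multiplication factor per backcross generation, p*lam*(r*s + 1 - s)."""
    return p * lam * (r * s + 1 - s)

def founders_for_mean(n_mean, k, p, lam, r, s):
    """Number n of donor males (and of donor females) giving E(N_k) = n_mean."""
    return n_mean / (2 * lam * growth_rate(p, lam, r, s) ** k)

# Nmean = 50 favourable animals after k = 5 backcrosses, with r = 4
n_required = founders_for_mean(n_mean=50, k=5, p=0.0686, lam=5, r=4, s=0.5)
print(round(n_required, 2))
```

The result is generally not an integer, so in practice it would have to be rounded to a whole number of animals.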

Provided that Nmin is not too small, the distribution of the number of favourable animals at generation k can be taken as approximately normal, with mean μ=E(Nk) and variance σ²=var(Nk). So for a chosen value of r, we select the value of n so that P(Nk<Nmin)≤α. This may be approximated as the value n satisfying

where \(\Phi^{-1}(\alpha)\) is the 100α percentage point of the standard normal distribution. However, when Nmin is small, a better approximation to normality is achieved by using a square-root transformation. These means and variances may be approximated as \(E(\sqrt{N_k}) \approx \sqrt{E(N_k)} + \tfrac{1}{4}\,\mathrm{var}(N_k)\,[E(N_k)]^{-3/2}\) and \(\mathrm{var}(\sqrt{N_k}) \approx \tfrac{1}{4}\,\mathrm{var}(N_k)/E(N_k)\), and n then chosen to satisfy
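Under the normal approximations described here, the two risk conditions can be sketched in LaTeX as follows. This is an assumed form that follows from requiring P(Nk<Nmin)≤α for a normally distributed variable, not a verbatim formula; \(\mu_{\surd}\) and \(\sigma_{\surd}\) are illustrative symbols for the approximate mean and standard deviation of \(\sqrt{N_k}\).

```latex
% Untransformed scale: N_k approximately Normal(mu, sigma^2)
\[
  \mu + \Phi^{-1}(\alpha)\,\sigma \;\ge\; N_{\min}
\]
% Square-root scale: sqrt(N_k) approximately Normal(mu_sqrt, sigma_sqrt^2)
\[
  \mu_{\surd} + \Phi^{-1}(\alpha)\,\sigma_{\surd} \;\ge\; \sqrt{N_{\min}}
\]
```

Because \(\Phi^{-1}(\alpha)\) is negative for α<0.5 (about −1.645 for α=0.05), both conditions require the mean to exceed the target by a margin proportional to the standard deviation; the smallest n meeting the condition is the one of interest.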

Note that different values of male usage, r, result in different minimum values of n. One possible option is to choose the value of r that would require the smallest expected number of animals to be genotyped during the entire experiment (amount of genotyping). Because for any backcross generation, the amount of genotyping required has a mean of \(p^{-1}E(N_t)\) if males and females are genotyped, the total expected amount of genotyping over k backcross generations would be
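The total can be sketched in Python by summing the per-generation mean \(p^{-1}E(N_t)\). The sketch assumes the sum runs over backcross generations t=1 to k, with the F1 generation not requiring genotyping; the function names are illustrative.

```python
def expected_favourable(n, p, lam, r, s, t):
    """E(N_t) = 2 n lam [p lam (r s + 1 - s)]^t."""
    return 2 * n * lam * (p * lam * (r * s + 1 - s)) ** t

def expected_genotyping(n, p, lam, r, s, k):
    """Sum of E(N_t)/p over backcross generations t = 1..k (assumed range)."""
    return sum(expected_favourable(n, p, lam, r, s, t) / p
               for t in range(1, k + 1))

# Choose n so that E(N_5) = 50 with r = 4, then total the genotyping
p, lam, r, s, k = 0.0686, 5, 4, 0.5, 5
n = 50 / (2 * lam * (p * lam * (r * s + 1 - s)) ** k)
final_generation = expected_favourable(n, p, lam, r, s, k) / p
total = expected_genotyping(n, p, lam, r, s, k)
print(round(final_generation), round(total))
```

Looping this calculation over candidate values of r is one way to compare breeding schemes by their expected genotyping workload.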

Instead of genotyping males and females, one could genotype and select only males from generation 1 to k−1, and subsequently males and females in generation k, which results in a total expected amount of genotyping of:

Alternatively, a constant amount of genotyping each generation may be the preferred option. In this situation, we need the growth rate to be (close to) one; this is achieved by choosing r to be the nearest integer to \((p\lambda s)^{-1} - s^{-1} + 1\), that is, \(2/(p\lambda) - 1\) assuming s=½.
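The male usage giving a growth rate close to one follows from setting pλ(rs+1−s)=1 and solving for r. A minimal Python sketch, with an illustrative function name:

```python
def male_usage_for_constant_size(p, lam, s=0.5):
    """Nearest integer r for which the growth rate p*lam*(r*s + 1 - s) is close to 1."""
    return round(1 / (p * lam * s) - 1 / s + 1)

r_const = male_usage_for_constant_size(p=0.0686, lam=5)
print(r_const)  # 5
```

With p=0.0686 and λ=5 the unrounded value is about 4.83, so r=5 yields a growth rate slightly above one.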

Adaptive breeding strategies

Another issue is that the breeding strategy may be adaptive in the sense that it might be changed between generation 0 and k, or perhaps extended for additional generations. Although n cannot be altered, r may be altered during the programme, particularly when Nt at generation t becomes critically small. For this, we need to calculate the conditional mean and variance of Nk given the actual number of males and females at generation t. These may be calculated as

and

where β and γ are as defined above. So at each intervening generation, t=0, ..., k−1, we may select the value of r so that

Another adaptation to face situations in which the number of favourable animals has dropped below the desired level is to produce more offspring than planned in subsequent generations.
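The conditional mean follows directly from the model assumptions: given Mt males and Ft females, the next generation has mean pλ(rMt+Ft), and each later generation multiplies the mean by pλ(rs+1−s). The Python sketch below uses only this mean to pick the smallest r that reaches a target; it is a simplification, since the criterion described above also accounts for the conditional variance. The function names and the `r_max` cap are hypothetical.

```python
def conditional_mean(males, females, k, t, p, lam, r, s):
    """E(N_k | M_t, F_t) for t < k, with usage r applied from generation t onward."""
    next_generation = p * lam * (r * males + females)   # E(N_{t+1} | M_t, F_t)
    return next_generation * (p * lam * (r * s + 1 - s)) ** (k - t - 1)

def smallest_r_for_mean(males, females, k, t, p, lam, s, target, r_max):
    """Smallest r <= r_max whose conditional mean reaches target, else None.

    Mean-only criterion: a simplification that ignores the variance term.
    """
    for r in range(1, r_max + 1):
        if conditional_mean(males, females, k, t, p, lam, r, s) >= target:
            return r
    return None

# Ten favourable males and ten favourable females one generation before the target
r_needed = smallest_r_for_mean(males=10, females=10, k=5, t=4,
                               p=0.0686, lam=5, s=0.5, target=50, r_max=20)
print(r_needed)
```

If no value up to `r_max` suffices, the function returns `None`, which corresponds to the situation where more offspring per mating or additional generations would be needed.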

Numerical application

In this section, consequences of the theory outlined previously will be investigated numerically. For all calculations, regions to be introgressed are assumed to be of equal size. The sex ratio (s) was assumed to be ½ and the mean litter size (λ) assumed to be 5. First the situation in which both males and females are genotyped will be considered. Subsequently, results for situations in which only males are genotyped during the backcrossing process are presented.

Genotyping males and females

In the first investigation, the mean number of favourable animals was specified (Nmean=50), and the initial number of donor animals (n males and n females) determined (Table 1). Also shown is the expected number of animals to be genotyped for the whole experiment (EGk). The probability p of inheriting the desired genotype decreases when z or d increases. Note that in some situations, values of n less than one result in the required number of animals in the target generation. In these situations, entries for higher values of r have been left empty, as higher usage of males is clearly not necessary. Both n and the number of animals to be genotyped for the whole experiment (EGk) decrease with increasing r and p, but increase when the number of backcrosses k becomes higher. Fig. 2 shows the pattern of E(Nt) over the backcross generations which converges to a fixed value of Nmean=50. Note that a plot of the expected amount of genotyping over time would show the same pattern, converging to 729 animals after k=5 backcross generations.

Table 1 Expected number of animals to be genotyped (EGk) for breeding scheme (r), the expected number (Nmean) of favourable animals required at the end of backcrosses being 50. (p=probability of inheriting the desired chromosomal region(s))

Fig. 2 Expected number of favourable animals (E(Nt)) at each generation of backcrossing for different breeding schemes (r). Fixed parameters were Nmean=50 at k=5 backcross generations; λ=5; s=½; p=0.0686 (based on z=3 and d=20 cM). The broken line is drawn at Nmean=50.

For the second investigation, the minimum number of animals after k backcross generations was specified (Nmin=50), with a risk of α=0.05 of obtaining fewer than this number (Table 2). This results in an increase of all corresponding values of EGk and n compared with the values in Table 1 where the mean was set at 50; similarly n decreases with increasing p, increasing r and for a greater number of backcross generations. The expected amount of genotyping (EGk) behaves differently, however; for smaller values of p, EGk initially decreases with increasing r, reaches a minimum value and then starts increasing (Table 2). However, the initial decrease is not observed when p exceeds 0.216 (for the given values of λ, s).

Table 2 Expected number of animals to be genotyped (EGk) for breeding scheme (r), the minimum number (Nmin) of favourable animals required at the end of backcrosses being 50, α=0.05. (p=probability of inheriting the desired chromosomal region(s))

Fig. 3 shows the behaviour of E(Nt) over the backcross generations. The mean population trajectories do not converge to the same value for all values of r; Table 3 shows the means and variances of Nt at the fifth generation for a given Nmean or Nmin. With high r-values, the initial number of founders (2n) is relatively small, but the final mean (E(Nk)) tends to be higher, compared with the designs with a lower r-value. When the number of founders is small, there is relatively more variability in the number of favourable animals in F1 and subsequent backcross generations. Consequently, the mean in the target backcross generation, E(Nk), needs to be relatively high in order to ‘ensure’ that the minimum number of favourable animals is achieved.

Fig. 3 Expected number of favourable animals (E(Nt)) at each generation of backcrossing for different breeding schemes (r). Fixed parameters were Nmin=50 at k=5 backcross generations with risk α=0.05; λ=5; s=½; p=0.0686 (based on z=3 and d=20 cM). The broken line is drawn at Nmin=50.

Table 3 Means and variances of N5 for α=0.50 and α=0.05; z=3 and d=20 cM

As expected, in the third investigation (Table 4), increasing the number of animals required results in an increased amount of genotyping, as does reducing the risk level. Further, doubling the number of animals required doubles the amount of genotyping when Nmean is specified. However, when the risk is considered (i.e. for α<0.50), doubling Nmin less than doubles the amount of genotyping.

Table 4 Expected number of animals to be genotyped (EGk) for r=4, k=5, z=3 and d=20 cM †Nmin corresponds to Nmean when α=0.50.

Genotyping males only

Males in general have a higher reproductive rate than females and consequently it might be interesting to look at the situation in which only males are genotyped and selected. During the backcrossing generations these selected males can be mated to ungenotyped females from the recipient line. Males and females in the final backcross generation need to be genotyped and selected for the intercross. Table 5 gives the number of founder animals n that are needed in order to have a risk of α=0.05 of obtaining fewer than Nmin=50 animals after five generations of backcrossing for the alternative of genotyping all animals (rF=1) or males only (rF=0). For higher values of r (≥4) the expected amount of genotyping is smaller when only males are genotyped. However, if there is a limit to r, then Table 5 indicates there are situations where genotyping females as well would be optimal with respect to the amount of genotyping. Genotyping males only results in an increase in the number of animals being produced in the different generations, as illustrated by the number of founder animals n in Table 5.

Table 5 Expected number of animals to start the experiment (n) and expected number of animals to be genotyped (EGk) for a situation with (rF=1) and without (rF=0) genotyping and selection of females for α=0.05, Nmin=50, k=5, z=3 and d=20 cM

Discussion

The model developed in this study can be used to assist the experimentalist in designing an MAI experiment. It provides important insights into parameters that need to be considered and the scale at which the introgression needs to be carried out in order to be successful. When the aim is to minimize the amount of genotyping there might be options to reduce this amount by using an adaptive breeding scheme when r is lower than the maximum reproductive rate of males. Using low levels of r during the initial generations and high values in later generations will lead to smaller values of EGk than those given in Tables 1 and 2. The results obtained for a constant breeding scheme will serve as a good starting point for the calculations in that case. An additional reason for a departure from a constant breeding strategy might be if the breeder wishes to introgress two regions and exclude a third region. Once the breeder has eliminated that third region, p would be changed to the ‘original’ level for two regions. This situation can be accommodated by allowing p to vary over the generations using this adaptive approach. Setting the mean (Nmean) at 50, one would expect a number of progeny of 125 after one generation of intercrossing. Introgressing z=3 regions each of length 20 cM, then (½e−0.2)6=0.47% (corresponding to 0.6 animal) of the offspring is expected to be homozygous for the three-locus genotype. This demonstrates that nominating an expected number of 50 animals at the end of the backcross generations is too low. Now consider the case where the minimum (Nmin) is set at 50, and assume that the breeder used each male r=7 times during the backcrossing process (Table 3). This would result in approximately 1.7 animals, still a low number. Nevertheless, opportunities might exist for the breeder to repeat the intercross process, or additionally to intercross any heterozygous animals resulting from the intercross.
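The intercross arithmetic for the Nmean=50 case can be reproduced with a short Python sketch. It assumes that half of the 50 favourable animals are females, each mated once with mean litter size λ=5 (which gives the 125 progeny quoted), and that each parent transmits all three intact regions independently. The function name is illustrative.

```python
import math

def expected_homozygous(n_animals, lam, s, z, d):
    """Return (progeny, homozygous fraction, expected homozygous progeny) after one intercross."""
    females = n_animals * (1 - s)            # each female is assumed to be mated once
    progeny = females * lam
    # Probability that one parent transmits all z regions intact
    per_parent = (0.5 * math.exp(-d)) ** z
    fraction = per_parent ** 2               # both parents must transmit them
    return progeny, fraction, progeny * fraction

progeny, fraction, homozygous = expected_homozygous(50, lam=5, s=0.5, z=3, d=0.2)
print(progeny, round(100 * fraction, 2), round(homozygous, 1))
```

Replacing 50 with a larger expected number of favourable animals shows how many would be needed at the end of backcrossing to obtain a workable number of homozygous offspring.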

During the experiment, the experimenter will try to modify the scheme when the results would permit this. Genotyping of animals will be stopped as soon as sufficient animals with a desirable genotype have been found. It is clearly demonstrated that the amount of genotyping can be reduced by using such an adaptive breeding scheme (Table 5). There might be practical limitations in implementing a scheme which results in minimum genotyping. To make efficient use of genotyping facilities, animals are genotyped in batches rather than individually. Because of culling of surplus animals at an early age, it may be difficult to go back to genotyping more males or females if insufficient males with favourable genotypes are found. These practical constraints need to be taken into account in executing an MAI programme in order to make efficient use of experimental and genotyping facilities.

The numbers of founder animals from the donor as well as the recipient strain (n female and n male) are given as decimal numbers (Tables 1 and 3). In practice this cannot be realized. Under a constant breeding strategy, imposing a minimum value of 1 for n would lead to excessively large numbers of animals at the end of the experiment. Values smaller than 1 indicate that during the initial generations of the backcrossing process the number of selected animals should be kept constant instead of increased, perhaps by not using all the available favourable animals.

In MAI the major aim of backcrosses is the recovery of the recipient genome (Hospital et al., 1992; Hillel et al., 1993; Yancovich et al., 1996). The number of generations of backcrossing is determined by the desired proportion of genes coming from the donor line for other parts of the genome. With five backcrosses, the expected proportion of the donor genome in the recipient animal is still 1.562%, and it is 0.049% after 10 backcross generations (Soller & Plotkin-Hazan, 1977; Hillel et al., 1990). In this paper, selection is entirely based on the genotypic information for the identified regions. To reduce the number of backcross generations, genomic selection against the donor genes from other parts of the genome may be applied (Hospital et al., 1992; Hillel et al., 1993; Yancovich et al., 1996). Introgressing alleles simultaneously with genomic selection on the recipient genotype (Visscher et al., 1996) would be another way of reducing backcross generations. These techniques accelerate the recovery of the recipient genome but will lead to an increase in the number of animals to be produced within a generation and the amount of genotyping per animal; indeed the genomic selection is applied to those individuals known to carry the relevant segments.
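The quoted donor-genome proportions follow from halving the donor contribution at each backcross. A minimal Python sketch, assuming an F1 that is 50% donor and no selection on the unlinked part of the genome (the function name is illustrative):

```python
def donor_genome_percent(backcrosses):
    """Expected percentage of donor genome after the given number of backcrosses."""
    return 100 * 0.5 ** (backcrosses + 1)

for t in (5, 10):
    print(t, donor_genome_percent(t))
```

These are expectations for genome regions unlinked to the introgressed segments; linked regions retain donor material for longer.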

One of the goals of this study is to minimize the number of animals to be genotyped throughout the experiment. If we ignore the risk of obtaining too few animals in the target backcross generation, then Table 1 indicates that we need to select a strategy with a small n and large r. In fact the expected amount of genotyping would be minimized further by a low constant population size up to the second last generation (k−1), then the highest possible increase in the last generation (i.e. high r). However, when risk is considered, this may no longer be the optimal strategy. As is seen in Table 2, the expected amount of genotyping is minimized by selecting a low or intermediate value of r; higher values of r can lead to an increase in the amount of genotyping involved. An option to minimize the amount of genotyping would be to only type, select and use favourable males during the backcross process. From Table 5 it follows that when the reproductive rate of males is sufficiently large, the expected total amount of genotyping can be reduced by genotyping males only, at the expense of producing a larger number of individuals. In designing an experiment, the amount of genotyping needs to be balanced against the number of experimental animals needed. One will first determine the number of backcross generations that is needed to obtain the acceptable proportion of donor genome in the newly formed line. To determine the optimum choice for the other parameters one will need to incorporate the costs of producing and keeping an animal, the genotyping costs, the costs of other resources and the time required to achieve the target.

Alternatively, the choice of ‘optimum’ may also be driven by the genotyping capacity of the laboratory. Low values of r are usually associated with high rates of population decrease, resulting in a large volume of genotyping being conducted in the early generations (Figs. 2 and 3). Conversely, when risk is considered, a high value of r may lead to an increased amount of genotyping in the later generations (Fig. 3). Clearly, for the parameters nominated for Fig. 3, a choice of r=5 would be optimal in terms of approximately constant genotyping over all backcross generations.

This model was developed to assist in the design stage of a marker-assisted introgression experiment of trypanoresistance genes in mice. The findings are now to be exploited in this programme. More generally, this model, when implemented in a computer spreadsheet program, allows the breeder to assess the effect of changes in biological and design parameters on the population growth, as well as effects of random causes. Such a system could be incorporated into a computer-aided decision-support system for use by breeders. A spreadsheet program (Microsoft Excel) for this model is available from the corresponding author.