Introduction

Cutaneous malignant melanoma (CMM; MIM 155600) is common in the Australian state of Queensland, where a fair-skinned population lives at tropical and subtropical latitudes.1 Although fewer than 1% of cases in this population are strongly familial, the overall heritability of CMM is approximately 45%.2, 3, 4

The genes so far identified as involved in the aetiology of CMM either harbour rare, highly penetrant mutations discovered in high-risk families, such as those in the cell cycle regulators CDKN2A, CDK4 and INK4, or are pigmentation loci such as MC1R, whose common alleles underlie well-known phenotypic risk factors such as hair, skin and eye colour.5, 6, 7, 8

One of the greatest phenotypic risk factors for CMM is the number of acquired benign melanocytic naevi (moles) present on an individual's skin9 – presence of more than 4 moles (over 2 mm in diameter) on the arms increases CMM risk four-fold in pale-skinned individuals.10, 11 The exact nature of moles is still not completely clear. They are benign tumours of melanocytes and naevus cells that appear on the skin in childhood, increasing in number until early adulthood. Each mole is thought to represent the clonal expansion (10–23 doublings) of a single melanocyte12, 13, 14 harbouring a somatic mutation in a cell cycle regulatory or senescence gene.15, 16, 17 Melanocytes from up to 80% of moles carry BRAF or NRAS somatic mutations,18, 19, 20, 21 and transgenic fish expressing the commonest BRAFV600E mutant spontaneously develop naevus-like lesions.22

Twin analyses have shown naevus count to be more heritable than CMM risk per se.23, 24, 25, 26, 27 In carriers of the rare functional mutations of CDKN2A, nevogenesis, and thus total mole count, is generally increased.28 These naevi are often macroscopically unusual in appearance as well as exhibiting dysplastic histopathology. In a study of unselected adolescent twins,25 we showed a linkage of mole count and mole density to the region around CDKN2A on chromosome 9, though this finding was not replicated by a smaller study of older twins from the UK.27

In the present paper, we extend our previous report of linkage to markers near CDKN2A (chromosome 9p) and present results from the first whole genome scan of naevus count using a considerably expanded sample of adolescent twins and their families ascertained through Brisbane primary schools.

Results

Measured covariates and naevus counts

Box-Cox regressions, including either only an intercept or the covariates, found that a cube-root transformation of raised and flat naevus counts was superior to the more commonly used log transformation (Shapiro–Wilk test of normality of residuals, W=0.9977, P=0.1232). A negative binomial generalised additive mixed model for the covariates also fitted well, giving a very similar distribution of residuals to that obtained from the cube-root transformed Gaussian model. In addition, the transformed values output by the SQTL program were also virtually identical to those from the regression model for the cube-root transformed data. Naevus counts were higher in males than females, increased with age up to 16 years old, were lower in redheads, and higher in lighter skinned individuals. Parental and self-assessed sun exposure weakly increased naevus counts (see Table 1 and Figure 1).
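As a rough illustration of how a Box-Cox comparison of transformations works, the sketch below evaluates the Box-Cox profile log-likelihood for an intercept-only Gaussian model over a few candidate exponents (0 is the log transform, 1/3 the cube root). The counts, the +1 offset for zero counts and the grid of exponents are assumptions made for illustration only; they are not the study data, and the covariates used in the actual analysis are omitted.

```python
import math

def boxcox(y, lam):
    """Box-Cox transform of a positive value y for exponent lam."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def profile_loglik(values, lam):
    """Box-Cox profile log-likelihood for an intercept-only Gaussian model."""
    n = len(values)
    z = [boxcox(v, lam) for v in values]
    mean = sum(z) / n
    var = sum((x - mean) ** 2 for x in z) / n  # maximum likelihood variance
    jacobian = (lam - 1.0) * sum(math.log(v) for v in values)
    return -0.5 * n * math.log(var) + jacobian

# Hypothetical right-skewed counts (not study data).
counts = [0, 3, 8, 12, 15, 21, 27, 34, 40, 52, 66, 85, 110, 150, 240]
shifted = [c + 1 for c in counts]  # assumed offset so zero counts stay positive

for lam in (0.0, 1.0 / 3.0, 0.5, 1.0):
    print(f"lambda={lam:.3f}  loglik={profile_loglik(shifted, lam):.2f}")
```

The exponent with the highest profile log-likelihood is the one the data favour; a formal comparison would also examine the residuals, as done with the Shapiro–Wilk test above.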

Table 1 Fixed effect parameter estimates from negative binomial generalized additive mixed model for flat nevus count in twins versus significant covariates
Figure 1

Relationship between age and flat and raised mole counts from cross-sectional data (XS) on male and female adolescent twins and their siblings, including longitudinal data (Long) from two visits two years apart. Dense lines represent smoothed effects of age on mole count based on cross-sectional data from the first visit (these are fitted via locally weighted regression). The dotted lines represent changes in mole count from the first to the second visit for individuals who lie outside 1.5 times the interquartile range of counts.

Polygenic additive effects estimation

Multivariate genetic variance components for the three different types of naevus count were estimated by combining data from monozygotic and dizygotic twins and sibs in a classical twin family analysis. The proportions of variance due to additive polygenic effects for flat, raised and atypical mole counts were 57, 67 and 42%, respectively. There were moderate genetic correlations (0.40 between flat and atypical, 0.49 for flat with raised and 0.58 between raised and atypical) among the counts of the different types of naevus, as well as effects of family environment on flat and raised naevus counts (see Figure 2).

Figure 2

Multivariate genetic analysis (Cholesky decomposition) for flat, raised and atypical naevus count. A=additive polygenic effects, C=Common environmental effects and E=unique environmental effects. Standardized variance components contributions (proportion of total variance) are given beside paths, which are all positive.

Univariate genome scan results

In Figure 3 each plot displays lod scores for three different types of naevus counts, as well as a multivariate lod combining all three counts. The strongest evidence for linkage was obtained for flat mole count at 2p and in the region of the CDKN2A gene on chromosome 9 (for both, lod=2.95, genome-wide P=0.08). Inclusion of fine mapping markers increased the lod for 9p to 3.43, and that obtained from SQTL to 3.27 (Figure 4). The highest lod score for raised mole count was 1.87 (genome-wide P=0.34) on chromosome 16 near D16S3068. For atypical mole count, there were three hits with lods of 2.20 (genome-wide P=0.39), 2.00 (P=0.48) and 2.00 (P=0.49) on chromosomes 1, 8 and X, respectively. These and lesser peaks of interest are listed in Table 2.
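For orientation, a lod score can be converted to a likelihood-ratio chi-square and an asymptotic pointwise P-value as sketched below. This assumes the usual one-parameter variance component test, whose null distribution is a 50:50 mixture of a point mass at zero and a chi-square with one degree of freedom. It is not the procedure behind the genome-wide P-values quoted above, which are empirical and account for testing across the whole genome; the function names are illustrative.

```python
import math

def lod_to_chisq(lod):
    """Convert a lod score to the equivalent likelihood-ratio chi-square."""
    return 2.0 * math.log(10.0) * lod

def pointwise_p(lod):
    """Asymptotic pointwise P-value for a one-parameter variance component test.

    Assumes a 50:50 mixture of a point mass at zero and a 1-df chi-square,
    so the 1-df tail probability is halved.
    """
    if lod <= 0:
        return 1.0
    x = lod_to_chisq(lod)
    # Tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(x / 2.0))

for lod in (1.87, 2.95, 3.43):
    print(f"lod={lod:.2f}  chisq={lod_to_chisq(lod):.2f}  pointwise P={pointwise_p(lod):.2e}")
```

Pointwise values of this kind are much smaller than the corresponding genome-wide P-values, which is why the latter are reported for the univariate scans.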

Figure 3

Genome scan for flat, raised and atypical mole count. Variance components linkage test scores (expressed as P-values) for flat moles (red), raised moles (green), atypical moles (blue), and for the multivariate analysis of all three (black).

Figure 4

Variance components linkage analysis lod curves (MERLIN) for flat mole count around peak lods on four best chromosomes. For chromosome 9 (panel a), the maximum likelihood and semiparametric (SQTL) linkage lod scores are shown: black solid line, VC using 5 cM resolution marker set; red solid line, VC using scan and fine mapping markers; red broken line, semiparametric analysis using scan and fine mapping markers. Each x-axis tick represents a marker.

Table 2 Prominent linkage peaks and nearest marker for flat (F), raised (R) and atypical (A) mole counts

Multivariate genome scan results

The multivariate linkage results (see Figure 3, black curve) are consistent with the univariate results, notably for chromosome 9, but do show a chromosome 4 peak that is not seen for any of the individual phenotypes. For the chromosome 9 peak, the trait specific QTL heritabilities were 22, 7 and 5% for flat, raised and atypical naevus counts, respectively, at Visit 1, and 20, 0 and 18% at Visit 2. A model holding the heritabilities of the three types equal was rejected. Ideally, we would obtain the empirical genome-wide P-value, but for the multivariate case this would take many months of computational time, so we are restricted to the empirical pointwise P-value for the peak marker D9S925, which was 0.005, equivalent to a lod of about 2.0.

The peak markers on chromosome 4 were D4S406 and D4S402, with an empirical pointwise P-value of 0.001, equivalent to lod∼2.6. The trait specific heritabilities were 10, 0 and 2% for flat, raised and atypical naevus counts at Visit 1 (1, 1 and 3% at Visit 2).

Discussion

Regions linked to macular naevus count

Our phenotypic twin analyses of these data show that 60% of the variance in flat mole count is due to polygenic additive genetic factors, 30% to common family environment and 10% to unique environmental variance, including errors of measurement (which would contribute half of this29). The genetic correlation between the subcounts of different mole types was only moderate, suggesting one might expect linkage to different regions, as well as some loci in common to all three mole types.
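To make the variance partition concrete, the sketch below computes the twin correlations implied by the rounded 60/30/10 split under the standard ACE model, in which monozygotic twins share all additive genetic effects and dizygotic twins share half. The reverse calculation uses Falconer's formulas, a much simpler method than the structural equation modelling used in this study, and is shown only to illustrate the logic; the function names are assumptions.

```python
def expected_twin_correlations(a2, c2, e2):
    """Expected MZ and DZ correlations from standardised ACE components."""
    if abs(a2 + c2 + e2 - 1.0) > 1e-9:
        raise ValueError("standardised components must sum to 1")
    r_mz = a2 + c2        # MZ twins share all of A and C
    r_dz = 0.5 * a2 + c2  # DZ twins share half of A and all of C
    return r_mz, r_dz

def falconer_estimates(r_mz, r_dz):
    """Back-calculate A, C and E from twin correlations (Falconer's formulas)."""
    a2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    e2 = 1.0 - r_mz
    return a2, c2, e2

r_mz, r_dz = expected_twin_correlations(0.60, 0.30, 0.10)
print(f"expected rMZ={r_mz:.2f}, rDZ={r_dz:.2f}")
print("recovered A, C, E:", [round(v, 2) for v in falconer_estimates(r_mz, r_dz)])
```

With these rounded components the implied correlations are approximately 0.9 for monozygotic and 0.6 for dizygotic pairs.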

Likely candidate genes under linkage peaks for these phenotypes would include traditional oncogenes and tumour suppressors. As we have controlled for sun exposure, ancestry, facial freckling and skin and hair colouring, we would not expect any of these peaks to represent pigmentation genes such as MC1R and OCA2. A simple one-hit model for nevogenesis, following Blewitt,15 is that all common naevi represent effects of a single somatic mutation (such as BRAFV600E) in a melanocyte. In the absence of DNA repair and surveillance mechanisms, we might then expect the appearance of approximately 2000 naevi per individual (given a final melanocyte population of 2 × 10⁹ and spontaneous somatic mutation rate of 1 × 10⁻⁶). The far lower observed mean mole count probably reflects the usually high efficiency of DNA damage surveillance and repair, essential in melanocytes given their role in photoprotection and known resistance to apoptotic signals following UV exposure.30 Environmental exposures that affect the somatic mutation rate (such as UV exposure) and genetic variation in known and yet to be characterised tumour suppressor genes will explain both interindividual difference and the considerable familial correlations in mole count.
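The figure of approximately 2000 naevi follows directly from multiplying the two quantities given in the text, as the short sketch below shows. The function name is illustrative, and the calculation assumes, as the one-hit model does, that every mutant melanocyte gives rise to a naevus.

```python
def expected_naevi(melanocytes, mutation_rate):
    """Expected number of melanocytes carrying an initiating somatic mutation."""
    return melanocytes * mutation_rate

# Values quoted in the text.
FINAL_MELANOCYTES = 2e9
SOMATIC_MUTATION_RATE = 1e-6

print(round(expected_naevi(FINAL_MELANOCYTES, SOMATIC_MUTATION_RATE)))  # prints 2000
```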

The most significantly linked regions for flat mole count were: the telomeric region of chromosome 2p; chromosome 9p in the vicinity of CDKN2A; around D8S373 on chromosome 8q24.3; and chromosome 17p11, in the vicinity of the CMT1A duplication.

In an earlier study using a subset of these families, we detected linkage of mole count to the microsatellite marker D9S942 close to CDKN2A.25 In the present superset of families, using the 5 cM resolution marker set, evidence for linkage to chromosome 9p21 using only the genome scan markers peaked at a lod of 2.95. However, replicating the original single-point variance components analysis using the present set of families and the marker D9S942 obtained a lod score of 3.44, similar to that from multipoint analysis including additional fine-mapping markers in and around CDKN2A (lod=3.42; with SQTL lod=3.23). This lod score is of the same magnitude as that reported by Zhu et al25 despite a doubling in sample size. However, detailed analysis of the mole count data demonstrates that a cube root transformation is more appropriate than the log transform we had used earlier and produces more conservative lod-scores.

The drop-1 lod interval on the tip of chromosome 2p spans approximately 6 Mbp and contains numerous genes of unknown function. The region does contain the p53-responsive gene 2 (PRG2), one alias for which is ‘melanoma associated gene 50’,31 but tissue expression data and the known functions of the Drosophila homologue, peroxidasin, do not support this as a likely naevus gene.

There are 105 recognised genes within the confidence region on chromosome 8 in SWISS-PROT, TREMBL, mRNA and/or RefSeq. These include some members of the mitogen-activated protein kinase (MAPK) pathway, PTK2 (protein tyrosine kinase 2) and ERK8.

Among the known genes underlying the central region (11.6–14.1 Mbp) of the chromosome 17 peak, there are two good candidates for moliness, MAP2K4 and ELAC2. Mitogen-activated protein kinase kinase 4 (MAP2K4/MKK4/SEK1) lies at 11.86 Mb and is a potent physiologic activator of the stress-activated protein kinases implicated in breast and pancreatic cancer.32, 33 It is in the JUN kinase pathway, which is related to the RAS/RAF/MAPK pathway, which in turn is heavily implicated in melanoma.18, 19, 34, 35 The recent work of Solit et al36 suggests cells carrying a BRAF mutation usually retain sensitivity to growth inhibition via blocking of the RAS/RAF/MEK/ERK pathway. ELAC2 (at 12.83 Mb, in the middle of the peak) has been identified as an hereditary prostate cancer gene.37 Meta-analyses have found significant association between variants in ELAC2 and prostate cancer risk,38, 39 although the risk genotypes may cause only 2% of the population risk. Significantly, there is epidemiologic evidence of a link between familial prostate cancer and melanoma.40, 41 The other obvious chromosome 17 candidate, TP53, is approximately 6 Mbp distal to the most telomeric marker D17S299 of that interval.

Finally, the peak observed on chromosome 4 in the multivariate analysis was unusual, in that there were no corresponding peaks in the univariate analyses. As in the case of other peaks, the multivariate signal was being driven largely by flat mole count (with a specific QTL heritability of 10%). A gene of possible interest in this region is TIFA (TRAF-interacting protein with a forkhead-associated domain), in that it lies almost directly under the peak, and modulates TRAF6, involved in NF-κB activation; NF-κB is upregulated in dysplastic naevi and malignant melanomas.42

Higher heritability not associated with higher linkage peaks

By contrast with the results for flat mole count, and despite a slightly higher heritability, the genome scan showed only weakly suggestive linkage peaks for raised mole count on chromosomes 16p pericentromerically, and on distal 7p and 2q, none of which overlap with those for flat naevi. These peaks do not overlie obvious candidate genes. It is not clear whether this represents a genuine difference between the genetic architectures of the development of macular versus papular naevi. There is surprisingly little known about the natural history of flat and raised naevi. It is believed that raised naevi represent a later stage of development of flat naevi, but this belief is based on relatively small studies. Even though the sample size used in this study was larger than in the majority of genome scans of any trait to date, it is still well short of that required to reliably detect loci of even moderate effect.

As this paper was in the process of revision, Falchi et al62 published a similar genome scan study of naevus count in 865 adult twin families. Using 194 families (141 DZ) where the twins were under the age of 35, they found evidence of linkage of log total naevus count to chromosomes 9p (lod=2.54 at D9S157) and 9q (lod=2.55 at D9S167) and to chromosome 5q (lod=3.47 at D5S638). These peaks did not appear in the older families, where the best linkage was to chromosome 2p25 (lod=2.75 at WIAF-933).

In conclusion, ours is a large study using a complete genome scan in an attempt to map genes responsible for three different types of naevus in adolescent twins. The peak on 9p21 lies directly over CDKN2A, which we have previously implicated in control of variation in flat mole count,25 and which is now confirmed by Falchi et al.62 We also find linkage of naevus count to multiple other regions, some not previously linked to melanoma, which may contain novel melanoma risk loci. Continuing expansion of our sample along with a planned genome-wide association scan will clarify the identity and importance of some of these loci.

Materials and methods

Twin sample

Twins were recruited in the context of an ongoing study of melanoma risk factors including benign melanocytic naevi (moles), sun exposure time and pigmentation related variables. The clinical protocol has been described in detail elsewhere.25, 43, 44 Briefly, twins were enlisted by contacting the principals of primary schools (first 7 years of education) in the greater Brisbane area, media appeals and by word of mouth. It is estimated that approximately 50 percent of the eligible birth cohort were recruited into the study. Twins were examined at age 12 years, and siblings at the same occasion if under 20 years of age, as described below. At the same time, twins and their parents completed questionnaires measuring risk factors, which we supplemented by sun diaries recorded at age 13 years. The examination of the twins was repeated at age 14 years, but not in the siblings.

The sample appears representative with respect to naevus count25 and IQ45 and it seems reasonable, therefore, to suppose it also representative of the Queensland population. Informed consent was obtained from all participants and parents before testing. This report is based on phenotype data collected from the study inception in May 1992 to February 2004.

Phenotype information

A nurse examined each individual and counted all naevi on the entire body surface, excluding buttocks, chest and abdomen. The naevi were counted and classified into three types: flat (macular), raised (papular) or atypical. The flat and raised naevus counts have a distribution with a long tail to the right, so for most purposes they are cube-root transformed to stabilise variances and improve normality. As atypical naevi were infrequent, counts of these have been binned into four categories (nil, 1–2, 3–4, five or more) and analysed as an ordinal variable.
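The two phenotype codings described above can be sketched as follows. The function names and the numeric codes 0 to 3 for the ordinal categories are assumptions for illustration; the text specifies only the category boundaries.

```python
def cube_root(count):
    """Cube-root transform used for flat and raised naevus counts."""
    if count < 0:
        raise ValueError("counts cannot be negative")
    return count ** (1.0 / 3.0)

def atypical_category(count):
    """Ordinal bin for atypical naevus count.

    Assumed coding: 0 = nil, 1 = 1-2, 2 = 3-4, 3 = five or more.
    """
    if count < 0:
        raise ValueError("counts cannot be negative")
    if count == 0:
        return 0
    if count <= 2:
        return 1
    if count <= 4:
        return 2
    return 3

for n in (0, 1, 4, 9):
    print(n, round(cube_root(n), 3), atypical_category(n))
```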

The covariates used in the QTL model included sex, age, body surface area, hours of sun exposure, year and season examined and ancestry. Pigmentation variables such as skin colour, hair colour and freckling on the face were also used as covariates in the model, as detailed in Zhu et al.25

Naevus count and genotyping were available for 424 twin families (355 DZ, 69 MZ). Genotypes were available for both parents in 308 of these families, for one parent only in 74 families, and for neither parent in 42 families. One or more extra siblings were both phenotyped and genotyped in 133 DZ families and, of necessity for linkage analysis, all 69 MZ families. The number of offspring with unique genotypes (ie omitting MZ co-twins) was 1024 (874 from DZ families, 150 from MZ families). An additional 221 pairs of MZ twins have phenotypic (but no genotypic) information available, and inclusion of this group in the variance components model allows us to partition and stabilise the estimation of the proportion of variance due to common environmental effects.46

Zygosity and genotype information

Zygosity of same-sex twins was determined by typing nine independent DNA microsatellite polymorphisms plus the sex marker amelogenin at QIMR using the Profiler multiplex marker set (AmpFLSTR® Profiler Plus™, Applied Biosystems, Foster City, CA). The probability of dizygosity given concordance for all markers in our panel was <10⁻⁴. All twins and available parents were also typed for ABO, Rh and MNS blood groups by the Red Cross Blood Service, Brisbane.

Genome scans were carried out at the Australian Genome Research Facility, Melbourne (AGRF) and Center for Inherited Disease Research, Baltimore (CIDR). After integrating both genome scans, the marker set consisted of 762 highly polymorphic autosomal and 34 X-linked microsatellite markers at an average spacing of 4.8 cM. The average heterozygosity of markers was 0.78, and the mean information content was 0.77. Full details of the genome scan are provided elsewhere.47

In the region around CDKN2A, since we had previously observed linkage to mole count25 and because of the plausibility of this gene as a candidate, we genotyped an additional 19 microsatellite (D9S104, D9S126, D9S1604, D9S162, D9S1678, D9S169, D9S1748, D9S1749, D9S1875, D9S2136, D9S319, D9S52, D9S736, D9S942, D9S958, D9S974, D9S975, D9S976, IFNA) and SNP (CDKN2A*-981G>T, rs3731249, rs3731249, rs3814960, rs2069426, rs115115, rs3088440, rs12353062, rs10967819, rs1360589, rs1333039, rs1333050, rs1360590, rs1412829, rs1537370, rs1537371, rs1537375, rs1537378, rs1547704, rs1556516, rs1591136, rs1853186, rs2106119, rs2157719, rs7032979, rs7853090, rs7866783, rs944797, rs944800, rs974336) markers.

Statistical methods

In preliminary analyses, we tested various transformations of the mole count phenotypes, and performed generalized additive mixed model regression analyses versus the measured covariates, including family as a random effect. These analyses were carried out using the R statistical package,48 and specifically the mgcv package.49

The genotype and pedigree data were tested for errors: Mendelian inconsistencies were checked using Sib-pair;50 GRR51 and Relpair52 were used to identify pedigree errors such as those due to sample mix up; potential genotyping errors were detected using MENDEL,53 MERLIN54 and SIBMED.55 The multipoint identity by descent (IBD) probabilities were estimated by using MERLIN for autosomal markers and MINX54, 56 for chromosome X markers. For the fine mapping markers around CDKN2A, we used the new ‘cluster’ approach implemented in MERLIN 1.0 to form haplotypes from markers in linkage disequilibrium before IBD estimation. Structural equation modelling analyses were performed using the Mx software package57 as described elsewhere.25, 47, 58 Both univariate (each mole count variable in turn) and multivariate (flat, raised and atypical mole count) variance components linkage analyses were performed, using MERLIN 1.0 (univariate), and Mx and MENDEL 6.0 (multivariate). The multivariate analyses incorporate measures from both visits by the twins, so there are up to six mole count variables in these analyses. The genetic and environmental covariances between mole counts at the two visits were left unconstrained, to allow for the presence of gene by age interaction. That is, we have fitted repeated measures models with random regression terms for visit, thus incorporating the data from both occasions, but allowing for the testing for genetic heterogeneity (which was absent).
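For readers less familiar with variance components linkage analysis, the sketch below writes out the conventional expected covariance for a pair of relatives at a test locus: the QTL variance weighted by the estimated proportion of alleles shared IBD, the polygenic variance weighted by twice the kinship coefficient, plus shared environment. This is the standard textbook parameterisation, not necessarily the exact form fitted by MERLIN, Mx or MENDEL, and the function name and numeric values are hypothetical.

```python
def pair_covariance(pi_hat, kinship, var_qtl, var_poly, var_common, var_unique):
    """Expected 2x2 trait covariance matrix for two relatives reared together.

    pi_hat  : estimated proportion of alleles shared IBD at the test locus
    kinship : kinship coefficient (0.25 for full sibs and DZ twins, 0.5 for MZ twins)
    """
    total = var_qtl + var_poly + var_common + var_unique
    cov = pi_hat * var_qtl + 2.0 * kinship * var_poly + var_common
    return [[total, cov], [cov, total]]

# Hypothetical standardised components: QTL 0.2, polygenic 0.4, common 0.3, unique 0.1.
dz_pair = pair_covariance(0.5, 0.25, 0.2, 0.4, 0.3, 0.1)
mz_pair = pair_covariance(1.0, 0.5, 0.2, 0.4, 0.3, 0.1)
print("DZ pair sharing half their alleles IBD:", dz_pair)
print("MZ pair:", mz_pair)
```

Linkage is tested by comparing the likelihood of a model with the QTL variance estimated freely against one in which it is fixed at zero.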

We have also analysed these data using the recently described semi-parametric variance components linkage analysis approach, implemented in the program SQTL.59 In this approach, optimal transformation of the data to fit the underlying hypothesis of multivariate normality of the variance components analysis is included in the semiparametric likelihood evaluated.60

For the univariate genetic analyses, we calculated genome-wide significance levels based on analysis of 1000 gene-dropping simulated datasets generated by the MERLIN computer program. The highest lod score for each chromosome was kept and the number of false positives counted as per Kruglyak and Daly.61
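A minimal sketch of the empirical genome-wide P-value calculation follows, assuming each simulated dataset has been reduced to its list of per-chromosome maximum lod scores. The toy replicates are invented for illustration (the study used 1000 gene-dropping simulations from MERLIN), and the simple proportion used here is one of several reasonable estimators.

```python
def genomewide_p(observed_lod, replicates):
    """Proportion of null scans whose genome-wide maximum reaches observed_lod.

    replicates: one list of per-chromosome maximum lod scores per simulated scan.
    """
    if not replicates:
        raise ValueError("at least one simulated scan is required")
    maxima = [max(chrom_maxima) for chrom_maxima in replicates]
    hits = sum(1 for m in maxima if m >= observed_lod)
    return hits / len(maxima)

# Hypothetical per-chromosome maxima from four simulated scans (not study data).
toy_replicates = [
    [0.5, 1.2],
    [3.1, 0.2],
    [0.9, 0.4],
    [2.0, 2.96],
]
print(genomewide_p(2.95, toy_replicates))  # 2 of 4 scans reach 2.95, so prints 0.5
```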

The empirical thresholds for suggestive linkage (one expected false positive per genome scan) were ∼1.7 for flat and atypical mole count, and ∼1.4 for raised mole count. The thresholds for significant linkage (one expected false positive per 20 genome scans) were ∼3.3 for flat, ∼3 for raised and ∼3.65 for atypical mole counts. Production of genome-wide thresholds was not feasible for the multivariate linkage analyses, as they are computationally intensive, but we did estimate a point-wise empirical P-value for the highest peaks and several other comparison regions to test the adequacy of the asymptotic P-values generated assuming the test statistic was χ² with 6 degrees of freedom. These simulations found the test statistic was better described as a χ² variate with 8.9 degrees of freedom, and this interpolation has been used to adjust the plotted asymptotic P-values for the multivariate analysis.
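The suggestive and significant thresholds can be read off the simulated scans by counting false positives per scan, as sketched below. Treating each chromosome's maximum lod as one potential false positive is a simplification, and the toy replicates, grid step and function names are assumptions for illustration only.

```python
def false_positives_per_scan(threshold, replicates):
    """Mean number of chromosome peaks at or above threshold per simulated scan."""
    total = sum(1 for rep in replicates for peak in rep if peak >= threshold)
    return total / len(replicates)

def empirical_threshold(target_rate, replicates, step=0.01, max_lod=10.0):
    """Smallest grid lod at which the false-positive rate is at most target_rate."""
    for i in range(int(round(max_lod / step)) + 1):
        threshold = round(i * step, 10)
        if false_positives_per_scan(threshold, replicates) <= target_rate:
            return threshold
    return None  # no threshold on the grid is strict enough

# Hypothetical per-chromosome maxima from four simulated scans (not study data).
toy_scans = [
    [0.505, 1.215, 0.105],
    [3.105, 0.205, 0.305],
    [0.905, 0.405, 1.805],
    [2.005, 0.705, 0.205],
]
suggestive = empirical_threshold(1.0, toy_scans)    # one false positive per scan
significant = empirical_threshold(0.05, toy_scans)  # one per 20 scans
print(suggestive, significant)
```

With only four toy scans the "significant" threshold is simply the first grid value above every simulated peak; a realistic estimate needs many replicates, such as the 1000 used here.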