Main

Human breast carcinomas have heterogeneous pathologies and molecular profiles, and in this respect it is plausible that breast cancer might be more similar to haematological malignancies than to other common epithelial cancers. Indeed, unlike colon cancer or pancreatic cancer, in which in virtually all tumours mutation within a single pathway has a dominant role during tumour progression1,2,3, in breast tumours no single dominant pathway or histological presentation has emerged4. Characterization of the chromosomal aberrations, gene mutations and gene expression profiles of breast tumours has shown that breast tumorigenesis does not necessarily progress in a stepwise linear fashion from well-differentiated to poorly differentiated tumours5,6,7. Human breast tumours have been historically categorized into approximately 18 subtypes according to the histological features of the primary tumour at the time of diagnosis4. Features used in the classification of breast tumours include lesion size, cellular arrangement patterns and the presence of necrosis, as well as cellular features such as nuclear grade and mitotic index. Clinical responses of patients to therapy reveal that although this method of tumour classification has prognostic value, considerable heterogeneity in response to therapy still exists8. As a result there has been considerable effort in the breast cancer research community to identify biomarkers that more accurately predict patient outcome (see Brenton et al. for a review9).

The classification of human breast tumours on the basis of histological criteria is confounded by a number of factors. These include scoring subjectivity, regions of distinct morphologies within a given tumour and a general inexactitude in the classification of breast tumours. This is exemplified by the observation that 60–70% of all breast tumours are ultimately classified as invasive ductal carcinoma not otherwise specified (IDC NOS), also designated breast carcinoma of no special type (breast carcinoma NST). Another difficulty in the categorization and understanding of the origins of breast cancer has been the bewildering number of genes that are associated with the disease10 — it is uncommon for different tumours to share mutations in the same gene (with the exception of TP53 and PIK3CA (phosphatidylinositol 3-kinase, catalytic, α-polypeptide)). Also, there is a lack of understanding of the target cells involved. Unlike the haematopoietic system, the cellular hierarchy that is present in the mammary epithelium is still only partially understood, and so it has been difficult to relate a given mutation or cancer cell property to a specific type of breast epithelial cell. This situation is exemplified by the fact that breast cancer cell lines are commonly used as a model in breast cancer research, yet until recently there has been little regard as to the type of breast cell (for example, luminal or basal) under study.

Recent gene expression profiling by microarray analysis has offered a new way to classify human breast tumours11,12,13,14. Classifying them according to the levels of mRNA expression of specific genes typically identifies at least five reproducible subtypes: luminal A, luminal B, ERBB2, basal and normal-like. Correlating this type of classification system with the traditional method based on tumour histology has revealed that some tumours that are classified according to their morphology correlate with a particular gene expression subset, whereas others do not11,15,16. This new approach of categorizing breast tumours represents a paradigm shift in how we consider the origins and categorization of breast cancer. Indeed, gene expression patterns of luminal and basal cells in vivo were used to establish the definition of the original molecular portraits of breast cancers11,17. However, in order to make sense of this new classification system it is important to understand the cellular hierarchy that is present in the normal mammary epithelium. For example, some human breast tumours are classified as being luminal A and others luminal B, but is there a normal cellular counterpart of a luminal A or a luminal B cell, and if so, what is it?

Mammary epithelial cell hierarchy

The human mammary gland is a compound tubulo-alveolar gland that consists of two general lineages of epithelial cells: luminal cells and myoepithelial cells (Fig. 1a). Electron microscopic studies have long identified heterogeneity within the mammary epithelium, with a number of cells having an undifferentiated morphology. Among these are a small number of basally positioned small undifferentiated electron-lucent cells (small light cells) that may represent a mammary gland stem cell (MaSC). The study of the stem cells in the human mammary epithelium has been hampered by the lack of a validated in vivo xenotransplantation assay that can be used to detect human mammary repopulating units (MRUs). However, in vitro colony assays have been used to detect progenitor cells within the mammary gland, including progenitors that have the ability to generate progeny of multiple lineages18,19,20. These multilineage progenitors, which are perceived to reside in a suprabasal position in the ducts of the mammary glands, have a keratin (K) 19+K14+EpCAMhighCD49f+MUC1−SSEA-4high phenotype and are thought to represent MaSCs because they generate multilineage ductal lobular outgrowths in reconstituted basement membrane cultures20,21.

Figure 1: Proposed epithelial cell hierarchy and the respective cellular phenotypes present in the human and mouse mammary glands.

Arrows indicate the potential target cells of oncogenic mutations of TP53 and BRCA1 (breast cancer 1) in the human and Wnt and Erbb2 in the mouse. ER, oestrogen receptor; PR, progesterone receptor.

Surrogate stem cell assays that are based on the ability of cells to generate clonal 'mammospheres' under anchorage-independent culture conditions have also identified cells that appear to have properties of MaSCs (multilineage differentiation potential and self-renewal)22. These mammospheres are highly enriched in multilineage progenitors, which suggests that a subpopulation of these progenitors could be a candidate for the MaSC. Unfortunately, the relatively short time period over which these various in vitro approaches can maintain proliferation (3 weeks) makes it difficult to conclude that the candidate cells identified are MaSCs, as opposed to multilineage progenitors that are at an intermediate position in the developmental hierarchy. Recent advances in xenotransplant systems offer the means to more definitively identify a human MaSC candidate that is defined by its ability to generate clonal multilineage outgrowths over extended periods of time in primary and secondary in vivo assays23,24,25.

In addition to multilineage progenitors, two types of luminal-restricted progenitors and one type of myoepithelial-restricted progenitor have been identified18,19,20. Luminal-restricted progenitors are perceived to reside in the luminal cell compartment of the ducts as they express the luminal marker mucin 1 (MUC1)19,20 and can be subdivided into two categories, those that are K19+K14− and those that are K19−K14− (Ref. 20). Unfortunately, the respective contributions of these progenitors during the development and maintenance of the mammary ductal lobular epithelium are not well understood. A non-clonogenic luminal epithelial cell population can also be identified19. These cells have an EpCAM+MUC1+CD49f− phenotype. Myoepithelial cells can be isolated on the basis of an EpCAMlowMUC1−CD49f+ phenotype19,20.

In the mouse, cells that can generate ductal lobular outgrowths in vivo (MRUs) have also been isolated and enumerated26,27 (Box 1). These MRUs are identified by the absence of haematopoietic (CD45 (also known as PTPRC) and Ter119) and endothelial (CD31; also known as PECAM1) markers, together with the expression of CD24, low levels of Sca-1 and high levels of α6- and β1-integrins (CD24+Sca-1lowCD49fhighCD29high). The high expression of CD49f (also known as ITGA6) and CD29 (also known as ITGB1) by the MRUs suggests that MaSCs have a basal position within the mammary epithelium. This and other data (Box 1) indicate that these MRUs meet the functional definition of an MaSC. Transplantation of donor cells at limiting dilutions into cleared fat pads gives an estimated MRU frequency in young adult nulliparous mice of about 1 in 1,400 total mammary cells, which corresponds to about 1,400 MRUs per mouse mammary gland26.
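
Limiting-dilution estimates of this kind are commonly based on a single-hit Poisson model: if MRUs occur at frequency f among the transplanted cells, the probability that a fat pad receiving d cells shows no outgrowth is exp(−f·d). The Python sketch below illustrates how f could be estimated by maximum likelihood under that assumption. The function names and the transplant counts are hypothetical illustrations, not data or methods taken from the cited studies.

```python
import math


def log_likelihood(freq, doses, n_transplants, n_positive):
    """Log-likelihood of the single-hit Poisson model at frequency `freq`.

    A transplant of d cells is positive with probability 1 - exp(-freq * d).
    """
    total = 0.0
    for d, n, k in zip(doses, n_transplants, n_positive):
        x = freq * d
        if k > 0:
            # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x.
            total += k * math.log(-math.expm1(-x))
        # Negative transplants each contribute log(exp(-x)) = -x.
        total -= (n - k) * x
    return total


def estimate_frequency(doses, n_transplants, n_positive,
                       lo=1e-8, hi=1.0, iters=200):
    """Maximum-likelihood frequency found by golden-section search on log(freq).

    The search is bounded by [lo, hi]; if every transplant is positive the
    estimate is unbounded and the function simply returns a value near `hi`.
    """
    a, b = math.log(lo), math.log(hi)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0

    def ll(t):
        return log_likelihood(math.exp(t), doses, n_transplants, n_positive)

    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if ll(c) > ll(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return math.exp((a + b) / 2.0)


if __name__ == "__main__":
    # Hypothetical experiment: cells per fat pad, pads injected, pads with outgrowths.
    doses = [500, 1000, 2000, 4000]
    injected = [10, 10, 10, 10]
    positive = [3, 5, 8, 9]
    f = estimate_frequency(doses, injected, positive)
    print(f"Estimated MRU frequency: 1 in {1.0 / f:.0f} cells")
```

For a single dose the estimate has the closed form f = −ln(1 − k/n)/d, which is a convenient check on the numerical search. The model assumes that one MRU is sufficient for an outgrowth and that engraftment is fully efficient; if engraftment is inefficient, the true frequency is underestimated.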

Progenitor cells, which are defined as any cell with a proliferative capacity, can also be detected within the mouse mammary epithelium by the use of in vitro colony assays19,26,27,28,29. Interestingly, a large pool (>10,000 cells) of luminal-restricted progenitors within the luminal (CD24highCD49flowCD29low) cell compartment of the mouse mammary gland has been identified and these cells have a phenotype (CD24highCD49flowCD61+CD14+) that is distinct from that of the stem cells (Refs 26,29; J.S. and C. Watson, unpublished observations). A population of progenitors in the basal (CD24+CD49fhighCD29high) cell compartment of the mouse mammary gland has also been detected by colony assays; however, the phenotypic profile that allows their separation from MaSCs has yet to be determined27. Differentiated myoepithelial cells in the mammary gland can be purified on the basis of their CD24lowCD49f+ phenotype26,30.

ER and the mammary epithelium

The expression of the oestrogen receptor (ER) is the single most important predictive marker in anti-oestrogen-based therapies for the treatment of breast cancer, but surprisingly the distribution of the ER among cells of the mammary epithelial cell hierarchy is poorly understood. Stem cells in the mouse mammary gland do not express ER; instead, ER is expressed by approximately 40% of luminal epithelial cells31. Fluorescence-activated cell sorting of mouse mammary epithelial cells using CD133, CD24 and Sca-1 has revealed that the luminal cell compartment can be subdivided into two lineages: those cells that are enriched for ER expression (CD133+Sca1+CD24high) and those that are enriched for milk protein expression (CD133−Sca1−CD24high)30.

Interestingly, when assayed for progenitor content, the ER+ cell population is relatively deficient when compared with its milk-protein-expressing counterpart. Considering this, the milk-protein-expressing progenitor cells may represent alveolar precursors that are stimulated to divide during pregnancy. Another study, using CD61 as a marker of luminal-restricted progenitors, also observed the CD61− subpopulation as enriched for ER-expressing cells29. These observations, along with the observation that ER-expressing cells are rarely found to be dividing in both adult mouse and human mammary tissue, raise the possibility that ER+ cells represent a relatively differentiated cell that has a limited proliferative capacity32. However, the idea that ER+ cells are terminally differentiated cells is difficult to resolve considering the paradoxical observation that ER+ tumours contain proliferating ER+ cells that are responsive to anti-oestrogen therapies such as tamoxifen33. One explanation for this paradox is that ER expression and cell proliferation are not mutually exclusive, and that cellular intermediates that express ER and have a limited proliferative capacity exist. Another possibility is that there is a second type of ER+ cell that functions as a primitive progenitor in the luminal cell compartment and is difficult to detect in quiescent tissues34. A recent report in the literature suggests that this could be the case, as a cell that expresses ER and retains its template DNA strand during cell division has been identified35.

Asymmetrical DNA segregation during cell division is a property normally ascribed to stem cells36,37. Although these rare asymmetrically dividing cells are unlikely to represent true stem cells because they express ER (and thus are deficient in generating ductal lobular outgrowths in vivo31), they might represent a pool of primitive progenitors with stem cell properties (for example, self-renewal) in the luminal cell compartment. Further purification and characterization of these cells and other ER-expressing cells will be required to fully understand the oestrogen biology of the mammary epithelium and to fully understand the role of ER as a predictive marker in breast cancer.

Interpreting gene expression profiles

Classifying human breast tumours according to mRNA expression levels has identified at least five reproducible subtypes of breast cancer: luminal A, luminal B, ERBB2, basal and normal-like. A similar, but not identical, classification system has also been defined on the basis of protein expression using tissue microarray analysis38,39,40,41. Because cancer is essentially organogenesis gone awry, the cellular categories that are defined by these gene expression profiles might just reflect the different lineages and stages of mammary epithelial cell differentiation that are present in the normal environment. Unfortunately, the corresponding cellular equivalents in the normal epithelium are not known because gene expression analyses of subtypes of luminal and basal progenitor and non-progenitor cells have not been published. It would be interesting to determine, for example, whether luminal A tumours, which have a much better prognosis and higher levels of ER expression than luminal B tumours12,13,14,16,42,43, are composed of relatively well-differentiated cells of the ER+ lineage, whereas luminal B tumours have a larger component of primitive ER+ progenitors. This concept is supported by the fact that GATA3, a member of the gene cluster that defines luminal A tumours, is the key transcription factor that determines luminal cell differentiation from luminal-restricted progenitors in both embryonic and post-natal murine mammary glands29,44. Another interpretation of the gene expression signatures of tumours is that these represent novel signatures that do not have a normal cell equivalent, but instead are the unique gene expression pattern of transformed cells that have evolved over decades of tumour progression.

Methods of classifying human breast tumours that are based on gene expression profiles could theoretically allow the ontological history of the tumour to be determined. For example, this gene expression classification strategy could mean that each of these types of human breast tumour originates in one of five different types of cell and that the tumour generated is a reflection of that cell type. Indeed, a precedent for this paradigm — that the biology of a malignant cell is indicative of its non-transformed cellular progenitor — has been established for diffuse, large B-cell lymphomas, with one type expressing genes that are characteristic of germinal centre (lymphoid-tissue-residing) B cells and a second type expressing genes that are normally induced during in vitro activation of peripheral B cells isolated from the blood45. However, it could also mean that the tumours are initiated in a rare cell type such as a stem cell, with the specific combination of mutations driving the malignant clone down a specific differentiation pathway that is reflected in the tumour phenotype.

Another possible scenario is that the initial mutation occurs in the stem cell, but subsequent mutations in downstream non-stem cell progeny are required for the acquisition of the malignant phenotype. An mRNA analysis of a tumour cell population represents the average mRNA levels that are found in a heterogeneous population of cells. Although informative, the ultimate imperative in terms of obtaining a curative treatment is identifying and targeting the tumour stem cells. Whether or not each molecular subtype of human breast cancer is propagated by a tumour stem cell that has similar properties between all the subtypes or if each subtype has its own unique tumour stem cell remains to be resolved.

In the initial landmark paper by Al-Hajj and colleagues, the human mammary tumour-initiating cell, as detected by transplantation into immune-deficient recipient mice, had an EpCAM+CD44+CD24−/low phenotype in 8 out of 9 tumour specimens sampled46. Assuming that the in vivo engraftment followed single-hit engraftment kinetics and was efficient, the frequency of these tumour-initiating cells was <1% within the EpCAM+CD44+CD24−/low subpopulation, which itself is just a subpopulation of the total tumour cell population. Therefore, the frequency of tumour-initiating cells is very low among the total number of cells in the tumour. Similar observations have been made for various human tumour types, including acute myeloid leukaemia and colon, brain and pancreatic tumours47,48,49,50. Owing to the low frequency of these tumour-initiating cells, the gene expression profile of these cells is likely to be masked when tumours as a whole are analysed for gene expression. However, it may be that the detection of human tumour stem cells by transplantation into immune-deficient mice is inefficient and that the true frequency of these tumour stem cells within tumours is higher than originally perceived51. If this were the case, this could explain why the signature derived from breast cancer stem cells may have prognostic value52, although this is difficult to reconcile with the significant molecular heterogeneity of breast cancers and the difficulty in separating prognostic signatures from the ER status of tumours43. Regardless, it is imperative that strategies to purify tumour stem cells from each of the five molecular subtypes of breast cancer be developed such that meaningful gene expression profiles of these cells can be obtained.
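
The masking effect described above follows from simple arithmetic: a whole-tumour measurement is a frequency-weighted average over all the cells in the sample. The short Python sketch below, which uses hypothetical function names and illustrative numbers that are not taken from the cited studies, shows the apparent fold change seen in a bulk sample when a gene is overexpressed only in a rare subpopulation.

```python
def bulk_expression(fraction_rare, expr_rare, expr_other):
    """Average expression of a gene in a mixture of two cell populations.

    fraction_rare: fraction of cells in the rare population (0 to 1).
    expr_rare, expr_other: per-cell expression in the rare and remaining cells.
    """
    if not 0.0 <= fraction_rare <= 1.0:
        raise ValueError("fraction_rare must lie between 0 and 1")
    return fraction_rare * expr_rare + (1.0 - fraction_rare) * expr_other


def apparent_fold_change(fraction_rare, true_fold):
    """Fold change seen in bulk, relative to a sample with no rare cells.

    Expression in the remaining cells is normalized to 1, so the rare cells
    express the gene at `true_fold`.
    """
    return bulk_expression(fraction_rare, true_fold, 1.0)


if __name__ == "__main__":
    # Illustrative values only: rare-cell fractions and a 10-fold difference.
    for fraction in (0.001, 0.01, 0.1):
        print(fraction, apparent_fold_change(fraction, 10.0))
```

Under these assumptions, a gene expressed tenfold higher in cells that make up 1% of the tumour raises the bulk signal only about 1.09-fold (0.01 × 10 + 0.99 × 1), a difference that is easily lost in measurement noise.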

Human breast tumour cell hierarchies

Stem and progenitor cells are believed to be the initial targets for malignant transformation. This is because most mutational events rely on DNA replication and cell division36 in cells that have the capacity to generate enough target progeny such that the probability of obtaining subsequent genetic mutations becomes likely, and the self-renewal properties of such cells can be harnessed to propagate the malignancy. As previously mentioned, human mammary tumour-initiating cells that were isolated from patients had an EpCAM+CD44+CD24−/low phenotype in most specimens46,53. The lack of expression of CD24 among these tumour-initiating cells suggests that their counterparts in the non-malignant epithelium are basally positioned cells, as CD24, like MUC1, is expressed solely by human luminal epithelial cells17. The EpCAM+CD24−/low phenotype is similar to that previously described for bipotent human mammary epithelial progenitors18,19,21. Similarly, the observation that human mammary tumour-initiating cells are possibly derived from a basal progenitor/stem cell correlates well with the observation that mouse MaSCs are in the basal compartment26,27. The question that now arises is whether or not all breast tumour-initiating cells are derived from a basal progenitor/stem cell. The limited study by Al-Hajj et al. is too small to come to any definitive conclusion regarding this; however, some convincing supporting data can be obtained from cell line studies.

Several studies analysing the gene expression profiles of breast cancer cell lines reveal that they can be broadly divided into luminal subtypes and two basal cell subtypes54,55. Flow cytometric analysis of luminal cell lines such as MCF-7 that is based on the expression of CD24 and CD44 reveals that these cell lines do not have a CD44+CD24−/low subpopulation of cells56. Considering that cancer cell lines from various tissues, including breast, have a stem cell component that is responsible for maintaining the line57,58,59, the observation that not even a small subpopulation of cells with basal characteristics can be detected in luminal cell lines suggests that the tumour stem cells propagating these luminal breast cell lines are probably derived from a progenitor in the luminal cell compartment.

The concept that a luminal progenitor cell could be the cell of origin of a subset of breast cancers is not really surprising considering the size of the progenitor cell pool in the luminal cell compartment in the mouse. The presence of a large pool of progenitor cells within the human luminal cell compartment has also been described20. Interestingly, in this study the basal compartment was shown to be relatively deficient for progenitors. This distribution of progenitors in the luminal cell compartment is consistent with the observation that the highest frequency of proliferating cells in situ in both rodent60,61,62 and human63,64,65 mammary glands is in cells that have a luminal cell morphology. A lower frequency of proliferation is observed in the basal compartment, with little cell division observed in cells with a differentiated myoepithelial morphology. The strong link between cell proliferation and cancer risk66, together with the observation that there is a very large pool of progenitor cells in the luminal cell compartment and a relatively small pool of stem cells in the basal compartment, raises the question of whether this is why most human breast cancers have a luminal cell phenotype. Although it could be argued that progenitor cells would require more mutational events to acquire a malignant phenotype than a stem cell, a substantially larger pool of these progenitor cells could compensate for this and thus make luminal cell-derived tumours possible.

Links with haematopoietic cancers

Several reports have shown that committed haematopoietic progenitors are potential targets for malignant transformation and can function as leukaemia-initiating cells67,68,69,70. For example, in a recent study by Krivtsov and colleagues71, committed granulocyte macrophage (GM) progenitors from mice, which in the normal state do not possess self-renewal properties, were transduced to express the mixed lineage leukaemia–AF9 (MLL–AF9) fusion protein, and the resulting cells were transplanted into syngeneic recipient mice. These transduced GM progenitors were able to induce acute myeloid leukaemia in the recipient mice and were able to undergo self-renewal as demonstrated by transplantation into secondary recipients. Interestingly, these MLL–AF9 GM progenitors retained a gene expression pattern similar to that of normal GM progenitors, rather than haematopoietic stem cells (HSCs). Evidence implicating the committed progenitors in leukaemia in humans comes from studies of acute promyelocytic leukaemia (APL). APL is characterized by a 15:17 chromosome translocation, which results in the fusion of the promyelocytic leukaemia (PML) and retinoic acid receptor-α (RARA) genes. Analysis of the HSC (CD34+CD38−) and progenitor (CD34+CD38+) populations by PCR to detect the PML–RARA fusion gene reveals that this fusion is not found in the HSC fraction but only in the more mature progenitor fraction. This result suggests that HSCs are not involved in the neoplastic process in APL72.

It is possible that an initial mutation occurs within a stem cell, with subsequent mutations occurring in downstream progeny and resulting in a more differentiated progenitor that functions as a tumour stem cell (reviewed in Ref. 73). Although there is no direct evidence to date to support this in human breast cancer, such a situation has been described in AML1–ETO-expressing acute myelogenous leukaemia (AML)74,75,76. This fusion transcript can be detected in the HSC compartment in patients who are in long-term (>10 years) remission, but these HSCs are not leukaemic as they display normal differentiation function. Considering that the leukaemic stem cells in AML do not express Thy1, a marker of normal HSCs, it suggests that the initial mutational event may occur in the stem cell compartment, and that further mutations in downstream progeny are required for the generation of leukaemic stem cells77. A similar situation has been described in human chronic myelogenous leukaemia (CML). In CML, the BCR–ABL mutation can be detected in HSCs, but this mutation is not leukaemogenic in itself. Instead, further downstream mutations that involve β-catenin signalling and self-renewal of GM progenitors are thought to be responsible for the disease progression78. In breast cancer, it has been observed that CD44+ and CD24+ tumour cells are clonally related as CD24+ cells, which are presumed to represent the luminal progeny of CD44+ tumour stem cells, show all the mutations that are found in CD44+ cells79. However, CD24+ cells can also undergo further clonal evolution as they sometimes contain additional genetic mutations79.

Emerging evidence from studies in the haematopoietic system shows that the stem cell compartment in humans is composed of cells that are variable in their proliferation and self-renewal properties. For example, when HSCs are marked by unique lentiviral integration sites and clonally tracked when implanted into immune-deficient mice, some HSCs immediately repopulate a primary host with more differentiated cells while also undergoing self-renewal, whereas other HSCs primarily undergo self-renewal divisions and only repopulate subsequent serially transplanted recipients80. An identical heterogeneity in stem cell self-renewal has also been described for leukaemic stem cells (LSCs) in human AML81. Further complexity within the LSC compartment is also observed as LSCs undergo clonal evolution over time82. Although such detailed studies using breast tumour tissue have yet to be carried out, experiments using normal mouse mammary cells have shown that MRUs are heterogeneous in the amount of cleared mammary fat pad they can fill on transplantation at limiting dilutions26,30,83. Variation in the number of self-renewal divisions that these MRUs undergo during engraftment is also observed26. Considering all of this, a similar heterogeneity in cancer stem cells within individual breast tumours would not be surprising. If this were to hold true, it would have serious implications for the design of therapies to eradicate these cells because their heterogeneous nature may preclude a given therapy successfully targeting all the cells of the population.

Cellular targets of cancer mutations

In recent years empirical evidence has challenged the model that breast cancer progresses in a stepwise linear fashion from well-differentiated to poorly differentiated tumours5,6,7. For example, analysis of the gross genetic mutations that are present within tumours with different degrees of differentiation reveals that the long arm of chromosome 16 (16q) is almost exclusively lost in well and intermediately differentiated ductal carcinoma in situ (DCIS), but is rarely lost in poorly differentiated DCIS. This is despite the fact that poorly differentiated DCIS contains more genetic aberrations on average. Similarly, analysis of the invasive component that is adjacent to well-differentiated or poorly differentiated DCIS reveals that these invasive tumour cells have an almost identical genetic profile to their respective DCIS tumours. These results are significant as they suggest that well-differentiated and poorly differentiated invasive tumours are derived from well-differentiated and poorly differentiated DCIS respectively, and that these two types of tumour represent two independent progression pathways.

One potential source of heterogeneity in the evolution of breast tumours is that different mammary cells have varying susceptibilities to malignant transformation. In vitro studies have certainly demonstrated that different types of mammary cell culture have varying susceptibilities for lifespan extension and immortalization with different viral oncogenes. For example, early passage human mammary epithelial cell cultures are susceptible to lifespan extension by human papilloma virus (HPV) E7 (which binds and inactivates the retinoblastoma protein), whereas late passage cultures could be immortalized by E6 alone, which abrogates the p53–p21 checkpoint. Cultures derived from samples of human milk required both E6 and E7 for complete immortalization (reviewed in Ref. 84). Unfortunately, the heterogeneity and the exact constituents of these cultures preclude any identification of the cells that are preferentially immortalized in these experiments. Studies using more defined populations of cells show that the luminal-like cells with multilineage potential are more susceptible to telomerase activation and immortalization by the Simian virus 40 (SV40) large T antigen than their myoepithelial-like counterparts85.

Sequence and cellular context

The cells involved and the sequence of mutations that occur in the progression of different breast tumours have been difficult to determine because the cellular markers and quantitative functional assays to identify and detect different subsets of mammary epithelial cells have only recently been described. Only now is the mammary field adopting the methods long used by the haematopoietic field in which perturbations in stem and progenitor function are measured by quantitative in vivo and in vitro assays22,27,86,87,88.

Studies in mice have clearly shown that different oncogenes exert their influence in different cell subpopulations. For example, enhanced signalling of the Wnt signalling pathway, either through overexpression of mouse mammary tumour virus (MMTV)-Wnt-1 or gain-of-function Δ89β-catenin mutation, results in tumours that are composed of both luminal and myoepithelial cells and an expansion of the MaSC pool27,86,89. When these Wnt-1 tumours were induced in mice with a heterozygous Pten background, most of the Wnt-1 tumours generated showed a loss of heterozygosity of the Pten allele in both the luminal and myoepithelial cells of the tumour, thereby indicating that a loss of Pten occurs in cells that have multilineage differentiation potential89. This is in contrast to MMTV-Erbb2 (Neu) mice, which generate luminal cell-restricted tumours and display an expansion of the luminal cell compartment, although the effect on luminal progenitor cell numbers and function has yet to be rigorously studied (Refs 27,89; reviewed in Ref. 90). Interestingly, the tumour cells in these MMTV-Erbb2 tumours do not express high levels of Sca-1 (Ref. 89) and are ER negative91,92. This suggests that the target cell for malignant transformation is the milk-protein-expressing, steroid hormone receptor-negative, Sca-1low luminal progenitor cells described by Sleeman and colleagues30. Consistent with the concept that these milk-protein-expressing progenitors (alveolar progenitors?) are the cellular targets for malignant transformation in MMTV-Erbb2-driven mammary tumours is the observation that pregnancy promotes tumorigenesis in this model92. Another mouse model of Erbb2-induced tumorigenesis in which an activated Erbb2 oncogene is expressed under the endogenous Erbb2 promoter results in precocious acinar formation, which again suggests an alveolar progenitor is the target cell for malignant transformation93.

A bigger question is whether or not the MMTV-Erbb2 mouse tumour model accurately mimics ERBB2-overexpressing tumours in humans. Analysis of ERBB2 levels in a panel of human breast cancer cell lines reveals that ERBB2 amplification is scattered across both luminal and basal cell lines54. Although microarray analysis has demonstrated that ERBB2 amplification is typically associated with ER negativity, there is a subset of ERBB2 tumours that are ER positive. Also, tissue arrays have demonstrated that ERBB2-overexpressing tumours typically have a luminal (MUC1+) phenotype, but a rare subset of ERBB2 tumours can be composed of basal cells38. Although these results do not provide insight into the cell of origin of human ERBB2 tumours, they do suggest that ERBB2 amplification can transcend multiple lineages within the mammary epithelium.

Basal breast cancers

Approximately 15–21% of human breast tumours have a basal phenotype: a phenotype that has been associated with a poor prognosis11,12,13,94,95,96. This probably represents an oversimplification as basal tumours are heterogeneous117,118 and their poor prognosis has not been universally confirmed97. These basal breast tumours are characterized as being ER−, PR−, ERBB2−, epidermal growth factor receptor (EGFR)+ and keratin 6+ and/or 17+. It has been proposed that basal tumours are derived from MaSCs because the gene expression profiles of these tumours are perceived to be similar to that expected of an MaSC98. Basal tumours are not thought to be derived from differentiated myoepithelial cells because most of these tumours do not express smooth muscle actin, which is a functional marker of differentiated myoepithelial cells99. Surprisingly, basal breast tumours commonly express luminal-associated keratins 8 and 18 in addition to basal keratins such as keratin 14 (Ref. 99), although it is not clear whether these tumours are composed of cells that co-express both basal and luminal keratins (which might suggest an expansion of a primitive population of cells) or are composed of two distinct lineages of epithelial cells. A strong prognostic marker of basal carcinomas is expression of keratin 6 (Ref. 99), a keratin that in both the normal mouse and human mammary epithelium is associated with colony-forming cells in the luminal lineage20,26, although the distribution of keratin 6 among MaSCs is still unknown. The high level of expression of keratin 6 among multilineage progenitor cells in the human mammary epithelium raises the possibility that these multilineage progenitors are the targets for malignant transformation in basal breast carcinomas. However, as previously discussed, gene expression profiling of a tumour cell population unfortunately does not necessarily permit inferences about the cell of origin of the tumour.

Why are p53 mutations common in basal breast tumours12, a tumour type that is perceived to be derived from MaSCs? It has previously been proposed that p53 might serve as a rate-limiting step in controlling the proliferative lifespan of MaSCs100. In the haematopoietic system, it has been shown that p53 dosage is inversely correlated with HSC function101. The mechanism controlling this is not known, although it has been speculated that p53 dosage might influence the self-renewal rates of HSCs. In the neural system, an inverse correlation between p53 levels and neural stem cell self-renewal has been described102. Considering this, it is tempting to speculate that a similar mechanism occurs in MaSCs, where the loss of p53 increases MaSC function (thereby expanding the MaSC pool) and results in an increased risk of tumour incidence in these cells. Consistent with the hypothesis that the loss of p53 regulates MaSC function is the observation that the luminal and myoepithelial lineage differentiation markers keratin 19 and smooth muscle actin are p53 target genes103,104 and that the loss of p53 in mammary tumours results in downregulation of these gene products105.

Loss of BRCA1 (breast cancer 1) is synonymous with basal breast cancer, although the reasons for this are unclear106. It has been suggested that the loss of BRCA1 from breast, and indeed ovarian, tumours is often seen because these tissues can survive for extended periods in the absence of BRCA1 (reviewed in Ref. 107). It has been postulated that BRCA1 might function as an MaSC regulator, possibly through a mechanism in which the loss of BRCA1 results in the loss of the ability of MaSCs to differentiate108. Certainly, conditional knockout of Brca1 in the mammary glands of mice demonstrates a proliferation defect during pregnancy109, but whether the deletion of Brca1 is exerting its effect at the level of the MaSC or a progenitor cell remains unknown. The predominance of p53 mutations in basal-like breast cancers may also explain why BRCA1 mutations are predominant in this type of cancer, as the loss of p53, which inhibits some apoptotic pathways, may result in the survival of BRCA1−/− cells. Synergy in tumour formation has been observed in Brca1-null mammary cells that have also lost Trp53 (Ref. 109). Similar results are observed when another genomic stability enzyme, Brca2, and Trp53 are deleted in a mouse mammary tumour model110,111.

Future directions

Delineating the mammary epithelial cell hierarchy is essential for providing a framework for determining the cellular targets of breast cancer mutations. In order to achieve this, better strategies to identify and purify stem, progenitor and the differentiated cells of the mammary epithelium must be developed. This is particularly important if meaningful gene expression profiles of these populations are to be obtained as current protocols to isolate mouse mammary stem cells result in purities of 5% at best and even lower frequencies for human breast tumour stem cells. Also essential in this process is the development of a xenotransplantation assay that supports the clonal growth and self-renewal of human MRUs. Several candidate assays look promising24,25 but remain to be validated. Also integral in this process is having an understanding of the limitations of the assays used to study these cells.

The strategies used to characterize normal stem cells are now being applied to tumour stem cells. A popular approach is to assay phenotypically distinct subpopulations that are isolated from transgenic mouse models for the ability to generate tumours in syngeneic recipients. Although informative, this approach warrants caution as some of these transgenic models rely on promoters that are functional only in specific, and sometimes undefined, cellular contexts, and could potentially force transgene expression into inappropriate cell populations. For example, the commonly used whey acidic protein (WAP) promoter, which relies on a forced pregnancy to induce transgene expression and targets alveolar cells and their precursors112, might miss entire subsets of mammary epithelial cells. Similarly, the commonly used MMTV promoter, which in young mice targets most mammary epithelial cells, is expressed in a more non-uniform mosaic pattern in older animals113.

Without doubt, new strategies to purify mouse stem and progenitor cells, beyond what is currently achievable, will be developed, and accurate gene expression profiles of these cells will be obtained. As a result, specific cell differentiation stage genes that are more suitable to function as promoters in transgenic mouse mammary models will be identified. This requirement for more appropriate promoters is clearly evident as most transgenic mouse mammary tumour models result in ER− tumours114, yet most human mammary tumours are ER+. In hindsight, the generation of ER− tumours using WAP-promoter-driven oncogenes is not surprising considering that milk-protein-expressing progenitor cells are ER− (Ref. 30). More difficult to explain is why other promoters do not allow the generation of ER+ tumours. For example, a recent publication describes the modelling of invasive lobular carcinoma in mice using K14cre;Cdh1F/F;Trp53F/F, which results in the recapitulation of this type of breast tumour histologically. However, although human lobular cancers are typically ER+, mouse tumours are apparently imposed onto an ER− luminal cell lineage115. An alternative and complementary strategy to help resolve these issues is to adopt the approach taken by the haematopoietic field (Fig. 2), whereby phenotypically distinct subsets of cells can be genetically manipulated ex vivo and assayed for altered stem and progenitor function. Such an approach will provide a direct method to help determine the cell of origin in human breast cancer.

Figure 2: Strategy to decipher the cellular targets of different oncogenic mutations and different cancer stem cell phenotypes.

Breast cancer research should adopt the approach taken by the haematopoietic field in which cells at specific stages of differentiation within the cell hierarchy can be isolated by flow cytometry and manipulated ex vivo and then assayed in vitro and in vivo to determine the influence on progenitor cell and stem cell function (a). This approach has recently been applied to study the influence of hedgehog signalling components and the polycomb protein BMI1 on human mammary repopulating unit (MRU) function87. Using such an approach, tumours could theoretically be reverse-engineered on different cellular backgrounds to determine which cell subtypes can be transformed with a given combination of genetic mutations and what type of tumour is generated with this cellular and mutation background. A complementary approach would be to isolate cancer-initiating cells from the different molecular subtypes of breast cancer and to compare the expression profiles of these cancer stem cells with the putative normal counterpart (b). CFC, colony-forming cells.

Determining the cellular targets of different oncogenic mutations and how these mutations promote the generation of tumour stem cells will be essential in understanding what is currently a bewildering disease. Of particular interest are the pathways, either acquired or maintained, that regulate tumour stem cell self-renewal, as the disruption of these pathways offers a rational therapeutic target to be tested in human clinical trials.