Introduction

It is now 20 years since myc genes were discovered as the transforming sequences of chicken retroviruses. Subsequently cellular homologs (c-myc) were identified in different vertebrate species, including chicken, mice, and humans. Besides c-myc two additional genes, N-myc and L-myc, were described that are highly related not only in the coding sequence but also in their genomic organization. Since the discovery of myc genes a large body of work has been compiled that provides strong evidence for their potent role in tumor formation, both in experimental model systems and in humans. Well documented examples are the translocations of the c-myc locus into either the heavy chain or one of the light chain immunoglobulin loci found in Burkitt's lymphomas or the amplifications of the N-myc and L-myc genes observed in neuroblastomas and small cell lung cancer, respectively. In the above mentioned examples as well as in a number of other tumors, alterations at one of the myc loci are detected and thought to be of decisive character for cancer development. In addition the analysis of Myc proteins has revealed that they are overexpressed in the majority of human tumors. In many of these cases no genetic alterations have been found, and thus the reason for the overexpression is not clear. Together these findings demonstrate quite impressively that myc genes are among the most frequently affected genes in human tumors (for review see Marcu et al., 1992; Henriksson and Lüscher, 1996).

The initiation or stimulation of tumor formation is obviously not the normal function of Myc proteins but a reflection of their ability to stimulate cell cycle progression. This has been best studied for c-Myc, which will be the focus of the following discussion, but N- and L-Myc have comparable activities although specific aspects and/or the magnitude of the observed effects may differ. The ability of Myc proteins to affect the cell cycle is well documented (for review see Amati et al., 1998; Bürgin et al., 1998). Activation of Myc in resting fibroblasts is sufficient to drive such cells into S phase in the absence of serum. This is paralleled by the stimulation of cyclin/cyclin-dependent kinase (CDK) activities. In addition, the constitutive expression of Myc in many cell types blocks their differentiation. These powerful functions of Myc proteins are likely the basis for their potent role in tumor development. Experimentally this interpretation is supported by different animal model systems that have demonstrated that the deregulated expression of Myc leads to an increase in number of the respective cell type, a decrease in their differentiation potential, and ultimately an increased tumor incidence. Myc has, however, not only been associated with increased growth but also with apoptosis. It appears that cells that express elevated levels of Myc are more susceptible to apoptosis induced by various stimuli (for review see Packham and Cleveland, 1995; Thompson, 1998). Thus Myc plays a dual role in the regulation of cell growth in that it stimulates cell cycle progression on the one hand and on the other limits growth by sensitizing cells to apoptosis.

A primary role of Myc proteins in the positive regulation of cell growth is supported by the well documented expression pattern of myc genes and proteins (for review see Lemaitre et al., 1996). Cycling cells express one of the three Myc proteins and inhibition of expression leads to a severe reduction of growth. In contrast Myc protein levels are in general very low or not detectable in resting or differentiated cells.

Over the years many different molecular functions have been attributed to Myc proteins, including roles in DNA replication, splicing, nuclear structure, and gene transcription (for review see Lüscher and Eisenman, 1990; Facchini and Penn, 1998). In recent years the latter has attracted most attention and the available evidence suggests that Myc proteins function as transcriptional regulators. However a role of Myc in any of the other aspects has not been ruled out. Myc proteins possess a DNA binding/dimerization domain and a transactivation domain (TAD), both hallmarks of transcription factors. These two domains are central to Myc's function in cell growth control, differentiation, transformation, and apoptosis. While the principal functions of the DNA binding/dimerization domain have been unraveled in recent years, the TAD is still rather poorly defined and it is not very well understood how this domain mediates the transcriptional activity of Myc proteins.

With the realization that Myc proteins can function as transcription factors, different search strategies were undertaken to identify Myc-regulated genes. One hope was to define the molecular targets of Myc that are responsible for mediating its tumorigenic activity. The identification of such targets might be of prognostic as well as therapeutic value. A number of potential target genes have been identified that appear to be regulated by Myc (Grandori and Eisenman, 1997; Facchini and Penn, 1998; Dang, 1999). However, whether these are the critical targets mediating tumorigenicity is not fully established. Thus it seems likely that additional Myc-regulated genes will be identified and it is thought that no single gene but the combined regulation of many genes by Myc will explain its role in cell growth control and ultimately in transformation.

In this review we will summarize the function of the DNA binding/dimerization domain, one of the two critical regions relevant for Myc biology. This domain is not only responsible for DNA binding but also mediates the interaction with several proteins that are believed to regulate Myc function.

Myc function requires Max

The two functionally relevant domains, the TAD and the DNA binding/dimerization domain, are localized at the N-terminal and the C-terminal ends, respectively, of Myc proteins (Figure 1a). The sequences in between these two domains are ill defined in terms of structural motifs and little is known about their functional relevance. The TAD, represented by roughly the first 150 amino acids, can mediate transcriptional activation when fused to a heterologous DNA binding domain and thus fulfils a basic criterion to be classified as transcriptional activation domain. Despite a number of proteins identified that specifically interact with the TAD it is not understood in detail how this domain mediates regulation of gene transcription (for recent studies see McMahon et al., 1998; Bush et al., 1998; Xiao et al., 1998; Boyd et al., 1998; and references therein).

Figure 1

DNA binding by Myc/Max/Mad complexes and comparison of basic regions of E box binding proteins. (a) Schematic representation of a Myc/Max heterodimer. The structural elements in c-Myc and in Max are indicated: transactivation domain (TAD), acidic region (A), basic region (b), helix–loop–helix region (HLH), and leucine zipper (Zip). P indicates regions of in vivo phosphorylation. The bHLHZip domain is enlarged to indicate that the basic region and helix 1 as well as helix 2 and the leucine zipper form two α-helices connected by the loop region (Ferré-D'Amaré et al., 1993; Brownlie et al., 1997). The interaction of these four α-helices allows the precise positioning of the basic region to interact with the E box sequence 5′-GACCACGTGGTC. The bases that are specified by protein-base contacts in the Max/Max-DNA co-crystal are shaded. (b) Basic regions of E box-binding proteins. Sequences of the basic regions (numbered 1–13) of the indicated E box-binding proteins are displayed. Subclasses A and B refer to proteins that preferentially bind to 5′-CAGCTG and 5′-CACGTG E boxes, respectively (Dang et al., 1992). Identical and highly conserved amino acids are indicated in dark and light green, respectively. The basic region sequences were derived from the following references: human c-Myc (Stanton et al., 1983), human N-Myc (Blackwood and Eisenman, 1991), human Mad1 (Ayer et al., 1993), human Mxi1 (Zervos et al., 1993), murine Mad3 (Hurlin et al., 1995a), murine Mad4 (Hurlin et al., 1995a), human Rox/Mnt (Meroni et al., 1997), human USF (Gregor et al., 1990), human TFE3 (Beckman et al., 1990), human TFEB (Carr and Sharp, 1990), human MyoD (Pearson-White, 1991), human myogenin (Braun et al., 1989), human AP4 (Ou et al., 1994), human E12 (Murre et al., 1989), human TAL1 (Chen et al., 1990), D. melanogaster da (Caudy et al., 1988). (c) Basic region sequences of Myc/Max/Mad family members. Sequences of the basic regions (numbered 1–13) of the indicated Myc/Max/Mad network proteins are compared between different species. Identical and highly conserved amino acids for each protein are indicated in dark and light green, respectively. The following sequences were used: human c-Myc (Stanton et al., 1983), murine c-Myc (Bernard et al., 1983), chicken c-Myc (Vennström et al., 1982), Xenopus laevis Myc (King et al., 1986), zebra fish Myc (Schreiber-Agus et al., 1993), sea star Myc (Walker et al., 1992), Drosophila melanogaster Myc (Gallant et al., 1996); human Max (Blackwood and Eisenman, 1991), murine Max (Prendergast et al., 1991), chicken Max (Sollenberger et al., 1994), Xenopus laevis Max (King et al., 1993), Drosophila melanogaster Max (Gallant et al., 1996), Caenorhabditis elegans Max (MXL-1) (Yuan et al., 1998); human Mad1 (Ayer et al., 1993), mouse Mad1 (Hurlin et al., 1995a), Caenorhabditis elegans Mad (MDL-1) (Yuan et al., 1998). (d) DNA binding by the Max/Max homodimer. The sequence of the basic region of Max is indicated [the amino acids are numbered according to their position in Max p22 (Blackwood and Eisenman, 1991)]. The three amino acids that have been shown to make base contacts in the Max/Max-DNA co-crystal are indicated by filled arrow heads (Ferré-D'Amaré et al., 1993; Brownlie et al., 1997). Amino acids that make phosphodiester backbone contacts are indicated by open arrow heads. When drawn on a helical wheel the amino acids that make either base or phosphodiester backbone contacts are on one side of the α-helix. The three amino acids that interact with specific bases are shown in green. The panel on the right indicates schematically the specific base contacts established by His-28, Glu-32, and Arg-36 of one basic region and by His-28*, Glu-32*, and Arg-36* of the second basic region (Ferré-D'Amaré et al., 1993; Brownlie et al., 1997). The bases contacted by the two basic regions are shown in green and red, respectively.

The DNA binding/dimerization domain is the part of Myc proteins that is best understood in its basic functions and in its structure. This domain is composed of three different elements, the basic, the helix–loop–helix, and the leucine zipper regions (bHLHZip) (Figure 1a). The bHLHZip domain is characteristic of a class of transcription factors binding to so-called E box DNA recognition sequences with the core motif 5′-CANNTG (see Figure 1b). The function of this domain is to specify homo- or heterodimerization through the HLHZip region and interaction with DNA through the basic region. The realization that the HLHZip region of Myc might function as a protein-protein interaction surface stimulated searches for interaction partners. It was the discovery of Max as a heterodimerization partner of Myc that, after several years of little progress towards the understanding of Myc function, boosted the field into a new era at the beginning of the 1990s (Blackwood and Eisenman, 1991; Prendergast et al., 1991; Wenzel et al., 1991; Blackwood et al., 1992). Max is, like Myc, a bHLHZip protein and the HLHZip region is responsible for the specific interaction of the two proteins and enables the Myc/Max heterodimer to bind to a subset of E box recognition sequences (Figure 1a). Unlike Max, Myc is unable to form homodimers in vivo and therefore cannot bind DNA sequence-specifically on its own. In an elegant series of experiments it was demonstrated that Myc function in transactivation, transformation, and apoptosis is dependent on the heterodimerization with Max (Amati et al., 1992, 1993a,b). Thus from these studies it appears that the minimal functional unit is the Myc/Max heterodimeric complex. Since Max does not seem to contain a transactivation domain, Myc is the part of the heterodimer responsible for activation of transcription through its TAD. Various forms of Max proteins have been identified due to alternative splicing with p21 and p22 (or Max and Max9, respectively, differing by a nine amino acid insertion in p22) being the predominant forms (Blackwood and Eisenman, 1991; Prendergast et al., 1991; Mäkelä et al., 1992; Västrik et al., 1993; Arsura et al., 1995). In contrast to Myc, Max proteins are ubiquitously expressed (for review see Henriksson and Lüscher, 1996).

At this point it is worthwhile to remember that Myc proteins are not the only dimerization partners of Max. Over recent years several other bHLHZip proteins, Mad1, Mxi1, Mad3, Mad4, and Mnt (or Rox), have been identified that heterodimerize with Max (Ayer et al., 1993; Zervos et al., 1993; Hurlin et al., 1995a, 1997; Meroni et al., 1997). Thus Max is the central component of the Myc/Max/Mad network of transcriptional regulators. Unlike Myc, Mad and Mnt proteins affect cell growth negatively and appear to antagonize Myc functions. This is based on the findings that Mad proteins, as far as analysed, are expressed predominantly in resting or differentiating cells, which is in contrast to Myc proteins (Ayer et al., 1993; Ayer and Eisenman, 1993; Zervos et al., 1993; Larsson et al., 1994; Hurlin et al., 1995b; Larsson et al., 1997; Queva et al., 1998). In addition Mad proteins inhibit cell growth and interfere with the transforming function of Myc (Lahoz et al., 1994; Cerni et al., 1995; Chen et al., 1995; Hurlin et al., 1995a, 1997; Koskinen et al., 1995; Västrik et al., 1995; Roussel et al., 1996). Thus the Myc/Max/Mad network is composed of proteins that both positively and negatively affect different aspects of cellular growth and it is thought that this network plays a pivotal role as a molecular switch between proliferation, differentiation, quiescence, and apoptosis.

Structural aspects of the bHLHZip domain

It is 10 years since HLH and Zip domains were identified as protein-protein interaction motifs (for review see Pabo and Sauer, 1992; Patikoglou and Burley, 1997). These exist in different classes of transcription factors that contain either an HLH, a Zip, or a combination of both, an HLHZip domain. From structural analyses as well as from protein modeling it has been suggested that two amphipathic helices connected by a loop and one amphipathic helix are formed by the HLH and Zip domains, respectively (Patikoglou and Burley, 1997). The determination of the structures of Max/Max homodimer-DNA co-crystals corroborated and extended these predictions (Ferré-D'Amaré et al., 1993; Brownlie et al., 1997). The Max/Max complex forms a left-handed, four-helix bundle. The two α-helical segments of each Max monomer are composed of the basic region plus helix 1 of the HLH region and helix 2 plus the Zip region, respectively. The two α-helices are connected by the loop. Thus the basic region and helix 1 as well as helix 2 and the Zip form contiguous α-helices.

With the identification of the bHLHZip in Myc and Max proteins as the likely domains responsible for interaction with DNA, binding site selection studies were undertaken to determine specific binding sites. A consensus sequence with the core 5′-CACGTG could be defined, which is referred to as the Myc E box. Evidence for additional preferred nucleotides flanking the core sequence was also obtained, whereby the sequence 5′-GAC CACGTG GTC represents a high affinity binding site in vitro (Blackwell et al., 1990, 1993; Halazonetis and Kadil, 1991; Kerkhoff et al., 1991; Prendergast and Ziff, 1991; Berberich et al., 1992; Fisher et al., 1993; Ma et al., 1993; Solomon et al., 1993; Sommer et al., 1998). The basic region of the bHLHZip domain makes base contacts and thereby determines the specificity of DNA binding (Ferré-D'Amaré et al., 1993; Brownlie et al., 1997; see also below).

Two subclasses of E box binding proteins have been defined according to the identity of the central two nucleotides in the 5′-CANNTG sequence. Proteins of subclass A and B recognize 5′-CAGCTG and 5′-CACGTG, respectively (Dang et al., 1992). The specificity of Myc/Max and Max/Max dimers for the 5′-CACGTG subtype of E boxes places these proteins into subclass B, which in addition contains several other transcription factors including USF, TFE3, TFEB, TFEC as well as the transcriptional repressors of the Mad family and Mnt. Subclass A contains factors like MyoD, myogenin, AP-4, E12, E47 and Tal1. As shown in Figure 1b, certain residues among the 13 amino acids of the basic regions of these proteins are conserved between the two subclasses. While positions 9 and 12 are invariable (Glu and Arg, respectively), positions 2 and 10 are highly conserved (Arg or Lys at either position). Notably, in the 5′-CACGTG-binding subclass B proteins, additional amino acids are conserved at positions 5, 6, 8 and 13 but are variable in subclass A proteins. Amino acids at positions 1, 3, 4, 7 and 11 are non-conserved within the 5′-CACGTG-binding subclass B, but some of these can nevertheless be well conserved evolutionarily for each protein, as exemplified by position 11 in c-Myc and position 7 in Max (Figure 1c), indicating that these amino acids may serve specific functions.
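The subclass definition above amounts to a simple rule on the central two nucleotides of a 5′-CANNTG match. The following Python sketch illustrates that rule only; the function name `find_e_boxes` and the lookup table are illustrative assumptions, not part of any published tool, and the code says nothing about actual binding affinity.

```python
import re

# Generic E box core as given in the text: 5'-CANNTG.
# The lookahead lets overlapping matches be reported.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

# Subclass assignment by the central two nucleotides (Dang et al., 1992):
# subclass A prefers CAGCTG, subclass B prefers CACGTG.
SUBCLASS_BY_CENTER = {"GC": "A", "CG": "B"}


def find_e_boxes(sequence):
    """Return (0-based start, hexamer, subclass or None) for each CANNTG match."""
    seq = sequence.upper()
    hits = []
    for match in E_BOX.finditer(seq):
        hexamer = match.group(1)
        center = hexamer[2:4]
        hits.append((match.start(), hexamer, SUBCLASS_BY_CENTER.get(center)))
    return hits


if __name__ == "__main__":
    # High affinity site quoted in the text.
    print(find_e_boxes("GACCACGTGGTC"))
```

Only one strand is scanned because the reverse complement of any 5′-CANNTG hexamer is again a 5′-CANNTG hexamer at the same location. Matches whose central dinucleotide is neither GC nor CG are returned with a subclass of `None`.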

The two crystal structures of Max/Max homodimers bound to DNA that have been solved led to the identification of the amino acids involved in the interaction with specific bases of the Myc E box (Ferré-D'Amaré et al., 1993; Brownlie et al., 1997). Out of the 13 amino acids which constitute the basic region three appear to make specific base contacts. In Max p22 His-28 contacts the G at position 3′ of the E box core sequence, Glu-32 the C and A at positions 2 and 3, respectively, and Arg-36 the G at position 1′ (summarized in Figure 1d, the three amino acids are indicated by filled arrow heads). Thus Arg-36 (position 13 of the basic region) determines the 5′-CACGTG specificity. The relevance of this Arg is also supported by mutagenesis experiments (Dang et al., 1992) and by the analysis of the USF-DNA co-crystal demonstrating that Arg-212 (corresponding to Arg-36 in Max) also specifies the central CG nucleotide pair (Ferré-D'Amaré et al., 1994). In addition His-28 recognizes the G (position 4′) adjacent to the E box core sequence (Brownlie et al., 1997). Thus the specifically recognized sequence can be extended to eight bases. Each basic region of the homodimer contacts bases on both DNA strands and these specific contacts explain the identity of the central eight bases, including the core sequence, of the Myc E box. In comparison to the subclass A of E box binding proteins two aspects are of interest to note. First, the Arg at position 13 of the basic region which specifies the central two nucleotides in Max and USF and which is invariant in subclass B proteins is not found in subclass A proteins (Figure 1b and c). Second, the His at position 5 of the basic region of subclass B proteins which contacts the two Gs at positions 3′ and 4′ (Figure 1d), is also not conserved in subclass A proteins. These two differences in the basic region of the two classes of E box binding proteins are important to determine DNA binding specificity. 
Binding site selection studies have also indicated that the G at the first position of the high affinity site (5′-GACCACGTGGTC) is preferred. However, at present it is not clear what the structural basis is for this preference.

In addition to the specific contacts summarized above, Lys-24, Arg-25, Asn-29, Arg-33, Arg-35 and Arg-36 of the basic region (Figure 1d, open arrow heads) interact with the phosphodiester backbone and thus are important to stabilize the interaction with DNA (Ferré-D'Amaré et al., 1993; Brownlie et al., 1997). As indicated in Figure 1d, all amino acids of the basic region that contact specific bases and/or the phosphodiester backbone of the Max/Max DNA binding site are exposed on one side of the α-helix facing the major groove of the DNA. Further contacts to the phosphodiester backbone are made by Lys-57 and Arg-60 which are located in the loop and at the beginning of helix 2, respectively. This indicates that not only the basic region but also the dimerization domain interacts with DNA.
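The relationship between Max p22 residue numbers and basic-region positions 1–13 can be made explicit with a little arithmetic. The sketch below assumes a constant offset of 23, inferred from the statement that Arg-36 is position 13 of the basic region; the names `MAX_P22_OFFSET`, `basic_region_position` and `contact_positions` are hypothetical helpers introduced for illustration.

```python
# Offset between Max p22 residue numbers and basic-region positions 1-13.
# Inferred from the text (Arg-36 is position 13); treat it as an assumption.
MAX_P22_OFFSET = 23

# Residues named in the text for the Max/Max-DNA co-crystal.
BASE_CONTACTS = [("His", 28), ("Glu", 32), ("Arg", 36)]
BACKBONE_CONTACTS = [
    ("Lys", 24), ("Arg", 25), ("Asn", 29),
    ("Arg", 33), ("Arg", 35), ("Arg", 36),
]


def basic_region_position(residue_number):
    """Convert a Max p22 residue number to its basic-region position (1-13)."""
    position = residue_number - MAX_P22_OFFSET
    if not 1 <= position <= 13:
        raise ValueError(f"residue {residue_number} is outside the basic region")
    return position


def contact_positions(contacts):
    """Return sorted basic-region positions for (name, residue number) pairs."""
    return sorted(basic_region_position(number) for _, number in contacts)


if __name__ == "__main__":
    print("base contacts:", contact_positions(BASE_CONTACTS))
    print("backbone contacts:", contact_positions(BACKBONE_CONTACTS))
```

Under this assumed offset the base-contacting residues fall on positions 5, 9 and 13, which agrees with the His at position 5, the invariant Glu at position 9 and the Arg at position 13 discussed in the text. Lys-57 and Arg-60 lie outside the basic region, so the helper rejects them.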

Since the bHLHZip domains are highly conserved among all the Myc/Max/Mad network proteins identified to date, it is probably safe to assume that the structures of the different dimers are comparable. Thus the Max/Max structure can be used as a framework for a more detailed analysis of the function of amino acids not involved in heterodimerization. This is also relevant in view of several proteins that interact with the bHLHZip domain of Myc and their potential role in regulating Myc function (see below).

DNA binding specificity of Myc/Max/Mad network and other E box binding proteins

The findings summarized in the previous section define the structural bases for the specificity of DNA binding by Max/Max homodimers. The conserved nature of the amino acids in the basic regions of Myc/Max/Mad network proteins that contact the DNA (Figure 1b and c) suggests that the different dimeric complexes bind the same DNA elements. In addition to the proteins of the Myc/Max/Mad network several other transcriptional regulators, including USF1, USF2, TFE3, TFEB, TFEC, and microphthalmia can also bind to the Myc E box core sequence 5′-CACGTG (Atchley and Fitch, 1997, see Figure 1b). An important issue is therefore whether these proteins compete for binding to recognition sites in the same target genes and thus affect Myc-mediated gene regulation or whether there are mechanisms that discriminate between the binding of any of these factors to a given response element. This aspect is particularly relevant since the activities and/or the abundance of different complexes may differ widely. In this respect it is worthwhile to remember that while some of these factors are expressed in a tissue specific fashion, both USF and TFE3 are found ubiquitously. Of particular importance are USF complexes, since they are abundant and in EMSA experiments of cellular extracts represent the main Myc E box binding activity (Sommer et al., 1998; Sommer et al., submitted). Also while many of the factors mentioned can positively regulate gene transcription, different promoters may only respond to some but not all proteins. In addition proteins such as Mad and Mnt have repressing functions. Different possibilities can be envisaged how such target gene specificity among the CACGTG-binding proteins could be achieved. Potential models are depicted in Figure 2 and will be discussed below.

Figure 2

DNA binding specificity of Myc/Max heterodimers and of other E box binding proteins. Besides the Myc/Max/Mad network proteins several other transcription factors bind to 5′-CACGTG E box sequences. It is largely unclear whether and how the binding of the different proteins capable of interacting with this sequence is regulated. There are several ways in which differential binding, and thereby specific regulation of target genes, could be achieved (see text for details). (a) Optimal binding sites may be bound by different complexes including Myc/Max, Max/Max, and USF/USF. These complexes might compete for such sites and thus the relative protein concentration would determine which complex preferentially binds. This might also be regulated by post-translational means. (b) Different flanking sequences might affect the affinity of different protein complexes to an E box. (c) Nucleotide substitutions in the core sequence as found in non-canonical sites might allow preferential binding of a subset of complexes. (d) Binding to DNA elements with more than one potential binding site may be facilitated by cooperative effects. These may occur either by direct interaction of the DNA binding competent factors or through bridging proteins. (e) Since cellular genes are most likely packed into chromatin the ability of the different complexes to interact with binding sites not freely accessible may influence binding specificity. (f) Modification of DNA may affect the binding of some but not of other transcription factor complexes. Thus such modifications (e.g. methylation) may limit the number of complexes that can bind and thereby affect the regulation of genes. (g) The position of E boxes relative to the promoter as well as the quality of the promoter itself may determine whether bound protein complexes can affect the expression of a target gene.

As described in the previous section, the three residues identified in Max that contact specific bases are conserved in all CACGTG-binding proteins (see Figure 1b–d, filled arrow heads). In addition five of the six residues in the Max basic region contacting the phosphodiester backbone (open arrow heads) are also conserved or very similar in these bHLHZip proteins. In summary, all amino acids that are relevant for DNA binding are conserved and thus from a theoretical point of view one might expect that the DNA binding affinities of the different subclass B proteins for the 5′-CACGTG core sequence are identical, or at least very similar (Figure 2a). Indeed, studies of the DNA-binding activities of c-Myc/Max, Mad/Max, Mnt/Max and USF in vitro suggest that all these complexes bind to a core recognition site with optimal flanking sequences with similar affinities (Sommer et al., 1998; Sommer et al., submitted; J Vervoorts and B Lüscher, unpublished observation). However, as will be discussed further in the next section, the relative binding of the different dimeric complexes to DNA is not only determined by their affinity but also by the stability of the various homo- and heterodimers. Early studies using bacterially produced, in vitro translated or immunoprecipitated/denatured/renatured proteins suggested that Max/Max homodimers bind the same sequences less efficiently than Myc/Max heterodimers, possibly due to less stable dimer formation (summarized in Lüscher et al., 1997). Several additional aspects may contribute to the observed differences. The protein preparations used in these studies may differ with respect to posttranslational modifications or their folding status. Also, because DNA-binding competent Myc proteins are difficult to obtain, several studies used fragments of Myc consisting in essence of the bHLHZip domain. To overcome some of these disadvantages we have used whole cell extracts, prepared under non-denaturing conditions from COS-7 cells transiently overexpressing the proteins of interest, to study DNA binding of Myc/Max/Mad network proteins by electrophoretic mobility shift assays (EMSA) (Sommer et al., 1998; Sommer et al., submitted; J Vervoorts and B Lüscher, unpublished observation). These analyses demonstrated that Max/Max bound to DNA with similar specificity and affinity as c-Myc/Max, Mad/Max, and Mnt/Max complexes. Together these studies suggest that the different Myc/Max/Mad network complexes bind the 5′-CCACGTGG DNA sequence with similar affinities, which is consistent with the prediction formulated above. Since this interpretation is based on in vitro studies it remains to be determined whether this also holds in vivo.

In addition to the core DNA-recognition sequence it is clear from studies of various E box-binding proteins that the nucleotide composition of the sequences flanking the core influences DNA binding affinity. Inspection of flanking sequences of E boxes, either from DNA fragments immunoprecipitated with Myc- and Max-specific antibodies from chromatin or of suggested Myc target genes, shows a preference for 5′-GC, 5′-CG or 5′-AG immediately preceding the core sequence (Grandori et al., 1996 and references therein). Also binding site selection studies have indicated that Max/Max homodimers, Myc/Max heterodimers and USF homodimers show slightly distinct optimal binding sites with respect to flanking sequences (summarized in Lüscher et al., 1997). For instance, an A or T nucleotide immediately preceding the core was suggested to disfavour Myc/Max binding but to be of less negative influence for Max/Max and USF binding (Figure 2b). The observed specificity for flanking residues between Myc/Max and Max/Max dimers is difficult to explain considering the sequence similarity of the basic region of the two proteins. Possibly the variable residues at positions 3 and 4 of the basic region might have a role in flanking nucleotide selection. An indication in support is that the amino acid at position 4 in USF was suggested to contact the phosphodiester backbone of flanking sequences (Ferré-D'Amaré et al., 1994). However, the residues at positions 3 and 4 were not assigned any particular function in Max/Max homodimers, and since no crystal structure of Myc/Max heterodimers has been obtained it is difficult to judge whether these amino acids in Myc are relevant for the interactions with flanking sequences. Another possibility is that sequences N-terminal to the bHLHZip domain in Myc contribute to DNA binding specificity or affect the structure of the basic region and thereby modulate DNA binding. Again, without structural information beyond the bHLHZip domain, these possibilities remain speculative. The binding site selection experiments also showed differences depending on the source of proteins used (i.e. bacterial, in vitro translated, or immunoprecipitated/denatured/renatured). The difference in specificity for the flanking T nucleotide mentioned above between Myc/Max and Max/Max dimers seemed less pronounced when analysing overexpressed proteins in COS-7 cells (Sommer et al., 1998; J Vervoorts and B Lüscher, unpublished observation). The specificity for flanking sequences seems to differ more extensively between Myc/Max and USF (Prendergast and Ziff, 1991; Blackwell et al., 1993; Bendall and Molloy, 1994), but again there appears to be little difference when the comparisons are performed with COS-7-derived native protein complexes (Sommer et al., 1998). Thus the in vitro studies summarized here are not sufficient to come to final conclusions regarding the binding site specificities of the different protein complexes. One approach which might allow us to resolve this issue is to compile in vivo binding sites using a cross-linking approach (Boyd et al., 1998).
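The flanking-sequence observations summarized above can be turned into a small annotation routine. The Python sketch below reports, for each 5′-CACGTG core, the dinucleotide immediately 5′ of the core on both strands and flags the two features mentioned in the text: membership in the GC/CG/AG set and an A or T directly adjacent to the core. The function names are hypothetical, and the flags are descriptive labels only, not affinity predictions.

```python
CORE = "CACGTG"
# 5' dinucleotides reported as preferred (Grandori et al., 1996, as cited in the text).
PREFERRED_5PRIME = {"GC", "CG", "AG"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def describe_flank(dinucleotide):
    """Annotate one 5' flanking dinucleotide; None if the flank is incomplete."""
    if len(dinucleotide) != 2:
        return None
    return {
        "flank": dinucleotide,
        "preferred": dinucleotide in PREFERRED_5PRIME,
        # An A or T immediately preceding the core was suggested to
        # disfavour Myc/Max binding.
        "a_or_t_adjacent": dinucleotide[1] in "AT",
    }


def core_flanks(sequence):
    """Return (start, given-strand flank, opposite-strand flank) per CACGTG core."""
    seq = sequence.upper()
    results = []
    start = seq.find(CORE)
    while start != -1:
        end = start + len(CORE)
        upstream = seq[max(start - 2, 0):start]
        downstream = seq[end:end + 2]
        # The core is palindromic, so the 3' flank read on the opposite
        # strand is the 5' flank of the same core on that strand.
        opposite = downstream.translate(_COMPLEMENT)[::-1]
        results.append((start, describe_flank(upstream), describe_flank(opposite)))
        start = seq.find(CORE, start + 1)
    return results


if __name__ == "__main__":
    for hit in core_flanks("TTGCCACGTGAAAA"):
        print(hit)
```

Both strands are reported because the palindromic core gives each half-site its own 5′ flank. Flanks that run off the end of the input are returned as `None` rather than guessed.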

Besides the high affinity Myc E box (5′-CACGTG), variations of the core sequence have been identified both by site selection studies and by chromatin immunoprecipitation experiments (Blackwell et al., 1993; Grandori et al., 1996). These include 5′-CATGTG, 5′-CACGCG, 5′-CATGCG and the heptamer sequence 5′-CAACGTG. These different binding sites are not bound equally well by Myc/Max/Mad network complexes and other subclass B proteins in vitro. Binding of Myc/Max, Max/Max and USF complexes to 5′-CACGCG is comparable to the consensus Myc E box, whereas 5′-CATGCG is a low affinity binding site (J Vervoorts and B Lüscher, unpublished observation). These differences in binding to non-canonical core sequences are unexpected considering the conservation of the amino acids in the basic region that contact DNA as discussed above. It is likely that interactions with flanking sequences also contribute to the affinity to non-canonical sites and these contacts may also vary among the Myc E box binding proteins. However, since some of these low affinity sites were identified on DNA fragments immunoprecipitated with Myc- and Max-specific antibodies from chromatin (Grandori et al., 1996), the in vitro binding studies may not reflect precisely the in vivo binding behaviour. It is therefore an open question whether these alternative non-canonical core recognition sequences might be a basis for target gene specificity among Myc/Max/Mad and other Myc E box binding proteins.
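To make the list of core variants concrete, the following Python sketch scans a sequence for the canonical core and the four non-canonical variants named above. Because the variants, unlike 5′-CACGTG, are not palindromic, both strands are searched. The function names and the output layout are illustrative assumptions; the scan identifies sequence matches only and does not model binding.

```python
CANONICAL = "CACGTG"
# Non-canonical core variants listed in the text.
NON_CANONICAL = ("CATGTG", "CACGCG", "CATGCG", "CAACGTG")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_core_sites(sequence):
    """Return sorted (start, strand, motif, kind) tuples.

    start is the 0-based index on the given strand where the match begins;
    strand "-" means the motif reads 5'->3' on the opposite strand.
    """
    seq = sequence.upper()
    hits = set()
    for motif in (CANONICAL,) + NON_CANONICAL:
        kind = "canonical" if motif == CANONICAL else "non-canonical"
        patterns = [("+", motif)]
        rc = reverse_complement(motif)
        if rc != motif:  # palindromic motifs are reported once, as "+"
            patterns.append(("-", rc))
        for strand, pattern in patterns:
            start = seq.find(pattern)
            while start != -1:
                hits.add((start, strand, motif, kind))
                start = seq.find(pattern, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    print(find_core_sites("AACACGTGTTCACATGAA"))
```

A hit on the "-" strand is reported at the position where the reverse-complemented motif starts on the input strand, so coordinates always refer to the sequence as supplied.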

Another possibility that has been suggested recently is that Myc/Max complexes bind cooperatively to adjacent E boxes in the ODC gene (Walhout et al., 1997, 1998). This is of interest since several other Myc target genes, including eIF-2a, eIF-4E, and cdc25A, possess two E boxes and since Max/Max or USF complexes did not show cooperative binding. Such cooperative binding is suggested to increase the affinity of Myc/Max complexes for non-canonical sites as well (Walhout et al., 1997), and may thus be a mechanism for site selection (Figure 2d). However, cooperative binding of Myc/Max occurs only at certain sites and seems to depend on flanking sequences and on the sequences between the two E boxes (Walhout et al., 1998). Moreover, using the above mentioned COS-7 expression system, we have not been able to observe cooperative binding of Myc/Max heterodimers to the ODC gene sequence containing two E boxes (J Vervoorts and B Lüscher, unpublished observation). Further studies will therefore be required to verify whether cooperative DNA binding is a relevant selection mechanism. It is also possible that additional proteins are required for Myc/Max/Mad proteins to recognize and/or to bind to specific sites. This remains to be determined.
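
To screen candidate genes for the arrangement discussed here, one could list neighbouring E boxes together with the length of the intervening sequence. The Python sketch below is a minimal example under stated assumptions: the function name `find_e_box_pairs`, the default `max_gap` of 100 bp, and the example fragment are hypothetical, and the text does not specify a spacing required for cooperative binding. The code only reports sequence arrangement and cannot predict cooperativity.

```python
import re


def find_e_box_pairs(sequence, max_gap=100, core="CACGTG"):
    """Return (first_start, second_start, gap) for neighbouring core sites.

    gap is the number of bases between the end of the first site and the
    start of the second. max_gap is an arbitrary, illustrative cut-off.
    """
    seq = sequence.upper()
    # A lookahead pattern reports every start position of the core sequence.
    starts = [m.start() for m in re.finditer(f"(?={re.escape(core)})", seq)]
    pairs = []
    for first, second in zip(starts, starts[1:]):
        gap = second - (first + len(core))
        if 0 <= gap <= max_gap:
            pairs.append((first, second, gap))
    return pairs


if __name__ == "__main__":
    # Hypothetical fragment with two consensus E boxes separated by 4 bases.
    print(find_e_box_pairs("AACACGTGTTTTCACGTGAA"))
```

The intervening sequence itself can be recovered by slicing the input between the two reported positions, which may be useful given that the sequences between the E boxes appear to matter.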

A likely selection criterion in vivo is the accessibility of binding sites for Myc/Max/Mad and other Myc E box binding complexes, which could be imposed by the state of the chromatin (Figure 2e). Binding to nucleosomes has been demonstrated for both Max/Max and Myc/Max dimers (Wechsler et al., 1994). It is possible that the affinity for canonical and non-canonical core sequences as well as the dependence on flanking sequences will be influenced by chromatin structure. It will therefore be interesting to determine the binding to different sites in the context of nucleosomes and how chromatin remodeling machines affect the binding of these proteins to E boxes. It is tempting to speculate that this interaction will be affected by Myc or Mad proteins themselves through the recruitment of chromatin remodeling proteins such as histone acetyltransferases or deacetylases (Kiermaier and Eilers, 1997). Modifications of DNA, such as methylation, may also determine target gene specificity (Figure 2f). The Myc basic region as part of an E12 chimeric protein was unable to bind to a 5′-CACGTG site containing a methylated C nucleotide at the central position, whereas USF did not discriminate between methylated and unmethylated core sequences (Prendergast and Ziff, 1991).

Several lines of evidence have shown that Myc proteins regulate a growing number of target genes by binding to E box DNA elements located in control regions of the respective genes (for review see Grandori and Eisenman, 1997; Facchini and Penn, 1998). However, until recently no direct evidence was available demonstrating Myc binding to such E boxes in vivo. Using an in vivo cross-linking approach it was shown recently that Myc is bound to the portion of the promoter of the carbamoyl-phosphate synthase/aspartate transcarbamylase/dihydroorotase (cad) gene that contains an E box (Boyd et al., 1998). This DNA element was implicated previously in mediating the transcriptional response to Myc (Miltenberger et al., 1995; Boyd and Farnham, 1997). Interestingly, this portion of the cad gene could also be identified by its binding to USF (Boyd et al., 1998). Since USF is more abundant than Myc, it will be of interest to determine whether the exchange of USF complexes by Myc/Max complexes is regulated. The binding of USF to the cad gene appears not to be productive, since cad transcription correlates with Myc but not USF binding and since USF cannot activate a cad reporter construct (Boyd et al., 1998). This suggests that the specificity of gene transcription through E box binding proteins is in part regulated by post-DNA binding mechanisms. In the majority of described Myc target genes the Myc E boxes are located downstream of the promoter regions, in introns, 5′ untranslated, or even coding regions. The positions of these elements seem to be important for the target gene specificity of Myc versus USF. As with the Myc E box in the 5′ untranslated region of the cad gene, USF can readily bind in vitro to the Myc binding site in the first intron of the α-prothymosin gene, another Myc target gene. Although USF is a potent activator of transcription from nearby upstream positions, it was shown to be a poor activator from the downstream positions in both the α-prothymosin and the cad genes, whereas c-Myc activated transcription from these positions (Desbarats et al., 1996; Boyd et al., 1998). Therefore target gene specificity may be determined by the position of the E box relative to the promoter and possibly also by the promoter context (Figure 2g). In addition to the position of the E box, the core promoter itself may also determine whether or not USF can activate, which adds a further level of complexity that is not presently understood (Boyd et al., 1998).

Although DNA binding activity of bacterially expressed or in vitro transcribed/translated Myc/Max/Mad complexes was shown early on, the analysis of endogenous Myc/Max and Mad/Max complexes has been difficult (see discussion in Sommer et al., 1998). The protocol established using the COS-7 cell system allowed the unambiguous identification of the different complexes in cellular lysates by EMSA. Nevertheless, only some of the expected endogenous complexes could be identified. The most abundant complex is consistently Mnt/Max, while the Max/Max complex is weak (Sommer et al., 1998; Sommer et al., submitted). Myc/Max complexes could only be observed in lysates of cells that either overexpress Myc or were harvested at time points of maximal Myc expression (e.g. early in G1 of serum stimulated fibroblasts) (Sommer et al., 1998; A Sommer and B Lüscher, unpublished observation). To date we have not seen endogenous Mad1/Max, Mxi1/Max, Mad3/Max, or Mad4/Max complexes under any circumstances. The most likely explanation for these findings is that the expression levels of the different Mad proteins as well as of Myc are too low to be detected by EMSA.

In order to increase the sensitivity of DNA binding detection, we developed a solid phase DNA binding assay (SODA). This assay uses immunopurified protein complexes as ligands for DNA, and specific DNA binding associated with Myc, Max, and Mad1 can be detected (Larsson et al., 1997). Thus, although no Mad1/Max complex could be detected using EMSA in HL-60 cells, immunoprecipitates with antibodies recognizing Mad1 allowed us to measure specific DNA binding activity.

Comparing EMSA with SODA reveals obvious differences between the two assay systems. A disadvantage of SODAs is that non-specific binding activities are difficult to assess, while they are in general easily distinguished from specific activities in EMSAs. Also, using antibodies that detect more than one dimer, such as anti-Max sera, will only reveal information about a population of complexes (which, however, may be useful under some circumstances). Interestingly, we have not seen Myc/Max complexes in EMSAs that are larger than heterodimers. This is somewhat surprising since a large number of Myc interacting proteins have been described which are expected to affect the mobility of the Myc/Max complex. However, it is not uncommon that multimeric protein complexes are not stable in EMSAs. For DNA binding analysis of potentially large complexes, SODA may be advantageous since conditions can be used that allow the preservation of such complexes. Thus DNA binding activities measured with the two assays may give different but complementary results.

Regulation of the c-Myc bHLHZip domain

The interaction of Myc with Max is, as discussed above, essential for the known functions of Myc proteins. Coimmunoprecipitation studies indicated early on that the majority of Myc is in a complex with Max (Blackwood et al., 1992). Mechanisms that modulate either DNA binding of the Myc/Max complex or dimerization of the two proteins will most likely profoundly affect the activity of Myc. This could be achieved by at least three different means, namely by competition for Max by other members of the Myc/Max/Mad network, by posttranslational modification of Myc or Max, or by other interacting proteins. These possibilities will be discussed in the following section.

c-Myc and other members of the Myc-family are primarily expressed in undifferentiated, proliferating tissues whereas Max and Mnt are ubiquitously expressed. In contrast, most of the Mad-family members are expressed in differentiated, non-proliferative tissues. Mad1 is upregulated during myeloid and keratinocyte differentiation in culture, leading to a shift from predominantly Myc/Max heterodimers in undifferentiated, proliferating cells to predominantly Mad1/Max complexes in differentiated, quiescent cells (Ayer and Eisenman, 1993; Larsson et al., 1997; Queva et al., 1998). The analysis of DNA binding activities of Myc/Max and Mad/Max complexes during monocytic differentiation using the SODA technique revealed a shift in relative activity similar to that determined by studying protein-protein interactions (Larsson et al., 1997). These observations have led to the hypothesis that the Myc/Max/Mad network may constitute a molecular switch where the prevalence of Myc- versus Mad-containing heterodimers determines cell fate. The network is therefore thought to be regulated primarily through the expression of its positive and negative components, which would compete for access to Max, which is necessary for the DNA binding activity of both components. Since Max is an abundant protein in comparison with Myc and Mad in most cells, it is, however, unclear whether this competition occurs at the level of interaction with Max or rather manifests as a competition between Myc/Max and Mad/Max heterodimers for binding sites of target genes. As discussed above, it also remains to be proven that these target genes are identical.

Myc proteins are modified by phosphorylation and glycosylation. While the relevance of the latter is not well understood, phosphorylation of the TAD modulates the transforming potential of Myc (Henriksson et al., 1993; Pulverer et al., 1994). This is most likely due to modulation of the activities of the TAD, but effects on DNA binding or interaction with Max cannot be excluded at present. In addition, several phosphorylation sites were identified in the C-terminal half of c-Myc (Lüscher et al., 1989; Lutterbach and Hann, 1997). Some of these sites are in close proximity to the basic region, the most prominent being phosphorylated by protein kinase CK2, but their mutational analysis has so far revealed no evidence for a regulatory role in Myc function (Street et al., 1990; Lutterbach and Hann, 1997). This is in contrast to similarly positioned phosphorylation sites in Max that affect DNA binding kinetics (Berberich and Cole, 1992; Bousset et al., 1993).

Despite these negative findings regarding a role of the phosphorylation sites adjacent to the basic region, Myc DNA binding is reduced in mitosis as compared to interphase, which correlates with an altered phosphorylation pattern (Lüscher and Eisenman, 1992). More recently, using SODA, we have found that interferon-γ signaling affects the phosphorylation status of Myc, which appears to result in a decrease of specific DNA binding due to a reduced interaction with Max (Bahram et al., 1999). Since the relevant sites have not yet been identified, the precise molecular mechanism is presently not known. However, these examples indicate that there is still a great deal to learn about the regulation of Myc by phosphorylation. In addition, the analysis of Myc/Max/Mad network complexes during differentiation of HL-60 myeloid cells indicated that Max/Max DNA binding might be regulated by post-translational means (Sommer et al., 1998). Together, evidence is gradually accumulating that phosphorylation or other post-translational modifications are involved in regulating the bHLHZip function of Myc and Max proteins.

Several proteins have been identified that interact with the C-terminal region of c-Myc. In a yeast two-hybrid screen with the transcriptional regulator Yin-Yang-1 (YY1) as bait, Myc was identified as an interaction partner (Shrivastava et al., 1993). Although the region required in Myc to bind to YY1 was not mapped in detail, the finding that the interactions of Myc with Max and with YY1 were mutually exclusive suggested an involvement of the bHLHZip domain. Further analysis revealed that Myc and YY1 can be coimmunoprecipitated from cells and that the amount of c-Myc in complex with YY1 correlated with the expression of c-Myc (Shrivastava et al., 1996; Zhao et al., 1998; M Austen and B Lüscher, unpublished results). Since c-Myc overexpression was able to interfere with the activity of YY1 both as transcriptional activator and repressor in cotransfections (Shrivastava et al., 1993), a model was proposed where increased endogenous Myc levels as a result of mitogenic stimulation or oncogenic events would compete out positive YY1-interacting proteins through direct interaction with YY1, thereby leading to its inactivation and repression of YY1 target genes (Shrivastava et al., 1996; Zhao et al., 1998). Since YY1 is a much more abundant protein than Myc, the stoichiometry of this competition remains to be explained. Since YY1 competes with Max for binding to Myc, we asked whether YY1 could affect the function of Myc in transformation. YY1 efficiently inhibits Myc/Ras cotransformation (Austen et al., 1998). However, this was not dependent on direct interaction; rather, YY1 affected the TAD of Myc by an indirect mechanism. Thus YY1 interferes with Myc function in at least two ways, one by direct interaction and another by signaling onto the TAD of Myc. Further studies will have to address the nature of this signaling and the biological relevance of the YY1-Myc interaction.

The transcription factor AP-2 has been shown to bind to the bHLHZip domain of c-Myc but not of Max or Mad proteins (Gaubatz et al., 1995). Unlike YY1, AP-2 does not compete with Max for binding to Myc, but it nevertheless inhibits Myc/Max DNA binding by an as yet unknown mechanism. AP-2 sites immediately adjacent to Myc-regulated E boxes have been found in the α-prothymosin and ODC genes (Gaubatz et al., 1995). However, the interplay between Myc and AP-2 is more complex, since recent evidence suggests that Myc can function as a coactivator of AP-2-specific regulation of the E-cadherin gene in epithelial cells (Batsché et al., 1998). This promoter seems to lack Myc E boxes, and the mechanism by which c-Myc supports AP-2 activity at this promoter has not been unravelled. Since the two studies summarized above are not fully compatible, further work will be required to understand in more detail the functional relationship between Myc and AP-2.

Recently Myc was identified through a yeast two-hybrid screen as an interaction partner of the protein encoded by the breast cancer susceptibility gene BRCA1 (Wang et al., 1998). This protein binds to the HLH region of Myc and inhibits Myc-specific transactivation and the growth of Myc/Ras-transformed rat embryo fibroblasts (REF). In contrast, SV40 transformed REF cells were not inhibited by BRCA1. It is not quite clear what this difference in sensitivity to BRCA1 between the two transformed REF clones means, considering that SV40 large T induced S phase progression requires Myc (Hermeking et al., 1994) and that the same regions of Myc are relevant for both transformation and S phase stimulation (for review see Henriksson and Lüscher, 1996). Nevertheless, BRCA1 is a potentially important Myc interaction partner. BRCA1 is frequently mutated in breast carcinoma whereas the c-myc locus is amplified in a subset of these tumors. However, BRCA1-linked carcinomas do not show c-myc amplifications (Rhei et al., 1998). One possible explanation is that BRCA1 and Myc lie on the same pathway and that activation of this pathway can be achieved either by inhibiting BRCA1 function or by overexpressing Myc.

An additional protein that binds to the Myc HLH region is the POZ domain protein Miz-1 that was identified in a yeast two-hybrid screen, an interaction which was also demonstrated in vivo after cotransfection of HeLa cells (Peukert et al., 1997). Miz-1 binds to transcriptional start sites including that of the adenovirus major late and cyclin D1 promoters and activates transcription. The protein was shown to be localized both in the nucleus and the cytoplasm. Miz-1 seems to lack a nuclear translocation signal and is probably transported to the nucleus by interacting with other proteins. Overexpression of Miz-1 in NIH3T3 or HeLa cells inhibits cell growth, apparently independent of its interaction with c-Myc. Coexpression of Myc leads to increased nuclear translocation of Miz-1, inhibits its ability to transactivate and to arrest growth, and renders Miz-1 insoluble, suggesting that Myc is a negative regulator of Miz-1.

In addition to their activation of transcription through E box elements, Myc proteins have been shown to repress the transcription of a number of genes including MHC class I, C/EBPα, cyclin D1, the adenovirus major late promoter, gadd45 and c-myc itself (for review see Facchini and Penn, 1998). The mechanism for this repression does not seem to be dependent on E boxes, and it is as yet unclear whether the effect is a direct or an indirect consequence of Myc-induced phenotypic changes. Many of the Myc-repressed promoters contain initiator sequences. Since some of the proteins mentioned above that interact with the C-terminus of c-Myc, namely YY1 and Miz-1, as well as another suggested interaction partner of c-Myc, TFII-I (Roy et al., 1993), activate transcription through binding to initiator sequences, and since c-Myc can repress the function of these proteins, it has been proposed that Myc may repress transcription in a DNA-independent manner by targeting such proteins. Further research is clearly required to validate this hypothesis.

Outlook

The bHLHZip domain is central to Myc function since it provides the DNA binding activity. The structural studies on Max, and by analogy on Myc, have provided us with detailed molecular knowledge of how bHLHZip domains fold and interact with DNA. However, we know comparatively little about how the TAD, and especially the sequences between the TAD and the bHLHZip, which comprise the majority of the Myc protein, as well as regulatory signals and interacting proteins, affect the function of the Myc bHLHZip domain. To understand the potential role of these modulators and their networking in regulating the bHLHZip domain of Myc, we will require more sensitive assay systems to study DNA binding as well as more downstream functions of Myc. It is possible that subtle regulatory effects cannot be visualized with the widely and routinely used overexpression systems. The recently described DNA binding conditions are a step in this direction. The use of cross-linking to preserve the in vivo binding pattern of proteins to DNA will also undoubtedly shed more light on the complexity of genes targeted directly by Myc. An important issue for the understanding of the role of the Myc/Max/Mad network is to determine whether Myc and Mad bind the same, different, or overlapping target genes. The potential role of the Myc bHLHZip domain in functions other than E box binding, such as the targeting and regulation of non-E box-binding transcription factors or cofactors, also remains to be elucidated. Thus it seems likely that with the recent advances made we will learn more about new and exciting aspects of Myc function.