Main

Twenty-five years ago, the primary structure of the first identified G protein-coupled receptor (GPCR), rhodopsin, was published1,2,3. It was shown that this protein is a membrane-spanning protein that has the ability to transfer energy from light into intracellular signalling cascades, which enable us to see. Significant technical advances were made through the development of methods for radioligand binding, solubilization and purification of monoamine-binding GPCRs, and these further developed the concept of a GPCR superfamily4. The number of known GPCRs grew rapidly, and it became evident that this family of proteins could bind to a broad range of ligands, including small organic compounds5,6, eicosanoids7, peptides8 and proteins9.

Members of the GPCR superfamily are diverse in their primary structure, and this has been used for the phylogenetic classification of the family members. Attwood and Findlay made the first attempt to classify this family in 1993 when they developed sequence-based fingerprints of the seven characteristic GPCR hydrophobic domains10. These were subsequently used as diagnostic tools for identifying sequences belonging to the GPCR superfamily. They later extended their data set from 240 to 393 rhodopsin-like GPCRs from different species, and adopted the term 'clans' to describe the different GPCR families11.

In 1994, Kolakowski presented an important overview of the GPCR superfamily: the well-known A–F classification system12. Kolakowski included all the receptor proteins that were proven to bind G-proteins, while the remaining seven transmembrane (7TM)-spanning proteins were assigned to the O (Other) family. This system is also used by the International Union of Pharmacology, Committee on Receptor Nomenclature and Classification (NC-IUPHAR), with the exception that the frizzled receptors are referred to as a separate family instead of as part of the O family12,13. Bockaert and Pin introduced a similar but extended nomenclature system for classifying GPCRs in 1999, in which the GPCRs were divided into family 1–5 on the basis of structural and ligand-binding criteria14.

A more comprehensive view of the human GPCR repertoire became possible once the first draft of the human genome was released in 2001 (Refs 15,16). Many of the full-length GPCR sequences were collected through TBLASTN searches (a protein query against a translated nucleotide database) and sequence hidden Markov model searches17, which also yielded several pseudogenes and partial sequences. Simultaneously and independently, Fredriksson and colleagues divided 802 (known and predicted) human GPCRs into families on the basis of phylogenetic criteria. This showed that most of the human GPCRs can be found in five main families, termed Glutamate, Rhodopsin, Adhesion, Frizzled/Taste2 and Secretin (shortened to the acronym GRAFS)18. The phylogenetic classification by Fredriksson and colleagues also provided a subgrouping of the large Rhodopsin family. The main difference between this nomenclature system and the former classification systems is the further division of family B into the Secretin family and the Adhesion family. It also classified a larger total number of unique receptor proteins and included the recently discovered bitter taste 2 receptors (Taste2/T2Rs)18,19,20. These five major GRAFS families are dominant in terms of the number of genes in most bilaterian species21.

The total number of known and verified human GPCRs has continued to grow and now consists of at least 799 unique full-length members22 to which several new GPCRs with highly complex genomic structures — for example, additional members of the Adhesion GPCRs and divergent Rhodopsin GPCRs — have recently been added. In this Review, we provide an overview of the GPCR families and their structural features. We present this in the context of structural studies on the role of domains and specific residues in both the N-terminal and the core 7TM regions of the different GPCR families and discuss their potential as future drug targets.

In this article, we will refer to the GPCR families using the GRAFS nomenclature system together with the Kolakowski/NC-IUPHAR extended nomenclature system. The GPCR families are written in italics with an initial capital letter (Rhodopsin, Secretin, Adhesion, Glutamate, Frizzled/Taste2). This should clarify the instances when we are discussing the GPCR families and thus avoid possible confusion with, for example, the secretin receptor or rhodopsin. The names/abbreviations of the receptor proteins are used in agreement with the human gene symbol or the official nomenclature of the NC-IUPHAR13, unless otherwise indicated.

The Rhodopsin receptor family/class A

The Rhodopsin receptor family is the largest family of GPCRs and contains 670 full-length human receptor proteins22. The family can be further divided into four groups — α, β, γ and δ — in which the largest cluster of members, the olfactory receptors, is found in the δ-group18,23. The Rhodopsin family of GPCRs is highly heterogeneous when both primary structure and ligand preference are considered. The diversity is not found in their N termini, where most receptors have only a short stretch of amino acids, but within the TM regions, although most Rhodopsin family receptors do share specific sequence motifs within the 7TM regions (Fig. 1).

Figure 1: Conserved features and structural motifs within the Rhodopsin receptor family/class A.

The upper part of the figure illustrates the differences within the secondary structure of the N termini of the Rhodopsin receptors. The scissors indicate the cleavage site for the protease-activated receptors (PARs). In the lower part of the figure, the schematic transmembrane (TM) regions display the consensus of an alignment generated in ClustalW 1.82 (Ref. 233) of eight diverse human Rhodopsin receptors. The eight selected receptors represent the four subgroups of the Rhodopsin family: α, β, γ and δ. The α-group comprises rhodopsin and cannabinoid receptor 2. The β-group comprises neuromedin U receptor 2 and endothelin receptor type B. The γ-group comprises bradykinin receptor 1 and interleukin 8 receptor-α (CXCR1). The δ-group comprises purinergic receptor P2Y8 and coagulation factor II (thrombin) receptor. Residues conserved in all eight sequences are displayed as circles in which conserved aliphatic residues are shown in beige, polar in orange, aromatic in purple, positively charged in red and negatively charged in blue. The positions of the residues are calculated from the TM boundary (established by Palczewski et al.24) starting with 1 in the N- to C-terminal direction. Numbers in italic correspond to the first position in each TM region of rhodopsin. Conserved sequence motifs found in the TM regions of the Rhodopsin receptor family are surrounded by blue boxes. Uppercase letters indicate completely conserved positions, lowercase letters indicate well-conserved positions (>50%), whereas x indicates variable positions. Conserved cysteine residues are pictured as yellow circles and the cysteine bridge between extracellular loops 1 and 2, which is common to most G protein-coupled receptor (GPCR) families, is indicated by two straight lines. Dashed black lines visualize hydrogen bonds within bovine rhodopsin, whereas dashed blue lines show the postulated 'ionic lock'24. Dashed red lines display van der Waals interactions within the β2-adrenoceptor (ADRB2) model28. FSHR, follicle-stimulating hormone receptor; LDLa, low-density lipoprotein receptor class A domain; LGR, leucine-rich repeat-containing GPCR; LHCGR, luteinizing hormone receptor; LRR, leucine-rich repeat; PARs, protease-activated receptors; TSHR, thyrotropin receptor.
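The consensus notation described in this legend (uppercase for completely conserved positions, lowercase for positions conserved in more than 50% of sequences, x for variable positions) can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length; the function name `consensus`, the handling of gap columns and the toy fragments are illustrative assumptions, not the procedure used to draw the figure.

```python
from collections import Counter


def consensus(aligned, threshold=0.5):
    """Summarize equal-length aligned sequences as a consensus string.

    Uppercase: the same residue occurs in every sequence.
    Lowercase: the most common residue occurs in more than `threshold`
               of the sequences (the legend uses >50%).
    'x':       variable position (also used when a gap is most common).
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")

    out = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        if residue == "-":
            out.append("x")
        elif count == len(column):
            out.append(residue.upper())
        elif count / len(column) > threshold:
            out.append(residue.lower())
        else:
            out.append("x")
    return "".join(out)


# Hypothetical aligned fragments, not real receptor sequences.
toy_alignment = ["DRYLA", "DRYVA", "ERYLS", "DRYIA"]
print(consensus(toy_alignment))
```

In this toy alignment the second and third columns are identical in all four fragments and are reported in uppercase, whereas the fourth column, in which the most common residue reaches only 50%, is reported as x.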

In 2000, Palczewski and colleagues presented the first crystallized high-resolution structure of a GPCR: the bovine rhodopsin model24. This study confirmed that the seven helices are arranged in an anticlockwise manner when seen from the extracellular side of the membrane24,25. It is also evident from the X-ray studies that several of the conserved residues within the Rhodopsin family form interhelical networks that play a central role in the stabilization and activation of rhodopsin24 (Fig. 1). Besides these motifs, which are common to most Rhodopsin GPCRs, studies of ligand-interacting residues have identified unique patterns of conserved amino acids for each ligand–receptor complex. This, together with incorporation of ligand information from related receptors, can provide a pharmacophore-based approach to optimize high-throughput screening26. Most structure-based design, however, relies on a high-quality three-dimensional computational model of the ligand pocket of the GPCR.

So far, most computational models of GPCRs are homology models that are based on the coordinates of the bovine rhodopsin receptor. However, recent data on the high-resolution structure of the β2-adrenoceptor (ADRB2) provides a second model for Rhodopsin GPCRs, which has highlighted the challenge of using only the rhodopsin model as a template in homology modelling27. The two structures diverge primarily in the TM1, TM3, TM4, TM5 and TM6 regions. A major difference is the lack of a proline-induced kink in the TM1 region of ADRB2, which is present in the rhodopsin receptor27. Moreover, retinal is covalently bound to rhodopsin, whereas ADRB2 binds to diffusible ligands27. Comparison of the inactive (dark state) rhodopsin24 with ADRB2 bound to the inverse agonist carazolol, which still displays a basal activity28, also provides insight into the activation process of GPCRs. In rhodopsin, the structure is thought to stay in the inactive form through the 'ionic lock' between R135 (the R in the DRY motif in TM3) and E247 (TM6), whereas these interactions are not possible in the ADRB2–carazolol model28 (Fig. 1). However, the ADRB2–carazolol structure is thought to stay in a less active form through van der Waals interactions between L272 (TM6) and I135 (TM3) and Y219 and V222 (TM5)28. The crystallization of ADRB2 increases the possibility of generating models of other Rhodopsin GPCRs by combining the information of these two known three-dimensional structures27. In addition, the successful crystallization approach might be used to speed up crystallization of other GPCRs.

As previously noted, most receptor proteins within the Rhodopsin family of GPCRs have short N termini without any common conserved domains. However, there are some exceptions (Table 1). The human thrombin receptor (PAR1/F2R) has an intrinsic cleavage site in the N terminus, which, upon cleavage by thrombin, reveals a tethered ligand that is able to activate the receptor29 (Fig. 1). This report was followed by the publication of three similar human receptors that also display a protease-dependent activation mechanism: the protease-activated receptors 2–4 (PAR2–4)30,31,32. Thrombin binds and activates PAR1, PAR3 and PAR4, whereas PAR2 is targeted by trypsin (for a review, see Ref. 33). We therefore count the thrombin-cleaved PARs as drug-targeted GPCRs, although technically heparin inhibits the actions of thrombin and not the receptors directly. Moreover, the relaxin-binding GPCRs LGR7 and LGR8 (leucine-rich repeat-containing GPCRs) have a low-density lipoprotein receptor class A domain in addition to the leucine-rich repeat region that all five LGRs contain34,35 (Fig. 1). The leucine-rich repeat-containing region can also be found in the follicle-stimulating hormone receptor (FSHR), the luteinizing hormone receptor (LHCGR) and the thyrotropin receptor (TSHR) (Fig. 1). For these receptors, the leucine-rich repeat-containing region is responsible for parts of the interaction between the glycoprotein and the receptor36. However, most Rhodopsin receptors are primarily activated by interactions between the ligand and the TM regions and extracellular loops owing to their short N-terminal stretch of amino acids14.

Table 1 A summary of properties for the G protein-coupled receptor (GPCR) families

The Rhodopsin members bind a vast variety of ligands, such as peptides, amines and purines, and the family also contains the largest number of receptors that are targeted by clinically used drugs37. Interestingly, there is no overall correlation between the phylogenetic location of a receptor (that is, to which phylogenetic group the receptor protein belongs) and the type of endogenous ligand that the receptor binds. For example, peptide-binding receptors are found in all four groups of the Rhodopsin family18 and the receptors that bind lipid-like compounds are found in at least three of the phylogenetic groups. There are, however, some phylogenetic clusters of receptors that bind similar types of ligands. The largest is the amine-binding cluster in the α-group and all the known ligands to the receptors in the β-group are peptides. The nucleotide-binding receptors (P2YRs) and the glycoprotein receptors are both confined to the δ-group18. Moreover, there is a considerable overlap between the phylogenetic location and ligand preference based on critical antagonist cavity-lining residues38.

The α-group contains at least 18 important drug targets: the histamine receptors 1 and 2; the dopamine receptors 1 and 2; the serotonin receptors 1A, 1D and 2A; the adrenoceptors 1A, 2A, B1 and B2; the muscarinic receptor 3; the prostanoid receptors TP, EP1, EP3, IP1 and FP; and the cannabinoid receptor 1 (CNR1) (for reviews, see Refs 37,39). Drugs that target these receptors include the widely used antihistamines, antacid drugs, cardiovascular drugs and antipsychotics. These receptors generally bind the ligand within a pocket embedded in the TM cavity, and in the prototype of the amine-binding receptors, ADRB2, this involves the TM3, TM5 and TM6 regions40,41,42. It is likely that more drugs that target the biogenic monoamine receptors will be developed. However, the main problem in the development of such drugs is the potential for cardiovascular side effects due to off-target interactions with the adrenoceptors, which are highly expressed in many tissues and are important in both heart rate and blood-pressure regulation. Such side effects are perhaps less likely for those α-group receptors that are less similar to the amine receptors, for example, the prostanoid receptors, which are targeted in the treatment of glaucoma and ulcers, and CNR1, which is targeted for the treatment of obesity43.

The β-group of Rhodopsin GPCRs includes mainly peptide-binding receptors, and marketed drugs for this type of receptor include endothelin, gonadotropin-releasing hormone and oxytocin receptor ligands37. Most of the peptide ligands in this group bind to a binding pocket within the TM regions with participation of the extracellular loops and the N terminus. This group also includes several receptors that have been heavily pursued as drug targets, including the neuropeptide Y receptors. The specificity of the binding profile of peptide-binding receptors is often high; however, neuropeptide receptors participate in many physiological functions, which complicates their potential as drug targets. It is also challenging to find agonists for peptide receptors, as the flexible peptide ligands use a multitude of interaction sites to convey their signals, and small molecules are seldom able to mimic the interactions required to induce a full agonistic signal.

The γ-group includes receptors for both peptides and lipid-like compounds18. The three opioid receptors, somatostatin receptor 2 and 5 (SSTR2 and SSTR5) and angiotensin receptor 1 (AGTR1) represent important drug targets within this group37. AGTR1 is targeted by antagonists to treat hypertension, whereas the opioid receptors are targeted in the treatment of pain, cough and alcoholism, and are also involved in the abuse of opioids such as heroin. Also, the chemokine receptors in the γ-group represent an interesting group of receptors for drug targeting because of their importance in acute and chronic inflammation. So far, only one drug that targets chemokine receptors has received regulatory approval: maraviroc, an antagonist of chemokine (C–C motif) receptor 5 (CCR5) that was approved in 2007 for the treatment of HIV253. This receptor is used by some strains of HIV as a co-receptor during viral entry, and maraviroc inhibits this process. Several other chemokine receptor modulators are in early stage clinical trials44.

The δ-group mostly contains the P2RYs, the glycoprotein-binding receptors, the PARs and the olfactory receptors18. P2Y12, leukotriene receptor 1 (Ref. 37) and the PARs represent important drug targets within this group. The other 11 P2RYs could also turn out to be important targets as their specific functional roles become clearer. All three glycoprotein receptors (FSHR, TSHR and LHCGR) are targeted by recombinant peptides37.

Recently de-orphanized Rhodopsin GPCRs. Between 1990 and 2004, many GPCRs were paired with their endogenous ligands. Examples include ghrelin, orexin, prolactin-releasing peptide, metastin, neuropeptide B/W, neuropeptide S, melanin-concentrating hormone, neuromedin U and neuropeptide FF45. Since 2005, the rate of GPCR de-orphanization has markedly declined and focus has shifted from peptide ligands to lipid-like compounds. For example, the atypical Rhodopsin receptor GPR119 was found to bind oleoylethanolamide, an endogenous lipid that reduces food intake and body weight gain in rats46. The receptor protein GPR120, which was identified in the same analysis as GPR119 (Ref. 47), was later shown to bind unsaturated long-chain fatty acids48. Activation of GPR120 stimulates secretion of glucagon-like peptide-1, an important mediator in insulin release39. In addition, oestrogen binds to GPR30 (Ref. 49), and kynurenic acid, a metabolite of tryptophan, interacts with GPR35 (Ref. 50). Also, GPR87, a member of the δ-group of Rhodopsin GPCRs, was recently shown to bind lysophosphatidic acid51.

Currently, the number of GPCRs that are known to bind lipid-like compounds is similar to the number of GPCRs that bind amines52. The progress of GPCR de-orphanization can be followed on the IUPHAR homepage (http://www.iuphar-db.org/). There are still 60 Rhodopsin GPCRs left to be de-orphanized22, although there is a possibility that some of them lack an endogenous ligand. A recent study indicated that orphan GPR50 is able to inhibit signalling of melatonin receptor 1 through heterodimerization53, prompting speculation that some orphan GPCRs are not capable of binding any endogenous ligands, but instead regulate the function of non-orphan GPCRs through heterodimerization or other mechanisms. The two receptors that form the well-characterized γ-aminobutyric acid B receptor (GABABR) heterodimer have distinct functions. GABABR1 is involved in ligand-binding, whereas GABABR2 functions as the signalling unit54,55,56, technically implicating GABABR2 as an orphan receptor in the heterodimer. The TAS1R1–TAS1R3 and TAS1R2–TAS1R3 pairs also function as heterodimers between an orphan and a non-orphan receptor57,58,59,60,61. However, the hypothesis that some orphan GPCRs lack an endogenous ligand may be difficult to establish; for example, it is difficult to prove experimentally that GPR50 does not have any endogenous ligand.

Olfactory receptors. The first mammalian olfactory receptor was cloned from dogs in 1989 (Ref. 62), closely followed by the cloning of rat and human homologues63,64. Buck and Axel anticipated that the final number of mammalian olfactory receptors would be close to 1,000 genes and several post-genomic studies proved this prediction correct63,65,66,67,68. According to a recent publication, the human genome contains 388 potentially functional olfactory genes and 479 pseudogenes22. The functional olfactory receptors can, based on phylogeny, be divided into class I and class II67,69 in which class II can be further divided into 19 clades (A–S)67. All 388 olfactory receptors are intronless and between 320 and 370 amino acids in length. They are also most conserved in the beginning of TM2, the end of TM3 (including the Rhodopsin family motif DRY70) and the end of TM7 (including the Rhodopsin family motif NPxxY)67,68. The binding site of olfactory receptors has been localized to the most extracellular parts of TM3, TM5, TM6 and TM7 together with extracellular loop 2 on the basis of high conservation among orthologues and variability among paralogues71. Moreover, the importance of these areas in olfactory binding has also been highlighted in ligand receptor docking studies and site-directed mutagenesis72,73,74.
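The Rhodopsin family motifs mentioned above (DRY at the end of TM3 and NPxxY at the end of TM7) can be located in a protein sequence with a simple pattern search. The following is a minimal sketch, assuming a one-letter amino-acid string as input; the function name `find_motifs` and the example sequence are hypothetical, and a match on sequence alone says nothing about whether the motif lies in the expected TM region.

```python
import re

# Motifs named in the text; 'x' in NPxxY stands for any residue.
MOTIFS = {
    "DRY": re.compile(r"DRY"),
    "NPxxY": re.compile(r"NP..Y"),
}


def find_motifs(sequence):
    """Return {motif name: [0-based start positions]} for one sequence."""
    sequence = sequence.upper()
    return {
        name: [match.start() for match in pattern.finditer(sequence)]
        for name, pattern in MOTIFS.items()
    }


# Hypothetical sequence fragment, not a real olfactory receptor.
toy_sequence = "MAAVDRYLAIVHPLNPIIYAFK"
print(find_motifs(toy_sequence))
```

Positions are reported relative to the start of the input string; mapping them onto TM boundaries would require a separate topology assignment.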

An olfactory receptor can be activated by a broad range of odorants and an odorant can bind to several olfactory receptors. This introduces the concept of combinatorial receptor codes for every odorant, in which each odorant binds and activates a unique set of olfactory receptors75. In this way, the human repertoire of olfactory receptors can distinguish between an enormous number of odorants. Different sets of olfactory receptors discriminate between odorants based on the concentration, the length of the carbon chain and the functional group of the odorant. Alcohols are usually described as pleasant, whereas carboxylic acids are described as repulsive75. Some human olfactory receptors have been de-orphanized: OR17-40 (HsOR17.1.11, OR3A1) responds to helional and heliotropylacetone76; OR17-4 (HsOR17.1.2, OR1D2) binds bourgeonal and undecanal77; OR43 responds to citronellal78,79; and OR17-209 (HsOR17.1.4, OR1G1) binds esters, whereas OR17-210 responds to ketones80. Tremendous work remains to de-orphanize the other olfactory receptors and only a few of the receptors with known ligands have been studied in detail. However, it is clear that these receptors evolve at a rapid pace81 considering the high variation in gene numbers across mammalian species. The potential for olfactory receptors as drug targets is considered to be limited because they have not yet been implicated in the pathology of any common disease. However, these receptors are important targets for the fragrance industry, which is likely to prompt further structural and pharmacological characterization.
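The idea of a combinatorial receptor code can be made concrete with a small sketch. It treats receptor activation as a simple on/off event, which is a simplifying assumption (the text notes that concentration also matters), and it uses hypothetical receptor and odorant names rather than measured pairings. Under that assumption, 388 receptors allow at most 2^388 − 1 distinct non-empty activation patterns.

```python
def receptor_code(odorant, responses):
    """Return the set of receptors that respond to an odorant.

    `responses` maps each receptor name to the set of odorants that
    activate it (binary on/off activation is assumed).
    """
    return frozenset(
        receptor for receptor, odorants in responses.items()
        if odorant in odorants
    )


def max_binary_codes(n_receptors):
    """Upper bound on distinct on/off patterns, excluding 'all off'."""
    return 2 ** n_receptors - 1


# Hypothetical response table: each receptor is broadly tuned and
# each odorant activates more than one receptor.
TOY_RESPONSES = {
    "R1": {"odorant_a", "odorant_b"},
    "R2": {"odorant_a"},
    "R3": {"odorant_b", "odorant_c"},
}

for odorant in ("odorant_a", "odorant_b", "odorant_c"):
    print(odorant, sorted(receptor_code(odorant, TOY_RESPONSES)))

# Theoretical ceiling for 388 receptors under the binary assumption.
print(len(str(max_binary_codes(388))), "decimal digits")
```

Although R1 and R3 each respond to two odorants, the three odorants still receive three different codes, which is the point of the combinatorial scheme.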

The Secretin receptor family/class B

The Secretin family is a small family of GPCRs that all have an extracellular hormone-binding domain and bind peptide hormones (Table 1). The 15 members of this family are the calcitonin and calcitonin-like receptors (CALCR, CALCRL); the corticotropin-releasing hormone receptors (CRHR1, CRHR2); the glucagon receptor (GCGR); the gastric inhibitory polypeptide receptor (GIPR); the glucagon-like peptide receptors (GLP1R, GLP2R); the growth-hormone-releasing hormone receptor (GHRHR); the adenylate cyclase activating polypeptide receptor (PAC1/ADCYAP1R1); the parathyroid hormone receptors (PTHR1, PTHR2); the secretin receptor (SCTR); and the vasoactive intestinal peptide receptors (VIPR1, VIPR2)82. The family name is derived from the first receptor to be discovered from this family: SCTR from the rat83.

The 15 Secretin receptors share between 21 and 67% sequence identity and most of the variation is in the N-terminal regions. However, all of the Secretin family receptors contain conserved cysteine residues in the first and second extracellular loop of the TM regions (Fig. 2). Also, almost all of these receptors contain conserved cysteine residues that form a network of three cysteine bridges in the N termini84,85,86. The stabilization of the N-terminal structure by these cysteine bridges is also evident from the nuclear magnetic resonance structure of the N terminus of the mouse CRHR2 (Ref. 87). The N terminus has been shown to be crucial for ligand interactions and this is evident both from the missense-mutation study of the little mouse phenotype and from extensive mutagenesis, chimeric and photoaffinity labelling studies in PTHR1, GCGR, SCTR, GHRHR and CALCR87,88,89,90,91,92,93,94,95 (Fig. 2). The binding profile of the Secretin receptors can be illustrated mainly by three binding domains consisting of the proximal region and the juxtamembrane region of the N terminus and the extracellular loops together with TM6 (Fig. 2; and references therein). The ligand is thought to activate the receptor by bridging the N-terminal and the TM segments/extracellular loops87,92,95,96, thereby stabilizing the active conformation of the receptor, which increases the probability of activation of the signalling units.
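A range such as "between 21 and 67% sequence identity" comes from comparing every pair of receptors; 15 receptors give 105 pairs. The sketch below shows one common way to compute pairwise identity from aligned sequences. It assumes the denominator is the number of columns in which neither sequence has a gap, which is only one of several definitions in use and may differ from the one behind the quoted figures; the function names and toy fragments are hypothetical.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns without a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [
        (a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def identity_range(aligned):
    """Return (lowest, highest) pairwise identity across all sequence pairs."""
    scores = [percent_identity(a, b) for a, b in combinations(aligned, 2)]
    return min(scores), max(scores)


# Hypothetical aligned fragments, not real Secretin receptor sequences.
toy_alignment = ["ACDE-G", "ACNEFG", "TCNQFG"]
print(identity_range(toy_alignment))
```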

Figure 2: Conserved features and structural motifs within the Secretin receptor family/class B.

The schematic transmembrane (TM) regions display the consensus of an alignment generated in ClustalW 1.82 (Ref. 233) of the 15 Secretin receptors from the human genome. The positions of the residues are calculated from the TM boundary (established by Donnelly234) starting with 1 in the N- to C-terminal direction. Numbers in italic correspond to the first position in each TM region of the human secretin receptor (SCTR). Uppercase letters indicate completely conserved positions, lowercase letters indicate well-conserved positions (>50%), whereas x indicates variable positions. Residues conserved in all 15 sequences are displayed as circles in which conserved aliphatic residues are shown in beige, polar in orange, aromatic in purple, positively charged in red and negatively charged in blue. Conserved sequence motifs found in the TM regions of the Secretin family are surrounded by red boxes. Conserved cysteine residues are pictured as yellow circles, the N-terminal cysteine bridges are drawn as lines and the cysteine bridge between extracellular loops 1 and 2, which is common to most G protein-coupled receptor (GPCR) families, as two straight lines. The conserved cysteine encircled blue is not conserved in the adenylate cyclase-activating polypeptide receptor (PAC1/ADCYAP1R1)87,88,90,91,92,93,95,96,235,236,237,238,239,240,241,242,243,244,245,246. *Represents residues that are important for binding to vasoactive intestinal peptide receptor 1 (VIPR1)235. Represents a residue important for binding to SCTR240. GCGR, glucagon receptor; GHRHR, growth-hormone-releasing hormone receptor.

So far, three of these hormones are used in the clinic: calcitonin, glucagon and parathyroid hormone37, for the treatment of hypercalcaemia, hypoglycaemia and osteoporosis, respectively. The Secretin receptors have a large potential as targets for further drug development owing to their importance in central homeostatic functions. GLP1R and GLP2R are particularly interesting because of their role in appetite regulation and in the treatment of type 2 diabetes97. It is generally problematic to develop drugs that mimic peptidergic ligands of this size so, in addition to recombinant peptides, allosteric ligands might provide an additional option.

The Adhesion receptor family/class B

According to the GRAFS GPCR classification, the second largest GPCR family in humans, with 33 members, is called the Adhesion family18. This family is also referred to as the LNB7TM family13, whereby LN stands for long N termini and B for the sequence similarity between the TM regions of Adhesion GPCRs and the Secretin receptors (class B)82,98. The distinction of the Adhesion family as a separate GPCR family and not only as a part of class B is based on the overall phylogenetic analyses of the 7TM regions of most human GPCRs18. Moreover, this classification is supported by the striking differences within the N-terminal domain architecture between the Secretin and the Adhesion receptors. Also, Adhesion GPCRs display the GPCR proteolytic site (GPS) domain, whereas the Secretin receptors lack this. Furthermore, both Adhesion and Secretin receptors can be identified as separate families in both protostomes (for instance in fruitfly and nematode) and deuterostomes, which indicates an early origin of these as separate families21. In addition, the preferred ligands of the two families differ: the de-orphanized Adhesion receptors bind extracellular matrix molecules, whereas Secretin GPCRs bind peptide hormones. The members of the Adhesion family can be divided into eight subgroups I–VIII based on the phylogenetic relationship between the TM regions99 (Fig. 3). This phylogenetic classification is also supported by the composition of functional domains in the N termini.

Figure 3: Conserved features and structural motifs within the Adhesion receptor family.

The upper part of the figure displays the diversity within the N termini of the Adhesion receptors (family B2)82. The domains in the N termini were identified with rps-blast against the CDD (Conserved Domain Database) with an E-value threshold of 0.01 at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. In the lower part, the schematic transmembrane (TM) regions display the consensus of an alignment generated in ClustalW 1.82 (Ref. 233) of the 33 human Adhesion receptors. The positions of the residues are calculated from the TM boundary (established as in Refs 104,128,248) starting with 1 in the N- to C-terminal direction. Numbers in italic correspond to the first position in each TM region of BAI1. Uppercase letters indicate completely conserved positions, lowercase letters indicate well-conserved positions, whereas x indicates variable positions. Residues conserved in all 33 sequences are displayed as circles in which conserved aliphatic residues are shown in beige, polar in orange, aromatic in purple and positively charged in red. The conserved sequence motif within the G protein-coupled receptor (GPCR) proteolytic site (GPS) domain of the Adhesion receptors is surrounded by a purple box. Conserved cysteine residues are pictured as yellow circles in which the cysteine bridge that is conserved in most GPCR families is drawn as two straight lines. *GPR123 lacks GPS domain. Domain scoring with an E-value 0.1>x>0.01. Caution is needed when interpreting the domain repertoire obtained with the CDD search tool, as several domains are low scoring (for example for the domains marked by ). #Nonsense mutation V2250X in mouse linked to audiogenic seizures. §Mutation D1040G associated with the mouse Crsh mutant. ||Mutation N1110K associated with the mouse Scy mutant. C-type lectin, similar to the C-type lectin or carbohydrate-recognition domain99,126,247; CA, cadherin domain; Calx-beta, domain found in Na–Ca exchangers; CUB, resembles the structure of immunoglobulins; EAR, epilepsy-associated repeat; EGF, epidermal growth factor domain; EGF-Lam, Laminin EGF-like domain; GBL, galactose-binding lectin domain; HBD, hormone-binding domain; Herpes_gp2, resembles the equine herpes virus glycoprotein gp2 structure; Ig, immunoglobulin domain; LamG, laminin G domain; LRR, leucine-rich repeat domain; OLF, olfactomedin domain; PTX, pentraxin domain; Puf, displays structural similarity to RNA-binding protein from the Puf family; SEA, domain found in sea-urchin sperm protein; SIN, resembles the primary structure of the SIN component of the histone deacetylase complex; TSP1, thrombospondin domain. Parts of the figure are adapted from Ref. 99.

The Adhesion GPCRs are rich in functional domains and most of the receptors have long and diverse N termini99 (Fig. 3), which are thought to be highly glycosylated and form a rigid structure that protrudes from the cell surface100. These extracellular regions contain a GPS domain that acts as an intracellular autocatalytic processing site that yields two non-covalently attached subunits101. The cleavage site is located between a conserved aliphatic residue — most often a leucine — and a threonine, serine or cysteine (HL↓T/S/C)101,102,103,104. The proteolytic cleavage of the receptor protein occurs in the endoplasmic reticulum or in the early compartment of the Golgi apparatus102. Mutations in the GPS domain have been shown to inhibit proteolytic cleavage and subsequent cell-surface expression, which may indicate that specific post-translational processing may be required for correct folding and transport to the membrane102.
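The cleavage motif described above (HL↓T/S/C) can be expressed as a pattern that places the cut between the leucine and the following threonine, serine or cysteine. The sketch below is a minimal illustration: it assumes the aliphatic residue is leucine (the text says this is the most common case, not the only one), takes the first match only, and uses a hypothetical sequence. A sequence match alone does not show that a receptor is actually cleaved.

```python
import re

# 'HL' followed by T, S or C; the lookahead leaves the cut right after L.
GPS_SITE = re.compile(r"HL(?=[TSC])")


def split_at_gps(sequence):
    """Split a sequence at the first HL|T/S/C site.

    Returns (N-terminal fragment, C-terminal fragment), or None when
    the motif is absent.
    """
    match = GPS_SITE.search(sequence.upper())
    if match is None:
        return None
    cut = match.end()  # index of the residue that follows the leucine
    return sequence[:cut], sequence[cut:]


# Hypothetical fragment, not a real Adhesion receptor sequence.
toy_fragment = "GSCVCNHLTHFAILM"
print(split_at_gps(toy_fragment))
```

The two returned fragments correspond to the two subunits that, according to the text, remain non-covalently attached after processing.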

The diverse N termini of Adhesion GPCRs may contain several domains that can also be found in other proteins, such as cadherin, lectin, laminin, olfactomedin, immunoglobulin and thrombospondin domains (Fig. 3). The number and structure of these domains have been shown to have an important role in the specificity of receptor–ligand binding interactions105. The Adhesion receptors also have conserved cysteine residues in extracellular loops 1 and 2, much like the other GPCR families82 (Fig. 3). In 1997, lectomedin receptor 1 (LEC1/LPHN2) was co-purified with the G-protein-α0 (Ref. 106). Also, GPR56 has been shown to form a complex with G-protein-αq/11 (Ref. 107), which indicates that this family signals through a G-protein-mediated pathway. However, the lack of ligands for most of the Adhesion receptors and under-developed functional assays have hampered studies on the signalling pathways for these receptors.

De-orphanized Adhesion GPCRs. So far, only three of the Adhesion GPCRs have been de-orphanized (Table 1). Glycosaminoglycan chondroitin sulphate has been shown to interact with epidermal growth factor (EGF)-like module containing mucin-like receptor protein 2 (EMR2) through an EGF domain, which mediates cell attachment108. EMR3 can interact with a ligand expressed at the surface of macrophages and activated neutrophils109, whereas EMR4 was recently shown to interact with a cell-surface ligand on A20 B-lymphoma cells110. However, the identity of both ligands is still unknown. The leukocyte activation antigen CD97 has been shown to bind the decay accelerating factor (CD55 or DAF)111. The longest splice variant of CD97 has the highest CD55-expressing cell-binding capacity111,112. Only three amino acids differ within the EGF domains of the shortest splice forms of CD97 and EMR2 (Ref. 105). However, this marginal difference makes the binding of CD55 to EMR2 an order of magnitude weaker than its binding to CD97 (Ref. 105). The most recently de-orphanized Adhesion GPCR, GPR56 (TM7XN1)113,114, has been shown to contribute to the suppression of melanoma metastasis and tumour growth through an interaction with tissue transglutaminase (TG2), which is expressed in the extracellular matrix113.

Orphan Adhesion GPCRs. Most Adhesion GPCRs are still orphans; that is, their endogenous ligands are unknown. These include the LEC receptors106,115 and EGF-TM7-latrophilin related protein (ETL/ELTD1)104. Although no endogenous ligand has been identified for the LEC receptors, α-latrotoxin, a component of black widow spider venom, can bind and activate the LEC1 receptor101. α-Latrotoxin is thought to activate the LEC1 receptor by first interacting with the extracellular adhesion part, followed by an interaction with the first TM region116.

The human genome contains three cadherin EGF LAG seven-pass G-type receptors (CELSR1–3)117,118. In Drosophila, the CELSR homologue was named flamingo after its appearance (long extracellular neck, a TM body and a long intracellular leg)119. Mutations in the last cadherin domain of mouse Celsr1 gave rise to the spin cycle (Scy) mutant, a phenotype with abnormal head-shaking behaviour and neural tube defects120 (Fig. 3). The same phenotype is present in the crash (Crsh) mutant, which has a missense mutation in the second-last cadherin domain of mouse Celsr1 (Ref. 120) (Fig. 3).

GPR97, GPR110, GPR111, GPR112, GPR113, GPR114, GPR115, GPR116, GPR123, GPR124, GPR125, GPR126, GPR128, GPR133 (extended version NP_942122.2), GPR144 and HE6 (human epididymal gene product 6; GPR64), the three brain angiogenesis inhibitor (BAI) receptor proteins and the very large GPCR VLGR1/MASS1 are all orphan Adhesion GPCRs99,121,122,123,124,125,126,127. Notably, rat GPR116 (Ig-hepta) has been shown to exist as a homodimer that is linked by disulphide bonds128, and the receptor Ig-hepta has also been shown to undergo endoproteolytic cleavage, both within the conserved motif in the GPS domain and in the SEA domain (a domain found in sea-urchin sperm protein)129. Moreover, the 20-kDa cleavage fragment generated by cleavage in the SEA domain is thought to act as a ligand to Ig-hepta129,130. VLGR1/MASS1 is the most extreme version of an Adhesion GPCR (Fig. 3). This receptor protein can be expressed as three different isoforms (VLGR1a, b, c) and the longest isoform, VLGR1b, is composed of 6,307 amino acids124,131. The nonsense mutation V2250X in mouse (V2254 in human), affecting a Calx-beta domain between the pentraxin domain and the epilepsy-associated repeat (EAR) domain, is linked to audiogenic seizures in mice132.

Figure 3 depicts the diversity within the N termini in this receptor family. Based on ligand-binding studies and the association between mutations within certain functional domains and changes in phenotype, it is most likely that these conserved regions mediate the function of these proteins105,120. Owing to the present limitation of known ligands to these receptor proteins, no drugs are known to be targeted against these GPCRs. However, the potential role of this family in cell growth and the immune system makes it an important family for future drug development. So far, all of the identified ligands that target this receptor family are large membrane-bound ligands. A way of targeting these receptor proteins may be to bypass the ligand-interacting domain and directly influence the TM regions. Also, there is an increased interest in generating monoclonal antibodies as drugs, and for this purpose the Adhesion receptors with long N termini may be suitable. It is also worth noting that several of these receptors are found in CNS tissues133. For instance, GPR123 was recently shown to be highly expressed in the thalamus, several nuclei of the amygdala, cortical layers 5 and 6, the subiculum and the inferior olive134. However, the functional role of most Adhesion GPCRs in the CNS is still not well understood.

The Glutamate receptor family/class C

The Glutamate family18 or class C12 consists of 22 human proteins: eight metabotropic glutamate receptors (GRMs), two GABABRs (also referred to as one receptor with two subunits), the calcium-sensing receptor (CASR), the sweet and umami taste receptors (TAS1R1–3), GPRC6A and seven orphan receptors135 (Table 1). Most Glutamate members bind their respective endogenous ligand within the N-terminal region. The crystallization of the extracellular region of rat GRM1 illustrates how this region of the receptor is folded into two domains in which the tertiary structure is fixed by intraprotomeric disulphide bridges136 (Fig. 4). This structure has been shown to share structural homology with bacterial amino-acid-binding proteins such as the bacterial periplasmic-binding protein LIVBP137,138. The ligand-binding mechanism of the extracellular region has been compared to a Venus flytrap mechanism (VFTM), in which the two lobes of the region form a cavity where glutamate binds and thereby activates the receptor136. Glutamate binds to a ligand-binding site that is fairly conserved within GRMs, CASR, TAS1Rs and GPRC6A (Fig. 4). The exact interactions have been resolved for GRM1, GRM3 and GRM7 (Refs 136,139). The overall structure of the extracellular regions of the GRMs can be applied to CASR, TAS1Rs and GPRC6A57,140,141,142,143 (Fig. 4).

Figure 4: Conserved features and structural motifs within the Glutamate receptor family/class C.

The upper part of the figure illustrates the different conformations of the extracellular part of the Glutamate receptors. Conserved cysteine residues are pictured as yellow circles in which cysteine bridges visualized by crystallization are drawn as a single straight line (except the bridge between the Venus fly trap mechanism (VFTM) and cysteine-rich domain (CRD), here shown in red), postulated cysteine bridges as dotted lines and the cysteine bridge between the extracellular loop 1 and 2, which is common to most G-protein-coupled receptor (GPCR) families, as two straight lines. The background structure of the VFTM of the rat glutamate receptor, metabotropic 1 (GRM1)136 is downloaded from PDB database (accession number: 1EWK) and visualized using the Sybyl software (Tripos, Germany). Black-filled positions indicate positive metal-binding sites in the rat GRM1 and green-filled circles indicate interaction points between the L-glutamate and the VFTM of rat GRM1 (Ref. 136). The dotted circles indicate positions among the L-glutamate-interacting residues in which chemical properties are conserved in the sweet and umami taste receptors (TAS1Rs). The crossed circle illustrates the residue that is conserved between the GRMs and the calcium-sensing receptor (CASR) and is shown to be important for L-amino acid interaction in CASR249, whereas a green circle surrounded by a red line illustrates a homologous residue in CASR (E297), which is associated with both activating and inactivating naturally occurring mutations and is important for Ca2+ binding144. A green circle encircled with extra thick black line shows a conserved residue between rat GRM1 and γ-aminobutyric acid (GABA) B receptor, 1 (GABABR1), which has been shown to interact with GABA250. Green circles encircled with a thick yellow line denote conserved residues in the amino acid-binding GPRC6A receptor143. 
The extracellular regions of the GABABRs are shown as a schematic structure of the bilobular structure as no crystallization data has been presented for this structure. The schematic transmembrane (TM) regions display the consensus of an alignment generated in ClustalW 1.82 (Ref. 233) of the eight GRMs from the human genome. The positions of the residues are calculated from the TM boundary (established by those in Refs 145,251) starting with 1 in the N- to C-terminal direction. Numbers in italic correspond to the first position in each TM region of GRM1. Residues conserved in all eight sequences are displayed as circles in which conserved aliphatic residues are shown in beige, polar in orange, aromatic in purple, and positively charged in red. Conserved sequence motifs found in the TM regions of GRMs, CASR and TAS1R are surrounded by orange boxes. Lowercase letters indicate a well-conserved position and x indicates variable positions. In the TM regions, a red-encircled position denotes the interaction point for a positive allosteric enhancer of the GRM type I or II, blue indicates an interaction point for negative allosteric modulators of the GRM type I, green highlights the interaction points of allosteric modulators of the CASR, yellow highlights the interaction points of allosteric modulators of TAS1R3 (Refs 169,171), whereas grey circles indicate positions for naturally occurring mutants in CASR252.

CASR binds Ca2+ in the large extracellular region140,144. Intriguingly, Ca2+ induces enhancement of glutamate binding in type I GRMs (for classifications of GRMs see Refs 145,146) and in GRM3 (Ref. 147). CASR, in turn, has been shown to bind aromatic amino acids, which thereby enhances the sensitivity of the CASR agonists Ca2+, gadolinium (Gd3+) and spermine148. Consequently, Ca2+ enhances the effect of glutamate on GRMs and amino acids enhance the effect of Ca2+ on CASR. Many of the Ca2+ contacting residues are well conserved within the VFTM of most Glutamate GPCRs144, which highlights the importance of these residues for drug targeting. It is likely that this binding pocket could be targeted with similar small-molecule ligands to affect a wide range of biological responses associated with these receptors, but whether sufficient specificity can be reached remains to be determined.

The TAS1Rs consist of three GPCRs (TAS1R1, TAS1R2 and TAS1R3), which function as protomers in heterodimeric complexes57,58,59,60,61. The dimer complex between TAS1R1 and TAS1R3 senses the L-glutamate taste (umami), whereas the combination of TAS1R2 and TAS1R3 detects natural and unnatural sweeteners58,60. L-Glutamate has been postulated to interact with the extracellular domain of the TAS1R1 unit, whereas sweeteners such as aspartame and neotame are shown to interact with the corresponding extracellular domain of the TAS1R2 unit149.

Apart from the structurally similar ligand-binding domain, the extracellular regions also contain a cysteine-rich domain (CRD), consisting of nine conserved cysteine residues forming three predicted disulphide bridges150,151,152, which could function as a spring between the ligand-binding domain and the intracellular signalling mechanism connected to the TM regions (Fig. 4). Lobe 2 of the human GRM2 was recently shown, through mutagenesis studies, to be covalently linked to the third conserved cysteine in the CRD150 (Fig. 4). The study showed how this interaction is crucial for the activation of the receptor dimer after agonist binding150. The position of the covalent link between lobe 2 and the CRD was later confirmed by the crystallization of GRM3 (Ref. 139). The CRD is also crucial for signal transmission in the CASR152. A gain-of-function mutation that has been associated with hypocalcaemia has been found in this domain, which led the authors to suggest that the region may suppress CASR activity in the presence of low extracellular Ca2+ concentrations153. The CRD in TAS1R3 has been shown to be crucial for the activity of brazzein, a sweet-tasting plant protein154. Moreover, it was also shown that substitutions in this area of the receptor protein affected the signalling properties of most sweeteners154, further strengthening the importance of this area in signal transduction.

The structure of the extracellular region of the two GABA-binding GPCRs, GABABR1 (GABABR1a–c155,156) and GABABR2 (Ref. 155), differs slightly from that of the GRMs, CASR and TAS1Rs (Fig. 4). The extracellular regions in GABABRs are thought to contain a bilobular ligand-binding structure (a VFTM), which is less similar to the GRMs than the VFTM of CASR and TAS1Rs136,155,157 (Fig. 4). Also, the GABABRs lack the CRD found in the GRMs, CASR and TAS1Rs155,157. The GABABR1a subunit contains two SUSHI domains (also known as complement control protein modules or short consensus repeats) close to the VFTM in the N terminus156 (Fig. 4). The SUSHI domain contains a minimum of four cysteine residues forming two disulphide bonds158. The structure can also be found in complementary proteins such as transglutaminases (coagulation factor XIII) and CD21 antigen (Epstein–Barr virus receptor), which are involved in the immune system159. GABABRs are heterodimers in which GABABR1 functions as the ligand-binding domain and GABABR2 as the signalling unit54,55,56. GABABR2 has also been shown to release the suggested inhibitory constraints between the VFTM and the TM regions of GABABR1, possibly by an interaction between the two VFTMs160. These inhibitory constraints are thought to favour the open conformation of the GABABR1 VFTM, thus keeping the receptor in the inactive state160, whereas the closed conformation leads to activation of the receptor161. So far, the GABABRs and CASR have been successfully targeted with therapeutic drugs37,162. For example, cinacalcet, a positive allosteric ligand for CASR, has been shown to normalize serum calcium levels in subjects with primary hyperparathyroidism162.

The TM regions of the GRMs are well conserved, especially the third, sixth and seventh helices (Fig. 4). The conserved positions are most often non-polar hydrophobic residues, positions that are most likely to be conserved owing to the environment in the lipid membrane. In addition, numerous conserved positions in TM3, TM6 and TM7 helices are polar, charged or aromatic residues. These residues are probably involved in interhelical interactions, such as hydrogen bonds and ionic bonds, which possibly stabilize different conformations of the receptor. Among these conserved residues are the wl motif in TM6, which aligns with the CwxP motif in Rhodopsin receptors18, and the pkxy motif in TM7, which may be homologous to the nPxxy motif in the Rhodopsin receptors (Fig. 4). The pkxy motif is also highly conserved within the CASR, TAS1Rs and GABABRs, whereas the GABABRs and TAS1R2 lack the W in the wl motif.

Although all of the known endogenous Glutamate ligands interact primarily with the N-terminal region of the receptor protein, many allosteric ligands of the GRMs have been found to interact with TM3, TM5, TM6 and TM7 (Refs 163–165) (Fig. 4). These particular TM regions are also the main target for allosteric modulators of CASR166,167. Surprisingly, four out of six interacting residues in CASR correspond to interaction points for negative allosteric compounds binding to type I GRMs (Fig. 4). Moreover, the interaction points for the negative modulators of CASR are also located in TM3, where one of the interacting residues aligns with a residue that is important for both positive and negative modulators of GRMs168. Altogether, this interaction pattern indicates a conserved activation mechanism for the GRMs and CASR. The signalling properties of the TAS1Rs can also be modulated by direct interaction with the TM regions, as is the case for the sweet-tasting compound cyclamate, which, instead of binding to the VFTM, binds to the second and/or third extracellular loops of the TM regions of TAS1R3 and residues within TM3, TM5 and TM6 (Refs 149,169). The TAS1R3 antagonist lactisole interacts with residues in TM3, TM5 and TM6 to inhibit the sweet taste of most sweeteners170 (Fig. 4). Also, allosteric compounds can modulate the function of the GABABR heterodimer by interacting with the TM regions of GABABR2 (Ref. 171). Consequently, an amino-acid interacting-binding pocket is conserved in the extracellular domains of most Glutamate members, whereas the TM regions function as the signalling unit in which many allosteric interaction sites are located. 
One way of targeting a specific member of the glutamate binding receptors would be to develop more specific high-affinity allosteric ligands or potentiators172, as ligands interacting with the glutamate binding pocket in the N termini are likely to be less specific compared with those utilizing the amino-acid diversity within the TM regions.

The latest orphan receptor to be de-orphanized within this group of receptor proteins is GPRC6A173. This receptor has the general GRMs/CASR/TAS1R structure with a large extracellular domain that aligns well with the VFTM of the GRMs/CASR/TAS1R (Fig. 4). Moreover, GPRC6A also contains the CRD adjacent to the TM regions143 and the receptor can also be positively modulated by divalent cations174. The structure of the TM regions also resembles the TM regions of GRMs/CASR/TAS1Rs, keeping the two structurally important cysteines in the first and second extracellular loop, the W in TM6 and the PK and Y in the pkxY motif in TM7 (Ref. 143) (Fig. 4). GPRC6A was recently shown to respond to basic amino acids and most preferentially to the L-α amino acids arginine, lysine and ornithine173,174.

Orphan Glutamate family GPCRs. Recently, a new proposed member of the GABABR group, GABABRL, was cloned in humans and rats175. The TM region of this protein shares 30% sequence identity with GABABR1 and GABABR2. The N terminus, however, deviates drastically from the GABABRs, in that GABABRL lacks the VFTM but contains cysteine residues adjacent to the TMs175 (Fig. 4). Cells expressing GABABRL alone or together with GABABR1 or GABABR2 were not able to respond to GABA, which led the authors to suggest that this protein is a GABABR-like orphan with the ligand still awaiting identification175.

Besides the GABABRL protein, six orphans have been reported, starting with the GPRC5A or RAIG1 (retinoic acid-inducible gene) in 1998 (Ref. 176). GPRC5A was later followed by GPRC5B, 5C and 5D177,178,179. These orphans share a similar N-terminal structure in that they are short and contain two conserved cysteine residues (Fig. 4). The two cysteine residues are analogous to the two cysteines closest to the membrane in the CRD found in GRMs, CASR179 and TAS1Rs, which further strengthens the evolutionary relationship between these groups of receptors. The orphans can be divided into two clusters based on sequence identity in the TM regions: GPRC5B and GPRC5C are 50% identical, and GPRC5A and GPRC5D are 52% identical. The sequence identities between the clusters vary between 37 and 41%. Furthermore, GPRC5B and GPRC5C both display the two conserved cysteines in the extracellular parts of the TM regions, GPRC5D only contains the conserved cysteine in extracellular loop 2, whereas GPRC5A lacks both cysteines. Both clusters contain the well-conserved W in TM6 (position 13 in Fig. 4) and the P in the motif pkxy in TM7 (Fig. 4). Owing to the high level of conservation within the Glutamate family, these residues are most likely to be involved in the activation machinery and/or G-protein signalling. Recently, two additional orphans were identified, GPR158 and GPR158L135,180. These proteins do not contain any conserved domains within their respective N termini.
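Pairwise identity figures such as the 50% and 52% quoted above depend on how the comparison is defined. The sketch below shows one common convention, counting identical residues over aligned columns in which neither sequence has a gap. The source does not state which convention was used, so the denominator choice, the function name and the toy fragments are all assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    compared = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


# Hypothetical aligned fragments, not real GPRC5 sequences.
fragment_a = "LAVW-LGFA"
fragment_b = "LSVWALGYA"
print(percent_identity(fragment_a, fragment_b))  # 75.0
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which would give lower values for gapped alignments.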

The Frizzled/Taste2 family

The frizzled and smoothened receptors. This group consists of ten frizzled receptors (FZD1–10) and the smoothened receptor (SMO)18. The first report of a seven hydrophobic domain-containing protein, assigned to the tissue polarity locus (frizzled) in Drosophila, was published in 1989 (Ref. 181). During the following 12 years, ten human homologues of the Drosophila FZD were cloned and characterized182,183,184,185,186,187,188,189,190. The FZDs bind the family of Wnt glycoproteins191, whereas the SMO protein seems to function in a ligand-independent manner as the signalling unit in the patched, sonic hedgehog (SHH) and SMO complex192. Until 1997, it was unclear whether the FZDs were GPCRs; that is, transmitting their signal through G proteins. However, Xenopus Wnt-5a was shown to increase the intracellular level of Ca2+ through the phosphatidylinositol signalling pathway (Gq) by interaction with the rat FZD2 (Ref. 193) and SMO signalling proceeds via the Gi pathway194. The relationship to the GPCR superfamily was further strengthened when sequence comparisons with Secretin receptors revealed resemblance in the extracellular regions and the presence of the well-conserved cysteines in the first and second extracellular loops195 (Fig. 5). Moreover, the Xenopus Fzd3 was recently found to be functional as a homodimer, a feature that is shared with the GRMs196.

Figure 5: Conserved features and structural motifs within the frizzled and smoothened receptors.

The upper part of the figure illustrates the conformation of the extracellular part of the mouse frizzled receptor 8 (FZD8)197. Conserved cysteine residues within the FZDs are pictured as yellow circles in which the cysteine bridge between the extracellular loop 1 and 2, which is common to most G protein-coupled receptor (GPCR) families, is drawn as two straight lines, and cysteine bridges visualized by crystallization are drawn as one straight line. The background structure of the ligand-binding region of the mouse FZD8 is downloaded from PDB database (accession number: 1IJY)197, and visualized using the Sybyl software (Tripos, Germany). Green-filled circles indicate important ligand-binding residues determined by en bloc alanine mutations between the Xenopus Wnt8-alkaline phosphatase (XWnt8-AP) and the extracellular region of mouse FZD8, whereas green dotted circles equal positions in which the chemical property is conserved between the mouse FZD8 and human smoothened receptor (SMO). Red-encircled green circles indicate positions of natural mutations in the human FZD4, which cause severely defective Norrin-dependent signalling199. Red-encircled yellow circles denote cysteine residues associated with SMO mutants with defective signalling properties203,204. Red-encircled residues in the transmembrane (TM) regions indicate gain of function mutations in the SMO receptor homologues205. The schematic TM regions display the consensus of an alignment generated in ClustalW 1.82 (Ref. 233) of the ten FZDs from the human genome. The positions of the residues are calculated from the TM boundary (established by Barnes et al.195) starting with 1 in the N- to C-terminal direction. Numbers in italic correspond to the first position in each TM region of FZD1. Residues conserved in all ten sequences are displayed as circles in which conserved aliphatic residues are shown in beige, polar in orange, aromatic in purple, and positively charged in red.

The extracellular part of the FZDs ranges from 200 to 320 amino acids in length, with the differences lying mostly in the linker region between the TM part and the extracellular ligand-binding domain182. The Wnt ligands bind to a cysteine-rich region in the extracellular part of the receptor protein where the positions of nine cysteines are conserved192 (Fig. 5; Table 1). However, new evidence shows that there may be additional binding sites outside this region, which may be located in the extracellular loops of the TM regions198. Residues that are important for XWnt8–mouse FZD8 interactions have been identified by en bloc alanine mutations197 (Fig. 5). Wnts are not the only ligands for the FZDs. Recently, the secreted protein Norrin was shown to interact with and signal through the mouse Fzd4 (Ref. 199). Two familial exudative vitreoretinopathy (FEVR)-associated mutants in the ligand-binding domain of FZD4, M105V and M157V (Fig. 5), are capable of binding to Norrin but are severely defective in signalling199. A FEVR-associated mutant that lacks two amino acids in the end of TM7 has also been shown to have defective signalling properties200. One of the residues missing, W494, may be analogous to the Y in the well-conserved nPxxy motif in TM7 of Rhodopsin receptors, a position that is also associated with GPCR signalling201.

The human orthologue of the SMO protein in Drosophila was first discovered in 1996 by Stone and co-workers202. The SMO protein shares several structural features with the FZDs203. Eight of the conserved cysteines in the extracellular region are preserved in the SMO protein (Fig. 5) and the importance of these residues has been highlighted by the two inactive Drosophila mutants, smo1A3 (Ref. 203) and smoF5 (Ref. 204), which both contain a missense mutation with respect to a conserved cysteine. Moreover, the chemical properties of seven residues involved in Wnt binding to the FZDs are conserved in the SMO extracellular region (Fig. 5). The disruption of the postulated conserved cysteine bridge between the first and second extracellular loop has been associated with the loss of function of mutant smo4D1 (Ref. 203). Several gain-of-function mutants in SMO have been found in the bottom part of TM6 and TM7 (Ref. 205) (Fig. 5). Intriguingly, one of them coincides with the well-conserved aromatic residue positioned in TM7 close to the cytosolic side205, a position associated with signal transduction mechanisms in FZDs and possibly a homologous position to the Y in nPxxy in the Rhodopsin receptors199,201. Analogous to the GRMs, small-molecule compounds have the ability to interact directly with the TM regions of the SMO protein and directly affect the signalling properties of the receptor, possibly by interacting with different binding pockets in the receptor protein206,207. So far, there are no approved drugs that target an FZD or SMO. However, these receptors are implicated in cancer development, as several types of human tumour have been associated with gain-of-function mutations in SMO205, and blocking of FZD10 suppresses growth of synovial sarcoma cells208. The potential for targeting these receptors for cancer therapy has thus gained stronger impetus.

The Taste2 receptors. The human genome contains 25 functional T2R genes, which are mostly localized in clusters on chromosome 7q31 and 12p13 (Refs 19,20, 209–212). The nature of the bitter taste receptors was discovered in 2000, when the bitter compounds cycloheximide, denatonium and 6-n-propyl-2-thiouracil were found to induce G-protein signalling in cells expressing the 7TM-spanning membrane proteins T2Rs213. The T2Rs can be divided into five different subgroups based on phylogenetic analyses, in which the degree of sequence conservation between the T2R subgroups differs remarkably (20–90%)211. The great sequence diversity within the T2Rs may explain how a limited number of receptors can sense the thousands of bitter compounds that humans can detect19,211,214.

Several of the T2Rs are still orphans; however, T2R14 has recently been found to react to the bitter component in absinthe ((−)-α-thujone) and to picrotoxin214, whereas T2R43 (also known as T2R52; one receptor protein, two different nomenclature systems, see Ref. 210) and T2R44 (T2R53) react to acesulphame K and aristolochic acid215. T2R16 has been shown to bind β-glucopyranosides212. The bitter sensation of saccharin has been associated with the activation of T2R43 and T2R44 (Ref. 215). Figure 6 displays the structurally conserved regions of a cluster of eight well-conserved T2Rs, which can all be found in close proximity on human chromosome 12p13.2: T2R49 (T2R56), T2R48, T2R50 (T2R51), T2R45, T2R44, T2R43, T2R54 and T2R47 (T2R44). The T2Rs are relatively short GPCR receptor proteins, spanning from 290 to 340 amino acids. They are intronless and display short N termini and C termini19,20. The T2Rs seem to lack the otherwise well-conserved cysteine bridge between two of the extracellular loops (Table 1). The extracellular regions and the top of TM3 are frequently associated with single nucleotide polymorphisms in mouse cycloheximide non-tasters (mouse T2R5)213 and in human T2R4 (Ref. 216) (Fig. 6).

Figure 6: Conserved features and structural motifs within the bitter taste 2 receptors (T2Rs).

The schematic transmembrane (TM) regions display the consensus of an alignment generated in ClustalW 1.82 (Ref. 233) of eight closely related T2Rs (see main text) from the human genome. The positions of the residues are calculated from the TM boundary (established by Adler et al.19) starting with 1 in the N- to C-terminal direction. Numbers in italic correspond to the first position in each TM region of T2R45. Uppercase letters indicate completely conserved positions, lowercase letters indicate well-conserved positions, whereas x indicates variable positions. Residues conserved in all eight sequences are displayed as circles in which conserved aliphatic residues are shown in beige, polar in orange, aromatic in purple, positively charged in red and negatively charged in blue. Conserved sequence motifs found in the TM regions of T2Rs are surrounded by green boxes. Positions conserved within all 25 human T2Rs are encircled with green. *Single nucleotide polymorphism in human T2R4. Substituted in non-taster mouse T2R5.
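The consensus notation used in the figure legends (uppercase for completely conserved positions, lowercase for well-conserved positions, x for variable positions) can be sketched as a column-wise tally over an alignment. The legends do not define the cut-off for "well-conserved", so the 80% threshold below is an assumption, as are the function name and the toy alignment; this is an illustration of the notation, not a reconstruction of how the figures were produced.

```python
from collections import Counter


def consensus(aligned: list[str], well_conserved: float = 0.8) -> str:
    """Build a consensus string from equal-length aligned sequences.

    Uppercase: residue identical in every sequence.
    Lowercase: most common residue reaches the (assumed) well_conserved fraction.
    x: variable position, or a column dominated by gaps.
    """
    symbols = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        fraction = count / len(column)
        if residue == "-":
            symbols.append("x")
        elif fraction == 1.0:
            symbols.append(residue.upper())
        elif fraction >= well_conserved:
            symbols.append(residue.lower())
        else:
            symbols.append("x")
    return "".join(symbols)


# Hypothetical five-sequence alignment, not real receptor data.
toy_alignment = ["CWLPK", "CWVPK", "CWLPR", "CWIAK", "CFLPK"]
print(consensus(toy_alignment))  # Cwxpk
```

Raising or lowering the threshold changes which positions appear in lowercase, which is why the cut-off should be stated alongside any consensus produced this way.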

Furthermore, the importance of the extracellular loops in T2R ligand interaction/activation was recently shown by Pronin et al., who identified amino acids crucial for 6-nitrosaccharin and N-isopropyl-2-methyl-5-nitrobenzenesulphonamide (IMNB) binding to the human T2R43 (61), by exchanging parts of the extracellular loop 1 and 2 with the non-binder T2R44 (64)217 (Fig. 6); the numbers in parentheses indicate the nomenclature used in Ref. 217. Domain-swapping experiments showed that the four differing residues in extracellular loop 1 are crucial for IMNB binding and activation, whereas both extracellular loops 1 and 2 are important for 6-nitrosaccharin binding. The authors also studied the molecular determinants for the interaction between denatonium and T2R47 and concluded that the same area involved in 6-nitrosaccharin and IMNB binding is crucial for denatonium binding to T2R47 (Ref. 217) (Fig. 6). This emphasizes the importance of the first and second extracellular loop in T2R binding and activation. The T2Rs have high species variation and seem, like the olfactory receptors, to have few conserved residues that are common for binding of their ligands. It is likely that the ligand repertoire conveys a different type of evolutionary pressure at these receptors, as with the trace amine receptors218, which is in contrast to many other classic GPCRs that have a more defined ligand–receptor association.

Concluding remarks and outlook

All the vertebrate genomes studied so far contain GPCRs from the five main families (Box 1), and our most recent mining of the human proteome suggests that it contains at least 799 full-length human GPCRs22. However, fewer should be considered as possible drug targets on the basis of whether their physiological function could be related to disease. The largest cluster of human GPCRs that does not appear to represent potential drug targets is the sensory receptors, including the olfactory (388), the bitter taste (T2Rs) (25), the vomeronasal (V1Rs, V2Rs) (6), the sweet/umami taste (TAS1Rs) (3) and the opsins/rhodopsin-related receptors (8)22 (Table 1), leaving 369 GPCRs that could represent drug targets.

There are at least 46 GPCRs (calculated as monomers) that have been successfully targeted by drugs. These are found in three of the main families: Rhodopsin (>39), Secretin (4) and Glutamate (3) (Table 1). Thus, 323 GPCRs that could represent drug targets remain, and about 150 of these are still orphans22. Additionally, several of these orphan receptors do not have close structural relatives; that is, they are not found in phylogenetic clusters like many GPCRs that bind similar types of ligands.
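The arithmetic behind these totals can be retraced with a short tally. The sketch below is only illustrative: the counts are those quoted in the text (Ref. 22, Table 1), the variable names are our own, and the Rhodopsin figure, given in the text as a lower bound (>39), is taken here as exactly 39 so that the three families sum to the stated 46.

```python
# Counts quoted in the text (Ref. 22, Table 1); names are illustrative only.
total_gpcrs = 799

# Sensory receptors that do not appear to represent potential drug targets.
sensory = {
    "olfactory": 388,
    "bitter taste (T2Rs)": 25,
    "vomeronasal (V1Rs, V2Rs)": 6,
    "sweet/umami taste (TAS1Rs)": 3,
    "opsins/rhodopsin-related": 8,
}

# GPCRs already targeted by drugs, per family.
# Assumption: ">39" in the text is treated as 39, the stated lower bound.
targeted = {"Rhodopsin": 39, "Secretin": 4, "Glutamate": 3}

# GPCRs that could represent drug targets.
non_sensory = total_gpcrs - sum(sensory.values())

# Potential targets not yet addressed by any drug.
untargeted = non_sensory - sum(targeted.values())

print(non_sensory, untargeted)  # prints: 369 323
```

Because the Rhodopsin count is a lower bound, the figure of 323 remaining targets should be read as an upper bound.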

Approximately 50% of the targeted GPCRs interact naturally with peptides or proteins (including enzymes), 26% with biogenic amines, 15% with lipid-like ligands, 4% with amino acids, 2% with nucleotides and 2% with cations (Table 1). Only a small fraction of the human GPCRs have been successfully therapeutically targeted so far: 17% (23 out of 133) of the receptors that bind peptides or proteins (including enzymes); 29% (12 out of 41) of the receptors that bind biogenic amines; and 20% (7 out of 35) of the lipid-like-binding receptors (ligand preference numbers from Ref. 52). Only one of the 16 receptors that bind purines (nucleotides) and two of the 12 amino-acid-binding receptors have so far been targeted therapeutically.
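The two kinds of percentage in this paragraph use different denominators: the share of the 46 targeted receptors that falls in each ligand class, and the fraction of each ligand class that has been targeted. The following sketch recomputes both from the counts quoted above. The dictionary names are our own, and the cation count of 1 is an assumption inferred from the total of 46, as it is not stated explicitly in the text.

```python
# Targeted receptors per natural ligand class, as quoted in the text (Table 1).
targeted = {
    "peptides or proteins": 23,
    "biogenic amines": 12,
    "lipid-like ligands": 7,
    "amino acids": 2,
    "nucleotides": 1,
    "cations": 1,  # assumption: 46 minus the five counts stated in the text
}

# Total receptors per ligand class (ligand preference numbers from Ref. 52).
# No class total is quoted for cations, so that class is omitted here.
class_size = {
    "peptides or proteins": 133,
    "biogenic amines": 41,
    "lipid-like ligands": 35,
    "amino acids": 12,
    "nucleotides": 16,
}

n_targeted = sum(targeted.values())

# Share of all targeted receptors that belongs to each ligand class (%).
share_of_targeted = {k: round(100 * v / n_targeted) for k, v in targeted.items()}

# Fraction of each ligand class that has been targeted so far (%).
fraction_of_class = {k: round(100 * targeted[k] / n) for k, n in class_size.items()}

for k in targeted:
    print(k, share_of_targeted[k], fraction_of_class.get(k, "n/a"))
```

With these inputs the shares come out at 50, 26, 15, 4, 2 and 2%, and the targeted fractions for the peptide/protein, biogenic amine and lipid-like classes at 17, 29 and 20%, in agreement with the values quoted in the text.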

The biogenic amine-binding receptors have the highest proportion of successfully targeted receptors, which is probably due to their relevance in the treatment of, for example, cardiovascular diseases. The peptide/protein-binding receptor group includes the highest number of non-targeted receptors, suggesting a large potential for new drug discovery. The peptide receptors have the advantage of generally binding a limited number of ligands, which cannot be said for many monoamine receptors, which often have considerable affinity for other naturally occurring monoamine ligands. The peptide receptors are to a large degree involved in functions such as regulation of body weight, pain sensation and the immune system, which will continue to attract the attention of drug developers for decades to come. The lipid-binding receptors are gaining increased interest, and this coincides with our increased understanding that many lipid compounds act as specific regulatory factors (for a review, see Ref. 219).

The Secretin receptors are mainly targeted through peptide analogues of their endogenous ligands, whereas the GRMs are difficult to target because of the central role of glutamate in the nervous system and the multitude of glutamate-binding receptors. The Adhesion family is less well characterized in terms of druggability; however, the high domain diversity within their N termini could provide the basis for a selective therapy. The drawback of targeting these receptors with drugs aimed at N-terminal domains is that most of the conserved domains are also found in other proteins, which could interfere with selectivity. Nevertheless, receptors with long N-terminal regions, such as the Adhesion GPCRs, may be suitable targets for monoclonal antibody-based drug treatments. Adhesion receptors may participate in cell guiding functions and could therefore be suitable as cancer or immunological drug targets.

To fully utilize the structural diversity of GPCRs for selective drug targeting, detailed knowledge about their secondary structure is needed. In this respect, receptors with ligand-binding pockets within their N-terminal stretch may have an advantage because crystal structures of such soluble parts can be obtained, as shown recently for several GRMs, FZD8 and the FSHR34,35,136,139,197. However, many non-drugged Rhodopsin receptors remain, and given the previous success in targeting this family, it is anticipated that it will continue to attract the most attention from GPCR-focused drug developers.