Main

Molecular modelling has an important role in rational drug design (Refs 1,2). Reliable three-dimensional models can provide valuable insights into basic principles of molecular recognition and aid in structure-based approaches to lead discovery and optimization (Ref. 3). G protein-coupled receptors (GPCRs) are membrane proteins involved in signal transduction pathways and are important therapeutic targets for numerous diseases (Refs 4,5). As such, significant structure prediction efforts using methods ranging from de novo to homology-based approaches have been applied to members of the GPCR family (Refs 6,7).

Until recently, most GPCR homology modelling efforts have been based on the templates of bovine rhodopsin and bacteriorhodopsin, with refinement of the models achieved through molecular dynamics simulations, ligand docking and incorporation of additional biochemical and biophysical data (Refs 8–12). The refinement step is necessary in building accurate models, especially around the ligand-binding site, owing to the expected structural differences among members of the family. These differences result from the generally low sequence identity and the large diversity of ligands accommodated within the family (Refs 7,13–15), and from the various conformational states that are associated with different levels of ligand efficacy (Refs 16–18).

The most recently solved GPCR structure is the 2.6 Å crystal structure of the human adenosine A2A receptor bound to an antagonist (Ref. 19). Adenosine receptors belong to the class A rhodopsin-like GPCR family and represent promising therapeutic targets in a wide range of conditions, including cerebral and cardiac ischaemic diseases, sleep disorders, immune and inflammatory disorders, and cancer (Ref. 20). The A2A receptor structure shows an overall seven transmembrane (TM) helix architecture similar to that of the rhodopsin and adrenergic receptor structures, but with shifts in the positions and orientations of the helices and a markedly different structure of the extracellular loops (Ref. 19).

To evaluate current progress in GPCR structure prediction and the docking of potential ligands, as well as to highlight areas for future efforts in method development, we carried out a community-wide, blind prediction assessment, GPCR Dock 2008, in coordination with the publication of the human adenosine A2A receptor structure in October 2008 (Ref. 19). GPCR Dock 2008 was organized in a similar manner to the previous CASP (Critical Assessment of methods of Protein Structure Prediction) and CAPRI (Critical Assessment of PRediction of Interactions) studies (Refs 21,22). In this paper, we report the outcome of the assessment together with our analysis of the current status of GPCR structure and ligand docking predictions.

GPCR Dock 2008

In August 2008, before the publication of the human adenosine A2A receptor structure in October 2008 (Ref. 19) and public release of the three-dimensional coordinates, participants were asked to predict and submit up to ten ranked models of the human A2A receptor in complex with the ligand ZM241385, starting from the amino acid sequence of the receptor and a two-dimensional structure of the ligand (see Box 1 for list of GPCR Dock 2008 participants). A total of 63 different groups initially registered, with 206 models submitted by 29 different groups in the final data set (see Supplementary information S1 (box) for details). Of the 206 submitted models, 37 were either missing the ligand or had incorrect bond connectivity for the ligand. We assessed the remaining 169 models for the prediction accuracy of the ligand binding mode, and all 206 models were assessed for the prediction accuracy of the receptor alone.

Assessment criteria. Assessment criteria are dependent on the purpose of the generated models. Given the value of the GPCR structural models in expanding our knowledge in basic molecular recognition and their potential use in the design and development of new small molecules, the quality of the models was primarily assessed by the accuracy of the ligand binding mode. Particular attention was given to the fact that the crystal structure is a static structure with positional errors, and the value of modelling is ultimately to guide drug discovery and provide biological insight. Our numerical measure of accuracy for the ligand binding mode was based on two metrics: ligand root mean square deviation (RMSD) and the number of correct receptor–ligand contacts. Neither metric alone was sufficient to capture the accuracy of prediction around the ligand binding site; hence, both were used and combined into a z-score to rank the models.

The ligand RMSD between the model and the crystal structure was calculated as the coordinate RMSD for the 25 non-hydrogen atoms of ZM241385 after superimposing the Cα atoms of the protein in the model and the crystal structure. The ligand RMSD was also calculated excluding the phenoxy group of ZM241385, which has high B-factor values. The number of correct contacts was counted as the number of native protein–ligand contacts that are reproduced in the model. A native contact is defined as any pair of protein and ligand atoms that lie within 4 Å of each other in the crystal structure. There are 75 such receptor–ligand contacts, and an additional 15 contacts formed with water.
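The two metrics can be illustrated with a minimal Python sketch. It assumes that model and crystal coordinates are already available as NumPy arrays with atoms in matching order; the function names, the use of the Kabsch algorithm for the Cα superposition and the index-based matching of contacts are illustrative assumptions, not the assessors' actual implementation.

```python
import numpy as np

def kabsch_transform(mobile, target):
    """Rotation R and translation t that best superimpose mobile onto target
    (least-squares fit; transformed = mobile @ R.T + t)."""
    mob_c, tgt_c = mobile.mean(axis=0), target.mean(axis=0)
    H = (mobile - mob_c).T @ (target - tgt_c)      # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, tgt_c - R @ mob_c

def rmsd(a, b):
    """Coordinate RMSD between two equally ordered (N, 3) arrays."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))

def ligand_rmsd(model_ca, xtal_ca, model_lig, xtal_lig):
    """Ligand heavy-atom RMSD after superimposing the protein Calpha atoms."""
    R, t = kabsch_transform(model_ca, xtal_ca)
    return rmsd(model_lig @ R.T + t, xtal_lig)

def contacts(protein_xyz, ligand_xyz, cutoff=4.0):
    """Set of (protein atom index, ligand atom index) pairs within cutoff (in Å)."""
    d = np.linalg.norm(protein_xyz[:, None, :] - ligand_xyz[None, :, :], axis=2)
    return {(int(i), int(j)) for i, j in zip(*np.nonzero(d <= cutoff))}

def correct_contacts(model_prot, model_lig, xtal_prot, xtal_lig, cutoff=4.0):
    """Number of native (crystal) contacts that are also present in the model.
    Assumes identical atom ordering in model and crystal structure."""
    native = contacts(xtal_prot, xtal_lig, cutoff)
    return len(native & contacts(model_prot, model_lig, cutoff))
```

Because contacts are defined by interatomic distances, counting them does not require the model to be superimposed on the crystal structure; only the ligand RMSD depends on the superposition.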

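The models were ranked by assigning a combined z-score to each model. The combined z-score was calculated as the average of the z-scores for ligand RMSD and the number of correct contacts, with the sign of the ligand RMSD z-score reversed so that a higher combined z-score indicates a better model:

$$Z_{\text{combined}} = \frac{-Z_{\text{ligand RMSD}} + Z_{\text{contacts}}}{2}$$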

The z-scores for ligand RMSD and the number of correct contacts were computed by the following steps. First, a z-score was assigned to each model using the average and standard deviation (SD) values from all models. Second, the average and SD were re-computed excluding models with z-scores that were more than two SDs above (for ligand RMSD) or below (for the number of correct contacts) the average. Third, a z-score was reassigned to each model using the revised average and SD values obtained in step two. The best model from each group (that is, the model with the highest combined z-score) was analysed.
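The three-step procedure can be sketched in Python as follows. The function names are hypothetical, the population SD (NumPy's default) is an assumption because the text does not state which SD estimator was used, and the negative sign on the ligand RMSD z-score is inferred from the statement that the best model has the highest combined z-score.

```python
import numpy as np

def trimmed_z(values, bad_side):
    """Two-pass z-score.
    bad_side=+1: drop models more than 2 SD above the mean (ligand RMSD).
    bad_side=-1: drop models more than 2 SD below the mean (correct contacts)."""
    values = np.asarray(values, dtype=float)
    z_first = (values - values.mean()) / values.std()   # step 1: all models
    keep = bad_side * z_first <= 2.0                     # step 2: exclude poor outliers
    mu, sd = values[keep].mean(), values[keep].std()
    return (values - mu) / sd                            # step 3: re-assign z-scores

def combined_z(ligand_rmsd, n_contacts):
    """Average of the two z-scores; low RMSD and many contacts both score high
    (sign convention is an assumption, see the lead-in)."""
    z_rmsd = trimmed_z(ligand_rmsd, bad_side=+1)
    z_contacts = trimmed_z(n_contacts, bad_side=-1)
    return (-z_rmsd + z_contacts) / 2.0
```

Excluding only the poorly scoring tail in step two keeps a handful of very inaccurate models from inflating the SD and compressing the z-scores of the better models.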

Overall outcome of analysis. The submitted models showed a wide distribution in prediction accuracy of the ligand binding mode, with average values of 9.5 Å (SD 3.8 Å) for ligand RMSD (Fig. 1a) and 4 (SD 7) for the number of correct contacts. These statistics indicate that the majority of the submitted models did not predict the ligand position and the binding interactions very accurately. The lack of a strong correlation between ligand RMSD and binding site RMSD (Fig. 1b) suggests that the performance of some ligand docking methods can be improved; for example, models with a binding site RMSD of less than 4.0 Å have ligand RMSD values ranging from 2.8 to 17.2 Å.

Figure 1: Root mean square deviation (RMSD) of submitted models.

a | Distribution of ligand RMSD (green bars) and protein Cα RMSD (blue bars) for all models. b | A scatterplot of ligand RMSD (y axis) versus binding site RMSD (x axis) for all models. The binding site RMSD values are calculated for heavy atoms of the binding site residues (F168(5.29), E169(5.30), M177(5.38), W246(6.48), L249(6.51), H250(6.52), N253(6.55), H264(6.66), M270(7.35)) after the models were superimposed to the crystal structure using the Cα atoms of the protein.

Very few models score well in both ligand RMSD and the number of correct contacts: only 13 of the 169 receptor–ligand models have a combined z-score greater than 1, compared with 40 models that score well solely in ligand RMSD (ligand RMSD z-score less than −1). For models with relatively low ligand RMSD values but a small number of correct contacts, the inaccuracy in binding interactions could be attributed to errors in the side chain placement of the ligand binding residues. Although about a quarter of the models capture the hydrogen bonding interaction between the N253(6.55) side chain and the exocyclic N15 atom of the ligand (44 out of 169 models have an N253 OD1–ZM241385 N15 interaction distance of less than 4 Å), other key receptor–ligand interactions, such as the aromatic stacking interaction between the F168(5.29) side chain and the bicyclic ring of the ligand, are not captured well in most models (Fig. 2).

Figure 2: Statistics of the two key receptor–ligand interactions in all models.

a | The hydrogen bonding interaction with N253(6.55) and the aromatic stacking interaction with F168(5.29) are shown by dashed lines with the distance measurements from the crystal structure. b | Distribution of the distance for the interaction between the side chain carbonyl oxygen OD1 atom in N253(6.55) and the exocyclic N15 atom of the ligand (ZM241385), and the average interatomic distance for the aromatic stacking interaction between the heavy atoms in the F168(5.29) side chain and the bicyclic ring (atoms C11, N12, N13, C14, N15, N16, N17, C18, N19, C20) of the ligand. c | A scatterplot of the distances for the hydrogen bonding interaction (y axis) versus the aromatic stacking interaction (x axis).

Although the overall outcome clearly shows that there are remaining challenges in accurately predicting the ligand binding mode, the quality of the predictions for the receptor alone seems relatively good: 4.2 ± 0.9 Å for the receptor Cα RMSD, and 2.8 ± 0.5 Å for the TM helices Cα RMSD. Not surprisingly, loop regions, with the exception of the short intracellular loop 1 (ICL1), are not modelled very well in most of the models (Fig. 3a,b and Fig. 4a,b). It is notable that some groups that accurately predicted the TM region of the receptor did not predict the ligand binding mode very well (for example, the TM Cα RMSD is 2.0 Å for the model submitted by I. Pogozheva and A. Lomize, and 2.1 Å for the model submitted by J. Horst and A. Roy), indicating that modelling the receptor and docking the ligand can generally be considered as distinct steps in the generation of models for the receptor–ligand complex.
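A per-region Cα RMSD of this kind can be sketched in Python using the residue ranges listed in the Figure 4 legend. The dictionary-based input format and the function names are hypothetical, and the sketch assumes that the model Cα coordinates have already been superimposed on the crystal structure; whether the assessment used a global or a per-region superposition is not stated here.

```python
import numpy as np

# Residue ranges (inclusive) from the Figure 4 legend; ECL2 omits residues
# 149-155, which are missing in the crystal structure.
REGIONS = {
    "helix I": [(6, 34)], "helix II": [(40, 67)], "helix III": [(73, 107)],
    "helix IV": [(117, 142)], "helix V": [(173, 205)], "helix VI": [(222, 258)],
    "helix VII": [(266, 291)],
    "ICL1": [(35, 39)], "ICL2": [(108, 116)],
    "ECL1": [(68, 72)], "ECL2": [(143, 148), (156, 172)], "ECL3": [(259, 265)],
}

def region_rmsd(model_ca, xtal_ca, ranges):
    """Calpha RMSD over the given residue ranges.
    model_ca and xtal_ca map residue number -> (3,) coordinate array (assumed,
    already superimposed). Residues absent from either structure are skipped."""
    resids = [r for lo, hi in ranges for r in range(lo, hi + 1)
              if r in model_ca and r in xtal_ca]
    if not resids:
        return float("nan")
    diff = (np.array([model_ca[r] for r in resids])
            - np.array([xtal_ca[r] for r in resids]))
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def per_region_rmsd(model_ca, xtal_ca):
    """Calpha RMSD for every region in REGIONS."""
    return {name: region_rmsd(model_ca, xtal_ca, ranges)
            for name, ranges in REGIONS.items()}
```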

Figure 3: Superposition of all 206 submitted models to the crystal structure of the human adenosine A2A receptor.

Protein Cα atom superposition between each model and the crystal structure (Protein Data Bank ID code: 3EML, without the T4-lysozyme) was done using the align command in PyMOL (version 1.0r2, www.pymol.org). a,b | The receptor is shown as two orthogonal views of Cα traces, with tube thickness being proportional to the root mean square deviation (RMSD) of each Cα position, showing how well the transmembrane (TM) regions were modelled and how much uncertainty there is in the loop regions. c | A superposition of stick diagrams of the ligand (ZM241385) from 169 models; a CPK model is used to delineate the observed position in the crystal structure. The carboxy-terminus (residue numbers greater than 306) is removed from all models. ECL, extracellular loop; ICL, intracellular loop.

Figure 4: Model analysis.

a,b | Distribution of Cα root mean square deviations (RMSDs) for adenosine A2A receptor domains. Panel a shows Cα RMSDs for transmembrane (TM) helices and intracellular loop (ICL) regions: helix I: 6–34; helix II: 40–67; helix III: 73–107; helix IV: 117–142; helix V: 173–205; helix VI: 222–258; helix VII: 266–291; ICL1: 35–39; ICL2: 108–116. Panel b shows Cα RMSDs for extracellular loop (ECL) regions: ECL1: 68–72; ECL2: 143–172 (excluding 149–155 that are missing in the crystal structure); ECL3: 259–265. c | Scatterplot of the number of correct contacts versus ligand RMSDs for the best predictions from all groups. The best predictions from the top six groups are marked as mauve crosses.

Analysis of the best models

Despite the challenges in accurately predicting the receptor–ligand interactions, some models had features consistent with the crystal structure, although model ranking continues to be one of the most challenging areas of development. Here, we focus on the predictions from the top ten groups, ranked according to the combined z-score, and assess the model quality in greater detail (Fig. 4c). Note that, with predictions for only one target, the statistical significance of the group ranking cannot be judged as is typically done in CASP experiments by a head-to-head comparison of common targets between the top groups (Ref. 23). To support our selection of the best predictions, we ranked all models using an alternative metric, binding site contact RMSD, which is an RMSD of receptor–ligand contact distances over all ligand-binding residues and gives each of these residues equal weight. We found that both the z-score ranking and the contact RMSD ranking agree on the selection of the best model.
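One way to realize a per-residue contact RMSD is sketched below in Python. The text does not specify how the contact distance of a residue is measured, so the minimum heavy-atom distance between each binding-site residue and the ligand is used here purely as an assumption; the function names and input format are likewise hypothetical.

```python
import numpy as np

def min_distance(residue_xyz, ligand_xyz):
    """Shortest interatomic distance between one residue and the ligand."""
    d = np.linalg.norm(residue_xyz[:, None, :] - ligand_xyz[None, :, :], axis=2)
    return float(d.min())

def contact_rmsd(model_residues, xtal_residues, model_lig, xtal_lig):
    """RMSD of residue-ligand contact distances between model and crystal.
    model_residues and xtal_residues map a residue label to an (n_atoms, 3)
    array of heavy-atom coordinates; each residue contributes one term, so
    all binding-site residues carry equal weight."""
    deltas = [min_distance(model_residues[k], model_lig)
              - min_distance(xtal_residues[k], xtal_lig)
              for k in xtal_residues]
    return float(np.sqrt(np.mean(np.square(deltas))))
```

Because each residue contributes a single distance regardless of its size or its number of native contacts, a metric of this form is less dominated by residues that make many atomic contacts than a raw contact count.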

The best model overall (submitted by S. Costanzi) has a ligand RMSD of 2.8 Å and 34 of 75 correct contacts (Fig. 5a and Table 1). The ligand is modelled in a native-like binding pose, with an extended conformation and a nearly perpendicular orientation to the membrane plane. The model accurately predicts some of the key receptor–ligand interactions: it captures the hydrogen bonding interaction between the N253(6.55) side chain and the exocyclic amino group (N15 atom) of the ligand, and the aromatic stacking interaction between the F168(5.29) side chain and the bicyclic triazolotriazine core of the ligand. Compared with the crystal structure, the ligand in the model is positioned deeper in the binding pocket, bringing the furan ring closer to TM helices III and V. The inaccuracy in the ligand position is most probably due to errors in the side chain positions of the two crucial ligand binding residues (F168(5.29) and E169(5.30)) in extracellular loop 2 and the side chain orientation of M177(5.38) at the extracellular end of TM helix V. The aromatic ring of F168(5.29), which interacts with the bicyclic ring, is positioned too deeply; the adjacent E169(5.30) forms a hydrogen bonding interaction with the hydroxyl group in the phenolic substituent, instead of the exocyclic N15 atom near the bicyclic ring; and the side chain of M177(5.38) is not oriented towards the binding cavity. In addition, the family-conserved disulphide bond between C77(3.25) and C166(5.27) is predicted accurately, but the disulphide bond in extracellular loop 3 between C259(6.61) and C262(6.64) is not, presumably contributing to the inaccuracy in the side chain orientation of H264(6.66), which does not point towards the binding site.

Figure 5: Comparison between the best models and the crystal structure around the ligand-binding site.

A | The ligand and the ligand-binding residues F168(5.29), E169(5.30), M177(5.38), L249(6.51), N253(6.55) and H264(6.66) are shown for the best model (S. Costanzi) and the crystal structure. The ligand is shown as magenta sticks for the model, and as green semitransparent spheres for the crystal structure; the ligand-binding residues are shown as yellow sticks for the model, and blue sticks for the crystal structure. B,C | Extracellular (B) and side (C) views of the ligand in the binding pocket for the best predictions from the top six groups (magenta sticks for models and green spheres for the crystal structure). The receptor crystal structure is shown as grey ribbons. The disulphide bonds are shown as orange sticks. D | The ligand-binding residues F168(5.29) and N253(6.55) are shown as sticks for the best predictions from the top six groups (yellow for models and blue for the crystal structure). In B–D the models are labelled as: a, S. Costanzi; b, V. Katritch; c, P. Lam; d, I. Davis; e, B. Maigret; f, W. Jurkowski.

Table 1: Summary of results for the best models from the top-ranking groups

The best predictions from the top six groups (S. Costanzi; V. Katritch and R. Abagyan; P. Lam and R. Abagyan; I. Davis, P. Barth and D. Baker; B. Maigret; W. Jurkowski and A. Elofsson) highlight the successes and challenges in accurately predicting the ligand binding pose and receptor–ligand interactions (Fig. 5B,C,D and Table 1). The extended ligand conformation is accurately predicted in all six models, and the nearly perpendicular orientation is captured in four of the six models. The hydrogen bonding interaction between the N253(6.55) side chain and the exocyclic N15 atom of the ligand is correctly modelled in four models; however, in one of the four, the ligand makes no interaction with residues in extracellular loop 2. The aromatic stacking interaction between the F168(5.29) side chain and the bicyclic ring of the ligand is correctly modelled in four models; however, in all four models, the ligand is positioned too deeply in the binding pocket, and the M177(5.38) side chain is not oriented towards the binding cavity. There is one model that does not accurately capture either the hydrogen bonding interaction with N253(6.55) or the aromatic stacking interaction with F168(5.29), whereas five of the six models accurately predict the family-conserved disulphide bond between C77(3.25) and C166(5.27). None of the six models captures the hydrogen bonding interaction between E169(5.30) in extracellular loop 2 and the exocyclic N15 atom of the ligand.

Other models that ranked near the top (those submitted by K. Kanou, W. A. Goddard, C. Bologa and A. Olson) are slightly less accurate, but show similar trends to the top six models in their ability to accurately predict the ligand binding mode (Table 1). The ligand is modelled in a native-like extended conformation in three of the four models. The hydrogen bonding interaction between the N253(6.55) side chain and the exocyclic N15 atom of the ligand is modelled accurately in three of the four models, whereas the aromatic stacking interaction between the F168(5.29) side chain and the bicyclic ring of the ligand is modelled accurately in only one of the four models. The family-conserved disulphide bond between C77(3.25) and C166(5.27) is captured in two models. Remarkably, one of the models (submitted by W. A. Goddard) accurately places the E169(5.30) side chain proximal to the exocyclic N15 atom of the ligand, and almost captures the hydrogen bonding interaction, even though the overall conformation of extracellular loop 2 is inaccurate.

The best predictions were generally not ranked as the best models by the predictors at the time of model submission (before the release of the crystal structure) (Table 1). Only two of the six best models were ranked first, and three of the six groups show a weak correlation between their model ranking and the model quality as assessed by the combined z-score for the accuracy around the ligand-binding site. Furthermore, the additional models submitted by the six groups are generally of lower quality than the best predictions (Table 1). Only one of the six best models has a z-score that is within one SD of the group average z-score.

Status of GPCR structure modelling and docking

The assessment of the submitted models showed that the best participating methods have the ability to predict close, native-like ligand binding, but have limitations in capturing all of the key receptor–ligand interactions and correctly estimating model quality by ranking. The majority of the submitted models are quite far from predicting a native-like ligand binding pose. The most challenging aspect of GPCR structure prediction highlighted in this assessment seems to be in accurately modelling the ligand interactions with residues in the extracellular loop regions. This result is not surprising given the lack of structural homology in the loops among the known GPCR structures (Ref. 24), and the general difficulties in modelling loop regions (Refs 25,26).

The most successful prediction methods relied on homology modelling approaches based on the template structures of β-adrenergic receptors, and in some cases with the additional template structures of rhodopsin (Protein Data Bank ID codes: 2RH1 (β2AR), 2VT4 (β1AR), 1U19 (bovine rhodopsin), 2Z73 (squid rhodopsin)) to generate models of the receptor, followed by docking of the ligand to one or more receptor models using small-molecule docking programs such as Glide (Ref. 27), ICM (Ref. 28), GOLD (Ref. 29) and AutoDock (Ref. 30) (see Supplementary information S1 (box) for description of prediction methods). The alignment of the human A2A receptor sequence to the template structure seems to have been straightforward, given the family-conserved motifs and residues in the TM helices (Ref. 31). The extracellular loop 2 was modelled by de novo approaches in many of the top predictions (V. Katritch and R. Abagyan; P. Lam and R. Abagyan; I. Davis, P. Barth and D. Baker; W. Jurkowski and A. Elofsson; W. A. Goddard), but only partially modelled in the best prediction (S. Costanzi) for a short segment of eight residues, located amino-terminal to TM helix V, which includes the disulphide bond-forming C166(5.27). Some of the criteria used to select and rank the final receptor–ligand complex models were: docking scores, conformational energy of the complex, agreement with mutagenesis and structure–activity relationship data, and binding selectivity studied by virtual ligand screening or by modelling other subtypes of adenosine receptor.

The reliability of the homology modelling approach depends on the availability of suitable templates (Ref. 32). The results of the current assessment show that the structures of β-adrenergic receptors alone or together with rhodopsin were suitable transmembrane templates in predicting the general structure of the adenosine A2A receptor. However, given the expected structural diversity in class A GPCRs, it is unclear whether the current set of techniques applied to the structure prediction of the A2A–ZM241385 complex would result in a similar level of accuracy for the prediction of other GPCRs, especially for those belonging to subfamilies that are phylogenetically distant from the amine and the opsin receptor clusters (Ref. 33). We believe the database of GPCR structures needs to expand further to provide suitable templates for accurate modelling of those other receptors.

The inaccuracies in homology models can arise from errors in side chain packing, main chain shifts in aligned regions, errors in unaligned loop regions, misalignments and incorrect templates (Ref. 34). These errors relate to the issue of 'adding value' to the template structure, which was addressed in the recent CASP experiment (Ref. 35), and also seems to be applicable to GPCR modelling. Indeed, ligand interactions with residues located in regions that are structurally divergent from the templates are not modelled accurately in any of the six best predictions: the hydrogen bonding interaction between E169(5.30) in extracellular loop 2 and the exocyclic N15 atom of the ligand is not captured, and the side chains of H264(6.66) in extracellular loop 3 and M177(5.38) in the extended bulge structure unique to the A2A receptor at the extracellular end of TM helix V are not oriented towards the binding site. An exception is the aromatic stacking interaction between F168(5.29) in extracellular loop 2 and the bicyclic ring of the ligand, which is correctly modelled in some of the predictions. F168(5.29) is located in the loop, but it is structurally homologous to F193(5.32), which interacts with the carbazole heterocycle of the ligand carazolol in the β2AR structure; hence modelling of this interaction may have been guided by homology. Interestingly, F168(5.29) is modelled more accurately than E169(5.30) even though mutagenesis data showed that mutation of E169(5.30) to alanine reduces the affinity for both antagonists and agonists (Ref. 36), whereas no data are available for F168(5.29).

The inaccuracy in the orientation of the ligand binding pose (for example, the parallel orientation with the phenolic substituent positioned close to TM helices II and III) may in part be due to the inaccurate modelling of the helical shifts in TM helices I, II and III. The helical shifts alter the location of the binding pocket and redefine the pocket size and shape (Ref. 19); thus, it is expected that accurately modelling the helical shifts would contribute to a better prediction of the ligand binding pose. The helical shifts were most accurately modelled by an effective use of multiple template structures of rhodopsin and β-adrenergic receptors (I. Pogozheva and A. Lomize), or an all-atom refinement approach implemented by the ROSETTA program using a physically realistic model that recapitulated protein interatomic and protein–solvent interactions in the membrane environment (Ref. 37) (I. Davis, P. Barth and D. Baker).

Other sources of error include not modelling the water molecules that are either structurally important or directly involved in ligand binding interactions (Ref. 3). The ligand binding cavity in the A2A–ZM241385 structure has four ordered water molecules (Ref. 19), yet none of the submitted predictions included water molecules. We tried re-docking the ligand to the crystal structure using ICM (Ref. 28) and found that a native-like binding pose (within 1 Å heavy atom RMSD for the bicyclic ring and the furanyl substituent of the ligand, and less than 3 Å overall ligand RMSD) can be recovered without any water molecules, which suggests that water may not be critical for accurately predicting the ligand interactions. However, modelling water molecules together with the ligand might contribute to a better prediction of the ligand binding pose or affinity. Additional re-docking studies with the docking protocols used by the participating methods would help assess the effect of the water molecules, and the accuracy of the docking methods separately from that of the receptor modelling methods.

Finally, it is interesting that the best model was from the S. Costanzi group, which has previously worked on adenosine receptor modelling and docking. Their domain knowledge of the adenosine receptor is likely to have been crucial for the evaluation and interpretation of the mutagenesis and ligand interaction data.

Conclusions

Accurate prediction of GPCR structure and ligand interactions remains a challenge, although the approach should improve with the recent availability of experimentally solved GPCR structures. Assessment of these predictions highlights issues similar to those addressed by CASP for template-based modelling targets: the difficulty of loop modelling, of refinement and improvement over the best available template, and of model ranking. Accurate modelling of the structurally divergent regions (such as the extracellular loops that form defined architectures), and of disulphide bond formation affecting helix residue registry and helical shifts in the TM region, seems to be crucial for accurately predicting the key ligand interactions in GPCRs, and this area is perhaps the most in need of technological development. Progress in GPCR modelling and docking will require further improvements in the current prediction methods to enhance the best available templates and generate models that will be more useful for applications in structure-based drug design.