research papers\(\def\hfill{\hskip 5em}\def\hfil{\hskip 3em}\def\eqno#1{\hfil {#1}}\)

Journal logoBIOLOGICAL
CRYSTALLOGRAPHY
ISSN: 1399-0047
Volume 65| Part 6| June 2009| Pages 582-601

Decision-making in structure solution using Bayesian estimates of map quality: the PHENIX AutoSol wizard

CROSSMARK_Color_square_no_text.svg

aLos Alamos National Laboratory, Los Alamos, NM 87545, USA, bLawrence Berkeley National Laboratory, One Cyclotron Road, Building 64R0121, Berkeley, CA 94720, USA, and cDepartment of Haematology, University of Cambridge, Cambridge CB2 0XY, England
*Correspondence e-mail: terwilliger@lanl.gov

(Received 14 November 2008; accepted 31 March 2009; online 15 May 2009)

Estimates of the quality of experimental maps are important in many stages of structure determination of macromolecules. Map quality is defined here as the correlation between a map and the corresponding map obtained using phases from the final refined model. Here, ten different measures of experimental map quality were examined using a set of 1359 maps calculated by re-analysis of 246 solved MAD, SAD and MIR data sets. A simple Bayesian approach to estimation of map quality from one or more measures is presented. It was found that a Bayesian estimator based on the skewness of the density values in an electron-density map is the most accurate of the ten individual Bayesian estimators of map quality examined, with a correlation between estimated and actual map quality of 0.90. A combination of the skewness of electron density with the local correlation of r.m.s. density gives a further improvement in estimating map quality, with an overall correlation coefficient of 0.92. The PHENIX AutoSol wizard carries out automated structure solution based on any combination of SAD, MAD, SIR or MIR data sets. The wizard is based on tools from the PHENIX package and uses the Bayesian estimates of map quality described here to choose the highest quality solutions after experimental phasing.

1. Introduction

Structure solution in macromolecular crystallography is a multi-step procedure in which more than one plausible possibility often exists at the conclusion of each step. At the start of the process, one or more MAD, SAD, SIR or MIR data sets are collected and reduced to a list of indices and structure-factor amplitudes (Leslie, 1992[Leslie, A. G. W. (1992). Jnt CCP4/ESF-EACBM Newsl. Protein Crystallogr. 26]; Kabsch, 1993[Kabsch, W. (1993). J. Appl. Cryst. 26, 795-800.]; Otwinowski & Minor, 1997[Otwinowski, Z. & Minor, W. (1997). Methods Enzymol. 276, 307-326.]; Pflugrath, 1999[Pflugrath, J. W. (1999). Acta Cryst. D55, 1718-1725.]). Even at this stage there are often several possibilities for the space group that must be considered. For each possible space group, the process con­tinues with finding a substructure containing heavy atoms or anomalously scattering atoms (Grosse-Kunstleve & Adams, 2003[Grosse-Kunstleve, R. W. & Adams, P. D. (2003). Acta Cryst. D59, 1966-1973.]; Schneider & Sheldrick, 2002[Schneider, T. R. & Sheldrick, G. M. (2002). Acta Cryst. D58, 1772-1779.]; Terwilliger & Berendzen, 1999a[Terwilliger, T. C. & Berendzen, J. (1999a). Acta Cryst. D55, 501-505.],b[Terwilliger, T. C. & Berendzen, J. (1999b). Acta Cryst. D55, 1872-1877.]; Weeks et al., 2003[Weeks, C. M., Adams, P. D., Berendzen, J., Brunger, A. T., Dodson, E. J., Grosse-Kunstleve, R. W., Schneider, T. R., Sheldrick, G. M., Terwilliger, T. C., Turkenburg, M. G. & Usón, I. (2003). Methods Enzymol. 374, 37-82.]). There is often more than one plausible substructure at this stage. For example, in space groups that are not chiral the two possible hands of the substructure cannot normally be distinguished. Furthermore, for MAD data sets there may be alternative solutions found by searching for the substructure using different data sets (from various wavelengths or combining data from different wavelengths using FA values; Terwilliger, 1994[Terwilliger, T. C. (1994). Acta Cryst. D50, 11-16.]). Similarly, for MIR data sets there may also be substructures found for several different derivatives. In addition to these intrinsic possibilities, it is possible that more than one set of parameters or even more than one set of software might be used to generate possible solutions. The potential heavy-atom substructures found are then used to calculate the phases of structure factors, which are in turn used as the starting point for density modification (Wang, 1985[Wang, B.-C. (1985). Methods Enzymol. 115, 90-112.]) and subsequent model building (e.g. Perrakis et al., 1999[Perrakis, A., Morris, R. & Lamzin, V. S. (1999). Nature Struct. Biol. 6, 458-463.]; Terwilliger et al., 2008[Terwilliger, T. C., Grosse-Kunstleve, R. W., Afonine, P. V., Moriarty, N. W., Zwart, P. H., Hung, L.-W., Read, R. J. & Adams, P. D. (2008). Acta Cryst. D64, 61-69.]). Normally, one of the best indications of map quality is that the map can be interpreted in terms of an atomic model.

If every possibility at every stage were investigated fully by calculating maps, carrying out density modification and model building, the process might take many hours or days to complete. To speed up the process, the possibilities at each stage are generally ranked, with only the highest ranked possibilities being considered for the next step. This approach can be efficient, but if it is to yield the best solution at the end it requires a reliable method for deciding which members of a set of solutions are of the highest quality.

The definition of `quality' when applied to electron-density maps normally refers to the correlation between the values of electron density in the map and the values of electron density in a hypothetical `true' map for the same structure. In this work, when tests are carried out to assess various measures of map quality, the `true' quality or map correlation is calculated between the map in question and a map obtained using measured amplitudes but with phases calculated from a refined model of the corresponding structure. Maps that have a high map correlation as defined in this way are generally more useful for model building and interpretation than those with a low map correlation. However, it should be noted that map correlation is not a perfect way to assess the utility of a map, as low-resolution terms are generally stronger and therefore have a higher relative contribution to the correlation than high-resolution terms, while the high-resolution terms are generally essential for the interpretation of a map. Consequently, a map could have a moderately high correlation to a model map, based largely on low-resolution terms, yet not be interpretable.

A number of methods for evaluating the quality of experimental macromolecular electron-density maps have been developed. The methods can generally be grouped into real-space calculations and reciprocal-space calculations.

Real-space methods are based on an examination of the electron-density map and generally answer the question `Does this map look like an electron-density map of a macromolecule?' There are many distinctive features of macromolecular electron-density maps that can be used to answer this question. A good map may be expected to have continuous chains of density (Baker et al., 1993[Baker, D., Bystroff, C., Fletterick, R. J. & Agard, D. A. (1993). Acta Cryst. D49, 429-439.]). It may have local patterns of density that reflect shapes and interatomic spacings common to macromolecules (Colovos et al., 2000[Colovos, C., Toth, E. A. & Yeates, T. O. (2000). Acta Cryst. D56, 1421-1429.]; Terwilliger, 2003[Terwilliger, T. C. (2003). Acta Cryst. D59, 1688-1701.]). It may have a distribution of electron densities with a positive skewness, reflecting the large number of points with moderate or low electron density, the lack of points with negative density and the points with very positive electron density located near atoms in the structure (Podjarny, 1976[Podjarny, A. D. (1976). PhD thesis. Weizmann Institute of Science.]; Lunin, 1993[Lunin, V. Y. (1993). Acta Cryst. D49, 90-99.]). There may be a large variation (contrast) in the local r.m.s.d. of electron density, reflecting regions of the structure containing the macromolecule (with high local variation) and solvent (with low local variation; Terwilliger & Berendzen, 1999a[Terwilliger, T. C. & Berendzen, J. (1999a). Acta Cryst. D55, 501-505.]; Sheldrick, 2002[Sheldrick, G. M. (2002). Z. Kristallogr. 217, 644-650.]). The contiguous nature of the regions of relatively flat solvent may be detected from the correlation of local r.m.s.d. at one point in a map with that at neighboring points (Terwilliger & Berendzen, 1999b[Terwilliger, T. C. & Berendzen, J. (1999b). Acta Cryst. D55, 1872-1877.]). If noncrystallographic symmetry is present in the structure, then the correlation of NCS-related density can be detected (Cowtan & Main, 1998[Cowtan, K. & Main, P. (1998). Acta Cryst. D54, 487-493.]; Vellieux et al., 1995[Vellieux, F. M. D. A. P., Hunt, J. F., Roy, S. & Read, R. J. (1995). J. Appl. Cryst. 28, 347-351.]; Terwilliger, 2002a[Terwilliger, T. C. (2002a). Acta Cryst. D58, 2082-2086.]).

Reciprocal-space methods for evaluation of map quality generally address questions involving structure factors and expectations about the structure such as the model for the solvent region or for the heavy-atom substructure. One such question is simply `Given the anomalously scattering atom model and the observed data, what is the expected correlation between the experimental map and the true map?' The value of the figure of merit of phasing (Blow & Crick, 1959[Blow, D. M. & Crick, F. H. C. (1959). Acta Cryst. 12, 794-802.]; Terwilliger & Berendzen, 1999a[Terwilliger, T. C. & Berendzen, J. (1999a). Acta Cryst. D55, 501-505.],b[Terwilliger, T. C. & Berendzen, J. (1999b). Acta Cryst. D55, 1872-1877.]), when estimated correctly, is similar in magnitude to the correlation between the experimental and true maps and can be used as an estimate of this correlation. Another question addresses the data and the expectations about the electron-density map: `Is the amplitude of each structure factor consistent with the value expected based on the amplitudes and phases of all other reflections and the model of the solvent region?' This question can be answered based on the R factor in the first cycle of density modification (which reflects the agreement between each measured amplitude and an estimate of that amplitude based on all other amplitudes and phases along with expectations about features in the map; Cowtan & Main, 1996[Cowtan, K. D. & Main, P. (1996). Acta Cryst. D52, 43-48.]; Terwilliger, 2001[Terwilliger, T. C. (2001). Acta Cryst. D57, 1763-1775.]). A related question can be asked about the phases: `If a phase is estimated from the model of the solvent region, measured amplitudes of structure factors and the experimental values of all other phases, is this phase correlated with its experimentally determined value?' This question can be answered using the correlation of experimental phases with map probability phases obtained in statistical density modification (Terwilliger, 2001[Terwilliger, T. C. (2001). Acta Cryst. D57, 1763-1775.]). A third question that might be asked is `Do the phases calculated using only the highest peaks in the map match the experimental phases?' This question can be answered by truncating the density at a high level, calculating phases from the map and comparing these with the experimental phases (Baker et al., 1993[Baker, D., Bystroff, C., Fletterick, R. J. & Agard, D. A. (1993). Acta Cryst. D49, 429-439.]).

It is important to note that the measures of map quality are analyzed here for their utility in estimating the qualities of experimental electron-density maps, as opposed to maps that have been calculated using a partially correct model or maps that have had density modification applied. An important difference between experimental maps and those obtained using a model or based on density modification is that in the latter cases the maps have been specifically adjusted in order to maximize one or more of the properties that are being measured. For example, density modification typically flattens the solvent region of the map. Similarly, a map calculated from a model will tend to have a high skewness of the density values and a high connectivity of high electron density. Some of these measures may also be useful in these two other important cases, but the values of each measure corresponding to a particular quality of map are likely to be substantially different.

In this work, we implement ten different measures of quality of experimental electron-density maps, develop a simple Bayesian approach to estimating map quality from each and show how the individual estimates can be combined to yield useful overall estimates of map quality. These map-quality estimates are incorporated into the PHENIX AutoSol wizard and are used to make decisions during automated structure solution.

2. Materials and methods

2.1. Structure solution with the PHENIX AutoSol wizard

The PHENIX AutoSol wizard carries out structure solution for SAD/MAD or MIR/SIR/SIRAS data and any combination of these. If data representing more than one heavy-atom substructure were available, the data were grouped into `data sets' with common heavy-atom substructures. All the structure solutions described here had been carried out previously and refined structures were available in each case. Default values were used here for most parameters, but the number and type of anomalous and heavy-atom scatterers and initial values of scattering factors were taken from this prior work.

2.1.1. Analysis with phenix.xtriage

Each available set of data was analyzed using phenix.xtriage (Zwart et al., 2005[Zwart, P. H., Grosse-Kunstleve, R. W. & Adams, P. D. (2005). CCP4 Newsl. 43, contribution 7.]) for circumstances such as twinning, translational noncrystallographic symmetry, unexpectedly strong or weak reflections or groups of reflections or anisotropic overall atomic displacement parameters that may complicate structure determination. The data were corrected for anisotropy before structure solution was carried out if the overall anisotropy correction yielded values that were highly anisotropic (by default, defined as greater than a 1.5-fold ratio among the values of the parameters along the three principal reciprocal axes and greater than 20 Å2 difference between the highest and lowest values). If an anisotropy correction was applied, then the resulting corrected data were used for structure solution only and not for refinement (as an anisotropy correction is applied as part of the refinement process itself).

2.1.2. Substructure solution with HySS

For each data set (i.e. a MAD or SAD data set or an SIR data set) possible heavy-atom substructures were found using the hybrid substructure search (HySS; Grosse-Kunstleve & Adams, 2003[Grosse-Kunstleve, R. W. & Adams, P. D. (2003). Acta Cryst. D59, 1966-1973.]) from iso­morphous, anomalous or dispersive differences or from FA values (Terwilliger, 1994[Terwilliger, T. C. (1994). Acta Cryst. D50, 11-16.]). The high-resolution limit used for the search was typically 3 Å. By default, HySS was run multiple times on each data set using a different random seed each time and the solution with the highest correlation co­efficient between structure factors calculated from the heavy-atom model and the structure-factor differences or FA values was kept. The correlation coefficient was also used, along with the number of sites found, to determine whether to continue searching. Normally, the search was carried out ten times unless the expected number of sites was found and a correlation of 0.3 was obtained. By default, if no solution was found with a correlation of at least 0.2 at a particular resolution, then up to two additional high-resolution limits were tested in steps of 1 Å (e.g. using a high-resolution limit of 3 Å followed if necessary by high-resolution limits of 4 and 5 Å.

2.1.3. Phasing with Phaser and SOLVE and map evaluation

Each potential heavy-atom substructure found above (along with its inverse) was used to calculate phases with Phaser (for SAD phasing; McCoy et al., 2004[McCoy, A. J., Storoni, L. C. & Read, R. J. (2004). Acta Cryst. D60, 1220-1228.]) or SOLVE (for MAD, SIR and MIR phasing; Terwilliger & Berendzen, 1996[Terwilliger, T. C. & Berendzen, J. (1996). Acta Cryst. D52, 749-757.], 1997[Terwilliger, T. C. & Berendzen, J. (1997). Acta Cryst. D53, 571-579.], 1999a[Terwilliger, T. C. & Berendzen, J. (1999a). Acta Cryst. D55, 501-505.],b[Terwilliger, T. C. & Berendzen, J. (1999b). Acta Cryst. D55, 1872-1877.]). (In the examples shown in this work and in PHENIX versions up to v.1.3 the hand of the space group was fixed; later versions of PHENIX automatically invert chiral space groups when considering the inverse of the substructure.) The resulting phases and amplitudes of structure factors, along with weights (the figure of merit of phasing), were used to calculate experimental electron-density maps using a high-resolution limit of 2.5 Å (or lower if data were not available to this resolution). The high-resolution limit was applied in order to reduce the effects of variable resolution limits on the features of electron-density maps. These maps were evaluated with the measures of map quality described in this work and the overall Bayesian estimate of quality was used to rank solutions. In cases where two solutions have very similar heavy-atom parameters (r.m.s.d. among heavy-atom coordinates of less than 1/10 of the high-resolution limit of the data), only the solution with the higher estimate of quality was considered. The estimate of uncertainty in the map quality was used to identify solutions that might plausibly (5% possibility or greater) be the best solution and normally all such solutions were considered at each step. By default, up to three of the highest ranking solutions (six for MIR structures) for the heavy-atom substructure were used to calculate phases and weights at the full available resolution of the data and for density modification.

In the structure determinations carried out below for development of the map evaluation criteria, rankings were instead obtained using a Z-score procedure (Terwilliger & Berendzen, 1999a[Terwilliger, T. C. & Berendzen, J. (1999a). Acta Cryst. D55, 501-505.],b[Terwilliger, T. C. & Berendzen, J. (1999b). Acta Cryst. D55, 1872-1877.]) based only on the skewness of the electron density (as defined below).

2.1.4. Statistical density modification with RESOLVE

The experimental phases obtained above were used as a starting point for statistical density modification using RESOLVE (Terwilliger, 2000[Terwilliger, T. C. (2000). Acta Cryst. D56, 965-972.]).

In statistical density modification with the PHENIX Auto­Sol wizard, a probabilistic estimate of the boundary between macromolecule and solvent is identified in two ways and that leading to the lower R factor in density modification is used. The first method (Wang, 1985[Wang, B.-C. (1985). Methods Enzymol. 115, 90-112.]) is based on the local r.m.s. density, smoothing the squared density using a sphere (Leslie, 1987[Leslie, A. G. W. (1987). Acta Cryst. A43, 134-136.]) with a smoothing radius (rsmooth) given by an empirically derived formula (chosen by optimizing parameters carrying out density modification using model data),

[r_{\rm smooth}\,\, ({\rm \AA}) = 2.41\,\,{\rm \AA}\,\,(d_{\rm min}/1\,\,{\rm \AA})^{0.9}\langle m\rangle^{-0.26}, \eqno (1)]

where dmin is the high-resolution limit of the data and 〈m〉 is the mean figure of merit of phasing. The second method for solvent-boundary identification uses a comparison of histograms of density based on model maps calculated with partially randomized phases with local histograms of density in the experimental map to assign a probability that each point in the map is part of the macromolecule or part of the solvent region. In both cases a probabilistic solvent boundary is obtained (Terwilliger, 1999[Terwilliger, T. C. (1999). Acta Cryst. D55, 1863-1871.]).

Noncrystallographic symmetry (NCS) is used in density modification if it is detected based on the heavy-atom sub­structure and the presence of correlated density at NCS-related positions in the electron-density map (Terwilliger, 2002a[Terwilliger, T. C. (2002a). Acta Cryst. D58, 2082-2086.],b[Terwilliger, T. C. (2002b). Acta Cryst. D58, 2213-2215.]). The value of rsmooth described above is used as a smoothing radius in a local correlation map to identify the region over which NCS holds (Vellieux et al., 1995[Vellieux, F. M. D. A. P., Hunt, J. F., Roy, S. & Read, R. J. (1995). J. Appl. Cryst. 28, 347-351.]).

2.1.5. Model building with RESOLVE

After density modification, the PHENIX AutoSol wizard carries out automated model building using a single cycle of building with the PHENIX AutoBuild wizard (Terwilliger et al., 2008[Terwilliger, T. C., Grosse-Kunstleve, R. W., Afonine, P. V., Moriarty, N. W., Zwart, P. H., Hung, L.-W., Read, R. J. & Adams, P. D. (2008). Acta Cryst. D64, 61-69.]) or using rapid methods for building secondary structure of proteins and nucleic acids (T. Terwilliger, unpublished work). Initially, a secondary-structure-only model is built into each map. The correlation between a map calculated from the model and the density-modified map is then determined. If the value of the map–model correlation is less than a preset value (typically 0.35), then the building procedure is repeated with a standard cycle of building using the methods in the PHENIX AutoBuild wizard. If a map–model correlation of a given value (typically 0.20) or greater is obtained for at least one solution, then the top solution is identified as that with the highest value of the map–model correlation. If a lower map–model correlation is obtained, then the top solution is identified (see below) based on the Bayesian estimates of quality using the skewness of electron density (skew) and the correlation of local r.m.s. density (r2RMS).

2.2. Evaluation of measures of map quality

A set of measures of map quality were applied to experimental maps (or structure-factor amplitudes, phases and weights) obtained from real but re-enacted structure determinations. Each of the structures considered had been determined previously, so that phases from a refined model could be used with measured amplitudes to calculate a model map to use as a standard. The `true' quality of each map was taken to be the correlation with the corresponding standard map calculated at the same nominal resolution. Each measure of quality was applied to each map and the resulting scores were saved along with the corresponding `true' quality. The structure-solution process was automatically carried out by the PHENIX AutoSol wizard and each experimentally phased map that was obtained during the structure-solution process was examined in this way. To reduce the number of near-duplicate solutions considered, all solutions for a structure that had nearly identical values of the map correlation to the standard map (within a range of ±0.0005 in map correlation) were considered to be the same and only the first was used in the analysis. For comparisons involving two possible enantiomers of a solution, the two enantiomers of a solution sometimes differed only slightly (i.e. the heavy-atom sub­structure was nearly centrosymmetric). In these analyses of enantiomeric pairs, only those that differed by an r.m.s.d. of at least 0.5 Å were considered.

For analysis of map quality, electron-density maps and structure factors were calculated using a high-resolution limit of 2.5 Å (if data were available to that resolution), as described above for the PHENIX AutoSol wizard. Before applying each of the measures of map quality, the experimental maps were normalized to a mean of zero and a variance of unity. They were then adjusted in two steps to reduce the contribution from high density at the coordinates of heavy-atom sites. (The high density at heavy-atom sites might otherwise lead to high values for the skewness, NCS correlation, contrast and possibly other measures.) Firstly, the electron density within a radius (r) of each heavy-atom site used in phasing (where r was given by twice the resolution of the data or 5 Å, whichever was greater) was limited to values less than or equal to twice the r.m.s. (2σ) of the map. Secondly, the electron density everywhere in the map was limited to values in the range −5σ to +5σ. This modified map is referred to below as the normalized truncated experimental electron-density map.

Weighted electron-density maps were calculated in the PHENIX environment (Adams et al., 2002[Adams, P. D., Grosse-Kunstleve, R. W., Hung, L.-W., Ioerger, T. R., McCoy, A. J., Moriarty, N. W., Read, R. J., Sacchettini, J. C., Sauter, N. K. & Terwilliger, T. C. (2002). Acta Cryst. D58, 1948-1954.]) using RESOLVE (Terwilliger, 2000[Terwilliger, T. C. (2000). Acta Cryst. D56, 965-972.]) on a grid with a spacing of 1/3 of the high-resolution limit of the data or finer. Map correlations were obtained by calculating the correlation coefficient of a pair of maps at all the grid points in the asymmetric unit of the unit cell. Model–map correlations were calculated in the same way, except that one map was calculated from the model and an overall B factor (b_overall) was adjusted to maximize the correlation. This correlation was further maximized by adjusting a parameter (rFFT) representing the radius around atoms in the model to be included in FFT-based density calculations (typically about equal to the high-resolution limit of the data). For protein chains, an increment in isotropic thermal factors (beta_b) for each bond between side-chain atoms and the Cβ atom was also applied to maximize the correlation.

2.3. Real-space map-quality measures

The measures of map quality used in this work are described in this and the following section and are summarized in Table 1[link].

Table 1
Real-space measures of map quality tested in this work

      Expected properties
Method Symbol Basis Perfect map Random map
Skewness of electron density skew High positive density and no negative density in a good map Positive skewness Near-zero skewness
Contrast of electron density c Solvent and macromolecule have different r.m.s. variation in densities High contrast Low contrast
Correlation of local r.m.s. density r2RMS Solvent region is contiguous so local r.m.s. is correlated with neighboring local r.m.s. High correlation Low correlation
Flatness of electron density F Solvent region has nearly flat electron density High value of flatness Low value of flatness
Number of regions enclosing high density Nr Chains of a macromolecule can be represented by a few connected regions of density Few (but extended) connected regions Many short connected regions
Overlap of NCS-related density ONCS If NCS is present, NCS-related density is similar High overlap Low overlap
2.3.1. Skewness of electron density

The skewness (skew) of each normalized truncated map (as described in §[link]2.1) was calculated using the relation

[{\rm skew} = \langle \rho^{3}\rangle/\langle \rho^{2}\rangle^{3/2}, \eqno (2)]

where the electron density (ρ) was calculated at all the grid points in the asymmetric unit. This quantity reflects the skewness of the density values in the map.

2.3.2. Contrast of electron density

The contrast between the r.m.s. (root-mean-square) density in the solvent region and the r.m.s. density in the macromolecular region was calculated from the standard deviation of the local r.m.s. density over the entire asymmetric unit (Terwilliger & Berendzen, 1999a[Terwilliger, T. C. & Berendzen, J. (1999a). Acta Cryst. D55, 501-505.]; Sheldrick, 2002[Sheldrick, G. M. (2002). Z. Kristallogr. 217, 644-650.]). The normalized truncated density described in §2.2[link] was first squared. The squared density was then smoothed by averaging all values within a moving sphere with radius (r) given by the larger of 6 Å or twice the high-resolution limit of the data. The standard deviation (s) of the smoothed squared density was then calculated. To compensate for the effect of the solvent fraction in the crystal (f) on the resulting value, the standard deviation (s) calculated above was multiplied by the factor [(1 − f)/f]1/2 to yield the contrast c,

[c = [(1-f)/f]^{1/2}s. \eqno (3)]

The correction factor [(1 − f)/f]1/2 was chosen because it leads to a value of 1 for the contrast for a map for which the entire solvent region has zero variance and the nonsolvent region has a constant and nonzero variance.

2.3.3. Correlation of local r.m.s. density

The presence of contiguous flat solvent regions in a map was detected using the correlation coefficient of the smoothed squared electron density, calculated as described above, with the same quantity calculated using half the value of the smoothing radius, yielding the correlation of r.m.s. density, r2RMS. In this way the local value of the r.m.s. density within a small local region (typically within a radius of 3 Å) is compared with the local r.m.s. density in a larger local region (typically within a radius of 6 Å). If there were a large contiguous solvent region and another large contiguous region containing the macromolecule, the local r.m.s. density in the small region would be expected to be highly correlated with the r.m.s. density in the larger region. On the other hand, if the `solvent' region were broken up into many small flat regions, then this correlation would be expected to be smaller.

2.3.4. Flatness of the solvent region

A normalized truncated electron-density map was partitioned between regions of solvent and macromolecule as described in §2.1.4[link]. The r.m.s. electron density in the solvent region (r.m.s.SOLVENT) and in the region of the macromolecule (r.m.s.PROT) were then calculated. The flatness (F) of the solvent region was expressed as the difference between the two,

[F = {\rm r.m.s._{PROT} - r.m.s._{SOLVENT}}. \eqno (4)]

2.3.5. Number of regions enclosing high density

A threshold of density (t) was found such that 5% of the volume of the asymmetric unit of the crystal had a density greater than this threshold t. All the grid points in the map above the threshold t were marked. The number of discrete regions (Nregions) containing marked points was then counted. For this purpose, a discrete region was defined as a set of all marked grid points that can be connected by tracing from one adjacent marked grid point to another (including symmetry-related marked grid points). To partially compensate for the fact that lower resolution maps have fewer grid points, the number of regions was multiplied by the high-resolution limit of the data used to calculate the map (dmin). To further compensate for the volume of the asymmetric unit containing the macromolecule, the number of regions was then divided by the fraction of the asymmetric unit that contains macromolecule (f) and the volume of the asymmetric unit (V) to yield the normalized number of regions per unit volume (Nr),

[N_{\rm r} = N_{\rm regions}/(fV). \eqno (5)]

2.3.6. Overlap of NCS-related density

If noncrystallographic symmetry was found in the heavy-atom substructure for a solution, then the map was examined for the presence of correlated density at NCS-related locations in the map (Cowtan & Main, 1998[Cowtan, K. & Main, P. (1998). Acta Cryst. D54, 487-493.]; Vellieux et al., 1995[Vellieux, F. M. D. A. P., Hunt, J. F., Roy, S. & Read, R. J. (1995). J. Appl. Cryst. 28, 347-351.]). The overlap (ONCS) between density at NCS-related locations was used to evaluate noncrystallographic symmetry,

[O_{\rm NCS} = \langle \rho_{i}\rho_{j}\rangle, \eqno (6)]

where ρi and ρj are density at NCS-related locations in the asymmetric unit and the average is either within a sphere with radius rsmooth (as described above for identifying the solvent boundary) or over a region within the asymmetric unit. The values of density ρi used were those from the normalized truncated map described above. The region where NCS applies was identified as a contiguous region in which the local mean of the overlap is at least cMIN, where this cutoff cMIN was selected to yield a total volume occupied by all NCS copies that was approximately the same as the total volume (f) occupied by the macromolecule in the asymmetric unit (Terwilliger, 2002a[Terwilliger, T. C. (2002a). Acta Cryst. D58, 2082-2086.]). For the purposes of evaluating a map, the mean value of the overlap of NCS density, ONCS, was calculated over this entire NCS region. If the value of the overlap found was less than OMIN (typically, OMIN = 0.3), the NCS was ignored.

2.4. Reciprocal-space map-quality measures

2.4.1. R factor and phase correlation from statistical density modification

The amplitudes and phases of structure factors calculated using statistical density modification, but without including the experimental phase probabilities, can be compared with the observed amplitudes and experimental phases (Cowtan & Main, 1996[Cowtan, K. D. & Main, P. (1996). Acta Cryst. D52, 43-48.]; Terwilliger, 2001[Terwilliger, T. C. (2001). Acta Cryst. D57, 1763-1775.]). These comparisons yield an R value (RDENMOD) for the amplitudes and a mean cosine of the phase difference (mDENMOD) for the phases.

2.4.2. Figure of merit of phasing

The mean figure of merit of phasing (〈m〉) was used directly from Phaser (for SAD phasing calculations; McCoy et al., 2004[McCoy, A. J., Storoni, L. C. & Read, R. J. (2004). Acta Cryst. D60, 1220-1228.]) or SOLVE (for MIR and MAD phasing calculations; Terwilliger & Berendzen, 1999a[Terwilliger, T. C. & Berendzen, J. (1999a). Acta Cryst. D55, 501-505.],b[Terwilliger, T. C. & Berendzen, J. (1999b). Acta Cryst. D55, 1872-1877.]) as an estimate of the quality of a map.

2.4.3. Density truncation (peak-picking)

The number of non-H atoms (n) in the asymmetric unit was roughly estimated from the fraction of the asymmetric unit that contains macromolecule (f) and the volume of the asymmetric unit (V) using an approximate average atomic volume Vo = 19 Å3 (Stroud & Fauman, 1995[Stroud, R. M. & Fauman, E. B. (1995). Protein Sci. 4, 2392-2404.]) using the relation n = fV/Vo. The highest 3n/4 grid points in the asymmetric unit of the electron-density map were then identified and C atoms were placed at these grid points. A map was calculated from these C atoms and the correlation (r2TRUNCATION) with the original map was obtained after adjusting an overall thermal factor to maximize this correlation.

2.5. Bayesian estimates of map quality

A simple Bayesian approach was used to create estimators of map quality based on one or more of the measures of map quality described in §§[link]2.3 and [link]2.4. For each measure (e.g. skew), the comparison of maps with the corresponding solved structures yielded a list of values of `true' map correlation (r2MODEL) and the measure of quality (e.g. skew). A two-dimensional histogram was created to represent the joint distribution p(r2MODEL, skew). The distributions were sampled with 30 bins for each variable, with the range of allowed values of each ranging from −0.1 to 1.1. Any values obtained outside this range were put in the closest available bin. To compensate for the fact that insufficient data (1359) were present to generate an accurate value for all 900 bins, the values of p(r2MODEL, skew) were smoothed using a Gaussian smoothing algorithm in which p(r2MODEL, skew) was convoluted with a Gaussian function G(r) with a radius (σ) of three bins {G(r) ∝ exp[−(u2 + v2/(2σ2)]}, reducing the effective number of bins to about 100.

To estimate the value of map quality (r2MODEL) from a new observation of the quality measure (skew), Bayes' rule (Hamilton, 1964[Hamilton, W. C. (1964). Statistics in Physical Science. New York: The Ronald Press Company.]) was used,

[p(r^{2}_{\rm MODEL}|{\rm skew}) = Ap_{\rm o}(r^{2}_{\rm MODEL})p({\rm skew}|r^{2}_{\rm MODEL}), \eqno (7a)]

where the normalization factor A assures that the integrated probability for r2MODEL is unity and is given by

[A = 1/\textstyle \int \limits_{r^{2\prime}}[p_{\rm o}(r^{2\prime})p({\rm skew}|r^{2\prime})] \, {\rm d}r^{2\prime}. \eqno (7b)]

(7a)[link] says that the (posterior) probability of a particular value of r2MODEL, given the measurement skew, is the prior probability of r2MODEL [po(r2MODEL)] multiplied by the conditional probability [p(skew|r2MODEL)] of measuring this value of skew given that r2MODEL is the correct value, divided by a normalization factor. We calculated the conditional probability p(skew|r2MODEL) in (7a[link]) from the joint probability distribution p(r2MODEL, skew) using the relation

[p({\rm skew}|r^{2}_{\rm MODEL}) = p(r^{2}_{\rm MODEL}, {\rm skew})/p(r^{2}_{\rm MODEL}). \eqno (7c)]

For the present work, we assume that the prior probability distribution po(r2MODEL) is uniform on [0, 1].

If several measures of map quality (e.g. skew and the contrast c) have been measured, then the estimates can be combined using the same approach:

[p(r^{2}_{\rm MODEL}|{\rm skew}, c) = Ap_{\rm o}(r^{2}_{\rm MODEL})p({\rm skew}, c|r^{2}_{\rm MODEL}), \eqno (8a)]

[A = 1/\textstyle \int \limits_{r^{2\prime}}[p_{\rm o}(r^{2\prime})p({\rm skew}, c|r^{2\prime})] \, {\rm d}r^{2\prime}. \eqno (8b)]

We approximate the probability distribution p(skew, c|r2MODEL) as the product of the two two-dimensional conditional probabilities that we have estimated above,

[p({\rm skew}, c|r^{2}_{\rm MODEL}) \propto p({\rm skew}|r^{2}_{\rm MODEL})p(c|r^{2}_{\rm MODEL}), \eqno (9)]

which amounts to assuming that the skewness and contrast c are conditionally independent for a given fixed r2MODEL value.

To obtain the estimated value and variance of r2MODEL given a set of observations of predictor variables (e.g. skew, c), we used the probability distribution given by (8a)[link] and calculated the expectation value of 〈r2MODEL〉,

[\langle r^{2}_{\rm MODEL}\rangle = \textstyle \int \limits_{r^{2\prime}} p(r^{2\prime}|{\rm skew},c)r^{2\prime} \, {\rm d}r^{2\prime}, \eqno (10a)]

[\langle \sigma^{2}\rangle = \textstyle \int \limits_{r^{2\prime}} p(r^{2\prime}|{\rm skew},c) (r^{2\prime}- \langle r^{2}_{\rm MODEL}\rangle)^{2} \, {\rm d}r^{2'}. \eqno (10b)]

An improved estimate of the conditional probability distributions such as p(skew, c|r2MODEL) could potentially be obtained by calculating the covariance of the variables skew and c for each fixed value of r2MODEL and assuming a normal distribution of skew and c for this fixed value of r2MODEL. This formulation differs from that in (9)[link] by including correlations between skew and c instead of assuming that they are zero and also through the assumption of normality in the distributions of skew and c for fixed r2MODEL. Leaving out the fixed value of r2MODEL for clarity, representing the two-dimensional vector (skew, c) as x = (skew, c) and the mean values of skew and c for this value of r2MODEL as u = (〈skew〉, 〈c〉), we can write (Hamilton, 1964[Hamilton, W. C. (1964). Statistics in Physical Science. New York: The Ronald Press Company.])

[p({\rm skew},c) \simeq \exp[-\textstyle {1 \over 2}({\bf x}-{\bf u})\Sigma^{-1}({\bf x}-{\bf u})^{T}][2\pi \det(\Sigma)], \eqno (11a)]

where Σ is the covariance matrix with elements σij representing the variation of skew and c around their means 〈skew〉 and 〈c〉,

[\sigma_{12} = \sigma_{21} = \langle ({\rm skew}-\langle {\rm skew}\rangle)(c-\langle c \rangle)\rangle = {\rm cov} ({\rm skew},c), \eqno (11b)]

[\sigma_{11} = \langle({\rm skew}-\langle {\rm skew}\rangle)^{2}\rangle = \sigma^{2}_{\rm skew}, \eqno (11c)]

[\sigma_{22} = \langle(c-\langle c\rangle)^{2}\rangle = \sigma^{2}_{c}. \eqno (11d)]

To test this approach we used the data described above, but grouped in bins of r2MODEL. The observations in each bin of r2MODEL were analyzed using (11a)–(11d)[link][link][link][link] based on the values of the N predictor variables (skew, c…) for all the observations in that bin to obtain an approximation of the conditional probability distribution p(skew, c|r2MODEL) for that bin. This set of approximations (one for each bin of r2MODEL) was then used in (8)[link][link] to estimate r2MODEL for individual sets of observations of the N predictor variables. This approach gave correlations that were at most marginally improved over those obtained using estimates of the conditional probability distribution p(skew, c|r2MODEL) based on (9)[link]. For example, using skew and correlation of local r.m.s. density (r2RMS) as predictor variables and analyzing the same data shown in Table 3 (but without cross-validation), the overall correlation coefficient between the true values of r2MODEL and estimates obtained using (9)[link] (in which independence of skew and r2RMS is assumed) was 0.925. Using (10)[link][link][link] (assuming Gaussian distributions for skew and r2RMS) and setting the covariance terms to zero (assuming independence of skew and r2RMS) yielded a value of 0.926; the same analysis but including the covariance terms yielded a value of 0.927. As this approach did not significantly improve the correlation, it was not used. Fig. 1(c) suggests that the assumption of normality in the distributions of the predictor variables (e.g. skew and r2RMS) for fixed r2MODEL is not well justified. This may partially explain the poor performance of this approach.

2.6. Structures and data used

Data from 47 structures in the PHENIX library of MAD, SAD and MIR data sets were used along with 246 MAD and SAD structures from the Joint Center for Structural Genomics (JCSG; https://www.jcs.org ). The structures from the PHENIX library included 1029B (PDB code 1n0e; Chen et al., 2004[Chen, S., Jancrick, J., Yokota, H., Kim, R. & Kim, S.-H. (2004). Proteins, 55, 785-791.]), 1038B (1lql; Choi et al., 2003[Choi, I.-G., Shin, D. H., Brandsen, J., Jancarik, J., Busso, D., Yokota, H., Kim, R. & Kim, S.-H. (2003). J. Struct. Funct. Genomics, 4, 31-34.]), 1063B (1lfp; Shin et al., 2002[Shin, D. H., Yokota, H., Kim, R. & Kim, S.-H. (2002). Proc. Natl Acad. Sci. USA, 99, 7980-7985.]), 1071B (1nf2; Shin, Roberts et al., 2003[Shin, D. H., Roberts, A., Jancarik, J., Yokota, H., Kim, R., Wemmer, D. E. & Kim, S.-H. (2003). Protein Sci. 12, 1464-1472.]), 1102B (1l2f; Shin, Nguyen et al., 2003[Shin, D. H., Nguyen, H. H., Jancarik, J., Yokota, H., Kim, R. & Kim, S.-H. (2003). Biochemistry, 42, 13429-13437.]), 1167B (1s12; Shin et al., 2005[Shin, D. H., Lou, Y., Jancarik, J., Yokota, H., Kim, R. & Kim, S.-H. (2005). J. Struct. Biol. 152, 113-117.]), aep-transaminase (1m32; Chen et al., 2002[Chen, C. C. H., Zhang, H., Kim, A. D., Howard, A., Sheldrick, G. M., Mariano-Dunaway, D. & Herzberg, O. (2002). Biochemistry, 41, 13162-13169.]), armadillo (3bct; Huber et al., 1997[Huber, A. H., Nelson, W. J. & Weis, W. I. (1997). Cell, 90, 871-882.]), calmodulin (1exr; Wilson & Brunger, 2000[Wilson, M. A. & Brunger, A. T. (2000). J. Mol. Biol. 301, 1237-1256.]), cobd (1kus; Cheong et al., 2002[Cheong, C. G., Bauer, C. B., Brushaber, K. R., Escalante-Semerena, J. C. & Rayment, I. (2002). Biochemistry, 41, 4798-4808.]), cp-synthase (1l1e; Huang et al., 2002[Huang, C.-C., Smith, C. V., Glickman, M. S., Jacobs, W. R. Jr & Sacchettini, J. C. (2002). J. Biol. Chem. 277, 11559-11569.]), cyanase (1dw9; Walsh et al., 2000[Walsh, M. A., Otwinowski, Z., Perrakis, A., Anderson, P. M. & Joachimiak, A. (2000). Structure, 8, 505-514.]), epsin (1edu; Hyman et al., 2000[Hyman, J., Chen, H., Di Fiore, P. P., De Camilli, P. & Brunger, A. T. (2000). J. Cell. Biol. 149, 537-546.]), flr (1bkj; Tanner et al., 1996[Tanner, J. J., Lei, B., Tu, S. C. & Krause, K. L. (1996). Biochemistry, 35, 13531-13539.]), fusion-complex (1sfc; Sutton et al., 1998[Sutton, R. B., Fasshauer, D., Jahn, R. & Brünger, A. T. (1998). Nature (London), 395, 347-353.]), gene-5 (1vqb; Skinner et al., 1994[Skinner, M. M., Zhang, H., Leschnitzer, D. H., Guan, Y., Bellamy, H., Sweet, R. M., Gray, C. W., Konings, R. N., Wang, A. H. & Terwilliger, T. C. (1994). Proc. Natl Acad. Sci. USA, 91, 2071-2075.]), gere (1fse; Ducros et al., 2001[Ducros, V. M., Lewis, R. J., Verma, C. S., Dodson, E. J., Leonard, G., Turkenburg, J. P., Murshudov, G. N., Wilkinson, A. J. & Brannigan, J. A. (2001). J. Mol. Biol. 306, 759-771.]), gpatase (1ecf; Muchmore et al., 1998[Muchmore, C. R., Krahn, J. M., Kim, J. H., Zalkin, H. & Smith, J. L. (1998). Protein Sci. 7, 39-51.]), granulocyte (2gmf; Rozwarski et al., 1996[Rozwarski, D. A., Diederichs, K., Hecht, R., Boone, T. & Karplus, P. A. (1996). Proteins, 26, 304-313.]), groEL (1oel; Braig et al., 1995[Braig, K., Adams, P. D. & Brünger, A. T. (1995). Nature Struct. Biol. 2, 1083-1094.]), group2-intron (1kxk; Zhang & Doudna, 2002[Zhang, L. & Doudna, J. A. (2002). Science, 295, 2084-2088.]), hn-rnp (1ha1; Shamoo et al., 1997[Shamoo, Y., Krueger, U., Rice, L. M., Williams, K. R. & Steitz, T. A. (1997). Nature Struct. Biol. 4, 215-222.]), ic-lyase (1f61; Sharma et al., 2000[Sharma, V., Sharma, S., Hoener zu Bentrup, K., McKinney, J. D., Russell, D. G., Jacobs, W. R. Jr & Sacchettini, J. C. (2000). Nature Struct. Biol. 7, 663-668.]), insulin (2bn3; Nanao et al., 2005[Nanao, M. H., Sheldrick, G. M. & Ravelli, R. B. G. (2005). Acta Cryst. D61, 1227-1237.]), lysozyme (unpublished results; CSHL Macromolecular Crystallo­graphy Course), mbp (1ytt; Burling et al., 1996[Burling, F. T., Weis, W. I., Flaherty, K. M. & Brünger, A. T. (1996). Science, 271, 72-77.]), mev-kinase (1kkh; Yang et al., 2002[Yang, D., Shipman, L. W., Roessner, C. A., Scott, A. I. & Sacchettini, J. C. (2002). J. Biol. Chem. 277, 9462-9467.]), myoglobin (A. Gonzales, personal communication), nsf-d2 (1nsf; Yu et al., 1998[Yu, R. C., Hanson, P. I., Jahn, R. & Brünger, A. T. (1998). Nature Struct. Biol. 5, 803-811.]), nsf-n (1qcs; Yu et al., 1999[Yu, R. C., Jahn, R. & Brünger, A. T. (1999). Mol. Cell, 4, 97-107.]), p32 (1p32; Jiang et al., 1999[Jiang, J., Zhang, Y., Krainer, A. R. & Xu, R. M. (1999). Proc. Natl Acad. Sci. USA, 96, 3572-3577.]), p9 (1bkb; Peat et al., 1998[Peat, T. S., Newman, J., Waldo, G. S., Berendzen, J. & Terwilliger, T. C. (1998). Structure, 6, 1207-1214.]), pdz (1kwa; Daniels et al., 1998[Daniels, D. L., Cohen, A. R., Anderson, J. M. & Brünger, A. T. (1998). Nature Struct. Biol. 5, 317-325.]), penicillopepsin (3app; James & Sielecki, 1983[James, M. N. & Sielecki, A. R. (1983). J. Mol. Biol. 163, 299-361.]), psd-95 (1jxm; Tavares et al., 2001[Tavares, G. A., Panepucci, E. H. & Brunger, A. T. (2001). Mol. Cell, 8, 1313-1325.]), qaprtase (1qpo; Sharma et al., 1998[Sharma, V., Grubmeyer, C. & Sacchettini, J. C. (1998). Structure, 6, 1587-1599.]), rab3a (1zbd; Ostermeier & Brunger, 1999[Ostermeier, C. & Brunger, A. T. (1999). Cell, 96, 363-374.]), rh-dehalogenase (1bn7; Newman et al., 1999[Newman, J., Peat, T. S., Richard, R., Kan, L., Swanson, P. E., Affholter, J. A., Holmes, I. H., Schindler, J. F., Unkefer, C. J. & Terwilliger, T. C. (1999). Biochemistry, 38, 16105-16114.]), rnase-p (1nz0; Kazantsev et al., 2003[Kazantsev, A. V., Krivenko, A. A., Harrington, D. J., Carter, R. J., Holbrook, S. R., Adams, P. D. & Pace, N. R. (2003). Proc. Natl Acad. Sci. USA, 100, 7497-7502.]), rnase-s (1rge; Sevcik et al., 1996[Sevcik, J., Dauter, Z., Lamzin, V. S. & Wilson, K. S. (1996). Acta Cryst. D52, 327-344.]), rop (1f4n; Willis et al., 2000[Willis, M. A., Bishop, B., Regan, L. & Brunger, A. T. (2000). Structure Fold. Des. 8, 1319-1328.]), s-hydrolase (1a7a; Turner et al., 1998[Turner, M. A., Yuan, C. S., Borchardt, R. T., Hershfield, M. S., Smith, G. D. & Howell, P. L. (1998). Nature Struct. Biol. 5, 369-376.]), sec17 (1qqe; Rice & Brunger, 1999[Rice, L. M. & Brunger, A. T. (1999). Mol. Cell, 4, 85-95.]), synapsin (1auv; Esser et al., 1998[Esser, L., Wang, C. R., Hosaka, M., Smagula, C. S., Sudhof, T. C. & Deisenhofer, J. (1998). EMBO J. 17, 977-984.]), synaptotagmin (1dqv; Sutton et al., 1999[Sutton, R. B., Ernst, J. A. & Brunger, A. T. (1999). J. Cell Biol. 147, 589-598.]), tryparedoxin (1qk8; Alphey et al., 1999[Alphey, M. S., Leonard, G. A., Gourley, D. G., Tetaud, E., Fairlamb, A. H. & Hunter, W. N. (1999). J. Biol. Chem. 274, 25613-25622.]), ut-synthase (1e8c; Gordon et al., 2001[Gordon, E., Flouret, B., Chantalat, L., van Heijenoort, J., Mengin-Lecreulx, D. & Dideberg, O. (2001). J. Biol. Chem. 276, 10999-11006.]) and vmp ( l8w; Eicken et al., 2002[Eicken, C., Sharma, V., Klabunde, T., Lawrenz, M. B., Hardham, J. M., Norris, S. J. & Sacchettini, J. C. (2002). J. Biol. Chem. 277, 21691-21696.]).

The structures from the JCSG included PDB (Bernstein et al., 1977[Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F. Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). J. Mol. Biol. 112, 535-542.]; Berman et al., 2000[Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, I. N., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235-242.]) entries 1o1x (Xu et al., 2004[Xu, Q. et al. (2004). Proteins, 56, 171-175.]), 1vjf, 1vjr, 1vk4, 1vk8, 1vk9, 1vkd, 1vkn, 1vl0, 1vl5, 1vli, 1vlo, 1vly, 1vm8, 1vmg, 1vmi, 1vp8, 1vpm, 1vpz (Rife et al., 2005[Rife, C. et al. (2005). Proteins, 61, 449-453.]), 1vqr (Xu, Schwarzenbacher, McMullan et al., 2006[Xu, Q., Schwarzenbacher, R., McMullan, D. et al. (2006). Proteins, 62, 292-296.]), 1vqs, 1vqy, 1vqz, 1vr0 (DiDonato et al., 2006[DiDonato, M. et al. (2006). Proteins, 65, 771-776.]), 1vr3 (Xu, Schwarzenbacher, Krishna et al., 2006[Xu, Q., Schwarzenbacher, R., Krishna, S. S. et al. (2006). Proteins, 64, 808-813.]), 1vr5, 1vr8 (Xu, Krishna et al., 2006[Xu, Q., Krishna, S. S. et al. (2006). Proteins, 65, 777-782.]), 1vrm (Han et al., 2006[Han, G. W. et al. (2006). Proteins, 64, 1083-1090.]), 1z82, 1z85, 1zbt, 1zej, 1zh8, 1zko, 1ztc, 1zx8 (Jin et al., 2006[Jin, K. K. et al. (2006). Proteins, 63, 1112-1118.]), 1zy9, 1zyb, 2a3n, 2aam, 2aml, 2ax3, 2b8n (Schwarzenbacher et al., 2006[Schwarzenbacher, R. et al. (2006). Proteins, 65, 243-248.]), 2etd, 2ets, 2evr, 2f4i, 2f4l, 2fg0, 2fg9, 2fna, 2ftr, 2fup, 2fur, 2g0w, 2gb5, 2gc9, 2gf6, 2gfg, 2ghr (Zubieta et al., 2007[Zubieta, C. et al. (2007). Proteins, 68, 999-1005.]), 2gno, 2go7, 2gpi, 2gpj, 2grj, 2gvh, 2h1q, 2h1t, 2h9f, 2hcf, 2hh6, 2hhz, 2hi0, 2hq7, 2hq9, 2hr2, 2hsz, 2huh, 2hx1, 2hx5, 2hxv, 2i02, 2i8d, 2i9w, 2ig6, 2ii1, 2ilb, 2isb, 2it9, 2itb, 2nuj, 2o08, 2o2g, 2o2x, 2o2z, 2o3l, 2o62, 2oa2, 2oaf, 2oc6, 2od5, 2ogi, 2oh1, 2oh3, 2oik, 2ooj, 2ook, 2op5, 2opl, 2oqm, 2ord, 2osd, 2otm, 2ou3, 2ou5, 2ou6, 2own, 2oyo, 2ozg, 2ozj, 2p10, 2p1a, 2p7i, 2p8j, 2pbl, 2peb, 2pfw, 2pg4, 2pgc, 2pke, 2pn1, 2pq7, 2pr7, 2prr, 2prv, 2pv4, 2pv7, 2pwn, 2py6, 2pyq, 2pyx, 2q02, 2q04, 2q0t, 2q14, 2q3l, 2q78, 2q7x, 2q9k, 2q9r, 2qe6, 2qe9, 2qez, 2qg3, 2qhp, 2qj8, 2ql8, 2qml, 2qpx, 2qr6, 2qtp, 2qtq, 2qw5, 2qww, 2qwz, 2qyv, 2r01, 2r0x, 2r1i, 2r3b, 2r44, 2r4i, 2r9v, 2ra9, 2ras, 2rcc, 2rcd, 2rd9, 2rdc, 2re3, 2re7, 2rfp, 2rgq, 2rha, 2rhm, 2rij, 2ril, 2rkh, 3b5e, 3b5o, 3b77, 3b7f, 3b81, 3b8l, 3bb5, 3bb9, 3bcw, 3bdd and 3bde.

3. Results and discussion

3.1. Measures of map quality

A key goal of this work was to identify one or more quality measures of maps or of structure factors that are simple to calculate and that can yield accurate estimates of the qualities of the corresponding electron-density maps. Table 1[link] lists six measures of map quality examined here that are based on the features of the maps (real-space measures) and Table 2[link] lists four additional measures that depend on the structure factors and phases used to calculate maps. The measures were chosen to represent a range of possible measures that cover many important features of electron-density maps and structure factors.

Table 2
Reciprocal-space measures of map quality tested in this work

      Expected properties
Method Symbol Basis Perfect map Random map
Phase correlation from statistical density modification mDENMOD Phases from first cycle of density modification are unbiased and are correlated with experimental phases High mDENMOD Low mDENMOD
R factor from statistical density modification RDENMOD Amplitudes for a reflection can be calculated from phases and amplitudes of all other reflections and expected features of the map Low RDENMOD High RDENMOD
Density truncation r2TRUNCATION Much of the information in a map of a macromolecule consists of the density at points in the map near atomic positions High r2TRUNCATION Low r2TRUNCATION
Mean figure of merit of phasing m Estimates of accuracy of experimental phases are an approximate upper bound on quality of the map High 〈m Low 〈m

To evaluate these measures of map quality, we carried out a re-analysis of data for 246 previously solved MAD, SAD and MIR structures, creating electron-density maps during the structure-determination process and analyzing them with each of the measures in Tables 1[link] and 2[link]. As the structures are all known, the `true' map quality for each map could be calculated as the correlation coefficient r2MODEL between each map and the corresponding map obtained using phases calculated from the refined model of the structure (after any necessary origin shifts are applied) using the PHENIX tool phenix.get_cc_mtz_mtz.

For each of the 246 data sets, the PHENIX AutoSol wizard was used to scale the data, calculate anomalous or isomorphous differences and identify potential heavy-atom solutions. As both hands of the heavy-atom substructure would normally be considered, at least two sets of heavy-atom solutions were generally obtained for each data set. Additionally, as MIR and MAD data sets have more than one set of anomalous or isomorphous differences, these data sets generally yielded additional heavy-atom solutions. Also for MIR and MAD structure determinations, difference Fourier analysis was used to generate even more heavy-atom solutions. Consequently, there were a total of 1359 heavy-atom solutions analyzed in this work even though there were only 246 data sets.

Figs. 1[link](a)–1[link](j) show the values of each measure plotted against r2MODEL for 1359 maps based on structures calculated from the MAD, SAD and MIR data listed in §[link]2.6. The maps represent the phases obtained at several stages in structure determination. Some were calculated using heavy-atom solutions found from anomalous or isomorphous differences or from FA values with HySS (Grosse-Kunstleve & Adams, 2003[Grosse-Kunstleve, R. W. & Adams, P. D. (2003). Acta Cryst. D59, 1966-1973.]). Others were calculated using the corresponding substructures with inverted hand. Others were obtained from difference Fourier (MIR) and anomalous difference Fourier (MAD) analyses. In the case of MIR, a large number of additional solutions were obtained by combinations of partial solutions from different derivatives.

[Figure 1]
Figure 1
Measures of the quality of electron-density maps and structure factors. Measures of quality were calculated as described in the text for 1359 sets of structure factors and associated maps. Each measure is plotted with an abscissa equal to the correlation of density of the map with a map calculated from a final model (r2MODEL). Measures based on structures determined at resolutions of 2 Å or higher are shown as black diamonds and those at resolutions lower than 2 Å are shown as purple squares. All measures of quality and the correlation with model density (r2MODEL) were calculated at a resolution of 2.5 Å or the nominal resolution of the data, whichever is the lower. (a) Skewness of electron density. (b) Contrast of electron density. (c) Correlation of local r.m.s. density. (d) Flatness of solvent region. (e) Number of regions enclosing high density. (f) Overlap of NCS-related density. (g) Phase correlation from statistical density modification. (h) R factor from statistical density modification. (i) Density truncation. (j) Figure of merit of phasing.

The general features of the plots in Fig. 1[link] are illustrated by a discussion of Fig. 1[link](a), which shows the skewness of electron density (skew) in experimental maps as a function of the true map quality r2MODEL. In Fig. 1[link](a) the purple squares correspond to data sets with a nominal resolution lower than 2 Å and the black diamonds to data sets with resolutions of 2 Å or higher. (Note that the data for all these calculations were truncated at a resolution of 2.5 Å, so that most resolution-dependent differences are likely to be the consequence of data-set-dependent decreases of intensities with resolution rather than the resolution of the data.)

Fig. 1[link](a) shows that the skewness of the electron density depends strongly on the map quality, as represented by the correlation of the density in the map with that of a model map (r2MODEL). The skewness is approximately zero for maps with a correlation in the range 0.0 < r2MODEL < 0.2. It increases slightly for maps with correlations in the range 0.2 < r2MODEL < 0.4 and then increases substantially for maps with higher correlations (r2MODEL > 0.4). The standard deviation of the values of the skewness is about 0.05–0.10 over most ranges of map correlation. For example, for values of map correlation with r2MODEL < 0.2 the mean skewness is −0.02 and the standard deviation is 0.07 and for values of map correlation with 0.4 < r2MODEL < 0.5 the mean skewness is 0.14 with a standard deviation of 0.06. For values of map correlation with 0.6 < r2MODEL < 0.7 the mean skewness is 0.38 with a standard deviation of 0.10. Another way to view these relationships is to note that the difference (0.16) in the mean values of the skewness between values of map correlation of r2MODEL < 0.2 and values of map correlation in the range 0.4 < r2MODEL < 0.5 is about twice the standard deviation of the skewness in either range. This means that the skewness can be expected to differentiate between maps with model correlations r2MODEL of zero and 0.4, but that it cannot differentiate them correctly all of the time. This can also be seen directly from Fig. 1[link](a), in which some of the values of skewness for maps with model correlations r2MODEL near 0.4 are lower than values for maps with near-zero values of r2MODEL.

The maps represented in Fig. 1[link](a) that are based on high-resolution data sets (<2 Å) have values of skewness that are similar to those of lower resolution data sets. This similarity is most likely to reflect the fact that all the data in these calculations were truncated at a resolution of 2.5 Å.

Several of the other nine measures of map quality examined have relationships to model map correlation similar to those described above for the skewness. The contrast (c; Fig. 1[link]b), correlation of local r.m.s. density (r2RMS; Fig. 1[link]c) and flatness of the solvent region (F; Fig. 1[link]d) in particular show very similar behaviour, except that none of these discriminate as well as the skewness between maps of moderate quality (correlations r2MODEL near 0.4) and those of very low quality with correlations near zero. These three measures are all related as they all are based on the presence of solvent and nonsolvent regions in the crystal. However, the calculations differ in that the con­trast (c) does not require knowledge of the solvent boundary while the flatness (F) does. Additionally, the correlation of local r.m.s. density reflects the contiguous nature of the solvent region while the contrast (c) and flatness (F) reflect the presence of a solvent region, whether contiguous or not.

A somewhat different behavior is shown by the number of contiguous regions (Nr) required to enclose the highest 5% of density in a map (Fig. 1[link]e). This measure decreases with increasing map quality, but only slightly, so that it is not a strong discriminator between maps of low and moderate quality.

The overlap of NCS-related density (Fig. 1[link]f) is a measure which, as implemented here, only applies to maps where NCS can be identified from the symmetry present in the heavy-atom sites. It is therefore different from the measures discussed so far and cannot be used as a general measure of map quality. It is nevertheless useful in differentiating between maps with very high model map correlations (r2MODEL) and those that have lower model map correlation.

Figs. 1[link](g) and 1[link](h) show the phase correlations (mDENMOD) and R factors (RDENMOD) obtained from the first cycle of statistical density modification using the same structure factors, phases and weights that were used to calculate the electron-density maps analyzed in Figs. 1[link](a)–1[link](f). In the first cycle of statistical density modification with RESOLVE (Terwilliger, 2000[Terwilliger, T. C. (2000). Acta Cryst. D56, 965-972.]), estimates of the phase and amplitude of a reflection k were obtained using only information from all the other reflections in the data set. The amplitude and phase for reflection k from the density-modification procedure can then be compared with the experimentally observed amplitude and the `experimental' phase (derived using isomorphous or anomalous differences) to yield an R factor for density modification (RDENMOD) and a mean cosine of the phase difference (mDENMOD). Fig. 1[link](g) shows that, as expected, the R factor for density modification decreases with increasing map quality, while Fig. 1[link](h) shows that the phase correlation increases over the same range.

Fig. 1[link](i) shows that the correlation of pseudo-maps calculated using dummy atoms placed at the highest peaks in a map with their corresponding original maps (r2TRUNCATION) is weakly related to the quality of the map. It seems possible that more sophisticated methods of map skeletonization (Baker et al., 1993[Baker, D., Bystroff, C., Fletterick, R. J. & Agard, D. A. (1993). Acta Cryst. D49, 429-439.]) might be more useful in map evaluation than our simple measure.

Finally, Fig. 1[link](j) shows that the mean figure of merit of phasing (〈m〉) is related to the quality of the map, but that there are many maps with very low correlation to the corresponding model maps that nevertheless have high mean figures of merit. This relationship can be understood by con­sidering that the figures of merit of phasing of two maps that are calculated using the same data but opposite enantiomers of the heavy-atom substructure are normally identical for SAD phasing if all the anomalous scatterers are of the same type. Typically, one of these maps may have a high correlation to the model map while the other may have a very low correlation.

Overall, Fig. 1[link] shows that several measures of map quality based on different features of the map and on the structure factors and phases leading to the map have strong relationships to the quality of the electron-density map, with the skewness of electron density clearly being one of the best indicators of map quality.

In Fig. 1[link](a) there is one point at (r2MODEL = 0.03, skew = 0.31) that is quite far from all the others, with a value of the skewness that is far greater than all the other points with very small values of r2MODEL. This point corresponds to a heavy-atom solution found during the analysis of data from PDB entry 2re3 which yields an electron-density map that is incorrect but not at all random. The crystal has translational noncrystallographic symmetry and the electron density in the electron-density map for this solution is offset from that the correct map by an origin shift that is noncrystallographic. Consequently, our analysis of the two maps, which only allows crystallographic translational offsets, shows a near-zero correlation of the maps despite considerable similarity (a correlation of 0.73 when offset). We note that the translation involved does correspond to a real difference: if the coordinates of PDB entry 2re3 are shifted by this translation (0, 0.735, 0) in space group P43212 the amplitudes of the structure factors do change and the R factor based on experimental amplitudes for the model in this position is 0.53, compared with a value of 0.23 for the deposited model. Note that this solution also appears in Fig. 2[link](a) at the position (0.60, 0.03) and in Fig. 2[link](b) at the position (0.62, 0.03) where it is again an outlier.

[Figure 2]
Figure 2
Comparisons of cross-validated estimates of map quality with actual map quality. Measures of map quality as shown in Fig. 1[link] were used in (7a)[link] and (8a)[link] to estimate overall map quality. The calculations were carried out one data set at a time. For each data set, joint probability distributions of each measure of quality and true quality [e.g. p(skew, r2MODEL)] were calculated excluding data from all solutions for that structure. These cross-validated joint probability distributions were used in (7a)[link] and (8a)[link] to estimate map quality using the measures of quality for each map associated with that data set. In each case, the true map quality (r2MODEL) is plotted as a function of the Bayesian estimates of map quality. (a) Estimates of map quality using the skewness of electron density in (7a)[link]. (b) Estimates using the correlation of local r.m.s. density in (7a)[link]. (c) Estimates using the skewness and correlation of local r.m.s. density in (8a)[link].

3.2. Estimation of map quality using features of the map and of the structure factors used to calculate the map

Fig. 1[link] showed that each of the six different features of electron-density maps and the four characteristics of structure factors we examined depend in some way on the quality of the corresponding map. We used the Bayesian approach described in §[link]2.5 to use this information to estimate map quality from these ten features. The general idea of this approach is very simple. Imagine that a particular map has been examined, yielding a value of the skewness of electron density of 0.20. Considering the plot in Fig. 1[link](a), it is reasonable to conclude that this map is very likely to have a correlation (r2MODEL) with the corresponding model map in the range 0.4 < r2MODEL < 0.6, because nearly all examples in Fig. 1[link](a) with a skewness of about 0.20 are in this range. Equation 7[link](a)[link] is a mathematical way to make this statement. Equation 8[link](a)[link] is a similar statement, except that it includes more than one measure of map quality. As described in §[link]2.5, we assume here that the various measures of map quality (skewness, contrast etc.) are independent. This allows the simple calculation in (8a)[link] to be used to estimate r2MODEL from several measures of map quality.

Fig. 2[link](a) shows the results of using (7a)[link] to estimate r2MODEL from the skewness of electron density. In Fig. 2[link](a) the abscissa is the Bayesian estimate of r2MODEL using the skewness of electron density and the ordinate is the true value of r2MODEL. To ensure that the parameters in the Bayesian estimator did not contain information on the specific cases being tested, a cross-validation procedure was used in which all solutions for the structure being examined were excluded when con­structing the Bayesian estimators. Fig. 2[link](a) shows that in cases where the true value of r2MODEL is in the range 0.0 < r2MODEL < 0.2, the estimates of r2MODEL all have very similar values of about 0.1. This can be understood from Fig. 1[link](a), in which the skewness is seen to be insensitive to values of r2MODEL in this range. The Bayesian estimates of r2MODEL for low values of skewness are all close to the midpoint of this range, as they are simply the average of plausible values of r2MODEL given the observation of the value of the skewness. For higher values of r2MODEL, the estimates of r2MODEL are closer to the true values. Overall, the correlation coefficient between the Bayesian estimates and the true values of r2MODEL is 0.90 and the r.m.s. error in prediction of r2MODEL is 0.10. As a check on our procedures, we note that the mean uncertainty estimates for r2MODEL obtained from the Bayesian procedure was 0.11, which is quite similar to the actual r.m.s. error in prediction of r2MODEL of 0.10.

Table 3[link] summarizes the accuracy of the Bayesian estimates of map quality based on each of the measures described in Tables 1[link] and 2[link] (with the exception of the overlap of NCS density, which is not included because it does not apply to most of the maps in our tests). For each measure, Table 3[link] lists the values of the correlation coefficient of the Bayesian estimates and the true map quality (r2MODEL) along with the r.m.s. prediction error in r2MODEL. Overall, the skewness of electron density, with a correlation coefficient between Bayesian estimates and true values of r2MODEL of 0.90, is the most reliable indicator of map quality, with the correlation of local r.m.s. density being the next best (correlation of 0.85) and with contrast, flatness of solvent region and density-modification phase correlations and R factor giving only slightly poorer predictions of r2MODEL, with correlations in the range 0.75–0.80.

Table 3
Cross-validated prediction correlation

Quality measure(s) Prediction correlation coefficient R.m.s. prediction error
skew 0.90 0.10
c 0.78 0.15
r2RMS 0.85 0.12
F 0.80 0.14
Nr 0.42 0.20
mDENMOD 0.80 0.10
RDENMOD 0.77 0.14
r2TRUNCATION 0.48 0.21
m 0.42 0.21
skew and r2RMS 0.92 0.09

To identify an optimal combination of measures for estimation of map quality, we began with the best single measure (skew) and used (9)[link] to combine information from each of the other measures. The measure giving the best prediction of r2MODEL in combination with the skewness of electron density was the correlation of local r.m.s. density (r2RMS; Table 3[link]). Fig. 2[link](b) shows how the estimates of map quality obtained using just the correlation of r.m.s. electron density compare with actual map quality and Fig. 2[link](c) shows estimates based on both skewness and correlation of r.m.s. electron density. The correlation of r.m.s. density was the next-best single pre­dictor after skew; in addition, the correlation of prediction errors from these two variables was relatively low (0.61; Table 4[link]). The assumptions in (9)[link] are therefore relatively well justified and it is not surprising that the resulting estimator is improved over that using just the skewness of electron density. This process was continued but no further improvement was obtained in the Bayesian estimator. The optimized combination of measures based on skewness and correlation of local r.m.s. density yielded a correlation coefficient between the Bayesian estimates and true values of r2MODEL of 0.92 and an r.m.s. prediction error of 0.09 (Table 3[link] and Fig. 2[link]c).

Table 4
Correlation of prediction errors

Values of r2MODEL were estimated for each measure of map quality using (7a)[link] as in Fig. 3[link]. The true values of r2MODEL were then subtracted, yielding prediction errors for each map for each measure of map quality. The correlation coefficients (r2) of prediction errors among the various measures of map quality are listed.

  skew c r2RMS F Nr mDENMOD RDENMOD r2TRUNCATION m
skew 1                
c 0.69 1              
r2RMS 0.60 0.82 1            
F 0.73 0.95 0.84 1          
Nr 0.61 0.86 0.61 0.79 1        
mDENMOD 0.63 0.81 0.79 0.88 0.66 1      
RDENMOD 0.66 0.79 0.74 0.79 0.77 0.84 1    
r2TRUNCATION 0.54 0.82 0.63 0.71 0.88 0.61 0.76 1  
m 0.55 0.73 0.61 0.68 0.69 0.64 0.70 0.85 1

3.3. Identification of the hand of heavy-atom substructures using measures of map quality

A particularly important application of measures of map quality is the identification of the hand of heavy-atom sub­structures. The hand of the heavy-atom substructure cannot normally be identified directly during substructure determination by direct methods such as the HySS procedure (Grosse-Kunstleve & Adams, 2003[Grosse-Kunstleve, R. W. & Adams, P. D. (2003). Acta Cryst. D59, 1966-1973.]) used here. Consequently, some procedure is needed for identifying which hand of the heavy-atom substructure is correct. Figs. 3[link](a)–3[link](i) compare the values obtained for nine measures of map quality based on 353 pairs of heavy-atom substructures with correct and inverted handedness from the 186 data sets in this work for which the space group was not chiral (structures with chiral space groups were excluded so the hand of the space group could be fixed in this analysis). The mean figure of merit of phasing is not shown because it is essentially identical for the two hands of the substructure in all the cases examined. The 706 maps represented by these 353 pairs are a subset of the 1359 maps used in the calculations shown in Fig. 1[link].

[Figure 3]
Figure 3
Comparisons of measures of map quality for pairs of maps based on enantiomorphic heavy-atom substructures. For structures in nonchiral space groups, all pairs of solutions derived from enantiomorphic pairs of heavy-atom substructures were selected. The member of the pair leading to the map with the higher correlation coefficient to the corresponding model map was identified as the `correct' hand and the other as the `inverse' hand. The value of each measure of map quality for the correct hand is plotted as the abscissa in each plot and the value of the measure for the corresponding inverse hand is the ordinate. Maps based on MAD data are represented as black diamonds, those from MIR data (all maps examined in this figure are from single derivatives) are represented as red triangles and those from SAD data are represented as blue squares. (a) Skewness of electron density. (b) Contrast of electron density. (c) Correlation of local r.m.s. density. (d) Flatness of solvent region. (e) Number of regions enclosing high density. (f) Overlap of NCS-related density. (g) Phase correlation from statistical density modification. (h) R factor from statistical density modification. (i) Density truncation.

It is somewhat remarkable that these nine measures of map quality all give very good discrimination between the correct and incorrect hands of heavy-atom substructures (Fig. 3[link] and Table 5[link]), even though they are not all so useful in estimating the absolute quality of maps (Table 3[link]). The best discrimination between correct and incorrect hands is obtained with the skewness of electron density (Fig. 3[link]a), as expected from the high correlation of estimates of map quality based on skewness with actual map quality (Table 3[link]). Using the skewness of electron density to make decisions on handedness (Fig. 3[link]a), 98% of decisions (in cases where the quality of the maps for the two hands differs by at least 0.05) would correctly identify the map with the higher quality (Table 5[link]). Note that for SIR or MIR data without anomalous differences none of these techniques can identify the correct hand because the inverse hand of the heavy atoms leads to a map that has inverse chirality but is otherwise identical. A similar argument would partially apply in cases where an anomalous signal is present but is weak. This situation is presumably the cause of the large number of MIR-derived points near the diagonal of the panels in Fig. 3[link].

Table 5
Decision-making accuracacy for enantiomeric pairs

The percentage of cases in which the higher (or lower, as appropriate) value of the quality measure is associated with the higher value of the actual map correlation coefficient with the corresponding model map. Only cases in which the actual map correlations differ by at least 0.05 are considered.

Quality measure(s) Percentage of correct predictions
skew 0.98
c 0.94
r2RMS 0.95
F 0.94
Nr 0.95
ONCS 0.90
mDENMOD 0.93
RDENMOD 0.94
r2TRUNCATION 0.97

3.4. Identification of the highest quality density-modified map for a structure

The scoring procedures described above are based on an analysis of the phases and structure-factor amplitudes corresponding to an experimental electron-density map. Prior to final map interpretation, however, the experimentally determined phases of structure factors are normally optimized by density modification (Wang, 1985[Wang, B.-C. (1985). Methods Enzymol. 115, 90-112.]). Several additional parameters are required for density modification, including identification of noncrystallographic symmetry (if any), solvent content and the solvent region. It seemed possible that these parameters might not always be chosen optimally and the best experimental maps might not always lead to the best density-modified maps. Consequently, some additional method of scoring the density-modified maps might be useful.

To investigate this possibility, we carried out structure determination with the data sets used in Fig. 1[link], this time with default parameters in the PHENIX AutoSol wizard including Bayesian estimates of experimental map quality based on the skewness of electron density (skew) and the correlation of local r.m.s. density (r2RMS). For each structure, the final steps were to carry out density modification with RESOLVE (Terwilliger, 2000[Terwilliger, T. C. (2000). Acta Cryst. D56, 965-972.]) on the top-ranked solution or solutions and then to build a preliminary atomic model. In cases where there was one solution that was much better than all others (see §[link]2), then only that solution was used in density modification. However, in most cases there were multiple solutions with similar Bayesian estimates of quality and up to three (MAD, SAD) or six (MIR) of these were used in density modification.

Fig. 4[link](a) shows the relationship between the qualities of experimental maps and the qualities of the corresponding density-modified maps for 569 experimental maps for 260 data sets. For experimental maps of high quality (correlation with model map over 0.6), the quality of the density-modified map is generally (but not always) very high, typically ranging from 0.75 to 0.90. For very poor experimental maps (correlation with model map of less than 0.2) the density-modified maps were also uniformly poor (typical map correlation of 0–0.1). On the other hand, for experimental maps of moderate quality (map correlation between 0.2 and 0.5) the quality of the density-modified maps vary over a wide range (from about 0.1 to about 0.9).

[Figure 4]
Figure 4
Map qualities of density-modified maps. (a) Qualities of density-modified maps as a function of the qualities of the corresponding experimental maps. (b) Comparison of qualities of pairs of density-modified maps for the same structure derived from experimental maps of similar quality (see text).

Much of the variability in density modification for experimental maps of moderate quality illustrated in Fig. 4[link](a) could arise from the intrinsic differences in solvent content, non­crystallographic symmetry, type of experiment and resolution between the different structures. To examine this, we have plotted in Fig. 4[link](b) the true map qualities of density-modified maps for all 176 pairs of solutions from Fig. 4[link](a) that are from the same structure, use the same number of non-crystallo­graphic symmetry operators (if any) in density modification and have values of true experimental map correlation within 0.05 of each other. In Fig. 4[link](b) each point corresponds to one pair of solutions. The abscissa is the value of density-modified map quality for the solution with the higher value of experimental map quality and the ordinate is the density-modified map quality for the solution with lower experimental map quality. Each member of such a pair has identical solvent content, resolution, actual noncrystallographic symmetry, number of noncrystallographic symmetry operators identified and experiment type and differs only slightly in true experimental map quality. Fig. 4[link](b) shows that when all these factors are controlled the quality of pairs of density-modified maps is very similar in most cases, but substantial differences in the qualities of the density-modified maps remain in some cases.

The remaining variation in effects of density modification illustrated in Fig. 4[link](b) suggests that it might be useful to carry out a final ranking of solutions based on a measure of quality of the corresponding density-modified maps. We used the map–model correlation between density-modified maps and the preliminary atomic models built with the PHENIX AutoSol wizard as such a measure of quality. Table 6[link] shows the utility of this map–model correlation in identifying the solution with the best density-modified map for each of the 149 structures used in Fig. 4[link](a) in which there was more than one solution tested by density modification and model building and in which the model-building process yielded a model with a model–map correlation of at least 0.20.

Table 6
Decision-making accuracies in choosing the solution with the best experimental or density-modified map

The percentage of correct predictions of best maps is the percentage of cases in which the solution with the highest value of the quality measure has a map correlation coefficient with the corresponding model map within 0.02 of that of the best obtained for any solution for that structure. The analysis is based on 372 sets of structure factors and associated maps obtained from 149 data sets as in Fig. 1[link], selecting the top-ranked 2–6 solutions and carrying out density modification with RESOLVE (Terwilliger, 2000[Terwilliger, T. C. (2000). Acta Cryst. D56, 965-972.]) to yield density-modified maps. A model was built into each density-modified map using a rapid method for building helices and strands. If the value of the map–model correlation was less than 0.35, then the building procedure was repeated with a standard cycle of building using the methods in the PHENIX AutoBuild wizard (Terwilliger et al., 2008[Terwilliger, T. C., Grosse-Kunstleve, R. W., Afonine, P. V., Moriarty, N. W., Zwart, P. H., Hung, L.-W., Read, R. J. & Adams, P. D. (2008). Acta Cryst. D64, 61-69.]) and the value of the map–model correlation from the full standard procedure was used. Only structures for which at least one model–map correlation was at least 0.20 are included in the analysis. The worst error in identification of the best maps is the largest value of the difference between the correlation coefficient of the best map with the corresponding model map and that of the map with the highest value of the quality measure.

  Percentage of correct predictions of best maps Worst error in identification of best maps
Quality measure Experimental maps Density-modified maps Experimental maps Density-modified maps
Bayesian estimate using skew and r2RMS of experimental map 91 88 0.29 0.58
Map-model correlation for model built into density-modified map 87 92 0.40 0.26

The first row in Table 6[link] provides a background for this analysis by considering the use of our Bayesian estimates of experimental map quality to identify the best solutions. In Table 6[link] experimental map quality and density-modified map quality are examined separately. Using the Bayesian estimates (which are based on the experimental maps), the best experimental map for a particular structure could be identified 91% of the time. The worst error in identification of the best experimental map corresponded to a difference in map correlation of 0.29. Next, density-modified maps were examined. The solution with the highest Bayesian estimate of experimental map quality led to the best density-modified map in 88% of cases; however, the worst error in identification of the best density-modified map corresponded to a very large difference in map correlation of 0.58.

Using the map–model correlation for the model built into the density-modified maps in decision-making the situation is reversed, with the best experimental map identified only 87% of the time and the best density-modified map identified 92% of the time. Further, the density-modified map yielding the highest map–model correlation was never worse than the very best density-modified map obtained by more than a difference in correlation of 0.26, showing that the model–map correlation is a useful criterion for final ranking of solutions. Overall, Table 6[link] indicates that model–map correlation is an improvement over Bayesian estimates of experimental map quality for the identification of the best density-modified map.

3.5. Using the PHENIX AutoSol wizard to redetermine structures from the PHENIX structure library

To test the overall utility of the Bayesian estimates of map quality in the overall context of structure determination, we carried out automated structure determinations on all 48 MAD, SAD and MIR structures in the PHENIX structure library with the PHENIX AutoSol wizard. The structures in this library range from relatively straightforward cases of SAD and MAD structure determination to considerably more complex cases that involved combinations of SAD or MAD with MIR and difficult-to-solve heavy-atom substructures. In the tests carried out here, only one source of phase information was used for each structure (i.e. MAD, SAD or MIR), except in the case of the fusion-complex structure (PDB code 1sfc ; Sutton et al., 1998[Sutton, R. B., Fasshauer, D., Jahn, R. & Brünger, A. T. (1998). Nature (London), 395, 347-353.]), in which SAD and SIR data were combined.

To evaluate the overall contribution of the Bayesian scoring approach described here to structure solution, we compared the qualities of the final density-modified maps obtained with the PHENIX AutoSol wizard using each of three different methods of making decisions during the heavy-atom solution and phasing steps of structure determination. The first method (`perfect scoring') was to use the actual correlation coefficient of each experimental map with that of the corresponding idealized map (using phases from a refined model) to decide which map was best during structure solution. Once density-modified maps had been calculated, the correlations of those maps with the idealized map were used for the final ranking. The second method (`Bayesian scoring') was to use the Bayesian estimates based on the combination of the skewness of electron density and the correlation of local r.m.s. density for decision-making during structure solution. Once density-modified maps had been calculated, a model was built and the correlation between this model and the density-modified map was used for final ranking. The third method (`random scoring') was to use random scores for decision-making during structure solution and then to use the model–map correlation for the final ranking. Fig. 5[link](a) illustrates these comparisons for MAD structure determinations, Fig. 5[link](b) illustrates them for SAD structure determinations and Fig. 5[link](c) for MIR structure determinations.

[Figure 5]
Figure 5
Comparison of quality of density-modified maps obtained using the skewness of electron density and correlation of local r.m.s. density for scoring with those obtained using the true map quality (correlation to the corresponding model map) for scoring. See text for details. The light blue bars labeled `Perfect scoring' correspond to running the PHENIX AutoSol wizard and using the actual experimental map quality to make decisions at each step prior to obtaining density-modified phases and using the actual density-modified map quality to make the final choice of solution. The dark maroon bars labeled `Bayesian scoring' correspond to using the Bayesian scores for experimental maps based on the skewness of electron density and correlation of local r.m.s. density and using the model–map correlation to choose the final density-modified solution. The light green bars labeled `Random scoring' correspond to using random scores to make decisions about experimental map quality and model–map correlation to choose the final solution. Each `random scoring' value is the average of ten separate runs of PHENIX AutoSol wizard carried out with differing random seeds. Note that the `perfect scoring' method does not necessarily lead to the best final map. For example, an experimental map that is not the best one but is chosen by another scoring method could adventitiously yield additional sites that lead to a better final solution. (a) Structures determined using MAD. Structures shown are aep-transaminase (PDB code 1m32; Chen et al., 2002[Chen, C. C. H., Zhang, H., Kim, A. D., Howard, A., Sheldrick, G. M., Mariano-Dunaway, D. & Herzberg, O. (2002). Biochemistry, 41, 13162-13169.]), armadillo (3bct; Huber et al., 1997[Huber, A. H., Nelson, W. J. & Weis, W. I. (1997). Cell, 90, 871-882.]), cobd (1kus; Cheong et al., 2002[Cheong, C. G., Bauer, C. B., Brushaber, K. R., Escalante-Semerena, J. C. & Rayment, I. (2002). Biochemistry, 41, 4798-4808.]), cp-synthase (1l1e; Huang et al., 2002[Huang, C.-C., Smith, C. V., Glickman, M. S., Jacobs, W. R. Jr & Sacchettini, J. C. (2002). J. Biol. Chem. 277, 11559-11569.]), cyanase (1dw9; Walsh et al., 2000[Walsh, M. A., Otwinowski, Z., Perrakis, A., Anderson, P. M. & Joachimiak, A. (2000). Structure, 8, 505-514.]), epsin (1edu; Hyman et al., 2000[Hyman, J., Chen, H., Di Fiore, P. P., De Camilli, P. & Brunger, A. T. (2000). J. Cell. Biol. 149, 537-546.]), gene-5 (1vqb; Skinner et al., 1994[Skinner, M. M., Zhang, H., Leschnitzer, D. H., Guan, Y., Bellamy, H., Sweet, R. M., Gray, C. W., Konings, R. N., Wang, A. H. & Terwilliger, T. C. (1994). Proc. Natl Acad. Sci. USA, 91, 2071-2075.]), gere (1fse; Ducros et al., 2001[Ducros, V. M., Lewis, R. J., Verma, C. S., Dodson, E. J., Leonard, G., Turkenburg, J. P., Murshudov, G. N., Wilkinson, A. J. & Brannigan, J. A. (2001). J. Mol. Biol. 306, 759-771.]), gpatase (1ecf; Muchmore et al., 1998[Muchmore, C. R., Krahn, J. M., Kim, J. H., Zalkin, H. & Smith, J. L. (1998). Protein Sci. 7, 39-51.]), group2-intron (1kxk; Zhang & Doudna, 2002[Zhang, L. & Doudna, J. A. (2002). Science, 295, 2084-2088.]), ic-lyase (1f61; Sharma et al., 2000[Sharma, V., Sharma, S., Hoener zu Bentrup, K., McKinney, J. D., Russell, D. G., Jacobs, W. R. Jr & Sacchettini, J. C. (2000). Nature Struct. Biol. 7, 663-668.]), lysozyme (unpublished results; CSHL Macromolecular Crystallography Course), mbp (1ytt; Burling et al., 1996[Burling, F. T., Weis, W. I., Flaherty, K. M. & Brünger, A. T. (1996). Science, 271, 72-77.]), mev-kinase (1kkh; Yang et al., 2002[Yang, D., Shipman, L. W., Roessner, C. A., Scott, A. I. & Sacchettini, J. C. (2002). J. Biol. Chem. 277, 9462-9467.]), nsf-d2 (1nsf; Yu et al., 1998[Yu, R. C., Hanson, P. I., Jahn, R. & Brünger, A. T. (1998). Nature Struct. Biol. 5, 803-811.]), p32 (1p32; Jiang et al., 1999[Jiang, J., Zhang, Y., Krainer, A. R. & Xu, R. M. (1999). Proc. Natl Acad. Sci. USA, 96, 3572-3577.]), p9 (1bkb; Peat et al., 1998[Peat, T. S., Newman, J., Waldo, G. S., Berendzen, J. & Terwilliger, T. C. (1998). Structure, 6, 1207-1214.]), pdz (1kwa; Daniels et al., 1998[Daniels, D. L., Cohen, A. R., Anderson, J. M. & Brünger, A. T. (1998). Nature Struct. Biol. 5, 317-325.]), psd-95 (1jxm; Tavares et al., 2001[Tavares, G. A., Panepucci, E. H. & Brunger, A. T. (2001). Mol. Cell, 8, 1313-1325.]), rab3a (1zbd; Ostermeier & Brunger, 1999[Ostermeier, C. & Brunger, A. T. (1999). Cell, 96, 363-374.]), s-hydrolase (1a7a; Turner et al., 1998[Turner, M. A., Yuan, C. S., Borchardt, R. T., Hershfield, M. S., Smith, G. D. & Howell, P. L. (1998). Nature Struct. Biol. 5, 369-376.]), synapsin (1auv; Esser et al., 1998[Esser, L., Wang, C. R., Hosaka, M., Smagula, C. S., Sudhof, T. C. & Deisenhofer, J. (1998). EMBO J. 17, 977-984.]), tryparedoxin (1qk8; Alphey et al., 1999[Alphey, M. S., Leonard, G. A., Gourley, D. G., Tetaud, E., Fairlamb, A. H. & Hunter, W. N. (1999). J. Biol. Chem. 274, 25613-25622.]) and vmp (1l8w; Eicken et al., 2002[Eicken, C., Sharma, V., Klabunde, T., Lawrenz, M. B., Hardham, J. M., Norris, S. J. & Sacchettini, J. C. (2002). J. Biol. Chem. 277, 21691-21696.]) (b) Structures determined using SAD: 1029B (1n0e; Chen et al., 2004[Chen, S., Jancrick, J., Yokota, H., Kim, R. & Kim, S.-H. (2004). Proteins, 55, 785-791.]), 1038B (1lql; Choi et al., 2003[Choi, I.-G., Shin, D. H., Brandsen, J., Jancarik, J., Busso, D., Yokota, H., Kim, R. & Kim, S.-H. (2003). J. Struct. Funct. Genomics, 4, 31-34.]), 1063B (1lfp; Shin et al., 2002[Shin, D. H., Yokota, H., Kim, R. & Kim, S.-H. (2002). Proc. Natl Acad. Sci. USA, 99, 7980-7985.]), 1071B (1nf2; Shin, Roberts et al., 2003[Shin, D. H., Roberts, A., Jancarik, J., Yokota, H., Kim, R., Wemmer, D. E. & Kim, S.-H. (2003). Protein Sci. 12, 1464-1472.]), 1102B (1l2f; Shin, Nguyen et al., 2003[Shin, D. H., Nguyen, H. H., Jancarik, J., Yokota, H., Kim, R. & Kim, S.-H. (2003). Biochemistry, 42, 13429-13437.]), 1167B (1s12; Shin et al., 2005[Shin, D. H., Lou, Y., Jancarik, J., Yokota, H., Kim, R. & Kim, S.-H. (2005). J. Struct. Biol. 152, 113-117.]), rnase-p (1nz0; Kazantsev et al., 2003[Kazantsev, A. V., Krivenko, A. A., Harrington, D. J., Carter, R. J., Holbrook, S. R., Adams, P. D. & Pace, N. R. (2003). Proc. Natl Acad. Sci. USA, 100, 7497-7502.]), calmodulin (1exr; Wilson & Brunger, 2000[Wilson, M. A. & Brunger, A. T. (2000). J. Mol. Biol. 301, 1237-1256.]), fusion-complex (1sfc; Sutton et al., 1998[Sutton, R. B., Fasshauer, D., Jahn, R. & Brünger, A. T. (1998). Nature (London), 395, 347-353.]), insulin (2bn3; Nanao et al., 2005[Nanao, M. H., Sheldrick, G. M. & Ravelli, R. B. G. (2005). Acta Cryst. D61, 1227-1237.]), myoglobin (A. Gonzales, personal communication), nsf-n (1qcs; Yu et al., 1999[Yu, R. C., Jahn, R. & Brünger, A. T. (1999). Mol. Cell, 4, 97-107.]), sec17 (1qqe; Rice & Brunger, 1999[Rice, L. M. & Brunger, A. T. (1999). Mol. Cell, 4, 85-95.]) and ut-synthase (1e8c; Gordon et al., 2001[Gordon, E., Flouret, B., Chantalat, L., van Heijenoort, J., Mengin-Lecreulx, D. & Dideberg, O. (2001). J. Biol. Chem. 276, 10999-11006.]). Note that fusion-complex was solved with SAD plus SIR. (c) Structures determined using MIR: flr (1bkj; Tanner et al., 1996[Tanner, J. J., Lei, B., Tu, S. C. & Krause, K. L. (1996). Biochemistry, 35, 13531-13539.]), granulocyte (2gmf; Rozwarski et al., 1996[Rozwarski, D. A., Diederichs, K., Hecht, R., Boone, T. & Karplus, P. A. (1996). Proteins, 26, 304-313.]), groEL (1oel; Braig et al., 1995[Braig, K., Adams, P. D. & Brünger, A. T. (1995). Nature Struct. Biol. 2, 1083-1094.]), hn-rnp (1ha1; Shamoo et al., 1997[Shamoo, Y., Krueger, U., Rice, L. M., Williams, K. R. & Steitz, T. A. (1997). Nature Struct. Biol. 4, 215-222.]), penicillopepsin (3app; James & Sielecki, 1983[James, M. N. & Sielecki, A. R. (1983). J. Mol. Biol. 163, 299-361.]), qaprtase (1qpo; Sharma et al., 1998[Sharma, V., Grubmeyer, C. & Sacchettini, J. C. (1998). Structure, 6, 1587-1599.]), rh-dehalogenase (1bn7; Newman et al., 1999[Newman, J., Peat, T. S., Richard, R., Kan, L., Swanson, P. E., Affholter, J. A., Holmes, I. H., Schindler, J. F., Unkefer, C. J. & Terwilliger, T. C. (1999). Biochemistry, 38, 16105-16114.]), rnase-s (1rge; Sevcik et al., 1996[Sevcik, J., Dauter, Z., Lamzin, V. S. & Wilson, K. S. (1996). Acta Cryst. D52, 327-344.]), rop (1f4n; Willis et al., 2000[Willis, M. A., Bishop, B., Regan, L. & Brunger, A. T. (2000). Structure Fold. Des. 8, 1319-1328.]) and synaptotagmin (1dqv; Sutton et al., 1999[Sutton, R. B., Ernst, J. A. & Brunger, A. T. (1999). J. Cell Biol. 147, 589-598.]).

For MAD, SAD and MIR structure determinations the decision-making procedure using Bayesian estimates of experimental map quality and model–map correlations as estimates of density-modified map quality led to density-modified electron-density maps that were very similar in quality to those obtained using the decision-making process based on actual map quality (Fig. 5[link]). This indicates that the quality of final density-modified maps produced by the PHENIX AutoSol wizard are essentially as good as they can be with any decision-making system, given the algorithms and parameters used to find heavy-atom sites and to carry out phasing, density modification and model building in the wizard.

In addition to the `perfect scoring' and `Bayesian scoring' approaches shown in Fig. 5[link], the figure includes density-modified map quality for solutions obtained using random scores for experimental maps (but still using model–map correlation to evaluate final density-modified maps). Each `random scoring' value is the average of ten runs with differing random seeds, so they represent an average value of the quality of final maps obtained with random scoring of experimental maps. The quality of these maps is generally lower than that of those obtained with either of the other two methods, showing that the scoring is contributing important information to the structure-determination process.

Although Fig. 5[link] indicates that the quality of the final maps obtained with the PHENIX AutoSol wizard are essentially as good as they can be with the structure-solution algorithms in the wizard, it is likely that the number of solutions that need to be examined at each stage in structure determination could be lowered if improved estimates of experimental map quality were available. The default parameters in the PHENIX AutoSol wizard defining the number of solutions to keep at each stage were chosen to be large enough that the best solution was generally in the set that was considered at each stage using the 48 MAD, SAD and MIR data sets examined in Fig. 5[link]. If improved scoring methods are developed, then a systematic re-examination of these default parameters would probably be useful. In the meantime, modifying these parameters to include larger or smaller numbers of solutions at each stage may be useful in cases that are more challenging or that are more straightforward, respectively.

The skewness of electron-density values in an electron-density map has been recognized for some time as a potential indicator of the quality of the map (Podjarny, 1976[Podjarny, A. D. (1976). PhD thesis. Weizmann Institute of Science.]; Lunin, 1993[Lunin, V. Y. (1993). Acta Cryst. D49, 90-99.]). As the skewness of a map is not a familiar quantity to most crystallographers, we illustrate it for `poor' and `good' experimental electron-density maps. Both maps were based on experimental data for aep-transaminase (PDB code 1m32 ; Chen et al., 2002[Chen, C. C. H., Zhang, H., Kim, A. D., Howard, A., Sheldrick, G. M., Mariano-Dunaway, D. & Herzberg, O. (2002). Biochemistry, 41, 13162-13169.]) and were obtained during the course of automated analysis of this data with the PHENIX AutoSol wizard. The poor map was calculated using an incorrect set of heavy-atom sites and the good map was calculated using a largely correct set of heavy-atom sites. Fig. 6[link] shows histograms of the number of grid points in each map with various values of electron density. The x axis in Fig. 6[link] corresponds to electron density in a map normalized to the r.m.s. in the map after subtracting the mean of the map from all values. The dotted lines in Fig. 6[link] illustrate the fraction of grid points in the poor map that correspond to each value of normalized electron density. It may be seen that this histogram of densities from a poor map has a very nearly Gaussian shape. This poor map had a skewness of 0.004 and its correlation to a map based on the refined model of the structure was 0.04. In contrast, the solid lines in Fig. 6[link] illustrate the fraction of grid points in the good map corresponding to various values of electron density. This histogram differs from that derived from the poor map in that it is not symmetrical. The peak is slightly negative of the origin and it has a distinct tail on the positive side of the peak. This good map had a skewness of 0.4 and its correlation to the map based on the refined model was 0.66. Note that the differences in shapes of the histograms based on poor or good maps can be rather small, as in Fig. 6[link]. Nevertheless, the skewness can usually be estimated very accurately because there are typically tens of thousands of grid points in the maps, so that the shapes of the histograms are very precisely defined.

[Figure 6]
Figure 6
Histograms of density corresponding to a poor map (dotted lines, correlation to model map of 0.04) and to a good map (solid lines, correlation to model map of 0.66). See text for details.

4. Conclusions

Each of the ten measures of the quality of experimental electron-density maps evaluated here has some utility in estimating the true quality of these maps. These measures of map quality reflect a wide range of characteristics (Tables 1[link] and 2[link]) ranging from the flatness of the solvent region typically found in macromolecular structures to the connectivity of regions of high electron density corresponding to the chains of polymers in these structures. Overall, the skewness of electron density stands out as the best of these measures (Table 3[link] and Fig. 2[link]). Used in a simple Bayesian estimator, the correlation between map quality estimated with the skewness of electron density with true map quality is about 0.90, while the next-best estimator (the correlation of local r.m.s. density) gives a correlation of only 0.85. Combining the two yields the most useful estimator we have developed, with a correlation between estimated and actual map quality of 0.92 and an r.m.s. prediction error in map quality of 0.09.

With the exception of the mean figure of merit of phasing, which does not depend on the hand of the heavy-atom sub­structure, all the measures of map quality analyzed are remarkably good discriminators between maps calculated using the correct and inverse hands of the heavy-atom sub­structure (Fig. 3[link]).

The PHENIX AutoSol wizard uses a combination of the skewness of electron density and the correlation of local r.m.s. density to form a Bayesian estimator of map quality. The PHENIX AutoSol wizard makes decisions about the heavy-atom substructures to pursue based on these map-quality estimates. Once density-modified maps are available, a model is built into the maps and the map–model correlation is used to identify the best overall solutions. This process yields density-modified electron-density maps of approximately the same overall quality as those obtainable with a perfect decision-making system (Fig. 5[link]).

Our Bayesian estimates of map quality, while highly useful in evaluating experimental maps, are nevertheless not the best indicators of the quality of the corresponding density-modified maps. The map–model correlation obtained after preliminary model building is a better indicator of the quality of density-modified maps (Fig. 4[link] and Table 6[link]).

In this work, we have ignored the resolution-dependence of the measures of map quality. This is made possible in part by the use of a high-resolution limit of 2.5 Å for all the calculations of map quality and is generally justified by the relatively small remaining resolution-dependence of most of the measures of map quality (Fig. 1[link]). Nevertheless, it seems possible that some improvement in estimation of map quality might be obtained by including the resolution-dependence (or the effective overall isotropic displacement factor) of the data in the analysis. Additionally, we have assumed independence of the various measures of map quality in (8a[link]). We were not able to improve the estimates of map quality using a simple covariance-matrix approach to combining estimates of map quality, but other more sophisticated approaches, together with a much greater set of sample data, might lead to improved estimates of map quality.

Acknowledgements

The authors would like to thank the NIH Protein Structure Initiative for generous support of the PHENIX project (1P01 GM063210). This work was supported in part by the US Department of Energy under Contract No. DE-AC02-05CH11231. RJR is supported by a Principal Research Fellowship from the Wellcome Trust (UK). The authors are grateful to the Joint Center for Structural Genomics for making raw data available at https://www.jcsg.org and to the many researchers who contributed their data to the PHENIX structure library. The worksheets used to generate the figures and the data and scripts used to calculate the values in the figures and tables are available at https://solve.lanl.gov/pub/solve/scoring_2009 .

References

First citationAdams, P. D., Grosse-Kunstleve, R. W., Hung, L.-W., Ioerger, T. R., McCoy, A. J., Moriarty, N. W., Read, R. J., Sacchettini, J. C., Sauter, N. K. & Terwilliger, T. C. (2002). Acta Cryst. D58, 1948–1954.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationAlphey, M. S., Leonard, G. A., Gourley, D. G., Tetaud, E., Fairlamb, A. H. & Hunter, W. N. (1999). J. Biol. Chem. 274, 25613–25622.  Web of Science CrossRef PubMed CAS Google Scholar
First citationBaker, D., Bystroff, C., Fletterick, R. J. & Agard, D. A. (1993). Acta Cryst. D49, 429–439.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationBerman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, I. N., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.  Web of Science CrossRef PubMed CAS Google Scholar
First citationBernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F. Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). J. Mol. Biol. 112, 535–542.  CrossRef CAS PubMed Web of Science Google Scholar
First citationBlow, D. M. & Crick, F. H. C. (1959). Acta Cryst. 12, 794–802.  CrossRef CAS IUCr Journals Web of Science Google Scholar
First citationBraig, K., Adams, P. D. & Brünger, A. T. (1995). Nature Struct. Biol. 2, 1083–1094.  CrossRef CAS PubMed Web of Science Google Scholar
First citationBurling, F. T., Weis, W. I., Flaherty, K. M. & Brünger, A. T. (1996). Science, 271, 72–77.  CrossRef CAS PubMed Web of Science Google Scholar
First citationChen, C. C. H., Zhang, H., Kim, A. D., Howard, A., Sheldrick, G. M., Mariano-Dunaway, D. & Herzberg, O. (2002). Biochemistry, 41, 13162–13169.  Web of Science CrossRef PubMed CAS Google Scholar
First citationChen, S., Jancrick, J., Yokota, H., Kim, R. & Kim, S.-H. (2004). Proteins, 55, 785–791.  Web of Science CrossRef PubMed CAS Google Scholar
First citationCheong, C. G., Bauer, C. B., Brushaber, K. R., Escalante-Semerena, J. C. & Rayment, I. (2002). Biochemistry, 41, 4798–4808.  Web of Science CrossRef PubMed CAS Google Scholar
First citationChoi, I.-G., Shin, D. H., Brandsen, J., Jancarik, J., Busso, D., Yokota, H., Kim, R. & Kim, S.-H. (2003). J. Struct. Funct. Genomics, 4, 31–34.  CrossRef PubMed CAS Google Scholar
First citationColovos, C., Toth, E. A. & Yeates, T. O. (2000). Acta Cryst. D56, 1421–1429.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationCowtan, K. D. & Main, P. (1996). Acta Cryst. D52, 43–48.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationCowtan, K. & Main, P. (1998). Acta Cryst. D54, 487–493.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationDaniels, D. L., Cohen, A. R., Anderson, J. M. & Brünger, A. T. (1998). Nature Struct. Biol. 5, 317–325.  Web of Science CrossRef CAS PubMed Google Scholar
First citationDiDonato, M. et al. (2006). Proteins, 65, 771–776.  Web of Science CrossRef PubMed CAS Google Scholar
First citationDucros, V. M., Lewis, R. J., Verma, C. S., Dodson, E. J., Leonard, G., Turkenburg, J. P., Murshudov, G. N., Wilkinson, A. J. & Brannigan, J. A. (2001). J. Mol. Biol. 306, 759–771.  Web of Science CrossRef PubMed CAS Google Scholar
First citationEicken, C., Sharma, V., Klabunde, T., Lawrenz, M. B., Hardham, J. M., Norris, S. J. & Sacchettini, J. C. (2002). J. Biol. Chem. 277, 21691–21696.  Web of Science CrossRef PubMed CAS Google Scholar
First citationEsser, L., Wang, C. R., Hosaka, M., Smagula, C. S., Sudhof, T. C. & Deisenhofer, J. (1998). EMBO J. 17, 977–984.  Web of Science CrossRef CAS PubMed Google Scholar
First citationGordon, E., Flouret, B., Chantalat, L., van Heijenoort, J., Mengin-Lecreulx, D. & Dideberg, O. (2001). J. Biol. Chem. 276, 10999–11006.  Web of Science CrossRef PubMed CAS Google Scholar
First citationGrosse-Kunstleve, R. W. & Adams, P. D. (2003). Acta Cryst. D59, 1966–1973.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationHamilton, W. C. (1964). Statistics in Physical Science. New York: The Ronald Press Company.  Google Scholar
First citationHan, G. W. et al. (2006). Proteins, 64, 1083–1090.  Web of Science CrossRef PubMed CAS Google Scholar
First citationHuang, C.-C., Smith, C. V., Glickman, M. S., Jacobs, W. R. Jr & Sacchettini, J. C. (2002). J. Biol. Chem. 277, 11559–11569.  Web of Science CrossRef PubMed CAS Google Scholar
First citationHuber, A. H., Nelson, W. J. & Weis, W. I. (1997). Cell, 90, 871–882.  CrossRef CAS PubMed Web of Science Google Scholar
First citationHyman, J., Chen, H., Di Fiore, P. P., De Camilli, P. & Brunger, A. T. (2000). J. Cell. Biol. 149, 537–546.  Web of Science CrossRef PubMed CAS Google Scholar
First citationJames, M. N. & Sielecki, A. R. (1983). J. Mol. Biol. 163, 299–361.  CrossRef CAS PubMed Web of Science Google Scholar
First citationJiang, J., Zhang, Y., Krainer, A. R. & Xu, R. M. (1999). Proc. Natl Acad. Sci. USA, 96, 3572–3577.  Web of Science CrossRef PubMed CAS Google Scholar
First citationJin, K. K. et al. (2006). Proteins, 63, 1112–1118.  Web of Science CrossRef PubMed CAS Google Scholar
First citationKabsch, W. (1993). J. Appl. Cryst. 26, 795–800.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationKazantsev, A. V., Krivenko, A. A., Harrington, D. J., Carter, R. J., Holbrook, S. R., Adams, P. D. & Pace, N. R. (2003). Proc. Natl Acad. Sci. USA, 100, 7497–7502.  Web of Science CrossRef PubMed CAS Google Scholar
First citationLeslie, A. G. W. (1987). Acta Cryst. A43, 134–136.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationLeslie, A. G. W. (1992). Jnt CCP4/ESF–EACBM Newsl. Protein Crystallogr. 26  Google Scholar
First citationLunin, V. Y. (1993). Acta Cryst. D49, 90–99.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationMcCoy, A. J., Storoni, L. C. & Read, R. J. (2004). Acta Cryst. D60, 1220–1228.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationMuchmore, C. R., Krahn, J. M., Kim, J. H., Zalkin, H. & Smith, J. L. (1998). Protein Sci. 7, 39–51.  Web of Science CrossRef CAS PubMed Google Scholar
First citationNanao, M. H., Sheldrick, G. M. & Ravelli, R. B. G. (2005). Acta Cryst. D61, 1227–1237.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationNewman, J., Peat, T. S., Richard, R., Kan, L., Swanson, P. E., Affholter, J. A., Holmes, I. H., Schindler, J. F., Unkefer, C. J. & Terwilliger, T. C. (1999). Biochemistry, 38, 16105–16114.  Web of Science CrossRef PubMed CAS Google Scholar
First citationOstermeier, C. & Brunger, A. T. (1999). Cell, 96, 363–374.  Web of Science CrossRef PubMed CAS Google Scholar
First citationOtwinowski, Z. & Minor, W. (1997). Methods Enzymol. 276, 307–326.  CrossRef CAS Web of Science Google Scholar
First citationPeat, T. S., Newman, J., Waldo, G. S., Berendzen, J. & Terwilliger, T. C. (1998). Structure, 6, 1207–1214.  Web of Science CrossRef CAS PubMed Google Scholar
First citationPerrakis, A., Morris, R. & Lamzin, V. S. (1999). Nature Struct. Biol. 6, 458–463.  Web of Science CrossRef PubMed CAS Google Scholar
First citationPflugrath, J. W. (1999). Acta Cryst. D55, 1718–1725.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationPodjarny, A. D. (1976). PhD thesis. Weizmann Institute of Science.  Google Scholar
First citationRice, L. M. & Brunger, A. T. (1999). Mol. Cell, 4, 85–95.  Web of Science CrossRef PubMed CAS Google Scholar
First citationRife, C. et al. (2005). Proteins, 61, 449–453.  Web of Science CrossRef PubMed CAS Google Scholar
First citationRozwarski, D. A., Diederichs, K., Hecht, R., Boone, T. & Karplus, P. A. (1996). Proteins, 26, 304–313.  CrossRef CAS PubMed Google Scholar
First citationSchneider, T. R. & Sheldrick, G. M. (2002). Acta Cryst. D58, 1772–1779.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationSchwarzenbacher, R. et al. (2006). Proteins, 65, 243–248.  Web of Science CrossRef PubMed CAS Google Scholar
First citationSevcik, J., Dauter, Z., Lamzin, V. S. & Wilson, K. S. (1996). Acta Cryst. D52, 327–344.  CrossRef CAS IUCr Journals Google Scholar
First citationShamoo, Y., Krueger, U., Rice, L. M., Williams, K. R. & Steitz, T. A. (1997). Nature Struct. Biol. 4, 215–222.  CrossRef CAS PubMed Web of Science Google Scholar
First citationSharma, V., Grubmeyer, C. & Sacchettini, J. C. (1998). Structure, 6, 1587–1599.  Web of Science CrossRef CAS PubMed Google Scholar
First citationSharma, V., Sharma, S., Hoener zu Bentrup, K., McKinney, J. D., Russell, D. G., Jacobs, W. R. Jr & Sacchettini, J. C. (2000). Nature Struct. Biol. 7, 663–668.  Web of Science CrossRef PubMed CAS Google Scholar
First citationSheldrick, G. M. (2002). Z. Kristallogr. 217, 644–650.  Web of Science CrossRef CAS Google Scholar
First citationShin, D. H., Lou, Y., Jancarik, J., Yokota, H., Kim, R. & Kim, S.-H. (2005). J. Struct. Biol. 152, 113–117.  Web of Science CrossRef PubMed CAS Google Scholar
First citationShin, D. H., Nguyen, H. H., Jancarik, J., Yokota, H., Kim, R. & Kim, S.-H. (2003). Biochemistry, 42, 13429–13437.  Web of Science CrossRef PubMed CAS Google Scholar
First citationShin, D. H., Roberts, A., Jancarik, J., Yokota, H., Kim, R., Wemmer, D. E. & Kim, S.-H. (2003). Protein Sci. 12, 1464–1472.  Web of Science CrossRef PubMed CAS Google Scholar
First citationShin, D. H., Yokota, H., Kim, R. & Kim, S.-H. (2002). Proc. Natl Acad. Sci. USA, 99, 7980–7985.  Web of Science CrossRef PubMed CAS Google Scholar
First citationSkinner, M. M., Zhang, H., Leschnitzer, D. H., Guan, Y., Bellamy, H., Sweet, R. M., Gray, C. W., Konings, R. N., Wang, A. H. & Terwilliger, T. C. (1994). Proc. Natl Acad. Sci. USA, 91, 2071–2075.  CrossRef CAS PubMed Web of Science Google Scholar
First citationStroud, R. M. & Fauman, E. B. (1995). Protein Sci. 4, 2392–2404.  CrossRef CAS PubMed Google Scholar
First citationSutton, R. B., Ernst, J. A. & Brunger, A. T. (1999). J. Cell Biol. 147, 589–598.  Web of Science CrossRef PubMed CAS Google Scholar
First citationSutton, R. B., Fasshauer, D., Jahn, R. & Brünger, A. T. (1998). Nature (London), 395, 347–353.  Web of Science CAS PubMed Google Scholar
First citationTanner, J. J., Lei, B., Tu, S. C. & Krause, K. L. (1996). Biochemistry, 35, 13531–13539.  CrossRef CAS PubMed Web of Science Google Scholar
First citationTavares, G. A., Panepucci, E. H. & Brunger, A. T. (2001). Mol. Cell, 8, 1313–1325.  Web of Science CrossRef PubMed CAS Google Scholar
First citationTerwilliger, T. C. (1994). Acta Cryst. D50, 11–16.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationTerwilliger, T. C. (1999). Acta Cryst. D55, 1863–1871.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationTerwilliger, T. C. (2000). Acta Cryst. D56, 965–972.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationTerwilliger, T. C. (2001). Acta Cryst. D57, 1763–1775.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationTerwilliger, T. C. (2002a). Acta Cryst. D58, 2082–2086.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationTerwilliger, T. C. (2002b). Acta Cryst. D58, 2213–2215.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationTerwilliger, T. C. (2003). Acta Cryst. D59, 1688–1701.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationTerwilliger, T. C. & Berendzen, J. (1996). Acta Cryst. D52, 749–757.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationTerwilliger, T. C. & Berendzen, J. (1997). Acta Cryst. D53, 571–579.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationTerwilliger, T. C. & Berendzen, J. (1999a). Acta Cryst. D55, 501–505.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationTerwilliger, T. C. & Berendzen, J. (1999b). Acta Cryst. D55, 1872–1877.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationTerwilliger, T. C., Grosse-Kunstleve, R. W., Afonine, P. V., Moriarty, N. W., Zwart, P. H., Hung, L.-W., Read, R. J. & Adams, P. D. (2008). Acta Cryst. D64, 61–69.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationTurner, M. A., Yuan, C. S., Borchardt, R. T., Hershfield, M. S., Smith, G. D. & Howell, P. L. (1998). Nature Struct. Biol. 5, 369–376.  Web of Science CrossRef CAS PubMed Google Scholar
First citationVellieux, F. M. D. A. P., Hunt, J. F., Roy, S. & Read, R. J. (1995). J. Appl. Cryst. 28, 347–351.  CrossRef CAS IUCr Journals Google Scholar
First citationWalsh, M. A., Otwinowski, Z., Perrakis, A., Anderson, P. M. & Joachimiak, A. (2000). Structure, 8, 505–514.  Web of Science CrossRef PubMed CAS Google Scholar
First citationWang, B.-C. (1985). Methods Enzymol. 115, 90–112.  CrossRef CAS PubMed Google Scholar
First citationWeeks, C. M., Adams, P. D., Berendzen, J., Brunger, A. T., Dodson, E. J., Grosse-Kunstleve, R. W., Schneider, T. R., Sheldrick, G. M., Terwilliger, T. C., Turkenburg, M. G. & Usón, I. (2003). Methods Enzymol. 374, 37–82.  Web of Science CrossRef PubMed CAS Google Scholar
First citationWillis, M. A., Bishop, B., Regan, L. & Brunger, A. T. (2000). Structure Fold. Des. 8, 1319–1328.  Web of Science CrossRef PubMed CAS Google Scholar
First citationWilson, M. A. & Brunger, A. T. (2000). J. Mol. Biol. 301, 1237–1256.  Web of Science CrossRef PubMed CAS Google Scholar
First citationXu, Q. et al. (2004). Proteins, 56, 171–175.  Web of Science CrossRef PubMed CAS Google Scholar
First citationXu, Q., Krishna, S. S. et al. (2006). Proteins, 65, 777–782.  Web of Science CrossRef PubMed CAS Google Scholar
First citationXu, Q., Schwarzenbacher, R., Krishna, S. S. et al. (2006). Proteins, 64, 808–813.  Web of Science CrossRef PubMed CAS Google Scholar
First citationXu, Q., Schwarzenbacher, R., McMullan, D. et al. (2006). Proteins, 62, 292–296.  Web of Science CrossRef PubMed CAS Google Scholar
First citationYang, D., Shipman, L. W., Roessner, C. A., Scott, A. I. & Sacchettini, J. C. (2002). J. Biol. Chem. 277, 9462–9467.  Web of Science CrossRef PubMed CAS Google Scholar
First citationYu, R. C., Hanson, P. I., Jahn, R. & Brünger, A. T. (1998). Nature Struct. Biol. 5, 803–811.  Web of Science CrossRef CAS PubMed Google Scholar
First citationYu, R. C., Jahn, R. & Brünger, A. T. (1999). Mol. Cell, 4, 97–107.  Web of Science CrossRef PubMed CAS Google Scholar
First citationZhang, L. & Doudna, J. A. (2002). Science, 295, 2084–2088.  Web of Science CrossRef PubMed CAS Google Scholar
First citationZubieta, C. et al. (2007). Proteins, 68, 999–1005.  Web of Science CrossRef PubMed CAS Google Scholar
First citationZwart, P. H., Grosse-Kunstleve, R. W. & Adams, P. D. (2005). CCP4 Newsl. 43, contribution 7.  Google Scholar

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.

Journal logoBIOLOGICAL
CRYSTALLOGRAPHY
ISSN: 1399-0047
Volume 65| Part 6| June 2009| Pages 582-601
Follow Acta Cryst. D
Sign up for e-alerts
Follow Acta Cryst. on Twitter
Follow us on facebook
Sign up for RSS feeds