research papers\(\def\hfill{\hskip 5em}\def\hfil{\hskip 3em}\def\eqno#1{\hfil {#1}}\)

Journal logoBIOLOGICAL
CRYSTALLOGRAPHY
ISSN: 1399-0047
Volume 67| Part 4| April 2011| Pages 355-367

REFMAC5 for the refinement of macromolecular crystal structures

CROSSMARK_Color_square_no_text.svg

aStructural Biology Laboratory, Department of Chemistry, University of York, Heslington, York YO10 5YW, England,bBiophysical Structural Chemistry, Leiden University, PO Box 9502, 2300 RA Leiden, The Netherlands,cRandall Division of Cell and Molecular Biophysics, New Hunt's House, King's College London, London, England, and dSTFC Daresbury Laboratory, Warrington WA4 4AD, England
*Correspondence e-mail: garib@ysbl.york.ac.uk

(Received 14 July 2010; accepted 10 January 2011)

This paper describes various components of the macromolecular crystallographic refinement program REFMAC5, which is distributed as part of the CCP4 suite. REFMAC5 utilizes different likelihood functions depending on the diffraction data employed (amplitudes or intensities), the presence of twinning and the availability of SAD/SIRAS experimental diffraction data. To ensure chemical and structural integrity of the refined model, REFMAC5 offers several classes of restraints and choices of model parameterization. Reliable models at resolutions at least as low as 4 Å can be achieved thanks to low-resolution refinement tools such as secondary-structure restraints, restraints to known homologous structures, automatic global and local NCS restraints, `jelly-body' restraints and the use of novel long-range restraints on atomic displacement parameters (ADPs) based on the Kullback–Leibler divergence. REFMAC5 additionally offers TLS parameterization and, when high-resolution data are available, fast refinement of anisotropic ADPs. Refinement in the presence of twinning is performed in a fully automated fashion. REFMAC5 is a flexible and highly optimized refinement package that is ideally suited for refinement across the entire resolution spectrum encountered in macromolecular crystallography.

Keywords: REFMAC5; refinement.

1. Introduction

As a final step in the process of solving a macromolecular crystal (MX) structure, refinement is carried out to maximize the agreement between the model and the X-ray data. Model parameters that are optimized in the refinement process include atomic coordinates, atomic displacement parameters (ADPs), scale factors and, in the presence of twinning, twin fraction(s). Although refinement procedures are typically designed for the final stages of MX analysis, they are also often used to improve partial models and to calculate the `best' electron-density maps for further model (re)building. Refinement protocols are therefore an essential component of model-building pipelines [ARP/wARP (Perrakis et al., 1999[Perrakis, A., Morris, S. & Lamzin, V. S. (1999). Nature Struct. Biol. 6, 458-463.]), SOLVE/RESOLVE (Terwilliger, 2003[Terwilliger, T. C. (2003). Acta Cryst. D59, 1174-1182.]) and Buccaneer (Cowtan, 2006[Cowtan, K. (2006). Acta Cryst. D62, 1002-1011.])] and are of paramount importance in guiding manual model updates using molecular-graphics software [Coot (Emsley & Cowtan, 2004[Emsley, P. & Cowtan, K. (2004). Acta Cryst. D60, 2126-2132.]), O (Jones et al., 1991[Jones, T. A., Zou, J.-Y., Cowan, S. W. & Kjeldgaard, M. (1991). Acta Cryst. A47, 110-119.]) and XtalView (McRee & Israel, 2008[McRee, D. E. & Israel, M. (2008). J. Struct. Biol. 163, 208-213.])].

The first software tools for MX refinement appeared in the 1970s. Real-space refinement using torsion-angle parameterization was introduced by Diamond (1971[Diamond, R. (1971). Acta Cryst. A27, 436-452.]). This was followed a few years later by reciprocal-space algorithms for the refinement of individual atomic parameters with added energy (Jack & Levitt, 1978[Jack, A. & Levitt, M. (1978). Acta Cryst. A34, 931-935.]) and restraints (Konnert, 1976[Konnert, J. H. (1976). Acta Cryst. A32, 614-617.]) in order to deliver chemically reasonable models. The energy and restraints approaches differ only in terminology as they use similar information and both can be unified using a Bayesian formalism (Murshudov et al., 1997[Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240-255.]). Early programs used the well established statistical technique of least-squares residuals with equal weights on all reflections (Press et al., 1992[Press, W. H., Flannery, B. P., Teukolsky, S. A. & Vetterling, W. T. (1992). Numerical Recipes in FORTRAN. Cambridge University Press.]), with gradients and second derivatives (if needed) calculated directly. This changed when Fourier methods, which were developed for small-molecule structure refinement (Booth, 1946[Booth, A. (1946). Proc. R. Soc. Lond. A Math. Phys. Sci. 188, 77-92.]; Cochran, 1948[Cochran, W. (1948). Acta Cryst. 1, 138-142.]; Cruickshank, 1952[Cruickshank, D. W. J. (1952). Acta Cryst. 5, 511-518.], 1956[Cruickshank, D. W. J. (1956). Acta Cryst. 9, 747-753.]), were formalized for macromolecules (Ten Eyck, 1977[Ten Eyck, L. F. (1977). Acta Cryst. A33, 486-492.]; Agarwal, 1978[Agarwal, R. C. (1978). Acta Cryst. A34, 791-809.]). The use of the FFT for structure-factor and gradient evaluation (Agarwal, 1978[Agarwal, R. C. (1978). Acta Cryst. A34, 791-809.]) sped up calculations dramatically and the refinement of large molecules using relatively modest computers became realistic. Later, the introduction of molecular dynamics (Brünger, 1991[Brünger, A. (1991). Annu. Rev. Phys. Chem. 42, 197-223.]), the generalization of the FFT approach for all space groups (Brünger, 1989[Brünger, A. T. (1989). Acta Cryst. A45, 42-50.]) and the development of a modular approach to refinement programs (Tronrud et al., 1987[Tronrud, D. E., Ten Eyck, L. F. & Matthews, B. W. (1987). Acta Cryst. A43, 489-501.]) dramatically changed MX solution procedures. Also, the introduction of the very robust and popular small-molecular refinement program SHELXL (Sheldrick, 2008[Sheldrick, G. M. (2008). Acta Cryst. A64, 112-122.]) to the macromolecular community allowed routine analysis of high-resolution MX data, including the refinement of merohedral and non-merohedral twins.

More sophisticated statistical approaches to MX structure refinement started to emerge in the 1990s. Although the basic formulations and most of the necessary probability distributions used in crystallography were developed in the 1950s and 1960s (Luzzati, 1951[Luzzati, V. (1951). Acta Cryst. 4, 367-369.]; Ramachandran et al., 1963[Ramachandran, G. N., Srinivasan, R. & Sarma, V. R. (1963). Acta Cryst. 16, 662-666.]; Srinivasan & Ramachandran, 1965[Srinivasan, R. & Ramachandran, G. N. (1965). Acta Cryst. 19, 1008-1014.]; see also Srinivasan & Parthasarathy, 1976[Srinivasan, R. & Parthasarathy, S. (1976). Some Statistical Applications in X-ray Crystallography. Oxford: Pergamon Press.], and references therein), their implementation for MX refinement started in the middle of the 1990s (Pannu & Read, 1996[Pannu, N. S. & Read, R. J. (1996). Acta Cryst. A52, 659-668.]; Bricogne & Irwin, 1996[Bricogne, G. & Irwin, J. (1996). Proceedings of the CCP4 Study Weekend. Macromolecular Refinement, edited by E. J. Dodson, M. Moore, A. Ralph & S. Bailey, pp. 85-92. Warrington: Daresbury Laboratory.]; Murshudov et al., 1997[Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240-255.]). It should be emphasized that prior to the application of maximum-likelihood (ML) techniques in MX refinement, the importance of advanced statistical approaches to all stages of MX analysis had been advocated by Bricogne (1997[Bricogne, G. (1997). Methods Enzymol. 276, 361-423.]) for two decades. Nowadays, most MX refinement programs offer likelihood targets as an option. Although ML can be very well approximated using the weighted least-squares approach in the very simple case of refinement against structure-factor amplitudes (Murshudov et al., 1997[Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240-255.]), ML has the attractive advantage that it is relatively easy (at least theoretically) to generalize for the joint utilization of a variety of sources of observations. For example, it was immediately extended to use experimental phase information (Bricogne, 1997[Bricogne, G. (1997). Methods Enzymol. 276, 361-423.]; Murshudov et al., 1997[Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240-255.]; Pannu et al., 1998[Pannu, N. S., Murshudov, G. N., Dodson, E. J. & Read, R. J. (1998). Acta Cryst. D54, 1285-1294.]). In the last two decades, there have been many developments of likelihood functions towards the exploitation of all available experimental data for refinement, thus increasing the reliability of the refined model in the final stages of refinement and improving the electron density used in model building in the early stages of MX analysis (Bricogne, 1997[Bricogne, G. (1997). Methods Enzymol. 276, 361-423.]; Skubák et al., 2004[Skubák, P., Murshudov, G. N. & Pannu, N. S. (2004). Acta Cryst. D60, 2196-2201.], 2009[Skubák, P., Murshudov, G. & Pannu, N. S. (2009). Acta Cryst. D65, 1051-1061.]).

MX crystallography can now take advantage of highly optimized software packages dealing with all of the various stages of structure solution, including refinement. There are several programs available that either are designed to perform refinement or offer refinement as an option. These include BUSTER/TNT (Blanc et al., 2004[Blanc, E., Roversi, P., Vonrhein, C., Flensburg, C., Lea, S. M. & Bricogne, G. (2004). Acta Cryst. D60, 2210-2221.]), CNS (Brünger et al., 1998[Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905-921.]), MAIN (Turk, 2008[Turk, D. (2008). Acta Cryst. A64, C23.]), MOPRO (Guillot et al., 2001[Guillot, B., Viry, L., Guillot, R., Lecomte, C. & Jelsch, C. (2001). J. Appl. Cryst. 34, 214-223.]), phenix.refine (Adams et al., 2010[Adams, P. D. et al. (2010). Acta Cryst. D66, 213-221.]), REFMAC5 (Murshudov et al., 1997[Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240-255.]), SHELXL (Sheldrick, 2008[Sheldrick, G. M. (2008). Acta Cryst. A64, 112-122.]) and TNT (Tronrud et al., 1987[Tronrud, D. E., Ten Eyck, L. F. & Matthews, B. W. (1987). Acta Cryst. A43, 489-501.]). While MOPRO was specifically designed for niche ultrahigh-resolution refinement and is able to model deformation density, all of the other programs can deal with a multitude of MX refinement problems and produce high-quality electron-density maps, although with different emphases and strengths.

This contribution describes the various components of the macromolecular crystallographic refinement program REFMAC5, which is distributed as part of the CCP4 suite (Collaborative Computational Project, Number 4, 1994[Collaborative Computational Project, Number 4 (1994). Acta Cryst. D50, 760-763.]). REFMAC5 is a flexible and highly optimized refinement package that is ideally suited for refinement across the entire resolution spectrum that is encountered in macromolecular crystallo­graphy.

2. Target functions in REFMAC5

As in all other refinement programs, the target function minimized in REFMAC5 has two components: a component utilizing geometry (or prior knowledge) and a component utilizing experimental X-ray knowledge,

[f_{\rm total} = f_{\rm geom} + w f_{\rm xray}, \eqno (1)]

where ftotal is the total target function to be minimized, con­sisting of functions controlling the geometry of the model and the fit of the model parameters to the experimental data, and w is a weight between the relative contributions of these two components. In macromolecular crystallography, the weight is traditionally selected by trial and error. REFMAC5 offers automatic weighting, which is based on the fact that both components are the natural logarithm of a probability distribution. However, this `automatic' weight may lead to unreasonable deviations from ideal geometry (either too tight or too relaxed) in some cases, as the ideal geometry is difficult to describe statistically. For these cases, the weight parameter may need to be selected manually to produce more reasonable geometry, e.g. such that the root-mean-square deviation of the bond lengths from the ideal values is 0.02 Å and at resolutions lower than 3 Å perhaps even smaller.

From a Bayesian viewpoint (O'Hagan, 1994[O'Hagan, A. (1994). Kendal's Advanced Theory of Statistics, Vol. 2B, Bayesian Inference. London: Hodder Arnold.]), these functions have the following probabilistic interpretation (ignoring constants which are irrelevant for minimization purposes):

[\eqalignno {f_{\rm total} & = -\log [P_{\rm posterior} ({\rm model \semi obs})] \cr f_{\rm geom} & = -\log [P_{\rm prior}({\rm model})]\cr f_{\rm xray} & = -\log[P_{\rm likelihood}({\rm obs\semi model})].& (2)}]

From this point of view, MX refinement is similar to a well known technique in statistical analysis: maximum posterior (MAP) estimation. The model parameters are linked with the experimental data via fxray, i.e. likelihood is a mechanism that controls information flow from the experimental data to the derived model. Consequently, it is important to design a likelihood function that allows optimal information transfer from the data to the derived model. fgeom ensures that the derived model is consistent with the presumed chemical and structural knowledge. This function plays the role of regularization, reduction of the effective number of parameters and transfer of known information to the new model. If care is not taken, then wrong information may be transferred to the model; removing the effect of such errors may be difficult if possible at all. The design of such functions should be performed using verifiable invariant information and it should be testable and revisable during the refinement and model-building procedures.

Functions dealing with geometry usually depend only on atomic parameters. We are not aware of any function used in crystallography that deals with the prior geometry probability distributions of overall parameters. A possible reason for the lack of interest in (and necessity of) this type of function may be that, despite popular belief, the statistical problem in crystallo­graphy is sufficiently well defined and that the main problems are those of model parameterization and completion.

The existing refinement programs differ in the target functions and optimization techniques used to derive model parameters. Most MX programs use likelihood target functions. However, their form, implementations and parameterizations are different. Therefore, it should not come as a surprise if different programs give (slightly) different results in terms of model parameters, electron-density maps and reliability factors (such as R and Rfree).

2.1. X-ray component

The X-ray likelihood target functions used in REFMAC5 are based on a general multivariate probability distribution of E observations given M model structure factors. This function is derived from a multivariate complex Gaussian distribution of N = E + M structure factors for acentric reflections and from a multivariate real Gaussian distribution for centric reflections and has the following form:

[P = \cases { \displaystyle{{|{\rm C}_{M}|\prod \limits_{i = 1}^E |F_i|}\over{\pi^{E}|{\rm C}_{N}|}} {\textstyle \int \limits_0^{2\pi} \ldots \int\limits_0^{2\pi}} P_{\rm pr}({\boldalpha}) \cr \times\exp\left[- {\textstyle \sum \limits_{i,j = 1}^{N}} F_i (a_{i,j}-c_{i-E,j-E}) F_j\right] \,\, {\rm d} {\boldalpha}& acentric \cr \displaystyle \left[{{|{\rm C}_{M}|}\over{(2\pi)^{E} |{\rm C}_{N}|}}\right]^{1/2} {\textstyle \sum \limits_{{\alpha_1 = \alpha_{1,1} \atop \alpha_1 = \alpha_{1,2}}}} \ldots {\textstyle \sum \limits_{{\alpha_E = \alpha_{E,1} \atop \alpha_E = \alpha_{E,2}}}} P_{\rm pr}({\boldalpha}) \cr \times \displaystyle\exp \left[-{{1} \over {2}}\textstyle \sum \limits_{i,j = 1}^{N} F_i (a_{i,j}-c_{i-E,j-E}) F_j\right] & centric}, \eqno (3)]

where P = P(|F1|, …, |FE|; FE+1, …, FN), Fi = |Fi|exp(ιαi}, |F1|, …, |FE| denote the observed amplitudes, FE+1, …, FN are the model structure factors, CN is the covariance matrix with the elements of its inverse denoted by aij, CM is the bottom right square submatrix of CN of dimension M with the elements of its inverse denoted by cij. We define cij = 0 for i ≤ 0 or j ≤ 0. |CN| and |CM| are the determinants of matrices CN and CM, [{\boldalpha}] = (α1, …, αE) is the vector of the unknown phases of the observations that need to be integrated and [P_{\rm pr}({\boldalpha})] is a probability distribution expressing any prior knowledge about the phases.

In the simplest case of one observation, one model and no prior knowledge about phases, the integral in (3)[link] can be evaluated analytically. In this case, the function follows a Rice distribution (Bricogne & Irwin, 1996[Bricogne, G. & Irwin, J. (1996). Proceedings of the CCP4 Study Weekend. Macromolecular Refinement, edited by E. J. Dodson, M. Moore, A. Ralph & S. Bailey, pp. 85-92. Warrington: Daresbury Laboratory.]), which is a non-central χ2 distribution of |Fo|2/Σ and |Fo|2/2Σ with non-centrality parameters D2|Fc|2/Σ and D2|Fo|2/2Σ with one and two degrees of freedom for centric and acentric reflections, respectively (Stuart & Ord, 2009[Stuart, A. & Ord, K. (2009). Kendall's Advanced Theory of Statistics, Vol. 1, Distribution Theory. Hoboken: John Wiley & Sons.]),

[P(|F_{\rm o}|\semi F_{\rm c}) = \cases {\displaystyle{{2|F_{\rm o}|} \over {\Sigma}} \exp\left(-{{|F_{\rm o}|^2+D^2|F_{\rm c}|^2} \over {\Sigma}}\right) \cr \times \displaystyle I_0\left(2{{|F_{\rm o}|D|F_{\rm c}|} \over {\Sigma}}\right) & acentric \cr \displaystyle\left({{2} \over {\pi \Sigma}}\right)^{1/2} \exp\left(-{{|F_{\rm o}|^2+D^2|F_{\rm c}|^2} \over {2\Sigma}}\right)\cr \times \displaystyle \cosh\left({{|F_{\rm o}|D|F_{\rm c}|} \over {\Sigma}}\right) & centric}, \eqno (4)]

where D in its simplest interpretation is 〈cos(Δxs)〉, a Luzzati error parameter (Luzzati, 1952[Luzzati, V. (1952). Acta Cryst. 5, 802-810.]) expressing errors in the positional parameters of the model, Fc is the model structure factor, |Fo| is the observed amplitude of the structure factor and Σ is the uncertainty or the second central moment of the distribution. Both Σ and D enter the equation as part of the covariance matrices CN and CM from (3)[link]. Σ is a function of the multiplicity of the Miller indices ( factor), experimental uncertainties (σo), model completeness and model errors. For simplicity, the following parameterization is used:

[\Sigma = \cases { 2\sigma_{\rm o}^{2} + \varepsilon \Sigma_{\rm mod} & acentric \cr \sigma_{\rm o}^{2} + \varepsilon \Sigma_{\rm mod} & centric}. \eqno (5)]

The current version of REFMAC5 estimates D and Σmod in resolution bins. Working reflections are used for estimation of D and free reflections are used for Σmod estimation. Although this simple parameterization works in many cases, it may give misleading results for data from crystals with pseudo translation, OD disorder or modulated crystals in general. Currently, there is no satisfactory implementation of the error model to account for these cases.

2.2. Incorporation of experimental phase information in model refinement

2.2.1. MLHL likelihood

MLHL likelihood (Bricogne, 1997[Bricogne, G. (1997). Methods Enzymol. 276, 361-423.]; Murshudov et al., 1997[Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240-255.]; Pannu et al., 1998[Pannu, N. S., Murshudov, G. N., Dodson, E. J. & Read, R. J. (1998). Acta Cryst. D54, 1285-1294.]) is based on a special case of the probability distribution (3)[link] where we have one observation, one model and phase information derived from an experiment available as a prior distribution Ppr(α),

[P(|F_{\rm o}|\semi F_c) = \cases{ \displaystyle{{|F_{\rm o}|} \over {\pi\Sigma}} {\textstyle \int \limits_{0}^{2\pi}} P_{\rm pr}(\alpha) \cr \times \displaystyle\exp\left(-{{|F_{\rm o}-DF_{\rm c}|^2} \over {\Sigma}}\right) \;{\rm d}\alpha & acentric \cr \displaystyle \left({{1} \over {2\pi\Sigma}}\right)^{1/2} {\textstyle \sum \limits_{{\alpha = \alpha_1 \atop \alpha = \alpha_2}}} P_{\rm pr}(\alpha)\cr \times\displaystyle \exp\left(-{{|F_{\rm o}-DF_{\rm c}|^2} \over {2\Sigma}}\right) & centric}, \eqno (6)]

where Fo = |Fo|exp(ια), Fc = |Fc|exp(ιαc), α is the unknown phase of the structure factor and α1 and α2 are its possible values for a centric reflection. The prior phase probability distribution Ppr(α) is usually represented as a generalized von Mises distribution (Mardia & Jupp, 1999[Mardia, K. V. & Jupp, P. E. (1999). Directional Statistics. Chichester: John Wiley & Sons.]) and is better known in crystallography as a Hendrickson–Lattman distribution (Hendrickson & Lattman, 1970[Hendrickson, W. A. & Lattman, E. E. (1970). Acta Cryst. B26, 136-143.]),

[P(\alpha) = N \exp[A\cos(\alpha)+ B\sin(\alpha) + C\cos(2\alpha) + D\sin(2\alpha)], \eqno (7)]

where A, B, C and D are coefficients of the Fourier transformation of the logarithm of the phase probability distribution and N is the normalization coefficient. The distribution is unimodal when C and D are zero; otherwise, it is a bimodal distribution that reflects the possible phase uncertainty in experimental phasing. For centric reflections C and D are zero.

2.2.2. SAD/SIRAS likelihood

The MLHL likelihood is dependent on the reliability and accuracy of the prior distribution Ppr(α). However, the phase distributions after density modification (or even after phasing), which are usually used as Ppr(α), often suffer from inaccurate estimation of the phase errors. Furthermore, MLHL [as well as any other special case of (3)[link] with a non-uniform Ppr(α)] assumes independence of the prior phases from the model phases. These shortcomings can be addressed by using experimental information directly from the experimental data, instead of from the Ppr(α) distributions obtained in previous steps of the structure-solution process. Currently, SAD and SIRAS likelihood functions are implemented in REFMAC5.

The SAD probability distribution (Skubák et al., 2004[Skubák, P., Murshudov, G. N. & Pannu, N. S. (2004). Acta Cryst. D60, 2196-2201.]) is obtained from (3)[link] by setting E = 2, M = 2, Ppr(α) = constant and |F1| = |Fo+|, |F2| = |(Fo)*|, F3 = Fc+, F4 = (Fc)*, where F+ and F are the structure factors of the Friedel pairs. The model structure factors are constructed using the current parameters of the protein, the heavy-atom substructure and the inputted anomalous scattering parameters. Similarly, the SIRAS function (Skubák et al., 2009[Skubák, P., Murshudov, G. & Pannu, N. S. (2009). Acta Cryst. D65, 1051-1061.]) is a special case of (3)[link] with E = 3, M = 3, Ppr(α) = constant and |F1| = |FoN|, |F2| = |Fo+|, |F3| = |(Fo)*|, F4 = FcN, F5 = Fc+, F6 = (Fc)*, where |F1| and F4 correspond to the observation and the model of the native crystal, respectively, and |F2|, |F3|, F5 and F6 refer to the Friedel pair observations and models of the derivative crystal. If any of the E observations are symmetrically equivalent, for instance centric Friedel pair intensities, the equation is reduced appropriately so as to only include non-equivalent observations and models.

The incorporation of prior phase information by the refinement function is especially useful in the early and middle stages of model building and at all stages of structure solution at lower resolutions, owing to the improvement in the observation-to-parameter ratio. The refinement of a well resolved high-resolution structure is often best achieved using the simple Rice function.

Fig. 1[link] shows the effect of various likelihood functions on automatic model building using ARP/wARP (Perrakis et al., 1999[Perrakis, A., Morris, S. & Lamzin, V. S. (1999). Nature Struct. Biol. 6, 458-463.]).

[Figure 1]
Figure 1
Fraction of the model correctly built by ARP/wARP v.7.0 iterated with REFMAC5 using different target functions. The maps inputted to model building were prepared by CRANK (Ness et al., 2004[Ness, S. R., de Graaff, R. A. G., Abrahams, J. P. & Pannu, N. S. (2004). Structure, 12, 1753-1761.]). The sample consists of 102 data sets described in Skubák et al. (2010[Skubák, P., Waterreus, W.-J. & Pannu, N. S. (2010). Acta Cryst. D66, 783-788.]).

2.3. Twin refinement

The function used for twin refinement is a generalization of the Rice distribution in the presence of a linear relationship between the observed intensities. This function has the form

[\eqalignno {P(I_{\rm o}\semi{\rm model}) & = \textstyle \int \limits _F P(I_{\rm o}\semi F) P(F\semi {\rm model}) \,\, {\rm d}F \cr P(I_{\rm o}\semi F) & = N_{\rm o} \exp\left(-{\textstyle \sum \limits_{{\rm related} \atop {\rm reflections}}} {{[I_{{\rm o}j}-f(\alpha,F)]^2} \over {2\sigma_{{\rm o}j}^{2}}}\right) \cr P(F\semi {\rm model})& = N_{\rm model} {\textstyle \prod} \exp \left (-{{|F_i-F_{c,i}|^2} \over {\epsilon \Sigma}}\right) \cr f(\alpha,F) & = \textstyle \sum \alpha_{ij} |F_j|^2, & (8)}]

where No and Nmodel are normalization coefficients. In the first equation, the first term inside the integral, P(Io; F), represents the probability distribution of observations if `ideal' structure factors are known. Here, all reflections that are twinned and that can be grouped together are included. Models representing the data-collection instrument, if available, could be added to this term. The second term, P(F; model), represents a probability distribution of the `ideal' structure factors should an atomic model be known for a single crystal. Here, all reflections from the asymmetric unit that contribute to the observed `twinned' intensities are included. If the data were to come from more than one crystal or if, for example, SAD should be used simultaneously with twinning, then this term would need to be modified appropriately. Fc is a function of atomic and overall parameter D. Overall parameters also include Σ and twin-fraction parameters. f represents the way structure factors from the asymmetric unit contribute to the particular `twinned' intensity. The above formula is more symbolic rather than precise; further details of twin refinement will be published elsewhere.

REFMAC5 performs the following preparations before starting refinement against twinned data.

  • (i) Identify potential (pseudo)merohedral twin operators by analyses of cell/space-group combination using the algorithm developed by Lebedev et al. (2006[Lebedev, A. A., Vagin, A. A. & Murshudov, G. N. (2006). Acta Cryst. D62, 83-95.]).

  • (ii) Calculate Rmerge for each potential twin operator and filter out twin operators for which Rmerge is greater than 0.5 or a user-defined value.

  • (iii) Estimate twin fractions for the remaining twin domains and filter out those with small twin fractions (the default value is 0.05).

  • (iv) Make sure that the point group and twin operators form a group. Strictly speaking this stage is not necessary, but it makes bookkeeping easy.

  • (v) Perform twin refinement using the remaining twin operators. Twin fractions are refined at every cycle.

All integrals necessary for evaluation of the minus log-likelihood function and its derivatives with respect to the structure factors are evaluated using the Laplace approximation (McKay, 2003[McKay, D. J. C. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press.]).

2.4. Modelling bulk-solvent contribution

Typically, a significant part of a macromolecular crystal is occupied by disordered solvent. Accurate modelling of this part of the crystal is still an unsolved problem of MX. The contribution of bulk solvent to structure factors is strongest at low resolution, although its effect at high resolution is still non-negligible.

The absence of good models for disordered solvent may be one of the reasons why R factors in MX are significantly higher than those in small-molecular crystallography. For small molecules R factors can be around 1%, whereas for MX they are rarely less than 10% and more often around 20% or even higher.

REFMAC5 uses two types of bulk (disordered) solvent models. One of them is the so-called Babinet's bulk-solvent model, which is based on the assumption that the only difference between solvent and protein at low resolution is their scale factor (Tronrud, 1997[Tronrud, G. (1997). Methods Enzymol. 277, 306-319.]). Here, we use a slight modification of the formulation described by Tronrud (1997[Tronrud, G. (1997). Methods Enzymol. 277, 306-319.]) and assume that if protein electron density is convoluted using the Gaussian kernel and multiplied by an appropriate scale factor, then protein and solvent electron densities are equal,

[\eqalignno {\rho_{\rm solvent}+{\cal F}(k_{\rm babinet})*\rho_{\rm protein} = {\rm constant}\iff &\cr F_{\rm solvent}+k_{\rm babinet}F_{\rm protein} = 0 \iff &\cr F_{\rm solvent} = -k_{\rm babinet} F_{\rm protein} \Longrightarrow &\cr F_{{\rm total}0} = F_{\rm solvent} + F_{\rm protein} = (1-k_{\rm babinet}) F_{\rm protein}, && (9)}]

where * denotes convolution, [{\cal F}] denotes the Fourier transform and kbabinet = kbabinet0exp(−Bbabinet|s|2/4). Here, we used the convolution theorem, which states that the Fourier transform of the convolution of two functions is the product of their Fourier transforms.

The second bulk-solvent model is derived similarly to that described by Jiang & Brünger (1994[Jiang, J.-A. & Brünger, A. (1994). J. Mol. Biol. 243, 100-115.]). The basic assumption is that disordered solvent atoms are uniformly distributed over the region of the asymmetric unit that is not occupied by the atoms of the modelled part of the crystal structure. The region of the asymmetric unit occupied by the atomic model is masked out. Any holes inside this mask are removed using a cavity-detection algorithm. A constant value is assigned outside this region and the structure factors Fmask are calculated using an FFT algorithm. These structure factors, multiplied by appropriate scale factors (estimated during the scaling procedure), are added to those calculated from the atomic model. Additionally, various mask parameters may optionally be optimized.

One should be careful with bulk-solvent corrections, especially when the atomic model is incomplete. This type of bulk-solvent model may result in smeared-out electron density that may reduce the height of electron density in less-ordered and unmodelled parts of the crystal.

The final total structure factors with scale and solvent contributions included take the following form:

[\eqalignno {F_{\rm total} & = k_{\rm overall} k_{\rm aniso}(1-k_{\rm babinet})(F_{\rm protein}+k_{\rm mask} F_{\rm mask}) \cr k_{\rm overall} & = k_{{\rm overall}0}\exp(-B_{\rm overall}|s|^2/4) \cr k_{\rm babinet} & = k_{{\rm babinet}0}\exp(-B_{\rm babinet}|s|^2/4) \cr k_{\rm mask} & = k_{{\rm mask}0}\exp(-B_{\rm mask}|s|^2/4) \cr k_{\rm aniso} & = \exp(-{\bf s}^T U_{\rm aniso} {\bf s}/4),\,\,{\rm with }\,\,{\rm trace}(U_{\rm aniso}) = 0, & (10)}]

where the ks are scale factors, s is the reciprocal-space vector, |s| is the length of this vector, Uaniso is the crystallographic anisotropic tensor that obeys crystal symmetry, Fmask is the contribution from the mask bulk solvent and Fprotein is the contribution from the protein part of the crystal. Usually, either mask or Babinet bulk-solvent correction is used. However, sometimes their combination may provide better statistics (lower R factors) than either individually.

The overall parameters of the solvent models, the overall anisotropy and the scale factors are estimated using a least-squares fit of the amplitude of the total structure factors to the observed amplitudes,

[\textstyle \sum \limits_{{\rm working} \atop {\rm reflections}} (|F_{\rm o}|-|F_{\rm total}|)^2 \longrightarrow {\min}. \eqno (11)]

In the case of twin refinement, the following function is used to estimate overall parameters including twin fractions (details of twin refinement will be published elsewhere),

[{\textstyle \sum \limits_{{\rm working} \atop {\rm reflections}}} \left\{{{\max(I_{\rm o},-3.0*\sigma_{\rm o})-f(\alpha,F_{\rm c})} \over {[\max(I_{\rm o},0.001*\sigma_{\rm o})]^{1/2}+ [f(\alpha,F_{\rm c}]^{1/2}}}\right\}^2 \longrightarrow \min, \eqno (12)]

where f(α, F) is as defined in (8)[link].

Both (11)[link] and (12)[link] are minimized using the Gauss–Newton method with eigenvalue filtering to solve linear equations, which ensures that even very highly correlated parameters can be estimated simultaneously. However, one should be careful in interpretating these parameters as the system is highly correlated.

Once overall parameters such as the scale factors and twin fractions have been estimated, REFMAC5 estimates the overall parameters of one of the abovementioned likelihood functions and evaluates the function and its derivatives with respect to the atomic parameters. A general description of this procedure can be found in Steiner et al. (2003[Steiner, R. A., Lebedev, A. A. & Murshudov, G. N. (2003). Acta Cryst. D59, 2114-2124.]).

2.5. Geometry component

The function controlling the geometry has several components.

  • (i) Chemical information about the constituent blocks (e.g. amino acids, nucleic acids, ligands) of macromolecules and the covalent links between them.

  • (ii) Internal consistency of macromolecules (e.g. NCS).

  • (iii) Structural knowledge (known structures, restraints on current interatomic distances, secondary structures).

The first component is used by all programs and has been tabulated in an mmCIF dictionary (Vagin et al., 2004[Vagin, A. A., Steiner, R. A., Lebedev, A. A., Potterton, L., McNicholas, S., Long, F. & Murshudov, G. N. (2004). Acta Cryst. D60, 2184-2195.]) now used by several programs, including REFMAC5, phenix.refine (Adams et al., 2010[Adams, P. D. et al. (2010). Acta Cryst. D66, 213-221.]) and Coot (Emsley & Cowtan, 2004[Emsley, P. & Cowtan, K. (2004). Acta Cryst. D60, 2126-2132.]). The current version of the dictionary contains around 9000 entries and several hundred covalent-link descriptions. Any new entries may be added using one of several programs, including Sketcher (Vagin et al., 2004[Vagin, A. A., Steiner, R. A., Lebedev, A. A., Potterton, L., McNicholas, S., Long, F. & Murshudov, G. N. (2004). Acta Cryst. D60, 2184-2195.]) from CCP4 (Collaborative Computational Project, Number 4, 1994[Collaborative Computational Project, Number 4 (1994). Acta Cryst. D50, 760-763.]), JLigand (unpublished work), PRODRG (Schüttelkopf & van Aalten, 2004[Schüttelkopf, A. W. & van Aalten, D. M. F. (2004). Acta Cryst. D60, 1355-1363.]) and phenix.elbow (Adams et al., 2010[Adams, P. D. et al. (2010). Acta Cryst. D66, 213-221.]).

Standard restraints on the covalent structure have the general form

[{\textstyle \sum \limits_{\rm bonds}} {{1} \over {\sigma_{b}^2}} (b_m-b_i)^2, \eqno (13)]

where bm represents a geometric parameter (e.g. bonds, angles, chiralities) calculated from the model and bi is the ideal value of this particular geometric parameter as tabulated in the dictionary.

Apart from ω (the angle of the peptide bond) and χ (the angles of amino-acid side chains), torsion angles in general are not restrained by default. However, the user can request to restrain a particular torsion angle defined in the dictionary or can define general torsion angles and use them as restraints. In general, it is not clear how to handle the restraint on torsion angles automatically, as these angles may depend on the covalent structure as well as the chemical environment of a particular ligand.

2.6. Noncrystallographic symmetry restraints

2.6.1. Automatic NCS definition

Automatic NCS identification in REFMAC5 is performed using the following procedure.

  • (i) Align the sequences of all chains with all chains using the dynamic alignment algorithm (Needleman & Wunsch, 1970[Needleman, S. B. & Wunsch, C. D. (1970). J. Mol. Biol. 48, 443-453.]).

  • (ii) Accept the alignment if the number of aligned residues is more than k (default 15) residues and the sequence identity for aligned residues is more than α% (default 80%).

  • (iii) Calculate the global root-mean-square deviation (r.m.s.d.) using all aligned residues.

  • (iv) Calculate the average local r.m.s.d. using the formula

    [{{1}\over {N-k+1}} {\textstyle \sum \limits_{i = 1}^{N-k+1}} {{1} \over {n_i}} {\textstyle \sum \limits_{j = i}^{k+i} \sum \limits_{l \in N_j} r_l^2}, \quad r_l = x_l-(R_i y_l +t_i), \eqno (14)]

    where N is the number of aligned residues, j indexes the aligned residues, Nj is the number of corresponding atoms in residue j, nj is the number of atoms in the ith group, rl is the vector of differences between corresponding atomic positions and Rj and tj are the rotation and translation that give the best superposition between atoms in group i. To calculate the r.m.s.d., it is not necessary to calculate the rotation and translation operators explicitly or to apply these transformations to atoms. Rather, it is achieved implicitly using Procrustes analysis, as described, for example, in Mardia & Bibby (1979[Mardia, K. V. & Bibby, J. E. (1979). Multivariate Analysis. London/San Diego: Academic Press.]). When k = N, the local and global r.m.s.d. coincide.

  • (v) If the r.m.s.d. is less than β Å (default 2.5 Å), then we consider the chains to be aligned.

  • (vi) Prepare the list of aligned atoms. If after applying the transformation matrix (calculated using aligned atoms) the neighbours (waters, ligands) of aligned atoms are superimposed, then they are also added to the list of aligned atoms.

  • (vii) If local NCS is requested, then prepare pairs of corresponding interatomic distances.

Steps (i)–(v) are performed once during each session of refinement. Step (vi) is performed during every cycle of refinement in order to allow conformational changes to occur.

2.6.2. Global NCS

For global NCS restraints, transformation operators (Rij and tij) that optimally superpose all NCS-related molecules are estimated and the following residual is added to the total target function,

[{\textstyle \sum \limits_{{\rm NCS \,\,related} \atop {\rm molecules}} \textstyle \sum \limits_{{\rm NCS \,\,related} \atop {\rm atoms}}} w\left |x_i - {{1} \over {n}} {\textstyle \sum} (R_{ij} x_j + t_{ij})\right|^2, \eqno (15)]

where the weight w is a user-controllable parameter. Note that the transformation matrices are estimated using xi and xj and thus they are dependent on these parameters. Therefore, in principle the gradient and second-derivative calculations should take this dependence into account, although this dependence is ignored in the current version of REFMAC5. Ignoring the contribution of these terms may reduce the rate of convergence, although in practice it does not seem to pose a problem.

2.6.3. Local NCS

The following function (similar to the implementation in BUSTER) is used for local NCS restraints,

[{\textstyle \sum \limits_{{\rm chain\,\,pairs}\,\,i,j}} \,\,{\textstyle \sum \limits_{{d_{i,kl}\lt d_{\max} \atop d_{j,kl}\lt d_{\max}}}} {\rm GM}\left(\sigma_{\rm GM},{{d_{i,kl}-d_{j,kl}} \over {\sigma}}\right), \eqno (16)]

where GM is the Geman–McClure robust estimator function (Geman & McClure, 1987[Geman, S. & McClure, D. (1987). Bull. Int. Stat. Inst. 52, 5-21.]), which can be written

[{\rm GM}(\sigma,r) = {{r^2} \over {1+\sigma^2 r^2}}. \eqno (17)]

Fig. 2[link] shows that for small values of r this function is similar to the usual least-squares function. However, it behaves differently for large r: least-square residuals do not allow conformational changes to occur, whereas this type of function is more tolerant to such changes.

[Figure 2]
Figure 2
Behaviour of the Geman–McClure function versus the quadratic (least-squares) function. For small values of r they look similar, whereas for large values of r GM is less restrictive than least squares, allowing conformational changes to occur. Black line, GM = r2/(1 + σ2r2) with σ = 0.5; red line, quadratic function r2. This figure was produced using the software package R (R Development Core Team, 2007[R Development Core Team (2007). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.org .]).
2.6.4. External structure restraints

The interatomic distances within the structure being analysed may be similar to a known (deposited) structure, particularly in localized regions. In cases where it makes sense, this information can be exploited in order to aid the refinement of the target structure. In doing so, the target structure is pulled towards the con­formation adopted by the known structure. The mechanism for generic external restraints described by Mooij et al. (2009[Mooij, W., Cohen, S., Joosten, K., Murshudov, G. & Perrakis, A. (2009). Structure, 17, 183-189.]) is used for external structure restraints.

In our implementation, structural information from external known structures is utilized by applying restraints to the distances between atom pairs based on a presumed atomic correspondence between the two structures. The following function is used for external structure restraints,

[{\textstyle \sum \limits_{a_i \in A} \sum \limits_{{a_j \in A} \atop {d_{ij}\lt d_{\max}}}} w_{\rm ext} {\rm GM}\left(\sigma_{\rm GM},{{d_{ij}-d_{ij}^*} \over {\sigma_{ij}}}\right), \eqno (18)]

where the atoms ai belong to the set A of atoms for which a correspondence is known, dij is the distance between the positions of atoms ai and aj, dij* is the corresponding distance in the known structure, σij is the estimated standard deviation of dij about dij* and dmax ensures that atom pairs are only restrained within localized regions, allowing insensitivity to global conformational changes. External structure restraints should be weighted differently to the other geometry com­ponents in order to allow the restraint strength to be separately specified. Consequently, a weight wext is applied, which should be appropriately chosen depending on the data quality and resolution, the structural similarity between the external known structure and the target, and the choice of dmax. The Geman–McClure function with sensitivity parameter σGM is used to increase robustness to outliers, as with the local NCS restraints.

Prior information from the external known structure(s) is generated using the software tool PROSMART. Specifically, this includes the atomic correspondence A, distances dij*, standard deviations σij and the distance cutoff dmax.

Potential sources of prior structural information include different conformations of the target chain (such as those that may result from using different crystallization conditions or in a different binding state) as well as those from homologous or structurally similar proteins. It is possible to use multiple known structures as prior information. The combination of this information results in modified values of dij* and σij as appropriate. This allows a structure to be refined utilizing information from a whole class of similar structures, rather than just a single source. Furthermore, it opens up the future possibility for multi-crystal co-refinement.

The employed formalism also allows the application of atomic distance restraints to secondary-structure elements (and, in principle, other motifs). Consequently, external restraints may be applied without requiring the prior identification of known structures similar to the target. This is intended to help to refine such motifs towards the expected/presumed local conformation.

This technique has been found to be particularly useful for low-resolution crystals and in cases where the target structure is unable to be refined to a satisfactory level. When used appropriately, external structure restraints should increase refinement reliability. Consequently, the difference between the R and Rfree values is expected to decrease in successful cases.

Fig. 3[link] shows the refinement statistics resulting from using external restraints to refine a low-resolution bluetongue virus VP4 enzyme (Sutton et al., 2007[Sutton, G., Grimes, J., Stuart, D. & Roy, P. (2007). Nature Struct. Mol. Biol. 14, 449-451.]). A sequence-identical structure solved at a higher resolution is used as prior information. Refinement statistics are compared after ten refinement cycles with and without using external restraints. Using the external restraints results in a 2.8% improvement in Rfree. Furthermore, the difference between the R and Rfree values is reduced from 11.5 to 4.3%, suggesting greatly increased refinement reliability.

[Figure 3]
Figure 3
Superposition of the structures of bluetongue virus VP4 enzyme with PDB entries 2jha (green) and 2jhp (blue) (Sutton et al., 2007[Sutton, G., Grimes, J., Stuart, D. & Roy, P. (2007). Nature Struct. Mol. Biol. 14, 449-451.]), which were solved at 3.4 and 2.5 Å, respectively. The graph shows the resultant R (solid) and Rfree (dashed) values from ten iterations of refinement of the low-resolution structure 2jha . Results are shown with (red) and without (black) external restraints, using 2jhp as prior information. This figure was produced using PROSMART to superpose the structures, PyMOL (DeLano, 2002[DeLano, W. L. (2002). PyMOL. https://www.pymol.org .]) to display the structures and the software package R (R Development Core Team, 2007[R Development Core Team (2007). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.org .]) to generate the graph.
2.6.5. `Jelly-body' restraints

The ratio of the number of observations to the number of adjustable parameters is very small at low resolution. Even after accounting for chemical restraints, this ratio stays very small and refinement in such cases is usually unstable. The danger of overfitting is very high; this is reflected in large differences between the R and Rfree values. External structure restraints and the use of experimental phase information (described above) provide ways of dealing with this problem. Unfortunately, it is not always possible to find similar structures refined at high resolution (or at least ones that result in a sufficiently successful improvement in refinement statistics) and experimental phase information is not always available or sufficient. Fortunately, statistical techniques exist to deal with this type of problem. Such techniques include ridge regression (Stuart et al., 2009[Stuart, A., Ord, K. & Arnold, S. (2009). Kendall's Advanced Theory of Statistics, Vol. 2A, Classical Inference. Hoboken: John Wiley & Sons.]), the lasso estimation procedure (Tibshirani, 1997[Tibshirani, R. J. (1997). Stat. Med. 16, 385-395.]) and Bayesian estimation with prior knowledge of parameters (O'Hagan, 1994[O'Hagan, A. (1994). Kendal's Advanced Theory of Statistics, Vol. 2B, Bayesian Inference. London: Hodder Arnold.]).

REFMAC5 has a regularization function in interatomic distance space that has the form

[\textstyle \sum \limits_{d_{ij,{\rm current}}\lt d_{\max}} w (d_{ij}-d_{ij,{\rm current}})^2 \eqno (19)]

for pairs of atoms i, j from the same chain, with maximum radius dmax, which can be controlled (default 4.25 Å). Note that this term does not contribute to the value of the function or its gradient; it only changes the second derivative, thus changing the search direction. It should be noted that a similar technique has been implemented in CNS (Schröder et al., 2010[Schröder, G. F., Brünger, A. T. & Levitt, M. (2010). Nature (London), 464, 1218-1222.]).

Note that if all interatomic distances were constrained, then individual atomic refinement would become rigid-body refinement. The effect of `jelly-body' restraints is the implicit parameterization between the rigid body and individual atoms. This technique has strong similarity to elastic network model calculations (Trion, 1996[Trion, M. M. (1996). Phys. Rev. Lett. 77, 1906-1908.]). This simple formula has been found to work surprisingly well.

2.6.6. Atomic displacement parameter restraints

Unlike positional parameters, where prior knowledge can be designed using basic knowledge of the chemistry of the building blocks of macromolecules and analysis of high-resolution structures, it is not obvious how to design restraints for atomic displace­ment parameters (ADPs). Ideally, restraints should reflect the geometry of the molecules as well as their overall mobility. Various programs use various restraints (Sheldrick, 2008[Sheldrick, G. M. (2008). Acta Cryst. A64, 112-122.]; Adams et al., 2010[Adams, P. D. et al. (2010). Acta Cryst. D66, 213-221.]; Konnert & Hendrickson, 1980[Konnert, J. H. & Hendrickson, W. A. (1980). Acta Cryst. A36, 344-350.]; Murshudov et al., 1997[Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240-255.]). In the new version of REFMAC5, restraints on ADPs are based on the distances between distributions. If we assume that atoms are represented as Gaussian distributions, then we are able to design restraints based on the distance between such distributions.

For a given two distributions in three-dimensional space P(x) and Q(x), the symmetrized Kullback–Liebler (KL) divergence (McKay, 2003[McKay, D. J. C. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press.]) is defined as follows:

[{\rm KL}(P,Q) = {\textstyle \int \limits_{R^3}} P(x) \log{{P(x)} \over {Q(x)}} \,\, {\rm d}x + {\textstyle \int \limits_{R^3}} Q(x) \log{{Q(x)} \over {P(x)}} \,\, {\rm d}x. \eqno (20)]

It can be verified that the symmetrized KL divergence satisfies the conditions of a metric distance in the space of distributions. The KL divergence can also be represented as follows:

[{\rm KL}(P,Q) = {\textstyle \int \limits_{R^3}} [P(x)-Q(x)]\{\log[P(x)]-\log[Q(x)]\}\,\, {\rm d}x. \eqno (21)]

This distance changes more smoothly than the L2 distance between functions and seems to be a useful criterion for the design of approximate probability distributions (McKay, 2003[McKay, D. J. C. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press.]; O'Hagan, 1994[O'Hagan, A. (1994). Kendal's Advanced Theory of Statistics, Vol. 2B, Bayesian Inference. London: Hodder Arnold.]).

When both distributions are Gaussian with mean zero, this distance has an elegant form. Assume that both atoms have Gaussian distribution:

[\eqalignno {P(x) & = {{1} \over {(2\pi)^{1/3} \det(U_1)^{1/2}}} \exp(-x^T U_1 x/2) \cr Q(x) & = {{1} \over {(2\pi)^{1/3} \det(U_2)^{1/2}}} \exp(-x^T U_2 x/2). & (22)}]

In this case, the KL divergence becomes

[{\rm KL}(P,Q) = {\rm trace}(U_{1}^{-1} U_{2} + U_{2}^{-1} U_{1} - 2). \eqno (23)]

In the case of isotropic ADPs, KL has an even simpler form:

[{\rm KL}_{\rm iso}(P,Q) = 3(B_1/B_2+B_2/B_1-2) = 3{{(B_1-B_2)^2} \over {B_1 B_2}}. \eqno (24)]

REFMAC5 uses restraints based on the KL divergence:

[{\textstyle \sum \limits_{{\rm atom \,\, pairs} \atop {r_{ij} \lt r_{\max}}}} w_{ij}(r_{ij}) KL(i,j). \eqno (25)]

The summation is over all atom pairs with distance less than rmax. The weights depend on the nature of the bonds as well as on the distance between the atoms. If atoms are bonded or angle-related then the weight is larger. However, the weight is smaller if the atoms are not related by covalent bonds. Moreover, if the distance between the atoms is more than 3 Å then the weight decreases as follows:

[w_{ij}(r_{ij}) = 9w_{0,ij}/d_{jj}^{2}, \eqno (26)]

where w0,ij is the weight for nonbonded atoms that are closer than 3 Å to each other.

2.6.7. Rigid-bond restraints

For anisotropic atoms there are so-called rigid-bond restraints, based on the idea of rigid-bond tests of anisotropic atoms (Hirshfeld, 1976[Hirshfeld, F. L. (1976). Acta Cryst. A32, 239-244.]). The idea is that projections of U values on the bond vector joining two atoms should be similar. In other words, if two atoms are bonded then an oscillation across the bond is more likely than an independent oscillation along the bond. Atoms oscillate along the bond in a concerted fashion.

Rigid-bond restraints are designed as follows. Let us assume that two atoms have positions x1 and x2 and their corresponding ADPs are U1 and U2; the unit vector joining these atoms is then calculated,

[l_{12} = {{x_1-x_2} \over {||x_1-x_2||}}. \eqno (27)]

The projections of corresponding U values on this vector are then calculated as

[\eqalignno {U_{1,{\rm proj}} & = \textstyle \sum \limits_{i,j} U_{1,ij}l_il_j \cr U_{2,{\rm proj}} & = \textstyle \sum \limits_{i,j} U_{2,ij} l_i l_j. & (28)}]

Now, using these projections, the KL divergence is formed for all pairs and added to the target function:

[{\textstyle \sum \limits_{|r_{lk}|\lt r_{\max}}} w_{lk} {{(U_{l,{\rm proj}}-U_{k,{\rm proj}})^2} \over {U_{l,{\rm proj}} U_{k,{\rm proj}}}}. \eqno (29)]

Again, the weights depend on the nature of the bonds between the atoms and the distances between them. Note that if the ADPs of both bonded atoms are isotropic then the rigid-bond restraint is equivalent to the above-described KL restraint.

2.6.8. Sphericity restraints

To avoid atoms exploding and becoming too elliptical or, even worse, non-elliptical, REFMAC5 uses restraints on sphericity. It is a simple restraint: an isotropic equivalent of the anisotropic tensor,

[\textstyle \sum \limits_{k} w_{k} \sum \limits_{i,i} [U_{k,ii}-{\rm trace}(U_{k})]^2 + \sum \limits_{i\neq j} U_{k,ij}^2, \eqno (30)]

where k indexes the anisotropic atoms, i, j are components of the anisotropic tensor and wk are weights for this particular type of restraint. The weights depend on the number of other restraints (KL, rigid bond) on this atom. Atoms that have fewer restraints have stronger weights on sphericity, since these atoms are more likely to be unstable.

It should be noted that similar restraints on ADPs are used in several other refinement programs (Sheldrick, 2008[Sheldrick, G. M. (2008). Acta Cryst. A64, 112-122.]; Adams et al., 2010[Adams, P. D. et al. (2010). Acta Cryst. D66, 213-221.]).

3. Parameterization

3.1. General parameters

REFMAC5 uses the standard parameterization of molecules in terms of atomic coordinates and isotropic/anisotropic atomic displacement parameters. The refinement of these parameters is performed using an FFT formulation for gradients and approximations for second derivatives. Details of these formulations have been published elsewhere (Murshudov et al., 1997[Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240-255.], 1999[Murshudov, G. N., Vagin, A. A., Lebedev, A., Wilson, K. S. & Dodson, E. J. (1999). Acta Cryst. D55, 247-255.]; Steiner et al., 2003[Steiner, R. A., Lebedev, A. A. & Murshudov, G. N. (2003). Acta Cryst. D59, 2114-2124.]). Once the gradients and approximate second derivatives have been calculated for these parameters, they are used to calculate the derivatives of derived parameters. Derived parameters include those for rigid-body and TLS refinement.

3.2. Rigid body

Rigid-body parameterization is achieved as follows. For each rigid group, transformation operators are defined and new positions are calculated from the starting positions using the formula

[x_{\rm new} = R_{j}(\alpha,\beta,\gamma) (x_{\rm old} - t_{\rm origin}) + t_j, \eqno (31)]

where Rj is the rotation matrix, torigin is the centre of mass of the rigid group and tj is the translational component of the transformation. The xold are the starting coordinates of the atoms and xnew are their positions after application of the transformation operators. There are six parameters per rigid group, defining the rotation matrix and the translational component. At each cycle of refinement, an eigenvalue-filtering technique is used to avoid potential singularities arising from the shape of the rigid groups. It should be noted that no terms between rigid groups are calculated for the approximate second-derivative matrix. For large rigid groups this does not pose much of a problem. However, for many small rigid groups it may slow down convergence substantially. In any case, it is not recommended to divide molecules into very small rigid groups. For these cases, `jelly-body' refinement should produce better results.

Once derivatives with respect to the positional parameters have been calculated, those for rigid-body parameters are calculated using the chain rule. The current version of REFMAC5 uses an Euler angle parameterization.

3.3. TLS

Atomic displacement parameters describe the spread of atomic positions and can be derived from the Fourier transform of a Gaussian probability distribution function for the atomic centre. The atomic displacement parameters are an important part of the model. Traditionally, a single parameter describing isotropic displacements has been used, namely the B factor. However, it is well known that atomic displacements are likely to be anisotropic owing to directional bonding and at high resolutions the six parameters per atom of a fully anisotropic model can be refined. TLS refinement is a way of modelling anisotropic displacements using only a few parameters, so that the method can be used at medium and low resolutions. The TLS model was originally proposed for small-molecule crystallography (Schomaker & Trueblood, 1968[Schomaker, V. & Trueblood, K. N. (1968). Acta Cryst. B24, 63-76.]) and was incorporated into REFMAC5 almost ten years ago (Winn et al., 2001[Winn, M. D., Isupov, M. N. & Murshudov, G. N. (2001). Acta Cryst. D57, 122-133.]).

The idea behind TLS is to suppose that groups of atoms move as rigid bodies and to constrain the anisotropic displace­ment parameters of these atoms accordingly. The rigid-body motion is described by translation (T), libration (L) and screw (S) tensors, using a total of 20 parameters for each rigid body. Given values for these 20 parameters, anisotropic displacement parameters can be derived for each atom in the group (and this relationship also allows one to calculate derivatives via the chain rule). Usually, an extra isotropic displacement parameter (the residual B factor) is refined for each atom in addition to the TLS contribution. The sum of these two con­tributions can be output using the supplementary program TLSANL (Howlin et al., 1993[Howlin, B., Butler, S. A., Moss, D. S., Harris, G. W. & Driessen, H. P. C. (1993). J. Appl. Cryst. 26, 622-624.]) or optionally directly from REFMAC5.

TLS groups need to be chosen before refinement and constitute part of the definition of the model for the macromolecule. Groups of atoms should conform to the idea that they move as a quasi-rigid body. Often the choice of one group per chain suffices (or at least serves as a reference calculation) and this is the default in REFMAC5. More detailed choices can be made using methods such as TLSMD (Painter & Merritt, 2006[Painter, J. & Merritt, E. A. (2006). Acta Cryst. D62, 439-450.]). By default, REFMAC5 also includes waters in the first hydration shell, which it seems reasonable to assume move in concert with the protein chain.

Fig. 4[link] shows the effect of TLS refinement and orientation of libration tensors. In this case, TLS refinement improves R/Rfree and the derived libration tensors make biological sense.

[Figure 4]
Figure 4
TLS refinement of glucosamine-6-phosphate synthase (Mouilleron & Golinelli-Pimpaneau, 2007[Mouilleron, S. & Golinelli-Pimpaneau, B. (2007). Protein Sci. 16, 485-493.]). The results for chain C are shown, which is separated into two TLS groups. Thermal ellipsoids derived from the TLS refinement are shown for the two groups. Those in red correspond to the ligand Fru6P which is included in the TLS group for the synthase domain. The yellow arrows show the principal axes of the libration tensor for each TLS group. Inclusion of TLS parameters led to a reduction in R and Rfree of 3.4% and 3.8%, respectively, and could be related to the biological function. The principal axis of the libration tensor was calculated using TLSANL (Howlin et al., 1993[Howlin, B., Butler, S. A., Moss, D. S., Harris, G. W. & Driessen, H. P. C. (1993). J. Appl. Cryst. 26, 622-624.]) and the figure was prepared using CCP4mg (Potterton et al., 2004[Potterton, L., McNicholas, S., Krissinel, E., Gruber, J., Cowtan, K., Emsley, P., Murshudov, G. N., Cohen, S., Perrakis, A. & Noble, M. (2004). Acta Cryst. D60, 2288-2294.]).

4. Optimization

REFMAC5 uses the Gauss–Newton method for optimization. For an elegant and comprehensive review on optimization techniques, see Nocedal & Wright (1999[Nocedal, J. & Wright, S. J. (1999). Numerical Optimization. New York: Springer.]). In this method, the exact second derivative is not calculated, but rather approximated to make sure it is always non-negative. Once derivatives or approximations have been calculated, the following linear equation is built,

[H \Delta p = -G, \eqno (32)]

where H is the approximate second derivative and G is the gradient vector. The contribution of most of the geometrical terms are calculated using algorithms designed for quadratic optimization or least-squares fitting (Press et al., 1992[Press, W. H., Flannery, B. P., Teukolsky, S. A. & Vetterling, W. T. (1992). Numerical Recipes in FORTRAN. Cambridge University Press.]). To calculate the contribution from the Geman–McClure terms, the following approximation is used (Huber & Ronchetti, 2009[Huber, P. J. & Ronchetti, E. M. (2009). Robust Statistics. Hoboken: John Wiley & Sons.]),

[\eqalignno {{\rm GM}(\sigma,r) & = {{r^2} \over {1+\sigma^2 r^2}} \cr {{\rm d GM} \over {{\rm d} r}} & = {{2 r} \over {(1+\sigma^2 r^2)^2}} \cr {{{\rm d}^2 {\rm GM}} \over {{\rm d} r^2}} & \simeq {{2} \over {(1+\sigma^2 r^2)^2}}. & (33)}]

This approximation ensures that H stays non-negative and consequently directions calculated as a result of the solution of (32)[link] point towards a reduction of the total function.

The contribution of the X-ray term to the gradient is calculated using FFT algorithms (Murshudov et al., 1997[Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240-255.]). The Fisher information matrix, as described by Steiner et al. (2003[Steiner, R. A., Lebedev, A. A. & Murshudov, G. N. (2003). Acta Cryst. D59, 2114-2124.]), is used to calculate the contribution of the likelihood functions to the matrix H. Tests have demonstrated that using the diagonal elements of the Fisher information matrix and both diagonal and nondiagonal elements of the geometry terms results in a more stable refinement.

Once all of the terms contributing to H and G have been calculated, the linear equation (32)[link] is solved using preconditioned conjugate-gradient methods (Nocedal & Wright, 1999[Nocedal, J. & Wright, S. J. (1999). Numerical Optimization. New York: Springer.]; Tronrud, 1992[Tronrud, D. E. (1992). Acta Cryst. A48, 912-916.]). A diagonal matrix formed by the diagonal elements of H is used as a preconditioner. This brings parameters with different overall scales (positional and B values) onto the same scale and controlling convergence becomes easier.

If the conjugate-gradient procedure does not converge in Nmaxiter cycles (the default is 1000), then the diagonal terms of the H matrix are increased. Thus, if the matrix is not positive then ridge regression is activated. In the presence of a potential (near-) singularity, REFMAC5 uses the following procedure to solve the linear equation.

  • (i) Define and use preconditioner. At this stage, H and G are modified. Define the new matrix by H1 and vector by G1.

  • (ii) Set γ = 0.

  • (iii) Define a new matrix: H2 = H1 + γI, where I is the identity matrix.

  • (iv) Solve the equation H2p = −G1 using the conjugate-gradient method for linear equations for sparse and positive-definite matrices (Press et al., 1992[Press, W. H., Flannery, B. P., Teukolsky, S. A. & Vetterling, W. T. (1992). Numerical Recipes in FORTRAN. Cambridge University Press.]). If convergence was achieved in less than Nmaxiter iterations, then proceed to the next step. Otherwise, increase γ and go to step (iii).

  • (v) Decondition the matrix, gradient and shift vectors.

  • (vi) Apply shifts to the atomic parameters, making sure that the ADPs are positive.

  • (vii) Calculate the value of the total function.

  • (viii) If the value of the total function is less than the previous value, then proceed to the next step. Otherwise, reduce the shifts and repeat steps (vi)–(viii).

  • (ix) Finish the refinement cycle.

After application of the shifts, the next cycle of refinement starts.

5. Conclusions

Refinement is an important step in macromolecular crystal structure elucidation. It is used as a final step in structure solution, as well as as an intermediate step to improve models and obtain improved electron density to facilitate further model rebuilding.

REFMAC5 is one of the refinement programs that incorporates various tools to deal with some crystal peculiarities, low-resolution MX structure refinement and high-resolution refinement. There are also tabulated dictionaries of the constituent blocks of macromolecules, cofactors and ligands. The number of dictionary elements now exceeds 9000. There are also tools to deal with new ligands and covalent modifications of ligands and/or proteins.

Low-resolution MX structure analysis is still a challenging task. There are several outstanding problems that need to be dealt with before we can claim that low-resolution MX analysis is complete. Statistics, image processing and computer science provide general methods for these and related problems. Unfortunately, these techniques cannot be directly applied to MX structure analysis, either because of the huge computer resources needed or because the assumptions used are not applicable to MX.

In our opinion, the problems of state-of-the-art MX analysis that need urgent attention include the following.

  • (i) Reparameterization depending on the quality and the amount of experimental data. Some tools implemented in REFMAC5 allow partial dealing with this problem. These tools include (a) restraining against known homologous structures, (b) `jelly-body' restraints or refinement along implicit normal modes, (c) long-range ADP restraints based on KL divergence, (d) automatic local and global NCS restraints and (e) experimental phase-information restraints. However, low-resolution refinement and model (re)building is still not as automatic as for high-resolution structures.

  • (ii) Statistical methods for peculiar crystals with low signal-to-noise ratio. Some of the implemented tools, such as likelihood-based twin refinement and SAD/SIRAS refinement, help in the analysis of some of the data produced by such crystals. The analysis of data from such peculiar crystals as OD disorder with or without twinning, multiple cells, translocational disorder or modulated crystals in general remains problematic.

  • (iii) Another important problem is that of limited and noisy data. As a result of resolution cutoff (owing to the survival time of the crystal under X-ray irradiation or otherwise), the resultant electron density usually exhibits noise owing to series termination. If the resolution that the crystal actually diffracts to is the same as the resolution of the data, then series termination is not very serious as the signal dies out towards the limit of the resolution. However, in these cases the electron density becomes blurred, reflecting high mobility of the molecules or crystal disorder. When map sharpening is used, the signal is amplified and series termination becomes a serious problem. To reduce noise, it is necessary to work with the full Fourier transformation. In other words, resolution extension and the prediction of missing reflections becomes an important problem. The dramatic effect of such an approach for density modification at high resolution has been demonstrated by Altomare et al. (2008[Altomare, A., Cuocci, C., Giacovazzo, C., Kamel, G. S., Moliterni, A. & Rizzi, R. (2008). Acta Cryst. A64, 326-336.]) and Sheldrick (2008[Sheldrick, G. M. (2008). Acta Cryst. A64, 112-122.]). The direct replacement of missing reflections by calculated ones necessarily introduces bias towards model errors and may mask real signal. To avoid this, it is necessary to integrate over the errors in the model parameters (coordinates, B values, scale values and twin fractions). However, since the number of parameters is very large (sometimes exceeding 1 000 000), integration using available numerical techniques is not feasible.

  • (iv) Error estimation. Despite the advances in MX, there have been few attempts to evaluate errors in the estimated parameters. Works attempting to deal with this problem are few and far between (Sheldrick, 2008[Sheldrick, G. M. (2008). Acta Cryst. A64, 112-122.]). To complete MX structure analysis, it is necessary to develop and implement techniques for error estimation. If this is achieved, then incorrect structures could be eliminated while analysing the MX data and building the model. One of the promising approaches to this problem is the Gauss–Markov random field sampling technique (Hue & Held, 2005[Hue, H. & Held, L. (2005). Gaussian Markov Random Field Models. Boca Raton: Chapman & Hall/CRC.]) using the (approximate) second derivative as a field-defining matrix.

  • (v) Multicrystal refinement with the simultaneous multicrystal averaging of isomorphous or non-isomorphous crystals is one of the important directions for low-resolution refinement. If it is dealt with properly then the number of structures analysed at low resolution should increase substantially.

Further improvement may consist of a combination of various experimental techniques. For example, the simultaneous treatment of electron-microscopy (EM) and MX data could increase the reliability of EM models and put MX models in the context of larger biological systems.

The direct use of unmerged data is another direction in which refinement procedures could be developed. If this were achieved, then several long-standing problems could be easier to deal with. Two such problems are the following. (i) In general, the space group of a crystal should be considered as an adjustable parameter. If unmerged data are used, then space-group assumptions could be tested after every few sessions of refinement and model building. (ii) Dealing with the processes in the crystal during data collection requires unmerged data. One of the best-known such problems is radiation damage.

Acknowledgements

We thank the CCP4 staff and CCP4 Bulletin Board participants for continuous stimulating discussions. We also thank the wider user community. Without their continuous feedback and bug reports, development of programs such as REFMAC5 would be impossible. This work was supported by Wellcome Trust Grant No. 064405/Z/01/A to GNM and FL. AAV was supported by a CCP4 grant, PS and NSP are supported by the Netherlandse Organisatie voor Wetenschappelijk Onderzoek (NWO), and AAL and RAN were supported by a BBSRC grant. CCP4 is supported by BBSRC grant BB/F0202281.

References

First citationAdams, P. D. et al. (2010). Acta Cryst. D66, 213–221.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationAgarwal, R. C. (1978). Acta Cryst. A34, 791–809.  CrossRef CAS IUCr Journals Web of Science Google Scholar
First citationAltomare, A., Cuocci, C., Giacovazzo, C., Kamel, G. S., Moliterni, A. & Rizzi, R. (2008). Acta Cryst. A64, 326–336.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationBlanc, E., Roversi, P., Vonrhein, C., Flensburg, C., Lea, S. M. & Bricogne, G. (2004). Acta Cryst. D60, 2210–2221.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationBooth, A. (1946). Proc. R. Soc. Lond. A Math. Phys. Sci. 188, 77–92.  CrossRef CAS Google Scholar
First citationBricogne, G. (1997). Methods Enzymol. 276, 361–423.  CrossRef CAS Web of Science Google Scholar
First citationBricogne, G. & Irwin, J. (1996). Proceedings of the CCP4 Study Weekend. Macromolecular Refinement, edited by E. J. Dodson, M. Moore, A. Ralph & S. Bailey, pp. 85–92. Warrington: Daresbury Laboratory.  Google Scholar
First citationBrünger, A. (1991). Annu. Rev. Phys. Chem. 42, 197–223.  Google Scholar
First citationBrünger, A. T. (1989). Acta Cryst. A45, 42–50.  CrossRef Web of Science IUCr Journals Google Scholar
First citationBrünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905–921.  Web of Science CrossRef IUCr Journals Google Scholar
First citationCochran, W. (1948). Acta Cryst. 1, 138–142.  CrossRef CAS IUCr Journals Google Scholar
First citationCollaborative Computational Project, Number 4 (1994). Acta Cryst. D50, 760–763.  CrossRef IUCr Journals Google Scholar
First citationCowtan, K. (2006). Acta Cryst. D62, 1002–1011.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationCruickshank, D. W. J. (1952). Acta Cryst. 5, 511–518.  CrossRef IUCr Journals Web of Science Google Scholar
First citationCruickshank, D. W. J. (1956). Acta Cryst. 9, 747–753.  CrossRef CAS IUCr Journals Web of Science Google Scholar
First citationDeLano, W. L. (2002). PyMOL. https://www.pymol.orgGoogle Scholar
First citationDiamond, R. (1971). Acta Cryst. A27, 436–452.  CrossRef CAS IUCr Journals Web of Science Google Scholar
First citationEmsley, P. & Cowtan, K. (2004). Acta Cryst. D60, 2126–2132.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationGeman, S. & McClure, D. (1987). Bull. Int. Stat. Inst. 52, 5–21.  Google Scholar
First citationGuillot, B., Viry, L., Guillot, R., Lecomte, C. & Jelsch, C. (2001). J. Appl. Cryst. 34, 214–223.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationHendrickson, W. A. & Lattman, E. E. (1970). Acta Cryst. B26, 136–143.  CrossRef CAS IUCr Journals Google Scholar
First citationHirshfeld, F. L. (1976). Acta Cryst. A32, 239–244.  CrossRef IUCr Journals Web of Science Google Scholar
First citationHowlin, B., Butler, S. A., Moss, D. S., Harris, G. W. & Driessen, H. P. C. (1993). J. Appl. Cryst. 26, 622–624.  CrossRef Web of Science IUCr Journals Google Scholar
First citationHuber, P. J. & Ronchetti, E. M. (2009). Robust Statistics. Hoboken: John Wiley & Sons.  Google Scholar
First citationHue, H. & Held, L. (2005). Gaussian Markov Random Field Models. Boca Raton: Chapman & Hall/CRC.  Google Scholar
First citationJack, A. & Levitt, M. (1978). Acta Cryst. A34, 931–935.  CrossRef CAS IUCr Journals Web of Science Google Scholar
First citationJiang, J.-A. & Brünger, A. (1994). J. Mol. Biol. 243, 100–115.  CrossRef CAS PubMed Web of Science Google Scholar
First citationJones, T. A., Zou, J.-Y., Cowan, S. W. & Kjeldgaard, M. (1991). Acta Cryst. A47, 110–119.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationKonnert, J. H. (1976). Acta Cryst. A32, 614–617.  CrossRef CAS IUCr Journals Web of Science Google Scholar
First citationKonnert, J. H. & Hendrickson, W. A. (1980). Acta Cryst. A36, 344–350.  CrossRef CAS IUCr Journals Google Scholar
First citationLebedev, A. A., Vagin, A. A. & Murshudov, G. N. (2006). Acta Cryst. D62, 83–95.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationLuzzati, V. (1951). Acta Cryst. 4, 367–369.  CrossRef IUCr Journals Web of Science Google Scholar
First citationLuzzati, V. (1952). Acta Cryst. 5, 802–810.  CrossRef IUCr Journals Web of Science Google Scholar
First citationMardia, K. V. & Bibby, J. E. (1979). Multivariate Analysis. London/San Diego: Academic Press.  Google Scholar
First citationMardia, K. V. & Jupp, P. E. (1999). Directional Statistics. Chichester: John Wiley & Sons.  Google Scholar
First citationMcKay, D. J. C. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press.  Google Scholar
First citationMcRee, D. E. & Israel, M. (2008). J. Struct. Biol. 163, 208–213.  Web of Science CrossRef PubMed CAS Google Scholar
First citationMooij, W., Cohen, S., Joosten, K., Murshudov, G. & Perrakis, A. (2009). Structure, 17, 183–189.  Web of Science CrossRef PubMed CAS Google Scholar
First citationMouilleron, S. & Golinelli-Pimpaneau, B. (2007). Protein Sci. 16, 485–493.  Web of Science CrossRef PubMed CAS Google Scholar
First citationMurshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240–255.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationMurshudov, G. N., Vagin, A. A., Lebedev, A., Wilson, K. S. & Dodson, E. J. (1999). Acta Cryst. D55, 247–255.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationNeedleman, S. B. & Wunsch, C. D. (1970). J. Mol. Biol. 48, 443–453.  CrossRef CAS PubMed Web of Science Google Scholar
First citationNess, S. R., de Graaff, R. A. G., Abrahams, J. P. & Pannu, N. S. (2004). Structure, 12, 1753–1761.  Web of Science CrossRef PubMed CAS Google Scholar
First citationNocedal, J. & Wright, S. J. (1999). Numerical Optimization. New York: Springer.  Google Scholar
First citationO'Hagan, A. (1994). Kendal's Advanced Theory of Statistics, Vol. 2B, Bayesian Inference. London: Hodder Arnold.  Google Scholar
First citationPainter, J. & Merritt, E. A. (2006). Acta Cryst. D62, 439–450.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationPannu, N. S., Murshudov, G. N., Dodson, E. J. & Read, R. J. (1998). Acta Cryst. D54, 1285–1294.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationPannu, N. S. & Read, R. J. (1996). Acta Cryst. A52, 659–668.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationPerrakis, A., Morris, S. & Lamzin, V. S. (1999). Nature Struct. Biol. 6, 458–463.  Web of Science CrossRef PubMed CAS Google Scholar
First citationPotterton, L., McNicholas, S., Krissinel, E., Gruber, J., Cowtan, K., Emsley, P., Murshudov, G. N., Cohen, S., Perrakis, A. & Noble, M. (2004). Acta Cryst. D60, 2288–2294.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationPress, W. H., Flannery, B. P., Teukolsky, S. A. & Vetterling, W. T. (1992). Numerical Recipes in FORTRAN. Cambridge University Press.  Google Scholar
First citationR Development Core Team (2007). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.orgGoogle Scholar
First citationRamachandran, G. N., Srinivasan, R. & Sarma, V. R. (1963). Acta Cryst. 16, 662–666.  CrossRef CAS IUCr Journals Web of Science Google Scholar
First citationSchomaker, V. & Trueblood, K. N. (1968). Acta Cryst. B24, 63–76.  CrossRef CAS IUCr Journals Web of Science Google Scholar
First citationSchröder, G. F., Brünger, A. T. & Levitt, M. (2010). Nature (London), 464, 1218–1222.  Web of Science PubMed Google Scholar
First citationSchüttelkopf, A. W. & van Aalten, D. M. F. (2004). Acta Cryst. D60, 1355–1363.  Web of Science CrossRef IUCr Journals Google Scholar
First citationSheldrick, G. M. (2008). Acta Cryst. A64, 112–122.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationSkubák, P., Murshudov, G. N. & Pannu, N. S. (2004). Acta Cryst. D60, 2196–2201.  Web of Science CrossRef IUCr Journals Google Scholar
First citationSkubák, P., Murshudov, G. & Pannu, N. S. (2009). Acta Cryst. D65, 1051–1061.  Web of Science CrossRef IUCr Journals Google Scholar
First citationSkubák, P., Waterreus, W.-J. & Pannu, N. S. (2010). Acta Cryst. D66, 783–788.  Web of Science CrossRef IUCr Journals Google Scholar
First citationSrinivasan, R. & Parthasarathy, S. (1976). Some Statistical Applications in X-ray Crystallography. Oxford: Pergamon Press.  Google Scholar
First citationSrinivasan, R. & Ramachandran, G. N. (1965). Acta Cryst. 19, 1008–1014.  CrossRef CAS IUCr Journals Web of Science Google Scholar
First citationSteiner, R. A., Lebedev, A. A. & Murshudov, G. N. (2003). Acta Cryst. D59, 2114–2124.  CrossRef CAS IUCr Journals Google Scholar
First citationStuart, A. & Ord, K. (2009). Kendall's Advanced Theory of Statistics, Vol. 1, Distribution Theory. Hoboken: John Wiley & Sons.  Google Scholar
First citationStuart, A., Ord, K. & Arnold, S. (2009). Kendall's Advanced Theory of Statistics, Vol. 2A, Classical Inference. Hoboken: John Wiley & Sons.  Google Scholar
First citationSutton, G., Grimes, J., Stuart, D. & Roy, P. (2007). Nature Struct. Mol. Biol. 14, 449–451.  CrossRef CAS Google Scholar
First citationTen Eyck, L. F. (1977). Acta Cryst. A33, 486–492.  CrossRef CAS IUCr Journals Web of Science Google Scholar
First citationTerwilliger, T. C. (2003). Acta Cryst. D59, 1174–1182.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationTibshirani, R. J. (1997). Stat. Med. 16, 385–395.  CrossRef CAS PubMed Web of Science Google Scholar
First citationTrion, M. M. (1996). Phys. Rev. Lett. 77, 1906–1908.  Google Scholar
First citationTronrud, D. E. (1992). Acta Cryst. A48, 912–916.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationTronrud, D. E., Ten Eyck, L. F. & Matthews, B. W. (1987). Acta Cryst. A43, 489–501.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationTronrud, G. (1997). Methods Enzymol. 277, 306–319.  CrossRef CAS PubMed Web of Science Google Scholar
First citationTurk, D. (2008). Acta Cryst. A64, C23.  CrossRef IUCr Journals Google Scholar
First citationVagin, A. A., Steiner, R. A., Lebedev, A. A., Potterton, L., McNicholas, S., Long, F. & Murshudov, G. N. (2004). Acta Cryst. D60, 2184–2195.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationWinn, M. D., Isupov, M. N. & Murshudov, G. N. (2001). Acta Cryst. D57, 122–133.  Web of Science CrossRef CAS IUCr Journals Google Scholar

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.

Journal logoBIOLOGICAL
CRYSTALLOGRAPHY
ISSN: 1399-0047
Volume 67| Part 4| April 2011| Pages 355-367
Follow Acta Cryst. D
Sign up for e-alerts
Follow Acta Cryst. on Twitter
Follow us on facebook
Sign up for RSS feeds