research papers\(\def\hfill{\hskip 5em}\def\hfil{\hskip 3em}\def\eqno#1{\hfil {#1}}\)

STRUCTURAL BIOLOGY
ISSN: 2059-7983

Cryo-EM single-particle structure refinement and map calculation using Servalcat


(a) MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom, and (b) Scientific Computing Department, UKRI Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Campus, Didcot OX11 0FA, United Kingdom
*Correspondence e-mail: kyamashita@mrc-lmb.cam.ac.uk, garib@mrc-lmb.cam.ac.uk

Edited by A. Perrakis, Netherlands Cancer Institute, The Netherlands (Received 4 May 2021; accepted 11 September 2021; online 29 September 2021)

In 2020, cryo-EM single-particle analysis achieved true atomic resolution thanks to technological developments in hardware and software. The number of high-resolution reconstructions continues to grow, increasing the importance of the accurate determination of atomic coordinates. Here, a new Python package and program called Servalcat is presented that is designed to facilitate atomic model refinement. Servalcat implements a refinement pipeline using the program REFMAC5 from the CCP4 package. After the refinement, Servalcat calculates a weighted Fo − Fc difference map, which is derived from Bayesian statistics. This map helps manual and automatic model building in real space, as is common practice in crystallography. The Fo − Fc map helps in the visualization of weak features including hydrogen densities. Although hydrogen densities are weak, they are stronger than in the electron-density maps produced by X-ray crystallography, and some H atoms are even visible at ∼1.8 Å resolution. Servalcat also facilitates atomic model refinement under symmetry constraints. If point-group symmetry has been applied to the map during reconstruction, the asymmetric unit model is refined with the appropriate symmetry constraints.

1. Notation

FT: Fourier transform of unknown true map (complex values).

Fn: Fourier transform of noise in the observed map (complex values).

Fo1, Fo2: Fourier transforms of the two unweighted and unsharpened half maps from independent reconstructions (complex values).

Fo: Fourier transform of the observed full map, (Fo1 + Fo2)/2.

Fc: Fourier transform of calculated map from atomic coordinates (complex values).

E: structure factors normalized in resolution bins, [F/\langle|F|^{2}\rangle^{1/2}].

k: resolution-dependent scale factor between Fo and FT.

D: resolution-dependent scale factor between Fo and Fc.

[\sigma_{\rm T}^{2}]: variance of signal, var(FT).

[\sigma_{\rm n}^{2}]: variance of noise, var(Fn).

[\sigma_{\rm U,T}^{2}]: variance of unexplained signal, var(DFc − kFT).

f: atomic scattering factor.

s: column vector of position in reciprocal space.

[s^{\rm T}]: row vector of position in reciprocal space (transpose of s).

x: column vector of position in real space.

(R, t): rotation matrix and translation vector that could be an element of a point group.

B: displacement parameter of an atom, or blurring parameter for a local or global region of a map. A real value (isotropic case) or a 3 × 3 symmetric matrix (anisotropic case). Unless otherwise stated, B is isotropic and refers to an atom. Also called an atomic displacement parameter (ADP) when associated with an atom.

Unless otherwise stated, all quantities in Fourier space are dependent on s.

2. Introduction

Atomic model refinement is the optimization of the model's parameters against the observed data. Atomic parameters typically include coordinates, atomic displacement parameters (ADPs) and occupancies. In crystallography, refinement is crucial because of the phase problem: the accuracy of density maps relies on the accuracy of the phases of the structure factors. Accurate phases are not observed and must be calculated from the model (Tronrud, 2004[Tronrud, D. E. (2004). Acta Cryst. D60, 2156-2168.]). More accurate maps may be obtained as the model becomes more accurate through the refinement. In single-particle analysis (SPA) there is no phase problem, although the Fourier coefficients can be noisy, especially at high resolution.

Accurate atomic model determination is becoming more and more important due to the `resolution revolution' in cryo-EM SPA following the introduction of direct electron detectors and new data-processing methods (Bai et al., 2015[Bai, X.-C., McMullan, G. & Scheres, S. H. W. (2015). Trends Biochem. Sci. 40, 49-57.]). As of April 2021, more than 2500 SPA entries with resolutions better than 3.5 Å have been deposited in the Electron Microscopy Data Bank (EMDB; Tagari et al., 2002[Tagari, M., Newman, R., Chagoyen, M., Carazo, J.-M. & Henrick, K. (2002). Trends Biochem. Sci. 27, 589.]). This improvement in resolution has accelerated the development of methods for model building, refinement and validation. Automatic model-building programs that were originally developed for crystallography are now being adapted for cryo-EM SPA maps (Terwilliger, Adams et al., 2018[Terwilliger, T. C., Adams, P. D., Afonine, P. V. & Sobolev, O. V. (2018a). Nat. Methods, 15, 905-908.]; Hoh et al., 2020[Hoh, S. W., Burnley, T. & Cowtan, K. (2020). Acta Cryst. D76, 531-541.]; Chojnowski et al., 2021[Chojnowski, G., Sobolev, E., Heuser, P. & Lamzin, V. S. (2021). Acta Cryst. D77, 142-150.]). Density modification and local map sharpening can help to interpret the map (Jakobi et al., 2017[Jakobi, A. J., Wilmanns, M. & Sachse, C. (2017). eLife, 6, e27131.]; Terwilliger, Sobolev et al., 2018[Terwilliger, T. C., Sobolev, O. V., Afonine, P. V. & Adams, P. D. (2018b). Acta Cryst. D74, 545-559.]; Ramírez-Aportela et al., 2019[Ramírez-Aportela, E., Vilas, J. L., Glukhova, A., Melero, R., Conesa, P., Martínez, M., Maluenda, D., Mota, J., Jiménez, A., Vargas, J., Marabini, R., Sexton, P. M., Carazo, J. M. & Sorzano, C. O. S. (2019). Bioinformatics, 36, 765-772.]; Ramlaul et al., 2019[Ramlaul, K., Palmer, C. M. & Aylett, C. H. (2019). J. Struct. Biol. 205, 30-40.]; Terwilliger et al., 2020[Terwilliger, T. C., Sobolev, O. V., Afonine, P. V., Adams, P. D. & Read, R. J. (2020). Acta Cryst. D76, 912-925.]). 
In general, care must be exercised when using any techniques based on prior knowledge; bias towards incorrect assumptions might lead to misinterpretation of the maps. Full-atom refinement can be performed either in real space (Afonine et al., 2018[Afonine, P. V., Poon, B. K., Read, R. J., Sobolev, O. V., Terwilliger, T. C., Urzhumtsev, A. & Adams, P. D. (2018). Acta Cryst. D74, 531-544.]) or in reciprocal space (Murshudov, 2016[Murshudov, G. N. (2016). Methods Enzymol. 579, 277-305.]).

After refinement, the model should be validated; the model should have a reasonable geometry and should describe the map well. Due to the low data-to-parameter ratio, all models will exhibit a degree of overfitting; however, the model should not deviate substantially from cross-validation data (Brown et al., 2015[Brown, A., Long, F., Nicholls, R. A., Toots, J., Emsley, P. & Murshudov, G. (2015). Acta Cryst. D71, 136-153.]). MolProbity is the most widely used geometry validation tool, and includes analyses of clashes, rotamers and the Ramachandran plot (Chen et al., 2010[Chen, V. B., Arendall, W. B., Headd, J. J., Keedy, D. A., Immormino, R. M., Kapral, G. J., Murray, L. W., Richardson, J. S. & Richardson, D. C. (2010). Acta Cryst. D66, 12-21.]). Map–model quality is assessed using real-space local correlations (Cragnolini et al., 2021[Cragnolini, T., Sahota, H., Joseph, A. P., Sweeney, A., Malhotra, S., Vasishtan, D. & Topf, M. (2021). Acta Cryst. D77, 41-47.]), which have commonly been used in crystallography (Tickle, 2012[Tickle, I. J. (2012). Acta Cryst. D68, 454-467.]). In reciprocal-space refinement, the R factor can be calculated as in crystallography, but the map–model Fourier shell correlation (FSC) is preferred as it does not depend on resolution-dependent scaling and takes phases into account explicitly. An Fo − Fc map, which highlights unmodelled features and errors in the current model, is almost always used in crystallography, and some similar tools already exist for SPA (Joseph et al., 2020[Joseph, A. P., Lagerstedt, I., Jakobi, A., Burnley, T., Patwardhan, A., Topf, M. & Winn, M. (2020). J. Chem. Inf. Model. 60, 2552-2560.]). The σA-weighted (m|Fo| − D|Fc|)exp(iφc) map as used in crystallography is not directly applicable to SPA, because phases are available for both Fo and Fc and we should model the error of Fo in the complex plane, rather than simply using the estimated phase error as in crystallography (see below).

In 2020, cryo-EM SPA achieved atomic resolution, according to Sheldrick's criterion (Wlodawer & Dauter, 2017[Wlodawer, A. & Dauter, Z. (2017). Acta Cryst. D73, 379-380.]), in structural analyses of apoferritin, which were reported by two groups (Nakane et al., 2020[Nakane, T., Kotecha, A., Sente, A., McMullan, G., Masiulis, S., Brown, P. M. G. E., Grigoras, I. T., Malinauskaite, L., Malinauskas, T., Miehling, J., Uchański, T., Yu, L., Karia, D., Pechnikova, E. V., de Jong, E., Keizer, J., Bischoff, M., McCormack, J., Tiemeijer, P., Hardwick, S. W., Chirgadze, D. Y., Murshudov, G., Aricescu, A. R. & Scheres, S. H. W. (2020). Nature, 587, 152-156.]; Yip et al., 2020[Yip, K. M., Fischer, N., Paknia, E., Chari, A. & Stark, H. (2020). Nature, 587, 157-161.]). Nakane et al. (2020[Nakane, T., Kotecha, A., Sente, A., McMullan, G., Masiulis, S., Brown, P. M. G. E., Grigoras, I. T., Malinauskaite, L., Malinauskas, T., Miehling, J., Uchański, T., Yu, L., Karia, D., Pechnikova, E. V., de Jong, E., Keizer, J., Bischoff, M., McCormack, J., Tiemeijer, P., Hardwick, S. W., Chirgadze, D. Y., Murshudov, G., Aricescu, A. R. & Scheres, S. H. W. (2020). Nature, 587, 152-156.]) observed H-atom densities at 1.2 and 1.7 Å resolutions using Fo − Fc maps calculated by REFMAC5. There is a higher chance of observing hydrogen density in electron microscopy than in X-ray crystallography because of the increased contrast for the lighter elements (Clabbers & Abrahams, 2018[Clabbers, M. T. B. & Abrahams, J. P. (2018). Crystallogr. Rev. 24, 176-204.]). Nevertheless, hydrogen density is relatively weak and there is always a much higher peak from the parent atom nearby, so the Fo − Fc difference map is essential to see it. In addition, the interpretation of hydrogen peaks in EM is complicated. In an H atom, the electron is usually shifted from the nucleus position towards the parent atom.
In EM, both the electrons and the nucleus contribute to scattering, and this offset results in a shift of hydrogen density peaks beyond the position of the hydrogen nucleus (Nakane et al., 2020[Nakane, T., Kotecha, A., Sente, A., McMullan, G., Masiulis, S., Brown, P. M. G. E., Grigoras, I. T., Malinauskaite, L., Malinauskas, T., Miehling, J., Uchański, T., Yu, L., Karia, D., Pechnikova, E. V., de Jong, E., Keizer, J., Bischoff, M., McCormack, J., Tiemeijer, P., Hardwick, S. W., Chirgadze, D. Y., Murshudov, G., Aricescu, A. R. & Scheres, S. H. W. (2020). Nature, 587, 152-156.]).

SPA structures often have point-group symmetries (rather than space-group symmetry as in crystallography). Approximately half of the SPA entries in the EMDB have non-C1 point-group symmetry according to their associated metadata. Such symmetry is advantageous and helps to reach higher resolution because it increases the effective number of particles. If the map is symmetrized, downstream analyses should take this into account and the structural model must follow the symmetry. As in crystallography, it is natural to work in a single asymmetric unit. The MTRIX records in the PDB format or _struct_ncs_oper in the mmCIF format can be used to encode the symmetry information.¹ Currently, for structures from SPA there are only a few depositions of such asymmetric unit models in the PDB (excepting viruses). We recommend refining and depositing an asymmetric unit model, which ensures that the symmetry copies are truly identical. It should be noted that validation tools must be aware of any applied symmetry operators, but results should be reported for the asymmetric unit only. These considerations are only valid if the map is symmetrized, and we suggest that the point-group information should be required by the deposition system.

Here, we present Servalcat, a Python package and standalone program for the refinement and map calculation of cryo-EM SPA structures. Servalcat takes unsharpened and unweighted half maps from the two independent reconstructions as inputs and implements a refinement pipeline using REFMAC5, which uses a dedicated likelihood function for SPA (Murshudov, 2016[Murshudov, G. N. (2016). Methods Enzymol. 579, 277-305.]). After the refinement, Servalcat calculates a sharpened and weighted Fo − Fc map derived from Bayesian statistics as described below. If the map has point-group symmetry, the user can give an asymmetric unit model and a point-group symbol, and the program will output a refined asymmetric unit model with symmetry annotation as well as a symmetry-expanded model. The noncrystallographic symmetry (NCS) constraint function in REFMAC5 has been updated to consider symmetry-related nonbonded interactions and ADP similarity restraints (to ensure the similarity of ADPs of atoms brought into close proximity via symmetry operations).

Servalcat is freely available as a standalone package and also as part of CCP-EM (Burnley et al., 2017[Burnley, T., Palmer, C. M. & Winn, M. (2017). Acta Cryst. D73, 469-477.]), where the REFMAC5 interface has been updated to use Servalcat.

3. Map calculation and sharpening using signal variance

Let us assume that Fo is the result of a position-independent blurring k of the true Fourier coefficients FT plus independent zero-mean Gaussian noise with variance [\sigma_{\rm n}^{2}]. That is,

[p(F_{\rm o};F_{\rm T}) = {{1} \over {\pi\sigma_{\rm n}^{2}}} \exp(-|F_{\rm o}-kF_{\rm T}|^{2}/\sigma_{\rm n}^{2}), \eqno (1)]

[{\rm var}(F_{\rm o}) = k^{2}\cdot{\rm var}(F_{\rm T})+\sigma_{\rm n}^{2} = k^{2} \sigma_{\rm T}^{2}+\sigma_{\rm n}^{2}. \eqno (2)]

Note that in this work we treat k as a function of resolution |s|. Multiplication by k in Fourier space is equivalent to isotropic blurring by a convolution in real space. In general, k could take on a different value at each point s in Fourier space, which would produce a position-independent but direction-dependent blurring in real space.
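As a quick numerical check of this equivalence, the sketch below multiplies the FFT of a map on a periodic cubic grid by k = exp(−B|s|²/4), the position-independent isotropic case. It assumes NumPy, and `blur_map` is a hypothetical helper for illustration, not part of Servalcat. Blurring a single point in this way should give a Gaussian whose variance along each axis is B/(8π²), i.e. the familiar relationship B = 8π²⟨u²⟩ between a displacement parameter and the mean-square displacement along one direction.

```python
import numpy as np


def blur_map(density, voxel_size, b_value):
    """Blur a 3D map by multiplying its Fourier transform by k = exp(-B|s|^2/4).

    Sketch only: the grid is treated as periodic, the voxel size (in Angstrom)
    is assumed identical along all axes, and B is a single isotropic value.
    """
    # Reciprocal-space coordinate (1/Angstrom) of every FFT sample along each axis.
    freqs = [np.fft.fftfreq(n, d=voxel_size) for n in density.shape]
    s0, s1, s2 = np.meshgrid(*freqs, indexing="ij")
    s_sq = s0 ** 2 + s1 ** 2 + s2 ** 2
    k = np.exp(-b_value * s_sq / 4.0)
    # Multiplication in Fourier space == convolution with a Gaussian in real space.
    return np.fft.ifftn(np.fft.fftn(density) * k).real


# Blur a single point and measure the variance of the resulting profile along x.
n, voxel, b = 48, 0.5, 50.0
delta = np.zeros((n, n, n))
delta[n // 2, n // 2, n // 2] = 1.0
blurred = blur_map(delta, voxel, b)
x = (np.arange(n) - n // 2) * voxel
profile = blurred.sum(axis=(1, 2))
var_x = np.sum(profile * x ** 2) / np.sum(profile)
print(var_x, b / (8 * np.pi ** 2))  # the two numbers agree closely
```

Because k(s) factorizes over the three axes, summing the blurred map over y and z leaves a one-dimensional Gaussian profile along x, which is why a single-axis variance is enough for the check.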

The variance of the noise ([\sigma_{\rm n}^{2}]) can be calculated from the half maps in resolution bins (Murshudov, 2016[Murshudov, G. N. (2016). Methods Enzymol. 579, 277-305.]),

[\sigma_{\rm n}^{2} = {{{\rm var}{(F_{\rm o1}-F_{\rm o2})}} \over {4}}.\eqno (3)]

We will later use the relationship of [\sigma_{\rm n}^{2}] and [k^{2}\sigma_{\rm T}^{2}] to the FSC, the correlation coefficient between two maps calculated in resolution bins (Rosenthal & Henderson, 2003[Rosenthal, P. B. & Henderson, R. (2003). J. Mol. Biol. 333, 721-745.]),

[{\rm FSC}_{\rm half} = {\rm CC}(F_{\rm o1},F_{\rm o2}) = {{k^{2} \sigma_{\rm T}^{2}} \over {k^{2}\sigma_{\rm T}^{2}+2\sigma_{\rm n}^{2}}}, \eqno(4)]

[{\rm FSC}_{\rm full} = {{k^{2}\sigma_{\rm T}^{2}} \over {k^{2}\sigma_{\rm T}^{2}+ \sigma_{\rm n}^{2}}} = {{2{\rm FSC}_{\rm half}} \over {{\rm FSC}_{\rm half}+1}}. \eqno (5)]
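Equations (3)–(5) translate directly into per-bin statistics. The following is a minimal NumPy sketch, assuming the half-map Fourier coefficients have already been flattened into 1D complex arrays and that each coefficient has been assigned an integer resolution-bin label; the function name `half_map_statistics` is hypothetical. Variances are computed as mean squared moduli, which assumes the coefficients have zero mean within each bin.

```python
import numpy as np


def half_map_statistics(f_o1, f_o2, bin_index, n_bins):
    """Per-bin noise variance (eq. 3), FSC_half (eq. 4) and FSC_full (eq. 5).

    f_o1, f_o2: 1D complex arrays of half-map Fourier coefficients.
    bin_index: integer resolution-bin label (0 .. n_bins-1) of each coefficient.
    """
    sigma_n2 = np.zeros(n_bins)
    fsc_half = np.zeros(n_bins)
    for i in range(n_bins):
        sel = bin_index == i
        a, b = f_o1[sel], f_o2[sel]
        # Eq. (3): noise variance of the full map (F_o1 + F_o2)/2.
        sigma_n2[i] = np.mean(np.abs(a - b) ** 2) / 4.0
        # Correlation between the two half maps in this bin (real part of the cross term).
        num = np.sum((a * np.conj(b)).real)
        den = np.sqrt(np.sum(np.abs(a) ** 2) * np.sum(np.abs(b) ** 2))
        fsc_half[i] = num / den
    # Eq. (5): FSC of the full map predicted from the half-map FSC.
    fsc_full = 2.0 * fsc_half / (fsc_half + 1.0)
    return sigma_n2, fsc_half, fsc_full
```

With synthetic half maps built as a common signal of variance k²σT² plus independent noise of variance 2σn² in each half map, the returned values should approach σn², k²σT²/(k²σT² + 2σn²) and k²σT²/(k²σT² + σn²) as the number of coefficients per bin grows.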

Let us also assume that the errors in the model follow a Gaussian distribution (Luzzati, 1952[Luzzati, V. (1952). Acta Cryst. 5, 802-810.]),

[p(F_{\rm T};F_{\rm c}) = {{k^{2}} \over {\pi\sigma_{\rm U,T}^{2}}}\exp(-|kF_{\rm T }-DF_{\rm c}|^{2}/\sigma_{\rm U,T}^{2}).\eqno(6)]

We need two functions: the likelihood p(Fo; Fc) for the estimation of parameters (of the atomic model and of the distribution function) and the posterior distribution p(FT; Fo, Fc) of the unknown FT for map calculation.

3.1. Likelihood

As derived in Murshudov (2016[Murshudov, G. N. (2016). Methods Enzymol. 579, 277-305.]),

[p(F_{\rm o};F_{\rm c}) = {{1} \over {\pi(\sigma_{\rm U,T}^{2}+\sigma_{\rm n}^{ 2})}}\exp[-|F_{\rm o}-DF_{\rm c}|^{2}/(\sigma_{\rm U,T}^{2}+\sigma_{\rm n}^{2})] \eqno (7)]

is the likelihood function that is optimized during atomic model refinement. D and [\sigma_{\rm U,T}^{2}] are obtained in each resolution bin i by maximizing the joint likelihood of all Fourier coefficients in the bin, i.e. the product of (7) over s ∈ i:

[D = {{\textstyle\sum \limits_{s\in i}F_{\rm o}(s)F_{\rm c}^{*}(s)} \over {\textstyle\sum\limits_{s\in i}|F_{\rm c}(s)|^{2}}}, \eqno (8)]

[\sigma_{\rm U,T}^{2} = {\rm max}\left[0,\textstyle\sum\limits_{s\in i}{{|F_{\rm o}(s)-DF_{\rm c}(s)|^{2}} \over {N_{i}}}-\sigma^{2}_{{\rm n},i}\right], \eqno(9)]

where Ni is the number of Fourier coefficients in bin i.
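A corresponding sketch of the per-bin estimates (8) and (9), again assuming NumPy, flattened complex arrays and precomputed bin labels, with σn² supplied per bin (for example from equation 3). The function name is hypothetical. Because D is a real scale factor, the sketch takes the real part of the numerator in (8); this is our reading of the equation rather than a statement about how REFMAC5 or Servalcat implement it.

```python
import numpy as np


def estimate_d_and_sigma_ut(f_o, f_c, sigma_n2, bin_index, n_bins):
    """Per-bin D (eq. 8) and sigma_U,T^2 (eq. 9).

    f_o, f_c: 1D complex arrays of observed and calculated coefficients.
    sigma_n2: 1D array of noise variances, one value per bin.
    """
    d = np.zeros(n_bins)
    sigma_ut2 = np.zeros(n_bins)
    for i in range(n_bins):
        sel = bin_index == i
        fo, fc = f_o[sel], f_c[sel]
        # Eq. (8): least-squares scale between Fo and Fc (real part, D assumed real).
        d[i] = np.sum((fo * np.conj(fc)).real) / np.sum(np.abs(fc) ** 2)
        # Eq. (9): residual variance minus the noise variance, clipped at zero.
        resid = np.mean(np.abs(fo - d[i] * fc) ** 2)
        sigma_ut2[i] = max(0.0, resid - sigma_n2[i])
    return d, sigma_ut2
```

The clipping at zero matters in bins where the model explains the data to within the noise level; there σU,T² vanishes and, by (13), the posterior weight w of the observation drops to zero.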

3.2. Posterior distribution and map calculation

The posterior distribution, as derived in Murshudov (2016[Murshudov, G. N. (2016). Methods Enzymol. 579, 277-305.]),

[p(F_{\rm T};F_{\rm o},F_{\rm c}) = {{p(F_{\rm o};F_{\rm T})p(F_{\rm T};F_{ \rm c})} \over {p(F_{\rm o};F_{\rm c})}} \eqno (10)]

is a 2D Gaussian distribution with the mean and variance

[\langle F_{\rm T}\rangle = {{wF_{\rm o}+(1-w)DF_{\rm c}} \over {k}}, \eqno(11)]

[{\rm var}(F_{\rm T}) = {{\sigma_{\rm n}^{2}\sigma_{\rm U,T}^{2}} \over {k^{2}(\sigma_{\rm U,T}^{2}+\sigma_{\rm n}^{2})}}, \eqno (12)]

where

[w = {{\sigma_{\rm U,T}^{2}} \over {\sigma_{\rm U,T}^{2}+\sigma_{\rm n}^{2}}}. \eqno (13)]

Coefficients for an Fo − Fc type difference map can be derived as

[F_{\rm diff} = \langle F_{\rm T}\rangle-{{DF_{\rm c}} \over {k}} = {{w} \over {k}}(F_{\rm o }-DF_{\rm c}). \eqno (14)]

The remaining unknown variable is k, which cannot be determined from the data alone. For position-independent isotropic Gaussian blurring, k has the form [\exp(-B_{\rm overall}|s|^{2}/4)] and [B_{\rm overall}] may be estimated from line fitting of a Wilson plot (Wilson, 1942[Wilson, A. J. C. (1942). Nature, 150, 152.]). However, such an estimate is unstable, especially when only low-resolution data are available. Here, we introduce a simple approximation using the variance of the signal. Let us assume that the true map consists of atoms with the same isotropic ADP of ⟨B⟩, and then

[\eqalignno {k^{2}(s)\sigma_{\rm T}^{2}(s) & = \exp(-B_{\rm overall}|s|^{2}/2) \cr &\ \quad {\times}\ {\rm var}\left[\textstyle \sum \limits_{j}f_{j}\exp(-\langle B\rangle|s|^{2}/4)\exp(2\pi is^{\rm T}x_{j})\right] \cr & = \exp[-(B_{\rm overall}+\langle B\rangle)|s|^{2}/2]\cr &\ \quad {\times}\ \left\{\textstyle\sum \limits_{j}f_{j}^{2}+ \sum \limits_{j}\sum \limits_{j^{\prime}\neq j}f_{j}f_{j^{\prime}} \exp[2\pi is^{\rm T}(x_{j}-x_{j^{\prime}})]\right\}\cr & \simeq \exp[-(B_{\rm overall}+\langle B\rangle)|s|^{2}/2]\textstyle\sum \limits_{j}f_{j}^{2}. & (15)}]

We ignored the interference terms [\exp[2\pi is^{\rm T}(x_{j}-x_{j^{\prime}})]]. Further ignoring resolution-dependent terms in [\textstyle \sum f_{j}^{2}], we have [k\sigma_{\rm T}\propto\exp[-(B_{\rm overall}+\langle B\rangle)|s|^{2}/4]], so kσT can be used as a proxy for k. Dividing by kσT removes both the overall blurring and the average ADP, which gives the best sharpening for a region whose local blurring parameter is ⟨B⟩. kσT can be transformed as follows:

[k\simeq{{k\sigma_{\rm T}} \over {(k^{2}\sigma_{\rm T}^{2}+\sigma_{\rm n}^{2})^{1/2}}} (k^{2}\sigma_{\rm T}^{2}+\sigma_{\rm n}^{2})^{1/2} = ({\rm FSC}_{\rm full})^{1/2} (\langle|F_{\rm o}|^{2}\rangle)^{1/2}. \eqno (16)]

The FoFc coefficient then finally has the form

[F_{\rm diff} = {{w} \over {({\rm FSC}_{\rm full}\langle|F_{\rm o }|^{2}\rangle)^{1/2}}}(F_{\rm o}-DF_{\rm c}).\eqno (17)]

Servalcat calculates an Fo − Fc map using (17). Note that the Fo − Fc map is only sensible when the ADPs are properly refined; otherwise spurious peaks will appear because of incorrect ADPs. For this reason, unsharpened Fo should be used as the input for atomic model refinement (see Section 4.1); the sharpening is then consistent because the same sharpening factor is applied to Fo and Fc. Note also that the sharpening is based on the average B value, so regions with B values very different from the average may show fewer structural features.

The map from the estimated true Fourier coefficients (11) may be useful, but there is a risk of model bias because of the contribution from Fc. In the future, techniques may become available to resolve the issue of model bias. At present, Servalcat provides the following as a default map for manual inspection. This is a special case of (11) in the absence of a model, that is with D = 0, for which [\sigma_{\rm U,T}^{2} = k^{2}\sigma_{\rm T}^{2}] and hence [w = {\rm FSC}_{\rm full}],

[\langle F_{\rm T}\rangle = {{k^{2}\sigma_{\rm T}^{2}} \over {k^{2}\sigma_{\rm T}^{2 }+\sigma_{\rm n}^{2}}}{{F_{\rm o}} \over {k}} = ({\rm FSC}_{\rm full})^{1/2}E_{\rm o}.\eqno (18)]

This is equivalent to EMDA's normalized expected map (Warshamanage et al., 2021[Warshamanage, R., Yamashita, K. & Murshudov, G. N. (2021). bioRxiv, 2021.07.26.453750.]).

The approach here should work at any resolution where atomic model refinement is applicable.
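Once D, σU,T², σn² and FSCfull are known for each resolution bin (for example from equations 3–9), the map coefficients of (13), (16), (17) and (18) are simple per-coefficient operations. The sketch below (hypothetical function name, NumPy assumed) returns both the Fo − Fc coefficients and the default map coefficients. It assumes FSCfull > 0 in every bin, since otherwise the approximation of k in (16) breaks down. Placing the outputs back on the reciprocal-space grid and applying an inverse FFT would give the real-space maps.

```python
import numpy as np


def map_coefficients(f_o, f_c, d, sigma_ut2, sigma_n2, fsc_full, bin_index):
    """Return (F_diff, <F_T> with D = 0) for each Fourier coefficient.

    d, sigma_ut2, sigma_n2, fsc_full: 1D arrays with one value per resolution bin.
    bin_index: integer bin label of each coefficient in f_o and f_c.
    Assumes fsc_full > 0 in every bin.
    """
    n_bins = len(d)
    # Eq. (13): weight of the observation in the posterior mean.
    w_bin = sigma_ut2 / (sigma_ut2 + sigma_n2)
    # <|F_o|^2> per bin and the approximation k ~ (FSC_full <|F_o|^2>)^(1/2), eq. (16).
    mean_fo2 = np.array([np.mean(np.abs(f_o[bin_index == i]) ** 2) for i in range(n_bins)])
    k_bin = np.sqrt(fsc_full * mean_fo2)
    # Broadcast per-bin quantities to every coefficient.
    w = w_bin[bin_index]
    k = k_bin[bin_index]
    f_diff = w / k * (f_o - d[bin_index] * f_c)          # eq. (17)
    e_o = f_o / np.sqrt(mean_fo2[bin_index])              # normalized amplitudes E_o
    f_default = np.sqrt(fsc_full[bin_index]) * e_o        # eq. (18)
    return f_diff, f_default
```

As a consistency check, with D = 0 and σU,T² = k²σT² the weight w equals FSCfull, so (17) reduces to (18) and the two outputs coincide.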

3.3. Variance of a masked map

The significance of difference map peaks is usually defined by the r.m.s.d. (sigma) level in crystallography. However, in SPA the box size is arbitrary, and the near-zero voxels outside the molecular envelope lead to underestimation of the r.m.s.d. value. Here, we show how masking inflates sigma-scaled densities when the r.m.s.d. is calculated over the whole box, and that it is useful to normalize the map using the standard deviation within the mask.

We consider a masked map containing n points in total, of which m points are within the mask, so that the values at the remaining n − m points are zero. The mean value over all n points is then

[\mu_{\rm total} = {{\textstyle\sum \limits_{i=1}^{n}d_{i}} \over {n}} = {{\textstyle\sum \limits_{i=1}^{m}d_{i}} \over {n}} = {{\textstyle\sum \limits_{i=1}^{m}d_{i}} \over {m}}{{m} \over {n}} = \mu_{\rm mask}{{m} \over {n}}. \eqno (19)]

Thus, to calculate the mean within the mask we can calculate the total mean and then use the formula for correction:

[\mu_{\rm mask} = \mu_{\rm total}{{n} \over {m}}.\eqno (20)]

For the variance,

[\eqalignno {{\rm var}_{\rm total} = {{\textstyle\sum \limits_{i=1}^{n}d_{i}^{2}} \over {n}}-\mu_{\rm total}^{2} & = {{\textstyle\sum \limits_{i=1}^{m}d_{i}^{2}} \over {m}}{{m} \over {n}}-{{m^{2}} \over {n^{2}}}\mu_{\rm mask }^{2} \cr & = {{m} \over {n}}{\rm var}_{\rm mask}+{{m(n-m)} \over {n^{2}}}\mu_{\rm mask }^{2}.& (21)}]

From here we can calculate varmask if we know vartotal and μtotal. If we denote f = m/n then we can write

[{\rm var}_{\rm total} = f{\rm var}_{\rm mask}+f(1-f)\mu_{\rm mask}^{2}. \eqno (22)]

If the mean inside the mask is zero, the relationship reduces to [{\rm var}_{\rm total} = f\,{\rm var}_{\rm mask}], so the whole-box variance scales with the fraction of the box occupied by the mask. This explains the dependence between the box size and the r.m.s.d. of a cryo-EM SPA map. Servalcat normalizes the Fo − Fc map by [({\rm var}_{\rm mask})^{1/2}] when a mask file is given. (Otherwise only the Fo − Fc structure factors are written in MTZ format.)
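The correction in (20) and (22) can be applied directly to whole-box statistics. The sketch below is a minimal NumPy illustration (hypothetical function name), assuming the map is exactly zero outside the mask and that variances are population variances (divided by n, as in equation 21).

```python
import numpy as np


def masked_mean_and_variance(total_mean, total_var, n_total, n_mask):
    """Recover the mean and variance inside the mask from whole-box statistics.

    Assumes every point outside the mask is exactly zero (eqs. 19-22).
    """
    f = n_mask / n_total
    mu_mask = total_mean / f                                   # eq. (20)
    var_mask = (total_var - f * (1.0 - f) * mu_mask ** 2) / f  # eq. (22) solved for var_mask
    return mu_mask, var_mask


# Example: a random map masked to roughly 20% of a 20^3 box.
rng = np.random.default_rng(3)
values = rng.normal(0.3, 2.0, size=(20, 20, 20))
mask = rng.random(values.shape) < 0.2
masked = np.where(mask, values, 0.0)
mu, var = masked_mean_and_variance(masked.mean(), masked.var(), masked.size, mask.sum())
print(mu, values[mask].mean())   # identical up to rounding
print(var, values[mask].var())   # identical up to rounding
```

For example, if the mask covers 10% of the box and the mean inside the mask is zero, the whole-box r.m.s.d. is (0.1)^1/2 ≈ 0.32 of the r.m.s.d. inside the mask, so peaks expressed in whole-box sigma units appear about 3.2 times stronger.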

If we assume that the map consists of signal and noise, and there is no correlation between them, then we can claim that varmask = varsignal + varnoise. Now, in addition, if we assume that we have modelled the map fully with an atomic model (or that two maps have an almost perfect overlap of signals) then the difference maps should consist almost entirely of noise. Therefore, vardiffmap,mask = varnoise. This variance should be calculated within the mask to make sure that we do not have variance reduction because of systematically low values outside the region occupied by the macromolecule. If we want to increase the reliability of these variances for a region of interest then we may also mask out other regions where there might be signal that is not fully accounted for by the current model. This can also be practiced in crystallography.

4. Refinement procedure

In this section the refinement and map-calculation procedures are described. Everything other than REFMAC5 itself is implemented in Servalcat using the GEMMI library (https://github.com/project-gemmi/gemmi). Fig. 1 summarizes the procedure.

Figure 1. The workflow of Servalcat for the refinement of SPA structures.

4.1. Map choice

The optimal map depends on the purpose. For manual inspection, optimally sharpened and weighted maps should be used so that the best visual interpretability is achieved. In general, this does not mean the best signal-to-noise ratio, but it does mean that the details of structural features are visible in the map. On the other hand, unsharpened and unweighted maps are preferred in refinement. If a sharpened map is used, some atoms would need negative B values (or non-positive-definite tensors if anisotropic) to fit it, but B values are constrained to be positive in the refinement, resulting in suboptimal atomic models. By contrast, blurred maps simply give a shifted distribution of refined B values. An unweighted map is preferred because it enables the calculation of many properties including noise variance and optimally weighted maps after refinement (see Section 3). Users should therefore be aware that the ADPs in the model are not refined against the same map that is used for visual inspection. Cross-validation (Brown et al., 2015[Brown, A., Long, F., Nicholls, R. A., Toots, J., Emsley, P. & Murshudov, G. (2015). Acta Cryst. D71, 136-153.]) can also be carried out throughout refinement and model building if both half maps are readily available. Therefore, unsharpened and unweighted half maps from two independent reconstructions are considered to be optimal inputs for the Servalcat pipeline, which performs atomic model refinement followed by map calculation.

4.2. Masking and trimming

The box size in SPA is often substantially larger than the molecule, which is unnecessary for atomic model refinement. Therefore the map is masked and trimmed into a smaller box to speed up calculations, as discussed in Nicholls et al. (2018[Nicholls, R. A., Tykac, M., Kovalevskiy, O. & Murshudov, G. N. (2018). Acta Cryst. D74, 492-505.]).

Half maps are first sharpened, masked at a radius of 3 Å (default) from the atom positions and then blurred by the same factor. Sharpening before masking is important to avoid masking away any of the signal (the tails of the atomic density distributions), because the raw half maps are blurred and the signal is spread out. The optimal sharpening will differ depending on the region, but here we use an overall isotropic B value estimated by comparing |Fo| with |Fc| calculated from a copy of the initial model with all ADPs set to zero. Alternatively, a user-supplied B value can be used. The sharpened–masked–unsharpened half maps are then averaged to make a full map that is used as the refinement target in REFMAC5. After refinement, the map–model FSC is calculated using a newly created mask based on the refined model.
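The two steps in this paragraph can be sketched as follows, assuming NumPy and a periodic cubic grid. The B estimate is a simple least-squares fit of ln(⟨|Fo|²⟩/⟨|Fc|²⟩) against |s|² over resolution bins, using ⟨|Fo|²⟩/⟨|Fc|²⟩ ≈ const × exp(−B|s|²/2) when Fc is calculated with all ADPs set to zero; the exact fitting procedure used by Servalcat may differ, and all function names are hypothetical. The mask is a real-space array (1 inside, 0 outside) that in practice would be built from the grid points within 3 Å of the model atoms.

```python
import numpy as np


def estimate_overall_b(s2_bin, mean_fo2_bin, mean_fc2_bin):
    """Least-squares fit of ln(<|Fo|^2>/<|Fc|^2>) = c - B|s|^2/2 over bins; returns B.

    Assumes Fc was calculated with all ADPs set to zero, so the ratio decays
    only because of the blurring of the observed map.
    """
    y = np.log(mean_fo2_bin / mean_fc2_bin)
    slope, _intercept = np.polyfit(s2_bin, y, 1)
    return -2.0 * slope


def _s_squared(shape, voxel_size):
    """|s|^2 (1/Angstrom^2) at every FFT sample of a periodic grid."""
    freqs = [np.fft.fftfreq(n, d=voxel_size) for n in shape]
    g0, g1, g2 = np.meshgrid(*freqs, indexing="ij")
    return g0 ** 2 + g1 ** 2 + g2 ** 2


def sharpen_mask_unsharpen(half_map, mask, b_value, voxel_size):
    """Sharpen by exp(+B|s|^2/4), apply a real-space mask, then blur back by exp(-B|s|^2/4)."""
    s2 = _s_squared(half_map.shape, voxel_size)
    sharpened = np.fft.ifftn(np.fft.fftn(half_map) * np.exp(b_value * s2 / 4.0)).real
    masked = sharpened * mask
    return np.fft.ifftn(np.fft.fftn(masked) * np.exp(-b_value * s2 / 4.0)).real
```

Applying the mask only after sharpening means that the density tails cut off by the mask are those of the sharpened (more compact) atoms, which is the point made in the text; the two processed half maps would then be averaged to give the refinement target.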

4.3. Point-group symmetry

If the maps are symmetrized, the user can specify a point-group symbol and give the coordinates for just a single asymmetric unit. Symmetry operators are calculated from the symbols (Cn, Dn, O, T and I) following the axis convention in RELION (Scheres, 2012[Scheres, S. H. W. (2012). J. Struct. Biol. 180, 519-530.]), which follows the common orientation convention (Heymann et al., 2005[Heymann, J. B., Chagoyen, M. & Belnap, D. M. (2005). J. Struct. Biol. 151, 196-207.]) except for T. It is also assumed that the centre of the box is the origin of symmetry. This requires a translation for each rotation Rj, which can be calculated as [t_{j} = c-R_{j}c = (I-R_{j})c], where c is the origin of symmetry. Reconstruction programs such as RELION (Scheres, 2012[Scheres, S. H. W. (2012). J. Struct. Biol. 180, 519-530.]) usually follow this assumption. However, the orientation of the axes and the position of the origin are arbitrary in general, and in future they will be determined automatically using ProSHADE (Nicholls et al., 2018[Nicholls, R. A., Tykac, M., Kovalevskiy, O. & Murshudov, G. N. (2018). Acta Cryst. D74, 492-505.]; Tykac, 2018[Tykac, M. (2018). PhD thesis. University of Cambridge. https://doi.org/10.17863/CAM.31783.]) and EMDA. The model in the asymmetric unit is expanded when creating a mask and performing map trimming. The rotation matrices are unaffected by changes of box size and shifts of the molecule, whereas the translation vectors in the symmetry operators are recalculated for the shifted model.
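The translation tj = (I − Rj)c can be illustrated for the simplest case, point group Cn. The sketch below assumes, for illustration only, that the n-fold axis runs along z through the box centre c (in Å); the function name is hypothetical, and other point groups and axis conventions are not covered.

```python
import numpy as np


def cyclic_operators(n, centre):
    """Operators (R_j, t_j) of point group Cn about an axis along z through `centre`.

    t_j = (I - R_j) c, so that R_j x + t_j rotates x about the point c.
    """
    ops = []
    for j in range(n):
        angle = 2.0 * np.pi * j / n
        cos_a, sin_a = np.cos(angle), np.sin(angle)
        r = np.array([[cos_a, -sin_a, 0.0],
                      [sin_a, cos_a, 0.0],
                      [0.0, 0.0, 1.0]])
        t = (np.eye(3) - r) @ centre
        ops.append((r, t))
    return ops


# C4 about an axis through the centre of a 100 x 100 x 80 Angstrom box.
ops = cyclic_operators(4, np.array([50.0, 50.0, 40.0]))
```

When the model is shifted by a vector v during trimming, the rotations are unchanged and the translations can be recomputed by calling the function again with the shifted centre c + v.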

REFMAC5 internally generates symmetry copies when calculating Fc and restraint terms. For anisotropic ADPs, the Baniso matrix in the Cartesian basis is transformed by [R_{j}B_{\rm aniso}R_{j}^{\rm T}]. This anisotropic ADP transformation is also implemented in GEMMI.

During the refinement, nonbonded interaction and ADP similarity restraints are evaluated using the symmetry-expanded model, and the gradients are calculated for the model in the asymmetric unit.

If atoms are on special positions (for example on a rotation axis), they are restrained² to sit on the special position and to have anisotropic ADPs consistent with the symmetry. First, atoms are identified as being on a special position if the following condition is satisfied for any symmetry operator j other than the identity,

[|x-(R_{j}x+t_{j})|^{2} \, \lt\, \varepsilon^{2}, \eqno (23)]

where ε is a tolerance that can be modified by users. The default value is 0.25 Å. If an atom is on a special position then the program makes sure that the symmetry operators for this position form a group that is a subgroup of the point group of the map. Once the elements of the subgroup for this atom have been identified, the atom is forced to be on that position by simply replacing its coordinates with

[x_{\rm sym} = {{1} \over {N_{\rm sym}}}\textstyle \sum \limits_{j}^{N_{\rm sym}}(R_{j}x+t_{j}). \eqno (24)]

In every cycle, the positions of these atoms are restrained to be on their special positions by adding a term to the target function,

[{{1} \over {\sigma_{x}^{2}}}\left|x-{{1} \over {N_{\rm sym}}}\textstyle \sum \limits_{j}^{N_{\rm sym}}(R_ {j}x+t_{j})\right|^{2}, \eqno (25)]

where the summation is performed over all subgroup elements of the special position and σx is a user-controllable weight parameter for special positions. The occupancy of the atom is adjusted based on the multiplicity of the position.
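Equations (23)–(25) can be sketched as below, assuming NumPy and a list of point-group operators (R, t) that includes the identity, so an atom is flagged as being on a special position when more than one operator maps it onto itself. This is a hypothetical illustration: it does not implement the subgroup check or the occupancy adjustment described above, and the function names are not those of REFMAC5 or Servalcat.

```python
import numpy as np


def site_operators(x, ops, eps=0.25):
    """Operators (R, t) that map x onto itself within eps (eq. 23).

    The identity is always included, so len(result) > 1 flags a special position.
    """
    return [(r, t) for r, t in ops if np.sum((x - (r @ x + t)) ** 2) < eps ** 2]


def symmetrize_position(x, site_ops):
    """Eq. (24): average of the symmetry images of x over the site operators."""
    return np.mean([r @ x + t for r, t in site_ops], axis=0)


def special_position_restraint(x, site_ops, sigma_x):
    """Eq. (25): squared deviation of x from its symmetrized position, divided by sigma_x^2."""
    return np.sum((x - symmetrize_position(x, site_ops)) ** 2) / sigma_x ** 2
```

For an atom 0.11 Å off a fourfold axis, all four operators of C4 satisfy (23) with the default tolerance, and (24) moves the atom exactly onto the axis, after which the restraint term (25) is zero.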

If anisotropic ADPs are used, they are also forced to obey symmetry conditions for atoms on special positions by replacing the anisotropic tensor with

[B_{\rm sym} = {{1} \over {N_{\rm sym}}} \textstyle\sum \limits_{j}^{N_{\rm sym}}R_{j}B_{\rm aniso}R_{j} ^{T}. \eqno (26)]

After this, similarly to the positional parameters, in every cycle restraints are applied to the anisotropic tensor of the atoms on special positions to avoid violation of the symmetry condition for the ADP,

[{{1} \over {\sigma_{B}^{2}}}\left|B_{\rm aniso}-{{1} \over {N_{\rm sym}}}\textstyle \sum \limits_{j}^{N_{\rm sym}}R_{j}B_{\rm aniso}R_{j}^{T}\right|^{2}, \eqno(27)]

where σB is a user-controllable weight parameter for Baniso values on special positions. Here, the distance between anisotropic tensors is the squared Frobenius distance, [|B_{1}-B_{2}|^{2} = \textstyle \sum_{i,j}|B_{1,i,j}-B_{2,i,j}|^{2}].
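Similarly, (26) and (27) amount to averaging the rotated tensors and measuring a squared Frobenius distance. The sketch below assumes NumPy, Cartesian 3 × 3 tensors and a list of the site-symmetry rotation matrices; the function names are hypothetical.

```python
import numpy as np


def symmetrize_adp(b_aniso, rotations):
    """Eq. (26): average of R_j B R_j^T over the site-symmetry rotations."""
    return np.mean([r @ b_aniso @ r.T for r in rotations], axis=0)


def adp_restraint(b_aniso, rotations, sigma_b):
    """Eq. (27): squared Frobenius distance to the symmetrized tensor, divided by sigma_B^2."""
    diff = b_aniso - symmetrize_adp(b_aniso, rotations)
    return np.sum(diff ** 2) / sigma_b ** 2
```

For an atom on a fourfold axis along z, for example, the symmetrized tensor has B11 = B22 and all off-diagonal elements equal to zero, as required for a tensor invariant under a 90° rotation about z.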

4.4. H atoms

Hydrogen electrons are usually shifted towards the parent atoms by 0.1–0.2 Å (Williams et al., 2018[Williams, C. J., Headd, J. J., Moriarty, N. W., Prisant, M. G., Videau, L. L., Deis, L. N., Verma, V., Keedy, D. A., Hintze, B. J., Chen, V. B., Jain, S., Lewis, S. M., Arendall, W. B. III, Snoeyink, J., Adams, P. D., Lovell, S. C., Richardson, J. S. & Richardson, D. C. (2018). Protein Sci. 27, 293-315.]). This must be accounted for when calculating structure factors from the atomic model (Fc). REFMAC5 and Servalcat (GEMMI) use the Mott–Bethe formula (Mott & Bragg, 1930[Mott, N. F. & Bragg, W. L. (1930). Proc. R. Soc. London A, 127, 658-665.]; Bethe, 1930[Bethe, H. (1930). Ann. Phys. 397, 325-400.]; Murshudov, 2016[Murshudov, G. N. (2016). Methods Enzymol. 579, 277-305.]), which can conveniently take this fact into account.

The atomic scattering factor for an atom with a shifted nucleus is

[f_{e}(s) = {{me^{2}} \over {8\pi h^{2}\varepsilon_{0}}}{{Z\exp(-2\pi is^{\rm T}\Delta x)-f_{X}(s)} \over {|s|^{2}}}, \eqno (28)]

where Δx is the positional shift of the nucleus with respect to the centre of the electron density. The hydrogen density peak in real space is shifted beyond the position of the hydrogen nucleus and varies depending on the ADP and resolution cutoff (Nakane et al., 2020[Nakane, T., Kotecha, A., Sente, A., McMullan, G., Masiulis, S., Brown, P. M. G. E., Grigoras, I. T., Malinauskaite, L., Malinauskas, T., Miehling, J., Uchański, T., Yu, L., Karia, D., Pechnikova, E. V., de Jong, E., Keizer, J., Bischoff, M., McCormack, J., Tiemeijer, P., Hardwick, S. W., Chirgadze, D. Y., Murshudov, G., Aricescu, A. R. & Scheres, S. H. W. (2020). Nature, 587, 152-156.]). The expected peak position may be calculated by the Fourier transform of (28). The new CCP4 monomer library includes nucleus bond distances (_chem_comp_bond.value_dist_nucleus; Nicholls et al., 2021[Nicholls, R. A., Wojdyr, M., Joosten, R. P., Catapano, L., Long, F., Fischer, M., Emsley, P. & Murshudov, G. N. (2021). Acta Cryst. D77, 727-745.]).

4.5. Refinement

REFMAC5 performs maximum-likelihood refinement against the Fourier transform of a sharpened–masked–unsharpened map (see Section 4.2) using a likelihood function dedicated to SPA (equation 7). The estimated noise [\sigma_{\rm n}^{2}] is currently not used, and no solvent model is used. The average map–model FSC, weighted by the number of Fourier coefficients in each resolution shell (FSC average), is reported to monitor the refinement. At low resolution, jelly-body restraints or external restraints are recommended to increase the radius of convergence and stabilize the refinement (Murshudov et al., 2011; Nicholls et al., 2012). Note that jelly-body restraints are only useful when the initial model geometry is of good quality, because they tend to keep the model in its current conformation. After refinement, Servalcat shifts the model back to the original box and, if needed, adjusts the translation vectors of the symmetry operators. It also generates an MTZ file of map coefficients, including the sharpened and weighted Fo − Fc and Fo maps (as calculated by equations 17 and 18).
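The two pieces of bookkeeping described above, the FSC average and the adjustment of symmetry-operator translations after a box shift, can be sketched as follows. This is an illustration, not Servalcat's API: the function names are hypothetical, the Fourier coefficients are assumed to be already binned into integer resolution-shell indices, and the box origin is assumed to be an offset in Å within the same orthogonal frame.

```python
import numpy as np


def shell_fsc(f_obs, f_calc, shell):
    """Map-model FSC per resolution shell.

    f_obs, f_calc: complex Fourier coefficients of the same shape.
    shell: non-negative integer shell index for each coefficient.
    Returns (fsc, counts), both indexed by shell.
    """
    shell = np.asarray(shell)
    nshell = int(shell.max()) + 1
    fsc = np.zeros(nshell)
    counts = np.bincount(shell.ravel(), minlength=nshell)
    for i in range(nshell):
        sel = shell == i
        a, b = f_obs[sel], f_calc[sel]
        den = np.sqrt(np.sum(np.abs(a) ** 2) * np.sum(np.abs(b) ** 2))
        fsc[i] = np.real(np.sum(a * np.conj(b))) / den if den > 0 else 0.0
    return fsc, counts


def fsc_average(fsc, counts):
    """FSC averaged over shells, weighted by the number of coefficients per shell."""
    return float(np.sum(fsc * counts) / np.sum(counts))


def translation_in_original_box(rot, t_box, origin):
    """Translation of a symmetry operator after moving back to the original box.

    If x_orig = x_box + origin and x'_box = rot @ x_box + t_box, then
    x'_orig = rot @ x_orig + (t_box + origin - rot @ origin).
    """
    rot = np.asarray(rot, dtype=float)
    origin = np.asarray(origin, dtype=float)
    return np.asarray(t_box, dtype=float) + origin - rot @ origin
```

Weighting by the number of coefficients gives the well-sampled high-resolution shells a proportionally larger say in the single number used to monitor refinement.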

4.6. User interface

Servalcat has a command-line interface. A graphical interface will be available in CCP-EM, where the REFMAC5 interface has been updated and is now based on Servalcat.

From the user's point of view, the main difference in setting up a refinement job is that the default input is now a pair of half maps. (Refinement from a single input map is still possible but is no longer the default option.) The user is also offered more control over the options for refinement weight, symmetry and handling of H atoms. At the end of refinement, the FoFc difference map from Servalcat is made available along with the other output files in the CCP-EM launcher.

5. Methods and results

5.1. FoFc map for ligand visualization

FoFc omit maps are widely used in crystallography to demonstrate convincingly the presence of ligands, and they are equally useful for this purpose in SPA. Fig. 2 shows FoFc omit maps for ligand densities in EMDB entries EMD-22898 (Kern et al., 2021) and EMD-8123 (Murray et al., 2016), which clearly support the presence of the ligands. To generate the map from EMD-22898, chain A of the atomic model from PDB entry 7kjr was refined using the half maps under C2 symmetry constraints. For EMD-8123, PDB entry 5it7 was refined using the half maps without symmetry constraints. After refinement, the ligand and water atoms were omitted and the FoFc maps were calculated. Map values were normalized within a mask. Because a suitable mask for EMD-22898 was not available in the EMDB, one was calculated from the half-map correlation using EMDA.
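Normalizing a map "within a mask" and contouring it at a multiple of σ can be sketched as below. This assumes, as in the figure captions, that σ is the standard deviation of the map values inside the mask and that the map is only rescaled (not mean-shifted); the 0.5 threshold used to binarize a possibly soft mask and the function names are assumptions for illustration.

```python
import numpy as np


def normalize_in_mask(density, mask):
    """Divide a map by the standard deviation of its values inside a mask.

    The mask may be soft; voxels with mask > 0.5 are treated as inside (assumption).
    """
    density = np.asarray(density, dtype=float)
    inside = density[np.asarray(mask) > 0.5]
    sigma = inside.std()
    return density / sigma, sigma


def above_contour(norm_map, level=3.0):
    """Boolean masks of voxels at or above +level and at or below -level (in sigma units)."""
    norm_map = np.asarray(norm_map, dtype=float)
    return norm_map >= level, norm_map <= -level
```

Computing σ only inside the mask keeps the large, nearly empty solvent region of the box from deflating σ and making every contour level look more significant than it is.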

[Figure 2]
Figure 2
Examples of FoFc omit maps for visualization of ligand density. The ligand molecules and ions, shown as sticks and spheres, respectively, were omitted from the map calculation. The resolution is (a) 2.08 Å (PDB entry 7kjr/EMDB entry EMD-22898) and (b) 3.6 Å (PDB entry 5it7/EMDB entry EMD-8123). The FoFc omit maps are contoured at 3σ (where σ is the standard deviation within the mask; see Section 3.3). The images were created using PyMOL (Schrödinger, 2020).

The weighting and sharpening scheme in Servalcat was compared with two alternatives, using either no weights or [({\rm FSC}_{\rm full})^{1/2}] weights (Rosenthal & Henderson, 2003), both with sharpening by the overall B value determined by Wilson plot fitting in RELION (Supplementary Figs. S1 and S2). Sharpening by the overall B value obtained by line fitting gave oversharpened maps, especially for EMDB entry EMD-8123 (Supplementary Fig. S2).
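For reference, the alternative schemes used in this comparison (not Servalcat's own scheme) can be sketched as below. The sketch assumes the Rosenthal & Henderson relation FSC_full = 2FSC/(1 + FSC) for half-map FSC values in (−1, 1], a Wilson model in which amplitudes fall off as exp(−B|s|²/4) and intensities as exp(−B|s|²/2) with |s| = 1/d, and a plain least-squares line fit over all supplied shells; RELION's actual fitting range and details differ. The function names are hypothetical.

```python
import numpy as np


def fsc_full_from_half(fsc_half):
    """Full-map FSC from half-map FSC: 2 FSC / (1 + FSC)."""
    fsc_half = np.asarray(fsc_half, dtype=float)
    return 2.0 * fsc_half / (1.0 + fsc_half)


def sqrt_fsc_full_weight(fsc_half):
    """(FSC_full)^(1/2) weight, with negative FSC values clipped to zero."""
    return np.sqrt(np.clip(fsc_full_from_half(fsc_half), 0.0, None))


def wilson_b(s2, mean_f2):
    """Overall B from a straight-line fit of ln<|F|^2> against |s|^2.

    Assumes <|F|^2> ~ K exp(-B |s|^2 / 2), so the fitted slope is -B/2.
    """
    slope, _intercept = np.polyfit(s2, np.log(mean_f2), 1)
    return -2.0 * slope


def sharpen(f, s2, b):
    """Multiply amplitudes by exp(B |s|^2 / 4) to undo an overall blurring B."""
    return np.asarray(f) * np.exp(b * np.asarray(s2) / 4.0)
```

Because a single straight-line fit cannot follow the non-linear fall-off of real data, the resulting B can overshoot, which is one way the oversharpening noted above for EMD-8123 can arise.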

5.2. FoFc map for detecting model errors

In crystallography, FoFc maps are almost always used for manual and automatic model rebuilding. Strong negative density usually indicates that parts of the model should be moved or removed, while strong positive density implies that there are unmodelled atoms. The FoFc map is typically updated after every refinement session, and refinement may be stopped when no significant peaks remain.

The same refinement practice is possible in SPA. Fig. 3 illustrates the use of the FoFc map for detecting model errors using EMDB entry EMD-0919 and PDB entry 6lmt (Demura et al., 2020). Chain A of the model was refined using the half maps under C8 symmetry constraints. After refinement, the Fo − Fc map was calculated and normalized by the standard deviation of the map within the EMDB-deposited mask. In this example, the positive and negative difference peaks clearly show that the tryptophan and methionine side chains should be repositioned. The weighting and sharpening schemes are compared in Supplementary Fig. S3, demonstrating that appropriate weighting can increase the interpretability of maps.
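To list strong difference peaks programmatically rather than by eye, a simple local-extremum search on a σ-normalized map can be used, as sketched below. This is a hypothetical stand-in, not the procedure behind Fig. 3: a voxel counts as a positive (negative) peak if it is at or above +level (at or below −level) and not exceeded by any of its 26 neighbours, with periodic boundaries via `np.roll`; real peak-search programs also interpolate peak positions between grid points.

```python
from itertools import product

import numpy as np


def local_extrema(norm_map, level=4.0):
    """Grid indices of local maxima >= +level and local minima <= -level.

    norm_map: 3D map already divided by sigma (e.g. sigma within a mask).
    Neighbours are the 26 surrounding voxels, with periodic wrapping.
    """
    norm_map = np.asarray(norm_map, dtype=float)
    is_max = norm_map >= level
    is_min = norm_map <= -level
    for shift in product((-1, 0, 1), repeat=3):
        if shift == (0, 0, 0):
            continue
        neighbour = np.roll(norm_map, shift, axis=(0, 1, 2))
        is_max &= norm_map >= neighbour
        is_min &= norm_map <= neighbour
    return np.argwhere(is_max), np.argwhere(is_min)
```

Positive peaks suggest unmodelled atoms and negative peaks suggest atoms to move or remove; at ±4σ, as in Fig. 3, such a list gives candidate positions to inspect in Coot or PyMOL.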

[Figure 3]
Figure 3
An example of an FoFc map for detecting model errors, in this case mispositioned tryptophan and methionine side chains (PDB entry 6lmt/EMDB entry EMD-0919). The resolution is 2.66 Å and the FoFc map is contoured at ±4σ (scaled within the mask). Green and red meshes represent positive and negative contours, respectively. The grey mesh is the weighted and sharpened Fo map. This image was created using PyMOL.

5.3. Hydrogen density analysis

Nakane et al. (2020) reported convincing densities for H atoms in apoferritin and GABAAR maps obtained by cryo-EM SPA at 1.2 and 1.7 Å resolution, respectively. It is natural to ask what the lowest resolution is at which H atoms can be seen in cryo-EM SPA using currently available computational tools.

Here, we analysed apoferritin maps from the EMDB to determine whether, and at what resolution, hydrogen densities can be observed. There are 25 mouse or human apoferritin entries at 2.1 Å resolution or better, of which 19 had half maps and were used in the analysis (Table 1). Chain A of each model was refined using the half maps under O symmetry constraints. If there was no corresponding PDB entry, PDB entry 7a4m or 6z6u was placed in the map using MOLREP (Vagin & Teplyakov, 2010), followed by jiggle fit in Coot (Brown et al., 2015), before full atomic refinement. After ten cycles of refinement with REFMAC5, an FoFc map was calculated and normalized within the mask. Riding H atoms were used in the refinement (i.e. they were not refined but generated at fixed positions, which is the default in REFMAC5) and were omitted from the FoFc map calculation. Peaks of ≥2σ and ≥3σ were detected using PEAKMAX from the CCP4 package (Winn et al., 2011) and were assigned to hydrogen positions if the peak lay within 0.3 Å of the H atom. H atoms with multiple potential minima (such as those in hydroxyl, sulfhydryl or carboxyl groups) were ignored in the analysis. The ratios of the number of hydrogen peaks to the number of H atoms in the model are plotted in Fig. 4(a). The 1.25 Å resolution data gave the highest ratio, with ∼70% of H atoms detected (Fig. 5a). Even at 1.84 Å resolution approximately 17% of the H atoms may be found (Fig. 5b), while at 2.0 or 2.1 Å resolution only a few H atoms are visible in the map (Fig. 5c). The weighting and sharpening schemes are compared in Supplementary Figs. S4–S6. Note that there may be false positives due to, for example, alternative conformations or inaccuracies in the model.
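The peak-to-hydrogen assignment and the detection ratio can be sketched as below, given peak positions (for example from PEAKMAX) and riding-H positions in Cartesian coordinates. The function name is hypothetical, and the sketch makes simplifying assumptions: distances are brute-force without symmetry mates, each H atom is counted at most once however many peaks lie near it, and H atoms with multiple potential minima are expected to have been removed from `h_xyz` by the caller.

```python
import numpy as np


def hydrogen_detection_ratio(peaks_xyz, h_xyz, cutoff=0.3):
    """Fraction of H atoms with at least one peak within `cutoff` Angstrom.

    peaks_xyz: (P, 3) Cartesian peak positions.
    h_xyz: (H, 3) Cartesian H positions (H >= 1).
    """
    peaks = np.asarray(peaks_xyz, dtype=float).reshape(-1, 3)
    hs = np.asarray(h_xyz, dtype=float).reshape(-1, 3)
    if len(peaks) == 0:
        return 0.0
    # (H, P) matrix of H-peak distances
    d = np.linalg.norm(hs[:, None, :] - peaks[None, :, :], axis=-1)
    return float(np.mean(d.min(axis=1) < cutoff))
```

For a full ferritin model the (H, P) distance matrix can become large; a k-d tree would scale better, but the brute-force form keeps the assignment rule explicit.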

Table 1
Test data for hydrogen peak analysis

EMDB code    PDB code†   Resolution (Å)   Reference
EMD-11638    7a4m        1.22             Nakane et al. (2020)
EMD-11103    6z6u        1.25             Yip et al. (2020)
EMD-30683    (7a4m)      1.31             Danev et al. (2021)
EMD-30685    (7a4m)      1.35             Danev et al. (2021)
EMD-30684    (7a4m)      1.43             Danev et al. (2021)
EMD-30686    (7a4m)      1.43             Danev et al. (2021)
EMD-9865     (7a4m)      1.54             Kato et al. (2019)
EMD-11121    6z9e        1.55             Yip et al. (2020)
EMD-11122    6z9f        1.56             Yip et al. (2020)
EMD-9599     (7a4m)      1.62             Danev et al. (2019)
EMD-0144     (6z6u)      1.65             Zivanov et al. (2018)
EMD-20026    (6z6u)      1.75             Pintilie et al. (2020)
EMD-21024    6v21        1.75             Wu et al. (2020)
EMD-10101    6s61        1.84             No publication
EMD-10675    (7a4m)      1.86             Fislage et al. (2020)
EMD-21951    6wx6        2.00             Tan & Rubinstein (2020)
EMD-22351    (6z6u)      2.07             Guo et al. (2020)
EMD-4905     6rjh        2.10             Naydenova et al. (2019)
EMD-20521    6pxm        2.10             No publication
† No PDB entry was assigned; the model given in parentheses was used for refinement (PDB entry 7a4m for mouse and 6z6u for human apoferritin).
[Figure 4]
Figure 4
Detection of H atoms, measured as the number of observed hydrogen density peaks divided by the number of H atoms in the model. (a) Different apoferritin cases by cryo-EM SPA (see Table 1). (b) Different (apo)ferritin cases by X-ray crystallography using PDB entries 2v2p, 2v2s, 6gxj, 5erj, 5mij, 2cih, 2w0o, 7bd7, 3f37, 2v2n, 1h96, 2chi, 2zg8, 2v2m, 2z5p, 3h7g, 3f34, 2zg7, 3f32, 3f33, 3f36, 2gyd, 3o7s, 1xz1, 1xz3, 2cn7, 2zg9, 3f38, 2cei, 2iu2, 3fi6, 6env, 3f39, 5ix6, 2v2o, 2v2l, 2v2r, 3o7r, 3rav, 3u90, 3f35, 1aew, 5mik, 2g4h, 2v2i, 3rd0, 5erk, 6ra8, 1gwg, 2clu and 2z5q. (c, d) Apoferritin cases calculated at different resolutions from the same map and model (PDB entry 7a4m/EMDB entry EMD-11638, determined at 1.22 Å resolution): (c) FoFc maps and (d) calculated Fc maps. This figure was prepared using ggplot2 (Wickham, 2016) in R (R Core Team, 2020).
[Figure 5]
Figure 5
Observation of hydrogen density peaks in FoFc maps at different resolutions, using (a) 1.25 Å resolution data (PDB entry 6z6u/EMDB entry EMD-11103), (b) 1.84 Å resolution data (PDB entry 6s61/EMDB entry EMD-10101) and (c) 2.00 Å resolution data (PDB entry 6wx6/EMDB entry EMD-21951). H atoms were omitted from the map calculation. Green and red meshes represent positive and negative FoFc maps contoured at ±3σ (scaled within the mask), respectively. The images were created using PyMOL.

In addition, FoFc maps were generated from the 1.22 Å resolution data (PDB entry 7a4m; EMDB entry EMD-11638) using several different resolution cutoffs. These were analysed in the same way (Fig. 4c), along with Fc maps calculated from the PDB entry 7a4m model at the same resolutions (Fig. 4d). Figs. 4(c) and 4(d) show that if the cryo-EM experiment and atomic model refinement are carried out carefully, with due attention to ADPs, some H atoms can be seen even at 2.0 Å resolution.
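Applying a resolution cutoff to a map can be sketched as a low-pass filter in Fourier space. The text does not specify how the cutoffs were applied, so the sharp spherical cutoff below is an assumption, as are the cubic voxels of size `voxel_size` (Å) and the function name; `np.fft.fftfreq` with `d=voxel_size` gives spatial frequencies in Å⁻¹.

```python
import numpy as np


def truncate_resolution(density, voxel_size, d_min):
    """Zero all Fourier coefficients with |s| > 1/d_min (sharp cutoff).

    density: 3D real map on a grid with cubic voxels of size voxel_size (Angstrom).
    """
    density = np.asarray(density, dtype=float)
    f = np.fft.fftn(density)
    freqs = [np.fft.fftfreq(n, d=voxel_size) for n in density.shape]  # Angstrom^-1
    sx, sy, sz = np.meshgrid(*freqs, indexing="ij")
    s_len = np.sqrt(sx ** 2 + sy ** 2 + sz ** 2)
    f[s_len > 1.0 / d_min] = 0.0
    # The cutoff is symmetric under s -> -s, so the result is real up to rounding
    return np.fft.ifftn(f).real
```

A sharp cutoff introduces some ripple around strong features; a smooth (e.g. Gaussian or cosine) edge is a common alternative when comparing maps at different resolutions.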

For comparison, we performed the same analysis using X-ray crystallographic data for (apo)ferritins deposited in the PDB. Fifty-one re-refined atomic models available in the PDB-REDO database (Joosten et al., 2012) were downloaded, crystallographic mFo − DFc maps were calculated using REFMAC5, and density peaks for H atoms were analysed as described above. The result (Fig. 4b) confirms that, as expected, H atoms are more readily visible in cryo-EM maps than in X-ray crystallographic maps.

6. Conclusions

A new program, Servalcat, for the refinement and validation of atomic models using cryo-EM SPA maps has been developed. The program controls the refinement flow and performs difference-map calculations. A weighted and sharpened Fo − Fc map was derived as a validation tool, obtained from the posterior distribution of FT and an approximation of an overall blurring factor calculated from the variance of the signal. We showed that such maps are useful to visualize H atoms and model errors, as in crystallography.

In this work, we assumed that the blurring factor k was position-independent (see Section 3). In reality, however, map blurring is position- and direction-dependent, for example owing to the varying mobility of different domains and/or uncertainty in the particle alignments. For such regions k should ideally be replaced with klocal, derived from a local map blurring parameter Blocal according to [k_{\rm local}(s) = \exp(-B_{\rm local}|s|^{2}/4)] (if isotropic) or [k_{\rm local}(s) = \exp(-s^{\rm T}{\bf B}_{\rm local}s/4)] (if anisotropic). If Blocal values could be estimated, they could be used to improve maps visually, which is especially important for identifying weak densities. We are working on this subject.
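The two forms of klocal(s) above can be written directly as below, with s a reciprocal-space vector in Å⁻¹ (|s| = 1/d), Blocal a scalar in Å² for the isotropic case and a symmetric 3 × 3 tensor for the anisotropic case. The function names are hypothetical; how Blocal would be estimated is left open, as in the text.

```python
import numpy as np


def k_iso(s_vec, b):
    """Isotropic blurring factor exp(-B |s|^2 / 4)."""
    s = np.asarray(s_vec, dtype=float)
    return float(np.exp(-b * (s @ s) / 4.0))


def k_aniso(s_vec, b_tensor):
    """Anisotropic blurring factor exp(-s^T B s / 4) for a symmetric 3x3 B."""
    s = np.asarray(s_vec, dtype=float)
    b = np.asarray(b_tensor, dtype=float)
    return float(np.exp(-(s @ b @ s) / 4.0))
```

The anisotropic form reduces to the isotropic one when B is a multiple of the identity matrix, and with an elongated B (e.g. larger along x) the map is blurred more strongly along that direction.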

We showed that many H atoms may be observed in the difference maps and that some remain visible even at 2 Å resolution. We would expect them also to be visible in electron diffraction (MicroED) experiments. However, achieving this would require high accuracy in the experiment, data analysis and model refinement, in both MicroED and cryo-EM SPA. For example, the electron dose in cryo-EM experiments is often high enough to cause radiation damage (Hattne et al., 2018); H atoms are known to be susceptible to radiation damage (Leapman & Sun, 1995), which would hinder their detection. Lower-dose experiments might be needed for more reliable identification of hydrogen, even at the expense of resolution.

Symmetry is widely used in cryo-EM SPA. When symmetry is imposed during reconstruction, it should be used throughout the downstream analyses, and all software tools should be aware of it and take it into account. The asymmetric unit model should be refined under symmetry constraints and deposited in the PDB with the correct annotation of the symmetry. The PDB and EMDB deposition systems will need to validate the symmetry of both the model and the map. We hope that this will become common practice in the future. The same practice should be established for helical reconstructions, in which symmetry is described by the axial symmetry type (Cn or Dn), twist and rise (He & Scheres, 2017). Servalcat will support helical symmetry in the future.
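The sketch below shows how point-group (Cn) and helical operators can be generated as rotation–translation pairs (R, t), so that an asymmetric unit model expands to the full assembly with x′ = Rx + t. The conventions are assumptions for illustration: the symmetry axis is parallel to z and passes through `centre`, twist is in degrees with a right-handed rotation, and rise is in Å along +z. RELION and other programs may use different sign conventions. Dn and helical-plus-Dn symmetries, which add a twofold axis perpendicular to z, are not covered, and all function names are hypothetical.

```python
import numpy as np


def rot_z(angle_deg):
    """Right-handed rotation matrix about the z axis (degrees)."""
    a = np.deg2rad(angle_deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def cyclic_ops(n, centre=(0.0, 0.0, 0.0)):
    """Cn operators (R, t) about an axis parallel to z through `centre`."""
    c = np.asarray(centre, dtype=float)
    ops = []
    for k in range(n):
        r = rot_z(360.0 * k / n)
        ops.append((r, c - r @ c))  # x' = R (x - c) + c
    return ops


def helical_ops(twist_deg, rise, ks, centre=(0.0, 0.0, 0.0)):
    """Helical operators: rotate k*twist about the axis through `centre`
    and translate k*rise along z, for each integer k in ks."""
    c = np.asarray(centre, dtype=float)
    ops = []
    for k in ks:
        r = rot_z(k * twist_deg)
        ops.append((r, c - r @ c + np.array([0.0, 0.0, k * rise])))
    return ops


def apply_op(op, xyz):
    """Apply (R, t) to one (3,) or many (N, 3) Cartesian coordinates."""
    r, t = op
    return np.asarray(xyz, dtype=float) @ r.T + t
```

Keeping the operators as explicit (R, t) pairs, rather than as expanded copies of the model, is what allows the asymmetric unit alone to be refined and deposited together with an annotation of its symmetry.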

Servalcat is freely available under an open source (MPL-2.0) licence at https://github.com/keitaroyam/servalcat. The features described in this paper have been implemented in REFMAC 5.8.0291 and Servalcat 0.2.0 (which requires GEMMI 0.4.9). Servalcat is also available in the latest nightly builds of the CCP-EM suite and will be included in the upcoming version 1.6 release.

Supporting information


Footnotes

1 There is a similar record, BIOMT, which encodes the biological assembly. In SPA, the symmetry of the map usually corresponds to the biological assembly, but this is not always the case. Both MTRIX and BIOMT records are generally required during deposition.

2 Technically, fixed-position constraints would be more appropriate here. We used restraints instead of constraints for simplicity of implementation. In the future, we will implement the use of constraints instead.

Acknowledgements

The authors are grateful to Marcin Wojdyr for the implementation of Fc calculation for EM in the GEMMI library, Takanori Nakane for critical reading of the manuscript, computational structural biology group members for discussion, and Jake Grimmett and Toby Darling from the MRC–LMB Scientific Computing Department for computing support and resources.

Funding information

This work was supported by the Medical Research Council as part of UK Research and Innovation (MC_UP_A025_1012 to KY and GNM; MR/V000403/1 to CMP and TB).

References

Afonine, P. V., Poon, B. K., Read, R. J., Sobolev, O. V., Terwilliger, T. C., Urzhumtsev, A. & Adams, P. D. (2018). Acta Cryst. D74, 531–544.
Bai, X.-C., McMullan, G. & Scheres, S. H. W. (2015). Trends Biochem. Sci. 40, 49–57.
Bethe, H. (1930). Ann. Phys. 397, 325–400.
Brown, A., Long, F., Nicholls, R. A., Toots, J., Emsley, P. & Murshudov, G. (2015). Acta Cryst. D71, 136–153.
Burnley, T., Palmer, C. M. & Winn, M. (2017). Acta Cryst. D73, 469–477.
Chen, V. B., Arendall, W. B., Headd, J. J., Keedy, D. A., Immormino, R. M., Kapral, G. J., Murray, L. W., Richardson, J. S. & Richardson, D. C. (2010). Acta Cryst. D66, 12–21.
Chojnowski, G., Sobolev, E., Heuser, P. & Lamzin, V. S. (2021). Acta Cryst. D77, 142–150.
Clabbers, M. T. B. & Abrahams, J. P. (2018). Crystallogr. Rev. 24, 176–204.
Cragnolini, T., Sahota, H., Joseph, A. P., Sweeney, A., Malhotra, S., Vasishtan, D. & Topf, M. (2021). Acta Cryst. D77, 41–47.
Danev, R., Yanagisawa, H. & Kikkawa, M. (2019). Trends Biochem. Sci. 44, 837–848.
Danev, R., Yanagisawa, H. & Kikkawa, M. (2021). Microscopy, dfab016.
Demura, K., Kusakizako, T., Shihoya, W., Hiraizumi, M., Nomura, K., Shimada, H., Yamashita, K., Nishizawa, T., Taruno, A. & Nureki, O. (2020). Sci. Adv. 6, eaba8105.
Fislage, M., Shkumatov, A. V., Stroobants, A. & Efremov, R. G. (2020). IUCrJ, 7, 707–718.
Guo, H., Franken, E., Deng, Y., Benlekbir, S., Singla Lezcano, G., Janssen, B., Yu, L., Ripstein, Z. A., Tan, Y. Z. & Rubinstein, J. L. (2020). IUCrJ, 7, 860–869.
Hattne, J., Shi, D., Glynn, C., Zee, C.-T., Gallagher-Jones, M., Martynowycz, M. W., Rodriguez, J. A. & Gonen, T. (2018). Structure, 26, 759–766.
He, S. & Scheres, S. H. W. (2017). J. Struct. Biol. 198, 163–176.
Heymann, J. B., Chagoyen, M. & Belnap, D. M. (2005). J. Struct. Biol. 151, 196–207.
Hoh, S. W., Burnley, T. & Cowtan, K. (2020). Acta Cryst. D76, 531–541.
Jakobi, A. J., Wilmanns, M. & Sachse, C. (2017). eLife, 6, e27131.
Joosten, R. P., Joosten, K., Murshudov, G. N. & Perrakis, A. (2012). Acta Cryst. D68, 484–496.
Joseph, A. P., Lagerstedt, I., Jakobi, A., Burnley, T., Patwardhan, A., Topf, M. & Winn, M. (2020). J. Chem. Inf. Model. 60, 2552–2560.
Kato, T., Makino, F., Nakane, T., Terahara, N., Kaneko, T., Shimizu, Y., Motoki, S., Ishikawa, I., Yonekura, K. & Namba, K. (2019). Microsc. Microanal. 25, 998–999.
Kern, D. M., Sorum, B., Mali, S. S., Hoel, C. M., Sridharan, S., Remis, J. P., Toso, D. B., Kotecha, A., Bautista, D. M. & Brohawn, S. G. (2021). Nat. Struct. Mol. Biol. 28, 573–582.
Leapman, R. D. & Sun, S. (1995). Ultramicroscopy, 59, 71–79.
Luzzati, V. (1952). Acta Cryst. 5, 802–810.
Mott, N. F. & Bragg, W. L. (1930). Proc. R. Soc. London A, 127, 658–665.
Murray, J., Savva, C. G., Shin, B.-S., Dever, T. E., Ramakrishnan, V. & Fernández, I. S. (2016). eLife, 5, e13567.
Murshudov, G. N. (2016). Methods Enzymol. 579, 277–305.
Murshudov, G. N., Skubák, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F. & Vagin, A. A. (2011). Acta Cryst. D67, 355–367.
Nakane, T., Kotecha, A., Sente, A., McMullan, G., Masiulis, S., Brown, P. M. G. E., Grigoras, I. T., Malinauskaite, L., Malinauskas, T., Miehling, J., Uchański, T., Yu, L., Karia, D., Pechnikova, E. V., de Jong, E., Keizer, J., Bischoff, M., McCormack, J., Tiemeijer, P., Hardwick, S. W., Chirgadze, D. Y., Murshudov, G., Aricescu, A. R. & Scheres, S. H. W. (2020). Nature, 587, 152–156.
Naydenova, K., Peet, M. J. & Russo, C. J. (2019). Proc. Natl Acad. Sci. USA, 116, 11718–11724.
Nicholls, R. A., Long, F. & Murshudov, G. N. (2012). Acta Cryst. D68, 404–417.
Nicholls, R. A., Tykac, M., Kovalevskiy, O. & Murshudov, G. N. (2018). Acta Cryst. D74, 492–505.
Nicholls, R. A., Wojdyr, M., Joosten, R. P., Catapano, L., Long, F., Fischer, M., Emsley, P. & Murshudov, G. N. (2021). Acta Cryst. D77, 727–745.
Pintilie, G., Zhang, K., Su, Z., Li, S., Schmid, M. F. & Chiu, W. (2020). Nat. Methods, 17, 328–334.
Ramírez-Aportela, E., Vilas, J. L., Glukhova, A., Melero, R., Conesa, P., Martínez, M., Maluenda, D., Mota, J., Jiménez, A., Vargas, J., Marabini, R., Sexton, P. M., Carazo, J. M. & Sorzano, C. O. S. (2019). Bioinformatics, 36, 765–772.
Ramlaul, K., Palmer, C. M. & Aylett, C. H. (2019). J. Struct. Biol. 205, 30–40.
R Core Team (2020). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.
Rosenthal, P. B. & Henderson, R. (2003). J. Mol. Biol. 333, 721–745.
Scheres, S. H. W. (2012). J. Struct. Biol. 180, 519–530.
Schrödinger, LLC (2020). The PyMOL Molecular Graphics System, Version 2.4.
Tagari, M., Newman, R., Chagoyen, M., Carazo, J.-M. & Henrick, K. (2002). Trends Biochem. Sci. 27, 589.
Tan, Y. Z. & Rubinstein, J. L. (2020). Acta Cryst. D76, 1092–1103.
Terwilliger, T. C., Adams, P. D., Afonine, P. V. & Sobolev, O. V. (2018a). Nat. Methods, 15, 905–908.
Terwilliger, T. C., Sobolev, O. V., Afonine, P. V. & Adams, P. D. (2018b). Acta Cryst. D74, 545–559.
Terwilliger, T. C., Sobolev, O. V., Afonine, P. V., Adams, P. D. & Read, R. J. (2020). Acta Cryst. D76, 912–925.
Tickle, I. J. (2012). Acta Cryst. D68, 454–467.
Tronrud, D. E. (2004). Acta Cryst. D60, 2156–2168.
Tykac, M. (2018). PhD thesis. University of Cambridge. https://doi.org/10.17863/CAM.31783
Vagin, A. & Teplyakov, A. (2010). Acta Cryst. D66, 22–25.
Warshamanage, R., Yamashita, K. & Murshudov, G. N. (2021). bioRxiv, 2021.07.26.453750.
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. New York: Springer.
Williams, C. J., Headd, J. J., Moriarty, N. W., Prisant, M. G., Videau, L. L., Deis, L. N., Verma, V., Keedy, D. A., Hintze, B. J., Chen, V. B., Jain, S., Lewis, S. M., Arendall, W. B. III, Snoeyink, J., Adams, P. D., Lovell, S. C., Richardson, J. S. & Richardson, D. C. (2018). Protein Sci. 27, 293–315.
Wilson, A. J. C. (1942). Nature, 150, 152.
Winn, M. D., Ballard, C. C., Cowtan, K. D., Dodson, E. J., Emsley, P., Evans, P. R., Keegan, R. M., Krissinel, E. B., Leslie, A. G. W., McCoy, A., McNicholas, S. J., Murshudov, G. N., Pannu, N. S., Potterton, E. A., Powell, H. R., Read, R. J., Vagin, A. & Wilson, K. S. (2011). Acta Cryst. D67, 235–242.
Wlodawer, A. & Dauter, Z. (2017). Acta Cryst. D73, 379–380.
Wu, M., Lander, G. C. & Herzik, M. A. (2020). J. Struct. Biol. X, 4, 100020.
Yip, K. M., Fischer, N., Paknia, E., Chari, A. & Stark, H. (2020). Nature, 587, 157–161.
Zivanov, J., Nakane, T., Forsberg, B. O., Kimanius, D., Hagen, W. J., Lindahl, E. & Scheres, S. H. W. (2018). eLife, 7, e42166.

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
