radiation damage

JOURNAL OF SYNCHROTRON RADIATION
ISSN: 1600-5775
Volume 22 | Part 2 | March 2015 | Pages 239-248

XFEL diffraction: developing processing methods to optimize data quality

aPhysical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
*Correspondence e-mail: nksauter@lbl.gov

(Received 4 October 2014; accepted 29 December 2014; online 29 January 2015)

Serial crystallography, using either femtosecond X-ray pulses from free-electron laser sources or short synchrotron-radiation exposures, has the potential to reveal metalloprotein structural details while minimizing damage processes. However, deriving a self-consistent set of Bragg intensities from numerous still-crystal exposures remains a difficult problem, with optimal protocols likely to be quite different from those well established for rotation photography. Here several data processing issues unique to serial crystallography are examined. It is found that the limiting resolution differs for each shot, an effect that is likely to be due to both the sample heterogeneity and pulse-to-pulse variation in experimental conditions. Shots with lower resolution limits produce lower-quality models for predicting Bragg spot positions during the integration step. Also, still shots by their nature record only partial measurements of the Bragg intensity. An approximate model that corrects to the full-spot equivalent (with the simplifying assumption that the X-rays are monochromatic) brings the distribution of intensities closer to that expected from an ideal crystal, and improves the sharpness of anomalous difference Fourier peaks indicating metal positions.

1. Introduction

As a strategy to avoid radiation damage, serial crystallography techniques aim to spread the X-ray dose over numerous crystal specimens, with the goal of observing Bragg spots from material that is close to the undamaged state. Based on the general decay of diffraction at a third-generation synchrotron source, an upper limit for radiation absorbed dose of 30 MGy has been proposed (Owen et al., 2006[Owen, R. L., Rudiño-Piñera, E. & Garman, E. F. (2006). Proc. Natl Acad. Sci. USA, 103, 4912-4917.]) for single-crystal experiments. However, it is also clear that, even at doses far below this limit, damage at specific sites of interest is observed, in particular at metal sites where valence states and coordination geometry are sensitive to X-rays. In photosystem II (PSII), for example, the valence state of the multinuclear Mn₄Ca complex can be monitored by X-ray absorption near-edge spectroscopy (XANES; Yano et al., 2005[Yano, J., Kern, J., Irrgang, K. D., Latimer, M. J., Bergmann, U., Glatzel, P., Pushkar, Y., Biesiadka, J., Loll, B., Sauer, K., Messinger, J., Zouni, A. & Yachandra, V. K. (2005). Proc. Natl Acad. Sci. USA, 102, 12047-12052.]). This complex, which is responsible for catalyzing the water oxidation reaction that evolves oxygen during photosynthesis, has a high-valent Mn₄(III₂, IV₂) structure in dark-adapted crystals. Critically, XANES can detect the accumulation of the radiation-damaged low-valent Mn(II) state even at the smallest doses examined, 0.6 MGy (at 100 K, with 13.3 keV radiation; Yano et al., 2005[Yano, J., Kern, J., Irrgang, K. D., Latimer, M. J., Bergmann, U., Glatzel, P., Pushkar, Y., Biesiadka, J., Loll, B., Sauer, K., Messinger, J., Zouni, A. & Yachandra, V. K. (2005). Proc. Natl Acad. Sci. USA, 102, 12047-12052.]). In contrast, the femtosecond-scale pulse durations from an X-ray free-electron laser (XFEL) permit the observation of the undamaged Mn₄(III₂, IV₂) complex (Kern et al., 2012[Kern, J. et al. (2012). Proc. Natl Acad. Sci. USA, 109, 9721-9726.], 2013[Kern, J. et al. (2013). Science, 340, 491-495.]), as shown by Mn Kβ₁,₃ X-ray emission spectroscopy that is likewise sensitive to the valence state (Alonso-Mori et al., 2012[Alonso-Mori, R. et al. (2012). Proc. Natl Acad. Sci. USA, 109, 19103-19107.]). Short pulses also permit the direct observation of metal coordination bond lengths expected by theory from undamaged Mn (Suga et al., 2015[Suga, M., Akita, F., Hirata, K., Ueno, G., Murakami, H., Nakajima, Y., Shimizu, T., Yamashita, K., Yamamoto, M., Ago, H. & Shen, J.-R. (2015). Nature (London), 517, 99-103.]). Furthermore, these XFEL-based observations can be performed under room-temperature conditions that permit time-dependent pump–probe studies of the water oxidation mechanism (Kern et al., 2014[Kern, J. et al. (2014). Nat. Commun. 5, 4371.]). Such experiments, when performed at the Linac Coherent Light Source (LCLS) with typical pulse durations of 40 fs, deliver a photon flux that would be equivalent to about 200 MGy (Kern et al., 2012[Kern, J. et al. (2012). Proc. Natl Acad. Sci. USA, 109, 9721-9726.]) if they were carried out at synchrotron-source time scales that allow radiation absorption. Thus, despite reports of diffraction decay with exceedingly long XFEL pulses (150 fs) and higher equivalent doses of 3 GGy (Lomb et al., 2011[Lomb, L. et al. (2011). Phys. Rev. B, 84, 214111.]), it appears that short-pulse XFEL still shots provide a promising method to look at radiation-sensitive structures, including those of metalloproteins.

Several high-resolution crystal structures have now been derived from XFEL diffraction (Boutet et al., 2012[Boutet, S. et al. (2012). Science, 337, 362-364.]; Redecke et al., 2013[Redecke, L. et al. (2013). Science, 339, 227-230.]; Barends et al., 2013a[Barends, T. R. M. et al. (2013a). Acta Cryst. D69, 838-842.],b[Barends, T. R. M., Foucar, L., Botha, S., Doak, R. B., Shoeman, R. L., Nass, K., Koglin, J. E., Williams, G. J., Boutet, S., Messerschmidt, M. & Schlichting, I. (2013b). Nature (London), 505, 244-247.]; Liu et al., 2013[Liu, W. et al. (2013). Science, 342, 1521-1524.]; Weierstall et al., 2014[Weierstall, U. et al. (2014). Nat. Commun. 5, 3309.]; Hattne et al., 2014[Hattne, J. et al. (2014). Nat. Methods, 11, 545-548.]; Kern et al., 2014[Kern, J. et al. (2014). Nat. Commun. 5, 4371.]; Sawaya et al., 2014[Sawaya, M. R. et al. (2014). Proc. Natl Acad. Sci. USA, 111, 12769-12774.]; Cohen et al., 2014[Cohen, A. E. et al. (2014). Proc. Natl Acad. Sci. USA, 111, 17122-17127.]; Tenboer et al., 2014[Tenboer, J. et al. (2014). Science, 346, 1242-1246.]), with a common result being the large number of diffraction images required to produce a complete set of merged structure factors, ranging in these cases from 10⁴ to 1.8 × 10⁵. Part of this requirement arises from the heterogeneous quality of the diffraction images. When the data are examined in detail, it appears that the limiting resolution differs from shot to shot; this can be quantified by asking at what resolution the average Bragg spot signal-to-noise ratio [I/σ(I), where I is the intensity and σ(I) is the standard deviation from counting statistics] falls below a threshold value (Kern et al., 2012[Kern, J. et al. (2012). Proc. Natl Acad. Sci. USA, 109, 9721-9726.], 2014[Kern, J. et al. (2014). Nat. Commun. 5, 4371.]; Hattne et al., 2014[Hattne, J. et al. (2014). Nat. Methods, 11, 545-548.]). Scoring the data in this way suggests that only a small fraction of the images contributes signal at the outer limits of resolution. Considering this, the high-resolution data are especially valuable, and special effort is warranted to optimize their measurement.
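The per-image resolution criterion described above can be illustrated with a minimal sketch. It is not the cctbx.xfel implementation: the function name `resolution_limit`, the equal-count binning and the default of five bins are all assumptions made for illustration, while the threshold of 0.5 is the I/σ(I) cutoff listed in Table 1.

```python
def resolution_limit(spots, n_bins=5, threshold=0.5):
    """Estimate the limiting resolution (in Å) of one image.

    spots is a list of (d_spacing, i_over_sigma) pairs. Spots are sorted from
    low to high resolution and split into bins of equal count (an assumed
    binning scheme). The limit is the smallest d-spacing in the last bin
    reached before the mean I/sigma(I) first drops below the threshold.
    Returns None if even the lowest-resolution bin fails the test.
    """
    ordered = sorted(spots, key=lambda s: s[0], reverse=True)  # large d first
    size = max(1, len(ordered) // n_bins)
    limit = None
    for start in range(0, len(ordered), size):
        chunk = ordered[start:start + size]
        mean_ratio = sum(ratio for _, ratio in chunk) / len(chunk)
        if mean_ratio < threshold:
            break  # signal has fallen below the noise criterion
        limit = min(d for d, _ in chunk)
    return limit
```

Applying such a function to every image separately would give one limit per shot, which is the quantity whose shot-to-shot spread is discussed in the text.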

However, there are several known issues with XFEL data processing that make it difficult to gain accurate measurements of the high resolution signal. Firstly, there are tradeoffs made when implementing the algorithm for Bragg spot integration. The program cctbx.xfel (Hattne et al., 2014[Hattne, J. et al. (2014). Nat. Methods, 11, 545-548.]) chooses small integration masks that tightly conform to the pixels believed to contain signal based on a lattice model, so as to discard surrounding pixels that contain only Gaussian noise. However, these small integration masks make the model highly sensitive to the calibration of both the detector distance and the detector metrology, which defines the mutual positions of sensor tiles (Hart et al., 2012[Hart, P. et al. (2012). Proc. SPIE, 8504, 85040C.]; Hattne et al., 2014[Hattne, J. et al. (2014). Nat. Methods, 11, 545-548.]). While cctbx.xfel can fortunately calibrate sensor positions to about 0.1-pixel accuracy, it is found that even a 0.5-pixel miscalibration noticeably degrades the integrated data, and that this is felt most acutely for the highest-resolution data (Hattne et al., 2014[Hattne, J. et al. (2014). Nat. Methods, 11, 545-548.]). Secondly, when performing a control data analysis with simulated image data, it can be shown that the inability to perfectly model the lattice orientation produces some Bragg spot predictions that are not in the simulated data, and misses other spots that really are in the simulation (Sauter et al., 2014[Sauter, N. K., Hattne, J., Brewster, A. S., Echols, N., Zwart, P. H. & Adams, P. D. (2014). Acta Cryst. D70, 3299-3309.]). These effects are most pronounced at the highest resolution limits. Thirdly, correct modeling of the diffraction pattern of a crystal with mosaic structure (Nave, 1998[Nave, C. (1998). Acta Cryst. D54, 848-853.], 2014[Nave, C. (2014). J. Synchrotron Rad. 21, 537-546.]) requires counterbalancing parameters that describe the crystal's physical properties. 
Increasing the parameter describing the angular spread of mosaic blocks permits the modeling of Bragg spots at the highest resolution limits, while decreasing the mosaic block size parameter allows the model to cover the low resolution Bragg spots. However, tuning these parameters requires assumptions about the mosaic structure of the crystal, and this entails increased uncertainty at the resolution extremes. Finally, in contrast to the usual rotation method employed at synchrotron sources, where each reciprocal lattice point is fully moved through the reflection condition, Bragg spots recorded on still shots necessarily represent partial measurements of the structure factor intensity. While it has been shown experimentally that adjacent spots can have differing partialities of measurement (Hattne et al., 2014[Hattne, J. et al. (2014). Nat. Methods, 11, 545-548.]), particularly at high resolution, there is as yet no robust model to correct measurements to the full-spot equivalent.

Here further evidence that the data quality is most sensitive to error at the highest resolution is presented, providing further incentive for resolution-based filtering. However, it is demonstrated that a straightforward filter based on I/σ(I) removes real signal that is capable of improving anomalous difference measurements. Finally, with the eventual goal of deriving a proper expression to correct for partiality, it is demonstrated that a simplified model based on the assumption of monochromaticity provides a reasonable first step toward improving the structure factors.

2. Methods

Data were processed with the program cctbx.xfel (Hattne et al., 2014[Hattne, J. et al. (2014). Nat. Methods, 11, 545-548.]; Sauter et al., 2014[Sauter, N. K., Hattne, J., Brewster, A. S., Echols, N., Zwart, P. H. & Adams, P. D. (2014). Acta Cryst. D70, 3299-3309.]). A tutorial for processing the thermolysin data is presented at https://cci.lbl.gov/xfel.

2.1. Data analyzed

Thermolysin diffraction patterns were reprocessed from a previously described data set (Hattne et al., 2014[Hattne, J. et al. (2014). Nat. Methods, 11, 545-548.]) that is publicly archived at the Coherent X-ray Imaging Data Bank (https://cxidb.org ), accession ID 23. Data were acquired during the L498 (December 2012) beam time at the 1 µm focus of the Coherent X-ray Imaging (CXI) instrument (Boutet & Williams, 2010[Boutet, S. & Williams, G. J. (2010). New J. Phys. 12, 035024.]) of LCLS. Typical crystal size was approximately 2 µm × 3 µm × 1 µm (Sierra et al., 2012[Sierra, R. G. et al. (2012). Acta Cryst. D68, 1584-1587.]). Since the thermolysin structure contains a single Zn atom, it was possible to use the signal-to-noise ratio of the anomalous difference electron density as a metric for the data processing quality (Sauter et al., 2014[Sauter, N. K., Hattne, J., Brewster, A. S., Echols, N., Zwart, P. H. & Adams, P. D. (2014). Acta Cryst. D70, 3299-3309.]). Therefore, in the work presented here the analysis was limited to data (runs 16–27) collected at a wavelength of 1.269 Å, slightly more energetic than the Zn K-edge at 1.284 Å.
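As a quick check of the statement that the collection wavelength is slightly more energetic than the Zn K-edge, the two wavelengths quoted above can be converted to photon energies with E = hc/λ. The helper name `wavelength_to_kev` is introduced here for illustration only.

```python
HC_KEV_ANGSTROM = 12.398419843  # Planck constant times speed of light, in keV·Å

def wavelength_to_kev(wavelength_angstrom):
    """Convert an X-ray wavelength in Å to a photon energy in keV."""
    return HC_KEV_ANGSTROM / wavelength_angstrom

collection_kev = wavelength_to_kev(1.269)  # data collection wavelength
edge_kev = wavelength_to_kev(1.284)        # Zn K-edge wavelength quoted in the text
print(collection_kev, edge_kev, collection_kev - edge_kev)
```

The difference comes out a little above 0.1 keV, consistent with collecting just above the edge to obtain an anomalous signal from Zn.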

Simulated still-shot diffraction patterns from photosystem I (PSI) were obtained from James Holton (LBNL), and are available at https://bl831a.als.lbl.gov/example_data_sets/Illuin/LCLS . The 20000-image simulated dataset was created with the program fastBragg as described (Kirian et al., 2010[Kirian, R. A., Wang, X., Weierstall, U., Schmidt, K. E., Spence, J. C. H., Hunter, M., Fromme, P., White, T., Chapman, H. N. & Holton, J. (2010). Opt. Express, 18, 5713-5723.], 2011[Kirian, R. A., White, T. A., Holton, J. M., Chapman, H. N., Fromme, P., Barty, A., Lomb, L., Aquila, A., Maia, F. R. N. C., Martin, A. V., Fromme, R., Wang, X., Hunter, M. S., Schmidt, K. E. & Spence, J. C. H. (2011). Acta Cryst. A67, 131-140.]), utilizing modeled structure factors from Protein Data Bank entry 1jb0. Spatially coherent simulations of randomly oriented parallelepiped nanocrystals (17 × 17 × 30 unit cells; cell lengths a = b = 281 Å, c = 165.2 Å) were performed, assuming constant-flux, polarized, monochromatic radiation (λ = 1.32 Å) with zero divergence, impinging on a pixel-array detector with pixel size (0.11 mm)² at a distance of 129 mm from the sample. Solvent scattering and shot noise were added so as to effectively limit the resolution to about 3.3 Å. At very low resolutions (d > 60 Å) the simulation exhibits diffraction fringes between Bragg spots as previously observed for PSI (Chapman et al., 2011[Chapman, H. N. et al. (2011). Nature (London), 470, 73-77.]); however, the present paper attempts to analyze only the central Bragg peak, and the analysis is limited to the 15–3.5 Å range. Angular misorientations between the cctbx.xfel models and the true crystal orientations used for the simulation were calculated after accounting for the orientational ambiguities due to the hexagonal lattice symmetry operators (six-fold along c and two-fold along a + b). Angular misorientations were then decomposed into a rotation Rz about an axis parallel to the beam, and a residual rotation Rxy about an axis perpendicular to the beam.
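The text does not state how the decomposition into Rz and Rxy was computed. One standard way to split a rotation into a part about a chosen axis and a residual about a perpendicular axis is the quaternion swing-twist decomposition, sketched below. The sketch assumes the beam lies along z and that the misorientation is already available as a unit quaternion (w, x, y, z); the function names are hypothetical, and the symmetry-ambiguity step mentioned above is not included.

```python
import math

def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def decompose_about_beam(q):
    """Split a unit quaternion q into swing * twist, with the twist about z.

    Returns (r_z, r_xy) in radians: r_z is the signed rotation about the beam
    axis (z, by assumption) and r_xy is the magnitude of the residual rotation
    about an axis perpendicular to the beam.
    """
    w, x, y, z = q
    if w < 0.0:  # q and -q describe the same rotation; pick w >= 0
        w, x, y, z = -w, -x, -y, -z
    norm = math.hypot(w, z)
    if norm == 0.0:
        twist = (1.0, 0.0, 0.0, 0.0)  # 180 degree swing: no twist component
    else:
        twist = (w / norm, 0.0, 0.0, z / norm)
    twist_conjugate = (twist[0], 0.0, 0.0, -twist[3])
    swing = quat_multiply((w, x, y, z), twist_conjugate)
    r_z = 2.0 * math.atan2(twist[3], twist[0])
    r_xy = 2.0 * math.acos(max(-1.0, min(1.0, swing[0])))
    return r_z, r_xy
```

With this ordering the twist about the beam is applied first and the perpendicular swing second; a different ordering would give slightly different angles for large misorientations, so the convention matters when comparing numbers.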

2.2. Correction of the integrated intensity to the full-spot equivalent

This section describes the component of partiality that arises from the crystal's mosaic structure (Nave, 1998[Nave, C. (1998). Acta Cryst. D54, 848-853.], 2014[Nave, C. (2014). J. Synchrotron Rad. 21, 537-546.]), setting aside the effects of beam properties such as dispersion and divergence for future work. Consider a reciprocal lattice point at reciprocal space position Q (rlpQ; Fig. 1). Points on the Ewald sphere of radius 1/λ (λ, wavelength) satisfy the reflection conditions exactly, but what if rlpQ is located a distance rh from the Ewald sphere surface, as in Fig. 1? In this case diffraction can still be modeled if crystal imperfections are taken into account, thus widening rlpQ into a ball of radius rs, and satisfying the Laue conditions at the spherical cap-shaped intersection between the Ewald sphere and the rlpQ ball. Although this intersection area AQ could be expressed analytically, it is convenient to approximate it as a circle of radius rp given by the right triangle in Fig. 1, with

[{r}_{p}^{\,2}={r}_{s}^{\,2}-{r}_{h}^{\,2}.\eqno(1)]

To obtain the best match with experimentally observed still-shot diffraction (Sauter et al., 2014[Sauter, N. K., Hattne, J., Brewster, A. S., Echols, N., Zwart, P. H. & Adams, P. D. (2014). Acta Cryst. D70, 3299-3309.]), it is useful to consider two parameters that contribute to the ball radius rs. Viewing the crystal as a mosaic of coherently scattering blocks (Nave, 1998[Nave, C. (1998). Acta Cryst. D54, 848-853.]) of effective width Deff gives

[{r}_{s}={{1}\over{{D}_{\rm{eff}}}}.\eqno(2)]

Meanwhile, considering the angular spread and unit cell variation among the mosaic blocks leads to the expression

[{r}_{s}={{{\eta}_{\rm{eff}}}\over{2d}},\eqno(3)]

where d is the resolution and [{\eta_{{\rm{eff}}}}] is the effective full-width mosaicity.¹ Combining these two effects gives a final expression for the Ewald sphere intersection area:

[{A}_{Q}= \pi\left[{\left({{1}\over{{D}_{\rm{eff}}}}+{{{\eta}_{\rm{eff}}}\over{2d}}\right)}^{2}-{r}_{h}^{\,2}\right].\eqno(4)]

For the partiality of the Bragg spot in Fig. 1, arising from the crystal's mosaic structure, it seems intuitive that the partiality should be proportional to AQ, with a maximum obtained when the Ewald sphere slices through the center of rlpQ, and a minimum of 0 at rh = rs. Taking the simplest case first, that with [{\eta_{{\rm{eff}}}}] = 0, one finds that the maximal area is a constant: [\pi/D_{{\rm{eff}}}^2]. To turn this into a measure for partiality, one must assure that the partiality always takes on values from 0 to 1, and that it is a unitless quantity instead of having dimensions of length⁻². This is accomplished by taking a suggestion from James Holton who, considering work on NaCl where a reference reflection was used (Bragg et al., 1921[Bragg, W. L., James, R. W. & Bosanquet, C. H. (1921). Philos. Mag. Ser. 6, 41, 309-337.]), proposed (private communication) that the ratio between AQ and the area intersected by the F000 reciprocal lattice point should be used:

[{A}_{000}=\pi {\left({{1}\over{{D}_{\rm{eff}}}}\right)}^{2}. \eqno(5)]

For F000, equation (5) always holds because rh and 1/d are identically zero.

[Figure 1]
Figure 1
Geometric definition of partiality, accounting for the mosaic structure of the crystal. For a still shot taken with monochromatic X-rays of wavelength λ, a reciprocal lattice point (blue ball centered on Q) partially intersects the Ewald sphere. The intersection area, which is actually a spherical cap, is approximated by a circle of radius rp, which is determined by rh, the distance from Q to the Ewald sphere, and rs, the resolution-dependent radius of the reciprocal lattice point as described in the text. Partiality is defined as the intersection area-to-ball volume ratio for lattice point Q, normalized by the intersection area-to-ball volume ratio of the F000 spot at reciprocal space origin O.

Next, in the general case where [{\eta_{{\rm{eff}}}}] > 0, the maximal intersection area AQ increases as a function of resolution due to its dependence on 1/d, but this apparently goes against the expectation that the maximum partiality for any spot, independent of resolution, should be 1. To correct for this, one can normalize against the volume VQ of the reciprocal lattice point, so that the full expression for partiality P involves the ratio of area to volume:

[P= {{ A_Q/V_Q }\over{ A_{000}/V_{000} }} = {{{r}_{s}^{\,2}-{r}_{h}^{\,2}}\over{{D}_{\rm{eff}}\,{r}_{s}^{\,3}}},\eqno(6)]

where

[{r}_{s}= {{1}\over{{D}_{\rm{eff}}}} + {{{\eta}_{\rm{eff}}}\over{2d}}.]

To evaluate equation (6), the parameters Deff and [{\eta_{{\rm{eff}}}}] are determined separately for each image as previously described (Sauter et al., 2014[Sauter, N. K., Hattne, J., Brewster, A. S., Echols, N., Zwart, P. H. & Adams, P. D. (2014). Acta Cryst. D70, 3299-3309.]). Plotting the partiality of Bragg spots from a single thermolysin image (Fig. 2) confirms the expected behavior: the distribution of P increases to a maximum at rh = 0 but never actually reaches 1.0 due to the normalization by VQ, and it falls off to zero at rh = [\pm {r_s}].
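Equation (6) is compact enough to evaluate directly. The sketch below is an illustration rather than the cctbx.xfel code: the function names are invented here, the mosaicity is assumed to be supplied in radians, and the example inputs are merely of the same order as the averages in Table 1, with the full width assumed to be twice the half width.

```python
import math

def mosaic_radius(d, d_eff, eta_eff):
    """Radius r_s of the reciprocal lattice ball, in 1/Å.

    d       resolution of the reflection, in Å
    d_eff   effective mosaic block width, in Å
    eta_eff effective full-width mosaicity, in radians (unit assumed here)
    """
    return 1.0 / d_eff + eta_eff / (2.0 * d)

def partiality(r_h, d, d_eff, eta_eff):
    """Equation (6): P = (r_s^2 - r_h^2) / (D_eff * r_s^3); zero outside the ball."""
    r_s = mosaic_radius(d, d_eff, eta_eff)
    if abs(r_h) >= r_s:
        return 0.0
    return (r_s ** 2 - r_h ** 2) / (d_eff * r_s ** 3)

# Illustrative inputs only (not fitted values for any particular image).
d_eff = 4220.0
eta_eff = math.radians(2.0 * 0.168)
for d in (10.0, 4.0, 2.2):
    print(d, partiality(0.0, d, d_eff, eta_eff))
```

At rh = 0 the expression reduces to 1/(Deff rs), which equals 1 only when the mosaicity term vanishes; this is why the maximum partiality stays below 1.0 and decreases toward high resolution.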

[Figure 2]
Figure 2
Partiality estimates for Bragg spots integrated on a single thermolysin image, plotted as a function of the rh/rs ratio.

Individual Bragg spot intensity measurements I are corrected to their full-spot equivalent IF with

[{I_F} = {I/P},\eqno(7)]

and those measurements with very low partiality are discarded, i.e. those with [|{r}_{h}|] > 0.9rs.
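The correction of equation (7), together with the rejection of measurements with |rh| > 0.9rs, might be sketched as follows. The function name, the tuple layout of the input and the use of a single (Deff, ηeff) pair per call are assumptions for illustration; the partiality expression is that of equation (6).

```python
def correct_to_full(measurements, d_eff, eta_eff, cutoff=0.9):
    """Return full-spot equivalent intensities I_F = I / P for one image.

    measurements is an iterable of (intensity, r_h, d) tuples, with r_h in 1/Å
    and d in Å; eta_eff is assumed to be in radians. Measurements with
    |r_h| > cutoff * r_s are discarded as having very low partiality.
    """
    corrected = []
    for intensity, r_h, d in measurements:
        r_s = 1.0 / d_eff + eta_eff / (2.0 * d)
        if abs(r_h) > cutoff * r_s:
            continue  # too close to the edge of the ball: P is small and unreliable
        p = (r_s ** 2 - r_h ** 2) / (d_eff * r_s ** 3)
        corrected.append(intensity / p)
    return corrected
```

Because the correction divides by P, a small partiality amplifies both the signal and its error, which is one reason a cutoff of this kind is useful.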

Prior to merging data from different images together, duplicate measurements from different images are placed on a common scale by determining a separate scaling factor G and isotropic temperature factor B for each image. In common with previous work on scaling (Hamilton et al., 1965[Hamilton, W. C., Rollett, J. S. & Sparks, R. A. (1965). Acta Cryst. 18, 129-130.]; Fox & Holmes, 1966[Fox, G. C. & Holmes, K. C. (1966). Acta Cryst. 20, 886-891.]; Bolotovsky et al., 1998[Bolotovsky, R., Steller, I. & Rossmann, M. G. (1998). J. Appl. Cryst. 31, 708-717.]; Kabsch, 2014[Kabsch, W. (2014). Acta Cryst. D70, 2204-2216.]), here these parameters were determined by iterative non-linear least-squares minimization of a target functional,

[\Psi=\sum\limits_{\bf{h}} \left\{ I_{\bf{h}}-P\left({r}_{h}\right)G \exp\left[-2B{\left({{\sin\theta_{\bf{h}}}\over{\lambda}}\right)}^{2}\right] I_{{\bf{h}},{\rm{ref}}} \right\}^2,\eqno(8)]

where [\theta_{\bf{h}}] is the Bragg angle, and the summation is over all Miller indices [{\bf{h}}] measured on a given image. However, instead of taking [I_{{\bf{h}},{\rm{ref}}}] to be the best least-squares estimate of the structure factor intensity over the global dataset, the shortcut is taken of using reference intensities measured at a synchrotron, in this case thermolysin intensities from Protein Data Bank entry 2tli.²
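To make the target functional of equation (8) concrete, the following sketch fits G and B for one image with the partialities held fixed. It is a simplified stand-in, not the minimizer used in the paper: for a trial B the optimal G has a closed form because the model is linear in G, and B is then located with a golden-section search over an assumed bracket. All names and the bracket limits are hypothetical.

```python
import math

def profile_psi(b_factor, obs, partiality, s_squared, ref):
    """Return (psi, G) for a trial B, with G set to its least-squares optimum.

    s_squared holds (sin(theta)/lambda)^2 for each reflection, i.e. 1/(4 d^2).
    """
    unscaled = [p * math.exp(-2.0 * b_factor * s2) * r
                for p, s2, r in zip(partiality, s_squared, ref)]
    g = sum(o * m for o, m in zip(obs, unscaled)) / sum(m * m for m in unscaled)
    psi = sum((o - g * m) ** 2 for o, m in zip(obs, unscaled))
    return psi, g

def fit_scale_and_b(obs, partiality, s_squared, ref, b_lo=-50.0, b_hi=50.0, tol=1e-7):
    """Minimize equation (8) over G and B; returns (G, B)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = b_lo, b_hi
    while hi - lo > tol:
        x1 = hi - inv_phi * (hi - lo)
        x2 = lo + inv_phi * (hi - lo)
        psi1 = profile_psi(x1, obs, partiality, s_squared, ref)[0]
        psi2 = profile_psi(x2, obs, partiality, s_squared, ref)[0]
        if psi1 < psi2:
            hi = x2  # minimum lies in [lo, x2]
        else:
            lo = x1  # minimum lies in [x1, hi]
    b_fit = 0.5 * (lo + hi)
    return profile_psi(b_fit, obs, partiality, s_squared, ref)[1], b_fit
```

A golden-section search assumes a single minimum inside the bracket, which holds for noise-free data but is not guaranteed for real measurements; a general non-linear least-squares routine would be the more robust choice in practice.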

Furthermore, for the computation of partiality in still-shot data [equation (6)], the Ewald sphere distances rh are sensitive functions of the crystal orientation, and in particular are susceptible to rotational uncertainties about the two orthogonal axes perpendicular to the X-ray beam; see §3.3 below. By expressing rh explicitly as a function of these rotations, the scaling equation (8) can be modified to include these rotations as free parameters. The necessary equation is:

[{r}_{h}\left({\varphi}_{y},{\varphi}_{x}\right)= \left\|R_y\left(\varphi_y\right)\,R_x\left(\varphi_x\right)\,A^*\,{\bf{h}}+{\bf{s}}_0\right\|-{{1}\over{\lambda}},\eqno(9)]

where [{R_y}({{\varphi_y}})] and [{R_x}({{\varphi_x}})] are matrices describing rotational perturbations through angles [{\varphi_y}] and [{\varphi_x}] about orthogonal axes y and x perpendicular to the beam, A* is the reciprocal space orientation matrix determined by indexing (Sauter et al., 2006[Sauter, N. K., Grosse-Kunstleve, R. W. & Adams, P. D. (2006). J. Appl. Cryst. 39, 158-168.]), also known as the UB matrix (Busing & Levy, 1967[Busing, W. R. & Levy, H. A. (1967). Acta Cryst. 22, 457-464.]), and [{\bf{s}}_0] is the vector describing the travel direction of the X-ray beam with length [1/\lambda]. With the final set of free parameters being G, B, [\varphi_y] and [\varphi_x], this adjustment of the crystal orientation to optimize agreement between reference intensities and the corrected measured intensities is similar to other post­refinement protocols used for both classical rotation photography (Winkler et al., 1979[Winkler, F. K., Schutt, C. E. & Harrison, S. C. (1979). Acta Cryst. A35, 901-911.]; Rossmann et al., 1979[Rossmann, M. G., Leslie, A. G. W., Abdel-Meguid, S. S. & Tsukihara, T. (1979). J. Appl. Cryst. 12, 570-581.]) and XFEL scaling (White, 2014[White, T. A. (2014). Philos. Trans. R. Soc. London B, 369, 20130330.]; Kabsch, 2014[Kabsch, W. (2014). Acta Cryst. D70, 2204-2216.]).
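A direct transcription of equation (9) is sketched below using plain lists for vectors and matrices. The beam is assumed to lie along z, so that x and y are the perpendicular axes; the sign convention for the beam vector is left to the caller, and the function names are illustrative only.

```python
import math

def rot_x(phi):
    """Rotation matrix for an angle phi (radians) about the x axis."""
    c, s = math.cos(phi), math.sin(phi)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(phi):
    """Rotation matrix for an angle phi (radians) about the y axis."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def r_h(phi_y, phi_x, a_star, hkl, s0, wavelength):
    """Equation (9): signed distance of the rotated lattice point from the Ewald sphere.

    a_star is the 3x3 reciprocal space orientation matrix (1/Å), hkl the Miller
    index, and s0 the beam vector of length 1/wavelength.
    """
    q = mat_vec(a_star, hkl)       # reciprocal lattice point A* h
    q = mat_vec(rot_x(phi_x), q)   # perturbation about x
    q = mat_vec(rot_y(phi_y), q)   # perturbation about y
    total = [qi + si for qi, si in zip(q, s0)]
    return math.sqrt(sum(t * t for t in total)) - 1.0 / wavelength
```

With rh written this way, φy and φx can be passed to the same minimizer that adjusts G and B, since the partiality depends on them only through rh.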

2.3. Comparison of data processing protocols

The thermolysin data were processed five times to assess the relative effects of differing protocols (Table 1[link]). During the integration step, the lattice models were either truncated (Hattne et al., 2014[Hattne, J. et al. (2014). Nat. Methods, 11, 545-548.]) at a resolution limit, separate for each lattice, where integrated intensity measurements fell below a threshold value (protocols 4, 6 and 7POST); or the data were integrated to a fixed limit of 2.2 Å (6F and 7F,POST). In either case, negative measurements were removed before the data from separate lattices were scaled together. Scaling was performed either by finding a simple scaling constant to fit partial structure factor intensities from each lattice to full calculated intensities based on PDB entry 2tli (4, 6 and 6F) as previously described (Hattne et al., 2014[Hattne, J. et al. (2014). Nat. Methods, 11, 545-548.]), or by the post­refinement protocol of §2.2[link] (7POST and 7F,POST). Once duplicate measurements were merged globally over the whole data set, intensity distribution statistics were calculated with phenix.xtriage (Zwart et al., 2005[Zwart, P. H., Grosse-Kunstleve, R. W. & Adams, P. D. (2005). CCP4 Newsl. 43, 26-35.]). The previously published XFEL thermolysin structure (PDB entry 4ow3 ) was re-refined against the newly processed data with phenix.refine (Afonine et al., 2012[Afonine, P. V., Grosse-Kunstleve, R. W., Echols, N., Headd, J. J., Moriarty, N. W., Mustyakimov, M., Terwilliger, T. C., Urzhumtsev, A., Zwart, P. H. & Adams, P. D. (2012). Acta Cryst. D68, 352-367.]), and anomalous difference Fourier peak heights analyzed with phenix (Adams et al., 2010[Adams, P. D., Afonine, P. V., Bunkóczi, G., Chen, V. B., Davis, I. W., Echols, N., Headd, J. J., Hung, L.-W., Kapral, G. J., Grosse-Kunstleve, R. W., McCoy, A. J., Moriarty, N. W., Oeffner, R., Read, R. J., Richardson, D. C., Richardson, J. S., Terwilliger, T. C. & Zwart, P. H. (2010). Acta Cryst. D66, 213-221.]). 
Likelihood-weighted maps displayed with Coot (Emsley et al., 2010[Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. (2010). Acta Cryst. D66, 486-501.]) are shown in Fig. S1.³ Correlation coefficients of these maps to a 1.65 Å reference model (from synchrotron-based data, PDB entry 2tlx) were determined after rigid body refinement of the 2tlx model into the XFEL unit cell. Separately, in order to assess the ability to perform automated model building, the structure was solved by molecular replacement against 4ow3 with phaser (McCoy et al., 2007[McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C. & Read, R. J. (2007). J. Appl. Cryst. 40, 658-674.]). Molecular replacement phasing information was combined with single-wavelength anomalous differences with phenix.autosol (Terwilliger et al., 2009[Terwilliger, T. C., Adams, P. D., Read, R. J., McCoy, A. J., Moriarty, N. W., Grosse-Kunstleve, R. W., Afonine, P. V., Zwart, P. H. & Hung, L.-W. (2009). Acta Cryst. D65, 582-601.]), and fully automated fitting of the amino acid sequence was performed with phenix.autobuild (Terwilliger et al., 2008[Terwilliger, T. C., Grosse-Kunstleve, R. W., Afonine, P. V., Moriarty, N. W., Zwart, P. H., Hung, L.-W., Read, R. J. & Adams, P. D. (2008). Acta Cryst. D64, 61-69.]).

Table 1
Processing outcome on XFEL still shots from thermolysin

| Protocol number | 4 | 6 | 6F | 7POST | 7F,POST |
|---|---|---|---|---|---|
| Protocol choice | | | | | |
| Model restraints | Spot positions only | Spot positions + angular deviations | Spot positions + angular deviations | Spot positions + angular deviations | Spot positions + angular deviations |
| Postrefinement and partiality correction | No | No | No | Yes | Yes |
| Each-lattice resolution bin cutoff [I/σ(I)] | 0.5 | 0.5 | None | 0.5 | None |
| Indexing results | | | | | |
| # Total hits with >15 Bragg spots | 14041 | 14041 | 14041 | 14041 | 14041 |
| # Integrated and merged lattices | 12097 | 12550 | 13756 | 12551 | 13733 |
| Model accuracy | | | | | |
| Half-width mosaicity | 0.292° | 0.168° | 0.213° | 0.168° | 0.213° |
| Mosaic block size | 4320 Å | 4220 Å | 4370 Å | 4220 Å | 4370 Å |
| Integrated data results | | | | | |
| 〈Individual image CC〉 | 32.0% | 40.2% | 40.1% | 40.2% | 40.1% |
| No. of measurements 51–2.2 Å | 6605566 | 5036076 | 11905131 | 4290566 | 9915864 |
| Positive measurements 51–2.2 Å | 4297065 | 3626262 | 7249271 | 3187835 | 6201772 |
| Negative measurements | 35% | 28% | 39% | 26% | 37% |
| Structure factor merging | | | | | |
| Merging resolution range (Å) | 51–2.2 (2.28–2.2) | 51–2.2 (2.28–2.2) | 51–2.2 (2.28–2.2) | 51–2.2 (2.28–2.2) | 51–2.2 (2.28–2.2) |
| Unique Miller indices | 17198 (1405) | 17297 (1488) | 17513 (1700) | 17227 (1425) | 17513 (1700) |
| Multiplicity of observation | 250 (3.0) | 210 (3.6) | 414 (53) | 185 (3.2) | 354 (44) |
| Completeness | 98.2% (82.7%) | 98.8% (87.6%) | 100% (100%) | 98.4% (83.9%) | 100% (100%) |
| I/σ(I) | 36.1 (2.3) | 56.7 (3.2) | 55.9 (4.2) | 74.9 (3.5) | 72.7 (4.0) |
| CC1/2 correlation of semi-datasets | 72.2% (4.1%) | 87.2% (42.7%) | 92.1% (14.6%) | 90.2% (34.0%) | 92.8% (16.0%) |
| R1/2 intensity agreement of semi-datasets | 33.9% (95.2%) | 32.0% (89.7%) | 26.7% (69.6%) | 29.3% (89.7%) | 26.7% (78.0%) |
| CCiso versus 4ow3 (based on intensities) | 86.8% (18.1%) | 94.7% (40.0%) | 95.1% (23.3%) | 94.8% (42.1%) | 95.2% (30.1%) |
| Riso versus 4ow3 (based on intensities) | 23.6% (79.0%) | 18.0% (73.8%) | 17.7% (63.8%) | 23.4% (76.1%) | 22.5% (69.3%) |
| Structure factor quality tests | | | | | |
| Wilson B-factor (Å²) | 12.2 | 17.2 | 18.3 | 17.7 | 20.6 |
| 〈I²〉/〈I〉² (theoretical 2.0) | 1.293 | 1.518 | 1.471 | 1.697 | 1.628 |
| 〈\|L\|〉 (acentric theoretical = 0.5) | 0.302 | 0.376 | 0.366 | 0.425 | 0.412 |
| 〈L²〉 (acentric theoretical = 0.333) | 0.137 | 0.202 | 0.193 | 0.252 | 0.238 |
| N(Z) maximum deviation (acentric) | 0.201 | 0.121 | 0.133 | 0.071 | 0.082 |
| N(Z) maximum deviation (centric) | 0.271 | 0.198 | 0.213 | 0.112 | 0.147 |
| Quality of refined structure | | | | | |
| Refinement resolution range (Å) | 51–2.2 (2.34–2.2) | 51–2.2 (2.34–2.2) | 51–2.2 (2.34–2.2) | 51–2.2 (2.34–2.2) | 51–2.2 (2.34–2.2) |
| Rwork | 24.5% (35.2%) | 20.8% (33.5%) | 21.2% (32.0%) | 20.6% (36.3%) | 19.5% (33.0%) |
| Rfree | 29.6% (39.9%) | 26.3% (44.0%) | 26.0% (39.0%) | 24.1% (45.8%) | 24.3% (42.0%) |
| Zn2+ anomalous-difference peak height | 2.9σ | 5.8σ | 7.2σ | 7.4σ | 8.7σ |
| Molprobity clashscore (Chen et al., 2010[Chen, V. B., Arendall, W. B. III, Headd, J. J., Keedy, D. A., Immormino, R. M., Kapral, G. J., Murray, L. W., Richardson, J. S. & Richardson, D. C. (2010). Acta Cryst. D66, 12-21.]) | 8.41 | 2.16 | 3.23 | 1.08 | 0.86 |
| Protein atom B-factors (Å²) | 15.6 | 18.0 | 20.4 | 19.1 | 21.3 |
| Solvent atom B-factors (Å²) | 23.3 | 28.8 | 29.7 | 27.9 | 30.5 |
| Number of autobuilt water molecules | 311 | 295 | 248 | 236 | 232 |
| Overall/local map C.C. to 2tlx model | 77.0%/81.3% | 81.3%/84.4% | 82.0%/85.0% | 82.5%/85.2% | 83.2%/85.7% |
| Automated model building after MR-SAD | | | | | |
| No. of mainchain/sidechain (of total 316) | 310/299 | 309/309 | 309/309 | 312/305 | 312/306 |
| Rwork/Rfree | 24.0%/28.8% | 23.7%/28.2% | 22.6%/26.2% | 23.0%/26.5% | 22.6%/26.2% |
†For the thermolysin data analysis, candidate Bragg spots were chosen with a minimum spot area of 2 square pixels.

3. Results and discussion

3.1. Shot-to-shot heterogeneity is intrinsic to the data

Heterogeneity in XFEL-based serial crystallographic images is a necessary consequence of physical properties such as the mosaic structure that varies among crystals, and pulse-to-pulse differences in incident flux and the volume of crystal/beam intersection. Shot-to-shot variation in the limiting resolution has been previously noted for microcrystal populations of two proteins: PSII and thermolysin (Kern et al., 2013[Kern, J. et al. (2013). Science, 340, 491-495.], 2014[Kern, J. et al. (2014). Nat. Commun. 5, 4371.]; Hattne et al., 2014[Hattne, J. et al. (2014). Nat. Methods, 11, 545-548.]). These reports, using data processed with cctbx.xfel, were based on the examination of Wilson plots (integrated Bragg spot intensity versus diffraction angle bin), to identify the limiting resolution where average intensity falls below the average counting-statistics noise. This analysis, however, does not convey whether the resolution limits are determined by actual falloff of the recorded spot intensities or by artifacts produced by the integration algorithm.

Fig. 3(a) confirms that the resolution limit variation is indeed intrinsic to the recorded data. The horizontal axis (scatter plot and histogram) reports the distribution of resolution limits judged by a spotpicking algorithm (Zhang et al., 2006[Zhang, Z., Sauter, N. K., van den Bedem, H., Snell, G. & Deacon, A. M. (2006). J. Appl. Cryst. 39, 112-119.]). After removal of untrusted pixels and subtraction of local background, the signal is judged by whether the intensity exceeds local variance by a given threshold; the resulting population of Bragg spot candidates is then plotted as a function of diffraction angle and a uniform cutoff criterion applied over the whole data set. Resolution cutoffs determined this way are therefore independent of all the ensuing data processing details such as indexing (discovery of basis vector candidates), choice of basis vectors to form the unit cell, model refinement, application of symmetry constraints, and choice of algorithms for spot prediction and signal integration.

Figure 3
Resolution limits and positional accuracy of the thermolysin integration model. (a) Limiting resolution for 1000 randomly selected shots from runs 21–27 of the L498 experiment, collected at a sample-to-detector distance of 171.0 mm, and thus restricted to 2.6 Å at the detector edge, and 2.05 Å in the detector corners. Data for the strongest-diffracting samples are therefore limited by a sharp cutoff due to detector geometry rather than the intrinsic sample diffraction. Horizontal axis: limits based on bright spots picked by a spotfinding algorithm (Zhang et al., 2006[Zhang, Z., Sauter, N. K., van den Bedem, H., Snell, G. & Deacon, A. M. (2006). J. Appl. Cryst. 39, 112-119.]); blue bars represent a histogram of resolution limits determined with 'method 2' from that paper. Vertical axis: limits based on a Wilson plot of the integrated intensities. (b) Displacement (in pixels) between Bragg spot positions predicted by the lattice model used for integration, and the center of mass positions actually measured for bright spotfinder-picked spots. Blue traces: displacement for 20 randomly selected shots, with bright spots from each shot grouped into resolution bins; black dots identify the highest-resolution bin for each individual shot. Red curve: aggregate displacement over the 1000 images analyzed in panel (a).

However, once the integrated intensities are analyzed with a Wilson plot, the resolution cutoffs of the integrated data [Fig. 3(a)[link], vertical axis] are well correlated with those determined simply on the basis of spotpicking [correlation coefficient r = 89% for Fig. 3(a)[link]], ruling out any distortion arising from data processing. Indeed, the lattice model used for data integration can be used to push beyond the limits of the spotfinder to some extent: for two-thirds of the images plotted (Fig. 3a[link]), the cctbx.xfel integration limit is above the diagonal, and therefore the model is finding signal that is missed by straightforward spotpicking.

Given this successful result, why not simply ignore the resolution cutoffs altogether, use the lattice model to predict spot positions out to the corner of the detector, and thereby take full advantage of the weak measurements when ultimately the duplicate measurements are merged over the whole data set to produce structure factors? Indeed, it is widely recognized (e.g. Weiss, 2001[Weiss, M. S. (2001). J. Appl. Cryst. 34, 130-135.]) that high-quality reduced data can be obtained by merging numerous multiplicitous measurements. The argument against this proposition is that it supposes that the error model for the weak high-resolution spots is well characterized and suitably random, which is a requirement for merging data (Borek et al., 2003[Borek, D., Minor, W. & Otwinowski, Z. (2003). Acta Cryst. D59, 2031-2038.]). The following sections (§§3.2[link]–3.3[link]) show that there are large non-random systematic uncertainties in the model. Moreover, while the spotpicking Bragg candidates offer a built-in validation of the model, the uncertainties are poorly characterized at the highest resolutions that are beyond the spotpicking limit.

3.2. The positional accuracy of the model is resolution-dependent

The CSPAD imaging detector at LCLS (Hart et al., 2012[Hart, P. et al. (2012). Proc. SPIE, 8504, 85040C.]) is designed to fulfil stringent requirements: signal is integrated over a 50 fs X-ray pulse, readout is performed at the pulse repetition rate of 120 Hz, and operation is in vacuum. A large detection area is achieved by tiling 32 rectangular silicon sensors; however, this geometrical arrangement also creates the problem of knowing the sensors' relative positions (metrology) to subpixel accuracy. As reported earlier (Hattne et al., 2014[Hattne, J. et al. (2014). Nat. Methods, 11, 545-548.]), cctbx.xfel can determine or validate the tile displacements to within 0.1 pixel. It compares the positions of Bragg spots observed by the spotfinder with those predicted by the lattice model, and performs iterative non-linear least-squares parameter refinement over tile positions and lattice model parameters. Once the tile positions (and rotations) have been corrected, one can investigate the residual displacement errors of the bright Bragg spots (Fig. 3b[link]).

Fig. 3(b)[link] indicates that the positional error of the model increases at higher resolutions; this is evident both for individual images (blue traces), and for aggregate positional errors over the whole dataset (red curve). While positional uncertainty is quite manageable at 10 Å d-spacings (0.3 pixels), it becomes problematic (1.0 pixel) at 2.7 Å. Several factors may combine to cause this effect. Firstly, there is a positional error, potentially 1 pixel or greater, due to the assumption of monochromaticity. In reality the X-ray pulses at LCLS have a stochastic spectrum with ∼0.5% bandpass (Emma et al., 2010[Emma, P. et al. (2010). Nat. Photon. 4, 641-647.]). Ideally the model could be augmented with a spot prediction algorithm that determines the wavelength range satisfying Bragg's law separately for each reflection (Hattne et al., 2014[Hattne, J. et al. (2014). Nat. Methods, 11, 545-548.]), thus taking the bandpass into account when predicting the 2θ diffraction angle. Secondly, the thickness of the silicon sensor (0.5 mm for the CSPAD) introduces a differential parallax effect, which again is potentially correctable (Hülsen et al., 2005[Hülsen, G., Brönnimann, C. & Eikenberry, E. F. (2005). Nucl. Instrum. Methods Phys. Res. A, 548, 540-554.]). These phenomena affect spots' radial positions, and indeed we observe that the radial displacement is the largest component of the positional error (data not shown).
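
The size of the first effect can be estimated directly from Bragg's law. The following Python sketch computes the radial extent over which a single reflection is spread on a flat detector when the wavelength varies across the bandpass. The detector distance (171.0 mm) and bandpass (0.5%) are taken from the text; the wavelength of 1.3 Å and the pixel size of 0.11 mm are assumptions made for illustration, as is the simplification of a flat detector normal to the beam.

```python
import math

def radial_position_mm(d_spacing, wavelength, distance_mm):
    """Radial distance (mm) of a Bragg spot from the beam center on a flat
    detector normal to the beam, for a given d-spacing and wavelength (Å)."""
    theta = math.asin(wavelength / (2.0 * d_spacing))   # Bragg's law
    return distance_mm * math.tan(2.0 * theta)

def bandpass_spread_pixels(d_spacing, wavelength, bandpass, distance_mm, pixel_mm):
    """Full radial spread (pixels) of one reflection across the bandpass."""
    lo = radial_position_mm(d_spacing, wavelength * (1.0 - bandpass / 2.0), distance_mm)
    hi = radial_position_mm(d_spacing, wavelength * (1.0 + bandpass / 2.0), distance_mm)
    return (hi - lo) / pixel_mm

DISTANCE_MM = 171.0   # sample-to-detector distance quoted for Fig. 3
BANDPASS = 0.005      # ~0.5% bandpass quoted in the text
WAVELENGTH = 1.3      # Å; assumed value, not stated in this section
PIXEL_MM = 0.11       # assumed pixel size, not stated in this section

spread_low = bandpass_spread_pixels(10.0, WAVELENGTH, BANDPASS, DISTANCE_MM, PIXEL_MM)
spread_high = bandpass_spread_pixels(2.7, WAVELENGTH, BANDPASS, DISTANCE_MM, PIXEL_MM)
print(f"radial spread at 10 Å:  {spread_low:.2f} pixels")
print(f"radial spread at 2.7 Å: {spread_high:.2f} pixels")
```

The value returned is the full width spanned by the bandpass, and so is an upper bound on the shift of the spot centroid for any single pulse; under these assumptions it grows with resolution, in the same direction as the trend in Fig. 3(b).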

Regardless of the cause of the positional displacement shown in Fig. 3(b)[link], values exceeding 1 pixel could significantly degrade the intensities, considering that a typical spot area is 5 square pixels (Hattne et al., 2014[Hattne, J. et al. (2014). Nat. Methods, 11, 545-548.]), and in view of cctbx.xfel's practice of constructing tightly conforming integration masks based on nearby bright spots. Rather than explicitly determining the uncertainty for each modeled spot at high resolution, cctbx.xfel currently takes the easier route of using the falloff of the Wilson plot as a proxy for uncertainty, and simply cutting off the integrated intensities past the apparent resolution limit. Other approaches to downweighting outlier data may be possible; for example, one of the 20 lattices plotted in Fig. 3(b)[link] has positional displacements exceeding 2 pixels, which should probably disqualify it from the subsequent data merging process. Filtering individual lattices based on positional displacement rather than I/σ(I) falloff might offer a way to preserve weak high-resolution signal in the final merged intensities.
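
The lattice-filtering idea suggested above could be prototyped as follows. This Python sketch is hypothetical and is not part of cctbx.xfel: the function names, the dictionary layout and the example coordinates are invented for illustration, and the 2 pixel threshold is simply the value mentioned in the text.

```python
import numpy as np

def rms_displacement(observed_xy, predicted_xy):
    """R.m.s. distance (pixels) between observed and predicted spot centers."""
    diff = np.asarray(observed_xy, dtype=float) - np.asarray(predicted_xy, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))

def select_lattices(lattices, max_rms_pixels=2.0):
    """Keep the names of lattices whose r.m.s. displacement is within the limit.
    `lattices` maps a lattice name to a pair (observed_xy, predicted_xy)."""
    return [name for name, (obs, pred) in lattices.items()
            if rms_displacement(obs, pred) <= max_rms_pixels]

# Hypothetical bright-spot positions (pixels) for two lattices.
lattices = {
    "good": ([[100.0, 200.0], [300.2, 150.1]], [[100.3, 200.0], [300.0, 150.4]]),
    "bad":  ([[100.0, 200.0], [300.0, 150.0]], [[102.5, 200.0], [300.0, 152.5]]),
}
kept = select_lattices(lattices)
print(kept)
```

A practical version would probably evaluate the displacement per resolution bin, as in Fig. 3(b), rather than as a single number per lattice.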

3.3. Resolution-dependent model quality, due to misorientation, affects map features

An inherent concern with still shots is that the orientation of the crystal is not uniquely determined by measuring the Bragg spot positions. Only one of the three rotational degrees of freedom is directly coupled to spot positions, namely the rotation Rz around the axis parallel to the beam. The other two rotations (Rxy) move reciprocal lattice points in and out of the reflecting condition, but do not change the direction of the diffracted rays. It has been possible to improve the outcome by placing an additional restraint on the refinement of the orientational model. Specifically, one can rotate the model lattice, while minimizing the deviations between the observed reciprocal lattice points and the Ewald sphere, with deviations being expressed either as reciprocal space distances (Kabsch, 2014[Kabsch, W. (2014). Acta Cryst. D70, 2204-2216.]) or rotational angles (Sauter et al., 2014[Sauter, N. K., Hattne, J., Brewster, A. S., Echols, N., Zwart, P. H. & Adams, P. D. (2014). Acta Cryst. D70, 3299-3309.]).
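
The different roles of Rz and Rxy can be checked numerically. The Python sketch below places a reciprocal lattice point exactly on the Ewald sphere and then rotates it by 0.3° about the beam axis and about a perpendicular axis. The coordinate convention (beam along +z, sphere centered at -s0), the wavelength and the lattice point are assumptions chosen for illustration.

```python
import numpy as np

def ewald_distance(q, wavelength):
    """Signed distance (Å^-1) of reciprocal lattice point q from the Ewald
    sphere. Assumed convention: incident beam s0 along +z with |s0| = 1/λ,
    so the sphere is centered at -s0 and passes through the origin."""
    s0 = np.array([0.0, 0.0, 1.0 / wavelength])
    return float(np.linalg.norm(np.asarray(q, dtype=float) + s0) - 1.0 / wavelength)

def rotation_z(angle_deg):
    """Rotation about the beam axis (Rz)."""
    a = np.radians(angle_deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

def rotation_x(angle_deg):
    """Rotation about an axis perpendicular to the beam (one component of Rxy)."""
    a = np.radians(angle_deg)
    return np.array([[1.0, 0.0,        0.0],
                     [0.0, np.cos(a), -np.sin(a)],
                     [0.0, np.sin(a),  np.cos(a)]])

wavelength = 1.3                      # Å, assumed
k = 1.0 / wavelength
qx, qy = 0.2, 0.1                     # Å^-1, hypothetical lattice point
q = np.array([qx, qy, np.sqrt(k**2 - qx**2 - qy**2) - k])   # exactly on the sphere

r0 = ewald_distance(q, wavelength)
r_z = ewald_distance(rotation_z(0.3) @ q, wavelength)
r_x = ewald_distance(rotation_x(0.3) @ q, wavelength)
print(f"on sphere: {r0:.2e}, after Rz: {r_z:.2e}, after Rx: {r_x:.2e}")
```

Rotation about the beam axis leaves the point on the sphere (its effect is instead seen as a change in spot position on the detector), whereas the perpendicular rotation moves the point off the sphere; this distance is the quantity that the additional restraint seeks to minimize.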

The effect of these restraints can be directly gauged by considering simulated diffraction data. Fig. 4[link] shows that, for 1000 simulated 3.3 Å PSI diffraction patterns in random orientations, lattice models refined against spot positions alone have residual Rxy misorientations up to 0.3° (Fig. 4a[link]); while applying the angular restraint brings most Rxy misorientations to below 0.05° (Fig. 4b[link]). A misoriented lattice model can have a dramatic effect on spot predictions (Fig. 4a[link]). Improperly oriented model lattices place the observed reciprocal lattice points far away from the Ewald sphere, thus the mosaicity parameter must be adjusted upward so that the predicted spot pattern can cover all the observations. As illustrated in Fig. 4(a)[link], this has the unwanted effect of creating false predictions for numerous spots that are not actually recorded in the image. Furthermore, previous work (Sauter et al., 2014[Sauter, N. K., Hattne, J., Brewster, A. S., Echols, N., Zwart, P. H. & Adams, P. D. (2014). Acta Cryst. D70, 3299-3309.]) with the PSI simulation shows that the fraction of spots predicted falsely increases with resolution. In parallel, if the experimental thermolysin data are processed with a protocol that omits the angular restraint and thus is believed to allow numerous false spot predictions, the ability to distinguish the Zn2+ anomalous difference Fourier peak is markedly decreased (Table 1[link], compare protocols 4 and 6). All these results provide further argument for cutting off integrated intensities at the resolution suggested by the Wilson plot, thereby guarding against the chance that any given measurement is errantly modeled due to lattice misorientation.

Figure 4
Bragg spot predictions are more accurate when the orientational model is refined against Ewald sphere distance. Two protocols are evaluated: (a) refinement of indexed spots against observed positions only, and (b) also refining the model against the angular deviation of the reciprocal lattice point from the Ewald sphere, corresponding to protocols 4 and 6 of Sauter et al. (2014[Sauter, N. K., Hattne, J., Brewster, A. S., Echols, N., Zwart, P. H. & Adams, P. D. (2014). Acta Cryst. D70, 3299-3309.]), respectively. Plots represent a random sampling of processing results for simulated PSI data, in which the modeled orientation can be compared against the known true orientation from the simulation. Horizontal axis: residual misorientation angle Rxy after removal of the small misorientation Rz along the axis parallel to the beam direction (r.m.s. Rz misorientation is 0.017° for both panels). Vertical axis: fraction of Bragg spots predicted by the model but not present in the simulated data (blue), and fraction of Bragg spots in the simulation that are not modeled (red).

3.4. Direct test of the I/σ(I) cutoff

The preceding two sections raise cautions about the systematic errors present in high resolution data. Accordingly, the program cctbx.xfel has been implemented with the option of applying a separate resolution cutoff to individual lattices, reasoning that, for the highest-resolution bins where I/σ(I) falls below a particular threshold, the data integration model has probably diverged too much for the intensities to be useful (Hattne et al., 2014[Hattne, J. et al. (2014). Nat. Methods, 11, 545-548.]). However, as recent literature has highlighted the pitfalls of discarding data (Karplus & Diederichs, 2012[Karplus, P. A. & Diederichs, K. (2012). Science, 336, 1030-1033.]; Diederichs & Karplus, 2013[Diederichs, K. & Karplus, P. A. (2013). Acta Cryst. D69, 1215-1222.]), Table 1[link] presents a direct comparison between thermolysin data processed with an I/σ(I)-dependent cutoff (protocols 6 and 7POST) and data processed with a fixed cutoff of 2.2 Å (6F and 7F,POST). As expected, the inclusion of more weak high-resolution data dramatically increases the average multiplicity of observations, as well as increasing the fraction of negative observations due to the poor quality of the high resolution models. Notably, however, the inclusion of more data also increases the height of the Zn2+ anomalous difference Fourier peak, suggesting that there is value in preserving the high resolution information. As noted in §3.2[link], it is worth developing alternative methods that would include more data, but yet still account for the known systematic errors such as positional displacement.

3.5. Modeling the partiality

Even with utmost care given to choose data based on the significance level of the signal, a large inherent uncertainty remains with all still-shot data, due to the partial nature of the recorded intensities. This uncertainty is not present for rotation photography, where well established methods exist (Rossmann et al., 1979[Rossmann, M. G., Leslie, A. G. W., Abdel-Meguid, S. S. & Tsukihara, T. (1979). J. Appl. Cryst. 12, 570-581.]; Winkler et al., 1979[Winkler, F. K., Schutt, C. E. & Harrison, S. C. (1979). Acta Cryst. A35, 901-911.]) to quantify the spot partiality based on the volume of the reciprocal lattice point (rlpQ) swept up by the Ewald sphere due to rotation. However, this is not a useful measure for still shots where, in the absence of rotation, the swept-up volume due to rotation is always zero.

Two factors are directly relevant when considering spot partiality on still shots. First, due to crystal imperfection (Nave, 1998[Nave, C. (1998). Acta Cryst. D54, 848-853.]; Bellamy et al., 2000[Bellamy, H. D., Snell, E. H., Lovelace, J., Pokross, M. & Borgstahl, G. E. O. (2000). Acta Cryst. D56, 986-995.]; Helliwell, 2005[Helliwell, J. R. (2005). Acta Cryst. D61, 793-798.]), the reciprocal lattice point itself is spread out into a finite volume, therefore it has a finite intersection area with the Ewald sphere, even though the swept-up volume is zero. Secondly, due to the dispersion and divergence of the beam, one must consider a family of Ewald spheres of different radii (to account for dispersion; Hattne et al., 2014[Hattne, J. et al. (2014). Nat. Methods, 11, 545-548.]) and radius vector direction (to account for divergence). This Ewald sphere degeneracy does sweep out a volume of the reciprocal lattice point, as has been discussed (White, 2014[White, T. A. (2014). Philos. Trans. R. Soc. London B, 369, 20130330.]). In this paper, the focus is exclusively on the component of partiality due to crystal imperfection, as it seems a reasonable starting point. Many still datasets are acquired on endstations with negligible divergence, such as the LCLS/CXI 1 µm focus. While beam dispersion has been large (∼0.5%) for many XFEL datasets (Emma et al., 2010[Emma, P. et al. (2010). Nat. Photon. 4, 641-647.]), it is also possible to acquire stills at synchrotron sources where the energy bandpass [\Delta{E}/E] is potentially less than [10^{-4}], and it is now possible to create seeded XFEL beams with similarly narrow bandpasses (Amann et al., 2012[Amann, J. et al. (2012). Nat. Photon. 6, 693-698.]).

Therefore, a correction for partiality based on a monochromatic zero-divergence model is described in equation (6)[link]. Partiality is related to the finite width of the reciprocal lattice spot, due to the underlying mosaic disorder in the crystal that is modeled by two parameters: an effective mosaic block size Deff and an effective full-width mosaic angular spread [\eta_{\rm{eff}}]. Intensity measurements are corrected for partiality in combination with scaling and postrefinement [equations (8)[link] and (9)[link]]. Despite the simple assumption of monochromaticity, this treatment notably improves the XFEL thermolysin data, which were collected with a non-monochromatic source (Table 1[link], compare protocols 6 and 7POST). The multiplicity of observation decreases, due to many reciprocal lattice points being classified as lying too far from the Ewald sphere, thus discarding a set of measurements that have no signal. The crystallographic R-factors improve, and the significance level of the anomalous difference Fourier peak for the Zn increases from 5.8σ to 7.4σ. These effects depend on performing postrefinement [equation (9)[link]] to determine the optimal crystal orientation for calculating partiality; no improvement is observed unless the partiality correction is combined with postrefinement (data not shown).

Statistics indicating the quality of the merged structure factors (Padilla & Yeates, 2003[Padilla, J. E. & Yeates, T. O. (2003). Acta Cryst. D59, 1124-1130.]) also show that the partiality correction (with postrefinement) alters the intensity distribution so as to conform better with theoretical expectation (Table 1[link] and Fig. 5[link]). Synchrotron datasets have long been judged by their structure factor intensity distributions (Wilson, 1949[Wilson, A. J. C. (1949). Acta Cryst. 2, 318-321.]; French & Wilson, 1978[French, S. & Wilson, K. (1978). Acta Cryst. A34, 517-525.]; Stein, 2007[Stein, N. (2007). CCP4 Newsl. 47, contribution 9.]). It would be useful if such metrics could also be applied to judge the quality of XFEL data. However, the present comparison shows that distributions of the L and Z statistics (defined in Fig. 5[link]) are highly dependent on the data processing procedures, and that, while accounting for partiality helps, the optimal protocol has not yet been achieved. One straightforward avenue for improvement would be to incorporate known spectral dispersion information into the partiality calculation. XFEL pulses, in particular the self-amplified spontaneous emission (SASE) pulses in ordinary use, have complex and stochastic spectra, but it has been possible to measure these spectra on a shot-by-shot basis (Zhu et al., 2012[Zhu, D., Cammarata, M., Feldkamp, J. M., Fritz, D. M., Hastings, J. B., Lee, S., Lemke, H. T., Robert, A., Turner, J. L. & Feng, Y. (2012). Appl. Phys. Lett. 101, 034103.]). For future datasets where the incident spectra [I_0(E)] are routinely available, one could perform a weighted summation over the entire bandpass to obtain the polychromatic partiality,

[P_{\rm{polychromatic}}= {{ \textstyle\sum{I}_0(E)\,P\left(r_{h,E}\right)\Delta{E} }\over{ \textstyle\sum{I}_0(E)\,\Delta{E} }},\eqno(10)]

where the summations are performed over all energy increments [\Delta{E}] within the measured spectrum, and the functional dependence of [P(r_{h,E})] is explicitly stated to emphasize that the Ewald-sphere distances [r_h] are dependent on energy. Spectral measurements are not available for the thermolysin data presented here; however, other datasets that are linked to spectral information are under investigation.
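
Equation (10) is a spectrum-weighted average and can be sketched in Python as follows. The monochromatic partiality of equation (6) is not reproduced here; `placeholder_partiality` is an illustrative stand-in with a simple peaked form, and the Gaussian spectrum, the reciprocal lattice point and the width parameter `r_s` are likewise hypothetical. The Ewald-sphere geometry assumes the beam lies along +z.

```python
import numpy as np

HC_EV_ANGSTROM = 12398.42   # photon energy (eV) times wavelength (Å)

def ewald_distance(q, energy_ev):
    """Distance (Å^-1) of reciprocal lattice point q from the Ewald sphere
    at a given photon energy, with the beam assumed to lie along +z."""
    k = energy_ev / HC_EV_ANGSTROM            # 1/λ
    return float(np.linalg.norm(np.asarray(q, dtype=float) + np.array([0.0, 0.0, k])) - k)

def placeholder_partiality(r_h, r_s):
    """Illustrative stand-in for P(r_h); NOT equation (6) of the paper."""
    return r_s**2 / (r_s**2 + r_h**2)

def polychromatic_partiality(q, energies_ev, spectrum, partiality, r_s):
    """Weighted sum of equation (10) over a sampled spectrum I0(E)."""
    energies = np.asarray(energies_ev, dtype=float)
    i0 = np.asarray(spectrum, dtype=float)
    d_e = np.gradient(energies)               # energy increment ΔE at each sample
    p = np.array([partiality(ewald_distance(q, e), r_s) for e in energies])
    return float(np.sum(i0 * p * d_e) / np.sum(i0 * d_e))

# Hypothetical example: a lattice point exactly on the sphere at the central
# energy, and a Gaussian spectrum with 0.5% full width at half-maximum.
e0 = 9500.0                                   # eV, assumed
k0 = e0 / HC_EV_ANGSTROM
q = np.array([0.2, 0.1, np.sqrt(k0**2 - 0.2**2 - 0.1**2) - k0])
r_s = 1.0e-4                                  # Å^-1, assumed width parameter
energies = np.linspace(e0 - 100.0, e0 + 100.0, 401)
spectrum = np.exp(-0.5 * ((energies - e0) / (0.005 * e0 / 2.355))**2)

p_mono = placeholder_partiality(ewald_distance(q, e0), r_s)
p_poly = polychromatic_partiality(q, energies, spectrum, placeholder_partiality, r_s)
print(f"monochromatic: {p_mono:.3f}, polychromatic: {p_poly:.3f}")
```

For a measured SASE spectrum the Gaussian would be replaced by the recorded shot-by-shot values of the incident spectrum, sampled at the spectrometer's energy increments.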

Figure 5
Data quality statistics for the merged structure factor intensities from thermolysin. (a, b, c) Cumulative distribution function N(L) of the local statistic: [L = (I_1-I_2)/(I_1+I_2)], where [I_1] and [I_2] are unrelated intensities (Padilla & Yeates, 2003[Padilla, J. E. & Yeates, T. O. (2003). Acta Cryst. D59, 1124-1130.]). (d, e, f) Cumulative distribution function N(z), where z = I/〈I〉. Identical data were processed with the protocols listed in Table 1[link]: (a, d) protocol 4, lattice model is not restrained against proximity to the Ewald sphere; (b, e) protocols 6 and 6F, proximity restraints are applied, with and without a separate resolution cutoff for each lattice; and (c, f) protocols 7POST and 7F,POST, which are the same as protocols 6 and 6F except that crystal orientation is postrefined to maximize agreement with a set of reference intensities as described in the text. Agreement between the merged intensities (thick lines) and the theoretical distribution (thin lines) demonstrates that such statistics offer useful metrics for evaluating different processing protocols, with the postrefined model giving the best agreement with theoretical expectation.
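
The two statistics plotted in Fig. 5 are straightforward to compute. The Python sketch below evaluates them for synthetic acentric intensities drawn from an ideal exponential (Wilson) distribution, for which the theoretical curves are N(|L|) = |L| and N(z) = 1 - exp(-z). The random pairing of intensities and all function names are assumptions made for illustration; the published statistic pairs reflections that are close together in reciprocal space.

```python
import numpy as np

def l_values(i1, i2):
    """Local statistic L = (I1 - I2)/(I1 + I2) for paired intensities."""
    i1 = np.asarray(i1, dtype=float)
    i2 = np.asarray(i2, dtype=float)
    return (i1 - i2) / (i1 + i2)

def cumulative_abs_l(l, grid):
    """Empirical cumulative distribution N(|L|) evaluated on a grid."""
    abs_l = np.abs(l)
    return np.array([np.mean(abs_l <= x) for x in grid])

def cumulative_z(intensity, grid):
    """Empirical cumulative distribution N(z) with z = I/<I>."""
    z = np.asarray(intensity, dtype=float) / np.mean(intensity)
    return np.array([np.mean(z <= x) for x in grid])

# Synthetic acentric intensities (ideal, untwinned, error-free).
rng = np.random.default_rng(0)
intensity = rng.exponential(1.0, size=20000)
l = l_values(intensity[0::2], intensity[1::2])   # random pairing (assumed)

l_grid = np.linspace(0.0, 1.0, 11)
z_grid = np.linspace(0.0, 3.0, 13)
n_l = cumulative_abs_l(l, l_grid)
n_z = cumulative_z(intensity, z_grid)
mean_abs_l = float(np.mean(np.abs(l)))

print(f"mean |L| = {mean_abs_l:.3f} (theory 0.5)")
print(f"max deviation of N(|L|) from theory: {np.max(np.abs(n_l - l_grid)):.3f}")
print(f"max deviation of N(z) from theory:   {np.max(np.abs(n_z - (1.0 - np.exp(-z_grid)))):.3f}")
```

Applied to merged experimental intensities, the deviations from these theoretical curves provide the kind of protocol-to-protocol comparison shown in the figure.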

4. Conclusions

To the knowledge of the author, this is the first literature presentation of experimentally measured XFEL still-shot diffraction data that are explicitly corrected for partiality (albeit with the simplified assumption of monochromaticity), and modeled with a lattice that is oriented by postrefinement. Equation (6)[link], the expression for still-shot partiality, is similar to equation (40) in a recent paper from Kabsch (2014[Kabsch, W. (2014). Acta Cryst. D70, 2204-2216.]), in that both rest on the assumption of monochromaticity. However, the Kabsch paper does not include the effect of mosaic block size [equation (2)[link]], which makes a resolution-independent contribution to the size of reciprocal lattice points, necessary for optimal modeling of still data (Sauter et al., 2014[Sauter, N. K., Hattne, J., Brewster, A. S., Echols, N., Zwart, P. H. & Adams, P. D. (2014). Acta Cryst. D70, 3299-3309.]) if the block size is small. The equation (6)[link] approach differs substantially from that used by White (2014[White, T. A. (2014). Philos. Trans. R. Soc. London B, 369, 20130330.]), as that paper defines partiality in terms of the fractional immersion of a reciprocal lattice point between two Ewald spheres of different wavelengths, representing the high- and low-energy limits of the XFEL spectrum.

While no attempt is made here to comparatively evaluate these three partiality and postrefinement methodologies, it is clear that, as a general principle, algorithm choices must rely on objective metrics that measure the quality of the result. Examples of data processing quality metrics include the r.m.s. displacement between observed and modeled Bragg spot positions (and its resolution dependence), statistics that rely on the moments of the intensity distribution (Stein, 2007[Stein, N. (2007). CCP4 Newsl. 47, contribution 9.]), local L-statistics (Padilla & Yeates, 2003[Padilla, J. E. & Yeates, T. O. (2003). Acta Cryst. D59, 1124-1130.]), crystallographic R-factors, and the height of anomalous difference Fourier peaks for metal sites.

Thermolysin is an informative case for testing the potential of still-shot crystallography. It is possible to phase the structure with synchrotron data using SAD phasing, from the single Zn metal site (Ferrer et al., 2013[Ferrer, J.-L., Larive, N. A., Bowler, M. W. & Nurizzo, D. (2013). Exp. Opin. Drug. Discov. 8, 835-847.]). However, the best XFEL thermolysin data (giving an 18σ anomalous difference Fourier peak out to 1.8 Å resolution) fall short of the phasing power needed for a SAD structure solution (Kern et al., 2014[Kern, J. et al. (2014). Nat. Commun. 5, 4371.]). Only a single SAD-phased XFEL structure has been published (of lysozyme; Barends et al., 2013b[Barends, T. R. M., Foucar, L., Botha, S., Doak, R. B., Shoeman, R. L., Nass, K., Koglin, J. E., Williams, G. J., Boutet, S., Messerschmidt, M. & Schlichting, I. (2013b). Nature (London), 505, 244-247.]), yet the usefulness of XFEL techniques may depend on whether they can be utilized generally to solve new macromolecular structures, and gain high-resolution information on systems that would otherwise be damaged at synchrotron sources. Data processing strategies that help correct specific issues such as partial measurements and the heterogeneous distribution of resolution limits will hopefully lead to more favorable structural outcomes.

5. Software availability

The partiality correction and postrefinement procedures described here are incorporated into cctbx.xfel (https://cci.lbl.gov/xfel) and are available as a command line option in the cxi.merge program component.


Footnotes

1The earlier paper (Sauter et al., 2014[Sauter, N. K., Hattne, J., Brewster, A. S., Echols, N., Zwart, P. H. & Adams, P. D. (2014). Acta Cryst. D70, 3299-3309.]) treats [\eta_{\rm{eff}}] as a rotational spread about the origin O of reciprocal space, causing rlpQ to be shaped as a spherical cap of radius 1/d and maximum half-angle [\eta_{\rm{eff}}/2]. Here, in contrast, equation (3)[link] implies that rlpQ is shaped like a ball, not a spherical cap. This is chosen only to simplify the derivation of partiality, not to indicate a preference for a particular physical model (Nave, 1998[Nave, C. (1998). Acta Cryst. D54, 848-853.]; Juers et al., 2007[Juers, D. H., Lovelace, J., Bellamy, H. D., Snell, E. H., Matthews, B. W. & Borgstahl, G. E. O. (2007). Acta Cryst. D63, 1139-1153.]) to describe the crystal.

2Scaling with a set of reference intensities has been used in more traditional crystallographic settings to extract weak signal from long wavelength anomalous diffraction experiments (Mueller-Dieckmann et al., 2004[Mueller-Dieckmann, C., Polentarutti, M., Djinovic Carugo, K., Panjikar, S., Tucker, P. A. & Weiss, M. S. (2004). Acta Cryst. D60, 28-38.]). Moreover, it can be shown with XFEL data that a scaling reference introduces no intensity bias, by scaling XFEL lysozyme measurements (CXIDB accession ID 17) against an isomorphous lysozyme structure containing the alanine truncation mutant E35A (PDB entry 3ok0). After molecular replacement (with the correct structure 4et8 used as the search model) followed by refinement, the likelihood-weighted electron density map shows perfectly normal signal for glutamic acid 35, proving that the false scaling model does not distort the information content of the experimental intensities. Furthermore, if the 3ok0 structure is used for phasing instead of 4et8, one sees positive difference density for the glutamate side chain at 4 standard deviations, indicating that the intensities contain sufficient signal to overcome the phase bias introduced by the incorrect 3ok0 phasing model.

3Supporting information for this paper is available from the IUCr electronic archives (Reference: XH5046).

Acknowledgements

I thank James M. Holton (Lawrence Berkeley National Laboratory) for making available both the PSI simulated data and the program fastBragg (https://bl831.als.lbl.gov/~jamesh/fastBragg), and for suggesting the functional form of the partiality correction, Peter Zwart and Paul Adams (LBNL) for technical discussions, and Helen Ginn and David Stuart (Oxford University), as well as Monarin Uervirojnangkoorn, William Weis and Axel Brunger (Stanford University) for discussing their separate work on partiality and postrefinement. This work was supported by NIH grants GM095887 and GM102520 and Director, Office of Science, Department of Energy (DOE) under contract DE-AC02-05CH11231 for data processing methods (NKS).

References

First citationAdams, P. D., Afonine, P. V., Bunkóczi, G., Chen, V. B., Davis, I. W., Echols, N., Headd, J. J., Hung, L.-W., Kapral, G. J., Grosse-Kunstleve, R. W., McCoy, A. J., Moriarty, N. W., Oeffner, R., Read, R. J., Richardson, D. C., Richardson, J. S., Terwilliger, T. C. & Zwart, P. H. (2010). Acta Cryst. D66, 213–221.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationAfonine, P. V., Grosse-Kunstleve, R. W., Echols, N., Headd, J. J., Moriarty, N. W., Mustyakimov, M., Terwilliger, T. C., Urzhumtsev, A., Zwart, P. H. & Adams, P. D. (2012). Acta Cryst. D68, 352–367.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationAlonso-Mori, R. et al. (2012). Proc. Natl Acad. Sci. USA, 109, 19103–19107.  Web of Science CAS PubMed Google Scholar
First citationAmann, J. et al. (2012). Nat. Photon. 6, 693–698.  Web of Science CrossRef CAS Google Scholar
First citationBarends, T. R. M. et al. (2013a). Acta Cryst. D69, 838–842.  Web of Science CrossRef IUCr Journals Google Scholar
First citationBarends, T. R. M., Foucar, L., Botha, S., Doak, R. B., Shoeman, R. L., Nass, K., Koglin, J. E., Williams, G. J., Boutet, S., Messerschmidt, M. & Schlichting, I. (2013b). Nature (London), 505, 244–247.  Web of Science CrossRef PubMed Google Scholar
First citationBellamy, H. D., Snell, E. H., Lovelace, J., Pokross, M. & Borgstahl, G. E. O. (2000). Acta Cryst. D56, 986–995.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationBolotovsky, R., Steller, I. & Rossmann, M. G. (1998). J. Appl. Cryst. 31, 708–717.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationBorek, D., Minor, W. & Otwinowski, Z. (2003). Acta Cryst. D59, 2031–2038.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationBoutet, S. & Williams, G. J. (2010). New J. Phys. 12, 035024.  Web of Science CrossRef Google Scholar
First citationBoutet, S. et al. (2012). Science, 337, 362–364.  CrossRef CAS PubMed Google Scholar
First citationBragg, W. L., James, R. W. & Bosanquet, C. H. (1921). Philos. Mag. Ser. 6, 41, 309–337.  Google Scholar
First citationBusing, W. R. & Levy, H. A. (1967). Acta Cryst. 22, 457–464.  CrossRef IUCr Journals Web of Science Google Scholar
First citationChapman, H. N. et al. (2011). Nature (London), 470, 73–77.  Web of Science CrossRef CAS PubMed Google Scholar
First citationChen, V. B., Arendall, W. B. III, Headd, J. J., Keedy, D. A., Immormino, R. M., Kapral, G. J., Murray, L. W., Richardson, J. S. & Richardson, D. C. (2010). Acta Cryst. D66, 12–21.  Web of Science CrossRef IUCr Journals Google Scholar
First citationCohen, A. E. et al. (2014). Proc. Natl Acad. Sci. USA, 111, 17122–17127.  Web of Science CrossRef CAS PubMed Google Scholar
First citationDiederichs, K. & Karplus, P. A. (2013). Acta Cryst. D69, 1215–1222.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationEmma, P. et al. (2010). Nat. Photon. 4, 641–647.  Web of Science CrossRef CAS Google Scholar
First citationEmsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. (2010). Acta Cryst. D66, 486–501.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationFerrer, J.-L., Larive, N. A., Bowler, M. W. & Nurizzo, D. (2013). Exp. Opin. Drug. Discov. 8, 835–847.  Web of Science CrossRef CAS Google Scholar
First citationFox, G. C. & Holmes, K. C. (1966). Acta Cryst. 20, 886–891.  CrossRef CAS IUCr Journals Web of Science Google Scholar
First citationFrench, S. & Wilson, K. (1978). Acta Cryst. A34, 517–525.  CrossRef CAS IUCr Journals Web of Science Google Scholar
First citationHamilton, W. C., Rollett, J. S. & Sparks, R. A. (1965). Acta Cryst. 18, 129–130.  CrossRef IUCr Journals Web of Science Google Scholar
First citationHart, P. et al. (2012). Proc. SPIE, 8504, 85040C.  CrossRef Google Scholar
First citationHattne, J. et al. (2014). Nat. Methods, 11, 545–548.  Web of Science CrossRef CAS PubMed Google Scholar
First citationHelliwell, J. R. (2005). Acta Cryst. D61, 793–798.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationHülsen, G., Brönnimann, C. & Eikenberry, E. F. (2005). Nucl. Instrum. Methods Phys. Res. A, 548, 540–554.  Google Scholar
First citationJuers, D. H., Lovelace, J., Bellamy, H. D., Snell, E. H., Matthews, B. W. & Borgstahl, G. E. O. (2007). Acta Cryst. D63, 1139–1153.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationKabsch, W. (2014). Acta Cryst. D70, 2204–2216.  Web of Science CrossRef IUCr Journals Google Scholar
First citationKarplus, P. A. & Diederichs, K. (2012). Science, 336, 1030–1033.  Web of Science CrossRef CAS PubMed Google Scholar
First citationKern, J. et al. (2012). Proc. Natl Acad. Sci. USA, 109, 9721–9726.  Web of Science CrossRef CAS PubMed Google Scholar
First citationKern, J. et al. (2013). Science, 340, 491–495.  Web of Science CrossRef CAS PubMed Google Scholar
First citationKern, J. et al. (2014). Nat. Commun. 5, 4371.  Web of Science CrossRef PubMed Google Scholar
Kirian, R. A., Wang, X., Weierstall, U., Schmidt, K. E., Spence, J. C. H., Hunter, M., Fromme, P., White, T., Chapman, H. N. & Holton, J. (2010). Opt. Express, 18, 5713–5723.
Kirian, R. A., White, T. A., Holton, J. M., Chapman, H. N., Fromme, P., Barty, A., Lomb, L., Aquila, A., Maia, F. R. N. C., Martin, A. V., Fromme, R., Wang, X., Hunter, M. S., Schmidt, K. E. & Spence, J. C. H. (2011). Acta Cryst. A67, 131–140.
Liu, W. et al. (2013). Science, 342, 1521–1524.
Lomb, L. et al. (2011). Phys. Rev. B, 84, 214111.
McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C. & Read, R. J. (2007). J. Appl. Cryst. 40, 658–674.
Mueller-Dieckmann, C., Polentarutti, M., Djinovic Carugo, K., Panjikar, S., Tucker, P. A. & Weiss, M. S. (2004). Acta Cryst. D60, 28–38.
Nave, C. (1998). Acta Cryst. D54, 848–853.
Nave, C. (2014). J. Synchrotron Rad. 21, 537–546.
Owen, R. L., Rudiño-Piñera, E. & Garman, E. F. (2006). Proc. Natl Acad. Sci. USA, 103, 4912–4917.
Padilla, J. E. & Yeates, T. O. (2003). Acta Cryst. D59, 1124–1130.
Redecke, L. et al. (2013). Science, 339, 227–230.
Rossmann, M. G., Leslie, A. G. W., Abdel-Meguid, S. S. & Tsukihara, T. (1979). J. Appl. Cryst. 12, 570–581.
Sauter, N. K., Grosse-Kunstleve, R. W. & Adams, P. D. (2006). J. Appl. Cryst. 39, 158–168.
Sauter, N. K., Hattne, J., Brewster, A. S., Echols, N., Zwart, P. H. & Adams, P. D. (2014). Acta Cryst. D70, 3299–3309.
Sawaya, M. R. et al. (2014). Proc. Natl Acad. Sci. USA, 111, 12769–12774.
Sierra, R. G. et al. (2012). Acta Cryst. D68, 1584–1587.
Stein, N. (2007). CCP4 Newsl. 47, contribution 9.
Suga, M., Akita, F., Hirata, K., Ueno, G., Murakami, H., Nakajima, Y., Shimizu, T., Yamashita, K., Yamamoto, M., Ago, H. & Shen, J.-R. (2015). Nature (London), 517, 99–103.
Tenboer, J. et al. (2014). Science, 346, 1242–1246.
Terwilliger, T. C., Adams, P. D., Read, R. J., McCoy, A. J., Moriarty, N. W., Grosse-Kunstleve, R. W., Afonine, P. V., Zwart, P. H. & Hung, L.-W. (2009). Acta Cryst. D65, 582–601.
Terwilliger, T. C., Grosse-Kunstleve, R. W., Afonine, P. V., Moriarty, N. W., Zwart, P. H., Hung, L.-W., Read, R. J. & Adams, P. D. (2008). Acta Cryst. D64, 61–69.
Weierstall, U. et al. (2014). Nat. Commun. 5, 3309.
Weiss, M. S. (2001). J. Appl. Cryst. 34, 130–135.
White, T. A. (2014). Philos. Trans. R. Soc. London B, 369, 20130330.
Wilson, A. J. C. (1949). Acta Cryst. 2, 318–321.
Winkler, F. K., Schutt, C. E. & Harrison, S. C. (1979). Acta Cryst. A35, 901–911.
Yano, J., Kern, J., Irrgang, K. D., Latimer, M. J., Bergmann, U., Glatzel, P., Pushkar, Y., Biesiadka, J., Loll, B., Sauer, K., Messinger, J., Zouni, A. & Yachandra, V. K. (2005). Proc. Natl Acad. Sci. USA, 102, 12047–12052.
Zhang, Z., Sauter, N. K., van den Bedem, H., Snell, G. & Deacon, A. M. (2006). J. Appl. Cryst. 39, 112–119.
Zhu, D., Cammarata, M., Feldkamp, J. M., Fritz, D. M., Hastings, J. B., Lee, S., Lemke, H. T., Robert, A., Turner, J. L. & Feng, Y. (2012). Appl. Phys. Lett. 101, 034103.
Zwart, P. H., Grosse-Kunstleve, R. W. & Adams, P. D. (2005). CCP4 Newsl. 43, 26–35.

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
