Acta Cryst. (2014). A70, C338

The use of prior chemical knowledge, such as the bond lengths and bond angles of the constituent blocks of macromolecules and ligands, is an essential part of macromolecular crystal structure analysis. One reliable source of such knowledge is small-molecule databases, in which crystal structures have been analysed against high-resolution, high-quality experimental data. Furthermore, the vast amount of data in small-molecule databases provides comprehensive coverage of flexible chemical environments and enables proper statistical analysis, which avoids a biased representation of those chemical properties. This presentation describes our work on organizing the data from an open-access, daily-updated small-molecule database, the Crystallography Open Database (COD) [1], into a new generation of the CCP4 monomer library (Dictionary), a container of prior chemical knowledge [2]. To describe the specific environment that atoms are in, they are classified into atom types based on local graphs and some basic chemical properties of the atoms. This scheme can be applied to any small-molecule database. The atom types, and the values of bond lengths and bond angles associated with them, are further clustered into a hierarchical tree, and an isomorphism-mapping algorithm is implemented to facilitate fast searching among a large number of atom types (typically several million). This also provides a mechanism for deriving reliable values for the bond lengths and angles of novel ligands. Metal and non-organic atoms are treated differently from organic ones. The original data in COD are curated using several criteria, and further statistical analysis of the derived bond lengths and angles allows reliable chemical information to be extracted from databanks such as COD.
Several software tools are associated with the new dictionary, including tools that 1) generate "ideal" bond lengths and angles for an unknown ligand and 2) generate starting coordinates representing one of the conformations of the ligand under consideration.
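
The idea of classifying atoms by their local graph and then collecting bond-length statistics per pair of atom types can be illustrated with a minimal Python sketch. It is not the authors' typing scheme: the label format (element plus the element and degree of each neighbour), the function names `atom_type` and `bond_length_table`, and the toy bond lengths are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

def atom_type(atom, elements, neighbours):
    """Hypothetical type label: the atom's element plus, for each bonded
    neighbour, that neighbour's element and its number of bonds."""
    shell = sorted(f"{elements[n]}{len(neighbours[n])}" for n in neighbours[atom])
    return f"{elements[atom]}({','.join(shell)})"

def bond_length_table(molecules):
    """Group observed bond lengths by the sorted pair of atom-type labels
    and return {type_pair: (mean_length, sample_count)}."""
    samples = defaultdict(list)
    for elements, bonds in molecules:
        # Build the adjacency list (the "local graph") for this molecule.
        neighbours = defaultdict(list)
        for i, j, _ in bonds:
            neighbours[i].append(j)
            neighbours[j].append(i)
        for i, j, length in bonds:
            key = tuple(sorted((atom_type(i, elements, neighbours),
                                atom_type(j, elements, neighbours))))
            samples[key].append(length)
    return {key: (mean(vals), len(vals)) for key, vals in samples.items()}

# Toy input: two methanol-like fragments. Lengths are illustrative, not COD data.
elements = ["C", "O", "H", "H", "H", "H"]
mol_a = (elements, [(0, 1, 1.42), (0, 2, 1.09), (0, 3, 1.09), (0, 4, 1.09), (1, 5, 0.96)])
mol_b = (elements, [(0, 1, 1.44), (0, 2, 1.10), (0, 3, 1.08), (0, 4, 1.09), (1, 5, 0.97)])
table = bond_length_table([mol_a, mol_b])

for pair, (avg, count) in sorted(table.items()):
    print(pair, round(avg, 3), count)
```

A dictionary keyed on type pairs is only the flat version of the lookup. The hierarchical tree described in the abstract would additionally allow falling back to a coarser type when an exact match has too few observations.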

Acta Cryst. (2014). A70, C1710

The Crystallography Open Database (COD, http://www.crystallography.net/) is, to date, the largest curated open-access collection of small to medium-sized unit cell crystal structures [1,2]. Over 11 years of development, COD has accumulated over a quarter of a million structures from the peer-reviewed press and personal communications. COD has an automated data submission Web site, performs routine automatic quality checks on all incoming structures and is now recommended as a database for crystallographic deposition by several scientific journals. To facilitate automatic use and discoverability of COD data, and to increase the usefulness of the database for chemists, two steps were undertaken. First, COD has been supplemented with software and data from the CrystalEye data aggregator. The new software permits extracting chemical data and presenting them as structural formulae, unique moieties and chemically significant fragments. Second, we have implemented a search for crystal structures by the structural chemical formulae of the target compounds. The search is performed first among 70 000 hand-curated chemical structure descriptors and can be extended to automatically generated descriptors. To facilitate data curation, a new software platform for data review is being developed. All COD structures will be evaluated using statistical distributions of observed geometrical and chemical properties (bond lengths, angles, dihedrals, planarities). The most statistically unusual structures will be forwarded to a COD reviewer Internet forum, where qualified reviewers will be asked whether or not they find the evidence provided for a particular structure convincing. In this way, a set of human review indicators (convincing/unconvincing) will be available along with the match against the bulk of the data (usual/unusual structure). Such indicators would be especially useful for deciding which COD records require special attention and which subsets of COD should be selected for reliable scientific inferences.
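
As a rough illustration of how "statistically unusual" values might be picked out of a distribution of observed geometrical properties, the sketch below flags bond lengths by a modified z-score based on the median and the median absolute deviation. The abstract does not state which statistic COD uses, so the choice of test, the 3.5 threshold, the function name `flag_unusual` and the sample values are all assumptions.

```python
from statistics import median

def flag_unusual(values, threshold=3.5):
    """Return the indices of values whose modified z-score
    (0.6745 * |x - median| / MAD) exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread in the bulk of the data, so the score is undefined.
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Illustrative C-C bond lengths in angstroms (made-up numbers, not COD data).
lengths = [1.52, 1.53, 1.54, 1.53, 1.52, 1.54, 1.53, 1.80]
unusual = flag_unusual(lengths)
print(unusual)
```

A median-based score is used here because a single extreme value inflates the ordinary standard deviation and can thereby mask itself. The flagged indices would correspond to the records forwarded for human review.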

Acta Cryst. (2014). A70, C1736

As computational chemistry methods enjoy unprecedented growth, computer power increases and the price/performance ratio drops, a large number of crystal structures can today be refined and their properties computed using modern theoretical approaches such as DFT, post-HF, QM/MM and MCMM methods. The availability of several open-source codes for computational and quantum chemistry, together with open-access crystallographic databases, enables large-scale computations of material structures and properties. We thus increasingly feel that an open collection of theoretically computed chemical structures would be a valuable resource for the scientific community. To address this need, we have launched the Theoretical Crystallography Open Database (TCOD, http://www.crystallography.net/tcod/). The TCOD aims to collect a comprehensive set of computed crystal structures that will be made available under an Open Data license, and it invites all scientists to deposit their published results or pre-publication data. Accompanied by the large set of experimental structures in the COD database [1], the TCOD opens immediate possibilities for cross-validation of experimental and theoretical data. To ensure the high quality of deposited data, TCOD offers ontologies in the form of CIF [2] dictionaries that describe the parameters of computed chemical and crystal structures, and an automated pipeline that checks each submitted structure against a set of community-specified criteria for convergence, computation quality and reproducibility. The scope of TCOD and its validation tools make it a high-quality, comprehensive theoretical structure database, useful in a broad range of disciplines. First-principles calculations are also of great interest for simulating physical properties, either prospectively or for materials that do not grow as sufficiently large crystals. The computed properties can now be tested against the Material Properties Open Database [3] (http://www.materialproperties.org/) to improve the models used.
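
A check of a submitted structure against convergence criteria could look roughly like the following Python sketch. It is not the TCOD pipeline: the field names (`residual_force_eV_per_A`, `energy_change_eV`), the limits and the function `validate` are hypothetical and do not correspond to actual TCOD CIF dictionary data names.

```python
def validate(record, criteria):
    """Compare each reported value with its maximum allowed magnitude.
    Returns a list of failure messages; an empty list means the record passes."""
    failures = []
    for field, limit in criteria.items():
        value = record.get(field)
        if value is None:
            failures.append(f"{field}: missing")
        elif abs(value) > limit:
            failures.append(f"{field}: {value} exceeds limit {limit}")
    return failures

# Hypothetical criteria; a real set would come from the community-agreed rules.
criteria = {
    "residual_force_eV_per_A": 0.01,
    "energy_change_eV": 1e-5,
}

good = {"residual_force_eV_per_A": 0.004, "energy_change_eV": 2e-6}
bad = {"residual_force_eV_per_A": 0.05}

print(validate(good, criteria))
print(validate(bad, criteria))
```

Returning every failure, rather than stopping at the first, gives a depositor one complete report per submission.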