Error reading crystal structure (.cif)

I have never used crystal structures before and was about to open up my first one with Avogadro. For some reason it is giving me this error:

Any thoughts as to what could be happening? (Also, .cif attachments are not allowed in the forum so I had to change the extension to .txt)

d3cp03720g2.txt (165.7 KB)

Hmm. I had noticed some similar bugs, and they were due to a bug in Open Babel. I need to get a new release of Open Babel out eventually, but in the meantime, the Mac and Windows builds patch Open Babel as they build it.

Unfortunately, I can confirm this happens in my latest build. I’ll investigate soon.

OTOH, Open Babel is happy to convert to CJSON:

d3cp03720g2.cjson (10.5 KB)

I have an AppImage from yesterday and it fails there, but weirdly enough I have a build from yesterday using 1.103.0-163-g46aa60f that opens it just fine.

Hmm, that CJSON is missing a whole benzene+ from the crystal structure I see in the paper. Do you think it's a problem with the original file, or could it be a problem in Open Babel?

Figure S9 from "Comparing the structures and photophysical properties of two charge transfer co-crystals", Physical Chemistry Chemical Physics (RSC Publishing).

I’d suggest opening in Mercury. It’s possibly a bug in Open Babel, but it’s also possible the benzene crosses a unit cell or something.

I think after 2.0, I want to evaluate a few other open source libraries for reading CIF directly into Avogadro. I was talking to @Thomas about this a bit. For example, there’s currently no great way to get partial occupancy information through Open Babel without a bunch of new architecture in Babel.

Whenever I run into an issue with a CIF file, I use CCDC Mercury since that’s the whole point. It can adjust packing, etc. if you want and save to a few formats.

@Juanes Your first post leaves it unclear what you eventually intend to do (now I speculate: a quantum chemical calculation) and which program you then want to use.

One of the problems «other», non-crystallography programs face when handling the information stored in a .cif is that the file defines i) a structure motif (at the lowest, irreducible level, the asymmetric unit) which ii) the symmetry operators of the unit cell «complete» to the full content of the unit cell. (Keyword: space groups. There are 230 of them in the International Tables, some only with a standard setting, others with additional non-conventional settings, for instance P2_1/c vs P2_1/a.) Note that the coordinate system of this parallelepiped can, yet need not, have orthogonal axes (e.g., a cubic vs a monoclinic Bravais lattice), and the unit vectors in the three directions need not be of equal length (e.g., a cubic vs a tetragonal Bravais lattice), whereas e.g., .mol and .xyz (implicitly?) use a Cartesian coordinate system. And iii), a .cif uses fractional coordinates along \vec{a}, \vec{b}, \vec{c}, typically in the range of [0.0 \ldots 1.0], because an atom at [1.5,0.0,0.0] is equivalent to one at [-0.5,0.0,0.0] or [0.5,0.0,0.0]. When establishing a crystallographic model, however, it can be useful to use e.g., negative components for one set of atomic coordinates in order to keep the whole crystallographic motif together. And then, one unit along \vec{a} is e.g., 3.458 Å in one particular model, but 5.88 Å in another.
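To make the fractional vs Cartesian point concrete, here is a minimal Python sketch (not what Avogadro or Open Babel do internally). The function names are made up for illustration, the cell parameters in the example are hypothetical, and the axis convention (\vec{a} along x, \vec{b} in the xy plane) is one common choice among several.

```python
import math


def cell_vectors(a, b, c, alpha, beta, gamma):
    """Cartesian lattice vectors from cell lengths (in Å) and angles (in degrees).

    Assumed convention: a along x, b in the xy plane.
    """
    ca = math.cos(math.radians(alpha))
    cb = math.cos(math.radians(beta))
    cg = math.cos(math.radians(gamma))
    sg = math.sin(math.radians(gamma))
    va = (a, 0.0, 0.0)
    vb = (b * cg, b * sg, 0.0)
    cx = c * cb
    cy = c * (ca - cb * cg) / sg
    cz = math.sqrt(c * c - cx * cx - cy * cy)
    return va, vb, (cx, cy, cz)


def wrap(frac):
    """Shift each fractional component by a whole number of cells (subtract its floor)."""
    return tuple(x - math.floor(x) for x in frac)


def frac_to_cart(frac, cell):
    """Cartesian position = u*a + v*b + w*c."""
    va, vb, vc = cell
    u, v, w = frac
    return tuple(u * va[i] + v * vb[i] + w * vc[i] for i in range(3))


# The two equivalent positions mentioned above end up on the same site:
print(wrap((1.5, 0.0, 0.0)))   # (0.5, 0.0, 0.0)
print(wrap((-0.5, 0.0, 0.0)))  # (0.5, 0.0, 0.0)

# Hypothetical monoclinic cell (beta != 90 degrees): same fractional
# coordinates, but the Cartesian result depends on the cell.
cell = cell_vectors(7.0, 8.0, 9.0, 90.0, 100.0, 90.0)
print(frac_to_cart((0.5, 0.5, 0.5), cell))
```

Wrapping every atom into the cell like this is exactly what can cut a molecule in two at a cell face, which is why crystallographic programs offer to «complete» molecules afterwards.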

Thus, reading a .cif with non-crystallographic software can be misleading; e.g., one TCNQ molecule (your second post, left hand illustration) still appears «incomplete». Equally note: a .cif file may, but need not, define bonds, types of bonds, or bond order as you would find e.g., in the bond block of a .sdf file. If authors do so (e.g., in small molecule crystallography/chemical crystallography), they do so with a particular syntax managed by the IUCr, for instance with _chemical_conn_bond_type (record)
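A quick way to see whether a given .cif carries any bond information at all is to scan it for the relevant data names. The sketch below is a plain text scan, not a real CIF parser; the helper name bond_tags is made up, and the _geom_bond_ prefix (bonded distances in the CIF core dictionary) is my addition next to the _chemical_conn_bond_ items mentioned above.

```python
def bond_tags(cif_text):
    """Return the sorted bond-related data names found in a CIF text.

    Only the first token of each line is inspected, so a tag quoted
    inside a multi-line text field could be reported by mistake.
    """
    prefixes = ("_chemical_conn_bond_", "_geom_bond_")
    found = set()
    for line in cif_text.splitlines():
        tokens = line.strip().split(maxsplit=1)
        if tokens and tokens[0].lower().startswith(prefixes):
            found.add(tokens[0].lower())
    return sorted(found)


# Hypothetical, heavily shortened CIF fragment for illustration only.
example = """\
data_demo
loop_
_atom_site_label
_atom_site_fract_x
C1 0.1
C2 0.2
loop_
_geom_bond_atom_site_label_1
_geom_bond_atom_site_label_2
_geom_bond_distance
C1 C2 1.39
"""

print(bond_tags(example))
```

An empty list means the reading program has to perceive bonds from interatomic distances on its own.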

Side note about the illustration S9: the magenta colored spheres are centroids. Depending on the software at your disposal, you may select a couple of real atoms, and the program then computes their geometric centre. Centroids can facilitate the description of intermolecular distances; for interactions (let’s say \pi - \pi between adjacent arenes) you have to include at least the relative orientation/angles, too.
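For illustration, the centroid idea in a few lines of Python. The two rings are idealized hexagons with made-up coordinates (not taken from the structure in this thread), and all function names are hypothetical.

```python
import math


def centroid(points):
    """Geometric centre (unweighted mean) of Cartesian points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def plane_normal(points):
    """Normal of the plane through the first three points (cross product)."""
    p0, p1, p2 = points[:3]
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def normal_angle(ring_a, ring_b):
    """Angle in degrees between the two ring planes (0 means parallel)."""
    n1, n2 = plane_normal(ring_a), plane_normal(ring_b)
    dot = abs(sum(x * y for x, y in zip(n1, n2)))
    norm = math.sqrt(sum(x * x for x in n1)) * math.sqrt(sum(x * x for x in n2))
    return math.degrees(math.acos(min(1.0, dot / norm)))


def ring(radius, z, shift_x=0.0):
    """Idealized planar hexagon parallel to the xy plane (illustration only)."""
    return [(shift_x + radius * math.cos(math.radians(60 * k)),
             radius * math.sin(math.radians(60 * k)),
             z) for k in range(6)]


ring_a = ring(1.39, 0.0)
ring_b = ring(1.39, 3.4, shift_x=1.2)  # stacked above and slipped sideways
print(math.dist(centroid(ring_a), centroid(ring_b)))
print(normal_angle(ring_a, ring_b))
```

The centroid-to-centroid distance alone does not tell a slipped stack from a face-to-face one, hence the second number.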

What do you want to compute later on? Which program do you have in mind? If Avogadro2 (and openbabel) struggle, use CCDC Mercury (there is a free community edition) for «the completion of molecules» and packing, Jmol with its interface to Gaussian (as an entry, see the tutorial in J. Appl. Cryst. by Hanson, doi:10.1107/S0021889810030256), or VESTA, to mention a few GUIs out there. Presuming you are comfortable with the CLI, cod-tools by the COD (and repackaged by DebianChem) provide many helpful utilities like codcif2sdf, cif_fillcell etc., too.

There is no «one solution fits all purposes» here. Sometimes, you want to display that there is orientational/positional or occupational disorder. Then you may have a depiction with atoms spatially well resolved, e.g., terminal tBu groups

(image credit J. Moncoľ)

but sometimes there is partial overlap in this kind of (averaging) representation

(COD 4517098)

or occupational disorder where one site sometimes is used by one, other times by another atom type and you depict them as polychromic spheres:

(both illustrations in a recent J.Appl.Cryst about inorganic compounds, open access)


On the other hand, if you want to perform computations (or here: want Avogadro to prepare the input file), frequently enough you want to toggle/iterate through the sets of coordinates and export them as if each of them were the only non-disordered structure model not affected by fractional occupancies (e.g., Tonto & CrystalExplorer for the computation of Hirshfeld surfaces / promolecule electron densities).
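A rough Python sketch of that «one model per set of coordinates» idea. It assumes the atoms were already read into dictionaries; the keys "label" and "group" are hypothetical, with "group" mirroring the _atom_site_disorder_group value of a .cif and ".", "?" or "0" taken to mean «not disordered». It ignores disorder assemblies and special-position cases, so treat it as an illustration only.

```python
ORDERED = {".", "?", "0"}


def disorder_models(atoms):
    """Return one atom list per disorder group.

    Atoms outside any disorder group are copied into every model; each
    model then looks like a fully ordered structure.
    """
    common = [a for a in atoms if a["group"] in ORDERED]
    groups = sorted({a["group"] for a in atoms} - ORDERED)
    if not groups:
        return {"ordered": common}
    return {g: common + [a for a in atoms if a["group"] == g] for g in groups}


# Hypothetical fragment: one ordered carbon plus a methyl carbon refined
# over two alternative positions (labels are made up).
atoms = [
    {"label": "C1", "group": "."},
    {"label": "C2A", "group": "1"},
    {"label": "C2B", "group": "2"},
]

for name, model in disorder_models(atoms).items():
    print(name, [a["label"] for a in model])
```

Each resulting list could then be written out as its own coordinate file for the calculation.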

Thank you both for the information. My eventual intent is to get the structure of the minimal unit cell to then run gas phase calculations on it (optimization, spectra) with standard quantum chemistry codes (Orca, QChem, etc.) and see how it compares with the experimental measurements. Basically trying to see how well we can do with simple gas phase calculations without resorting to periodic codes.

I’ll take a look at Mercury CCDC or, if that fails, some of the other programs you suggested.

Ok I was able to get an .xyz file from Mercury - thank you again for the suggestion!

To me, this reads like processing batches of structures. Though it is possible to drive a GUI like Mercury via a script (e.g., Al Sweigart’s pyautogui, even more so if you have accelerators/shortcuts), it is more efficient to stay at the level of the CLI, or a script. Use codcif2sdf of cod-tools to extract / reconstruct a .sdf, for instance by

codcif2sdf input.cif > output.sdf

for at least one complete organic molecule per .cif. Read «at least» as: there can be multiple symmetry independent molecules in a .cif file (i.e., molecules which are not related to each other via the symmetry operators of the unit cell). codcif2sdf would (at least attempt to) reconstruct these independent molecules, too. Though cod in name, the Perl script works equally well with small molecule .cif files, e.g., from the CCDC.

Before converting these structures (.mol/.sdf, now in Cartesian coordinates) into an input format of quantum chemical programs (e.g., with openbabel), note that openbabel equally

  • can split a multimolecule .mol file into multiple .mol files with one molecule per file (flag -m)
  • can return only the largest contiguous molecule (flag -r), which is handy to remove smaller solvent molecules (e.g., crystal water) and salts’ ions
  • provides filters based on its descriptors (preview by obabel -L descriptors) like MW (molar weight), where one can set intervals of interest, upper/lower thresholds, or negate a criterion (by a ~)
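For a batch, the two steps can be chained from a small Python script. This is only a sketch under assumptions: codcif2sdf and obabel must be on the PATH, the folder names cifs and xyz_out are made up, and the function names are hypothetical. The only options used are the ones mentioned above (-r) plus obabel's -O for the output file.

```python
from pathlib import Path
import subprocess


def plan_conversion(cif_path, out_dir):
    """Build (but do not run) the two commands for one .cif file."""
    cif_path = Path(cif_path)
    sdf_path = Path(out_dir) / (cif_path.stem + ".sdf")
    xyz_path = Path(out_dir) / (cif_path.stem + ".xyz")
    return {
        "sdf": sdf_path,
        "xyz": xyz_path,
        # same as: codcif2sdf input.cif > output.sdf
        "to_sdf": ["codcif2sdf", str(cif_path)],
        # -r keeps only the largest contiguous fragment of each molecule
        "to_xyz": ["obabel", str(sdf_path), "-O", str(xyz_path), "-r"],
    }


def convert(cif_path, out_dir):
    """Run both steps for one file and return the path of the .xyz file."""
    plan = plan_conversion(cif_path, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with open(plan["sdf"], "w") as handle:
        subprocess.run(plan["to_sdf"], stdout=handle, check=True)
    subprocess.run(plan["to_xyz"], check=True)
    return plan["xyz"]


if __name__ == "__main__":
    # Hypothetical input folder; files that fail are reported and skipped.
    for cif in sorted(Path("cifs").glob("*.cif")):
        try:
            print(convert(cif, "xyz_out"))
        except (OSError, subprocess.CalledProcessError) as err:
            print(f"skipped {cif}: {err}")
```

If a .cif contains several independent molecules, the .sdf (and hence the .xyz) will contain several entries; whether to split them with -m is a decision per project.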

You are correct, eventually I am looking to process a large quantity of .cif files and transform them into .xyz files. Once I have a clean .xyz file I can use my own scripts to generate the quantum chemistry inputs. codcif2sdf / cod-tools sound very promising for the purpose. “Valga la redundancia”, as we say in Spanish: thank you!
