Heuristic for Reading CIF / Crystal Structures?

Hi,

As some of you know, in the long-awaited v1.1 release, there’s significantly improved solid-state and CIF support. This is, in no small part, due to work by David Lonie and his crystallography extension.

To improve this further, I’m considering a heuristic for reading CIF (and other) crystallography files that separates “molecular crystals” from solid-state crystals.

  • For molecular crystals, the user often (but not always) wants the primitive unit cell, i.e., just an isolated molecule. Filling out the entire cell can be left to a separate command.
  • For solid-state crystals, the user usually wants to see the entire unit cell, including all symmetry-defined positions.

This type of heuristic seems to be used by other programs, such as Mercury (from the CCDC) and CrystalMaker, IIRC.

I’m proposing that when Avogadro reads a file (through File → Open), it decides that the file is likely a molecular crystal if:

  • There’s a unit cell (i.e., it’s a crystal)
  • It has at least one carbon AND at least one hydrogen in the primitive cell
  • OR it has at least five carbons (e.g., benzene, pyridine, etc. with no hydrogens defined) in the primitive cell

I’m open to suggestions. I’d like to include as many molecular crystals as possible without picking up carbonates or carbides.
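The proposed rule is small enough to write down as a predicate. Below is a minimal Python sketch, assuming the reader has already produced a list of element symbols for the atoms as stored in the file (before any cell filling) plus a flag saying whether a unit cell was present. The function name is hypothetical and not part of Avogadro’s API.

```python
from collections import Counter


def looks_like_molecular_crystal(elements, has_unit_cell):
    """Hypothetical sketch of the proposed heuristic.

    `elements` lists the element symbol of each atom as read from the file
    (no symmetry expansion). Returns True if the file should default to the
    unfilled cell, i.e. it is probably a molecular crystal.
    """
    if not has_unit_cell:  # no cell, so not a crystal at all
        return False
    counts = Counter(elements)  # missing elements count as 0
    carbons, hydrogens = counts["C"], counts["H"]
    # at least one C and one H, OR at least five C (e.g. no H atoms given)
    return (carbons >= 1 and hydrogens >= 1) or carbons >= 5
```

With this rule, a calcium carbonate input (`["Ca", "C", "O", "O", "O"]`) or a silicon carbide input (`["Si", "C"]`) stays a solid-state crystal, while benzene, hexachlorobenzene (six C, no H) and a ferrocene formula unit are treated as molecular. Note that a hydrogen carbonate such as NaHCO3 (one C, one H) also passes as molecular, so not every carbonate is excluded.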

Thoughts?
-Geoff


Prof. Geoffrey Hutchison
Department of Chemistry
University of Pittsburgh
tel: (412) 648-0492
email: geoff.hutchison@gmail.com
web: http://hutchison.chem.pitt.edu/

On Mon, Nov 7, 2011 at 12:05 PM, Geoffrey Hutchison geoff.hutchison@gmail.com wrote:

I’m open to suggestions. I’d like to include as many molecular crystals as possible without picking up carbonates or carbides.

I see what you’re saying. Perhaps we could screen for elements other than
C, H, N, O, and S: if all atoms are common organic elements, it is probably
just molecular packing and we can stick with the primitive
structure, but if, say, transition metals are present, then the
full cell is probably of interest.
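A minimal Python sketch of this element screen, assuming the file has already been recognized as a crystal and its atoms are available as a list of element symbols. `ORGANIC_ELEMENTS` and `only_common_organic_elements` are illustrative names, not Avogadro API.

```python
# Hypothetical set of "common organic" elements from the suggestion above.
ORGANIC_ELEMENTS = frozenset({"C", "H", "N", "O", "S"})


def only_common_organic_elements(elements):
    """True if every atom is C, H, N, O, or S (suggesting the unfilled cell).

    An empty atom list returns False, so the caller falls back to the
    full cell (a design choice made for this sketch only).
    """
    return len(elements) > 0 and set(elements) <= ORGANIC_ELEMENTS
```

Benzene or urea passes the screen and would keep the unfilled cell; anything containing Ca or Fe, including ferrocene, is routed to the full cell.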

I should also point out that I don’t work with molecular crystals very
much, so my use-cases are imaginary :wink:

Dave

On Nov 7, 2011, at 12:45 PM, David Lonie wrote:

I see what you’re saying. Perhaps we could screen for elements other than
C, H, N, O, and S: if all atoms are common organic elements, it is probably
just molecular packing and we can stick with the primitive
structure, but if, say, transition metals are present, then the
full cell is probably of interest.

That doesn’t work so well. Consider ferrocene, or any organometallic (molecular) catalyst. This is why I recommend a carbon threshold: enough carbons in the primitive cell means it’s probably a molecular crystal.

-Geoff

On Mon, Nov 7, 2011 at 12:50 PM, Geoffrey Hutchison
geoff.hutchison@gmail.com wrote:

That doesn’t work so well. Consider ferrocene, or any organometallic (molecular) catalyst. This is why I recommend a carbon threshold: enough carbons in the primitive cell means it’s probably a molecular crystal.

I guess my question is: when is the full cell of interest, and when is
an unfilled cell desired? The way I’m thinking about it, if it is just a
simple organic molecule, then the unfilled cell will suffice. If
there’s anything else going on besides van der Waals/H-bonding in a simple
organic crystal, then I’d want to see the full structure (this
includes organometallics).

As you mention, carbonates etc. will pose a problem otherwise. I’d
rather make the mistake of filling a cell when the unfilled cell is
sufficient than the other way around.

Is the problem with a filled molecular crystal cell just that it’s
difficult to see what’s going on (too much info)? If so, I think that
some of the visualization tricks I’ve been working on for
intercellular bonding may be a better solution, instead of presenting
an unfilled cell. (I have plans to add improved intercell bond
visualization in the coming months. I’ll post an update once I have
some stable code.)

Dave

Is the problem with a filled molecular crystal cell just that it’s
difficult to see what’s going on (too much info)?

That’s part of it, but users also often just want “the experimentally derived” molecular structure. Say I synthesize a compound and get the X-ray structure and a CIF. I don’t want the full unit cell; I probably just want the molecule to paste into a manuscript or presentation, or maybe to submit to Gaussian.

So there really are two different use cases: I’m opening a file because it has the X-ray coordinates of my molecule, vs. I’m opening a crystal and want to see the whole unit cell.

Thus the need for a decent heuristic. It won’t always work – I might open benzene and want the whole unit cell, or I might want the primitive coordinates for YBCO. Fortunately, we have ways of handling that already. I just want the default experience to be good.

Hope that makes sense.

-Geoff

On Tue, Nov 8, 2011 at 3:40 PM, Geoffrey Hutchison
geoff.hutchison@gmail.com wrote:

That’s part of it, but users also often just want “the experimentally derived” molecular structure. Say I synthesize a compound and get the X-ray structure and a CIF. I don’t want the full unit cell; I probably just want the molecule to paste into a manuscript or presentation, or maybe to submit to Gaussian.

So there really are two different use cases: I’m opening a file because it has the X-ray coordinates of my molecule, vs. I’m opening a crystal and want to see the whole unit cell.

Thus the need for a decent heuristic. It won’t always work – I might open benzene and want the whole unit cell, or I might want the primitive coordinates for YBCO. Fortunately, we have ways of handling that already. I just want the default experience to be good.

OK, that works for me. My biggest concern with not showing the full
unit cell is that, as Avogadro is educational software, students may
not realize that they’re only seeing part of the structure. So some
sort of notification may be useful here. A popup or slim dock widget
with a “don’t show me again” checkbox, informing the user that a
reduced representation is being shown, would be a nice touch. I could
add some API to the crystallography extension that returns the
Fill Unit Cell QAction, so that we could even offer the option of
filling the cell from the popup.

Dave

Hi all.

A brief comment.

A CIF file contains only the contents of the asymmetric unit. So if a user wants to see “my molecule”, only part of it will currently be displayed when opening a CIF or RES file in Avogadro. This depends on the way connectivity is generated, and it is complicated by the fact that the structure may well be infinite. In other words, displaying a crystal without taking site symmetries into account will produce incomplete coordinates for use in QM programs.
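Recovering the full cell from the asymmetric unit means applying each symmetry operation listed in the CIF (typically under `_space_group_symop_operation_xyz`, or the older `_symmetry_equiv_pos_as_xyz`) to each atom, wrapping into the cell and dropping duplicates. Below is a minimal Python sketch, assuming the operations have already been read as strings and atoms as (element, fractional coordinates) pairs; all names are hypothetical, not Avogadro or Open Babel API. It only fills the cell: it does not rebuild connectivity across cell boundaries or handle infinite structures, which is the harder part raised above.

```python
import re
from fractions import Fraction

# One signed term of a symop component: a fraction, a number, or x/y/z.
TOKEN = re.compile(r"([+-]?)(\d+/\d+|\d+(?:\.\d+)?|[xyz])")


def parse_symop(op):
    """Parse a CIF-style operation such as '-x+1/2,y,-z' into (rot, trans)."""
    rot, trans = [], []
    for comp in op.replace(" ", "").lower().split(","):
        row, shift = [0, 0, 0], Fraction(0)
        for sign, tok in TOKEN.findall(comp):
            s = -1 if sign == "-" else 1
            if tok in ("x", "y", "z"):
                row["xyz".index(tok)] += s
            else:
                shift += s * Fraction(tok)
        rot.append(row)
        trans.append(float(shift))
    return rot, trans


def wrap(v, tol=1e-4):
    """Map a fractional coordinate into [0, 1); values within tol of 1 become 0."""
    w = v % 1.0
    return 0.0 if w > 1.0 - tol else w


def same_site(a, b, tol=1e-4):
    """True if two fractional positions coincide modulo lattice translations."""
    for p, q in zip(a, b):
        d = abs(p - q) % 1.0
        if min(d, 1.0 - d) > tol:
            return False
    return True


def fill_cell(atoms, symops, tol=1e-4):
    """Apply every operation to every (element, frac) atom; drop duplicates."""
    ops = [parse_symop(op) for op in symops]
    filled = []
    for element, frac in atoms:
        for rot, trans in ops:
            pos = [wrap(sum(rot[i][j] * frac[j] for j in range(3)) + trans[i], tol)
                   for i in range(3)]
            if not any(e == element and same_site(pos, p, tol) for e, p in filled):
                filled.append((element, pos))
    return filled
```

For example, in P-1 (operations `x,y,z` and `-x,-y,-z`), `fill_cell([("C", [0.1, 0.2, 0.3]), ("Fe", [0.0, 0.0, 0.0])], ["x,y,z", "-x,-y,-z"])` yields three atoms: two C positions, and a single Fe, because the origin is an inversion centre. The same special-position effect is why a molecule sitting on a symmetry element appears only partially in the asymmetric unit.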

Louis

On Nov 10, 2011, at 5:04 PM, David Lonie wrote:

OK, that works for me. My biggest concern with not showing the full
unit cell is that, as Avogadro is educational software, students may
not realize that they’re only seeing part of the structure. So some
sort of notification may be useful here. A popup or slim dock widget
with a “don’t show me again” checkbox, informing the user that a
reduced representation is being shown, would be a nice touch. I could
add some API to the crystallography extension that returns the
Fill Unit Cell QAction, so that we could even offer the option of
filling the cell from the popup.

Dave



Avogadro-devel mailing list
Avogadro-devel@lists.sourceforge.net