Importing SMILES only displays one atom

Hi! I’m trying to import my molecule into Avogadro v2 on Mac OS and after loading “Generating 3D Molecule” for a few minutes, it only displays a single carbon atom. I have checked the SMILES of the molecule before pasting, and it seems fine. Are there any suggested fixes? In addition, I’ve tried constructing my molecule in Avogadro v1 (with the orange logo) and it displays it, but my progress is often erased by random crashes.

Looks like it hasn’t actually generated the geometry (i.e., all the atoms have 0,0,0 as the coordinates)

  • What version of Avogadro2 are you using?
  • What’s the SMILES?

I’m using the one that’s currently available on the website for MacOS. And the SMILES is: C[C@@H]1CC[C@@H](C)CCC[C@@H](C)CCC[C@@H](C)CCC[C@@H](C)CCOC[C@@H](CO[C@@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@H]2O)OCC[C@H](C)CCC[C@H](C)CCC[C@H](C)CCC[C@H](C)CC[C@@H](C)CCC[C@@H](C)CCC[C@@H](C)CCCC(C)CCOC[C@@H](CO[C@H]3[C@H](O)[C@@H](O)[C@H](O)[C@@H](CO)O3)OCC[C@H](C)CCC[C@H](C4)CCC4[C@H](C)CCC1

@lipidsolidarity I can partly replicate your findings. On a somewhat older Windows machine (8 GB RAM, Intel i5-8250U CPU, 8th generation), the nightly build for Windows (64-bit) either likewise displayed only one carbon atom or crashed when I attempted a geometry optimization. I suspect that the number of atoms and bonds that have to be adjusted to find one reasonable conformer contributes to the issue here.

There are a couple of workarounds to consider:

  • Use a recent release of Open Babel to convert the SMILES string into, for instance, an .sdf file that Avogadro can then read. You can do this either from the command line or with the GUI version (obgui is the command that launches it, e.g. on Debian Linux). Either way, the conversion only succeeded when the .sdf file was written in the V3000 dialect.

    After storing the SMILES string in a file, on the command line I ran

    obabel test.smi -h --gen3d -O 3d_x3.sdf -x3
    

    for the addition of (explicit) hydrogen atoms and the generation of a conformer (--gen3d, see here for its levels), written to the record 3d_x3.sdf. The -x3 flag instructs Open Babel to write the more modern V3000 syntax (see reference).

    In the obgui GUI, after setting the input format to .smi and selecting the input file on the left-hand side, and choosing the .sdf (MDL MOL) format on the right-hand side, my settings in the central column were: i) add hydrogens (make explicit), ii) generate 2D coordinates, iii) generate 3D coordinates, iv) output V3000 not V2000 (used for > 999 atoms/bonds).

    I’m not sure whether the number of atoms plus the number of bonds reaches the threshold of 999. On the other hand, given the flexibility of the molecule, I’m not too surprised that the (intermediate) generation of a 2D structure followed by the generation of a 3D structure was helpful here. I didn’t venture into a systematic exploration of conformers (see e.g., here). The .sdf file is readable by Avogadro2.

  • As a second (installation-free) option, I opened the test page of Marvin JS, clicked on the folder symbol, pasted the SMILES code, and then launched the 3D optimization (the button with the three-axis coordinate system). Clicking the floppy-disk symbol lets you select a (structural) export format, e.g. .sdf (see marvinjs_untitled_file.sdf), though in Avogadro you will likely want to add explicit hydrogen atoms via the Build menu (see marvinjs_untitled_file_avogadro-H.sdf).

The archive below contains files that may be of interest, as well as auxiliary/intermediate ones.

test_run.gz (189.7 KB)
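
On the 999 question raised above: as far as I know, the V2000 counts line stores the atom count and the bond count in separate three-character fields, so the limit applies to each count on its own rather than to their sum. Below is a minimal Python sketch, standard library only, that reads both counts from the first record of a molfile/.sdf text. The name molfile_counts is a hypothetical helper for illustration, not part of Open Babel or Avogadro.

```python
def molfile_counts(molblock: str):
    """Return (version, n_atoms, n_bonds) for the first record of a molfile/SDF text."""
    lines = molblock.splitlines()
    counts_line = lines[3]  # three header lines precede the counts line
    if "V3000" in counts_line:
        # V3000 keeps the real counts on a separate "M  V30 COUNTS" line
        for line in lines[4:]:
            if line.startswith("M  V30 COUNTS"):
                fields = line.split()
                # fields: ["M", "V30", "COUNTS", n_atoms, n_bonds, ...]
                return "V3000", int(fields[3]), int(fields[4])
        raise ValueError("V3000 record without a COUNTS line")
    # V2000: atom and bond counts are fixed-width, three characters each
    return "V2000", int(counts_line[0:3]), int(counts_line[3:6])
```

Calling it on the text of a written file, e.g. molfile_counts(open("3d_x3.sdf").read()), would show which dialect was written and how many atoms and bonds the record holds.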

Yeah, I can replicate it. It’s clearly an issue in Open Babel’s coordinate generation code.

obabel lipidsolidarity.smi -O test.sdf --gen3d
==============================
*** Open Babel Error  in Do
  3D coordinate generation failed
==============================

The catch with such large macrocycles is that Open Babel uses template matching for coordinate generation. No template is that large, so it falls back to distance geometry (like RDKit). Unfortunately, there are also a large number of stereocenters, so it eventually times out.

If you’re going to be doing this a lot in the near future, I’d suggest using RDKit for the coordinate generation. Greg and Prof. Dr. Sereina Riniker have done a really nice job improving macrocycle generation.
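
For anyone who wants to try the RDKit route, here is a minimal sketch, assuming RDKit is installed. It embeds one conformer with ETKDGv3 and the macrocycle torsion preferences switched on, then returns a V3000 mol block that can be saved and opened in Avogadro2. The function name smiles_to_v3000_molblock and the seed value are hypothetical choices for illustration. I have not verified that it succeeds on the molecule in this thread, and embedding something this large and flexible may still take a long time or fail.

```python
def smiles_to_v3000_molblock(smiles: str, seed: int = 42) -> str:
    """Embed one 3D conformer with RDKit and return it as a V3000 mol block."""
    # Imported here so the function can be pasted anywhere; requires RDKit.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("RDKit could not parse the SMILES string")
    mol = Chem.AddHs(mol)  # explicit hydrogens before embedding

    params = AllChem.ETKDGv3()
    params.useMacrocycleTorsions = True  # macrocycle-specific torsion preferences
    params.randomSeed = seed             # fixed seed for reproducible output

    # EmbedMolecule returns the new conformer id, or -1 if embedding failed
    if AllChem.EmbedMolecule(mol, params) == -1:
        raise RuntimeError("RDKit could not embed a 3D conformer")

    return Chem.MolToMolBlock(mol, forceV3000=True)
```

Writing the returned text to a file with a .mol or .sdf extension gives something Avogadro2 should be able to read, in the same way as the V3000 file from Open Babel mentioned earlier in the thread.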