Algorithm for building molecules from SMILES

Hi, thanks for providing such great software!

I’m an ML researcher trying to integrate OpenBabel’s conformer generator into my pipeline. Specifically, I need to convert SMILES strings into 3D conformers where local structures (e.g., bond angles, lengths) closely match ground truth, while torsion angles are not critical at all.

I have a question about Avogadro’s Build -> Insert -> SMILES workflow. It generates the desired local structures for the following molecules, but calling OpenBabel directly with the code below results in poor local structures.

smiles_list = [
    "C(=O)(c1c(cc(cc1)N1C(=O)c2c3c(C1=O)c(cc1c4c5c6c(C(=O)N(C(=O)c6c(cc5c(c31)cc2)O)c1cc(c(C(=O)[O])cc1)F)cc4)O)F)[O]",
    "[H]C(=O)c1c(OC([H])([H])[H])c(N2C(=O)c3c(OC([H])([H])[H])c(OC([H])([H])[H])c4c5c(c(OC([H])([H])[H])c(OC([H])([H])[H])c(c35)C2=O)C(=O)N(c2c(OC([H])([H])[H])c(C([H])=O)c(C([O])=O)c(C([H])=O)c2OC([H])([H])[H])C4=O)c(OC([H])([H])[H])c(C([H])=O)c1C([O])=O"
]

# My OpenBabel code to generate a 3D conformer
from openbabel import pybel  # Open Babel 3.x layout; on 2.x use "import pybel"

mol_pred = pybel.readstring("smi", smiles_list[0])
mol_pred.make3D(forcefield="mmff94", steps=100)

As far as I know, Avogadro’s implementation is OpenBabel-based, so is there a way to replicate Avogadro’s results and get better local structures?

Here are some results comparing the two approaches:

Thank you!

Seems like a better question to ask on the Open Babel forum. But regardless, the underlying Python code you’re calling is open, so it’s pretty easy to see:

The “make3D” code from pybel is only intended as a first step. Even if you don’t want to do a conformer search, you’ll want to follow it with ConjugateGradients and something more like 250 steps.
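
To make that concrete, here is a minimal sketch of the "build first, then optimize longer" idea using the lower-level openbabel module. It assumes the Open Babel 3.x Python bindings; the function name build_and_optimize and its defaults are only illustrative, and it is not a copy of what Avogadro does internally.

```python
def build_and_optimize(smiles, forcefield="mmff94", steps=250):
    """Build rough 3D coordinates, then refine them with conjugate gradients.

    Hypothetical helper for illustration; returns an openbabel OBMol.
    """
    # Imported here so the module still loads where Open Babel is absent.
    # Assumes the Open Babel 3.x layout; on 2.x use "import openbabel as ob".
    from openbabel import openbabel as ob

    mol = ob.OBMol()
    conv = ob.OBConversion()
    conv.SetInFormat("smi")
    if not conv.ReadString(mol, smiles):
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    # First step: rule-based 3D build, then add explicit hydrogens.
    ob.OBBuilder().Build(mol)
    mol.AddHydrogens()

    # Second step: force-field cleanup with conjugate gradients.
    ff = ob.OBForceField.FindForceField(forcefield)
    if ff is None or not ff.Setup(mol):
        raise RuntimeError(f"could not set up force field {forcefield!r}")
    ff.ConjugateGradients(steps)
    ff.GetCoordinates(mol)  # copy the optimized coordinates back into mol
    return mol
```

If the force-field setup fails for a particular molecule, the sketch raises instead of silently returning the unoptimized build, so poor geometries are not mistaken for optimized ones.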

(Incidentally, this is why the optimization in Avogadro2, which you’re not using, relies on BFGS: it takes many fewer steps.)

Thank you for your reply.

My question is how Avogadro’s Build -> Insert -> SMILES workflow is implemented internally, so that I can replicate it in Python code.

Could you please elaborate on how Avogadro2 makes use of BFGS? Is this optimization method limited to Avogadro2 only?

Thank you in advance.

I thought I was pretty clear. Since you’re using pybel, start with:

mol_pred.make3D(steps=250)
mol_pred.localopt(steps=250)
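
Since the goal is good bond lengths and angles, it may help to wrap those two calls and list the resulting bond lengths so they can be compared against a reference. This is only a sketch assuming the Open Babel 3.x bindings; smiles_to_3d and bond_lengths are hypothetical helper names, not part of pybel.

```python
def smiles_to_3d(smiles, forcefield="mmff94", steps=250):
    """Build 3D coordinates with pybel, then run a longer local optimization."""
    # Imported here so the module still loads where Open Babel is absent.
    # Assumes the Open Babel 3.x layout; on 2.x use "import pybel".
    from openbabel import pybel

    mol = pybel.readstring("smi", smiles)
    mol.make3D(forcefield=forcefield, steps=steps)    # build, add H, short cleanup
    mol.localopt(forcefield=forcefield, steps=steps)  # further optimization
    return mol


def bond_lengths(mol):
    """Return (begin_idx, end_idx, length_in_angstrom) for every bond."""
    from openbabel import openbabel as ob

    return [
        (bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetLength())
        for bond in ob.OBMolBondIter(mol.OBMol)
    ]
```

For example, bond_lengths(smiles_to_3d(smiles_list[0])) gives a list that can be compared bond by bond with the ground-truth structure.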

The other difference in the structure you depict is that Avogadro does a quick conformer search, resulting in the more planar structure. You said you didn’t want to bother with that.
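
If you do decide you want that step, a rough stand-in can be sketched with Open Babel's own rotor search. The thread does not say which search Avogadro runs, so the use of WeightedRotorSearch here, the function name quick_conformer_search, and all of the numeric defaults are assumptions for illustration only.

```python
def quick_conformer_search(obmol, forcefield="mmff94", n_conformers=50,
                           geom_steps=25, final_steps=250):
    """Run a short rotor search on an OBMol that already has 3D coordinates.

    Hypothetical helper; the search method and defaults are assumptions,
    not Avogadro's actual settings.
    """
    # Imported here so the module still loads where Open Babel is absent.
    # Assumes the Open Babel 3.x layout; on 2.x use "import openbabel as ob".
    from openbabel import openbabel as ob

    ff = ob.OBForceField.FindForceField(forcefield)
    if ff is None or not ff.Setup(obmol):
        raise RuntimeError(f"could not set up force field {forcefield!r}")

    # Sample rotatable-bond settings, briefly optimizing each candidate.
    ff.WeightedRotorSearch(n_conformers, geom_steps)
    # Final cleanup of the conformer the search ended on.
    ff.ConjugateGradients(final_steps)
    ff.GetCoordinates(obmol)  # copy the resulting coordinates back
    return obmol
```

With pybel, this would be called as quick_conformer_search(mol_pred.OBMol) after make3D has produced the initial coordinates.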

Yes. Avogadro2 has a completely separate optimization framework. It’s not a big deal, though: you can get very similar results with Conjugate Gradients, it just takes more steps to get there.