Full and privileged xtb integration?

matterhorn103 · November 7, 2023, 7:58pm

Hi ghutchis,

Posted a github issue as well here, not sure where you prefer such things to be submitted.

Saw that you did significant work on adding an interface between Avogadro and various force field implementations, including the Python API for XTB. Currently this seems to use just the GFN-FF force field method, in alignment with the other options for optimization. This is fine, but since optimizations with the normal semi-empirical GFN2-xTB are also extremely fast, and the accuracy improvement over force fields is significant, it would be cool to include that. Likewise, it would be cool if the user could run CREST calculations directly from Avogadro, to find low energy conformers prior to doing DFT, in the same way various people I know use SPARTAN.

At the moment you seem to be using the xtb-python API, which is in turn apparently limited to what is available in the xtb C API. I guess that means CREST and the like are right out.

Also currently the user has to install python, and install xtb-python using conda, and set up their python environment, which is not something 95% of Avogadro users (lots of students) can even begin with, at least not without their hands being held throughout the process.

I have a suggestion.

Since xtb is completely open source and freely available, would it not be possible to simply package it with Avogadro? And then, on request by the user, run calculations using system calls? As in, complete native integration, as the privileged tool for optimization and calculation?

The license for xtb is LGPL/GPL, so if it was packaged as a dynamically-linked binary it should be ok with the Avogadro license, right? The binaries come to 87 MiB on my PC, which I guess is significant, but it can do a lot!

I suggest this because I find xtb to be a truly excellent tool for general use (as it seems you do too based on e.g. this thread), and I really think it would be great for chemical education and open-source chemistry if the use of it was widely accessible via a GUI, especially one as widely distributed as Avo.

Alternatively, I have some intermediate-level Python, including a little PyQt/PySide, but no experience with C++. Do you reckon making an optional add-on or plugin that can be downloaded and set up with one click, that included the xtb binary, would be a) technically feasible and b) within my capabilities to contribute?

Of course it would also be an option to have an interface to xtb go via the interface you built in for force fields in 1.98, but then it would have to not refer exclusively to force fields all the time!

I think effectively making Avogadro a portal to xtb for non-computational chemists would really give a sort of synergistic benefit to both tools.

ghutchis · November 7, 2023, 8:17pm

Sure.

It’s pretty easy to set up Python plugins and there’s already an xtb charge script, which grabs the electrostatics from GFN2.

It would not be hard to write a set of command scripts which send the current molecule to xtb or crest to optimize / etc. It might also be nice to get the molden or cube outputs for orbitals, FOD, etc.
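As a rough sketch of the "send the current molecule to xtb" step: a script that receives the molecule as CJSON could write it out as XYZ text before calling xtb. The snippet below assumes the usual CJSON layout (`atoms.elements.number` plus a flat `atoms.coords.3d` array in Ångström). The function name `cjson_to_xyz` and the truncated `SYMBOLS` table are made up for illustration.

```python
# Truncated symbol table, for illustration only; a real script would cover
# the full periodic table.
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O", 9: "F", 15: "P", 16: "S", 17: "Cl"}


def cjson_to_xyz(cjson, comment=""):
    """Convert the atoms of a CJSON molecule (a dict) into XYZ-format text."""
    numbers = cjson["atoms"]["elements"]["number"]
    flat = cjson["atoms"]["coords"]["3d"]  # x1, y1, z1, x2, y2, z2, ...
    if len(flat) != 3 * len(numbers):
        raise ValueError("coordinate array does not match atom count")
    lines = [str(len(numbers)), comment]
    for i, atomic_number in enumerate(numbers):
        x, y, z = flat[3 * i : 3 * i + 3]
        symbol = SYMBOLS[atomic_number]
        lines.append(f"{symbol:<2} {x:14.8f} {y:14.8f} {z:14.8f}")
    return "\n".join(lines) + "\n"
```

The resulting text can be written to a temporary `input.xyz` and handed to xtb or crest on the command line.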

I think I’d feel better about an optional download. For example, I’d like to see the Python download function install via requirements.txt or conda. I can also imagine implementing a feature for the “Download Extensions” that grabs, for example, authenticated binaries from trusted sources, e.g.:

xtb-6.6.1-windows

I’m definitely open to feedback here. We’ve included yaehmop for example.

ghutchis · November 7, 2023, 8:22pm

By the way, I agree 110% … I think having a general-purpose fairly fast quantum method that works across most elements would be highly useful.

Happy to talk more about the best way to accomplish this. One general issue for the 2.0 release is that a variety of features require Python, which is generally installed on Linux / Mac / FreeBSD, etc. but not on Windows.

matterhorn103 · November 7, 2023, 8:58pm

Various other Windows applications (ChemOffice for example) install Python via the installation wizard. Would that be an option?

Of course on Linux you have the issue of different Python versions as well, the problems that come with installing Python packages system-wide as opposed to in virtual environments, and I’m not sure how Python functions within Flatpaks given the sandboxing (and I see Flatpak as critical for cross-distro support given the majority of work on Avogadro is done by a single person) or whether a Flatpak is allowed to execute terminal commands.

yaehmop is packaged with Avogadro?

With regards to how it was set up, I understand. I just know that most people never change anything from defaults or install any extensions unless there’s a need to, and since Avogadro includes methods to optimize geometries already, the discoverability for non-technical users would be poor if xtb integration required the user to actively install it themselves! So I wonder how that could be achieved while also leaving it as an optional extension. I will do some thinking about UI solutions, if you think the idea is at least a good one.

ghutchis · November 7, 2023, 10:18pm

Probably? I don’t know enough about NSIS installers because I don’t use Windows.

Yes. At the moment, it’s used for crystal structures / band diagrams / DOS plots. There are a few molecular-related feature tickets:

The trick with integrating xtb (the command-line program) with the optimizer is latency. The idea with the New Force Field Framework 🎉 is that you set up the molecule, then calculate energies / gradients as needed.

In principle, you could repeatedly call xtb --grad, but it’s definitely faster through xtb-python since the SCF is fast to re-compute for small differences in geometry.
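To make the latency point concrete, here is a toy sketch (not Avogadro's actual framework) of an optimizer that asks a calculator callback for an energy and gradient at every step. The names `steepest_descent` and `harmonic_bond` are invented for the example, and the harmonic spring stands in for a real method. Whatever fixed cost each calculator call carries, such as launching a new xtb process, is paid once per step.

```python
def steepest_descent(calc, x0, step=0.1, gtol=1e-6, max_steps=1000):
    """Minimise by following the negative gradient; count calculator calls."""
    x = list(x0)
    energy = float("nan")
    calls = 0
    for _ in range(max_steps):
        energy, grad = calc(x)  # one energy/gradient evaluation per step
        calls += 1
        if max(abs(g) for g in grad) < gtol:
            break
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x, energy, calls


def harmonic_bond(x, r0=1.0, k=1.0):
    """Toy 1D 'molecule': two atoms on a line joined by a spring."""
    r = x[1] - x[0]
    energy = 0.5 * k * (r - r0) ** 2
    de_dr = k * (r - r0)
    return energy, [-de_dr, de_dr]
```

Running `steepest_descent(harmonic_bond, [0.0, 2.0])` takes dozens of calculator calls even for this two-atom toy, which is why a per-call process start-up adds up quickly in an interactive optimization.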

I think the first step would be to consider what “full integration” means… what features, for example.

matterhorn103 · November 7, 2023, 11:21pm

By full integration I guess I just meant that xtb would either come with Avogadro or it is discoverable and installable with one click, that basic functionality is available via menu options, that Ctrl-Alt-O automatically uses xtb if configured to/xtb is detected, and that all functionality of xtb is available via some menu. The minimum steps and minimum friction possible for people who don’t know the slightest thing about comp chem. My personal experience is that the average chemist has a tendency to put greater significance on force field geometries (e.g. from the Chem3D functionality in ChemDraw) than they ought to, and making cheap quantum methods easily accessible is the only way to avoid that.

I figured the simplest implementation would be just to directly call xtb using an xtb input.xyz --opt command, with temp files dealt with and options fed automatically by Avogadro. Avo could just wait for the calc to finish, then read the final geometry. That surely minimizes latency because the optimization algorithm is handled by xtb, there’s no need for a constant back and forth.
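A minimal sketch of that "write temp file, run xtb, read result" idea might look like the following. It assumes an `xtb` executable on the PATH and uses the `--opt`, `--chrg` and `--uhf` command-line options. The function names are hypothetical, and `--uhf` is given the number of unpaired electrons, i.e. multiplicity minus one.

```python
import subprocess
import tempfile
from pathlib import Path


def build_xtb_opt_command(xyz_name, charge=0, multiplicity=1, xtb="xtb"):
    """Assemble the argument list for a geometry optimization."""
    unpaired = multiplicity - 1  # xtb expects the number of unpaired electrons
    return [xtb, xyz_name, "--opt", "--chrg", str(charge), "--uhf", str(unpaired)]


def optimize_with_xtb(xyz_text, charge=0, multiplicity=1, xtb="xtb"):
    """Run xtb in a throwaway directory and return the optimized XYZ text."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        (workdir / "input.xyz").write_text(xyz_text)
        cmd = build_xtb_opt_command("input.xyz", charge, multiplicity, xtb)
        # check=True raises CalledProcessError if xtb exits with an error
        subprocess.run(cmd, cwd=workdir, check=True, capture_output=True, text=True)
        # for XYZ input, xtb writes the final geometry to xtbopt.xyz
        return (workdir / "xtbopt.xyz").read_text()
```

Because everything happens in a temporary directory, the scratch files xtb produces are cleaned up automatically once the geometry has been read back.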

Since the trajectory is printed in a nice parsable format to xtbopt.log, if it was desirable for the evolution of the optimization to be displayed, that file could simply be sampled for the latest geometry every second or 100 ms or whatever.
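Sampling the log could be sketched as below, assuming xtbopt.log is a sequence of XYZ frames (atom count, comment line, then one line per atom). The function name `last_complete_frame` is invented. Because the file may be read while xtb is still writing it, a trailing partial line or frame is ignored.

```python
def last_complete_frame(log_text):
    """Return (comment, atoms) for the last fully written XYZ frame, or None."""
    lines = log_text.split("\n")
    # Drops the empty string after a final newline, or a half-written last line.
    lines.pop()
    last = None
    i = 0
    while i < len(lines):
        try:
            natoms = int(lines[i].strip())
        except ValueError:
            break  # not a frame header; stop rather than guess
        end = i + 2 + natoms
        if end > len(lines):
            break  # this frame is still being written
        try:
            atoms = [
                (p[0], float(p[1]), float(p[2]), float(p[3]))
                for p in (row.split() for row in lines[i + 2 : end])
            ]
        except (IndexError, ValueError):
            break  # malformed atom line
        last = (lines[i + 1].strip(), atoms)
        i = end
    return last
```

A timer in the GUI could call this on the file contents every so often and update the displayed coordinates whenever the returned frame changes.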

My feeling is that relying on an API is risky as it is likely to change significantly or break, or indeed be very dependent on the Python environment. Am I wrong in thinking that? What is the advantage of using the API as you have done for the force field implementation?

My vision was instead just to run xtb on the command line, but to automate away the actual use of the command line. So clicking “optimize with XTB” would just run xtb <current geometry> --opt with any appropriate settings for charge or multiplicity. Clicking “conformer search” would run the crest command with sensible defaults (with a warning that it can take a long time!). A menu option would be available to configure the options for these quick settings, just like is currently implemented for OpenBabel and Force Field calcs.

Another menu option for “run calculation…” would open up a dialog which would look and function essentially the same as the ones that open when the user clicks on Input > ORCA... i.e. with a button for “Run”, and the user would choose options via buttons and drop-down lists and Avo would pass that on as part of the xtb command, and – crucially – the actual command to be executed is displayed and customizable in a text field (just like with Input > ORCA...) so that literally any valid input option can be provided. This would mean a lower maintenance burden as essentially any xtb calculation could be run, even if it wasn’t yet nicely implemented with buttons.
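One way to picture the editable command field: generate a default command string from the chosen options, let the user edit it, then split it back into an argument list before running it. This is only a sketch using Python's `shlex` module. The helper names and the idea of restricting the first word to `xtb` or `crest` are assumptions for the example.

```python
import shlex


def default_command(xyz_name, options, program="xtb"):
    """Build the command string shown in the dialog's text field."""
    args = [program, xyz_name]
    for flag, value in options.items():
        args.append(f"--{flag}")
        if value is not None:  # flags such as --opt take no value
            args.append(str(value))
    return shlex.join(args)


def parse_user_command(text, allowed_programs=("xtb", "crest")):
    """Split the (possibly edited) text into arguments, without a shell."""
    args = shlex.split(text)
    if not args or args[0] not in allowed_programs:
        raise ValueError("command must start with one of: "
                         + ", ".join(allowed_programs))
    return args
```

Passing the resulting list to `subprocess.run` without `shell=True` lets any valid option through while avoiding shell interpretation of whatever the user typed.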

So essentially the same as that implemented for the input file generators, but with the xtb code installed and set up by Avogadro so it requires zero user input (or at most a click on a “Get xtb” button).

(By the way, I think charge and multiplicity should be settable for a molecule by right-clicking the entry in the molecules pane, so that it can be added automatically to input files and the like, and so that the OpenBabel optimization works even when a radical is optimized, I’ll open another suggestion for that).

matterhorn103 · November 7, 2023, 11:33pm

I guess doing it like that would mean missing out on features of your force field framework though, such as the constraints. And code for various aspects would be duplicated or the same thing achieved differently for no real reason.

But maybe xtb is special/useful/powerful enough to deserve its own implementation similar to the OpenBabel integration.

Or maybe it makes more sense to have it go through the extensible force field framework. You know that better than I do!

Edit: xtb can handle constraints itself, though, so if the right options are fed to it, xtb’s optimization algorithm can be relied on rather than needing the back and forth.
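For illustration, constraints selected in the GUI could be translated into an xtb detailed-input file along these lines. This is a sketch based on my reading of the `$fix` and `$constrain` blocks in the xtb documentation, so the exact syntax should be checked there. The function name is hypothetical and atom indices are 1-based.

```python
def xtb_constraint_input(fixed_atoms=(), distances=(), force_constant=0.5):
    """Build detailed-input text fixing atoms and restraining distances.

    fixed_atoms: iterable of 1-based atom indices to freeze
    distances:   iterable of (i, j, value_in_angstrom) tuples
    """
    blocks = []
    if fixed_atoms:
        atom_list = ",".join(str(a) for a in fixed_atoms)
        blocks.append(f"$fix\n   atoms: {atom_list}\n$end")
    if distances:
        rows = [f"   force constant={force_constant}"]
        for i, j, value in distances:
            rows.append(f"   distance: {i}, {j}, {value}")
        blocks.append("$constrain\n" + "\n".join(rows) + "\n$end")
    return ("\n".join(blocks) + "\n") if blocks else ""
```

The text would be written to a file next to the geometry and passed along with the optimization, e.g. `xtb input.xyz --opt --input constraints.inp`.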

matterhorn103 · November 8, 2023, 12:01am

In my head everything to do with running calcs would be taken out of the Extensions menu and put in its own Calculate menu, organized into sections and submenus as follows:

Calculate

Quick energy
Quick optimization
Change quick method...
Constrain...

Open Babel >
- Add hydrogens
- (etc., the same options as currently)

Force field >
- Energy
- Optimize
- Configure...

Semi-empirical >
- Energy
- Optimize
- Frequencies
- Conformer search
- Run calculation...
- Get xtb
- Configure...

Prepare input file >
- GAMESS
- Gaussian
- ORCA
- etc., everything that is currently in the Input menu

This way the user would get a nice overview of the different levels of theory available, and the optimize button is only two clicks away rather than three. Change quick method would just give a dialog box to choose from Open Babel, Force field and Semi-empirical in a drop-down, but with a Configure... button that would take the user to the configure menu for the individual method.

If xtb isn’t installed everything under semi-empirical would be greyed out except Get xtb.
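The greying-out logic could be as simple as checking whether an xtb executable can be found. A sketch, where the menu item names come from the list above and the function names are invented:

```python
import shutil

SEMI_EMPIRICAL_ITEMS = [
    "Energy", "Optimize", "Frequencies", "Conformer search",
    "Run calculation...", "Get xtb", "Configure...",
]


def detect_xtb():
    """Return the path to an xtb executable on the PATH, or None."""
    return shutil.which("xtb")


def semi_empirical_menu_state(xtb_path):
    """Map each menu item to True (enabled) or False (greyed out)."""
    installed = xtb_path is not None
    return {item: (installed or item == "Get xtb") for item in SEMI_EMPIRICAL_ITEMS}
```

Calling `semi_empirical_menu_state(detect_xtb())` when the menu is built would leave only "Get xtb" active on a machine without xtb.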

Ideally the constraints set in Avogadro would permeate to all methods and input file generators!

ghutchis · November 8, 2023, 1:04am

BTW, ideas are most welcome - I appreciate the enthusiasm about this.

Would “Download Plugins…” => pick “xtb” count? That shouldn’t be too hard to do:

Add xtb binaries to the plugin repository itself, e.g. xtb-linux vs xtb-windows paths
Add something to the plugin.json that indicates binaries to install on download, e.g.

{
"install": { "linux": "https://github.com/grimme-lab/xtb/releases/download/v6.6.1/xtb-6.6.1-linux-x86_64.tar.xz",
               "win": "https://github.com/grimme-lab/xtb/releases/download/v6.6.1/xtb-6.6.1-windows-x86_64.zip",
               "mac": …
           }
}
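On the downloader side, picking the right entry from such an "install" mapping might look roughly like this. The keys "linux", "win" and "mac" are taken from the proposal above. The function name and the mapping from `platform.system()` values are assumptions.

```python
import platform

# platform.system() value -> key used in the proposed plugin.json
PLATFORM_KEYS = {"Linux": "linux", "Windows": "win", "Darwin": "mac"}


def pick_install_url(plugin_json, system=None):
    """Return the binary URL for this platform, or None if not provided."""
    system = system or platform.system()
    key = PLATFORM_KEYS.get(system)
    if key is None:
        return None
    return plugin_json.get("install", {}).get(key)
```

Returning `None` for a missing platform lets the plugin still install its scripts and simply report that no binary is available.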

As far as adding commands for optimizing, calculating orbitals (through an xtb --molden) … that’s easy. As I said, you could do that right now using the command scripts or a small amount of C++ for a new qtplugin bit.

It’s the constraints … but more importantly the interactive optimization through the “AutoOptimize” tool. It’s been very clear that people want that back. There’s no good way to do that with a different program’s optimizer. Thus the new framework … which was designed with AutoOptimize (and eventually molecular dynamics) in mind. It’s noticeably slower when calling xtb as a process for each geometry.

I know a lot of people would like GFN1 / GFN2 … or somewhat slower options for a general Optimize command (vs. interactive optimization). Still thinking about that.

Yes, constraints will be available through cjson and hopefully implemented into generators, etc.

I think any discussion about menu reorganization belongs in a separate thread. I’m open to ideas, but I don’t think I’d organize into subsections like that. Consider … where do ML methods like ANI go? What about DFTB or Hotbit users? Molecular Dynamics? That’s one reason I pulled everything into an “Input” menu.

But it’s also possible that “Extensions” gets renamed as “Calculate” … as I said, I’m definitely open to ideas about menu reorganization (in a new thread).

JGrantHill · November 9, 2023, 9:54am

Just wanted to add that I think something along these lines would be an excellent addition and really useful (certainly in my group).

I’m pretty crunched for time, but if there’s something useful that I can help with then please let me know.

matterhorn103 · November 9, 2023, 10:54am

Would “Download Plugins…” => pick “xtb” count?

Sure, certainly for a first implementation. I think if it is just one of many options in plugins users won’t discover it without already knowing what xtb is. Ideas on how to improve discoverability of plugins/various methods can come later.

My vision was of xtb effectively being adopted as Avogadro’s native/default backend, hence my suggestions taking the form they did. I can of course appreciate why you would maybe not want to treat it differently to any other computational method though.

I’ll have a look to see if I can implement something along the lines of the suggestions you made. I’m very keen to help with Avogadro development where I can, when I get time, as I think it’s an amazing bit of software with an important role to play.

It’s the constraints … but more importantly the interactive optimization through the “AutoOptimize” tool. It’s been very clear that people want that back. There’s no good way to do that with a different program’s optimizer. Thus the new framework … which was designed with AutoOptimize (and eventually molecular dynamics) in mind. It’s noticeably slower when calling xtb as a process for each geometry.

If I understand correctly, you are comparing the speed of using GFN-FF as the calculation method linked up to the new framework in two different ways:

1. By getting, for each step, the energy and gradient via the python API for xtb;
2. By getting, for each step, the energy and gradient via system calls/terminal commands of xtb geom.xyz --scc and xtb geom.xyz --grad

… and then in each case you put the energy and gradient back into Avogadro and use that to decide the next geometry step within Avogadro’s optimization algorithm? And case 2 is significantly slower?

In that case could one not just implement a special case for GFN2-xTB and use the same front end but not use the Avogadro algorithm? As I suggested in a previous post, Avogadro could simply call xtb geom.xyz --opt, with any necessary constraints, and then read the latest geometry at regular intervals from xtbopt.log. The live optimization feature could be implemented here too, just by updating the geometry in Avo from the log file sufficiently frequently that it looks like real time (or indeed by artificially adding a time delay between steps if the calculation runs faster than the structure should relax in the program). That way xtb continues running in the background, so only the visualization is slowed down by inter-program communication, not the calculation itself.

If the user clicks and drags an atom, the xtb process could simply be stopped and restarted after the change has been made. Though it has been a long time since I used that feature - does the molecule normally continue optimizing while the user manipulates atoms? Would it be unsatisfactory if optimization only continued after mouse release?
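The stop-and-restart behaviour could be sketched with a small wrapper around `subprocess.Popen`. The class name `BackgroundJob` is made up, and the command passed in would be whatever xtb invocation is in use.

```python
import subprocess


class BackgroundJob:
    """Run one external command at a time, with stop and restart."""

    def __init__(self, command, workdir=None):
        self.command = list(command)
        self.workdir = workdir
        self.process = None

    def is_running(self):
        return self.process is not None and self.process.poll() is None

    def stop(self, timeout=5.0):
        """Terminate the process if it is running, killing it if it hangs."""
        if self.is_running():
            self.process.terminate()
            try:
                self.process.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                self.process.kill()
                self.process.wait()
        self.process = None

    def start(self):
        """(Re)start the command; any previous run is stopped first."""
        self.stop()
        self.process = subprocess.Popen(
            self.command,
            cwd=self.workdir,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
```

In this picture, a mouse press would call `stop()`, and a mouse release would write the edited geometry to disk and call `start()` again.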

Perhaps I am misunderstanding what makes it too slow, and it’s not the constant back and forth between programs at all.

I hope my enthusiasm is not too strong, it’s only that I think it would be a killer feature and would fit a lot of use cases, especially in my group, where we have many synthetic chemists who want to see what a molecule looks like in 3D but have no idea how to use a command line, use python, or run xtb or DFT. These people want to know what a molecule really looks like, without DFT, so would benefit from having GFN2 geometries by default as opposed to a force field for accuracy, and need conformer searches. I imagine this use case is pretty common! And I mean, pedagogically, I would argue it’s fairly crucial that students have access to a tool that gives them the correct geometry of e.g. ferrocene.

By the way, how do I include a username when quoting?

ghutchis · November 9, 2023, 1:11pm

Select a bit of text from the user’s post. A “quote” button should show up to include in your reply.

ghutchis · November 9, 2023, 3:18pm

Enthusiasm is great, and I agree 100% with the point. We definitely want to make Avogadro easy to use … and to help synthetic chemists who don’t have a great intuition in 3D or understand conformers.

On the conformer side, the Open Babel conformer generator will be up and running soon. I honestly thought it was already in there. (My group will have more conformer options sometime next year.)

Yes. Conceivably it could resume after release.

Let’s see if we can come up with a checklist of smaller tasks to get this started.

ghutchis · November 9, 2023, 4:34pm

As far as tasks, I’m going to suggest the task list go on GitHub for tracking: Full and privileged xtb integration · Issue #1447 · OpenChemistry/avogadrolibs · GitHub

I think the first step would be to figure out how to get Mac static binaries for the xtb releases to go with the current Linux and Win-64 builds.

That can certainly occur in parallel with itemizing the list of features (e.g., you seem very interested in crest conformers) and starting with some Python scripts because they’re easy / fast to try before the C++ implementations.

Ainosya · November 10, 2023, 11:24pm

Hello,

I truly think that this kind of organization is really nice and neat.
Currently, my focus is on implementing the AMOEBA FF (and others) directly within AVOGADRO using C/C++. If you are working on the python scripts for xtb and crest, I would be more than happy to assist in integrating them directly into C once the various force fields are implemented, eliminating the need for a separate script. While my proficiency in C/C++ and AVOGADRO might be basic, I am enthusiastic about contributing and learning more.

ghutchis · November 12, 2023, 6:50pm

My initial thought was that would be tricky, but it does look like xtb does offer a C interface, which looks pretty similar to the xtb-python interface.
https://xtb-docs.readthedocs.io/en/latest/capi.html

This would require compiling the xtb pieces with a Fortran compiler so libxtb can be distributed (e.g., Windows, Mac, Linux AppImage) … I can provide some pointers or a branch if you’re interested in going this route.

matterhorn103 · November 12, 2023, 11:21pm

Perhaps using that would be faster for use in your framework? And allow using GFN2-xTB for live optimization? It would also solve any issues with python/conda packaging, I guess.

Of course, while energies and gradients (and solvation) are available via the API, frequency calculations and conformer searches are not. So I will still look at making that python script/plugin to enable use of command-line xtb via the Avogadro GUI.

ghutchis · November 13, 2023, 2:37am

Maybe? It would certainly solve the issue where GFN-FF returns an energy of 0 … but GFN2 is decidedly slower than GFN-FF.

And yes, I think having some simple scripts to calculate orbitals, vibrations, and run crest would be great.

matterhorn103 · November 13, 2023, 12:51pm

Hi ghutchis,

Sorry to be such a pain, but I’m having difficulty even getting plugins to work. Wasn’t sure whether to open a new thread/issue for it because maybe it’s just user error rather than a general bug…

I’m on OpenSUSE Tumbleweed with KDE Plasma, up to date. I have the nightly build of 1.98.1 as an AppImage.

For a start, the Download Plugins window is completely unpopulated, with only the column headers and the Download Selected button to be seen.

I tried cloning the avogadro-scikit-nano repo, then dragging the python scripts into the Avogadro window. The “Install plugin script” dialog appears and I can choose the command option, then click ok, but nothing changes.

I tried making a folder at ~/.local/share/avogadro (it didn’t yet exist) and adding the avogadro-scikit-nano directory to it, but it changed nothing.

I also started playing around making my own python script and can also find no way to add it to Avogadro.

Any ideas? Sadly my only machines are Linux and a Windows ARM Surface, so I can’t just use a Windows or Mac version.

matterhorn103 · November 13, 2023, 6:19pm

Ok I just about managed to build Avogadro from source on my Linux PC. Something is screwy with the graphics (possibly due to Wayland) so it’s not a useable version yet but it allows me to have a look at how some things work. For example, I can run the binary from the command line and now I see that I had the wrong path for plugin script location. I have moved my plugin and the avogadro-scikit-nano to ~/.local/share/OpenChemistry/Avogadro/commands/
I checked the AppImage and the plugins are still not loaded.

The GUI functions enough for me to open Extensions|Download Plugins… and it seems that now it is populated the way it should be. (So it seems something is broken in the AppImage, I’ll open an issue for it.) I could successfully download e.g. the nanocar plugin.

On starting the binary Avogadro prints that it found all three scripts but that it can’t load them. The same applies for the other 48 scripts that it finds in the various directories. This also explains why I only have the Lennard-Jones force field available and none of the others.

Bit of a facepalm moment for me when I just now realised I can also get the printed terminal output by starting the AppImage from the CLI. But there it says exactly the same, that the scripts are found but can’t be loaded. Though it also finds an extra 6 scripts, not sure why.