Combined plugins & plugin list

For a while now, @matterhorn103 has been outlining some ideas for “combined” plugins. For example, maybe you want the xtb plugin to provide:

  • commands (e.g., generate some orbitals for me)
  • energy scripts (e.g., add GFN-FF and GFN2 to the AutoOptimize tool)
  • charge scripts (e.g., calculate GFN2 charges)
  • etc.

At the same time, the current “plugin infrastructure” is a text file in a GitHub repo, manual review, and some scripts to help cache the details (vs. checking secure hashes the way git does).

So I’m opening up this thread for him to outline some of his plugin manifesto, but also to discuss ways to improve plugins as we move towards 2.0. (I’m intending to write about a dozen different plugins, and I’m kinda hoping many more will show up.)

For example, at some point, the download plugin table needs some searching / filtering and auto-update features.

Thoughts?

Ok, as requested, my vision is outlined in detail below. Feedback, questions, criticism all welcome.

Key goals

  1. Plugins in binary form should be supported as well as ones written in Python
  2. Plugins should be able to offer multiple functionalities of mixed types
  3. Plugins written in Python should be able to use a more conventional package structure compatible with the broader Python ecosystem
  4. Plugins should be able to expect to be run via a single entry point, allowing them more flexibility in how they handle requests
  5. Metadata and information needed by Avogadro for population of the UI should be parsed statically from files rather than obtained dynamically by running Python at launch
  6. Relatedly, launch times should be improved by doing the minimum amount of work necessary to load plugins
  7. Plugins should be able to have Avogadro store user configuration options for them, and also receive the user’s entire Avo config on request

Proposed plugin infrastructure and API

Plugin directory structure

  • Plugins stored in subfolders under OpenChemistry/Avogadro/plugins/:
    • plugins/python for plugins written in Python (though they may have non-Python dependencies)
    • plugins/bin for plugins in compiled binary form
  • All plugins for distribution (and therefore downloadable via Avogadro’s plugin manager) must be plugin packages
    • A package means that the plugin is distributed and stored as a folder with the exact same name as the plugin e.g. my-plugin, and that the folder contains not only code but also a metadata file (see below)
    • This applies to both binary and Python plugins (or any other languages that might be supported in future)
    • Python scripts are still supported but intended just for convenient personal local use – they will not be distributed going forwards

Plugin types

  • Binary plugins are always plugin packages. They are stored under plugins/bin. The plugin folder contains, at minimum:
    • A binary with the exact same name as the plugin e.g. plugins/bin/my-plugin/my-plugin
    • An avogadro.toml file (other name suggestions accepted, avoplugin.toml was another idea) containing the plugin’s metadata
      • This file is necessary for Avogadro to know how to run the plugin, so standalone binaries that are not in the package format are not allowed
  • Plugins written in Python and stored in plugins/python may be:
    • packages
      • Stored in subfolders e.g. plugins/python/my-py-plugin. The plugin folder contains, at minimum:
        • A pyproject.toml containing the plugin’s metadata as well as the usual Python info about the plugin’s dependencies
          • The plugin metadata is in exactly the same format as in avogadro.toml, but under a [tool.avogadro] table
        • Some Python source code, stored in any structure suitable for a Python package
    • standalone scripts
      • All kept together in plugins/python/scripts
      • For a script, the metadata is recorded at the top of the Python file itself, using the inline script metadata standard introduced by PEP 723
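For a standalone script, the PEP 723 inline metadata block permits `[tool]` subtables, so the Avogadro metadata could live there; the `[tool.avogadro]` keys shown here are hypothetical, not a settled schema:

```python
# /// script
# requires-python = ">=3.10"
# dependencies = []
#
# [tool.avogadro]
# name = "Count Atoms"
# type = "command"
# menu-path = "&Extensions|Utilities"
# ///

# ...the script body follows as usual, and Avogadro runs the file directly.
```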

Metadata and configuration

  • Metadata that Avogadro needs to know in order to properly run a plugin is no longer obtained dynamically by running the plugin but instead by static analysis of TOML files
    • plugin.json is deprecated
    • Requires adding a TOML parsing library
  • The static metadata for a plugin contains the following information:
    • The subcommands provided by the plugin, what “types” of plugin process they correspond to (charges, energy etc.), and what the display names and menu paths for each of those items should be
    • Info that was previously in plugin.json e.g. author, version number
      • Items that have an equivalent in the [project] table of a pyproject.toml file are specified there for a Python plugin
    • Options regarding configuration for the plugin:
      • Whether or not the plugin wants a configuration to be recorded for it in the main Avogadro configuration file, and if so, what entries the config should have for the plugin
      • Whether the plugin wants to be passed the main Avogadro configuration options in their entirety as well (for example, so that the xtb plugin can use the same energy units as Avogadro). This will enable a future settings menu in the main interface to also show options for the plugins
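As a concrete sketch of the points above, an avogadro.toml could look roughly like this; every key name here is a placeholder for discussion, not a finalized schema:

```toml
# avogadro.toml -- hypothetical static metadata for a plugin
name = "xtb"
version = "1.0.0"
authors = ["Jane Doe"]

# Each subcommand declares its process type, display name, and (if any) menu path
[subcommands.gfn2-energy]
type = "energy"
display-name = "GFN2-xTB"

[subcommands.gfn2-charges]
type = "charges"
display-name = "GFN2 (xtb)"

[subcommands.orbitals]
type = "command"
display-name = "Generate Orbitals…"
menu-path = "&Extensions|&xtb"

# Configuration options (see above)
[config]
store = true                  # ask Avogadro to keep a config for this plugin
receive-global-config = true  # e.g. to reuse Avogadro's energy units
default = { method = "GFN2-xTB", accuracy = 1.0 }
```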

Python plugins

  • Python plugin packages should be proper Python packages that can be installed; this means they follow Python packaging conventions, and even allows them to be published to PyPI or conda-forge so that they can also be used as libraries in a non-Avogadro context, if desired
  • Python packages define a single entry point that Avogadro will use
    • This allows a plugin to do various things; for example, it allows the code to be organized however best suits the plugin, and it allows the plugin to run any common logic more easily
  • A general, default Pixi environment for running most Python plugins is recorded in a manifest at plugins/python/pyproject.toml
    • For now, the environment has no dependencies by default, but in future there may be a minimal set of common packages specified e.g. numpy
  • Python plugin packages are then installed to the Pixi environment so that their dependencies can be resolved and added to the environment
  • Python plugin scripts may not have dependencies
    • If dependencies are necessary, just make it into a minimal package
    • In future, could think about having some dependencies in the common default environment as a common base, which scripts could be allowed to use
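For a Python plugin package, the same information would sit in pyproject.toml: the standard [project] table carries the usual packaging metadata and the entry point, and the Avogadro-specific part is namespaced under [tool.avogadro] (keys again hypothetical):

```toml
# pyproject.toml for a hypothetical plugin package "my-py-plugin"
[project]
name = "my-py-plugin"
version = "0.1.0"
authors = [{ name = "Jane Doe" }]
dependencies = ["numpy"]

# The single entry point that Avogadro invokes via `pixi run my-py-plugin ...`
[project.scripts]
my-py-plugin = "my_py_plugin.cli:main"

# Same metadata format as avogadro.toml, under the [tool.avogadro] table
[tool.avogadro.subcommands.make-slab]
type = "command"
display-name = "Create Slab…"
menu-path = "&Extensions|&Build"
```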

Running plugins

  • Scripts are run using pixi run python path/to/<script>.py [OPTIONS]
  • Binary packages are run using path/to/<plugin> [OPTIONS]
  • Python packages are run using pixi run <plugin> <subcommand> [OPTIONS]
    • This means plugins need to define an entry point in their pyproject.toml so that they can be run using a command rather than as a Python file (fairly trivial)
  • Rethink current option flags
    • Get rid of --run-command and similar – plugins define subcommands
    • Deprecate --display-name and --menu-path, put that information in the static metadata
    • --print-options should be split up – anything that’s always the same should go into the static metadata, while anything that might be dynamic should still be obtained by running a subcommand with an option
    • Add a --config option, which is used by Avogadro to pass configuration information (see above)

The online plugin repository/index

  • The Avogadro/plugins GitHub repo switches to using a repositories.toml file, with each plugin having its own table within it
    • Each plugin has to specify (at minimum, suggestions welcome):
      • The git repository
      • The specific commit
      • Possibly a SHA256 hash that Avogadro can use to check the plugin files after download
      • Whether it is a bin plugin or a python plugin (or any other options that may come later)
    • Requiring the specific commit massively improves security, as it means that updates for plugins have to be approved by submitting a PR to Avogadro/plugins, and the code for the plugin can’t just be changed and delivered to all plugin users
    • Switching to a new file means we can keep the old one for a little while if we like, to avoid breaking plugins for older versions just yet
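The index file might then look roughly like the following; the field names are suggestions only, and the commit/hash values are obvious placeholders:

```toml
# repositories.toml in the Avogadro/plugins repo -- hypothetical layout
[plugins.my-plugin]
type = "bin"
repository = "https://github.com/example/my-plugin"
commit = "0123456789abcdef0123456789abcdef01234567"
sha256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

[plugins.my-py-plugin]
type = "python"
repository = "https://github.com/example/my-py-plugin"
commit = "89abcdef0123456789abcdef0123456789abcdef"
```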

Runtime plugin loading

  • The process of loading, running, and installing plugins in Avogadro is streamlined significantly:
    • At launch:
      • No plugin is run
      • No Python interpreter is used
      • The plugin directories are scanned for plugins, and each TOML file found is parsed appropriately to get the metadata of each
        • Much faster than spinning up several Python processes for each plugin in turn just to request information like the display name
    • When a plugin needs to be run (because it’s the selected force field, or because its menu option was selected), Avogadro just runs it according to the provided metadata
    • When downloading a plugin, Avogadro knows from the central index on https://avogadro.cc whether the plugin is a binary or Python plugin, puts it in the appropriate location, then installs it if it’s a Python one
  • The metadata should probably be cached in an index
    • That way, the scanning step is skipped altogether
    • It’s impossible to have a situation where a Python plugin is present but not installed
    • To cache the metadata:
      • A plugin index in the form of a plugins.toml file would be maintained in the plugins/ folder i.e. at OpenChemistry/Avogadro/plugins/plugins.toml
      • The index would collate all the metadata from all the installed plugins
      • Note this requires a TOML writing library! Could also just store as JSON, but seems a bit silly not to keep it consistent with the plugin metadata format
    • At launch, Avogadro just reads in the index and does no scanning of the directory tree – the index has everything Avogadro needs to know
    • Plugin discovery/index generation/Python package installation is then only carried out at launch if there is no index found
    • Downloading a plugin using the built-in downloader causes the plugin to be installed (if it’s a Python plugin) as well as appending the metadata to the index
    • Plugin discovery/index regeneration/reinstallation of all Python plugin packages can be manually triggered via a menu option in the Avogadro interface

Miscellaneous

  • Plugins may only have A-Z, a-z, 0-9, - (i.e. ASCII letters, numbers, and the hyphen-minus) in their names (no Unicode, no underscores, no other punctuation or whitespace)
  • Communication over the Avogadro-plugin interface should be exclusively in UTF-8

Let me start with a bit of history. One thing that came up repeatedly with Avogadro 1.0 / 1.2 was that people might want to contribute changes, but programming in C++ was an obstacle. Also, it would be nice to update things like the input generators without waiting for a full release of Avogadro. (For example if there’s a Gaussian 25 release or someone like @brockdyer03 contributes an xsf file reader.)

At the same time, there’s a huge increase in Python use in science research more broadly and chemistry in particular. (It’s also way easier to write parsers in Python than in C++, e.g. cclib). There’s now a broad ecosystem of Python packages and codes on conda-forge.

Marcus and Allison Vacanti wrote the input generator scripts with the idea of “supply some JSON for the form” and IMHO it works really well. It’s easy to point people at the avogenerators repository, and even if you don’t understand Python, they’re pretty easy to modify. Want to add a basis set or solvent? It’s just a text file with a list.

So two big goals:

  • lower the barrier to contributors / changes
  • make it easier to send a script to someone (i.e., Python scripts run pretty much anywhere)

They also added support for Python scripts to read file formats, although it doesn’t (yet) have wide use.

When I was working with some folks to expand to command scripts, those were key motivations too. Make it easy for someone to add “generate nanotube” or “use pymatgen to create a slab” scripts.

I still want to keep those core goals.

I don’t have time at the moment to go through everything, but I welcome the discussion.

  1. I’m somewhat reluctant to enable binary plugins because we’d need to download platform-specific code, but maybe? It also makes it much harder to audit / inspect. Certainly it would help for some tasks. I think the broader aspect should be that we want to enable “not just Python” for plugins.
  2. I think I’d say some plugins should be able to offer multiple functionalities. With respect to lowering the bar to contribute, it’s pretty easy right now to write quick scripts for commands, charges, etc. So I’d like to keep those low-barrier tasks even if we make it better for more sophisticated plugins like xtb.
  3. As long as we agree that it’s not required to have a package structure, I’m good. I still want people to be able to write quick one-off scripts that use RDKit to select SMARTS patterns or change 10% of the lead atoms to gold.
  4. :man_shrugging:
  5. I’d say the catch is with translation / localization. I’d really like scripts to have the potential for localization.
  6. Of course that’s always a good goal.
  7. I’m not sure about the “entire config” part. Happy to have them store and retrieve settings, which seems useful.

So my suggestion would be that we keep the current subfolders for particular plugins, but add a new one for combined or something like that.

No, sorry. I strongly disagree with this because it creates a barrier for contributors. What’s wrong with someone writing a useful script (e.g., parsing a particular file format) and wanting to distribute it?

I’ll respond to a few other things later, but I think you’re underestimating the perceived barrier to “creating a minimal package.” If anything we want to make it easy for people to contribute little useful things – which may still require dependencies – while still making it possible to have more sophisticated plugins as well.

Anything that makes it easier / faster seems like a good idea.

Well the issue is that then you don’t have any metadata.

I mean, it’s already not possible to submit just a script by itself for distribution – we require the git repo to have a plugin.json file in it. So we are already requiring a distribution container, I’m just proposing to use a more standard format that allows us to use pixi install.

To clarify, when I say “distribution” I’m talking about distribution by Avogadro’s own infrastructure. I’m not looking to discourage people just sharing quick scripts by email or on this forum – that kind of distribution I’m all for!

To be fair, my reasoning for not distributing scripts was to make things a lot simpler. Distributing scripts means having to account for more possibilities at different points. For example, the plugin manager has to know how to deal with scripts.

But my original idea didn’t have that restriction. The easy way around it is just to require scripts to put the necessary metadata in-line if they want us to host them. I’m happy to do that.

Don’t worry, our aims are aligned. The thing is, I think requiring an Avogadro-specific plugin structure actually raises the barrier, because there’s a lot of help and documentation for the standard way of doing things (and an LLM can easily help out), and not a lot for our way.

A minimal package structure is just a matter of uv init or a similar command with many other tools. Crucially though, I would propose that Avogadro/plugins includes an example minimal plugin layout that people can use as a template.

To your specific other comments:

  1. Yeah, that’s fair, I was thinking binaries compiled from open-source code though. But as a first step we could just add the API, so that people can at least use binary plugins locally, even if we don’t initially host them.
  2. I don’t see a need to distinguish. Each plugin already has to tell Avogadro what it does, and it still would going forward. Why rely on the directory tree structure?
  3. Well, I feel it has distinct advantages to require it if it’s going to be hosted by us and dealt with via the package manager, because things become so, so much easier.
  4. I suggest it because having each script in a plugin be independent and run separately makes things challenging once you need common functionality.
  5. Localization can easily be done within static metadata, not an issue. :slight_smile: If we like, we can still pass the --lang option to plugins so that they can adapt their output dynamically, if that’s desirable.
  6. :slight_smile:
  7. Sure, I was just going for the simplest approach!
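For the record, per-language display names within static metadata could be expressed along these lines (a hypothetical sketch, not a settled format):

```toml
# Hypothetical: localized display names in the static metadata
[subcommands.orbitals.display-name]
default = "Generate Orbitals…"
de = "Orbitale erzeugen…"
fr = "Générer les orbitales…"
```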

I appreciate your thoughts, and hope you get a chance to look in more detail at the rest some time. Hopefully others might chime in too.

I think your impression is that I want to make things more complicated when actually I want the opposite, to simplify things and reduce friction when writing a plugin! I’m happy to find ways to simplify my proposal, but I’m convinced that aiming for a flexible, universal, and standards-based API is a better strategy (especially long-term) for achieving simplicity than implementing complicated custom solutions.

But you’re proposing that metadata can be included in the script file, right?

The main point of this was to make it pretty easy to have a folder that’s a set of scripts. Put them in a repo and add a fairly simple JSON with the metadata.

I’ve spent a good amount of time looking at research code, from my own group and from other groups.

While I think it’s important to promote good coding practices, I’d like to create a useful “contribution ladder” from “I made a script to generate XRD patterns” to much more sophisticated plugins like xtb. For example, PyMOL has a huge range of plugins, from simple scripts to ones using PyQt interfaces.

So if there’s a good way to include the metadata in a simple script, I’d be in favor of allowing a GitHub / GitLab / whatever repo with just a script or three and a README.

Now part of your proposal centers around more sophisticated packages, and suggesting some structure for those is a good idea. That’s also part of the need to create more examples so people can adapt them.

We can use pixi install on anything that has a pyproject.toml. Certainly I’d suggest plugins start using that, and I’m willing to use that as a metadata mechanism instead of the JSON file.

In many cases, it’s faster because the code doesn’t need to load them. 99% of the time, I don’t need to load charge plugins. Probably 50% of the time I don’t need to load energy plugins.

Also, given the list of things I need to write up, there will probably be 20-30 plugins. Having some directory structure for people makes it a bit easier IMHO.

Sure. But again, I think there’s a continuum of things from “I just want to run this one thing with pymatgen” to something like xtb which uses a bunch of common functionality. That’s why I think you’re describing a new category.

I don’t think it’s crucial to separate though. pixi allows tasks, which could be separate scripts or one entry point: Tasks - Pixi by prefix.dev

I agree that we’re on the same page as far as reducing friction. I think I’m coming from the pragmatic “I’d love people to start with a template but think that will be a mental barrier for many.”

Moreover, I think having slightly different plugin types makes it a bit easier on the C++ side. Let’s say that I want to add --batch-energy and --batch-optimize methods to the energy plugins. Right now, I can do that pretty easily. Those plugins are supposed to supply some metadata to Avogadro about their capabilities. I’m less clear on how that works in your universal proposal.

Part of the current complexity is that most useful plugins want some dependencies. That’s why pixi is great because it can pixi install an environment for pytorch for ML plugins or whatever else is needed.

Plugins need to be called at launch right now to make sure they can actually function in the current Python environment. For example, if you don’t have pymatgen, then don’t show those commands.

We could still make it fairly easy to supply a repository or otherwise automate the generation of the TOML or JSON.

GitHub actions could grab the repository, check the SHA256 (or other hash) etc. as part of a pull request. It’s also possible to provide some sort of automated “create a pull request when a plugin changes” but still require a human to check things over and approve the update or new entry.

(I’ve been looking into some AI-driven code review tools. GitHub copilot provides that for example.)

This discussion also reminds me about this old issue: Download plugins should use `git` if available in the path · Issue #341 · OpenChemistry/avogadrolibs · GitHub (i.e., if git is available, use that to download or update plugins)

Yes, but I was proposing that a script needn’t include metadata that is only necessary if it’s being distributed via the plugin manager, to keep it simpler for people writing one-off scripts. But sure, we could have script plugins include the same metadata as package plugins, that’s not an issue.

That would open the door to have local scripts also be manageable using the plugin manager in future (i.e. add script plugins in the GUI via a file dialog, and see or remove currently installed ones). That would be quite nice, I think.

I wonder whether you might have the wrong idea of what a Python distribution package actually looks like? Because this is the thing: at the point where you have what you’re describing, the step to a proper Python package is really extremely small.

I have realised that pixi install is not, as I thought, something that installs a package to an environment, but instead a command that creates a local environment from a manifest.

I’ve also realised that a pixi/conda project is not a package itself and cannot be added to another project’s environment as a dependency/package in the same way that is possible with normal Python packages.

This means that various aspects of my proposal as envisaged would not be possible, in particular the way I was suggesting that plugins are run. I’ll have to think about things a bit more.

The whole point of having the metadata available statically, and possibly cached, is that they don’t need to be run at all until they actually need to produce a result. Reading the metadata in order to populate the various options in the GUI is trivially fast, so there’s no need to avoid loading subsets of the plugins, because they’re not really loaded any more.

I just think it actually makes it more, not less complicated, to write a script when you special case stuff and pigeon-hole them by functionality. And I’m not suggesting to get rid of the categories of functionality, just suggesting that a plugin should be a plugin should be a plugin, and the plugin communicates to Avogadro what it can do, rather than what kind of plugin it is.

That would still be very much possible, because again, Avogadro already knows which plugins offer energy calculations, so it can just choose to run those. All that changes is that we no longer say “this plugin does x so it’s an x plugin, this plugin does y so it’s a y plugin”, we say “this plugin offers A, and A does x, while this plugin offers B to do x and C to do y”.

I don’t think that makes sense, in exactly the same way as it doesn’t make sense for Avogadro to run its test suite at launch to check that everything functions. It means that if something is broken with a plugin or the Python setup, it affects the whole application, e.g. by extending the launch time.

As to the example, that’s the wrong way of handling it. It shouldn’t be possible to install a plugin that relies on pymatgen if it’s not present, or the installation manager should make sure it becomes present.

It would be nice to do this if we used e.g. Pixi’s ability to add dependencies to an environment from git, but that requires them to be standard Python packages. I don’t think we should introduce yet another functionality that relies on something that may or may not be available (almost no Windows and macOS users will have git installed), as it just leads to fragility. It means another code path to maintain.

Anyway, as I said, I need to reappraise my proposal now that I’ve realised that it has impossible aspects. Likely I’ll then put a file in a GitHub repo so that we can have a living document that can be collaborated on, changes reviewed etc.