Managing Python

It seems like the reliance of Avogadro on Python for key functionality is only going to increase, so for a good user experience a Python installation is becoming a prerequisite.

As such, several GitHub issues (#1391, #1428, and #1449) have already raised the need to improve how Avogadro handles its Python integration, including helping the user to install Python if it is not already installed, using virtual environments, and allowing plugins to install dependencies.

There is the idea that the Windows installer for Avogadro should at least offer to install Python via a miniforge installation.

I have also seen a lot of questions on the forum along the lines of “Why isn’t this working?” where the answer is simply: “you need to set up Python” and/or “install these Python packages”. From my own experience and that of friends and colleagues, this is unlikely to be something easy for a non-computational chemist and certainly not something they really know how to do in the right way. These are also support tickets that could be avoided altogether if it is handled by Avo.

In general you seem to be a fan of conda, @ghutchis. And having begun using it for my own projects recently it certainly seems better for chemistry than pip, since non-Python things like Open Babel, xtb, CREST, are in it too.

I’m interested to hear why I would be naive and wrong for having the impression that the easiest way to manage Python in Avogadro would be to make conda either a dependency or a requirement for the Python functionality.

Some sort of initialization procedure on the first start of Avogadro could check if conda is already installed (just to avoid duplication), and if not, could offer to the user to install it in order to “enable all features of Avogadro” i.e. unlock the Python-dependent functionality. If declined, the same offer could be triggered later if the user attempts to install or use plugins.
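As a rough sketch of what that first-launch check might look like: the function names and the returned action strings below are hypothetical, not anything that exists in Avogadro. It assumes that a usable conda is either advertised through the `CONDA_EXE` environment variable (which conda's shell integration sets) or can be found on `PATH`.

```python
import os
import shutil
from typing import Optional


def find_conda() -> Optional[str]:
    """Return the path of a conda executable, or None if none is found."""
    # conda's shell integration exports CONDA_EXE once it has been initialised.
    from_env = os.environ.get("CONDA_EXE")
    if from_env and os.path.isfile(from_env):
        return from_env
    # Otherwise look for a `conda` executable on PATH.
    return shutil.which("conda")


def first_launch_action(conda_path: Optional[str], declined_before: bool) -> str:
    """Decide what a (hypothetical) first-launch dialog should do."""
    if conda_path is not None:
        # Reuse the existing installation to avoid duplication.
        return "use_existing_conda"
    if declined_before:
        # Ask again only when the user tries to install or use a plugin.
        return "defer_until_plugin_use"
    return "offer_miniforge_install"
```

Calling `first_launch_action(find_conda(), declined_before=False)` on a machine without conda would give `"offer_miniforge_install"`. A conda installed somewhere unusual and never initialised in the shell would be missed by this check, so a real implementation would probably need a manual override as well.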

Ensuring the presence of conda seems like it would have multiple advantages:

  • An Avogadro user doesn’t need to know anything about how Python works, how to install it, or how to manage virtual environments – the latter is really a difficult thing for new Python users

  • Avogadro could create its own avogadro2 conda environment, allowing it to manage all dependencies independent of the system or the user’s other Python work

  • The Python version can then be set up by Avogadro, so that each release can choose which version of Python it uses, improving compatibility

  • Plugins can install any dependencies they need in that environment by providing an environment.yml file or similar; this could optionally be extended in future to allow plugins to have their own conda environment if needed

    • (However Avogadro handles Python dependencies going forward, venv management is going to be unavoidable, as it can’t install anything with pip into the system Python environment; otherwise it might mess up the OS for Linux users)

  • A lot of time and coding effort would be saved by not having to worry about other sources of Python, packages, or environments (e.g. system Python, pip, homebrew, Linux distro packages), and it would no longer be necessary to get the user to provide the Python path or a virtual environment. #1391 in particular suggests handling both conda and pip and requiring user intervention

  • Installation of useful open-source software such as xtb (potentially even Open Babel itself?) could be managed easily by Avogadro without user intervention e.g. via a simple click to install interface within the GUI. This would allow #1447 to happen pretty easily!
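To make the dedicated-environment idea a bit more concrete, here is a minimal sketch of the two conda invocations it would boil down to. The helper names are hypothetical, and the Python version passed in is just an example, not something Avogadro has settled on; `conda create` and `conda env update` are the standard conda subcommands.

```python
import subprocess
from typing import List

ENV_NAME = "avogadro2"  # environment name suggested above


def create_env_command(conda: str, python_version: str) -> List[str]:
    """Command that creates the dedicated environment with a pinned Python."""
    return [conda, "create", "--yes", "--name", ENV_NAME, f"python={python_version}"]


def plugin_deps_command(conda: str, environment_file: str) -> List[str]:
    """Command that installs a plugin's environment.yml into that environment."""
    return [conda, "env", "update", "--name", ENV_NAME, "--file", environment_file]


def run(command: List[str]) -> None:
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(command, check=True)
```

For example, `run(create_env_command("conda", "3.11"))` followed by `run(plugin_deps_command("conda", "environment.yml"))` would set up the environment and then add one plugin's dependencies to it. Building the argument lists separately from running them keeps the logic easy to test without conda being present.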

The only real disadvantage I can see is that miniforge is added to Avogadro’s footprint, which is a) negligible and b) only necessary if the user doesn’t already have conda.

I’m sure I’m missing good reasons why this is a bad idea, but it seems like a fairly simple solution for Avogadro to deal with the mess that is Python environments and save a lot of work…

I look forward to my ignorance being expounded!

The short answer is because some people don’t like conda and we try to be flexible about it. That said, @mhanwell is working on having a conda-forge build of Avogadro so that conda enthusiasts can install Avogadro from conda.

Yes, I think there are two parts of this:

  • the Windows installer ideally should check for Python / conda and suggest installing it
  • Avogadro should check for Python at least at the first launch and link to the miniforge installer for users to download.

That can be an option, but some of us have environments we’d already want to use. Certainly the Python settings / “Choose Python” command should offer the ability to create a new environment … that’s a good idea.

But that’s precisely the reason I don’t want to require conda. For one, not every user has admin rights on their computer. For another, users may have installed various packages / environments on their computer already. Why do they need to install an entirely new package manager / ecosystem to use Avogadro? Why can’t Avogadro just detect and select the environment I want as a user?

Requiring conda / conda-forge makes the code easier sure, but does it make it more user-friendly? I don’t think so. (Maybe it’s “user-grumpy” instead?)

Sure, but what if someone is running Ubuntu or Fedora or FreeBSD and they already installed those programs via their native package manager? Oh, or homebrew on Mac, since the Grimme group maintains a homebrew tap.

A bunch of the needed pieces are in a contributed pull request … unfortunately research11111 doesn’t want to use their full name in the code signoff. I have some pieces in a branch, but I’ve been spending more time on the spectra dialog for now.

Let me just say that I think a bunch of things will be solved by:

  • checking for Python and/or conda at first launch and suggesting to install miniforge (esp. on Windows)
  • improving the “Choose Python” command to detect environments (and as you say, offer to create a new one)
  • improving the “Download Plugins” code to check for requirements.txt and/or similar and install dependencies
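The second and third points could be sketched roughly as follows. This is only an illustration with hypothetical helper names, not the code in the pull request or branch mentioned above. It assumes environments are discovered by parsing the output of `conda env list --json` (which reports paths under an `"envs"` key) and that a plugin ships either a `requirements.txt` or an `environment.yml` in its top-level directory.

```python
import json
from pathlib import Path
from typing import List, Optional

# File names a plugin might use to declare dependencies (assumed convention).
DEPENDENCY_FILES = ("requirements.txt", "environment.yml")


def parse_conda_envs(json_text: str) -> List[str]:
    """Extract environment paths from the output of `conda env list --json`."""
    return list(json.loads(json_text).get("envs", []))


def find_dependency_file(plugin_dir: str) -> Optional[Path]:
    """Return the first dependency file found in a plugin directory, if any."""
    for name in DEPENDENCY_FILES:
        candidate = Path(plugin_dir) / name
        if candidate.is_file():
            return candidate
    return None


def pip_install_command(python: str, requirements: Path) -> List[str]:
    """Command that installs a requirements file with a chosen interpreter."""
    # Using `python -m pip` ties the install to the selected environment.
    return [python, "-m", "pip", "install", "-r", str(requirements)]
```

Passing the interpreter of whatever environment the user picked in “Choose Python” as `python` means the packages land in that environment rather than wherever a bare `pip` on `PATH` happens to point.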

Those are all part of the 1.99 roadmap … maybe later this week or next? Certainly before the end of the calendar year.

In the meantime, suggestions / help wanted.

Alright, seems fairly sensible.

Yeah, I think that would be good, even for non-Windows OSes.

Perhaps if conda is installed, it can be used automatically? And if not the user could be prompted to choose whether they want to use conda or pip, and given the option to specify a Python path and/or environment? Possibly with a (recommended) flag for conda?

But there should definitely be a button for “set everything up for me”. I think whatever approach is taken, there has to at least be an option to let Avogadro take care of everything, because so many users will have never even heard of Python. Certainly the majority don’t know what a PATH is or how to add a program to it.

I would even suggest that a conda or pip env (in that order of preference, depending on what is found/specified on first launch) should at least be created by default so that one doesn’t have to be set before use. But with the ability to change it to another by pro users. The creation of an empty environment doesn’t take up much space, after all.

The creation/existence/use of some environment from whichever manager is necessary before the plugin manager installs packages, at least. We can’t have Avogadro changing the system Python.
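A minimal sketch of such a guard, using only the standard library: the function names are hypothetical, and the conda check relies on the assumption that a conda environment can be recognised by the `conda-meta` directory in its prefix.

```python
import sys
import venv
from pathlib import Path


def in_virtual_env() -> bool:
    """True when the running interpreter belongs to a venv-style environment."""
    return sys.prefix != sys.base_prefix


def in_conda_env() -> bool:
    """True when the interpreter's prefix looks like a conda environment."""
    return (Path(sys.prefix) / "conda-meta").is_dir()


def venv_python(env_dir: Path) -> Path:
    """Path of the interpreter inside a venv created at env_dir."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def ensure_venv(env_dir: Path) -> Path:
    """Create a venv at env_dir if it is missing and return its interpreter."""
    if not (env_dir / "pyvenv.cfg").is_file():
        venv.EnvBuilder(with_pip=True).create(env_dir)
    return venv_python(env_dir)
```

A plugin manager could refuse to install anything unless `in_virtual_env()` or `in_conda_env()` is true for the selected interpreter, and fall back to `ensure_venv(...)` in a per-user directory otherwise, which needs no elevated permissions.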

Oh, ok. I thought the advantage of pip and conda is that everything is installed to a user’s home directory and can be done on a per-user basis without elevated permissions.

Sure, but if Avogadro is going to start installing dependencies in the background wouldn’t some people consider it undesirable behaviour to change existing environments? Though that could be solved just by a small warning that this will happen in the “Choose Python Environment” dialog, I guess.

Fair enough, you’re probably right. :slight_smile: I’m just very keen to make things as user-friendly as possible for non-technical users, that’s all.

Partly by reducing the need for a user to intervene at all, partly by making any extra setup steps as short and frictionless as possible. I think a “batteries included” approach, to borrow from Python, can help accessibility a lot. Hence the xtb plugin!

By the way, what did you think about my suggestion of settling on and standardizing the use of either “plugin” or “extension” throughout Avogadro, but not both, to avoid confusion?

Yes, that’s what I was saying.

I think it’s up to the user how they want to install packages. We can certainly suggest something. But I for one already have things set up. I would be very unhappy if some program starts creating environments for me.

There are some corporate users and some government lab users that are not allowed to install anything resembling untrusted code.

Yes, that’s the goal.

This can be read (and possibly I misread this) as if the three were not on PyPI. However, they are:

  • openbabel: the one which normally would do here and its current unofficial bypass here
  • xtb: here
  • CREST: is it this one, or one among these here?

So far, I have only dabbled a little in conda/miniforge, and I assumed that the standard Python installation (from https://www.python.org/) plus the couple of packages actually needed (the requirements.txt file) would require fewer resources to download and manage than conda (or even miniconda). Anecdotally, updating an instance of miniconda to support a Python script* that required numpy also pulled in, avalanche-like, the larger libraries seaborn and pandas; in total, much more than was initially wanted and actively used. Although permanent storage has become affordable, I would prefer, if possible, that the combined disk space of Avogadro and its supporting Python ecosystem remained below 1 GB.

An argument raised later is that users may not be allowed to install new programs on devices they do not own. On Windows, WinPython offers a portable approach: it can be carried on e.g. a thumb drive or used straight after decompression, and it only installs into the host operating system after an optional, intentional opt-in (cf. overview).** Additional packages are installed either via pip from PyPI, or as wheels, e.g. from Gohlke’s index.

* the script in question was not related to Avogadro.
** though by now this can account for several hundred MB of disk space, too.

While I hear your concerns about size on disk –

That is the xtb-python bindings, not xtb itself, requiring xtb to be installed already, as far as I can make out. And none of those is the CREST of Grimme.

Regardless, the xtb-python API is also no longer being developed or maintained, and has only a limited subset of xtb’s functionality, so we can’t use that. (@ghutchis I guess the force field framework will have to switch to the C API if going via the command line isn’t fast enough?)

Since xtb and crest are written in pure Fortran they’re presumably never going to be properly on PyPI? Their repos make no mention of pip as an installation option. And I guess that’s why conda is so useful for comp chem – it’s not limited to Python. Basically every chemistry repo I follow on GitHub seems to be available through conda-forge and not PyPI, though that is admittedly fairly anecdotal evidence.

I mean surely that’s because the script had seaborn and pandas as dependencies; they are not dependencies of numpy. A numpy installation is basically the same size when installed with either package manager (actually slightly bigger when done with pip – 37.8 vs 33.8 MiB). If an Avogadro extension requires numpy, or pandas, or whatever, it requires it, there’s no getting round that, because it’s unusable otherwise – so it will end up being installed in any case via either pip or conda.
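For anyone who wants to repeat that kind of size comparison on their own machine, a minimal sketch is below. The function names are made up for illustration, and it only counts the files inside the package's own directory, so libraries that a package manager installs elsewhere in the environment are not included and the two figures are not perfectly like-for-like.

```python
import importlib.util
import os


def directory_size_mib(root: str) -> float:
    """Total size in MiB of the regular files below root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total / (1024 * 1024)


def package_size_mib(package: str) -> float:
    """Size in MiB of an importable package's directory, without importing it."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise ValueError(f"{package!r} is not an installed package directory")
    return directory_size_mib(list(spec.submodule_search_locations)[0])
```

Running `package_size_mib("numpy")` once with the interpreter of a pip-based environment and once with that of a conda environment gives the two numbers to compare.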

And after Geoff’s reply, it looks like he’d like to adopt an ecosystem-agnostic approach, and detect/use/support whatever the user has already. Which is fine! So I have no strong feelings really, conda just seemed like a neat way to easily achieve a consistent environment and UX :slight_smile:

Absolutely, that kind of size would be insane. But a miniforge installation + a single Python interpreter is not huge, I’d guess ~100 MB. That would be a 60% increase in the size of the AppImage, which means you’re probably right, it is unnecessary weight. But it’d be nowhere near a gigabyte.

BTW, crest is:

As @matterhorn103 noted, it’s not on PyPI. We’ve dealt with binary wheels on PyPI but it can be a bit tricky, and I don’t know about Fortran packages on PyPI.

Sure. I definitely encourage using conda but as I mentioned, there are people who have strong, strong feelings about pyenv vs. conda vs. … And to be fair, if you’re using your Linux or FreeBSD package manager to install packages, you probably don’t want to mess with another package manager with conda. I get annoyed when homebrew and conda install the same libraries multiple times. I don’t need 2 different copies of cairo :thinking:

Heh. A while ago, I noted that CCDC Mercury occupied ~1GB on my Mac. Since I know it’s also built with Qt, I was curious to inspect the package. It had a whole Python install.

Yep. Seaborn sits on top of matplotlib, etc. and undoubtedly pandas includes other dependencies.

I can certainly understand - my “mini” forge directory is now 2.9GB - sorry 2.2GB.

@ghutchis CCDC’s freely available community edition of Mercury likewise contains the R engine and a couple of R libraries for statistical analyses, and a povray executable to generate individual frames of an animation. The installer provides all of them, but whether they are functional or grayed out ultimately depends on the license key.

Besides potential pitfalls along the lines of “whose Python interpreter and libraries (and which versions) am I using now?”, I still think this was a turn for the worse in CCDC’s policy compared with the old approach: a yearly letter from Cambridge, UK with the credentials to download, install, and unlock the full version, versus the much smaller free version, which was already powerful (plenty of file I/O options, basic analysis and export of illustrations good enough for a poster or talk, a first prediction of a theoretical PXRD with adjustable wavelength, etc.). It is also more expensive in bandwidth, since someone has to pay for the data transfer in time and money. But I digress; back to this forum’s topic, Avogadro.