Improving indexing across Avogadro

matterhorn103 · September 26, 2024, 3:59pm

In my group I just experienced an example of how the current way Avogadro handles indexing can lead to significant confusion and potentially to incorrect scientific conclusions. In this case it might have even influenced a publication we have in preparation…

Indeed, I was about to open an issue about Avogadro rendering orbitals from cube files differently to orbitals from ORCA output files, until I realised what the true source of the discrepancy was.

Essentially the specific problem arises because Avogadro is set up to do all indexing starting at 1, while ORCA does all indexing starting at 0.

I can also report that it is not obvious to other users that choosing between “Unique ID” and “Index” is what selects the indexing scheme: my colleague tried to switch the atom labels to 0-indexing but didn’t discover that “Unique ID” was the way to do it.

But even when one knows that, it only helps for the labels, while the 0-indexing approach used by ORCA applies not just to atom numbers, but also vibrational modes and orbitals.

At the moment, Avogadro reads everything from ORCA output files correctly (it doesn’t fail to read atom 0, for example) but then renumbers them, i.e. it displays different indices for each atom/vibration/orbital than what is in the output file, or what is needed for giving ORCA further instructions. This leads to difficulties when working out the correct orbital to render for an image, when working out which atoms to constrain, and so on.
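To make the mismatch concrete, here is a minimal Python sketch of the offset between an index as written in a 0-based file and the index a 1-based display shows. The function names and parameters are hypothetical, chosen for illustration only; they are not part of Avogadro or ORCA.

```python
def file_to_display(file_index: int, file_base: int = 0, display_base: int = 1) -> int:
    """Convert an index as written in the file to the index shown in the GUI."""
    return file_index - file_base + display_base


def display_to_file(display_index: int, file_base: int = 0, display_base: int = 1) -> int:
    """Convert an index shown in the GUI back to the one the file uses."""
    return display_index - display_base + file_base


# Atom 38 in a 0-based output file is shown as atom 39 by a 1-based display.
print(file_to_display(38))
# Constraining the atom displayed as 39 requires index 38 in a 0-based input.
print(display_to_file(39))
```

Every lookup in either direction is off by one, and nothing in the interface hints that a conversion is needed.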

A user who isn’t aware of the indexing difference can load an ORCA file in Avogadro, look for a specific atom or orbital by index, and identify the completely wrong one. I would expect that the overlap between ORCA users and Avogadro users is, due to their respective license conditions, fairly large, so I see this as a big problem.

Can we please discuss and come up with a broad plan on how to handle indexing better?

I would suggest three conditions that we need to fulfil in order to eliminate this confusion while also enabling people to work the way they want to work (i.e. some people’s general preference for 0 or 1-indexing):

The current indexing scheme should apply for all indexing, not just e.g. for atom labels;
After loading a file into Avogadro, whatever index is used to refer to an atom/vibration/orbital within the file should be the one displayed to the user;
The user should be able to switch between 0-indexing and 1-indexing at any time, the current indexing scheme should be made prominently visible to the user, and the option to switch should be easy to access.

Obviously actually implementing this may be a fair bit of work but agreeing on a big-picture approach of how it should be handled would be the first step towards it.

At the moment I see this being a significant practical barrier to use of Avogadro, particularly with ORCA, so I definitely think we need an agreed plan of how it should look.

matterhorn103 · September 26, 2024, 4:01pm

I do also have a suggestion for how an actual implementation could look:

The program state records whether the indexing should start at 0 or 1
This state is tracked not for the program globally but for each individual file/“Molecule”
The current indexing state is reflected in all indices:
- In the lists of atoms/bonds/angles/torsions under Atom/Bond/Angle/Torsion properties
- In the index used for atom labels
- In the index used for orbitals in Analysis > Create Surfaces and eventually the future orbitals pane
“Unique ID” is removed as an option for atom labels due to its unclear meaning
The “Element & ID” option for atom labels becomes “Element and Index” and also respects the indexing choice
Two settings/switches control the indexing behaviour:
- A switch to globally change the indexing setting is provided in the future settings menu, but this doesn’t change what is currently in use, it just sets the default
- A switch to temporarily change the indexing method on a per-molecule basis, which should no longer be hidden away in the View Configuration menu for Labels only, but instead be put somewhere much more prominent
  - There could still be a switch to change it there, since people might look for it there, but it shouldn’t be the only place if it affects more than just the labels
Importantly, the current indexing being used is changed automatically upon loading a file based on what indexing the file itself uses, so that the indices shown to the user are the same as those in the file. Loading an ORCA file when the indexing is set to 1 should change it to 0-indexing, but only for that file
- If a silent switch of index is considered confusing UX, a dialog or banner could be displayed on loading the file to alert the user that the indexing has been automatically switched
A CJSON doesn’t need to record the indexing used (though it could); instead, the indexing used for a loaded CJSON can just be the user’s default setting, or we can just say “CJSON starts indexing at 1” (even though technically CJSON stores everything unindexed)
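The list above could be sketched roughly as follows in Python. This is only an illustration of the proposed behaviour under stated assumptions: `Settings`, `Molecule`, `load_molecule` and the `FORMAT_INDEX_BASE` table are hypothetical names, not existing Avogadro code, and the format table contains only the one case discussed here.

```python
from dataclasses import dataclass

# Assumed mapping of file formats to the index base they use; illustrative only.
FORMAT_INDEX_BASE = {"orca": 0}


@dataclass
class Settings:
    # Global default; changing it does not touch molecules that are already open.
    default_index_base: int = 1


@dataclass
class Molecule:
    name: str
    index_base: int  # per-molecule state, 0 or 1

    def label(self, element: str, internal_index: int) -> str:
        """Build an 'Element and Index' label; internal_index counts from 0."""
        return f"{element}{internal_index + self.index_base}"


def load_molecule(name: str, file_format: str, settings: Settings) -> Molecule:
    """Use the file's own indexing if it is known, else fall back to the user default."""
    base = FORMAT_INDEX_BASE.get(file_format, settings.default_index_base)
    return Molecule(name=name, index_base=base)


settings = Settings()
orca_mol = load_molecule("radical.out", "orca", settings)
cjson_mol = load_molecule("radical.cjson", "cjson", settings)
print(orca_mol.label("C", 38))   # C38, matching the ORCA output
print(cjson_mol.label("C", 38))  # C39, following the user's 1-based default
```

Keeping the base on the molecule rather than in a global means two open files can show different numbering at the same time, which is exactly the part of this design that would need to be made visible to the user.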

Where to put the temporary/per-molecule switch is something I’m unsure about; my first suggestion would be a new pane for Molecule Settings, accessed via a ••• button on the Molecules pane, analogous to the settings for Display Types, but there may well be better ways to do it.

Being biased, I naturally think most of what I have suggested makes sense and would make the indexing behaviour flexible and intuitive and obvious; if there is broad agreement I would open GitHub issues for the individual aspects and try to do what I can to add things myself. But I would be keen to get people’s opinions and other ideas or improvements.

ghutchis · September 26, 2024, 4:35pm

That would be fine if it’s a preference or setting. The default should definitely be 1 and I feel pretty strongly about that from seeing lots of students use Avogadro over the years.

For technical users and programmers, 0-based indexing makes some sense. But that’s not how the vast majority of people think.

I’ll note that a lot of quantum programs also number vibrations, etc. from one. There’s “how the program works internally” (zero-based indexing) and “what the user sees” (one-based indexing).

matterhorn103 · September 26, 2024, 7:03pm

Oh, I 100% agree that 1 should be the default. I prefer 1 as well, and I’m a bit sad that ORCA chose to go with 0.

Yup, and that’s actually why I’m suggesting that Avogadro should ideally adapt automatically to files that contain 0-indexed sets, because many chemists probably aren’t aware it’s even a thing and won’t understand why the two programs show a discrepancy.

ghutchis · September 26, 2024, 7:18pm

Oof. While I can understand your point (global vs. file-only) … I think this would be a bit confusing. I’d go with a global choice. People can always switch back after they’re done dealing with ORCA files.

matterhorn103 · September 26, 2024, 8:23pm

That’s maybe fair, but then how do you avoid the issue of an ORCA file being opened and the orbitals (atoms etc.) then being displayed with the wrong numbers (because Avogadro defaults to 1-indexing while the file uses 0) without the user being aware?

It would be surprising if the global choice was silently switched automatically on loading an ORCA file, right?

I guess an option would be to, like you suggest, only have a global setting, but upon loading an ORCA file (or something else with 0-indexing) to trigger a dialog that alerts the user to the mismatch and asks if they want to change the indexing scheme to match the file? Then they’d definitely be aware, I guess, but aren’t obliged to change anything.
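As a rough Python sketch of that check, assuming the reader can tell which base a file uses: the function name, its parameters and the message wording are all hypothetical, and no such dialog exists in Avogadro at the time of writing.

```python
from typing import Optional


def indexing_mismatch_message(
    file_base: Optional[int], global_base: int, file_name: str
) -> Optional[str]:
    """Return a prompt if the file's indexing differs from the global setting.

    file_base is None when the file format carries no indexing convention.
    """
    if file_base is None or file_base == global_base:
        return None  # nothing to ask; the user never sees a dialog
    return (
        f"{file_name} numbers atoms, vibrations and orbitals from {file_base}, "
        f"but Avogadro is currently set to start from {global_base}. "
        f"Switch to {file_base}-based indexing?"
    )


print(indexing_mismatch_message(0, 1, "radical.out"))
print(indexing_mismatch_message(1, 1, "molecule.log"))
```

The global setting only changes if the user says yes, so nothing is switched silently, and users whose files already match the setting are never interrupted.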

ghutchis · September 26, 2024, 8:44pm

For most users, there’s no issue. In Avogadro, they see atoms numbered from 1, a list of vibrations in the table, a list of orbitals, etc.

IMHO it’s a small number of people who need to reference a specific vibration or orbital to compare with Orca.

I think you document the setting and add a note why you might want to change. Maybe Orca decides to provide an “Orca-enhanced Avogadro” with the default changed. And you’d add some comments for people to find in this forum and the Orca forum.

Seems like most users would be confused why this is even being asked. Again, most users are just going to open up a file and analyze in Avogadro.

I’m on board with making this a global setting and documenting it, changing the atom labels as you described, etc.

I reserve the right to change things later, but I think a global setting and some documentation will solve a lot of the use cases.

matterhorn103 · September 26, 2024, 9:20pm

Fair enough, I’ll see what I can do for that then.

I’m not convinced it’s such a niche concern – it also applies to the atom numbers as well, right, so it has been relevant to me in a number of situations e.g.:

Any time I wanted to apply constraints
I looked at the orbital coefficients in the output file and wanted to see which atom “C38” actually is, because that’s where, for example, the large LUMO coefficient is
I wanted to work out on which atom it is most chemically accurate to draw the negative charge or dot in the ChemDraw structure, based on the charge or spin distributions
The optimization found a saddle point and I wanted to give one of the atoms involved in the negative frequency a little nudge
I wanted to double check that Avogadro correctly identified the HOMO/LUMO in an unrestricted Hartree-Fock calculation on a radical (which it doesn’t, by the way, but that’s a bug for a separate issue)
I wanted to average together the calculated NMR coupling constants of the three hydrogen atoms in each methyl group to account for the fast rotation

All I can say is that it’s something I have had to think about many times personally, and that it was also an acute problem for a colleague today. But not everyone uses ORCA, that’s true.