Opening Trajectory Files

brockdyer03 · September 7, 2024, 12:00am

I believe this to be a bug with Avogadro

Environment Information

Avogadro version: 1.99.0
Operating system and version: Windows 11 Pro Version 10.0.22631 Build 22631

Expected Behavior

Opening a file with multiple XYZ coordinate sets should register as an animation.

Actual Behavior

Only one XYZ set is read, the others are ignored and not registered by Avogadro 2

Steps to Reproduce

Open this XYZ file from ORCA 6.0.0 and notice that it does not register as being an animation. This was output originally as a .allxyz file, but usually just changing the extension makes it readable.

2H_PA.xyz (40.1 KB)

I am not sure how the animation stuff is handled in A2, I don’t quite see what the method in lines 131-174 of xyzformat.cpp checks for multiple coordinates. Once I understand that I will probably try to write a fix for it myself, but until then I figured I’d just bring attention to it.

matterhorn103 · September 8, 2024, 8:01am

I had a look at your file and I also checked an ORCA .trj file I have, and I suspect the problem is that the individual XYZs are separated in your file by an extra line containing nothing but a >.

I assume the file can be loaded correctly if you just remove those extra lines?

It’s possible that the XYZ parser is designed so that whitespace/empty lines between multiple XYZ entries are ignored, but I highly doubt it’s currently set up to interpret lines containing symbols.

matterhorn103 · September 8, 2024, 8:08am

I might have a look at working on the xyz parser myself tbh as I have been sorely wanting for a while that the user is given the option to open a multi-molecule file either with the animation tool or as multiple different molecules. But don’t let that stop you having a go at fixing the parser for your issue, you’ll get there first I imagine

ghutchis · September 8, 2024, 3:16pm

(sigh) Fortunately, an easy fix. Small patch to parsing XYZ trajectories to handle ORCA 6 separators by ghutchis · Pull Request #1705 · OpenChemistry/avogadrolibs · GitHub

I’m curious to know more about that use case.

But that would take some work. At the moment, there’s no good way for a parser to return multiple molecules from one file. It’s also really kind of a pain - if you modify a molecule in the middle, you have to put the whole file back together, etc. It’s not impossible, but it’s a bunch of work when it’s usually easy to split a multi-molecule file into pieces.

matterhorn103 · September 8, 2024, 6:48pm

Well so a key one would of course be for sets of conformers, as a crest/xtb conformer search returns a set of conformers with their energies in a multi-molecule xyz file, and so it’s unintuitive (and also in no way indicated in the UI) that the animation tool is the way to view them. Obviously a conformer pane/menu/view is something that we’ve talked about and you’re planning, so that specific use-case will be solved, but it made me aware it is hard to predict all the different possible things a program or person might use a multi-section xyz for, so some option for a generic “open all” that doesn’t make any assumptions as to what the different molecules represent could be good don’t you think?

I was thinking of it as more of an import function where it’s one-way only to be honest. But that’s a fair point.

An alternative idea though comes to mind - the pane in the bottom left has a molecules section and a layers section at the moment, right? But “molecules” in that context kind of means “files” right, in that opening a new file starts a new entry in the list of molecules, and there is no reason why a single molecule/file can’t contain many discrete molecules.

So how about renaming the current “molecules” to “files” and then adding a third tab called “geometries” (or “molecules” again I guess but I feel that it’s not really an accurate term) to switch between the different geometries in a multi-geometry file?

That would also make it possible to offer the option to add a new geometry to an existing file.

ghutchis · September 8, 2024, 7:45pm

How is that different from a window for conformer properties? (Which was a feature in Avo1)

matterhorn103 · September 8, 2024, 8:28pm

Well because an xyz file containing multiple entries doesn’t necessarily represent a set of conformers right? It might be a trajectory from a geometry optimization, or a surface scan, or the output of a dynamics simulation, or a set of snapshots destined to be an animation, etc.

Each entry doesn’t even have to contain the same set of atoms, right? Someone could have a single file containing some set of molecules that means something to them (maybe all the members of a test set, or all the canonical amino acids, could be anything) and they still want/need some way of viewing the contents of the file, but Avogadro has no means of interpreting what the set represents; displaying them to the user as conformers would be incorrect, and displaying them as an animation would lead to bizarre results…

brockdyer03 · September 8, 2024, 8:43pm

I found another interesting error in parsing multiple atoms in an XYZ file, and to be honest the error only occurs with an improperly formatted file but it does have some interesting behavior that may be a problem in other cases.

Essentially the issue arises when at the end of the file there are extra blank lines. I think what is happening is that when the animation check section is in the while loop it checks the line immediately after the last coordinate and if that line is a number that is equal to the initial number of atoms. (lines 156-159)

if (!getline(inStream, buffer)) {
  numAtoms2 = lexicalCast<int>(buffer);
  if (numAtoms == numAtoms2)
    break;

I am pretty sure that if the end of the file is a series of \n characters then attempting the lexicalCast\<int\>(buffer) does not work and doesn’t properly redefine numAtoms2 to be a new number. I think that this basically has the code try to check subsequent lines but since they are empty lines with just a line feed then it returns this error:

    if (tokens.size() < 4) {
      appendError("Not enough tokens in this line: " + buffer);
      return false;

A potential fix for this would be adding a section to redefine numAtoms2 to be zero right before it checks the next line, just so that if the line read fails it will break out of the loop.

In a similar vein, the reason this doesn’t seem to be a problem when there is only one line feed at the end of the file is because the inStream hits the end of file and terminates the read.

brockdyer03 · September 8, 2024, 8:53pm

There is currently a number of spots in OpenChemistry/avogadrolibs/blob/master/avogadro/io/xyzformat.cpp that specifically check for subsequent molecules having the same number of atoms. You could perhaps just remove that part and hope for the best, but I am not sure how the animation part of A2 works in that regard.

ghutchis · September 8, 2024, 9:51pm

Yeah, the patch I posted above could use a bit of cleanup to check for empty lines as well. That’s probably the cleanest fix.

I think we’ve already had this discussion. There’s a general need to mark the coordinate sets by “type” (e.g., not necessarily conformers). But yes, every entry needs to have the same set of atoms and atom count.

Right now, the main Molecule class is set up to assume that the atom list (including count) is consistent across all coordinate sets. I’m aware that sometimes there are dynamics simulations in which the # of atoms change, but fixing that is going to take a lot of work.™

Yeah. What I’m saying is that I wrote the code to deal with that in Avogadro v1, and I’m not keen to revisit it. For one, that use case is a lot less common than you might think and it’s fairly easy to split such files (often SDF).

Avogadro v2 currently has no way to read in a list of different molecules from a file. Changing that is certainly possible, but would take a bunch of work. I’m happy to point people towards what would need to change and mentoring it, but it’s way down my todo list.

So “displaying [a file with multiple molecules] as conformers” is indeed incorrect, but there’s literally no way for that to happen at the moment.

brockdyer03 · September 29, 2024, 12:34pm

Hey @ghutchis, just noticed a bug in the trajectory file fix . There seems to be an extra molecule created at the end of the animation that is just a singular carbon atom. Feel free to use the attached trajectory file from an NEB run that I recently did to check that out. I will take a look at the git page and see if I can’t suss out what is happening to make this the case.

034-NEB_MEP_trj.xyz (7.2 KB)

ghutchis · September 29, 2024, 4:08pm

I’ll take a look - does it get all the other frames right? (Right number of steps and things?)

brockdyer03 · September 29, 2024, 7:05pm

It does seem to get the right number of steps and the molecules are correct, barring of course the additional extra frame with just a carbon atom.

ghutchis · September 29, 2024, 10:05pm

As it turns out, it’s not just a carbon atom. All the coordinates are zero in that last frame. That’s the bug.

github.com/OpenChemistry/avogadrolibs

Fix XYZ trajectories - prevent final frame with 0,0,0 coords

OpenChemistry:master ← ghutchis:fix-xyz-trajectories

opened 10:11PM - 29 Sep 24 UTC

ghutchis

+4 -1

Thanks to brockdyer03 https://discuss.avogadro.cc/t/opening-trajectory-files/55…70/11 Developer Certificate of Origin Version 1.1 Copyright (C) 2004, 2006 The Linux Foundation and its contributors. 1 Letterman Drive Suite D4700 San Francisco, CA, 94129 Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Developer's Certificate of Origin 1.1 By making a contribution to this project, I certify that: (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it. (d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved.

brockdyer03 · September 29, 2024, 10:17pm

That’s a funky bug! Glad it has been patched!

ghutchis · September 29, 2024, 10:19pm

Ah, it’s basically an “off by one” bug. It goes into the end of the file, but added the “all zeros” coordinates.

C++ is a little weird in that if you’re at the end of the file, the stream is good – you have to read beyond the end of the stream to get an error.