Opening Trajectory Files

I believe this to be a bug with Avogadro

Environment Information

Avogadro version: 1.99.0
Operating system and version: Windows 11 Pro Version 10.0.22631 Build 22631

Expected Behavior

Opening a file with multiple XYZ coordinate sets should register as an animation.

Actual Behavior

Only one XYZ set is read, the others are ignored and not registered by Avogadro 2

Steps to Reproduce

Open this XYZ file from ORCA 6.0.0 and notice that it does not register as being an animation. This was output originally as a .allxyz file, but usually just changing the extension makes it readable.

2H_PA.xyz (40.1 KB)

I am not sure how the animation stuff is handled in A2, I donā€™t quite see what the method in lines 131-174 of xyzformat.cpp checks for multiple coordinates. Once I understand that I will probably try to write a fix for it myself, but until then I figured Iā€™d just bring attention to it.

I had a look at your file and I also checked an ORCA .trj file I have, and I suspect the problem is that the individual XYZs are separated in your file by an extra line containing nothing but a >.

I assume the file can be loaded correctly if you just remove those extra lines?

Itā€™s possible that the XYZ parser is designed so that whitespace/empty lines between multiple XYZ entries are ignored, but I highly doubt itā€™s currently set up to interpret lines containing symbols.

I might have a look at working on the xyz parser myself tbh as I have been sorely wanting for a while that the user is given the option to open a multi-molecule file either with the animation tool or as multiple different molecules. But donā€™t let that stop you having a go at fixing the parser for your issue, youā€™ll get there first I imagine

(sigh) Fortunately, an easy fix. Small patch to parsing XYZ trajectories to handle ORCA 6 separators by ghutchis Ā· Pull Request #1705 Ā· OpenChemistry/avogadrolibs Ā· GitHub

Iā€™m curious to know more about that use case.

But that would take some work. At the moment, thereā€™s no good way for a parser to return multiple molecules from one file. Itā€™s also really kind of a pain - if you modify a molecule in the middle, you have to put the whole file back together, etc. Itā€™s not impossible, but itā€™s a bunch of work when itā€™s usually easy to split a multi-molecule file into pieces.

Well so a key one would of course be for sets of conformers, as a crest/xtb conformer search returns a set of conformers with their energies in a multi-molecule xyz file, and so itā€™s unintuitive (and also in no way indicated in the UI) that the animation tool is the way to view them. Obviously a conformer pane/menu/view is something that weā€™ve talked about and youā€™re planning, so that specific use-case will be solved, but it made me aware it is hard to predict all the different possible things a program or person might use a multi-section xyz for, so some option for a generic ā€œopen allā€ that doesnā€™t make any assumptions as to what the different molecules represent could be good donā€™t you think?

I was thinking of it as more of an import function where itā€™s one-way only to be honest. But thatā€™s a fair point.

An alternative idea though comes to mind - the pane in the bottom left has a molecules section and a layers section at the moment, right? But ā€œmoleculesā€ in that context kind of means ā€œfilesā€ right, in that opening a new file starts a new entry in the list of molecules, and there is no reason why a single molecule/file canā€™t contain many discrete molecules.

So how about renaming the current ā€œmoleculesā€ to ā€œfilesā€ and then adding a third tab called ā€œgeometriesā€ (or ā€œmoleculesā€ again I guess but I feel that itā€™s not really an accurate term) to switch between the different geometries in a multi-geometry file?

That would also make it possible to offer the option to add a new geometry to an existing file.

How is that different from a window for conformer properties? (Which was a feature in Avo1)

Well because an xyz file containing multiple entries doesnā€™t necessarily represent a set of conformers right? It might be a trajectory from a geometry optimization, or a surface scan, or the output of a dynamics simulation, or a set of snapshots destined to be an animation, etc.

Each entry doesnā€™t even have to contain the same set of atoms, right? Someone could have a single file containing some set of molecules that means something to them (maybe all the members of a test set, or all the canonical amino acids, could be anything) and they still want/need some way of viewing the contents of the file, but Avogadro has no means of interpreting what the set represents; displaying them to the user as conformers would be incorrect, and displaying them as an animation would lead to bizarre resultsā€¦

I found another interesting error in parsing multiple atoms in an XYZ file, and to be honest the error only occurs with an improperly formatted file but it does have some interesting behavior that may be a problem in other cases.

Essentially the issue arises when at the end of the file there are extra blank lines. I think what is happening is that when the animation check section is in the while loop it checks the line immediately after the last coordinate and if that line is a number that is equal to the initial number of atoms. (lines 156-159)

if (!getline(inStream, buffer)) {
  numAtoms2 = lexicalCast<int>(buffer);
  if (numAtoms == numAtoms2)
    break;

I am pretty sure that if the end of the file is a series of \n characters then attempting the lexicalCast\<int\>(buffer) does not work and doesnā€™t properly redefine numAtoms2 to be a new number. I think that this basically has the code try to check subsequent lines but since they are empty lines with just a line feed then it returns this error:

    if (tokens.size() < 4) {
      appendError("Not enough tokens in this line: " + buffer);
      return false;

A potential fix for this would be adding a section to redefine numAtoms2 to be zero right before it checks the next line, just so that if the line read fails it will break out of the loop.

In a similar vein, the reason this doesnā€™t seem to be a problem when there is only one line feed at the end of the file is because the inStream hits the end of file and terminates the read.

There is currently a number of spots in OpenChemistry/avogadrolibs/blob/master/avogadro/io/xyzformat.cpp that specifically check for subsequent molecules having the same number of atoms. You could perhaps just remove that part and hope for the best, but I am not sure how the animation part of A2 works in that regard.

Yeah, the patch I posted above could use a bit of cleanup to check for empty lines as well. Thatā€™s probably the cleanest fix.

I think weā€™ve already had this discussion. Thereā€™s a general need to mark the coordinate sets by ā€œtypeā€ (e.g., not necessarily conformers). But yes, every entry needs to have the same set of atoms and atom count.

Right now, the main Molecule class is set up to assume that the atom list (including count) is consistent across all coordinate sets. Iā€™m aware that sometimes there are dynamics simulations in which the # of atoms change, but fixing that is going to take a lot of work.ā„¢

Yeah. What Iā€™m saying is that I wrote the code to deal with that in Avogadro v1, and Iā€™m not keen to revisit it. For one, that use case is a lot less common than you might think and itā€™s fairly easy to split such files (often SDF).

Avogadro v2 currently has no way to read in a list of different molecules from a file. Changing that is certainly possible, but would take a bunch of work. Iā€™m happy to point people towards what would need to change and mentoring it, but itā€™s way down my todo list.

So ā€œdisplaying [a file with multiple molecules] as conformersā€ is indeed incorrect, but thereā€™s literally no way for that to happen at the moment.

Hey @ghutchis, just noticed a bug in the trajectory file fix :frowning:. There seems to be an extra molecule created at the end of the animation that is just a singular carbon atom. Feel free to use the attached trajectory file from an NEB run that I recently did to check that out. I will take a look at the git page and see if I canā€™t suss out what is happening to make this the case.

034-NEB_MEP_trj.xyz (7.2 KB)

1 Like

Iā€™ll take a look - does it get all the other frames right? (Right number of steps and things?)

It does seem to get the right number of steps and the molecules are correct, barring of course the additional extra frame with just a carbon atom.

As it turns out, itā€™s not just a carbon atom. All the coordinates are zero in that last frame. Thatā€™s the bug.

Thatā€™s a funky bug! Glad it has been patched!

Ah, itā€™s basically an ā€œoff by oneā€ bug. It goes into the end of the file, but added the ā€œall zerosā€ coordinates.

C++ is a little weird in that if youā€™re at the end of the file, the stream is good ā€“ you have to read beyond the end of the stream to get an error.