Avogadro version: 1.99.0
Operating system and version: Windows 11 Pro Version 10.0.22631 Build 22631
Expected Behavior
Opening a file with multiple XYZ coordinate sets should register as an animation.
Actual Behavior
Only one XYZ set is read, the others are ignored and not registered by Avogadro 2
Steps to Reproduce
Open this XYZ file from ORCA 6.0.0 and notice that it does not register as being an animation. This was output originally as a .allxyz file, but usually just changing the extension makes it readable.
I am not sure how the animation stuff is handled in A2, I donāt quite see what the method in lines 131-174 of xyzformat.cpp checks for multiple coordinates. Once I understand that I will probably try to write a fix for it myself, but until then I figured Iād just bring attention to it.
I had a look at your file and I also checked an ORCA .trj file I have, and I suspect the problem is that the individual XYZs are separated in your file by an extra line containing nothing but a >.
I assume the file can be loaded correctly if you just remove those extra lines?
Itās possible that the XYZ parser is designed so that whitespace/empty lines between multiple XYZ entries are ignored, but I highly doubt itās currently set up to interpret lines containing symbols.
I might have a look at working on the xyz parser myself tbh as I have been sorely wanting for a while that the user is given the option to open a multi-molecule file either with the animation tool or as multiple different molecules. But donāt let that stop you having a go at fixing the parser for your issue, youāll get there first I imagine
But that would take some work. At the moment, thereās no good way for a parser to return multiple molecules from one file. Itās also really kind of a pain - if you modify a molecule in the middle, you have to put the whole file back together, etc. Itās not impossible, but itās a bunch of work when itās usually easy to split a multi-molecule file into pieces.
Well so a key one would of course be for sets of conformers, as a crest/xtb conformer search returns a set of conformers with their energies in a multi-molecule xyz file, and so itās unintuitive (and also in no way indicated in the UI) that the animation tool is the way to view them. Obviously a conformer pane/menu/view is something that weāve talked about and youāre planning, so that specific use-case will be solved, but it made me aware it is hard to predict all the different possible things a program or person might use a multi-section xyz for, so some option for a generic āopen allā that doesnāt make any assumptions as to what the different molecules represent could be good donāt you think?
I was thinking of it as more of an import function where itās one-way only to be honest. But thatās a fair point.
An alternative idea though comes to mind - the pane in the bottom left has a molecules section and a layers section at the moment, right? But āmoleculesā in that context kind of means āfilesā right, in that opening a new file starts a new entry in the list of molecules, and there is no reason why a single molecule/file canāt contain many discrete molecules.
So how about renaming the current āmoleculesā to āfilesā and then adding a third tab called āgeometriesā (or āmoleculesā again I guess but I feel that itās not really an accurate term) to switch between the different geometries in a multi-geometry file?
That would also make it possible to offer the option to add a new geometry to an existing file.
Well because an xyz file containing multiple entries doesnāt necessarily represent a set of conformers right? It might be a trajectory from a geometry optimization, or a surface scan, or the output of a dynamics simulation, or a set of snapshots destined to be an animation, etc.
Each entry doesnāt even have to contain the same set of atoms, right? Someone could have a single file containing some set of molecules that means something to them (maybe all the members of a test set, or all the canonical amino acids, could be anything) and they still want/need some way of viewing the contents of the file, but Avogadro has no means of interpreting what the set represents; displaying them to the user as conformers would be incorrect, and displaying them as an animation would lead to bizarre resultsā¦
I found another interesting error in parsing multiple atoms in an XYZ file, and to be honest the error only occurs with an improperly formatted file but it does have some interesting behavior that may be a problem in other cases.
Essentially the issue arises when at the end of the file there are extra blank lines. I think what is happening is that when the animation check section is in the while loop it checks the line immediately after the last coordinate and if that line is a number that is equal to the initial number of atoms. (lines 156-159)
if (!getline(inStream, buffer)) {
numAtoms2 = lexicalCast<int>(buffer);
if (numAtoms == numAtoms2)
break;
I am pretty sure that if the end of the file is a series of \n characters then attempting the lexicalCast\<int\>(buffer) does not work and doesnāt properly redefine numAtoms2 to be a new number. I think that this basically has the code try to check subsequent lines but since they are empty lines with just a line feed then it returns this error:
if (tokens.size() < 4) {
appendError("Not enough tokens in this line: " + buffer);
return false;
A potential fix for this would be adding a section to redefine numAtoms2 to be zero right before it checks the next line, just so that if the line read fails it will break out of the loop.
In a similar vein, the reason this doesnāt seem to be a problem when there is only one line feed at the end of the file is because the inStream hits the end of file and terminates the read.
There is currently a number of spots in OpenChemistry/avogadrolibs/blob/master/avogadro/io/xyzformat.cpp that specifically check for subsequent molecules having the same number of atoms. You could perhaps just remove that part and hope for the best, but I am not sure how the animation part of A2 works in that regard.
Yeah, the patch I posted above could use a bit of cleanup to check for empty lines as well. Thatās probably the cleanest fix.
I think weāve already had this discussion. Thereās a general need to mark the coordinate sets by ātypeā (e.g., not necessarily conformers). But yes, every entry needs to have the same set of atoms and atom count.
Right now, the main Molecule class is set up to assume that the atom list (including count) is consistent across all coordinate sets. Iām aware that sometimes there are dynamics simulations in which the # of atoms change, but fixing that is going to take a lot of work.ā¢
Yeah. What Iām saying is that I wrote the code to deal with that in Avogadro v1, and Iām not keen to revisit it. For one, that use case is a lot less common than you might think and itās fairly easy to split such files (often SDF).
Avogadro v2 currently has no way to read in a list of different molecules from a file. Changing that is certainly possible, but would take a bunch of work. Iām happy to point people towards what would need to change and mentoring it, but itās way down my todo list.
So ādisplaying [a file with multiple molecules] as conformersā is indeed incorrect, but thereās literally no way for that to happen at the moment.
Hey @ghutchis, just noticed a bug in the trajectory file fix . There seems to be an extra molecule created at the end of the animation that is just a singular carbon atom. Feel free to use the attached trajectory file from an NEB run that I recently did to check that out. I will take a look at the git page and see if I canāt suss out what is happening to make this the case.
Ah, itās basically an āoff by oneā bug. It goes into the end of the file, but added the āall zerosā coordinates.
C++ is a little weird in that if youāre at the end of the file, the stream is good ā you have to read beyond the end of the stream to get an error.