New Molecular Properties Window

This one still needs some work, but I’m looking for some feedback.

A few things should get hidden like fileName. (I also don’t know why there are two total charge fields right now.)

For a few properties, we can code “special cases” for the left header labels:

  • Total Charge (or “Net Charge” maybe?)
  • Net Spin (… should this get translated into “Neutral”, “Radical”, “Biradical”, etc.? … what to do for high spin species?)
  • Total Energy (… which is problematic since we’d want to adjust to something like kJ/mol but of course SCF total energy is a bit weird)

Among editable things:

  • name
  • total charge
  • total spin
  • (anything else?)

The key challenge is that some properties that might be set by a file might be “PUBCHEM_CACTVS_COMPLEXITY” or “dipoleMoment”.

So one question is whether the dialog should attempt to “normalize” those properties (e.g., “PubChem CACTVS Complexity” and “Dipole Moment”), or at least do so for a few common things that can also be picked up by language translation features.
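
To make the normalization question concrete, here is a minimal sketch in Python, for illustration only. The function name `normalize_label` and the `SPECIAL_TOKENS` table are hypothetical; only the two example keys come from the post above. It assumes keys are either underscore-separated or camelCase.

```python
import re

# Hypothetical lookup for tokens that need special capitalization.
SPECIAL_TOKENS = {"pubchem": "PubChem", "cactvs": "CACTVS"}


def normalize_label(key: str) -> str:
    """Turn a raw property key into a human-readable label."""
    # Add a space at camelCase boundaries: "dipoleMoment" -> "dipole Moment".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", key)
    # Treat underscores as word separators.
    words = spaced.replace("_", " ").split()
    # Use the special spelling when known, otherwise capitalize the word.
    return " ".join(SPECIAL_TOKENS.get(w.lower(), w.capitalize()) for w in words)


print(normalize_label("PUBCHEM_CACTVS_COMPLEXITY"))  # PubChem CACTVS Complexity
print(normalize_label("dipoleMoment"))               # Dipole Moment
```

Anything not covered by the lookup table falls back to plain capitalization, so unknown acronyms would come out as ordinary words. That is one reason to limit normalization to a few common properties.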

Should there be a way for a user to add a property?

How should we handle energies? Convert everything to kcal / mol? How should we handle SCF total energies in that case? Leave them in Hartree?

I suspect that among others @brockdyer03 and @erb74 might have some ideas?

I dig the new additions!

I think that the file name should probably be hidden (it is usually in the molecules list on the left anyway). I think Net Charge is good, and for Net Spin I think using the Singlet, Doublet, Triplet, etc. nomenclature would be good; perhaps it could be followed in parentheses by something like “(1 Unpaired Electron)” for those who aren’t familiar with the nomenclature. For total energy, I would say that using whatever units came with the output file is good, or if you’d prefer a standard then I’d say Hartrees (ORCA FTW) are a good unit for single point energy. When things like thermodynamic values get added, I’d say those should be in kcal/mol, since almost every big computational paper uses it.

An issue may arise with some programs: I know that Quantum ESPRESSO uses Rydbergs instead of Hartrees (honestly I can’t even fathom why) and eV instead of kcal/mol (but only in some places). My solution would be a comprehensive list, covering all of the output parsers, of which properties each one can extract from a given output file, and then populating the table with only those values.

Looks good!

Agreed.

I also don’t think this is the right place for the list of energies, so I would leave them out of here. There’ll be a conformers window or something for it to go in where it is more clear what they correspond to.

Making as much as possible translatable is important in my view so +1.


Please please please take the time to make this dependent on a setting in the user’s config. It doesn’t necessarily need to be possible yet to actually change that setting in the UI, but I’d beg you to set up a config option now in advance, and refer to that for the molecular properties dialog.

Ideally all energies across the whole program should then be shown in the user’s preferred units. It can be overruled in individual cases if there is a very good reason to use different units for them.

For this molecular properties dialog in particular I would add a third column for units, and in some cases I would let this be a drop-down to let the user temporarily switch the units for a given field. Then everything is super convenient – the user can set their preferred global default units, yet also get values in other units when necessary without having to change the global setting or get out a calculator.
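
The lookup order described here (temporary per-row choice first, then the global setting, then a built-in default) could be sketched as follows in Python. This is only an illustration: the class name `EnergyUnitChoice`, its fields, and the `FALLBACK_ENERGY_UNIT` value are assumptions, not existing Avogadro settings.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Assumption: built-in default used when the user has configured nothing.
FALLBACK_ENERGY_UNIT = "kJ/mol"


@dataclass
class EnergyUnitChoice:
    # Global preference read from the user's config (hypothetical field).
    config_unit: Optional[str] = None
    # Temporary per-row selections made via the drop-down (hypothetical field).
    row_overrides: Dict[str, str] = field(default_factory=dict)

    def unit_for(self, row: str) -> str:
        """Return the unit to display for one row of the dialog."""
        if row in self.row_overrides:
            return self.row_overrides[row]
        return self.config_unit or FALLBACK_ENERGY_UNIT


choice = EnergyUnitChoice(config_unit="kcal/mol")
choice.row_overrides["Total Energy"] = "hartree"
print(choice.unit_for("Total Energy"))  # hartree
print(choice.unit_for("Enthalpy"))      # kcal/mol
```

Keeping the per-row override separate from the config value means switching a drop-down never touches the global setting.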

To cover different disciplines, the options should be at minimum:

  • kJ/mol
  • kcal/mol
  • hartree (not capitalized, like kelvin, ohm etc.)
  • eV
  • cm-1

Only for computational chemists! School and undergrad students have no idea what a hartree is.

Besides, reaction/transition state energies etc. are always in kJ or kcal, so in my personal experience it’s rare that a single point energy in hartrees is useful as it always needs converting.

Yes, but most people aren’t computational chemists :slight_smile: Even leaving aside the fact that kcal is a stupid unit, its use is mostly restricted to chemistry and even within that is most prevalent in comp chem. Avogadro should be broader than that. And I know in the US everyone still uses kcal/mol for e.g. bond enthalpies, but I can tell you that in the UK all chemical education at all levels speaks in kJ/mol. I only had to deal with kcal when I went abroad, and British comp chemists end up switching during their PhD lol.

You both are computational chemists so no doubt have different experiences and needs, but that’s my point – different users need different units :slight_smile:

IMO kJ should be default, because SI and IUPAC, but as long as it can (at some point) be changed I don’t mind hugely.

This information is not currently stored within the CJSON anywhere, so I imagine it would be quite complex and quite a change to store it for each value. It is easier to store in hartree (like now) and convert on-the-fly for display. And I think users would rather get the result in the units they like rather than the units that the orca/gaussian/turbomole/xtb/molden/gamess/quantum espresso authors like, especially people who work with more than one program!
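
As a rough sketch of the “store in hartree, convert on the fly” idea, in Python for illustration: the function name `from_hartree` and the unit keys are assumptions, and the factors are CODATA 2018 values rounded to about ten significant figures (with the thermochemical calorie, 4.184 J).

```python
# Multiply a value in hartree by these factors to get the target unit.
HARTREE_TO = {
    "hartree": 1.0,
    "kJ/mol": 2625.4996395,
    "kcal/mol": 627.5094741,
    "eV": 27.211386246,
    "cm-1": 219474.6313632,  # wavenumbers
}


def from_hartree(value_hartree: float, unit: str) -> float:
    """Convert a stored energy in hartree to the requested display unit."""
    try:
        return value_hartree * HARTREE_TO[unit]
    except KeyError:
        raise ValueError(f"unknown energy unit: {unit!r}") from None


# The stored value is never modified; only the displayed number changes.
stored = -76.4  # arbitrary illustrative value in hartree
for unit in HARTREE_TO:
    print(f"{from_hartree(stored, unit):.6f} {unit}")
```

Because the stored value stays in hartree, switching units repeatedly cannot accumulate rounding error.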

Re: spin/multiplicity/unpaired electrons – radical chemistry is literally my field so I have fairly strong opinions on this…

Net/Overall/Total Spin are all fine, but translating the value into “Neutral”, “Radical”, “Biradical”, etc. is definitely not. These terms are too loaded. Many people don’t think of metals with unpaired electrons as radicals; some people like to try to distinguish between biradicals and diradicals; and so on. Besides, “neutral” for a spin of 0 would be incorrect, as a neutral radical is of course common – “closed-shell” would be more appropriate.

So “Net Spin” is one option, with a numerical result; another would be “Multiplicity” or “Overall Multiplicity” (in which case I would also have it be “Overall Charge” to match, as “Total” and “Net” don’t work as qualifiers for multiplicity).

On balance I prefer multiplicity over spin, because it is unclear what a spin of 1 actually means without clarification – would a radical have spin 1 or spin 1/2? Multiplicity avoids that ambiguity. I think “overall” and “net” are better terms than “total”, as it better conveys the fact that it is the overall state of the system rather than the sum of the parts, which is an important distinction if the user has two separate radicals, for example.

I don’t like this suggestion of referring to singlet etc. as the spin. If we want to use the named terms, best to call the field multiplicity.

An option would be to have two rows, one for net/overall spin, and one for (overall) multiplicity, where the first uses an integer and the second a text label. The two would then be linked so that changing one would automatically change the other. This would be maximally informative to novice chemists, and automatically provides clarification on the “what units is the spin in here?” issue for experienced chemists.

No, definitely not, sorry. The multiplicity doesn’t necessarily follow from the number of unpaired electrons. A diradical has two unpaired electrons but it could be a triplet (multiplicity=3, spin=2, I=1) or an open-shell singlet (multiplicity=1, spin=0, I=0).
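
A small Python sketch of why the label cannot be derived from the electron count alone. The function names are hypothetical. It uses the usual definition, multiplicity = 2S + 1, and assumes n unpaired electrons can couple to any total S from n/2 down to 0 (even n) or 1/2 (odd n).

```python
MULTIPLICITY_NAMES = {
    1: "singlet", 2: "doublet", 3: "triplet",
    4: "quartet", 5: "quintet", 6: "sextet",
}


def multiplicity_label(multiplicity: int) -> str:
    """Text label for a multiplicity, falling back to the bare number."""
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    return MULTIPLICITY_NAMES.get(multiplicity, f"multiplicity {multiplicity}")


def possible_multiplicities(n_unpaired: int) -> list:
    """All multiplicities 2S+1 that n unpaired electrons can couple to."""
    if n_unpaired < 0:
        raise ValueError("electron count cannot be negative")
    # 2S+1 starts at n+1 (all spins parallel) and drops in steps of 2.
    return list(range(n_unpaired + 1, 0, -2))


# A diradical: two unpaired electrons, two possible answers.
print([multiplicity_label(m) for m in possible_multiplicities(2)])
# ['triplet', 'singlet']
```

The mapping from multiplicity to label is one-to-one, so a linked “multiplicity” row is safe; the mapping from unpaired-electron count to multiplicity is not.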

Each of the following points is open to discussion. Since this widget is an “identity card”-like summary of the current molecule, and in case Avogadro detects a connection to PubChem’s servers (I presume that is the database accessed when loading a structure by name), could Avogadro check for (and, if present, fetch)

  • a first CAS registry number? E.g., glucopyranose

    I acknowledge CAS numbers can be a bit “tricky” – not every record in PubChem has one, and “on occasion” there are/were more than one per molecule (e.g., titanium dioxide, fullerene-C60), even in the records of Chemical Abstracts. On the other hand, CAS RN are one of the handles for subsequent searches in literature databases, the catalogues of suppliers, etc.

  • provision of the canonical SMILES and standard InChI string (excluding the auxiliary layer) as a description that is complementary to a CAS number and actually retains some chemical information. They are equally handles for subsequent searches (e.g., in the WIPO patent database) and are somewhat complementary to each other (discerning tautomers works better with SMILES than with standard InChI, which excludes, for instance, the optional fixed-H layer). These could be computed locally (openbabel), which would even benefit compounds constructed in Avogadro that don’t (yet) have a record in PubChem.

    The hashed InChI key (presumably computed from the standard InChI string), though it no longer contains recoverable chemical information, might be appealing because of its uniform shortness. It is occasionally already seen as a search criterion, e.g., on Sigma-Aldrich and chemspider, or as part of the identification (PubChem, Leo Paquette’s Encyclopedia of Organic Reagents (example), etc.)

I don’t think we can easily get the CAS number from PubChem.

Here’s an example JSON that we get back from PubChem (for caffeine)

There are a few properties from PubChem, and I could include a link to the PubChem CID or SID record, but CAS isn’t one of them:

I suspect Chem Abstracts Service doesn’t want programmatic access to CAS numbers outside of their tools like SciFinder.

@brockdyer03 was asking about SMILES… I guess I don’t know about including SMILES and InChI since they’re already available in the edit menu.

The Python package cirpy can sometimes lend a hand here:

$ python
Python 3.12.6 (main, Sep  7 2024, 14:20:15) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cirpy
>>> cirpy.resolve('Aspirin', 'CAS')
['50-78-2', '11126-35-5', '11126-37-7', '2349-94-2', '26914-13-6', '98201-60-6']
>>> 
>>> cirpy.resolve('Coffein', 'CAS')
['71701-02-5', '95789-13-2', '58-08-2']
>>> 

though it hasn’t seen an update in a long time (I speculate its fate is similar to that of PubChemPy, initiated by the same author).

CAS’ tiny preview of 500k entries on commonchemistry somewhat echoes the long discussion about whether, and how many of, the CAS RN should be public for use in Wikipedia’s property boxes for chemicals. CAS (to some degree understandably) will not give this away for free, even if the real value lies in the data associated with this bookkeeping number.

I retract the proposal to display SMILES and InChI in multiple spots of the program.

Good point. That would require a separate network request (to the NIH resolver) but could work.

I know Matt. He’s now at DE Shaw. So far neither has needed a significant update. I’m sure he’d happily merge fixes even if he’s not actively developing those tools.

I’m definitely open to feedback on this. If there seems to be interest, I can definitely add them.

Right now, I’m going to merge the feature as-is, but I’ll tag CAS number as a feature request.

Speaking of which … any idea how to pick one from multiple CAS numbers? The smallest numbers? (e.g., it’s always #-#-# so sort them based on the first, second and third entry)
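
For what it’s worth, the “sort on first, second and third entry” idea is only a few lines. Below is a Python sketch with hypothetical function names, using the aspirin list returned by cirpy earlier in the thread. Whether the smallest number is actually the preferred registry number is an open question that this does not answer.

```python
def cas_sort_key(cas: str) -> tuple:
    """Split '#-#-#' into integers so the parts compare numerically."""
    first, second, check = cas.split("-")
    return (int(first), int(second), int(check))


def pick_cas(candidates):
    """Pick the numerically smallest CAS number from a list."""
    return min(candidates, key=cas_sort_key)


aspirin = ['50-78-2', '11126-35-5', '11126-37-7',
           '2349-94-2', '26914-13-6', '98201-60-6']
print(pick_cas(aspirin))  # 50-78-2
```

Comparing as integers matters: a plain string sort would put '11126-35-5' ahead of '50-78-2'.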

Can I suggest we add a new molecular identifiers dialog where all that stuff can go? And the automatically fetched name that’s currently in the properties dialog can move there too? That way everything that requires an internet connection can be separate.

Identifiers are after all not really intrinsic properties of molecules :slight_smile:

Given the issues we still have with Python stuff, it’s sensible imo that things which don’t need Python aren’t affected by whether it is present or not.

There wouldn’t be any Python - there’s just a network connection.

I don’t know about the identifiers, but a lot of users like the name in the properties window.
(Adding a web link to the corresponding PubChem page could be useful though.)

There is an open feature request for Python properties though - basically call the script periodically when the molecule changes to calculate new properties.

e.g., “Property Calculator Scripts” (Issue #1439, OpenChemistry/avogadrolibs on GitHub)

Looks interesting, good idea.

I was speaking mainly in regard to the Python things that @Thomas was suggesting adding :slight_smile:

A bit late to the party here, but I have some feedback on what @matterhorn103 has mentioned about units and whatnot.

I definitely agree that, if possible, it should be configurable by the user, perhaps a drop-down menu of units in the dialogue would be nice, maybe something else.

To this point, while the hartree is only really used by computational chemists, I think that for single-point energy it is really the only unit that makes sense. Consider a simple case, the single-point energy of plain coumarin. At the wB97X-D3BJ/def2-TZVPPD level of theory, its single point energy is -497.395614038258 hartrees. If we convert that to, say, kJ/mol, its single point energy is -1305912.184657446379. In my opinion, the hartree as a unit makes the actual number a lot easier to understand, since in general no energy in hartrees will exceed 10^4, while in kJ/mol energies can go as high as 10^8. So while the unit may be a bit obscure for those who are unfamiliar, it is much more legible and is indeed the standard for SPE. I will also say that this point is completely moot if there are options to change units in the program.

For any matter related to spin, @matterhorn103 is the person who I trust to give good insight. It is his research after all!

I’ve had some more time to think here, and I still stand by my desire to have molecular identifiers in the properties menu.

The reason I am not a fan of a separate identifiers dialog is the user experience. One thing that I particularly appreciated about Avo1 was that some information was displayed by default on the screen with the molecule. This meant that I didn’t have to go hunting through menus to find the information that I wanted, since it was all right there. This is something that I really think should go into Avo2; perhaps, in the same way that many menus can sort of “dock” into the sidebar, it should also go there so that things are very accessible.

The more dialogues we add, the more people have to work to get the information they want.

Well the number of significant figures you need is the same so I’d say the difference is just where the decimal point goes and what you’re familiar with.

I agree. A while back I suggested a new pane, that would go on the right by default, with a tabbed interface. My idea was that when all these dialogs that contain data or information are opened, they would open not as a floating dialog but instead as a new tab in that pane. I thought that would make switching between each properties window and various calculation results views easy. Like you say, it’s about reducing clicks.

Voilà, one of the motivations of InChI. Because it is up to CAS Columbus to assign and to retire the registry numbers at their discretion (example C60 fullerene – right hand, bottom – currently filed as 99685-96-8), eventually one has to go back to the source and their (subscription-based) records. Sometimes other databases (see the record of chemspider) or Elsevier’s Reaxys retain a trace of previously assigned numbers; catalogues are hopefully updated in a timely manner. But I’m not aware of a program or script to automate this.

Sorry, I missed the mention here. I agree with everything said about multiplicity ambiguity, CAS numbers, and possibly SMILES and InChI.

If you choose to also add things like the dipole moment or the polarizability, it would be neat to also control the visibility (on/off) of the 3D vector from somewhere in this window in addition to the usual mechanism. That could ease the connection between “just a bunch of numbers” with the actual system.

Other people probably like having it, but I find “total energy” meaningless for so many reasons that I’m sure we’re all familiar with. The lack of context necessary to assign any usefulness to the (electronic) energy is pronounced here since you’re likely only looking at the one system in Avogadro, or maybe two side-by-side, as opposed to a spreadsheet or table. Having the thermodynamic quantities (free energy, enthalpy, ZPE) if present would be more useful.

To expand further on spreadsheet or table, having the ability to export this window to CSV or similar could be valuable. This was a requested feature for IQmol by people who didn’t want to write code.

Almost certainly. Of course the gen-chem folks at Pitt would like the dipole moment here. (Which is turning out to be more annoying than I thought to add to the new electrostatics system because it’s hard to update interactively.)

Yes. You should already be able to copy or right-click to copy / export from the window.
