Multiple formats in plugin input/output – some requests

Hi @ghutchis,

I’ve been working fairly intensely on a complete overhaul of the avo_xtb plugin and will be releasing a new version in the coming days. As part of this I have been immersed again in the plugin API as it stands.

I’d like to make some proposals:

As a first small change, could it be possible to request for an arbitrary number of different formats, all of which will be passed in the json under their respective keys?

One way would just be to allow inputMoleculeFormat to be either a single string or a list of strings. Having two valid types might be something you find undesirable though I guess? Only allowing a list would break existing scripts, so if you aren’t a fan of that, I’d suggest a new field in the options print-out; it could be called inputMoleculeFormats plural, or, to be more distinct, something like requestedFormats, i.e.

if args.print_options:
        options = {"requestedFormats": ["xyz", "mol"]}
        print(json.dumps(options))

# or, if you don't hate it:
if args.print_options:
        options = {"inputMoleculeFormat": ["xyz", "mol"]}
        print(json.dumps(options))

gets the plugin

{
  "charge": 0,
  "cjson": {
    "atoms": {
      "coords": {
        "3d": [
          -3.7502126693725586,
          2.830498456954956,
          0,
          -4.15645283088028,
          3.491789003415348,
          -0.5478719938913942,
          -4.068236302382353,
          2.918291411901762,
          0.8908722373377651
        ]
      },
      "elements": {
        "number": [
          8,
          1,
          1
        ]
      },
      "formalCharges": [
        0,
        0,
        0
      ],
      "layer": [
        0,
        0,
        0
      ]
    },
    "bonds": {
      "connections": {
        "index": [
          0,
          1,
          0,
          2
        ]
      },
      "order": [
        1,
        1
      ]
    },
    "chemicalJson": 1,
    "layer": {
      "enable": {
        "Ball and Stick": [
          true
        ],
        "Cartoons": [
          true
        ],
        "Close Contacts": [
          false
        ],
        "Crystal Lattice": [
          false
        ],
        "Force": [
          false
        ],
        "Labels": [
          true
        ],
        "Licorice": [
          false
        ],
        "Meshes": [
          true
        ],
        "Non-Covalent": [
          false
        ],
        "Van der Waals": [
          false
        ],
        "Wireframe": [
          false
        ],
        "undef": [
          false
        ]
      },
      "locked": [
        false
      ],
      "settings": {
        "Ball and Stick": [
          "true true 0.300000 0.100000"
        ],
        "Cartoons": [
          "false false false false false true false"
        ],
        "Labels": [
          "2 0 0.5 255 255 255"
        ]
      },
      "visible": [
        true
      ]
    },
    "properties": {
      "totalCharge": 0,
      "totalSpinMultiplicity": 1
    }
  },
  "mol": "\n  Avogadro\n\n  3  2  0  0  0  0  0  0  0  0999 V2000\n   -3.7502    2.8305    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.1565    3.4918   -0.5479 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.0682    2.9183    0.8909 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1  2  1  0  0  0  0\n  1  3  1  0  0  0  0\nM  END\n",
  "selectedAtoms": [],
  "spin": 1,
  "xyz": "3\nXYZ file generated by Avogadro.\nO     -3.75021    2.83050    0.00000\nH     -4.15645    3.49179   -0.54787\nH     -4.06824    2.91829    0.89087\n"
}

for water.

Sure, needing more than two formats is probably not all that common, but I don’t see a disadvantage in this approach, as it allows maximum flexibility and means that plugin authors can benefit from Avogadro’s excellent I/O capabilities without having to convert things themselves.


My second request would be similar, but for the json passed back to Avogadro by plugins.

If anything is passed back to Avogadro it replaces the molecule’s current state wholesale, so only data that the plugin passes back will be retained. If the plugin author has made changes to the bonds, for example, they can’t just pass back a CJSON with bond data – they need to pass back the original atom data too. As a general rule, therefore, my plugin does a calculation on the provided geometry, then takes the input CJSON, takes out values that are no longer physical, supplements it with the new data, then passes the whole thing back, to ensure nothing gets lost.

Having to combine some new data in e.g. molden format with the original CJSON in order to pass back everything in a single CJSON is tedious for the plugin to have to handle. Or at least, much more tedious than it is for Avogadro with its sophisticated IO.

When I raised this you addressed this by adding a readProperties flag, so that when the following is passed back:

output = {
    "readProperties": True,
    "moleculeFormat": "molden",
    "molden": result_molden_string,
}

Avogadro keeps the original atoms and bonds as they are and only reads the various properties. The way this is currently implemented it seems “properties” actually doesn’t mean some_cjson["properties"], it seems to just mean anything in some_cjson that isn’t in the atoms or bonds objects, so it includes frequencies, orbitals, and so on).

Unfortunately this only partially achieves the desired outcome; the atoms and bonds are maintained but any other data that was in the file before will be lost, and only that “property” data contained in the molden file will be available in Avogadro.

Can we please have it so that when the readProperties flag is found, the molecule state in Avogadro is only updated with the new data, and old data that doesn’t have new values is left alone rather than deleted?

Can we please also make readProperties take a string like moleculeFormat does, so that Avogadro can take the data given as moleculeFormat as the base and supplement it with the data given as readProperties?

To show what I mean:

output = {
    "readProperties": "molden",
    "moleculeFormat": "cjson",
    "molden": result_molden_string,
    "cjson": result_cjson,
}

or if we’d like a change of syntax for compatibility reasons

output = {
    "updateFrom": "molden",
    "moleculeFormat": "cjson",
    "molden": result_molden_string,
    "cjson": result_cjson,
}

In this situation, I’d like Avogadro to:

  1. Load the CJSON, just like it would if solely a CJSON was passed back
  2. Convert the Molden-format string to CJSON using Avogadro’s usual IO functions
  3. Add only those parts of the Molden “properties” CJSON to the main CJSON that aren’t already present in the main CJSON
  4. Pull the main CJSON into memory

This would be a great help to plugin authors because it is far easier for them to take stuff out of a CJSON (so that it’ll be overwritten) than add it from another file format.

This isn’t necessarily urgent, by the way, and possibly I could look at implementing what I need myself, I don’t know. I just wanted your feedback on the proposals and on any potential issues with my suggestions before I try to put together any pull requests.

Please no. Right now, we’re passing the cjson to every plugin, plus any format requested by the plugin.

What’s the use case?

The disadvantage is that the Avogadro side becomes more complex, and the JSON passed to the plugin will get bigger and bigger. Let’s take your case with both xyz and sdf (plus cjson). So we write the file 3 times. For a small molecule, no worries. For something bigger (say a polymer), there will definitely be a latency gap.

You want to merge the properties, yes?

I can definitely understand this use case better (e.g., reading orbitals from a molden file or whatever).

Hmm ok, slightly surprised at your opposition.

Sure, but that would then be the problem of the plugin author. I’m absolutely not suggesting passing all formats all the time, only those that have been requested. A plugin won’t request formats it doesn’t need, so if it does request it, surely that’s because it genuinely needs it? So why prevent that?

The Avogadro native format conversion is always going to be much faster than any plugin can do it in Python, and it’s far better tested. :smiley:

I wouldn’t have been motivated to suggest it if I hadn’t wanted it at some point myself, and there was a time when I would have liked a script to get both turbomole and xyz formats passed to it, though that is no longer relevant.

Yes essentially, merge while prioritizing one source over the other in the case of clashes.

Yeah I’d agree this is the bigger issue. And reading the orbitals is bang on, that’s precisely the place where it crops up for me right now – if the user has run a frequency calculation and then wants orbitals, I have no way to preserve the calculated frequencies for them. :frowning:

Maybe I came across too strong. I just don’t understand the use case. So it’s added complexity on the app side – but what does it “buy” you on the plugin side?

I have limited time for coding / testing. So while I understand the reason for merging properties on the read into Avogadro side, I don’t really understand why the extra work is worth it on the “supply a lot of formats to a plugin.”

I’m also just generally wary about potential for bugs / security issues with plugins.

So I’m not opposed. I just don’t understand how it would be useful.

Fair enough, maybe I just underestimate the complexity of what I suggested, sorry. From a naive Python perspective it felt like it’d just be an extra for loop.