Why did Avogadro switch to/develop cjson?

matterhorn103 · December 14, 2023, 12:05pm

This question is one asked primarily out of interest in the history of the project.

As a bonus, it would help me understand the merits of adopting cml vs cjson for a side project of mine.

Originally Avogadro used cml as its default file format, and did so for a long time. This had various benefits, as described in the Avogadro paper:

“Avogadro has used CML [19, 20] as its default file format from a very early stage; this was chosen over other file formats because of the extensible, semantic structure provided by CML, and the support available in Open Babel [51]. The CML format offers a number of advantages over others in common use, including the ability to extend the format. This allows Avogadro and other programs to be future-proof, adding new information and features necessary for an advanced semantically-aware editor at a later time, while still remaining readable in older versions of Avogadro.”

From looking at the commit history, it seems like cjson support was added in 2012. I can’t quite tell when it became the default in all aspects of the program, but it seems like maybe fairly recently? However, I can’t really find any record of discussions over the change in file format, either here or on GitHub. So I’d be interested to know what the perceived advantages of creating a new format were, and why cml no longer fit the bill?

From the chemicaljson repo and the old wiki the main things I can see so far are:

cjson is a little more efficient storage-wise, as is json in general vs xml;
if I’ve understood correctly, json is easy to map to C++ structures so parsing it is faster than xml;
support for reading json amongst programming languages is better than xml.
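
To make the parsing comparison concrete, here is a minimal Python sketch that reads the same water molecule from a cjson-style string and a CML-style string using only the standard library. Both snippets are hand-written illustrations: the cjson keys (`chemicalJson`, `atoms.elements.number`, `atoms.coords.3d`, `bonds.connections.index`) are assumed from my reading of the chemicaljson repo, the CML fragment is a simplified subset, and the helper names are hypothetical. It is not how Avogadro itself reads either format.

```python
import json
import xml.etree.ElementTree as ET

# Hand-written cjson-style water molecule (key names are an assumption
# based on the chemicaljson repo; not a validated file).
CJSON_TEXT = """
{
  "chemicalJson": 1,
  "atoms": {
    "elements": {"number": [8, 1, 1]},
    "coords": {"3d": [0.0, 0.0, 0.117, 0.0, 0.757, -0.469, 0.0, -0.757, -0.469]}
  },
  "bonds": {
    "connections": {"index": [0, 1, 0, 2]},
    "order": [1, 1]
  }
}
"""

# Simplified CML-style fragment describing the same molecule.
CML_TEXT = """
<molecule xmlns="http://www.xml-cml.org/schema">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.0" y3="0.0" z3="0.117"/>
    <atom id="a2" elementType="H" x3="0.0" y3="0.757" z3="-0.469"/>
    <atom id="a3" elementType="H" x3="0.0" y3="-0.757" z3="-0.469"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>
"""


def coords_from_cjson(text):
    """Return a list of (x, y, z) tuples from a cjson-style string."""
    data = json.loads(text)  # already nested dicts/lists of numbers
    flat = data["atoms"]["coords"]["3d"]
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def coords_from_cml(text):
    """Return a list of (x, y, z) tuples from a CML-style string."""
    ns = {"cml": "http://www.xml-cml.org/schema"}
    root = ET.fromstring(text.strip())
    # Attributes arrive as strings and must be converted one by one.
    return [
        (float(atom.get("x3")), float(atom.get("y3")), float(atom.get("z3")))
        for atom in root.findall("cml:atomArray/cml:atom", ns)
    ]


cjson_coords = coords_from_cjson(CJSON_TEXT)
cml_coords = coords_from_cml(CML_TEXT)
```

The JSON version arrives as typed numbers in flat arrays, while the XML version needs namespace handling and string-to-float conversion per attribute. That matches the "easy to map to structures" point, though it says nothing about actual parsing speed.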

I can see why these would be nice for database or ML applications but they don’t seem like a huge deal for Avogadro, or were they? And the big downside was presumably loss of compatibility with applications using cml, which seems pretty significant.

ghutchis · December 14, 2023, 3:07pm

There’s some discussion in this paper: “Open chemistry: RESTful web APIs, JSON, NWChem and the modern web application” (Journal of Cheminformatics).

JSON has wider support, including in databases (Postgres, Mongo, etc.)
There are a few ways to write compressed binary JSON if needed / desired
We got fed up with how CML was being updated / maintained (i.e., it’s essentially dead)
We’ve been involved with several efforts to have an open standard format for chemistry interchange (esp. in computational chemistry)
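
As a rough illustration of the compression point, the sketch below round-trips a small cjson-style dictionary through compact JSON plus gzip, using only the Python standard library. gzip is a stand-in here: it is not one of the dedicated binary JSON encodings the post may have in mind, and the function names and molecule data are hypothetical.

```python
import gzip
import json


def write_compressed(data: dict) -> bytes:
    """Serialize to compact JSON text, then gzip-compress the bytes."""
    text = json.dumps(data, separators=(",", ":"))  # no extra whitespace
    return gzip.compress(text.encode("utf-8"))


def read_compressed(blob: bytes) -> dict:
    """Reverse of write_compressed: gunzip, then parse the JSON text."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Hand-written cjson-style data (key names assumed, not validated).
molecule = {
    "chemicalJson": 1,
    "atoms": {
        "elements": {"number": [8, 1, 1]},
        "coords": {"3d": [0.0, 0.0, 0.117, 0.0, 0.757, -0.469, 0.0, -0.757, -0.469]},
    },
}

blob = write_compressed(molecule)
restored = read_compressed(blob)
```

For a molecule this small the gzip header overhead can outweigh any savings; compression only starts to pay off on larger files such as trajectories or volumetric data.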

MolSSI had a few workshops to encourage a new JSON-based format … and they have their QCSchema format, but frankly, it’s been hard to work with them.

So we’ve been working with cclib and anyone else who’s interested in adopting and updating cjson for other uses. Open Babel 3.2 will support full read/write with Avogadro, and IIRC the Grimme tools now support cjson as well.

matterhorn103 · December 14, 2023, 7:52pm

Thanks, this was insightful.

I saw this too, yes. I imagine it’s nice to see it supported elsewhere; it shows the format has wider appeal.

ghutchis · December 14, 2023, 9:55pm

One thing I intend for 2024 is to publish an article on the cjson format (somewhere) and to push it a bit more outside of Avogadro.