I'm asking this primarily out of interest in the history of the project. As a bonus, it would help me understand the merits of adopting cml vs cjson for a side project of mine.
Originally Avogadro used cml as its default file format, and did so for a long time. This had various benefits, as described in the Avogadro paper:
Avogadro has used CML [19, 20] as its default file format from a very early stage; this was chosen over other file formats because of the extensible, semantic structure provided by CML, and the support available in Open Babel [51]. The CML format offers a number of advantages over others in common use, including the ability to extend the format. This allows Avogadro and other programs to be future-proof, adding new information and features necessary for an advanced semantically-aware editor at a later time, while still remaining readable in older versions of Avogadro.
From looking at the commit history, it seems like cjson support was added in 2012. I can’t quite tell when it became the default in all aspects of the program, but it seems like maybe fairly recently? However, I can’t really find any record of discussions over the change in file format, either here or on GitHub. So I’d be interested to know what the perceived advantages of creating a new format were, and why cml no longer fit the bill?
From the chemicaljson repo and the old wiki the main things I can see so far are:
- cjson is a little more efficient storage-wise, as is json in general vs xml;
- if I’ve understood correctly, json is easy to map to C++ structures so parsing it is faster than xml;
- support for reading json amongst programming languages is better than xml.
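To make the comparison in the list above concrete, here is a rough sketch in Python using only the standard library. It stores the same water molecule in a simplified cml-style document and a cjson-style document, reads the atoms back from both, and compares the sizes. The two snippets are hand-written illustrations based on my reading of the formats, not files exported by Avogadro, and the helper names (`atoms_from_cml`, `atoms_from_cjson`, `SYMBOLS`) are my own.

```python
import json
import xml.etree.ElementTree as ET

# Simplified water molecule in a cml-style layout (illustrative only,
# not validated against the CML schema).
CML_TEXT = """<molecule xmlns="http://www.xml-cml.org/schema">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.117"/>
    <atom id="a2" elementType="H" x3="0.000" y3="0.757" z3="-0.469"/>
    <atom id="a3" elementType="H" x3="0.000" y3="-0.757" z3="-0.469"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>"""

# The same molecule in a cjson-style layout. The key names follow my
# understanding of the chemicaljson repo and should be treated as an assumption.
CJSON_TEXT = json.dumps({
    "chemicalJson": 1,
    "atoms": {
        "elements": {"number": [8, 1, 1]},
        "coords": {"3d": [0.0, 0.0, 0.117,
                          0.0, 0.757, -0.469,
                          0.0, -0.757, -0.469]},
    },
    "bonds": {
        "connections": {"index": [0, 1, 0, 2]},
        "order": [1, 1],
    },
})

# Only the two elements needed for this example.
SYMBOLS = {1: "H", 8: "O"}


def atoms_from_cml(text):
    """Return [(symbol, x, y, z), ...] from the cml-style document."""
    ns = {"cml": "http://www.xml-cml.org/schema"}
    root = ET.fromstring(text)
    return [
        (a.get("elementType"),
         float(a.get("x3")), float(a.get("y3")), float(a.get("z3")))
        for a in root.findall("cml:atomArray/cml:atom", ns)
    ]


def atoms_from_cjson(text):
    """Return [(symbol, x, y, z), ...] from the cjson-style document."""
    data = json.loads(text)
    numbers = data["atoms"]["elements"]["number"]
    flat = data["atoms"]["coords"]["3d"]  # x, y, z packed in one flat list
    return [(SYMBOLS[z], *flat[3 * i:3 * i + 3]) for i, z in enumerate(numbers)]


if __name__ == "__main__":
    print("cml bytes:  ", len(CML_TEXT.encode("utf-8")))
    print("cjson bytes:", len(CJSON_TEXT.encode("utf-8")))
    print(atoms_from_cml(CML_TEXT) == atoms_from_cjson(CJSON_TEXT))
```

For a molecule this small the size difference hardly matters, and this says nothing about parsing speed in C++, so it only illustrates the first and third points.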
I can see why these would be nice for database or ML applications but they don’t seem like a huge deal for Avogadro, or were they? And the big downside was presumably loss of compatibility with applications using cml, which seems pretty significant.