I previously discussed atom properties in CJSON, e.g.:
cjson["atoms"]["properties"] = {
  "fukui": [ 0.0, 1.0, 2.0, 3.0 … ],
  "polarizability": [ 0.1, 0.2, 0.3 … ]
}
Obviously, it would be nice to expand this to bonds and residues, since we already support arbitrary molecular properties.
Since the code is in C++, we need to think a bit about the types of properties:
- strings (e.g., labels)
- numbers (e.g., partial charges, Fukui indices, micro-pKa, etc.)
- booleans / flags
- possibly vectors (atom dipoles) and matrices (e.g., full polarizability tensor)
One other caveat for the C++ implementation is that we will often want to add properties to only some atoms or bonds (e.g., “S” or “R” labels on chiral atoms). So we need a dictionary / map data structure from the atom or bond index to the property value.
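To make the type and sparsity questions concrete, here is a minimal C++17 sketch of one possible layout. The names `PropertyValue` and `SparseProperties` are hypothetical (they are not existing Avogadro classes), and vectors and matrices are simply flattened to a list of doubles, which is only one of several reasonable choices.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical value type covering the property kinds listed above.
// A vector (e.g., an atom dipole) or a flattened matrix (e.g., a 3x3
// polarizability tensor) is stored as a list of doubles.
using PropertyValue =
    std::variant<std::string, double, bool, std::vector<double>>;

// Hypothetical sparse container: property name -> (atom or bond index -> value).
// Only the indices that actually carry a value are stored.
class SparseProperties {
public:
  explicit SparseProperties(std::size_t count) : m_count(count) {}

  void set(const std::string& name, std::size_t index, PropertyValue value) {
    assert(index < m_count); // index must refer to an existing atom or bond
    m_data[name][index] = std::move(value);
  }

  // Returns std::nullopt when this atom or bond has no value for the property.
  std::optional<PropertyValue> get(const std::string& name,
                                   std::size_t index) const {
    const auto prop = m_data.find(name);
    if (prop == m_data.end())
      return std::nullopt;
    const auto entry = prop->second.find(index);
    if (entry == prop->second.end())
      return std::nullopt;
    return entry->second;
  }

private:
  std::size_t m_count;
  std::map<std::string, std::map<std::size_t, PropertyValue>> m_data;
};
```

With this, something like `props.set("chirality", 2, std::string("R"))` labels only atom 2, and `props.get("chirality", 0)` comes back empty. One thing to watch: string values should be wrapped in `std::string` explicitly, since a bare string literal can be converted to `bool` by `std::variant` on some compilers.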
This discussion was inspired in part by Greg Landrum (the RDKit developer) posting about partial charges in SD files. RDKit can store arbitrary properties: The RDKit Book — The RDKit 2025.03.3 documentation
Basically, RDKit keeps the atom / bond properties with keys like atom.dprop.DASH_CHARGES, where the dprop indicates float / double properties, to help with the C++ implementation. (This also makes JSON validation easier, e.g., list of strings, list of numbers, list of booleans.)
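For comparison, here is a rough sketch of how a typed key of that style could be split into its parts in C++17. `TypedKey` and `parseTypedKey` are hypothetical names, not RDKit or Avogadro API. Only `dprop` is mentioned above; the other tags (`prop` for strings, `iprop` for integers, `bprop` for booleans) are my recollection of the RDKit convention, so treat them as an assumption and check the RDKit Book.

```cpp
#include <optional>
#include <string>

enum class PropertyType { String, Integer, Double, Boolean };

// Hypothetical result of splitting a key such as "atom.dprop.DASH_CHARGES".
struct TypedKey {
  std::string owner; // "atom" or "bond"
  PropertyType type;
  std::string name;  // e.g., "DASH_CHARGES"
};

// Returns std::nullopt if the key does not look like owner.tag.name.
std::optional<TypedKey> parseTypedKey(const std::string& key) {
  const auto firstDot = key.find('.');
  if (firstDot == std::string::npos)
    return std::nullopt;
  const auto secondDot = key.find('.', firstDot + 1);
  if (secondDot == std::string::npos)
    return std::nullopt;

  TypedKey result;
  result.owner = key.substr(0, firstDot);
  const std::string tag = key.substr(firstDot + 1, secondDot - firstDot - 1);
  result.name = key.substr(secondDot + 1);

  if (result.owner != "atom" && result.owner != "bond")
    return std::nullopt;
  if (result.name.empty())
    return std::nullopt;

  // Tag names other than "dprop" are assumed, see the note above.
  if (tag == "prop")
    result.type = PropertyType::String;
  else if (tag == "iprop")
    result.type = PropertyType::Integer;
  else if (tag == "dprop")
    result.type = PropertyType::Double;
  else if (tag == "bprop")
    result.type = PropertyType::Boolean;
  else
    return std::nullopt;

  return result;
}
```

The nested CJSON layout would not need this kind of parsing for the owner, since it is implied by where the block sits, but the type would then have to be inferred from the JSON values themselves.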
Personally, I’d prefer the CJSON syntax of storing ["atoms"]["properties"] (i.e., properties belonging to atoms) rather than as special keys in the molecular ["properties"] block.
I’m also inclined to keep partial charges separately because those are a fairly common set.
Thoughts? Critiques (esp. for access in CJSON or Python)? Better ideas?