Python help wanted - minimize SVG (vs. PNG)

ghutchis · June 26, 2025, 5:16pm

TL;DR: I’d like someone to help write some Python to scour, simplify, and compress the SVG depictions of our ligands and fragments. It should save a bunch of space…

In the new template tool, there are preview images for the ligands, functional groups, etc. e.g.:

Both @matterhorn103 and @thomas have helped with the scripts:

cleaning and minimizing the cjson files
cleaning up the image generation

Since SVG support is generally good (e.g., we’re using SVG tool icons) I think we should drop the png files in favor of directly using svg depictions.

Here’s where I need some help. The current SVG depictions generated by RDKit can be optimized a lot if someone can help with some Python.

There’s a Python module, scour, that reduced one image by 57% (e.g. the PNG is 10K vs. 3K for the compressed SVG)
There’s clearly irrelevant info because these are auto-generated, e.g.

<path class="bond-0 atom-0 atom-1" d="m21.6 106.6 15.1-8.7" fill="none" stroke="#191919" stroke-width="2px"/>
<path class="bond-0 atom-0 atom-1" d="m36.7 97.9 15.1-8.8" fill="none" stroke="#000" stroke-width="2px"/>
<path class="bond-1 atom-1 atom-2" d="m54.4 90.6v-12.4" fill="none" stroke="#000" stroke-width="2px"/>

Clearly all the black bonds, red bonds, etc. can be compressed to one class, and then fill, stroke, etc, can be simplified.
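
As a rough starting point, here is a minimal sketch of that idea using only the Python standard library. It assumes the presentation attributes (fill, stroke, stroke-width) sit directly on the path elements, as in the snippet above, and that whatever renders the previews honours a <style> element, which would need checking. The function name collapse_styles and the generated class names (s0, s1, ...) are made up for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep the output free of "ns0:" prefixes

# Presentation attributes that get folded into shared CSS classes.
STYLE_ATTRS = ("fill", "stroke", "stroke-width")


def collapse_styles(svg_text: str) -> str:
    """Replace repeated fill/stroke attributes on <path> with short classes."""
    root = ET.fromstring(svg_text)
    classes = {}  # maps a tuple of (attribute, value) pairs to a class name

    for el in root.iter(f"{{{SVG_NS}}}path"):
        key = tuple((a, el.get(a)) for a in STYLE_ATTRS if el.get(a) is not None)
        if not key:
            continue
        name = classes.setdefault(key, f"s{len(classes)}")
        for attr, _ in key:
            del el.attrib[attr]
        # This discards the auto-generated "bond-N atom-N" classes.
        el.set("class", name)

    if classes:
        style = ET.Element(f"{{{SVG_NS}}}style")
        style.text = "".join(
            "." + name + "{" + ";".join(f"{a}:{v}" for a, v in key) + "}"
            for key, name in classes.items()
        )
        root.insert(0, style)

    return ET.tostring(root, encoding="unicode")
```

Every distinct combination of the three attributes becomes one class, so all plain black bonds end up sharing a single rule.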

Anyone have a bit of free time? I can send a bunch of SVG depictions as examples:
svg.tar.gz (12.8 KB)

Thomas · June 26, 2025, 6:40pm

I’m a bit on the fence here: on one hand, I like the CPK-like color highlights on N and O because they are easier to discern, while on the other hand F, Cl, and S are less legible in front of a white background (either paper, or mirror-like laptop screens in a bright environment).

Inspecting the test SVGs you shared, many coordinates are defined with four decimals; this high precision likely isn’t necessary (and eventually eats storage). I’m aware of svgcleaner, archived on GitHub; it is implemented in Rust and can run from the shell (e.g., bash in Linux Debian after a chmod +x). If you don’t intend to have a git diff view later, it works well even on larger sets of .svg (example).
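
To illustrate the precision point, here is a small sketch that rounds the numbers inside path d attributes. It assumes un-minified output in which numbers are separated by spaces, commas, or command letters and carry no exponent; round_numbers and round_path_data are hypothetical helper names, not part of svgcleaner or scour.

```python
import re

_NUMBER = re.compile(r"-?\d*\.\d+")
# Match d='...' or d="..." only; the leading whitespace avoids matching id=.
_D_ATTR = re.compile(r"""(\sd=)(["'])(.*?)\2""", re.S)


def round_numbers(data: str, digits: int = 1) -> str:
    """Round every decimal number in a string of path data."""
    def repl(match):
        text = f"{round(float(match.group()), digits):.{digits}f}"
        if "." in text:
            # Drop trailing zeros and a dangling decimal point.
            text = text.rstrip("0").rstrip(".")
        return "0" if text in ("-0", "") else text

    return _NUMBER.sub(repl, data)


def round_path_data(svg_text: str, digits: int = 1) -> str:
    """Apply round_numbers to the d attribute of every element."""
    return _D_ATTR.sub(
        lambda m: m.group(1) + m.group(2)
        + round_numbers(m.group(3), digits) + m.group(2),
        svg_text,
    )
```

Only the d attributes are touched, so values elsewhere (such as a stroke-width inside a style attribute) stay as they were.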

Attached is an MWE with your test data (incl. one file compressed manually; a loop in the shell could handle the others).

smaller.tar.gz (826.7 KB)

ghutchis · June 26, 2025, 7:31pm

We can certainly tweak the element colors for better contrast on white background. (Hmm, seems like another recent thread.) Certainly the yellow S isn’t great.

Seems like svgcleaner is doing a better job than scour. I was hoping for a script because it was obvious that a variety of text was still unnecessary.

So it seems like that’s the best solution - albeit picking some colors with better contrast?

Any suggestions for color replacements? Or should we add a thin black stroke to the letters like sulfur?

Thomas · June 26, 2025, 9:29pm

This is not an easy question. On one hand, there are the color schemes already around (CPK, old/new Rasmol, as illustrated by Jmol’s documentation); on the other, colorbrewer2 knows only one scale with four qualitative levels that is considered (by their measures) colorblind safe.

Will the background for the previews remain fixed to bright/white (like here: light → black characters on white ground) or optionally dark (like here: dark → white characters (for non cross-linking text) in front of charcoal)?

It seems sensible to check RDKit’s default, bitonal BW, Avalon, and CDK palettes (as described in Greg Landrum’s blog) in front of the default (white) and/or other backgrounds, using an experimental branch of depict_ligands.py in fragments.

ghutchis · June 26, 2025, 9:45pm

Even though “dark mode” is becoming common, I think it’s better for now to stick to a white background for these. Admittedly if it’s an SVG it would be a bit easier to program color swaps (e.g. you swap the text for #000 and #fff before rendering).
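
A minimal sketch of such a swap, assuming the SVG only uses hex color notation and that pure black and white are the only colors that need exchanging; swap_black_white is a hypothetical helper name.

```python
import re

# Both the short and long hex forms map to the short form of the opposite color.
_SWAP = {"#000": "#fff", "#000000": "#fff", "#fff": "#000", "#ffffff": "#000"}
_HEX = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def swap_black_white(svg_text: str) -> str:
    """Exchange black and white in one pass; other colors are left alone."""
    return _HEX.sub(lambda m: _SWAP.get(m.group().lower(), m.group()), svg_text)
```

Doing the replacement in a single pass avoids the trap of turning everything black by first replacing #fff with #000 and then #000 with #fff.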

If you can check out a few atom palettes in the fragments scripts, that would be great (or just change a few light elements with setAtomPalette)

It’s also fairly easy in the SVG to add stroke="#000" stroke-width="0.5px".
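
A sketch of how that could be scripted, assuming the atom labels are elements that carry their color in a fill attribute (as in optimized output) and that the caller knows which fill values are the low-contrast ones. The helper name outline_fills is hypothetical, and the "#CCCC00" in the usage line is just an example yellow, not necessarily the value RDKit emits for sulfur.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep the output free of "ns0:" prefixes


def outline_fills(svg_text, fills, stroke="#000", width="0.5px"):
    """Add a thin outline to every element filled with one of the given colors."""
    wanted = {f.lower() for f in fills}
    root = ET.fromstring(svg_text)
    for el in root.iter():
        if el.get("fill", "").lower() in wanted:
            el.set("stroke", stroke)
            el.set("stroke-width", width)
    return ET.tostring(root, encoding="unicode")


# Usage (hypothetical color value):
# outlined = outline_fills(svg_text, ["#CCCC00"])
```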

jmf · June 27, 2025, 11:28pm

Hi @ghutchis (and everyone),
I’ve just opened a pull request that attempts to address this (PR #2071). I’ve wired in scour for now, but I’m happy to swap to svgcleaner if that’s the preferred optimizer. A summary of the PR:

Replaces all ligand & fragment PNGs with optimized SVGs

The .png preview icons have been swapped out for .svg
SVGs were generated via RDKit, cleaned/minified (using a small Python script + scour), then pushed through an SVG optimizer

Contains build scripts (in scripts/)

read_cjson.py: extract SMILES from .cjson templates
generate_svgs.py: batch-generate raw SVG depictions from those SMILES
optimize_svgs.py: run scour to strip metadata, collapse IDs, shorten precision, etc.

Updated files

template.qrc & CMakeLists.txt: now reference .svg files instead of .png, and link against Qt’s SVG module
Minor tweaks in templatetoolwidget.cpp to load the SVG icons correctly

File size decrease

On average, each ligand SVG is ~60% smaller than its PNG
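
For anyone who wants to reproduce that kind of figure locally, here is a minimal sketch. It assumes each SVG sits next to a PNG with the same base name in one directory; average_saving is a hypothetical helper, not one of the scripts in the PR.

```python
from pathlib import Path


def average_saving(directory):
    """Mean fractional size reduction of each .svg relative to its .png twin."""
    savings = []
    for svg in Path(directory).glob("*.svg"):
        png = svg.with_suffix(".png")
        # Skip SVGs without a matching, non-empty PNG.
        if png.exists() and png.stat().st_size > 0:
            savings.append(1 - svg.stat().st_size / png.stat().st_size)
    return sum(savings) / len(savings) if savings else None
```

A return value of 0.6 would correspond to SVGs that are on average 60% smaller than their PNGs; None means no PNG/SVG pairs were found.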

I’m still getting up to speed with the Avogadro codebase and C++ development more broadly, so I’d really appreciate any feedback on style, CMake conventions, or anything I may have overlooked. Thanks!

ghutchis · June 28, 2025, 12:04am

Wow, thanks and welcome!

In general, this looks great. I’ll do a code review tomorrow on the pull request. You somewhat did things the hard way (i.e., CJSON ⇒ SMILES).

The various fragments, etc. actually come from the OpenChemistry/fragments repository on GitHub (molecular fragments and inorganic ligands for rapidly building structures), and they’re generally built up from SMILES. There’s a discussion in Pull Request #32 of OpenChemistry/fragments (“light revision of and about usage of `depict_ligands.py`” by nbehrnd), with @Thomas recently suggesting switching to plain black-on-white for contrast / colorblind safety. (This also has the benefit of making it easy to swap to “dark mode” with the SVG, since the code can just exchange #fff with #000.)

But that’s somewhat separate. Updating the SVG is easy once template.qrc and templatetoolwidget.cpp handle these.

No worries. It’s a big codebase. Please ask and I’ll be happy to point you in the right direction or otherwise offer help.