Developing chemical visualization software

Hi there,

I am a high-school graduate who is looking into developing chemical visualization/simulation software like Avogadro, Vipster, ChemDraw, LAMMPS, etc.

What college courses should I take, and what do I have to know in order to develop one?

Thank you.

The question is difficult to answer because you did not say how much background you have in chemistry, whether you are already familiar with one or more programming languages, and whether this is a short-term project (like Google Summer of Code, for a couple of weeks) or a longer one.*

Thus I would recommend seeing how programs already in the field tackle the same question. For such a comparison, ChemDraw (sample page) and Marvin (sample page) may be interesting for their interfaces; the sample pages offer a subset of the functionality of the full versions of their sketchers. Usually, however, this does not show you the underlying source code. In this regard, it would be a good choice to look at open-source programs, which you find hosted, for example, on GitHub, GitLab, GitBucket, or on the developers' web pages. As a few examples of 3D visualization, you have Avogadro (C++), Jmol (Java) with its outreach to JavaScript as JSmol, or ChemDoodle's Web Components as a freely available library, e.g., for web pages. It just happens that these three (in this sequence) look to be of decreasing complexity when it comes to getting familiar enough to build something useful.

It just happens that on August 2nd Robert Hanson (one of the main developers of aforementioned Jmol/JSmol) asked in the Jmol users mailing list for favorite sites about chemistry; the ongoing replies to his question yield examples not only about Jmol (like symmetry@otterbein).

*) Kevin Theisen, who initiated ChemDoodle, is for example a chemistry graduate of Rutgers, NJ.

There are two approaches to that.

  1. You have a very specific need that you don’t think is met by existing software.

@Thomas mentioned Kevin Theisen, who started ChemDoodle because, in part, he thought ChemDraw was severely over-priced and he initially thought he could create something better and out-innovate ChemDraw.

When I wrote the initial design draft for Avogadro, there were very few molecular editor / builder programs and none really worked as a “molecular sketchpad.” Lots of visualization tools, but nothing that would let me just sketch out a new molecule.

At that point, the particular programming language or platform is less important, because the idea is new and you’d likely be starting from scratch. You’d want some level of programming proficiency, but that may not directly map to computer science classes. (For example, maybe you decide to write in Python, but your school doesn’t teach CS classes in Python.)

  2. You don’t necessarily need to create a whole new program from scratch. Many chemistry-related programs are open source; you mentioned 3 in your list. (I hadn’t heard of Vipster, thanks, that looks interesting.)

In that case, you may want to gain some coding experience in Python or Java or Fortran or JavaScript … whatever language or framework is used by your program of interest. That’s to have some proficiency … but many of these programs have “help wanted” or “good first tasks” labels, and if they don’t, you can ask.

We participate in Google Summer of Code, but we offer ways to help get new contributors up to speed … after all there are thousands of lines of code … but you’re still working on an idea … like a better tool or better interface to LAMMPS, etc.

Hope that helps. I’d suggest picking an existing package or two and seeing if there are small ways you can help at first.

When Kevin Theisen started, ChemDraw was sold on CDs, and some functions perceived as very useful in day-to-day work (e.g., reaction balance) were scattered over different license levels like standard, professional, ultra, etc.

(Image: reaction balance in ACD ChemSketch, image credit to academicsoftwareblog)

Among the points Kevin Theisen recognized early was to simplify this to one level, with different licenses only depending on whether you purchase it for your own use or for a group; in return you get a license code you enter during the installation. And if you move to a different computer (e.g., replacement by a newer one), you simply detach the previous one from this code and carry the code to the new computer/other OS. Speaking of the latter, another bonus of ChemDoodle vs. ChemDraw is that ChemDoodle is aware of many file formats and also works well on Linux, which computational chemists like to use; thus, you bypass the hassle of ChemDraw and Wine there (reference).

Thank you @Thomas and @ghutchis for your responses and resources. Looking at existing projects is a good idea.

I am competent in high-school chemistry and am highly interested in this subject. I’m only beginning to learn programming in C and I know how to use git. The project is planned to be long and extensive, like ChemDraw and ChemDoodle.

This is actually my case. What I understand from your post is that the choice of programming language is not as important as being proficient?

No; typically, you don’t need to know the minutiae of the language of implementation to start, especially if you work together with the maintainers of an already existing program.

Some of the programs mentioned in this thread have a long history and were written to address a specific problem. In the case of ChemDraw, for example, the problem was that drawing chemical reaction schemes with ink and a stencil to submit your (typewriter-written) publication to the publishers is tedious, time-consuming, and prone to errors, and cut-and-paste literally requires a pair of scissors and a bottle of glue to get an illustration ready to be photographed (on film, analogue process).

(Image: stencil; image credit to kenemak, application video of a small-scale stencil for a ballpoint pen)

So if you are a prolific author, like David Evans with his large group surely was, this is a thing where a computer program may help you a lot.[1] Or, to use Al Sweigart’s words about Python, to (partially) “automate the boring stuff”.[2] Here, ChemDraw offered a significant improvement.

So, Geoffrey’s reply is more like a question to you: is there a specific gap you recognize, a missing functionality in a program you use? Is it possible to amend the program e.g., by a supplementary template (like for ChemDraw, example)? If the source code is free and open, learn the language of implementation and get in touch with the maintainers. Since you already know about git, reporting an issue, and preparing a pull request for an improvement/a new feature on e.g., GitHub would be a next step to take.

For a single person, writing something on the scale of (contemporary) ChemDraw may require much more (as in too much) effort than adding a feature to an already existing program. In addition, besides sharing insight and the work ahead, working with others means missing features may be recognized faster.

[1] History of the Harvard ChemDraw Project, Angew. Chem. Int. Ed. https://doi.org/10.1002/anie.201405820
[2] https://automatetheboringstuff.com/

The programming language can be important if you’re integrating into something else. LAMMPS, for example, is written in C++, so contributing to it requires some proficiency in C++. But if you’re starting something from scratch, you can use … whatever. I’ve seen chemistry tools written in Python, C / C++, Fortran, JavaScript, Julia, Rust … probably a few others.

Sometimes, undergrad or graduate students add particular features or packages as part of their research. In that case, it might depend a bit on the coding language used by the group.

In any case, I think the most important question is “what is the unmet need?” As @Thomas has outlined, this is usually the motivating factor for either starting a new program or adding features to one.

  • ChemDraw = computer version of chemical stencils
  • ChemDoodle = simpler / better / cheaper version of ChemDraw
  • Jmol / JSmol = chemical visualization in webpages, including animations (replaced Chime / RasMol)
  • Avogadro = need for 3D molecular editor / builder
  • LAMMPS = need for high-performance molecular dynamics

I can’t speak for others, but I wrote out a “design document” for Avogadro - what I thought were critical features and parts of how the code would work (e.g., types of plugins).

I would suggest thinking about the core / minimal features. It’s easy to think about a long and extensive list. Start small, ship it, and grow.
