"Good First Issues" - First Steps to Contributing Code

ghutchis · October 19, 2024, 2:29pm

Several people have contacted over the week, interested in contributing, but having questions about how to get started on a large codebase like Avogadro.

I’d like to ask @matterhorn103 and @brockdyer03 and @perminder-17 for some feedback on this. My suggestion might be something like this:

Make sure you can build the project from source. Ask questions if you’re having trouble getting that to work.
Focus on a small change. Fix a typo somewhere
Pick a small code change (e.g., updating QRegExp code to the new QRegularExpression

Moving beyond this, there are some “Good First Issues” but many of them that I’ve marked may require explanation of what needs to be done.

As far as the QRegularExpression pieces, I see these classes / files as the highest priorities:

coordinateeditor
forcefield
inputgenerator
interfacescript
molecularproperties (see WIP: Improve the Molecular Properties Dialog by ghutchis · Pull Request #1745 · OpenChemistry/avogadrolibs · GitHub)
obprocess

The other generally useful “first step” is using clang-tidy particularly any of the bugprone checks, fixing warnings, and submitting a pull request for testing.

The key thing is that I don’t know what I don’t know (i.e., what is confusing to interested newcomers? what would be helpful to know about navigating the codebase?) I’m happy to write up more docs, but I could use pointers in terms of what to write…

matterhorn103 · October 19, 2024, 3:48pm

Great initiative!

I’d agree with this approach. Can’t think of much to add except that it pays to briefly familiarize yourself with Git before you actually make changes to the code, and in particular to make sure you sign every commit using -s with e.g. git commit -s -m "Message", as that saves pain later.

Speaking from experience and from exchanges with @brockdyer03, I’d say by far the two most difficult things are:

Getting the project to build in the first place

To improve this, Brock and I have been making detailed notes of the steps we had to take (from a blank canvas), the issues we encountered, and the solutions we found during building. We’re now working on expanding the build documentation significantly to guide beginners like us through the process in a much more in-depth and platform-specific way, so hopefully that will help a lot.

That work is not finished yet though, so:

to any newcomers: please feel free to get in touch with either me (particularly for Linux) or @brockdyer03 (particularly for Windows) if you need any help compiling Avogadro!

With your macOS-specific experience as well, Geoff, I think we should have most bases covered. I know Windows was a bit of an uncovered area before, so it’s great to have Brock working on Windows.

perminder-17 · October 19, 2024, 3:55pm

Thanks, @ghutchis , for bringing this up. From my perspective, I’ve been consistently interacting with new contributors, helping them build the project and providing guidance wherever possible. Based on my experience, here are some observations:

When contributors start exploring the repository, they often try to build it right away. However, on the AvogadroLibs repository page, there’s a link under the installation section that points to “Building OpenChemistry” (GitHub - OpenChemistry/avogadrolibs: Avogadro libraries provide 3D rendering, visualization, analysis and data processing useful in computational chemistry, molecular modeling, bioinformatics, materials science, and related areas.), but this link is currently broken, causing confusions (https://two.avogadro.cc/install/build.html)

I’ve noticed two more issues faced by newcomers. First, during the --recursive cloning process, many fail to set up the entire OpenChemistry repository correctly due to incomplete SSH configuration. Second, some mistakenly create the openchemistry-build CMake folder inside the main OpenChemistry folder, likely because the instructions are not entirely clear to them.

The build documentation on the OpenChemistry wiki (Build - wiki.openchemistry.org) also needs an update. For example, the command:


git clone --recursive git://github.com/OpenChemistry/openchemistry.git

is outdated, as we no longer use git://github. It should be replaced with:


git clone --recursive https://github.com/OpenChemistry/openchemistry.git

using the HTTPS protocol.

These are the primary issues preventing new contributors from successfully building the project. It’s essential that they feel encouraged to ask questions when they encounter problems. Thanks so much for raising this and to any newcomers: please feel free to get in touch with me as well with any issues you face.

matterhorn103 · October 19, 2024, 4:04pm

(Following on from my previous comment…)

Understanding the codebase

The project is honestly very complex (or at least appears it to a chemist like me) and the roles and responsibilities of the different modules, the interplay between things, and what controls each aspect of the GUI, are difficult to understand.

This made it challenging for Brock and I because sometimes we know what we want to do, but don’t even know where we should start looking for the responsible code. You’re always happy to point us in the right direction but you’re also busy!

I guess 2) is a difficult thing to improve, but it would really make (/would have made at the beginning) a world of difference.

If you could take a small amount of time to write a documentation page that gives a bit of an overview of the project structure, that could go a long way honestly. As in, how the project is split up, why it’s done that way, and what avogadrolibs, fragments etc. actually do. I started writing one at Project Structure — Avogadro 1.99.0 documentation, but didn’t get very far (I just now realised that the state of that page is one of the reasons that PR you merged was a WIP…).

Beyond that, obviously full documentation of the code/API would be ideal. There’s a fair bit at Class List — Avogadro 1.99.0 documentation, but it is truly a bit impermeable to people unfamiliar with the code. If you’d like my suggestions on making it more useable:

For a start I’d like to change the section title from “Class List” to “API Documentation”, if that’s ok with you, as for a long time I didn’t even realise that it was full documentation and not just a page with literally a simple list of main classes!
It could do with an organizational structure that better reflects the codebase structure
- I think it would be good if there was a section under the API docs page for each Git module (avogadrolibs, avogadroapp etc.), even if the sections are empty or marked as WIP, just so that the structure of the project is clear
- Best also if the subsections for the modules within avogadrolibs are titled with the actual names of the modules as used in the code e.g. call it Avogadro::Rendering rather than “Rendering Classes”, both to make it clear that the structures correspond but also so that if I am in the avogadroapp code and see that Avogadro::IO:FileFormatManager is being used, it’s quick to find
The word “Class” is also in every single page title and is just visual noise that makes it harder to find stuff, as literally everything documented is a class – is there a way to change that and just use the name of each class alone?
Each module’s page (e.g. Core Classes — Avogadro 1.99.0 documentation) is just overwhelming
- The methods of each class don’t need to be listed, it makes it very cluttered
- It would be good if each class had a brief description of what it does – many of the classes already have their docstring displayed at the top of their page e.g. Class Avogadro::Core::Array — Avogadro 1.99.0 documentation – can the docstring, or the first sentence of it, be automatically used as the summary for it on the module page?
- Basically I mean like Qt Core C++ Classes | Qt Core 6.8.0. The Qt docs are great and it would be amazing if the Avogadro ones could be similar

While some modules like Core are liberally commented and well-documented, I found that Avogadro::QtPlugins in particular wasn’t. The stuff in there is quite frequently used, sees a fair bit of development, and contains much of what new contributors would likely want to work on (menus, panes, widgets, GUI stuff in general). So that could help a lot too.

matterhorn103 · October 19, 2024, 4:19pm

I recently took out the link to the OpenChemistry wiki from the Avogadro docs for that reason, and if Geoff agrees, I think it would be a good idea to change the GitHub readme to point at the more up-to-date Avogadro build guide instead.

Your link is broken because the build guide got moved to https://two.avogadro.cc/develop/build.html

I agree, and that’s something I intend to change the build instructions for to cover in detail. I just haven’t got to the second half of the page yet.

perminder-17:

the command:
git clone --recursive git://github.com/OpenChemistry/openchemistry.git
is outdated, as we no longer use git://github. It should be replaced with:
git clone --recursive https://github.com/OpenChemistry/openchemistry.git
using the HTTPS protocol.

Thanks for spotting this, I can change that at the same time as everything else.

ghutchis · October 19, 2024, 6:16pm

Sure. I’ll see if I can spin up some pieces into that project structure page.

Sure, no problem.

We don’t have docs for avogadroapp in the API because it’s kinda a separate thing. In probably 80-90% of cases, people should be in avogadrolibs.

Dunno. That part is generated by doxygen which reads the code and then breathe which generates the various Sphinx docs. There’s probably a way to insert some Python as the docs are generated by breathe before the website is built?

The reason that section isn’t documented is because it’s all pretty specialized. You said the API documentation was overwhelming. So you want docs for another ~70 plugin directories?

A brief overview of the plugin directories is probably a good idea (at least general categories).

I just submitted a small patch that should take care of a few things…

brockdyer03 · October 20, 2024, 12:10am

I have been very busy with research and classes lately, so haven’t had a lot of time to look at code. I am a bit surprised this post slipped through my net!

I have a lot of thoughts, firstly being that I am extremely grateful that this is something you’re (@ghutchis) concerned about! I love Avogadro, and while I may say it a lot, I could not count a single day in probably the past year that I have not needed to use it. There are a lot of things that I think about, a lot of bugs that I encounter, and some things I feel like would be an easy fix, while some seem so intractable that I could never even know where to start.

I’ll take a change for me that seemed like it would be extremely easy to deal with at first and use it as a case study: the molecular properties dialogue. (this is going to be a long one)

At first I thought this would be an easy thing to work on, I just need to know how to add elements and pull the variables to populate it. I realized pretty quickly that the first step would be to find the correct file to work on. I searched through the GitHub page, looking first in avogadroapp, since I figured that UI stuff would be an application side endeavor. That wasn’t right, so then I looked into avogadrolibs, and went into the avogadro file within. After that I figured, okay, this is a Avogadro thing, so lets look in core or rendering, but neither of those were right. Then searched through qtgui since this is a UI thing, but no luck there either. Then I made my way to qtplugins where finally I found the file called molecularproperties, exactly what I wanted! It only took me around an hour of poking through things, which while it wasn’t a huge amount of time, was a bit tedious. That was the first step.

Next was trying to figure out what I needed to do next; do I need to look through the molecularproperties.cpp, molecularpropertiesdialog.cpp, or molecularpropertiesdialog.ui file? My solution was to just check through all of them and see if I can understand what was happening. The first stop was the plain molecularproperties.cpp file, which is pretty short, and I felt somewhat confident after reading that I knew slightly what was going on there. Next up was molecularpropertiesdialog.cpp, which is a very different beast. This had a preamble that pulled in 16 different headers or Qt pieces, which seemed fine, and I recognized many of those pieces. I started reading through the file, and immediately realized I had no idea what questions I even needed to ask to get started. Some parts seemed easier, like MolecularPropertiesDialogue::updateLabels(), which made some sense, all it did was clear labels and reupdate them using some of the other functions in that file. I used some of these as a foothold, and I do think that, for the most part, I understand that file. Next up on the chopping block was the .ui file. I really do not know where to start for this file, do I need to just add stuff and it’ll get handled, or do I need to change the .cpp file to get it to register properly?

Then it hit me, I know that I want the dialogue to display information about an output file if I am opening it, but I have no clue how that works. I’d like to display stuff like single point energy, thermodynamic information, HOMO and LUMO energies, stuff like that. That all comes from an output file, but how do I pull that information from the output file and put it in there? What do I need to put in the preamble, what is the object that I need to call, do I need to make all that stuff optional so it isn’t there if it isn’t an output, and if so how? Those questions need answers, but who do I ask? I don’t know anyone versed in C++ and Qt, nor do I have the technical skills (yet) to do the asking myself, not without spending hours and hours on it, definitely not time I have available to me.

Okay, that was a lot of text. As a TLDR, my main problems are the following:

Inexperience with C++ (self problem)
Inexperience with Qt (self problem)
Confusion around where the files for a feature are located in the codebase
Confusion about what objects exist (for example, something like m_molecule, what does it contain, and where do I find out what that object looks like)
Confusion about how to call things
Issues in running tests
a) I could use my typical method for working on code, just change something and see what happens, but as far as I know, I have to recompile and rebuild the whole thing to test a single change, which takes a huge amount of time.
b) If there is a way to only change a single bit of code and not have to recompile the whole project, that would be a game changer, for me at least.
Commenting of code seems to be extremely low, in my experience reading through it, I often don’t even know the purpose of entire .cpp files.
a) An example is something like viewfactory.cpp, which seems to do nothing at all, but is referenced in other files like multiviewwidget.cpp.

Goodness, that TL:DR is also pretty long. If I had to say 2 things that would make everything easier it would be better documentation and an easier way to test changes locally.

I also felt like I rambled a lot there, so if anyone is confused please let me know!

ghutchis · October 20, 2024, 12:34am

Well, the point is to post the intractable ones. If not here, then on GitHub issues so they can be addressed. (Some intractable things might be easy for others, while some things take more time / effort.)

Unfortunately, that’s not how C++ works. But in most cases, it shouldn’t rebuild much – the files you changed, and then a bunch of linking. In general the time for ninja && prefix/Avogadro2.app/Contents/MacOS/Avogadro2 is maybe a minute or two for most changes. It’s longer if I change some base class in core but that’s the main reason for all the subdirectories – it makes it faster to build.

(Ninja is also great because it uses as many cores as possible to build in parallel.)

Then it hit me, I know that I want the dialogue to display information about an output file if I am opening it, but I have no clue how that works. I’d like to display stuff like single point energy, thermodynamic information, HOMO and LUMO energies, stuff like that. That all comes from an output file, but how do I pull that information from the output file and put it in there?

Yeah, it’s a bunch of steps for this particular case, because you need “plumbing” to go from the file into the Molecule class and then to the dialog.

First off, let’s decide on a few properties you’d want to parse, because we’ll need to at least add that to the parsers (e.g., Orca)
As it turns out, the Molecule class has a flexible key / value property system for this kind of thing (setData("name", property) and data("name"). I’ve found a few bugs recently, but it works, e.g. “name”, “totalCharge” and “totalSpinMultiplicity” already use this.
Then there’s the dialog. Unfortunately, it’s not super flexible right now.

Property	Value
Name	ethane
Formula	C2H6
Molar Mass	30.069 g/mol
Number of Atoms	8
Number of Bonds	7
Dipole Moment (B3LYP)	0.0
(more properties)

So the idea is that the dialog loops through anything in the Molecule::dataMap() list automatically. Some properties should be editable (e.g., if someone wants to add new properties).

It would also look and feel a lot more like the other property tables.

But the main thing is that with any large codebase, it either takes experience or some tutoring to know where to find things. I’ll see if I can provide a better “map” in the docs, but sometimes tasks like you describe require a few pieces.

matterhorn103 · October 23, 2024, 11:11am

Sorry, meant to reply to this the other day.

I had a look at all that and it’s all good stuff. I don’t know what others think but I find it’s made such a difference to ease of navigation. You can actually see the contents of things at a glance now, so it’s not overwhelming in the same way. And the overviews are super helpful

But shouldn’t it be documented somewhere? If nothing else, to aid us in working out where the avogadroapp/avogadrolibs boundary lies? Like I mean, it’s also a critical part of “Avogadro” the desktop program, so enabling the automated docs generation doesn’t seem like a bad thing.

Well they don’t have to be covered in detail, but I don’t see a reason not to have doxygen/breathe automatically generate the respective pages, even if there’s no information on them. Now that the classes of a namespace are only shown on a namespace’s page, and the methods of each class are only shown on the class’ page, I don’t feel that it would make the overview pages any more overwhelming.

You say that but it has a lot of stuff for the GUI in there – all the tools are in there, for example, as are many of the dialogs. It’s the main place both I and @brockdyer03 have tried to make changes, so it’s hardly obscure. I’m not sure it’s reasonable to consider the tools, for example, which is what users interact with the majority of the time, to be specialised.

The renderer and the core backend logic of Avo 2 are really super nice already, so I and others are wanting primarily to help shape the GUI around or add functionality that builds on that. And from experience, some of the points we’ve mentioned make that in particular a bit challenging. We very much rely on you taking the time to give us help and advice.

I thought about giving a detailed example in the same manner as Brock, but I really don’t want to come across negative with walls of text! So suffice it to give one prime example of something that is currently pretty tricky, and that’s trying to make head or tail of the code for a given tool; the relevant code is spread over many classes and files, and there’s not a whole lot in the way of docstrings or comments to point you in the right directions.

ghutchis · October 23, 2024, 7:58pm

I’ll write up a bit more on the boundary including a brief description of the files and key functions in avogadroapp. But I’m not sure the auto-generated docs are going to be particularly helpful – not the least because there’s a lot less in the way of docstrings in avogadroapp. I can see what it generates though…

Hmm. I’ll see what qtplugins generates, but I think a general overview might be more useful than whatever the auto-generated docs turns up.

Right. That’s where it’s time to ask questions

ghutchis · October 28, 2024, 3:47pm

Okay, I’ve added:

https://two.avogadro.cc/develop/avogadroapp.html#develop-avogadroapp

And for qtplugins, I tried to categorize. Suggestions are welcome there…

https://two.avogadro.cc/develop/qtplugins.html

matterhorn103 · October 29, 2024, 10:14am

This looks really helpful, thanks!

Geoff has now fixed this! Which is genuinely a massive QOL improvement and huge reduction in barriers to contributing, so thanks!