Over-riding copy behaviour and group indices

Jens_Thomas · January 13, 2011, 5:47pm

Hi,

I’m implementing support for multiple conformers in the Properties widgets, so that the (e.g) Bond properties table will show the bond lengths for all conformers, rather then just one.

I’ve got most of this done (with some much appreciated help from Marcus), but am stuck on a couple of things and wondered if anyone had any suggestions.

Firstly, I’d like to be able to copy data out of the table into the clipboard, so that it can be pasted into a spreadsheet.

I’ve got this to work by implementing a keyPressEvent function in the PropertiesView class that calls a copyToClipboard function to copy the model data into the clipboad.

The code to do that is:

void PropertiesView::keyPressEvent(QKeyEvent *event)
{
if ( event->matches(QKeySequence::Copy)
copyToClipboard();
else
QTableView::keyPressEvent(event);
}

This works fine on Linux, but on OSX, the QKeySequence::Copy event isn’t matched, and instead I get the one that is implemented for the main avogdro window being called. I think this might be something to do with a shortcut being set up for copy, but various things I’ve tried to over-ride this behaviour for the PropertiesView window don’t seem to work.

Can anyone give me any pointers on how I can get around this?

The second thing concerns the group indexes for atoms.

When the bond table is displayed, it uses the group indices (the Atom data m_groupIndex) to indicate the start and end atom for the bond. Currently, if I just load in a molecule and display the table, the group indices haven’t been calculated and so I get rubbish displayed.

This isnt a particular problem as I can just call calculateGroupIndices on the molecule before I process any data, but I guess that isn’t ideal.

I tried adding a call to calculateGroupIndices to the molecule addAtom call, but that doesn’t work beause the atomic number of the atom hasn’t been set, so the group indices are calculated incorrectly (the last atom added has an index of 0, regardless of how many other ones there are).

Is there a sensible place to calculate the group indices so that you can be sure that it’s been done whenever you access a molecule?

Best wishes,

Jens

Scanned by iCritical.

mhanwell · January 13, 2011, 6:23pm

Hi Jens,

On Thu, Jan 13, 2011 at 12:47 PM, jens.thomas@stfc.ac.uk wrote:

Hi,

I’m implementing support for multiple conformers in the Properties widgets, so that the (e.g) Bond properties table will show the bond lengths for all conformers, rather then just one.

I’ve got most of this done (with some much appreciated help from Marcus), but am stuck on a couple of things and wondered if anyone had any suggestions.

Firstly, I’d like to be able to copy data out of the table into the clipboard, so that it can be pasted into a spreadsheet.

I’ve got this to work by implementing a keyPressEvent function in the PropertiesView class that calls a copyToClipboard function to copy the model data into the clipboad.

The code to do that is:

void PropertiesView::keyPressEvent(QKeyEvent *event)
{
if ( event->matches(QKeySequence::Copy)
copyToClipboard();
else
QTableView::keyPressEvent(event);
}

This works fine on Linux, but on OSX, the QKeySequence::Copy event isn’t matched, and instead I get the one that is implemented for the main avogdro window being called. I think this might be something to do with a shortcut being set up for copy, but various things I’ve tried to over-ride this behaviour for the PropertiesView window don’t seem to work.

Can anyone give me any pointers on how I can get around this?

This does seem like something we should be able to do in an extension.
We do a few things differently on the Mac and Geoff probably has a
better idea on this point.

The second thing concerns the group indexes for atoms.

When the bond table is displayed, it uses the group indices (the Atom data m_groupIndex) to indicate the start and end atom for the bond. Currently, if I just load in a molecule and display the table, the group indices haven’t been calculated and so I get rubbish displayed.

I am not sure I understand why it would use the group index, rather
than the atom index. I will dig into this code, as it seems like the
bond already has pointers to the atoms. I was going through the code
cleaning up the group index, and it looked like it was only used in
the label engine.

This isnt a particular problem as I can just call calculateGroupIndices on the molecule before I process any data, but I guess that isn’t ideal.

I tried adding a call to calculateGroupIndices to the molecule addAtom call, but that doesn’t work beause the atomic number of the atom hasn’t been set, so the group indices are calculated incorrectly (the last atom added has an index of 0, regardless of how many other ones there are).

Is there a sensible place to calculate the group indices so that you can be sure that it’s been done whenever you access a molecule?

We want to avoid recalculating things like this on every atom
addition/removal. This really hurts us when we load big atoms, but I
could add a variable indicating that the group indices are dirty and
need calculating. It wasn’t entirely clear to me what the group index
was trying to achieve, or how it should be used.

Marcus

Konstantin_Tokarev · January 13, 2011, 7:33pm

I am not sure I understand why it would use the group index, rather
than the atom index.

I’ve used it to provide more natural naming of atoms in the bond. If you don’t
like it, you can remove it

–
Regards,
Konstantin

mhanwell · January 13, 2011, 10:02pm

On Thu, Jan 13, 2011 at 2:33 PM, Konstantin Tokarev annulen@yandex.ru wrote:

I am not sure I understand why it would use the group index, rather
than the atom index.

I’ve used it to provide more natural naming of atoms in the bond. If you don’t
like it, you can remove it

It is a reasonable addition, but things like this must be balanced
with scalability and only calculated when they are needed. We also
need to document things like this as it is quite unclear from the docs
(to me at least) what it is and where it should be used.

It seems like the group indices belong as part of the molecule object,
and could be initialized/calculated the first time it is requested. I
think for multiple fragments that labeling scheme makes a lot of
sense, but it should not burden large molecular systems where it seems
far less useful.

I will see if I can fix this up to behave correctly. This also
outlines the need for more unit tests, so that we can make bigger
changes and know we haven’t broken other things we are not aware of.

Marcus

Konstantin_Tokarev · January 14, 2011, 6:19am

It is a reasonable addition, but things like this must be balanced
with scalability and only calculated when they are needed. We also
need to document things like this as it is quite unclear from the docs
(to me at least) what it is and where it should be used.

I agree, name I’ve chosen is quite misleading

–
Regards,
Konstantin

mhanwell · January 15, 2011, 7:57pm

Hi Jens,

On Thu, Jan 13, 2011 at 12:47 PM, jens.thomas@stfc.ac.uk wrote:

Hi,

I’m implementing support for multiple conformers in the Properties widgets, so that the (e.g) Bond properties table will show the bond lengths for all conformers, rather then just one.

I’ve got most of this done (with some much appreciated help from Marcus), but am stuck on a couple of things and wondered if anyone had any suggestions.

If you were still stuck on the conformer copying, I think this should
do the trick,

http://review.source.kitware.com/715

It seems like a reasonably clean way of copying all of the conformer
data over, could you test this out when you get chance please?

The second thing concerns the group indexes for atoms.

When the bond table is displayed, it uses the group indices (the Atom data m_groupIndex) to indicate the start and end atom for the bond. Currently, if I just load in a molecule and display the table, the group indices haven’t been calculated and so I get rubbish displayed.

http://review.source.kitware.com/#change,714

This actually initializes the group index to zero, and so it does the
right thing if it was not calculated. We need to be careful about
always calculating derived values that never get used in the majority
of cases.

This isnt a particular problem as I can just call calculateGroupIndices on the molecule before I process any data, but I guess that isn’t ideal.

I actually think this is a good approach for some of this data that is
not universally needed. Many molecules have no need of group indexes,
and so we shouldn’t waste cycles calculating them. Same goes for
storage space for custom radii, custom labels etc. I would like to
ensure the data structure we have is flexible enough to annotate
molecules with lots of things without burdening every atom with 20
values that are never set in 90% of use cases.

I tried adding a call to calculateGroupIndices to the molecule addAtom call, but that doesn’t work beause the atomic number of the atom hasn’t been set, so the group indices are calculated incorrectly (the last atom added has an index of 0, regardless of how many other ones there are).

Is there a sensible place to calculate the group indices so that you can be sure that it’s been done whenever you access a molecule?

I think before you want to interrogate the value. The other place
might be that of the partial changes - in the get function we could
call calculate. We then need to make the calculate function a little
more intelligent but it accomplishes the same goals.

Thanks,

Marcus

Konstantin_Tokarev · January 15, 2011, 9:09pm

I actually think this is a good approach for some of this data that is
not universally needed. Many molecules have no need of group indexes,
and so we shouldn’t waste cycles calculating them. Same goes for
storage space for custom radii, custom labels etc. I would like to
ensure the data structure we have is flexible enough to annotate
molecules with lots of things without burdening every atom with 20
values that are never set in 90% of use cases.

I think this problem is related to differrence between macromolecules and
“small molecules”. If we are talking about macromolecules, I agree that
atom cutomization and “group indices” are not needed at all (or very rarely),
but take extra space in thousands of Atom instances

However, “small molecules” (I use this term with the same meaning as in
“small molecule crystallography”) are most likely to have no more than
100-150 atoms, and memory wasting is not large, but mentioned features
are very useful

Another aspect: usually customization is needed for a few atoms (e.g., transition
metal atoms in different electronic states), and numbering is not requred for all
types of atoms in the molecule.

–
Regards,
Konstantin

mhanwell · January 15, 2011, 11:03pm

On Sat, Jan 15, 2011 at 4:09 PM, Konstantin Tokarev annulen@yandex.ru wrote:

I actually think this is a good approach for some of this data that is
not universally needed. Many molecules have no need of group indexes,
and so we shouldn’t waste cycles calculating them. Same goes for
storage space for custom radii, custom labels etc. I would like to
ensure the data structure we have is flexible enough to annotate
molecules with lots of things without burdening every atom with 20
values that are never set in 90% of use cases.

I think this problem is related to differrence between macromolecules and
“small molecules”. If we are talking about macromolecules, I agree that
atom cutomization and “group indices” are not needed at all (or very rarely),
but take extra space in thousands of Atom instances

I am well aware of this difference after spending so long working on
Avogadro. We started out focusing on small molecules, and this is why
some of our data structures do not scale well. Even for small
molecules, depending on the use case, group indices and custom radii
are not used but the price you pay to always carry it around is
minimal as there are typically less than a few hundred objects to
worry about.

However, “small molecules” (I use this term with the same meaning as in
“small molecule crystallography”) are most likely to have no more than
100-150 atoms, and memory wasting is not large, but mentioned features
are very useful

This is why I suggest that we start using a better pattern that can
allow us to have both. Atoms, bonds, residues, they all belong to a
molecule object. If we store more of this data in a molecule object in
arrays, and only initialize these arrays when the optional decorations
are used, then we get the best of both worlds.

In the case of a macromolecule, the molecule has one array, of size
zero, for each of the properties. If we then assign a group index for
example, or a custom label, this array is allocated with the correct
type at the size of the number of atoms. The atom then has an entry in
the array that maps to its index in the molecule.

I already added some of this for the atom positions/conformers. Even
for a macromolecule, having several empty arrays is not a big price to
pay. Allocating, initializing and deallocating a double, int, QString
etc for each of the atoms in a macromolecule is quite a large hit
though.

Another aspect: usually customization is needed for a few atoms (e.g., transition
metal atoms in different electronic states), and numbering is not requred for all
types of atoms in the molecule.

It could be worth looking at sparse matrix approaches here, but I am
not sure it is worth the effort. So long as we provide an interface to
the data without exposing how we store it then changing this in the
future shouldn’t be a big issue. For less than a few thousand atoms
having extra empty QStrings or ints around isn’t a big issue and the
current situation is worse.

Another advantage of storing these properties in a single contiguous
block of memory is CPU cache hits, and the fast iteration over this
data for large molecules. Lazy evaluation of properties adds a little
more complexity, but it does allow us to scale to bigger molecules
without overly impacting the small.

Another possibility is moving to a model where we have a base Atom
class with little in it, and then a “fully loaded” Atom class with all
of the extra properties you happen to think of. We would like to
return to ABI stability though, and so storing them in a d pointer is
possibly preferable.

I would like to make some big changes to our data structure after the
experience I have gained working with large data in ParaView and VTK.
These changes would break the API though, and so are more appropriate
for a 2.0 release.

I think if we are careful we can satisfy the needs of the small
molecule users while still scaling to large macromolecules.

This reply is a lot longer than I intended, but hopefully it makes
some of my thoughts a little clearer to you.

Marcus

ghutchis · January 20, 2011, 9:37pm

On Jan 15, 2011, at 6:03 PM, Marcus D. Hanwell wrote:

Another aspect: usually customization is needed for a few atoms (e.g., transition
metal atoms in different electronic states), and numbering is not requred for all
types of atoms in the molecule.

This sounds like you want to use custom properties using Qt – much like Eric did for the QTAIM code.

Another possibility is moving to a model where we have a base Atom
class with little in it, and then a “fully loaded” Atom class with all
of the extra properties you happen to think of.

Properties are already handled using QObjects. There’s no need to have a “fully loaded” Atom class, just custom properties.

I would like to make some big changes to our data structure after the
experience I have gained working with large data in ParaView and VTK.
These changes would break the API though, and so are more appropriate
for a 2.0 release.

I think this is a separate discussion.

-Geoff

mhanwell · January 20, 2011, 10:00pm

On Thu, Jan 20, 2011 at 4:37 PM, Geoffrey Hutchison
geoff.hutchison@gmail.com wrote:

On Jan 15, 2011, at 6:03 PM, Marcus D. Hanwell wrote:

Another aspect: usually customization is needed for a few atoms (e.g., transition
metal atoms in different electronic states), and numbering is not requred for all
types of atoms in the molecule.

This sounds like you want to use custom properties using Qt – much like Eric did for the QTAIM code.

Another possibility is moving to a model where we have a base Atom
class with little in it, and then a “fully loaded” Atom class with all
of the extra properties you happen to think of.

Properties are already handled using QObjects. There’s no need to have a “fully loaded” Atom class, just custom properties.

This is one of the reasons why our molecules don’t scale, tiny objects
(atoms and bonds) that are QObjects. It also leads to memory
fragmentation, slow copying. The Atom class currently has quite a few
member variables that are instanced for every atom created, but only
used in certain types of molecule.

This list has grown quite a bit since 1.0, and some of the bugs are
due to them not being initialized correctly etc. Under the current
framework it would be much better to use properties for these values.

I would like to make some big changes to our data structure after the
experience I have gained working with large data in ParaView and VTK.
These changes would break the API though, and so are more appropriate
for a 2.0 release.

I think this is a separate discussion.

Certainly, but relevant to this thread. I have a prototype that I
would like to discuss more widely, but I will start a new thread about
it.

Marcus

Konstantin_Tokarev · February 15, 2011, 8:58am

21.01.2011, 00:37, “Geoffrey Hutchison” geoff.hutchison@gmail.com:

On Jan 15, 2011, at 6:03 PM, Marcus D. Hanwell wrote:

Another aspect: usually customization is needed for a few atoms (e.g., transition
metal atoms in different electronic states), and numbering is not requred for all
types of atoms in the molecule.

This sounds like you want to use custom properties using Qt – much like Eric did for the QTAIM code.

Another possibility is moving to a model where we have a base Atom
class with little in it, and then a “fully loaded” Atom class with all
of the extra properties you happen to think of.

Properties are already handled using QObjects. There’s no need to have a “fully loaded” Atom class, just custom properties.

But it will be needed to determine if some atom does have certain property.
Properties are stored in QHash with approximately constant access time => O(N) complexity for N atoms

16.01.2011, 02:03, “Marcus D. Hanwell” mhanwell@gmail.com:

It could be worth looking at sparse matrix approaches here, but I am
not sure it is worth the effort. So long as we provide an interface to
the data without exposing how we store it then changing this in the
future shouldn’t be a big issue. For less than a few thousand atoms
having extra empty QStrings or ints around isn’t a big issue and the
current situation is worse.

Another advantage of storing these properties in a single contiguous
block of memory is CPU cache hits, and the fast iteration over this
data for large molecules. Lazy evaluation of properties adds a little
more complexity, but it does allow us to scale to bigger molecules
without overly impacting the small.

Another possibility is moving to a model where we have a base Atom
class with little in it, and then a “fully loaded” Atom class with all
of the extra properties you happen to think of. We would like to
return to ABI stability though, and so storing them in a d pointer is
possibly preferable.

Using vectors to store custom properties seems to be more reasonable (O(1) lookup), but it requires
O(N) memory.

I see here a room for two alternative strategies: save memory or save CPU cycles.

Of course, saving CPU will be less important in case of scene graph intoduction,
but I think it might be useful in some circumstances (e.g., very large molecules
on machine with very large RAM)

–
Regards,
Konstantin

Konstantin_Tokarev · February 16, 2011, 7:55pm

14.01.2011, 01:02, “Marcus D. Hanwell” mhanwell@gmail.com:

On Thu, Jan 13, 2011 at 2:33 PM, Konstantin Tokarev annulen@yandex.ru; wrote:

I am not sure I understand why it would use the group index, rather
than the atom index.
I’ve used it to provide more natural naming of atoms in the bond. If you don’t
like it, you can remove it

It is a reasonable addition, but things like this must be balanced
with scalability and only calculated when they are needed. We also
need to document things like this as it is quite unclear from the docs
(to me at least) what it is and where it should be used.

It seems like the group indices belong as part of the molecule object,
and could be initialized/calculated the first time it is requested. I
think for multiple fragments that labeling scheme makes a lot of
sense, but it should not burden large molecular systems where it seems
far less useful.

I will see if I can fix this up to behave correctly. This also
outlines the need for more unit tests, so that we can make bigger
changes and know we haven’t broken other things we are not aware of.

What you’ve done in c3b40888 is a recomputation of indicies inside paintEvent.

–
Regards,
Konstantin

mhanwell · February 17, 2011, 2:33pm

2011/2/16 Konstantin Tokarev annulen@yandex.ru:

14.01.2011, 01:02, “Marcus D. Hanwell” mhanwell@gmail.com:

On Thu, Jan 13, 2011 at 2:33 PM, Konstantin Tokarev annulen@yandex.ru; wrote:

I am not sure I understand why it would use the group index, rather
than the atom index.
I’ve used it to provide more natural naming of atoms in the bond. If you don’t
like it, you can remove it

It is a reasonable addition, but things like this must be balanced
with scalability and only calculated when they are needed. We also
need to document things like this as it is quite unclear from the docs
(to me at least) what it is and where it should be used.

It seems like the group indices belong as part of the molecule object,
and could be initialized/calculated the first time it is requested. I
think for multiple fragments that labeling scheme makes a lot of
sense, but it should not burden large molecular systems where it seems
far less useful.

I will see if I can fix this up to behave correctly. This also
outlines the need for more unit tests, so that we can make bigger
changes and know we haven’t broken other things we are not aware of.

What you’ve done in c3b40888 is a recomputation of indicies inside paintEvent.

Inside the render event, when it is actually used and not the rest
of the time. I am well aware of that, why are you telling me this?

Marcus

Konstantin_Tokarev · February 17, 2011, 2:47pm

17.02.2011, 17:33, “Marcus D. Hanwell” mhanwell@gmail.com:

2011/2/16 Konstantin Tokarev annulen@yandex.ru;:

14.01.2011, 01:02, “Marcus D. Hanwell” mhanwell@gmail.com;:

On Thu, Jan 13, 2011 at 2:33 PM, Konstantin Tokarev annulen@yandex.ru;; wrote:

I am not sure I understand why it would use the group index, rather
than the atom index.
I’ve used it to provide more natural naming of atoms in the bond. If you don’t
like it, you can remove it
It is a reasonable addition, but things like this must be balanced
with scalability and only calculated when they are needed. We also
need to document things like this as it is quite unclear from the docs
(to me at least) what it is and where it should be used.

It seems like the group indices belong as part of the molecule object,
and could be initialized/calculated the first time it is requested. I
think for multiple fragments that labeling scheme makes a lot of
sense, but it should not burden large molecular systems where it seems
far less useful.

I will see if I can fix this up to behave correctly. This also
outlines the need for more unit tests, so that we can make bigger
changes and know we haven’t broken other things we are not aware of.
What you’ve done in c3b40888 is a recomputation of indicies inside paintEvent.

Inside the render event, when it is actually used and not the rest
of the time. I am well aware of that, why are you telling me this?

It seems to be very ineffective: if labels are enabled, indicies are calculated in
every paintEvent, not only when molecule changes, i.e. they are calculated more
often then before, opposite to commit message

–
Regards,
Konstantin

allison.vacanti · February 17, 2011, 4:05pm

2011/2/17 Konstantin Tokarev annulen@yandex.ru:

17.02.2011, 17:33, “Marcus D. Hanwell” mhanwell@gmail.com:

2011/2/16 Konstantin Tokarev annulen@yandex.ru;:

14.01.2011, 01:02, “Marcus D. Hanwell” mhanwell@gmail.com;:

On Thu, Jan 13, 2011 at 2:33 PM, Konstantin Tokarev annulen@yandex.ru;; wrote:

I am not sure I understand why it would use the group index, rather
than the atom index.
I’ve used it to provide more natural naming of atoms in the bond. If you don’t
like it, you can remove it
It is a reasonable addition, but things like this must be balanced
with scalability and only calculated when they are needed. We also
need to document things like this as it is quite unclear from the docs
(to me at least) what it is and where it should be used.

It seems like the group indices belong as part of the molecule object,
and could be initialized/calculated the first time it is requested. I
think for multiple fragments that labeling scheme makes a lot of
sense, but it should not burden large molecular systems where it seems
far less useful.

I will see if I can fix this up to behave correctly. This also
outlines the need for more unit tests, so that we can make bigger
changes and know we haven’t broken other things we are not aware of.
What you’ve done in c3b40888 is a recomputation of indicies inside paintEvent.

Inside the render event, when it is actually used and not the rest
of the time. I am well aware of that, why are you telling me this?

It seems to be very ineffective: if labels are enabled, indicies are calculated in
every paintEvent, not only when molecule changes, i.e. they are calculated more
often then before, opposite to commit message

I’m not sure if this is related or not, but I see occasional segfaults
during xtalopt runs that occur in the calculateGroupIndices code. It
seems that the number of atoms is changing during the function call.
In particular, it looks like an atom is removed mid-iteration from
another thread. Here’s a nearly-useless backtrace:

(gdb) bt full
#0 0x00007ffff58f9b68 in Avogadro::setGroupIndex (this=0x35, index=9)
at /home/baettig/quantum_soft/avogadro/libavogadro/src/atom.cpp:234
No locals.
#1 0x00007ffff594c603 in Avogadro::Molecule::calculateGroupIndices (
this=0x28a1680)
at /home/baettig/quantum_soft/avogadro/libavogadro/src/molecule.cpp:825
match = false
i = 8
group_number = {{d = 0x275ff50, p = 0x275ff50}}
group_ele = {{d = 0x23b8370, p = 0x23b8370}}
atomGroupNumber = {{d = 0x28901a0, p = 0x28901a0}}
#2 0x00007ffff594c87e in Avogadro::Molecule::updateAtom (this=0x28a1680)
at /home/baettig/quantum_soft/avogadro/libavogadro/src/molecule.cpp:885
d = 0x28ce390
atom = 0x7fffbc067a90
#3 0x00007ffff59c10a7 in Avogadro::Molecule::qt_metacall (this=0x28a1680,
_c=QMetaObject::InvokeMetaMethod, _id=12, _a=0x7fffbc066550)
at /home/baettig/quantum_soft/avogadro/build/libavogadro/src/moc_molecule.cxx:107
No locals.
#4 0x00007fffd6022ec8 in GlobalSearch::Structure::qt_metacall (
this=0x28a1680, _c=QMetaObject::InvokeMetaMethod, _id=17,
_a=0x7fffbc066550)

So somewhere between checking that i < numAtoms() and the call to
atom(i), the number of atoms dropped below i+1.

From a design standpoint, what would be the best way to handle this? I
understand that it’s a bad idea to lock Molecule::lock from inside a
member function, but calculateGroupIndices should really be called
inside of a read-lock. Since this is a slot called from a signal, it
doesn’t seem possible to read-lock this in an obvious way.

Perhaps a if(!lock()->tryLockForRead()){return;} at the beginning of
the function would be best? Are the group indices absolutely necessary
anywhere?

Dave

Konstantin_Tokarev · February 17, 2011, 4:23pm

17.02.2011, 19:05, “David Lonie” loniedavid@gmail.com:

2011/2/17 Konstantin Tokarev annulen@yandex.ru;:

17.02.2011, 17:33, “Marcus D. Hanwell” mhanwell@gmail.com;:

2011/2/16 Konstantin Tokarev annulen@yandex.ru;;:

14.01.2011, 01:02, “Marcus D. Hanwell” marcus.hanwell@kitware.com;;:

On Thu, Jan 13, 2011 at 2:33 PM, Konstantin Tokarev annulen@yandex.ru;;; wrote:

I am not sure I understand why it would use the group index, rather
than the atom index.
I’ve used it to provide more natural naming of atoms in the bond. If you don’t
like it, you can remove it
It is a reasonable addition, but things like this must be balanced
with scalability and only calculated when they are needed. We also
need to document things like this as it is quite unclear from the docs
(to me at least) what it is and where it should be used.

It seems like the group indices belong as part of the molecule object,
and could be initialized/calculated the first time it is requested. I
think for multiple fragments that labeling scheme makes a lot of
sense, but it should not burden large molecular systems where it seems
far less useful.

I will see if I can fix this up to behave correctly. This also

outlines the need for more unit tests, so that we can make bigger
changes and know we haven’t broken other things we are not aware of.
What you’ve done in c3b40888 is a recomputation of indicies inside paintEvent.
Inside the render event, when it is actually used and not the rest
of the time. I am well aware of that, why are you telling me this?
It seems to be very ineffective: if labels are enabled, indicies are calculated in
every paintEvent, not only when molecule changes, i.e. they are calculated more
often then before, opposite to commit message

I’m not sure if this is related or not, but I see occasional segfaults
during xtalopt runs that occur in the calculateGroupIndices code. It
seems that the number of atoms is changing during the function call.
In particular, it looks like an atom is removed mid-iteration from
another thread. Here’s a nearly-useless backtrace:

(gdb) bt full
#0 0x00007ffff58f9b68 in Avogadro::setGroupIndex (this=0x35, index=9)
at /home/baettig/quantum_soft/avogadro/libavogadro/src/atom.cpp:234
No locals.
#1 0x00007ffff594c603 in Avogadro::Molecule::calculateGroupIndices (
this=0x28a1680)
at /home/baettig/quantum_soft/avogadro/libavogadro/src/molecule.cpp:825
match = false
i = 8
group_number = {{d = 0x275ff50, p = 0x275ff50}}
group_ele = {{d = 0x23b8370, p = 0x23b8370}}
atomGroupNumber = {{d = 0x28901a0, p = 0x28901a0}}
#2 0x00007ffff594c87e in Avogadro::Molecule::updateAtom (this=0x28a1680)
at /home/baettig/quantum_soft/avogadro/libavogadro/src/molecule.cpp:885
d = 0x28ce390
atom = 0x7fffbc067a90
#3 0x00007ffff59c10a7 in Avogadro::Molecule::qt_metacall (this=0x28a1680,
_c=QMetaObject::InvokeMetaMethod, _id=12, _a=0x7fffbc066550)
at /home/baettig/quantum_soft/avogadro/build/libavogadro/src/moc_molecule.cxx:107
No locals.
#4 0x00007fffd6022ec8 in GlobalSearch::Structure::qt_metacall (
this=0x28a1680, _c=QMetaObject::InvokeMetaMethod, _id=17,
_a=0x7fffbc066550)

So somewhere between checking that i < numAtoms() and the call to
atom(i), the number of atoms dropped below i+1.

From a design standpoint, what would be the best way to handle this? I
understand that it’s a bad idea to lock Molecule::lock from inside a
member function, but calculateGroupIndices should really be called
inside of a read-lock. Since this is a slot called from a signal, it
doesn’t seem possible to read-lock this in an obvious way.

Perhaps a if(!lock()->tryLockForRead()){return;} at the beginning of
the function would be best?

Sounds reasonable.

Are the group indices absolutely necessary anywhere?

Primary user is labelengine. However, I’ve also used it in propextension,
and I think it could be useful in any plugin displaying lists of atoms, bonds or
angles.

Probably I’ve chosen bad name for this feature. Maybe better alternative is one of
“natural index”, “formula index”, “minimal index”.

Is anybody in doubt what this index produces?

–
Regards,
Konstantin

mhanwell · February 17, 2011, 5:36pm

2011/2/17 Konstantin Tokarev annulen@yandex.ru:

17.02.2011, 17:33, “Marcus D. Hanwell” mhanwell@gmail.com:

2011/2/16 Konstantin Tokarev annulen@yandex.ru;:

14.01.2011, 01:02, “Marcus D. Hanwell” mhanwell@gmail.com;:

On Thu, Jan 13, 2011 at 2:33 PM, Konstantin Tokarev annulen@yandex.ru;; wrote:

I am not sure I understand why it would use the group index, rather
than the atom index.
I’ve used it to provide more natural naming of atoms in the bond. If you don’t
like it, you can remove it
It is a reasonable addition, but things like this must be balanced
with scalability and only calculated when they are needed. We also
need to document things like this as it is quite unclear from the docs
(to me at least) what it is and where it should be used.

It seems like the group indices belong as part of the molecule object,
and could be initialized/calculated the first time it is requested. I
think for multiple fragments that labeling scheme makes a lot of
sense, but it should not burden large molecular systems where it seems
far less useful.

I will see if I can fix this up to behave correctly. This also
outlines the need for more unit tests, so that we can make bigger
changes and know we haven’t broken other things we are not aware of.
What you’ve done in c3b40888 is a recomputation of indicies inside paintEvent.

Inside the render event, when it is actually used and not the rest
of the time. I am well aware of that, why are you telling me this?

It seems to be very ineffective: if labels are enabled, indicies are calculated in
every paintEvent, not only when molecule changes, i.e. they are calculated more
often then before, opposite to commit message

I already said you could change the calculate group index stuff to
cache when it was valid. Then the call would basically be a no-op when
called. Building up a large molecule and recalculating that on the
addition of each and every atom addition/removal etc is also very
inefficient - especially when building large molecules.

In many instances the group index is never used, hence the change. I
think an approach like partial charges would work better.

Marcus

Konstantin_Tokarev · February 17, 2011, 5:42pm

17.02.2011, 20:36, “Marcus D. Hanwell” mhanwell@gmail.com:

I already said you could change the calculate group index stuff to
cache when it was valid. Then the call would basically be a no-op when
called. Building up a large molecule and recalculating that on the
addition of each and every atom addition/removal etc is also very
inefficient - especially when building large molecules.

In many instances the group index is never used, hence the change. I
think an approach like partial charges would work better.

I think the best option will be addition of variable d->invalidGroupIndices,
and recalculation on first access to Atom::groupIndex

17.02.2011, 19:05, “David Lonie” loniedavid@gmail.com:

From a design standpoint, what would be the best way to handle this? I
understand that it’s a bad idea to lock Molecule::lock from inside a
member function, but calculateGroupIndices should really be called
inside of a read-lock. Since this is a slot called from a signal, it
doesn’t seem possible to read-lock this in an obvious way.

Perhaps a if(!lock()->tryLockForRead()){return;} at the beginning of
the function would be best?

I guess you propose tryLockForRead() because lockForRead() will
deadlock if write lock is acquired in the same thread. However it would
be better if calculateGroupIndices waited for unlock in case lock was
acquired in another thread. Any ideas?

–
Regards,
Konstantin

allison.vacanti · February 17, 2011, 5:51pm

On Thu, Feb 17, 2011 at 12:42 PM, Konstantin Tokarev annulen@yandex.ru wrote:

17.02.2011, 19:05, “David Lonie” loniedavid@gmail.com:

From a design standpoint, what would be the best way to handle this? I
understand that it’s a bad idea to lock Molecule::lock from inside a
member function, but calculateGroupIndices should really be called
inside of a read-lock. Since this is a slot called from a signal, it
doesn’t seem possible to read-lock this in an obvious way.

Perhaps a if(!lock()->tryLockForRead()){return;} at the beginning of
the function would be best?

I guess you propose tryLockForRead() because lockForRead() will
deadlock if write lock is acquired in the same thread. However it would
be better if calculateGroupIndices waited for unlock in case lock was
acquired in another thread. Any ideas?

Right – Molecule::lock should only be locked from outside of the
Molecule member functions for consistency. One approach I’ve used in
similar situations is:

void memberSlot()
{
if (!memberLock()->tryLockForRead()) {
QTimer::singleShot(100, this, SLOT(memberSlot()));
return;
}

// … do work …

memberLock()->unlock();
}

But this will be horribly inefficient when dealing with large number
of additions/removals. It also behaves in an unexpected way when
calculateGroupIndices is called under a write-lock, since the index
assignment is delayed until the lock is released. I agree that a
caching/dirty-boolean approach is best, since a read-lock should be
used when accessing the indices anyway.

Dave

mhanwell · February 17, 2011, 6:14pm

On Thu, Feb 17, 2011 at 5:42 PM, Konstantin Tokarev annulen@yandex.ru wrote:

17.02.2011, 20:36, “Marcus D. Hanwell” mhanwell@gmail.com:

I already said you could change the calculate group index stuff to
cache when it was valid. Then the call would basically be a no-op when
called. Building up a large molecule and recalculating that on the
addition of each and every atom addition/removal etc is also very
inefficient - especially when building large molecules.

In many instances the group index is never used, hence the change. I
think an approach like partial charges would work better.

I think the best option will be addition of variable d->invalidGroupIndices,
and recalculation on first access to Atom::groupIndex

This is what I just suggested…

17.02.2011, 19:05, “David Lonie” loniedavid@gmail.com:

From a design standpoint, what would be the best way to handle this? I
understand that it’s a bad idea to lock Molecule::lock from inside a
member function, but calculateGroupIndices should really be called
inside of a read-lock. Since this is a slot called from a signal, it
doesn’t seem possible to read-lock this in an obvious way.

Perhaps a if(!lock()->tryLockForRead()){return;} at the beginning of
the function would be best?

I guess you propose tryLockForRead() because lockForRead() will
deadlock if write lock is acquired in the same thread. However it would
be better if calculateGroupIndices waited for unlock in case lock was
acquired in another thread. Any ideas?

It would be great to figure out what is changing the molecule
mid-render. That should not happen.

Marcus