Add support for XSF charge density files

I’ve been using Quantum ESPRESSO quite a lot recently and have grown accustomed to printing out charge densities from it to XSF format. My primary use was reading in the charge densities and performing calculations with them, but I also would really like to be able to visualize those densities directly from the XSF file.

Currently as it stands I have to run another instance of pp.x from QE to print out a Gaussian Cube file, but these files are not quite the same since they do not include the unit cells from the calculation (and per my latest post, it is currently not possible to put one in after the fact). Realistically the format of a Gaussian Cube and an XSF from QE are almost identical, with the key difference being that the Gaussian Cube format doesn’t explicitly print a unit cell.

Reading in an XSF file is extremely similar to a Gaussian Cube, but there are two big differences, first being that while a Cube uses 6E13.5 for its output format, XSF files use 6E14.6 for the output format. From taking a look at the Avogadro source code though, I don’t actually think that the files are read in in byte strings and instead are just read in as lines and then split and converted.

The second difference is that an XSF file uses the full charge density and not the periodic charge density, which would need to be deleted to regain periodicity. Here’s a snippet of some Python code that I wrote that reads XSF files quite quickly (a fraction of a second slower than an equivalent in C++) and accounts for the periodicity:

def read_chg_dens_xsf(xsf_file: PathLike, crystal: Crystal):
    with open(xsf_file, "rb") as xsf:
        # Skip over atomic coordinate information
        for _ in range(7 + crystal.get_num_atoms() + 3):
            next(xsf)

        # Pull size of FFT Grid
        num_grid = [int(i) for i in next(xsf).strip().split()]

        # Assign to variables to reduce indexing in loop
        n_grid_0 = num_grid[0]
        n_grid_1 = num_grid[1]
        n_grid_2 = num_grid[2]

        # Skip over misc. XSF data
        for _ in range(4):
            next(xsf)

        # Calculate total number of datapoints in XSF file
        n_points = n_grid_0 * n_grid_1 * n_grid_2

        # Initialize indices
        i = 0
        j = 0
        k = 0

        # Allocate memory for charge density array
        charge_density = np.zeros((n_grid_0, n_grid_1, n_grid_2))

        # Loop over all data points in file
        for index in range(n_points):
            # Read in bytes for each data point, adding an extra if there is a newline character
            charge_density[i][j][k] = xsf.read(14 + ((index + 1) % 6 == 0))

            # Increment grid components to fill out array
            i += 1
            if i == n_grid_0:
                i = 0
                j += 1

                if j == n_grid_1:
                    j = 0
                    k += 1

    # Convert to periodic array
    charge_density = np.delete(charge_density, [n_grid_0 - 1], axis=0)
    charge_density = np.delete(charge_density, [n_grid_1 - 1], axis=1)
    charge_density = np.delete(charge_density, [n_grid_2 - 1], axis=2)

    return charge_density, num_grid

This code assumes that you’ve already parsed the header and stored the crystal vectors and the atomic coordinates.

The other differences of an XSF file are in the header, an example of which is included further down. There’s also two lines at the very end of the files from QE that say

END_DATAGRID_3D
END_BLOCK_DATAGRID_3D

Now, technically speaking you can determine the size of the unit cell from a Gaussian Cube within some precision limits by just taking the size of the grid (in the case of a Quantum ESPRESSO output this is the periodic FFT grid) and multiplying it by the vectors immediately after them, then converting units appropriately. For example, I have an XSF file from a calculation on a water molecule with a cubic unit cell of length 17.543876553 Angstrom. The header of the XSF file is

 CRYSTAL
 PRIMVEC
   17.543876553    0.000000000    0.000000000
    0.000000000   17.543876553    0.000000000
    0.000000000    0.000000000   17.543876553
 PRIMCOORD
           3           1
O         8.770451552    9.174907782    8.771938274
H         9.544851428    8.573414302    8.771938274
H         8.000511848    8.567492748    8.771938274
BEGIN_BLOCK_DATAGRID_3D
3D_PWSCF
BEGIN_DATAGRID_3D_UNKNOWN
         217         217         217
  0.000000  0.000000  0.000000
   17.543877    0.000000    0.000000
    0.000000   17.543877    0.000000
    0.000000    0.000000   17.543877

and the header of a Gaussian Cube file created from the exact same QE calculation is

 Cubefile created from PWScf calculation
Contains the selected quantity on a FFT grid
    3    0.000000    0.000000    0.000000
  216    0.153487    0.000000    0.000000
  216    0.000000    0.153487    0.000000
  216    0.000000    0.000000    0.153487
    8    0.000000   16.573751   17.338063   16.576561
    1    1.000000   18.037155   16.201405   16.576561
    1    1.000000   15.118776   16.190215   16.576561

which when we take 216 * 0.153487 * 0.529177249 we get 17.5439149, which is pretty close to the actual cell size of 17.543876553 Angstrom.

While there’s already a feature request for this: Support for XSF quantum / volumetric format · Issue #560 · OpenChemistry/avogadrolibs · GitHub

I’d suggest that an initial effort can be done with the Python you have. We support Python format readers. The basic idea is to read a format like XSF and recast into a format we already support (e.g., XYZ, cube, fchk, SDF, CJSON, etc.).

So I can think of two quick ways to do this:

  • cjson should already support saving cube information plus unit cells, so write a script that writes one out with the coordinates, unit cell, etc. including a cube
  • add support to the format script API that allows a script to return a format (cube in this case) plus additional cjson or unit cell information to merge together.

Interestingly when I try to save the .cube file I have as a cjson it doesn’t include any newline characters or spacing.

In the event that whitespace is completely ignored when reading a cjson, then there is the additional problem that after I try to load the cjson back into Avogadro there is no way for me to render the surface that is stored inside of it.

If you’d like to try and replicate this on your end here’s a relatively small cube file that I have.
mol_cube.tar.gz (6.3 MB)

Yeah, I just saw there’s a bug in the cube components in the cjson read/write. I don’t have a lot of time to fix it this afternoon, but I’ll get to it soon.

The data should look something like this:

cjson['cube'] = {
    "origin" = [ x_minimum, y_minimum, z_minimum ],
    "spacing" = [ x_spacing, y_spacing, z_spacing ],
    "dimensions" = [ num_x_points, num_y_points, num_z_points ],
    "scalars" = [ … ]
}

Okay, so that should be ready to merge soon.

I guess the question is whether you’d rather have a format script that hands back a cjson with the coordinates and crystal vectors plus the cube, or whether you’d rather hand back a cube file and separately set the crystal vectors.

Which would be easier (better?) for you?

I think they’re both good options, but if I had to pick it’d be a script that returns a cjson with the coordinates & crystal vectors & cube information.

Once I get a good grasp of the correct file structure then I can quite quickly write a Python script that goes straight from .xsf to .cjson since the .xsf contains all of the information I’d need (and while I do already have a script for reading in a charge density from a cube I’m a bit more confident in my xsf script).

2 Likes

I seem to have a partially functioning xsf -> cjson converter, however Avogadro 2 seems to not be recognizing the cube section inside it for an unknown reason. Here’s an example I constructed and the xsf that it was derived from. If need be I can also post the code, but it’s currently messy and just sitting in a python notebook. The structure is read without issues, as is the unit cell information, but there’s no option for me to try and render a surface.

vac_2.00_dcd.xsf.tar.gz (3.1 MB)
test_cjson.cjson (9.1 MB)

Looks okay to me - do you have the surfaces display type turned on when you load?

I think though that the data is rotated:

BTW, I thought I already added code that enabled the surfaces display if you loaded a cube, but I’ll go back and look.

I do indeed have surfaces turned on, but there’s not even an option for me to try and create the surface

Also, I suspected the data might’ve been rotated, that’s an artifact of the method I’m using to convert the xsf charge density from printing the full unit cell to just printing the periodic unit cell.

Interestingly enough, and this is probably relevant for our discussion about the Windows installer, when I use the Qt6 version of Avogadro 2 it loads fine.

I have definitely fixed some “load cube from file” bugs recently, so that’s probably what you’re seeing. The current development head should definitely load the surface.

As to the crash, it looks like this:

BTW, if at some point you’ve got some code cleaned up enough to share, I’d love to add another real-world Python format script to the plugin list. :rocket:

1 Like

I usually work on scripts in a python notebook, so currently the implementation doesn’t have any actual if __name__ == "__main__": bit to be called, but I’m happy to provide the functions it uses.

xsf_to_cjson.py (10.7 KB)

It uses the Atom and Crystal class I have for my own data analysis and general handling of crystal structures, but I am sure that the internal code is easily adapted to work without them. Also, shoutout to @matterhorn103 for input on writing cjson, as well as some code from his easyxtb package that I’ve used to make sure the cjson output is compact but still readable.

The only part of this code that I would caution against changing is the function read_xsf_chg_dens as I’ve spent many many days and weeks in the past making sure it is very optimized. Currently it’s only a fraction of a second slower than the fastest C++ implementation I’ve found, at least on an XSF file with a dimension 244x244x244 charge density.

1 Like

Not an issue since the classes are a) simple and b) defined in the script :slight_smile: