There are two scripts that were used extensively in this work.

The first uses the CSD Python API to obtain information about all molecules within the primitive cell of a given CIF file, ignoring molecules with fewer atoms than a minimum size if one is specified. This script takes an entire directory of CIF files and processes them in one session, because sequentially applying a script that processes a single CIF file is very time consuming, presumably because the CSD must then be accessed many times rather than only once. I also provide a yml file describing the Python environment needed to use this script (csd_env.yml); note that you will need access to the CSD to create this environment. Guidance for building a conda environment that uses the CSD can be found at https://downloads.ccdc.cam.ac.uk/documentation/API/installation_notes.html. The data are saved as a set of npy files that can be used in the next stage.
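A minimal sketch of the per-cell summary and the one-session directory loop is given below. It does not use the CSD API: the hypothetical `read_molecules` callable stands in for the CSD step and is assumed to return one (n_atoms, 3) array of Cartesian coordinates per molecule in the primitive cell. The dictionary keys (`coordinates`, `centroids`) and all function names are also hypothetical and may not match those used by the provided script.

```python
from pathlib import Path

import numpy as np


def summarise_cell(molecules, min_atoms=0):
    """Drop molecules with fewer than `min_atoms` atoms and summarise the rest.

    `molecules`: list of (n_atoms, 3) Cartesian coordinate arrays (hypothetical format).
    """
    kept = [np.asarray(m, dtype=float) for m in molecules if len(m) >= min_atoms]
    return {
        "coordinates": kept,  # hypothetical key
        "centroids": np.array([m.mean(axis=0) for m in kept]).reshape(-1, 3),  # hypothetical key
    }


def save_cell(summary, out_dir, cif_name):
    """Save one summary dict as <cif stem>.npy inside out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / (Path(cif_name).stem + ".npy")
    # np.save stores the dict as a pickled 0-d object array
    np.save(out_path, summary, allow_pickle=True)
    return out_path


def process_directory(cif_dir, out_dir, read_molecules, min_atoms=0):
    """Process every .cif file in cif_dir within one Python session."""
    for cif_path in sorted(Path(cif_dir).glob("*.cif")):
        molecules = read_molecules(cif_path)  # hypothetical stand-in for the CSD API step
        save_cell(summarise_cell(molecules, min_atoms), out_dir, cif_path.name)
```

Because a dict saved with np.save is stored as a pickled 0-d object array, it has to be read back with `np.load(path, allow_pickle=True).item()`.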

The Python libraries needed for all other scripts are listed in ph_env.yml.

The second script takes a single npy file corresponding to the molecules in a primitive cell and generates a fixed number of points and/or vectors corresponding to, for example, atomic or molecular positions and orientation vectors, as described in the thesis. There are many options available in this script (including the restricted supercell approach); I have used an argument parser, which should make it relatively easy to work out how to obtain the desired point cloud. The output is another npy file. Although these are saved under a different name, I recommend keeping them in a different directory from the npy files that correspond to primitive cells, as the names are long and often confusing, and separating them makes processing and storing the data easier.
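The sketch below illustrates the kind of operations involved in building such point clouds; it is not the second script itself. It assumes Cartesian coordinates and a lattice given as a (3, 3) array whose rows are the cell vectors, and it uses a molecule's longest principal axis as one possible orientation vector (the thesis may define orientation differently). Plain tiling followed by keeping the k points nearest a centre is shown as one simple way to obtain a fixed number of points; the restricted supercell approach in the script may well differ. All function names are hypothetical.

```python
import itertools

import numpy as np


def tile_supercell(points, lattice, n=1):
    """Replicate Cartesian points over the (2n+1)^3 cells around the original one.

    lattice: (3, 3) array whose rows are the cell vectors a, b and c.
    """
    points = np.asarray(points, dtype=float)
    shifts = np.array(list(itertools.product(range(-n, n + 1), repeat=3)), dtype=float)
    translations = shifts @ np.asarray(lattice, dtype=float)  # (k, 3) Cartesian shifts
    # Broadcast (1, m, 3) + (k, 1, 3) -> (k, m, 3), then flatten to (k * m, 3)
    return (points[None, :, :] + translations[:, None, :]).reshape(-1, 3)


def nearest_points(points, k, centre=None):
    """Keep the k points closest to `centre` (default: the mean of `points`)."""
    points = np.asarray(points, dtype=float)
    centre = points.mean(axis=0) if centre is None else np.asarray(centre, dtype=float)
    order = np.argsort(np.linalg.norm(points - centre, axis=1), kind="stable")
    return points[order[:k]]


def principal_axis(coords):
    """Unit vector along a molecule's longest principal axis.

    The sign returned by the SVD is arbitrary, so v and -v describe the same axis.
    """
    coords = np.asarray(coords, dtype=float)
    centred = coords - coords.mean(axis=0)
    _, _, vt = np.linalg.svd(centred)
    return vt[0]
```

For instance, `nearest_points(tile_supercell(cents, lattice, n=1), 64)` would return the 64 tiled centroids closest to their mean, giving every structure a point cloud of the same size.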

The persistent homology calculations themselves are quite easy to run, and I found that running them in a Jupyter notebook was the most straightforward approach. The npy files can be loaded in a Jupyter session (each gives a dictionary with keys describing the centroids and vectors, so it should be easy to get what you need), processed (for example, you might want to scale the vectors) and applied directly as input to a persistent homology calculation. I use the gudhi library for this, but other libraries such as ripser or perseus should give similar results. The code ends up being very simple, for example:

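```python
import gudhi as gd

# cents: list of (n_points, 3) NumPy arrays, one centroid point cloud per structure
diags = []
for struc in cents:
    # gudhi's RipsComplex takes its inputs as keyword arguments
    rips = gd.RipsComplex(points=struc)
    # max_dimension=1 keeps vertices and edges, which is all that H0 needs
    simp = rips.create_simplex_tree(max_dimension=1)
    diags.append(simp.persistence())
```

This generates a list of persistence diagrams from a list of centroid point clouds (cents) using the Vietoris-Rips filtration, restricted to 0-dimensional homology. The simplex dimension is capped at one rather than zero: with only vertices in the complex, every point would give an infinite bar, whereas the edges are what merge components (see https://gudhi.inria.fr/python/latest/rips_complex_user.html). Converting the diagrams to vector representations such as persistence images is also straightforward and uses functions from the same library; see https://gudhi.inria.fr/python/latest/representations.html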
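The loading and preprocessing step described earlier might look like the following sketch. The key names `centroids` and `vectors`, the uniform scale factor and the function name are assumptions; printing `data.keys()` for one of your own files will show the actual keys.

```python
from pathlib import Path

import numpy as np


def load_pointclouds(folder, centroid_key="centroids", vector_key="vectors", scale=1.0):
    """Load every .npy dict in `folder`; return lists of centroid and scaled vector arrays."""
    cents, vecs = [], []
    for path in sorted(Path(folder).glob("*.npy")):
        # A dict saved with np.save comes back as a 0-d object array; .item() unwraps it
        data = np.load(path, allow_pickle=True).item()
        cents.append(np.asarray(data[centroid_key], dtype=float))
        vecs.append(scale * np.asarray(data[vector_key], dtype=float))
    return cents, vecs
```

The returned `cents` list has the form expected by the Rips loop above.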

Sometimes when the alpha complex is used, the calculation can be too time consuming or memory intensive to complete in a Jupyter session, because no maximal dimension can be specified. In this case, rather than turning the above into a script and running it elsewhere, I recommend using one of gudhi's built-in persistent homology utilities, which are much faster. More information can be found at https://gudhi.inria.fr/alphacomplex/ (the utilities are installed with the library binaries and can be called from the command line).
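If you go the command-line route, the point cloud first has to be written to a file. As far as I know, gudhi's command-line utilities read point clouds in the OFF format (a header line `OFF`, a line with the vertex, face and edge counts, then one point per line), but please check the utility's `--help` output for the exact program name, input format and options. A minimal writer for 3-D points, with a hypothetical function name, is sketched below.

```python
import numpy as np


def write_off_points(points, path):
    """Write an (n, 3) point cloud as an OFF file with no faces or edges."""
    points = np.asarray(points, dtype=float)
    if points.ndim != 2 or points.shape[1] != 3:
        raise ValueError("expected an (n, 3) array of points")
    with open(path, "w") as fh:
        fh.write("OFF\n")
        fh.write(f"{len(points)} 0 0\n")  # vertex, face and edge counts
        np.savetxt(fh, points, fmt="%.10g")  # one "x y z" line per point
```

If you would rather stay in Python, `gd.AlphaComplex(points=pts).create_simplex_tree(max_alpha_square=...)` caps the (squared-radius) filtration value, which can also cut the cost considerably, at the price of truncating the diagrams at that value.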


I also provide the script used to generate persistence diagrams from the KDE surface of a set of atoms in an xyz file (which could easily be obtained using the above two scripts).
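The provided KDE script is not reproduced here; the sketch below only illustrates the general idea under stated assumptions. It reads a standard xyz file (atom count, comment line, then `symbol x y z` rows) and evaluates an unnormalised sum of isotropic Gaussians, one per atom, on a regular grid. The bandwidth, grid spacing and padding values are illustrative defaults rather than those used in the thesis, and the function names are hypothetical.

```python
import numpy as np


def read_xyz(path):
    """Return (symbols, (n, 3) coordinate array) from a single-frame xyz file."""
    with open(path) as fh:
        n_atoms = int(fh.readline())
        fh.readline()  # skip the comment line
        rows = [fh.readline().split() for _ in range(n_atoms)]
    symbols = [r[0] for r in rows]
    coords = np.array([[float(x) for x in r[1:4]] for r in rows])
    return symbols, coords


def kde_grid(coords, bandwidth=0.5, spacing=0.25, padding=2.0):
    """Unnormalised sum of isotropic Gaussians (one per atom) on a regular 3-D grid."""
    lo = coords.min(axis=0) - padding
    hi = coords.max(axis=0) + padding
    axes = [np.arange(a, b + spacing, spacing) for a, b in zip(lo, hi)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    grid = np.stack([gx, gy, gz], axis=-1)  # (nx, ny, nz, 3) grid-point coordinates
    density = np.zeros(gx.shape)
    for atom in coords:
        sq_dist = np.sum((grid - atom) ** 2, axis=-1)
        density += np.exp(-sq_dist / (2 * bandwidth**2))
    return density, axes
```

One way to turn such a grid into persistence diagrams is a cubical complex, e.g. `gd.CubicalComplex(top_dimensional_cells=-density).persistence()`, where negating the density gives a superlevel-set filtration; this is an assumption and not necessarily what the provided script does.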




  
