Storing data with OpenPMD

The current MALA default to save volumetric data are numpy objects/files. However, numpy files do not store metadata, which is crucial when attempting to build large-scale data-driven workflows. Furthermore, they are inherently non-parallel and do not offer extensive compression capabilities.

To this end, MALA now supports the openPMD standard. OpenPMD is a powerful standard/library that allows for the efficient storage of volumetric data alongside relevant metadata, and further offers capabilities for parallelization and a declarative runtime configuration for compression options.

Currently, openPMD is tested by the MALA team in production and therefore not the default option for data handling. Yet, MALA is fully compatible with openPMD, and its use is highly encouraged. To do so, just replace the .npy file ending with a openPMD compliant file ending (e.g. .h5) in all instances of the DataConverter, DataHandler and DataShuffler class, and specify openpmd where necessary; the workflows themselves can be left untouched. Specifically, set

parameters = mala.Parameters()

# Changes for DataConverter
data_converter = mala.DataConverter(parameters)
data_converter.convert_snapshots(...,
                                 naming_scheme="Be_snapshot*.h5")
...
# Changes for DataHandler
data_handler = mala.DataHandler(parameters)
data_handler.add_snapshot("Be_snapshot0.in.h5", data_path_be,
                           "Be_snapshot0.out.h5", data_path_be, "tr",
                           snapshot_type="openpmd")
...
# Changes for DataShuffler
data_shuffler = mala.DataShuffler(parameters)
# Data can be shuffle FROM and TO openPMD - but also from
# numpy to openPMD.
data_shuffler.add_snapshot("Be_snapshot0.in.h5", data_path_be,
                            "Be_snapshot0.out.h5", data_path_be,
                            snapshot_type="openpmd")
data_shuffler.shuffle_snapshots(...,
                                save_name="Be_shuffled*.h5")

For further information on the interaction with openPMD data, please consult the official documentation. As a user of MALA, you will be mainly interested in the scientific tooling that can read openPMD, e.g.:

visualization and analysis, including an exploratory Jupyter notebook GUI: openPMD-viewer
yt-project (tutorial)
ParaView has a Python-based openPMD plugin that can be activated by opening a helper text file ending on .pmd that contains one line with the openPMD-api Series filename, e.g. data_%T.bp
VisIt
converter tools: openPMD-converter
full list of projects using openPMD

If you intend to write your own post-processing routines, make sure to check out our example files and the formal, open standard on openPMD.