Storing data with OpenPMD
By default, MALA currently saves volumetric data as numpy files.
However, numpy files do not store metadata, which is crucial when
building large-scale data-driven workflows. Furthermore, they are
inherently non-parallel and do not offer extensive compression capabilities.
To address these limitations, MALA now supports the openPMD standard. OpenPMD is a powerful standard/library that allows for the efficient storage of volumetric data alongside relevant metadata. It further offers capabilities for parallelization and a declarative runtime configuration for compression options.
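To give an impression of what such a declarative configuration can look like, here is a minimal sketch that builds a JSON options string with the Python standard library. The key layout (`adios2`, `dataset`, `operators`) and the `zlib` operator with its `clevel` parameter reflect the backend-specific JSON options of openPMD-api as we understand them; treat them as assumptions and check them against the openPMD-api documentation for your version. These particular options concern the ADIOS2 backend (`.bp` files), not HDF5, and how the string is handed to MALA is not covered here.

```python
import json

# Assumed layout of openPMD-api's backend-specific JSON options.
# "zlib" and "clevel" are illustrative choices, not MALA defaults.
compression_options = {
    "adios2": {
        "dataset": {
            "operators": [
                {"type": "zlib", "parameters": {"clevel": 9}},
            ]
        }
    }
}

# openPMD-api accepts such options as a JSON string.
options_string = json.dumps(compression_options)
print(options_string)
```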
Currently, openPMD is still being tested in production by the MALA team and
is therefore not yet the default option for data handling. Nevertheless, MALA
is fully compatible with openPMD, and its use is highly encouraged. To use it,
replace the .npy file ending with an openPMD-compliant file ending (e.g. .h5)
in all instances of the DataConverter, DataHandler and DataShuffler
classes, and specify openpmd as the snapshot type where necessary; the
workflows themselves can be left untouched. Specifically, set
    import mala

    parameters = mala.Parameters()

    # Changes for DataConverter
    data_converter = mala.DataConverter(parameters)
    data_converter.convert_snapshots(..., naming_scheme="Be_snapshot*.h5")
    ...

    # Changes for DataHandler
    data_handler = mala.DataHandler(parameters)
    data_handler.add_snapshot("Be_snapshot0.in.h5", data_path_be,
                              "Be_snapshot0.out.h5", data_path_be, "tr",
                              snapshot_type="openpmd")
    ...

    # Changes for DataShuffler
    data_shuffler = mala.DataShuffler(parameters)
    # Data can be shuffled FROM and TO openPMD - but also from
    # numpy to openPMD.
    data_shuffler.add_snapshot("Be_snapshot0.in.h5", data_path_be,
                               "Be_snapshot0.out.h5", data_path_be,
                               snapshot_type="openpmd")
    data_shuffler.shuffle_snapshots(..., save_name="Be_shuffled*.h5")
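When an existing workflow contains many hard-coded .npy names, the renaming step can be scripted. The following is a minimal sketch using only the standard library; `to_openpmd_name` is a hypothetical helper for illustration and not part of MALA.

```python
from pathlib import PurePath


def to_openpmd_name(file_name, ending=".h5"):
    """Swap a trailing .npy for an openPMD-compliant ending.

    Hypothetical helper, not part of MALA. Names that do not end
    in .npy are returned unchanged.
    """
    path = PurePath(file_name)
    if path.suffix != ".npy":
        return file_name
    # with_suffix only replaces the last suffix, so ".in" / ".out" survive.
    return str(path.with_suffix(ending))


print(to_openpmd_name("Be_snapshot0.in.npy"))
print(to_openpmd_name("Be_snapshot*.npy"))
```

The same function works for single file names and for naming schemes that contain a wildcard, since only the final suffix is touched.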
For further information on the interaction with openPMD data, please consult the official documentation. As a user of MALA, you will be mainly interested in the scientific tooling that can read openPMD, e.g.:
- visualization and analysis, including an exploratory Jupyter notebook GUI: openPMD-viewer 
- ParaView has a Python-based openPMD plugin that can be activated by opening a helper text file ending in .pmd that contains one line with the openPMD-api Series filename, e.g. data_%T.bp
- converter tools: openPMD-converter 
- full list of projects using openPMD 
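The ParaView helper file mentioned in the list above is a one-line text file, so it can be written by hand or generated. A minimal sketch follows; `write_pmd_helper` and the file name `series.pmd` are hypothetical and chosen only for illustration, while `data_%T.bp` is the example pattern from the list.

```python
from pathlib import Path


def write_pmd_helper(series_pattern, helper_path):
    """Write a one-line .pmd helper file for the ParaView openPMD plugin.

    Hypothetical helper, not part of MALA or ParaView. The file holds
    a single line: the openPMD-api Series filename pattern.
    """
    helper_path = Path(helper_path)
    if helper_path.suffix != ".pmd":
        raise ValueError("helper file name must end in .pmd")
    helper_path.write_text(series_pattern + "\n")
    return helper_path


if __name__ == "__main__":
    # Writes series.pmd into the current working directory.
    write_pmd_helper("data_%T.bp", "series.pmd")
```

Opening the resulting .pmd file in ParaView should then load the series it points to, provided the openPMD plugin is available in your ParaView build.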
If you intend to write your own post-processing routines, make sure to check out our example files and the formal, open standard on openPMD.
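As a starting point for such a routine, the sketch below opens an openPMD series read-only with openPMD-api and collects mesh-level metadata for each iteration. It assumes the `openpmd_api` Python package is installed; `summarize_meshes` is a hypothetical name, and which meshes a MALA-written file contains is not assumed here, so the function simply reports whatever it finds.

```python
def summarize_meshes(series_path):
    """Collect axis labels and grid spacing for every mesh in a series.

    Hypothetical helper, not part of MALA. Requires openPMD-api.
    Returns a dict keyed by (iteration index, mesh name).
    """
    # Imported lazily so the module can be loaded without openPMD-api.
    import openpmd_api as io

    series = io.Series(series_path, io.Access.read_only)
    summary = {}
    for index in series.iterations:
        iteration = series.iterations[index]
        for mesh_name in iteration.meshes:
            mesh = iteration.meshes[mesh_name]
            summary[(index, mesh_name)] = {
                "axis_labels": list(mesh.axis_labels),
                "grid_spacing": list(mesh.grid_spacing),
            }
    return summary


if __name__ == "__main__":
    # File name taken from the MALA example above; adjust the path as needed.
    for key, info in summarize_meshes("Be_snapshot0.out.h5").items():
        print(key, info)
```

Loading the actual array data additionally requires requesting a chunk from a record component and flushing the series; see the openPMD-api documentation for the details of that step.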