Integrating PyRosetta initialization files into PyRosettaCluster (#511)
This PR adds the following features:
1. Adds an `output_init_file` instance attribute to `PyRosettaCluster`,
enabling a `.init` or `.init.bz2` file to be written upon instantiation.
2. Adds `author`/`email`/`license` instance attributes to
`PyRosettaCluster`, which are cached in the `.init` or `.init.bz2` file
as well as in the output decoy and scorefile metadata.
3. Enables the `input_file` keyword argument of the
`pyrosetta.distributed.cluster.reproduce` function to accept a `.init` or
`.init.bz2` file, which is used to initialize PyRosetta before the
simulation is reproduced.
   - Also adds a `skip_corrections` keyword argument to skip
     `ScoreFunction` corrections, so that the reproduced results can be
     used for successive reproductions.
4. Adds a `pyrosetta.distributed.cluster.export_init_file` function that
exports an output decoy (in `.pdb`, `.pdb.bz2`, `.b64_pose`,
`.b64_pose.bz2`, `.pkl_pose`, or `.pkl_pose.bz2` format) to a `.init` or
`.init.bz2` file.
5. Adds a `norm_init_options` instance attribute to `PyRosettaCluster`,
enabling normalization of each task's PyRosetta initialization options.
This optional convenience feature uses the `pyrosetta.get_init_options`
function to update the `options` and `extra_options` keyword arguments of
each task after PyRosetta is initialized in the `billiard` subprocess on
the Dask workers. Normalization expands option names and relativizes any
input file and directory paths to the current working directory of the
`billiard` subprocess. Relativized paths make it easier for a second
party to reproduce a simulation on a different filesystem.
6. Adds `pyrosetta.distributed.io.read_init_file` and
`pyrosetta.distributed.io.init_from_file` functions, which handle
`.init` and `.init.bz2` files.
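The path relativization described in item 5 can be illustrated with a minimal sketch. It assumes, for illustration only, that a task's options are a flat dictionary mapping option names to string values, and it relies on the standard library's `os.path.relpath` rather than on anything in PyRosetta. `relativize_option_paths` is a hypothetical helper, not the `norm_init_options` implementation, and it does not attempt the option-name expansion step.

```python
import os
from typing import Dict


def relativize_option_paths(options: Dict[str, str], cwd: str) -> Dict[str, str]:
    """Return a copy of `options` with absolute paths rewritten relative to `cwd`.

    Hypothetical helper: it only illustrates the idea of relativizing input
    files and directories; it is not PyRosettaCluster code.
    """
    normalized = {}
    for name, value in options.items():
        if isinstance(value, str) and os.path.isabs(value):
            # os.path.relpath is purely lexical; it never touches the filesystem
            normalized[name] = os.path.relpath(value, start=cwd)
        else:
            # Flag values and already-relative paths are kept unchanged
            normalized[name] = value
    return normalized


if __name__ == "__main__":
    task_options = {
        "-in:file:s": "/home/user/project/inputs/start.pdb",
        "-ex1": "true",
    }
    # On POSIX systems this prints:
    # {'-in:file:s': 'inputs/start.pdb', '-ex1': 'true'}
    print(relativize_option_paths(task_options, cwd="/home/user/project"))
```

Because the rewritten paths are relative to the working directory rather than to one machine's absolute layout, a second party only needs to recreate the same directory structure under their own working directory.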
Please note that this PR also depends on PR #462, which adds support for
`.b64_pose` and `.pkl_pose` file outputs in `PyRosettaCluster`. The
motivation for supporting a `.init` file in the `PyRosettaCluster`
simulation reproduction life cycle is that loading a `.b64_pose` or
`.pkl_pose` file into memory requires PyRosetta to be initialized with
the same residue type set that was used to save the `.b64_pose` or
`.pkl_pose` file (otherwise PyRosetta does not know how to reconstruct
the `Pose`, resulting in a segfault). In effect, a user is locked out of
`.b64_pose` and `.pkl_pose` files unless PyRosetta is initialized
correctly, which is easily accomplished by initializing PyRosetta with a
`.init` or `.init.bz2` file. Hence, if a user decides to output results
in `.b64_pose` or `.pkl_pose` format, the `.init` file can be used to
initialize PyRosetta identically and then load the `.b64_pose` or
`.pkl_pose` file into memory.
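The format distinction above can be sketched with a small check. This is a minimal sketch under stated assumptions: `requires_init_file` and `read_maybe_bz2` are hypothetical helpers, not the `pyrosetta.distributed.io.read_init_file` or `init_from_file` API. The sketch assumes only that a `.bz2` suffix denotes bz2 compression and makes no assumption about the contents of a `.init` file. It flags the serialized `Pose` formats that, per the paragraph above, cannot be loaded without a matching PyRosetta initialization.

```python
import bz2
from pathlib import Path

# Serialized Pose formats that can only be loaded after PyRosetta is
# initialized with the same residue type set used to save them.
POSE_FORMATS_NEEDING_INIT = (".b64_pose", ".pkl_pose")


def requires_init_file(decoy_path: str) -> bool:
    """Return True if the decoy is a serialized Pose (optionally bz2-compressed)."""
    name = Path(decoy_path).name
    if name.endswith(".bz2"):
        # Strip the compression suffix before checking the underlying format
        name = name[: -len(".bz2")]
    return name.endswith(POSE_FORMATS_NEEDING_INIT)


def read_maybe_bz2(path: str) -> bytes:
    """Read raw bytes from `path`, transparently decompressing a `.bz2` file."""
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rb") as handle:
        return handle.read()


if __name__ == "__main__":
    for decoy in ("decoy.pdb.bz2", "decoy.b64_pose", "decoy.pkl_pose.bz2"):
        print(decoy, requires_init_file(decoy))
```

Checking the suffix after stripping `.bz2` keeps compression orthogonal to format, mirroring how the PR pairs each output format with an optional `.bz2` variant.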
---------
Co-authored-by: Rachel Clune <rachel.clune@omsf.io>