Minor change to PackedPose.scores dictionary construction (#612)
Currently in PyRosetta, instantiating a `PackedPose` object triggers
deserialization of the `Pose.cache` dictionary (due to the line
`self.scores = dict(pose_or_pack.cache)`). This is usually fine for
single Python processes where users have already added secure packages
to the unpickle-allowed list (i.e., via the `pyrosetta.secure_unpickle`
module); however, `PyRosettaCluster` spawns new threads using the
`billiard` package on Dask workers, where binary blobs are deserialized
into instantiated `PackedPose` objects, triggering deserialization of
the `Pose.cache` dictionary before users have had a chance to add secure
packages to the unpickle-allowed list in the `billiard` subprocess. This
can raise `UnpickleSecurityError` exceptions if users cache arbitrary
Python types in the `Pose.cache` dictionary, even when users add secure
packages in their PyRosetta protocols, since `PackedPose` instantiation
happens before the user-provided PyRosetta protocols run.
In order to remedy this, this PR updates the `PackedPose` class to
instantiate an empty dictionary on the `PackedPose.scores` attribute,
rather than a detached, flattened copy of the `Pose.cache` dictionary.
Not only does this free up duplicated memory usage from all of the
`Pose.cache` scores copied onto the `PackedPose` object, but I think it
improves the `PackedPose` API intention a bit: the `PackedPose` class
already implements the `PackedPose.update_scores` method, which is
specifically designed to embed new scores into the `Pose` object; and
because the `PackedPose.scores` attribute is currently a detached,
flattened copy of the `Pose.cache` dictionary, this can cause confusion
about what are the ground truth scores in the `PackedPose` object.
Regarding `PyRosettaCluster`, by preventing deserialization of the
`Pose.cache` dictionary upon `PackedPose` instantiation, users can now
programmatically add secure packages in their PyRosetta protocols per
Python process (as the `pyrosetta.secure_unpickle` module was designed),
and the underlying `PyRosettaCluster` infrastructure does not
unintentionally deserialize the `Pose.cache` dictionary just to compress
`PackedPose` objects in transit between the client and Dask workers.
Nonetheless, this minor change has the potential to break scripts in the
wild. Anyone's scripts that call `PackedPose.scores` to access score
values (that were originally embedded in the `Pose` object) now needs to
call `PackedPose.pose.cache` (or the deprecated
`PackedPose.pose.scores`). However, the `PackedPose.scores` attribute
still functions the same way as before as a detached dictionary from the
`Pose.cache` infrastructure, but herein this PR it is just not
pre-populated with the `Pose.cache` dictionary upon `PackedPose`
instantiation; users adding scores to the `PackedPose.scores` dictionary
can still access them there (even though the preferred syntax continues
to use the `PackedPose.update_scores` method).
---------
Co-authored-by: Sergey Lyskov <3302736+lyskov@users.noreply.github.com>