Merge pull request #1875 from RosettaCommons/roccomoretti/pose_rts
Enable Storing of ResidueTypes in the Pose
One of the slightly evil things we've been doing is allowing the global ResidueTypeSets to be modified at runtime - that is, you could add ResidueTypes to, or delete them from, the main RTSs themselves. This becomes a real problem as we start to work on thread-safety and on serializing poses to send them to other computers. Not only could you delete a RT that another thread is relying on, but if you've added a RT to the global RTS, the remote computer which you're sending the pose to may or may not have that RT ... oops.
To solve this, I've made the global RTSs non-writable, except for initialization, and then put in a facility to attach RTSs to the Pose (or rather the Conformation). To enable this, there have been slight-but-substantial changes required in the RT/RTS/ChemicalManager interface.
Instead of identifying RTSs by name ("fa_standard"/"centroid"), the preferred method is now by a TypeSetMode enum from core/chemical/ChemicalManager.fwd.hh. Also, because the Pose (Conformation) can have a ResidueTypeSet which is different from the global RTS, the preferred method of accessing an RTS is by using the new Pose::residue_type_set_from_pose() or Conformation::residue_type_set_from_conf() methods. These will give you the specialized RTS for the pose, or if it's not been specialized, will fall back to the global RTS of the appropriate mode. You can either call these with no arguments, in which case they will do the equivalent of looking at is_fullatom()/is_centroid(), or you can explicitly specify the TypeSetMode you're interested in.
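As a rough illustration of the fallback behavior described above, here is a toy sketch. It is not the Rosetta code: ToyMode, ToyTypeSet, ToyPose and toy_global_set() are all made-up stand-ins, and the real accessors are the Pose/Conformation methods named above.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy stand-ins only; none of these are the real Rosetta declarations.
enum class ToyMode { FullAtom, Centroid };

struct ToyTypeSet {
    std::string label;
};
using ToyTypeSetCOP = std::shared_ptr<const ToyTypeSet>;

// Stand-in for the read-only global sets, one per mode.
inline ToyTypeSetCOP toy_global_set(ToyMode mode) {
    static const ToyTypeSetCOP full_atom =
        std::make_shared<ToyTypeSet>(ToyTypeSet{"global full-atom"});
    static const ToyTypeSetCOP centroid =
        std::make_shared<ToyTypeSet>(ToyTypeSet{"global centroid"});
    return mode == ToyMode::FullAtom ? full_atom : centroid;
}

class ToyPose {
public:
    explicit ToyPose(ToyMode mode) : mode_(mode) {}

    // Attach a pose-specific set for one mode.
    void set_custom(ToyMode mode, ToyTypeSetCOP set) { custom_[mode] = std::move(set); }

    // No-argument form: use the pose's own mode.
    ToyTypeSetCOP type_set() const { return type_set(mode_); }

    // Explicit form: pose-specific set if one exists, else the global set.
    ToyTypeSetCOP type_set(ToyMode mode) const {
        auto it = custom_.find(mode);
        if (it != custom_.end()) return it->second;
        return toy_global_set(mode);
    }

private:
    ToyMode mode_;
    std::map<ToyMode, ToyTypeSetCOP> custom_;
};
```

The point of the sketch is only that callers ask the pose, and the pose decides whether a specialized set or the global one comes back.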
To enable this, ResidueTypeSet has been made a virtual base class, with two derived sub-classes: GlobalResidueTypeSet and PoseResidueTypeSet. GlobalResidueTypeSet is effectively the same RTS we're familiar with, just with all the methods which allow modification made private/protected. Once a GlobalResidueTypeSet has been initialized from the database, it's fixed. (Well, except for the whole lazy loading thing.)
PoseResidueTypeSet is the modifiable RTS which is stored in the Pose/Conformation. It's set up to be a layer on top of another RTS. That is, a PoseRTS will have its own RTs/patches which it will add "on top of" the underlying RTS. So if you have a full atom PoseRTS, you should typically have full access to the full atom GlobalRTS through it, without having to do any additional code.
The difference between the two should be irrelevant, unless you're actually attempting to modify a PoseResidueTypeSet - the major accessors should return a (base class) ResidueTypeSetCOP, and all the relevant methods should be accessible from the base class interface. In fact, Pose::residue_type_set_from_pose() will return a pointer to either a PoseRTS or a GlobalRTS, depending on if the Pose has a custom RTS or not.
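To make the layering idea concrete, here is a minimal sketch under heavy simplification. The Toy* classes are hypothetical and track bare names rather than ResidueTypes or patches; the only thing they are meant to show is a fixed set, a modifiable layer on top of it, and lookups going through a shared base-class interface.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Toy stand-ins; the real classes hold ResidueTypes, not bare names.
class ToySet {
public:
    virtual ~ToySet() = default;
    virtual bool has_name(const std::string& name) const = 0;
};
using ToySetCOP = std::shared_ptr<const ToySet>;

// Fixed after construction: there is no public way to add names.
class ToyGlobalSet : public ToySet {
public:
    explicit ToyGlobalSet(std::set<std::string> names) : names_(std::move(names)) {}
    bool has_name(const std::string& name) const override { return names_.count(name) > 0; }

private:
    std::set<std::string> names_;
};

// Modifiable layer: checks its own additions, then defers to the base set.
class ToyPoseSet : public ToySet {
public:
    explicit ToyPoseSet(ToySetCOP base) : base_(std::move(base)) {}
    void add_name(const std::string& name) { extra_.insert(name); }
    bool has_name(const std::string& name) const override {
        return extra_.count(name) > 0 || base_->has_name(name);
    }

private:
    ToySetCOP base_;
    std::set<std::string> extra_;
};
```

Code that only reads from the set can hold a ToySetCOP and never needs to know which of the two derived classes it has been handed.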
The one difference which may trip you up is that the ResidueTypeSet interface no longer has a name() method, as it's meaningless for PoseRTSs. Instead of "name" you should be working with "mode" (TypeSetMode). (There are utility functions in ChemicalManager.hh for converting modes to strings, including being able to directly feed them to stream output.)
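For anyone unfamiliar with the pattern, this is roughly what "mode to string, usable in stream output" looks like. The names here (SketchMode, sketch_mode_to_string) are invented for illustration and are not the functions in ChemicalManager.hh; the pairing of each enum value with "fa_standard"/"centroid" is likewise an assumption of the sketch.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical enum; the real TypeSetMode has its own set of values.
enum class SketchMode { FullAtom, Centroid };

// Convert a mode to a human-readable string.
inline std::string sketch_mode_to_string(SketchMode mode) {
    switch (mode) {
        case SketchMode::FullAtom: return "fa_standard";
        case SketchMode::Centroid: return "centroid";
    }
    return "unknown";
}

// Lets a mode be written directly to any output stream.
inline std::ostream& operator<<(std::ostream& os, SketchMode mode) {
    return os << sketch_mode_to_string(mode);
}
```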
To keep pose copies relatively fast, the PoseResidueTypeSet functions with copy-on-write semantics. That is, to add a RT to the PoseRTS, you get a copy of the PoseRTS from the Conformation::modifiable_residue_type_set_for_conf() method, make your modifications, and then use the Conformation::reset_residue_type_set_for_conf() method to reset the version in the Conformation. This means that multiple Conformations can share the same PoseRTS. Furthermore, multiple RTSs can share the same RTs.
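The get-a-copy / modify / reset cycle can be sketched with shared pointers to const. This is a toy model only: CowTypeSet, CowConformation, modifiable_copy() and reset() are hypothetical names standing in for the Conformation methods mentioned above, and the real signatures may differ.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Toy stand-in for a pose-level type set.
struct CowTypeSet {
    std::set<std::string> names;
};
using CowTypeSetOP = std::shared_ptr<CowTypeSet>;
using CowTypeSetCOP = std::shared_ptr<const CowTypeSet>;

class CowConformation {
public:
    CowConformation() : set_(std::make_shared<CowTypeSet>()) {}

    // Read access shares the stored object; copying a conformation
    // copies only this pointer, not the set itself.
    CowTypeSetCOP type_set() const { return set_; }

    // Returns a private, editable copy; the stored set is not touched.
    CowTypeSetOP modifiable_copy() const { return std::make_shared<CowTypeSet>(*set_); }

    // Installs an edited copy as this conformation's set.
    void reset(CowTypeSetCOP updated) { set_ = std::move(updated); }

private:
    CowTypeSetCOP set_;
};
```

Because the stored pointer is to a const object, the only way to change what a conformation sees is to install a new set, so other conformations sharing the old one are unaffected.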
Because RTs can now be shared between RTSs, ResidueTypes (and by extension Residues) no longer have a back-pointer to the ResidueTypeSet they come from. Instead, there's a ResidueType::mode() method which returns the TypeSetMode for the RT. This means you now need to rewrite things such that you go through the Pose/Conformation to get the corresponding RTS. (This is a good thing, by the way, as if you're changing ResidueTypes, you want to give the Pose the opportunity to invoke its customizations, instead of going back directly to the GlobalRTS.)
Other minor changes worth mentioning
* AtomTypeSets now have TypeSetMode values, too
* There are read_topology_file() versions which grab the relevant typesets from a ResidueTypeSet.
* "custom" residue types in the RTS interface have been renamed to "unpatchable", to better reflect their role. Additionally, I've switched a bunch of RT loading from the "unpatchable" to base, to permit more wide-spread patching.
* As RTSs are in the pose now, many more things in the core/chemical/ directory have serialization methods added. Side benefit is that it's now much easier to serialize ResidueTypeCOPs, data structures of ResidueTypeCOPs and type set COPs: you should be able to do it directly, and have things just work out.
* The awful mess of RTS loading in ChemicalManager.cc has been reorganized and moved to GlobalResidueTypeSet.cc
Things I haven't done with this pull request
* Making the GlobalResidueTypeSet thread-safe. This should be easier, as we can now know it's only the lazy-loading code which will be touched in conventional usage.
* Exhaustively testing the serialization ability of the PoseResidueTypeSet - there may be some edge cases where bugs still exist. caveat programmor
* The D/L handling is still rather GlobalRTS specific, so mirror image handling might not work quite right with residues stored in the Pose. @vmullig I'll let you rethink this if you want to.
* SQL database interaction isn't necessarily great. If anyone uses it, let me know if there's something that should work better.
* Reading/writing of the in-pose ResidueTypes. Right now, if you output a PDB or a silent file which uses a non-global ResidueType, that ResidueType isn't output - it just disappears. At some point, storage of custom ResidueTypes in PDBs/silent files probably should be added.