Merge pull request #5421 from RosettaCommons/JWLabonte/sugars/bidirectional_linkages
Carbohydrates: Enabling Bidirectional Glycosidic Linkages
This merge introduces "bidirectional glycosidic linkages" into Rosetta.
Most sugar linkages connect an oxygen _not_ attached to the anomeric carbon of one sugar (the "parent") to the anomeric carbon of the "child" sugar. Such linkages are designated with notation such as β-D-Glcp-(1→4)-β-D-Glcp, where C1 is the anomeric carbon and the child residue comes before the parent residue, since it is considered a substituent of the main group, the parent. (Yes, sugar sequences run "backwards" relative to protein ones.)
However, some disaccharides do not have a free hemiacetal group. That is, their residues are linked from the anomeric hydroxy group of the parent to the anomeric carbon of the child. The notation for such a linkage uses a bidirectional arrow, as in sucrose: α-D-Glcp-(1↔2)-β-D-Fruf.
(One could also write β-D-Fruf-(2↔1)-α-D-Glcp, and, in fact, this is the preferred IUPAC name, because of arbitrary "group priorities". (See: https://www.qmul.ac.uk/sbcs/iupac/2carb/36.html.) However, this naming is dumb, I declare, because it ignores the biochemical fact that it is the fructose's hydroxyl that is the nucleophile in natural biosyntheses of this disaccharide, and IUPAC allows this so-called "sequential method" for oligo- and polysaccharides: https://www.qmul.ac.uk/sbcs/iupac/2carb/37.html.)
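To make the two arrow notations concrete, here is a minimal Python sketch that splits a disaccharide name at its linkage descriptor and reports whether the linkage is bidirectional. It is illustrative only: `parse_disaccharide` and `Linkage` are hypothetical names, not part of Rosetta, and the pattern handles only two-residue names of the form shown above.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern (not Rosetta code): child name, "(n→m)" or "(n↔m)",
# then parent name. Only simple two-residue names are supported.
LINKAGE_PATTERN = re.compile(
    r"^(?P<child>.+)-\((?P<child_pos>\d+)(?P<arrow>[→↔])(?P<parent_pos>\d+)\)-(?P<parent>.+)$"
)


@dataclass(frozen=True)
class Linkage:
    child: str            # residue written first (the substituent)
    child_position: int   # position on the child, its anomeric carbon
    parent: str           # residue written last (the main group)
    parent_position: int  # position on the parent bearing the linkage oxygen
    bidirectional: bool   # True when the arrow is "↔"


def parse_disaccharide(name: str) -> Linkage:
    """Split a name such as 'α-D-Glcp-(1↔2)-β-D-Fruf' into its parts."""
    match = LINKAGE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Not a recognized disaccharide name: {name!r}")
    return Linkage(
        child=match["child"],
        child_position=int(match["child_pos"]),
        parent=match["parent"],
        parent_position=int(match["parent_pos"]),
        bidirectional=(match["arrow"] == "↔"),
    )
```

For sucrose written as α-D-Glcp-(1↔2)-β-D-Fruf, the sketch reports glucose as the child, fructose as the parent, and a bidirectional linkage at position 2 of the parent.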
Besides this naming ambiguity, which already mucks up the abysmal nomenclature of sugars in the PDB, there is an issue with main-chain length. "Normal" sugars in Rosetta were designed such that the main chain of a residue always starts at the anomeric carbon and proceeds to the linkage oxygen. However, if the linkage oxygen is attached to the anomeric carbon itself, this gives a main chain too short for Rosetta to handle, _e.g.,_ C2-O2-UPPER for the furanose residue above.
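The main-chain problem can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not Rosetta's implementation: the hypothetical `main_chain_atoms` helper assumes the chain walks the ring carbons in numerical order from the anomeric carbon to the carbon bearing the linkage oxygen, and it ignores the UPPER connection.

```python
from typing import List


def main_chain_atoms(anomeric_carbon: int, linkage_position: int) -> List[str]:
    """List main-chain atom names from the anomeric carbon to the linkage oxygen.

    Hypothetical helper: assumes carbons are traversed in numerical order.
    """
    if linkage_position < anomeric_carbon:
        raise ValueError("linkage position must not precede the anomeric carbon")
    carbons = [f"C{i}" for i in range(anomeric_carbon, linkage_position + 1)]
    return carbons + [f"O{linkage_position}"]


# A "normal" aldose parent linked through O4: five main-chain atoms.
print(main_chain_atoms(anomeric_carbon=1, linkage_position=4))

# The fructose of sucrose, linked through its anomeric O2: only two atoms.
print(main_chain_atoms(anomeric_carbon=2, linkage_position=2))
```

The second call yields only C2 and O2, which corresponds to the too-short C2-O2-UPPER chain described above.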
A further problem involves patching. The fructose residue of sucrose can only be a lower terminus; since its anomeric hydroxyl was the nucleophile used to form the linkage, it cannot also have been the leaving group to accept attack from a previous residue.
My solution is to make saccharide residues for such linkages `LOWER_TERMINUS` variants in their `.params` files and to build their main chains starting one atom before their anomeric carbon. For ketoses, this would be no big deal, but it is more challenging/annoying for aldoses, which have an anomeric carbon of C1. Thus, I use the virtual ring atom as the first atom for both cases, which requires some minor tweaking of rules for `Pose` creation.