Add atom mapping facility to PDB file loading.
-- Geometric Remapping
The core of this commit is to allow Rosetta to remap atom names when loading
PDB files, specifically in the case where you may have renamed the atoms.
(The main use case was loading sdf files, which don't have atom names.)
For example, if you have a ligand params file with one set of atom names,
but want to use it to load a PDB with different atom names for that residue,
you can do that in one of three ways:
* Call ResidueType::remap_pdb_atom_names( true ) in the code.
* Add the line "REMAP_PDB_ATOM_NAMES" to your params file.
* Add the commandline flag "-remap_pdb_atom_names_for LG1 LG2", specifying
one or more three-letter names of the residues you wish to remap.
Then when that ResidueType is used in PDB loading, remapping will happen
based on the perceived geometry of the input residue, matching up the elements
and bonding as best as it can between what's in the ResidueType and the
perceived geometry in the PDB residue. Each input residue of the same name3
can get different name mappings, although the name remapping occurs after the
detection of the best ResidueType, so geometric remapping will not help with
distinguishing between two residues with the same name3.
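
To illustrate the idea of matching on elements and bonding, here is a minimal
Python sketch. It is not the Rosetta implementation: the function names
(adjacency, match_atoms) and the input layout are hypothetical, and it only
handles the exact case where both residues have the same atoms and bonds,
whereas the real code matches "as best as it can".

```python
def adjacency(atoms, bonds):
    """Build a name -> set-of-bonded-names table from a list of (a, b) bonds."""
    adj = {name: set() for name in atoms}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def match_atoms(type_atoms, type_bonds, pdb_atoms, pdb_bonds):
    """Return {pdb_name: type_name}, or None if no exact match exists.

    type_atoms / pdb_atoms map atom name -> element symbol (hypothetical layout).
    Matching uses only elements and bonding, never the names themselves.
    """
    if len(type_atoms) != len(pdb_atoms):
        return None
    type_adj = adjacency(type_atoms, type_bonds)
    pdb_adj = adjacency(pdb_atoms, pdb_bonds)
    pdb_names = sorted(pdb_atoms)
    mapping, used = {}, set()

    def consistent(pdb_name, type_name):
        # Same element and same number of bonded neighbours.
        if pdb_atoms[pdb_name] != type_atoms[type_name]:
            return False
        if len(pdb_adj[pdb_name]) != len(type_adj[type_name]):
            return False
        # Every already-mapped neighbour must map onto a bonded partner.
        return all(mapping[n] in type_adj[type_name]
                   for n in pdb_adj[pdb_name] if n in mapping)

    def search(i):
        # Plain backtracking over the PDB atoms, in sorted-name order.
        if i == len(pdb_names):
            return True
        pdb_name = pdb_names[i]
        for type_name in sorted(type_atoms):
            if type_name not in used and consistent(pdb_name, type_name):
                mapping[pdb_name] = type_name
                used.add(type_name)
                if search(i + 1):
                    return True
                del mapping[pdb_name]
                used.discard(type_name)
        return False

    return dict(mapping) if search(0) else None
```

For symmetric groups (e.g. equivalent hydrogens) several mappings are equally
valid; this sketch simply returns the first one it finds.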
-- Atom Aliases
As part of the name remapping, it was simple to add atom aliases.
That is, a params file can now have an ATOM_ALIAS line, which tells Rosetta
about alternative names for atoms. (Again, this happens after the ResidueType
has been chosen.)
For example, we could now add the line:
ATOM_ALIAS 1HH1 HH11
to the ARG.params file to tell Rosetta that HH11 is a valid alternative name
for the hydrogen it normally calls 1HH1, allowing direct coordinate loading
for that atom from PDB NMR structures.
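
As a rough illustration of how such an alias table could be consulted, here
is a short Python sketch. It is an assumption-laden stand-in, not the Rosetta
parser: the helper names (parse_atom_aliases, resolve_atom_name) are
hypothetical, and it splits the line on whitespace, whereas the real params
format may be column-sensitive.

```python
def parse_atom_aliases(params_lines):
    """Collect alias -> Rosetta atom name from 'ATOM_ALIAS <name> <alias>' lines."""
    aliases = {}
    for line in params_lines:
        fields = line.split()  # assumption: whitespace-separated fields
        if len(fields) == 3 and fields[0] == "ATOM_ALIAS":
            rosetta_name, alias = fields[1], fields[2]
            aliases[alias] = rosetta_name
    return aliases


def resolve_atom_name(pdb_name, atom_names, aliases):
    """Return the Rosetta name for a PDB atom name, or None if unknown."""
    stripped = pdb_name.strip()
    if stripped in atom_names:
        return stripped  # already a name the residue type knows
    return aliases.get(stripped)  # otherwise fall back to the alias table
```

With the ARG example above, resolve_atom_name("HH11", {"1HH1"}, aliases)
would give "1HH1".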
Note that while I've added the facility to Rosetta, I haven't yet added any
ATOM_ALIAS lines to the database, as I was seeing conflicting information
about how to map PDB-typical atom names to Rosetta atom names. Specifically,
depending on the source, I saw different chiralities for atoms. (e.g. do the ALA
methyl protons HB1->HB2->HB3 go clockwise or counter-clockwise when viewed
along the CA->CB vector?).
-- Bug Fixes
In PDB file loading there were issues with atom recognition. Specifically,
coordinate assignment went via whitespace-stripped name, whereas missing atom
detection went by whitespace-preserved name. This caused Rosetta to rebuild
things like "OXT " atoms, which occur in a number of integration tests (the
expected name is " OXT"). This commit now attempts to use the
whitespace-stripped name for coordinate assignment, and then bases the missing
atom detection on which atoms had coordinates assigned. (This difference accounts
for the bulk of the test changes.)
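
The new ordering can be sketched in Python as follows. This is only an
illustration of the logic described above, with a hypothetical function name
(assign_coordinates) and a simplified name -> xyz dictionary in place of the
real PDB reader data structures.

```python
def assign_coordinates(expected_names, pdb_atoms):
    """Assign coordinates by stripped name, then derive the missing atoms.

    expected_names: atom names from the residue type, e.g. " OXT"
    pdb_atoms: dict of atom name as written in the PDB -> (x, y, z)
    """
    # Index the input atoms by whitespace-stripped name, so that
    # "OXT " and " OXT" are treated as the same atom.
    by_stripped = {name.strip(): xyz for name, xyz in pdb_atoms.items()}

    assigned = {}
    for name in expected_names:
        xyz = by_stripped.get(name.strip())
        if xyz is not None:
            assigned[name] = xyz

    # Missing atoms are exactly those that received no coordinates,
    # rather than those whose padded name failed to match.
    missing = [name for name in expected_names if name not in assigned]
    return assigned, missing
```

Because both steps now key off the same lookup, an atom cannot be given
coordinates and also be flagged as missing.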
Also, there was a chain-termini adjustment finalization which updated the
residue type with the new terminus variants, but didn't attempt to assign
coordinates to new atoms if there were (otherwise) ignored atoms with the
appropriate names (e.g. " OXT"). This resulted in atoms that didn't get
coordinates assigned, but that weren't necessarily annotated as missing
atoms. This commit attempts to detect that somewhat, and assign coordinates
if the atom is present. (But not as well as if the terminus is detected
originally.)
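
A minimal Python sketch of that fixup step, under the same simplifying
assumptions as before (the function name apply_terminus_variant and the
dictionary layout are hypothetical, not Rosetta API):

```python
def apply_terminus_variant(assigned, ignored, variant_atom_names):
    """Fill atoms added by a terminus variant from previously ignored atoms.

    assigned: atom name -> xyz already placed on the residue
    ignored: atom name as read from the PDB -> xyz for atoms that the
             original (non-terminus) type did not recognise
    variant_atom_names: atom names of the new terminus variant type
    """
    ignored_by_stripped = {name.strip(): xyz for name, xyz in ignored.items()}
    updated = dict(assigned)
    still_missing = []
    for name in variant_atom_names:
        if name in updated:
            continue  # coordinates carried over from the original type
        xyz = ignored_by_stripped.get(name.strip())
        if xyz is None:
            still_missing.append(name)  # must be annotated as missing
        else:
            updated[name] = xyz  # reuse the coordinates present in the input
    return updated, still_missing
```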
-- Integration test changes expected:
Extra Trace-level output
----------------
FAIL fold_and_dock
Whitespace stripping issues on atom name loading
--------------
FAIL contactMap
FAIL database_jd2_compact_io
FAIL grid_scores_features
FAIL kinemage_grid_output
FAIL ligand_database_io
FAIL ligand_dock_7cpa
FAIL ligand_dock_grid
FAIL ligand_dock_script
FAIL ligand_water_docking
FAIL residue_data_resource
FAIL sdf_reader
FAIL startfrom_file
FAIL write_mol_file
Termini fixup coordinate related changes
----------------
FAIL zinc_homodimer_design
Also silently affects:
AnchorFinder
motif_extraction
rs_flexbbmoves
zinc_heterodimer
Cosmetic tracer changes
-----------------------
FAIL AnchorFinder
FAIL gen_lig_grids
FAIL jd2test