Merge pull request #3969 from RosettaCommons/JWLabonte/PTMs/kinases2
Kinases: new kinase data
This merge simply adds a single data file for a new kinase consensus sequence.
All tests pass, except for the plethora of already-broken ones.
notify author
notify list [rosetta-logs@googlegroups.com]
Merge pull request #3933 from RosettaCommons/JWLabonte/sugars/OGT
Glycosylation: Adding EnzymaticMover for O-linked glycosylation
This merge primarily adds an O-linking glycosyltransferase to the database for use with the `GlycosyltransferaseMover`.
It also moves some testing apps to public, so that people can start using `EnzymaticMover`s more readily.
It also allows an optional string `-` to be used to tell the `EnzymaticMover` system to modify at the default atom for a `ResidueType`. This is helpful for cases with ambiguous site modifications, such as either Ser or Thr, which each have a different name for their hydroxyl oxygen in their `.params` files.
Finally, it fixes a bug in `GlycanTree`.
Documentation for the two added apps is available on the documentation wiki.
All unit tests pass. The two integration test changes are expected.
Merge pull request #3966 from RosettaCommons/smlewis/fix_NamedAtomPairConstraint_type
overrides for some constraint classes. Fix NamedAtomPairConstraint's type()
Merge pull request #3963 from RosettaCommons/JWLabonte/PTMs/kinases
Kinases: Update GSK3B consensus sequence
This merge simply updates a single consensus sequence used by the `EnzymeManager`.
Database change only.
Merge pull request #3960 from RosettaCommons/rfalford12/fix-franklin2019-fprint-test
Add a flag to suppress the SHA1 and date in the franklin2019 integration test output
Merge pull request #3955 from RosettaCommons/roccomoretti/fix_unit_timing
Update timeout for certain unit tests 'on the edge'.
There are certain unit tests which are just on the edge of their timeouts. Most of the time they're fine, but if the test server has a slowdown, they fail. Based on the test history on the server, update the timeouts for those tests which have had such issues in the past.
Merge pull request #3952 from RosettaCommons/roccomoretti/fix_stderr_redirect
Fix stderr redirect issues.
The order in which you put the "2>&1" stderr redirection matters. The shell parses output redirection from left to right, and the "2>&1" tells the shell to redirect stderr ("2") to the current location of stdout ("1"), not to make them match in perpetuity. So if you redirect output to a file, you need `> logfile 2>&1`, not `2>&1 > logfile`.
As such, fix a number of locations (primarily in testing) which were putting things the wrong way around.
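The difference is easy to check empirically. Below is a minimal sketch (Python driving `sh`, assuming a POSIX shell is on the path; the helper name `run_with_redirect` is made up for this illustration and is not part of the test scripts). It runs a command that writes one line to each stream and reports what lands in the log file versus what escapes to the caller.

```python
import os
import shlex
import subprocess
import tempfile


def run_with_redirect(redirect_template):
    """Run a command that writes to stdout and stderr under the given redirection.

    Returns (log_contents, escaped), where `escaped` is whatever still reached
    the caller's stdout/stderr instead of the log file.
    """
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "logfile")
        redirect = redirect_template.format(log=shlex.quote(log))
        # One line goes to stdout, one line goes to stderr.
        cmd = "{ echo to-stdout; echo to-stderr >&2; } " + redirect
        proc = subprocess.run(["sh", "-c", cmd], capture_output=True, text=True)
        with open(log) as fh:
            return fh.read(), proc.stdout + proc.stderr


good_log, good_escaped = run_with_redirect("> {log} 2>&1")
bad_log, bad_escaped = run_with_redirect("2>&1 > {log}")
print("correct order:", repr(good_log), repr(good_escaped))
print("wrong order:  ", repr(bad_log), repr(bad_escaped))
```

With `> logfile 2>&1` both lines end up in the log. With `2>&1 > logfile` only the stdout line is logged, and the stderr line goes wherever stdout pointed before the file redirection was applied.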
Merge pull request #3953 from RosettaCommons/vmullig/bin_analysis_in_cluster_app
Add bin string analysis to energy_based_clustering application
This allows the number of unique ABOXYZ bin strings (with or without cyclic permutation, depending on whether cyclic permutations during clustering are allowed) to be counted.
Completed tasks:
- Add flag for this analysis.
- Check that this is only being applied to alpha-amino acids or to peptoids.
- Add functions for computing bin strings.
- Add counters for unique bin strings.
- Add output:
- Bin strings for every structure.
- Bin strings for every cluster center.
- Summary of number of unique bin strings (for structures and for cluster centers).
- Summary of number of unique bin strings if strings and their mirror images are considered to be equivalent (for structures and for cluster centers).
- Unit tests.
- Beauty.
- Documentation -- added to this page: https://www.rosettacommons.org/docs/latest/application_documentation/analysis/energy_based_clustering_application (though it may take a day or so to show up).
- Integration test.
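To make the counting concrete, here is a rough sketch of how unique bin strings could be tallied with optional cyclic-permutation and mirror-image equivalence. This is not the code in the application: the function names are hypothetical, and the mirror pairing of bins (A<->X, B<->Y, O<->Z) is an assumption made for the illustration.

```python
# Assumed mirror-image pairing of bins (A<->X, B<->Y, O<->Z); illustrative only.
MIRROR = str.maketrans("ABOXYZ", "XYZABO")


def canonical_rotation(bin_string):
    """Return the lexicographically smallest cyclic permutation of a string."""
    if not bin_string:
        return bin_string
    return min(bin_string[i:] + bin_string[:i] for i in range(len(bin_string)))


def count_unique_bin_strings(bin_strings, cyclic=False, mirror_equivalent=False):
    """Count distinct bin strings under the requested equivalences."""
    seen = set()
    for s in bin_strings:
        forms = [s]
        if mirror_equivalent:
            forms.append(s.translate(MIRROR))  # also consider the mirror image
        if cyclic:
            forms = [canonical_rotation(f) for f in forms]
        seen.add(min(forms))  # one representative per equivalence class
    return len(seen)


structures = ["AABX", "ABXA", "XXYA", "BBBB"]
print(count_unique_bin_strings(structures))
print(count_unique_bin_strings(structures, cyclic=True))
print(count_unique_bin_strings(structures, cyclic=True, mirror_equivalent=True))
```

For this toy input the three counts are 4, 3, and 2: `ABXA` is a rotation of `AABX`, and `XXYA` is the mirror image of `AABX` under the assumed pairing.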
Merge pull request #3367 from RosettaCommons/rfalford12/implicit_lipid_membrane
Implicit Lipid Membrane Energy Function
A new all-atom, physics-based energy function for membrane protein modeling and design. The energy function captures the anisotropic structure and dimensions of phospholipid bilayers through parameterization from biophysical data and multi-scale computational modeling. We use an implicit representation inspired by the original Lazaridis Implicit Membrane Model. The key improvements are (1) polarity gradients derived from thermodynamically-rigorous transfer energy measurements, (2) membrane thickness parameters derived for different lipid compositions and (3) a continuous, differentiable aqueous pore/channel representation.
All of this work is documented in:
Alford RF, Fleming PJ, Fleming KG, Gray JJ (2019) "Protein structure prediction and design in a biologically-realistic implicit membrane" bioRxiv
Link: https://www.biorxiv.org/content/10.1101/630715v1
The energy function is called `franklin2019`, named after Rosalind Franklin to honor the achievements of women in science. The energy function will be the **default** for applications using the RosettaMPFramework. To revert to the previous behavior, use the flag `-restore_lazaridis_imm_behavior`.
Below is a summary of the code captured by this branch:
## Additions & Changes to the source code
#### Database
_Deleted_
- membrane.mp
- embeddings.mp
_Added_
- `database/membrane/implicit_lipid_parameters.txt`: Parameters describing different membrane lipid compositions derived from a combination of all-atom molecular dynamics simulations and X-Ray/Neutron scattering measurements of planar phospholipid bilayers
- `database/membrane/memb_fa_params_2019.txt`: Per-atom water-to-lipid transfer energies derived from Moon & Fleming, 2011
#### Applications
_Added_
- `apps/pilot/ralford/mp_seqrecov.cc`: Calculates sequence recovery statistics dependent on the fractional hydration of side chains relative to the membrane environment. In addition to total, buried, and exposed sequence recovery, we also compute statistics for lipid-facing, aqueous-facing, and interfacial side chains.
- `apps/pilot/ralford/color_by_lipid_type`: Fills the b-factor column with xyz-dependent fractional hydration values. Can be used to visualize the hydration in PyMOL
_Made compatible with the membrane framework_
- `apps/pilot/frank/min_test.cc`
- `apps/public/design/fixbb.cc`
- `apps/public/backrub.cc`
_Removed redundant mpframework calls_
- `apps/public/membrane/mp_lipid_acc.cc`
#### Objects
- `numeric/linear_algebra/EllipseParameters.*`: A class defining the shape of a two-dimensional rotated ellipse
- `core/conformation/membrane/AqueousPoreParameters.*`: A class defining the shape of an aqueous pore which varies in the x-, y-, and z-dimensions
- `core/conformation/membrane/ImplicitLipidInfo.*`: A class to define the physical and chemical properties of the implicit membrane environment. Currently, it is mainly used by the energy function and stores (1) parameters of the hydration function (e.g., thickness, rate of transition, pore size), (2) lipid composition details, (3) hydration function smoothing parameters, and (4) structure-based lipid accessibility information
- `protocols/membrane/scoring/MEnvAtomParams.*`: A container class for atomic water-to-bilayer transfer energy parameters
#### Algorithms
- `numeric/linear_algebra/minimum_bounding_ellipse`: An implementation of the Khachiyan minimum-bounding ellipse algorithm
#### Movers
- `AqueousPoreFinder`: Calculates the parameters of an elliptical aqueous pore with varying cross section
- `MembraneEnergyLandscapeSampler`: Maps the energies of all possible orientations of single transmembrane peptides as a function of tilt angle and depth relative to the membrane normal and center
- `PeptideOrientationMover`: Calculates the energy of a peptide at a specific tilt angle and depth
#### Energy Terms
- `fa_water_to_bilayer`: Calculates the water-to-bilayer transfer energy of an atomic group given its identity and fractional hydration. This energy term is defined within `FaWaterToBilayerEnergy` in `protocols/membrane/scoring`
#### Adjustments to the RosettaMP Framework
- `MembraneInfo` now stores an ImplicitLipidInfo object
- The hydrogen bonding energy correction is no longer the default, since it is not currently used by franklin2019. The user must pass the flag `-mp:scoring:hbond`. This has been adjusted in the older MP Framework integration tests.
- `AddMembraneMover` includes three new steps in the default setup: (1) initialize per-atom lipid accessibility data, (2) initialize lipid-specific parameters, and (3) initialize the dimensions of the aqueous pore. The prior behavior can be obtained by passing the flag `-restore_lazaridis_imm_behavior`
- `MPLipidAccessibility`: Adjusted to store data so it can be passed to other classes. Also added an additional criterion for being an alpha-helical membrane protein, and updated the thickness to use ImplicitLipidInfo where appropriate
- Updated the sub-class `SymmetricAddMembraneMover` so it still adheres to the class definition
#### Options
- `-restore_lazaridis_imm_behavior`: Restore default membrane energy function behavior to Lazaridis IMM1
- `-mp:lipids:composition`: Type of lipids to use in implicit model representation, default is DLPC
- `-mp:lipids:temperature`: Temperature at which the lipid composition parameters were measured, default = 37.0
- `-mp:lipids:has_pore`: Manual override to not use pore estimation
## Additions & Changes to the tests
#### Unit Tests
_Added_
- test/core/conformation/membrane/ImplicitLipidInfo.cxxtest.hh
- numeric/linear_algebra/minimum_bounding_ellipse.cxxtest.hh
_Extended_
- protocols/membrane/AddMembraneMover.cxxtest.hh
- protocols/membrane/AqueousPoreFinderTest.cxxtest.hh
- protocols/membrane/MPLipidAccessibility.cxxtest.hh
- test/protocols/membrane/MembraneUtil.cxxtest.hh
#### Integration tests
_Added_
- mpil_load_implicit_lipids
- mpil_find_pore_ahelical
- mpil_find_pore_bbarrel
_Adjusted for compatibility_
- homodimer_fnd_ref2015_memb
- mp_dock
- mp_dock_prepack
- mp_dock_setup
- mp_domain_assembly
- mp_domain_assembly_FtsQ
- mp_find_interface
- mp_interface_statistics
- mp_mutate_relax
- mp_mutate_repack
- mp_quick_relax_ref2015_memb
- mp_range_relax
- mp_relax
- mp_span_ang_ref2016_memb
- mp_symdock
- mp_transform_optimize
- mp_vis_emb
- res_lipo_ref2015_memb
#### Score Function Fingerprint Tests
_Added_
- franklin2019
_Adjusted for compatibility_
- membrane_fa_pH
- membrane_fa_scorefxn
- menv_smooth_sfxn
- ref2015_memb
- ref2015_on_memb
Comment from Rocco Moretti: It looks like there's date and version info in the output structures for the franklin2019 score tests, which is causing persistent "failures" on the test server.
JD3 Checkpointing (#3939)
Checkpoint progress in JD3 when using MPI. Currently works for multistage_rosetta_scripts.
Checkpointing is a technique where the state of the system at a particular point in time is saved in a stable way (e.g. on disk) so that if the job dies or is killed, then the work up until the point of the checkpointing is not lost.
Checkpointing in JD3 is managed by the JobDistributor. The other classes involved in the checkpointing (including the JobQueen) do not need to think about how checkpointing will work, but merely how to serialize and deserialize their data. (There is a notable exception here, discussed below).
The user will tell the job distributor to checkpoint every certain number of minutes (flag `-jd3::checkpoint_period <minutes>`), e.g. 30 minutes, and the job distributor on node 0 will look at the (wall) clock each time it can* and when 30 minutes have passed since the last checkpoint was made, it will ask the job distributors on the archive nodes (if any) to begin checkpointing, and it will serialize its data to disk. If the job should be killed before it completes, then the user can restart the job (taking care to use the same command-line flags as before) with the additional flag `-jd3::restore_from_checkpoint`. The JobDistributor on node 0 will deserialize the data in the checkpoint file and then resume execution of the jobs from that point where the checkpoint was taken. Some work would have been lost: the work that took place between the last checkpoint and the time the job died.
(* The JobDistributor on node 0 spends most of its time in a while loop where it waits to hear from other MPI processes and then responds to their request. At the top of this while loop is a "receive mpi integer message from anyone" call which blocks until some node sends node 0 an integer. The JobDistributor on node 0 might wait inside this blocking receive call beyond the moment when the wall clock would say that a new checkpoint is due. The JobDistributor has to wait until someone sends it a message, then the JobDistributor will process that message. After it has processed the message, but before it re-invokes the blocking receive request, it will look at the clock and checkpoint if necessary. For this reason, the JobDistributor will not checkpoint at the exact moment a checkpoint becomes due. If you have a job that will be killed at exactly 1 hour, e.g., then you should not set the checkpoint interval to 59 minutes: the JobDistributor might not ever checkpoint before the job is killed.)
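The timing behavior described in the footnote can be illustrated with a toy simulation. This is only a sketch, not the JobDistributor code: the function name, the use of minutes as plain numbers, and the fixed list of message arrival times are all assumptions made for the example.

```python
def run_master_loop(message_arrival_times, checkpoint_period, kill_time):
    """Simulate node 0: block for a message, handle it, then check the clock.

    `message_arrival_times` are the sorted wall-clock minutes at which messages
    arrive. Returns the times at which checkpoints were written.
    """
    checkpoints = []
    last_checkpoint = 0.0
    for now in message_arrival_times:  # the blocking receive returns at `now`
        if now >= kill_time:
            break  # the scheduler killed the job before this message arrived
        # ... the message would be processed here ...
        if now - last_checkpoint >= checkpoint_period:
            checkpoints.append(now)  # stand-in for serializing state to disk
            last_checkpoint = now
    return checkpoints


arrivals = [10, 25, 40, 58, 61]
print(run_master_loop(arrivals, checkpoint_period=30, kill_time=60))
print(run_master_loop(arrivals, checkpoint_period=59, kill_time=60))
```

In this toy run, a 30-minute period yields a single checkpoint at minute 40 (the first message handled after the checkpoint became due), while a 59-minute period yields no checkpoint at all before the job is killed at minute 60.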
Not all MPI nodes serialize their data: only the master node and the archive nodes do. The worker nodes do not need to store their data: they are presumed to have no significant state. One advantage of this system is that you can restore from a checkpoint with a different number of worker nodes. (You need to have the same number of archive nodes as the original job.) The only JobQueen to be checkpointed is the JobQueen on the master node (node 0; thus, we call this JobQueen JQ0).
The job distributor makes this pledge: if the JobQueen delivered messages to the JobDistributor, then the JobDistributor is responsible for ensuring those messages are acted on. If the JobQueen delivers a LarvalJob to the JobDistributor, then the JobDistributor ensures that LarvalJob gets run. If the JobQueen delivers a JobOutputSpecification to the JobDistributor, the JobDistributor ensures that the output gets written.
The JobDistributor does not guarantee, however, that the JobQueen's discard messages are delivered. The idea is this: the discard messages are to remove lazily-loaded data from memory after that data are no longer needed. If the original process has died and the process re-launched, then the lazily-loaded data will not be in memory when the job starts again. Remember, the JQs on the worker nodes are not checkpointed.
The exception to the idea that the JobQueen does not need to think about how checkpointing should work is that if she has any data that cannot / should not be serialized, then the JobQueen should gracefully handle events where that data is surprisingly absent. E.g., let's say the JobQueen has a pointer to a big blob of data, BBOD_OP. If the JQ doesn't serialize that data, then during the restore-from-checkpoint process, that BBOD_OP will not get set. In that case, the JobQueen should make sure to load that data before trying to use it. In this way, the JQ should be minimally aware of how checkpointing might work.
What are examples of this kind of data? If the JobQueen holds a RosettaScriptsParserOP, e.g., that class is currently not serializable. (Let's assume that it could not be serializable, even if that might not be true of this class). In this case, the RosettaScriptsParser serves the purpose of storing the libxml2 objects defining the schema so that it does not need to be regenerated repeatedly (since this step can take ~10 seconds). One option for the `save`/`load` methods of the JobQueen would be to 1. not archive the RosettaScriptsParser in its `save` method, and 2. to create a new (empty) RosettaScriptsParser in its `load` method. This would guarantee that the RSPOP was never null. Alternatively, step 1 could remain the same, but for step 2, the JQ could set its RSPOP to null. Then the code that intends to use the RSPOP would have to surround its usage with an `if (rspop_ == nullptr ) { rspop_ = make_shared< RSP >(); }`. Probably the first option is better!
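As a language-neutral illustration of the first option, here is a small Python analogue using `pickle`, where `__getstate__` and `__setstate__` play the roles of `save` and `load`. The class names `SchemaParser` and `QueenSketch` are invented for this sketch and do not correspond to Rosetta classes; Rosetta's own serialization machinery is different.

```python
import pickle


class SchemaParser:
    """Stand-in for an expensive-to-build helper that cannot be serialized."""

    def __init__(self):
        self.schema = "<schema/>"  # pretend this took ~10 seconds to generate

    def __reduce__(self):
        raise TypeError("SchemaParser cannot be serialized")


class QueenSketch:
    def __init__(self):
        self.jobs_done = 0
        self.parser = SchemaParser()

    def __getstate__(self):  # plays the role of `save`
        state = self.__dict__.copy()
        del state["parser"]  # step 1: do not archive the parser
        return state

    def __setstate__(self, state):  # plays the role of `load`
        self.__dict__.update(state)
        self.parser = SchemaParser()  # step 2: create a fresh parser right away


queen = QueenSketch()
queen.jobs_done = 7
restored = pickle.loads(pickle.dumps(queen))
print(restored.jobs_done, type(restored.parser).__name__)
```

Because the parser is recreated during loading, code that uses it after a restore never has to check for a missing object, which is the reason the first option is attractive.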
Some points about restoring from a checkpoint
* The number of archive nodes must be the same, but the total number of nodes can be different
* It is possible to enable `-jd3::archive_on_disk` in the restored job even if that flag was not present on the command line for the first job (which might be useful if your job died the first time because it ran out of memory on the archive nodes!)
* If you are writing pdbs to output silent files, then jobs that were output after the checkpoint was created might be written a second time to the same or different silent file when restoring from that checkpoint.
* You cannot add a new option to Rosetta, recompile, and then try to restore a job from a previously generated checkpoint. This is due to the way the OptionCollection is created: each OptionKey is assigned an integer from a static counter at program load. If the integer assigned to a particular option key is different when trying to restore from a checkpoint, the OptionCollection will misbehave. (I can imagine a scenario in which an OptionCollection serializes itself as a string resembling the command line that would generate the state of that OptionCollection and then deserializes itself by reinterpreting that string: this would fix this limitation. Studying the OptionCollection more closely just now, however, makes my imagination seem unrealistic. The OptionCollection and the OptionKey system are configured in a way that makes an option-key-name-string-based serialization strategy impossible.)
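The last point can be shown with a deliberately simplified model of counter-assigned keys. This is a sketch under stated assumptions, not the OptionKey implementation: the helper `build_key_ids` is hypothetical, the option names are placeholders, and real registration order is determined by the build rather than by a list.

```python
import itertools


def build_key_ids(option_names_in_load_order):
    """Mimic a static counter: each key gets the next integer as it is registered."""
    counter = itertools.count()
    return {name: next(counter) for name in option_names_in_load_order}


# Original build: a checkpoint stores option values keyed by integer id.
old_build = build_key_ids(["in:file:s", "out:nstruct", "jd3:checkpoint_period"])
checkpoint = {old_build["out:nstruct"]: 100, old_build["jd3:checkpoint_period"]: 30}

# Recompiled build with one new option (placeholder name) registered earlier.
new_build = build_key_ids(
    ["in:file:s", "my:new_option", "out:nstruct", "jd3:checkpoint_period"]
)
id_to_name = {key_id: name for name, key_id in new_build.items()}
restored = {id_to_name[key_id]: value for key_id, value in checkpoint.items()}
print(restored)
```

In this toy model the restored values attach to the wrong options (`my:new_option` receives 100 and `out:nstruct` receives 30), which is the kind of misbehavior to expect when integer ids shift between builds.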
Merge pull request #3949 from RosettaCommons/roccomoretti/shorten_sewing_runs
Shorten legacy sewing runs.
A quick & dirty profile of the sewing integration tests indicates that a substantial amount of time is being spent in Model::model_end() calculations in the Hasher::transform_model() function. This is likely superfluous recalculation, as it's only being used as a loop end condition.
Fix the code to only calculate the end iterator once. It doesn't solve the integration test timeout issues, but it should reduce the runtime somewhat.
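The pattern is the usual one of hoisting a loop-invariant end computation out of the loop condition. A small Python analogue (the `ModelSketch` class and both functions are hypothetical stand-ins, not the sewing code) makes the difference in call counts visible:

```python
class ModelSketch:
    """Hypothetical stand-in for a model whose end position is costly to compute."""

    def __init__(self, items):
        self.items = list(items)
        self.end_calls = 0

    def model_end(self):
        self.end_calls += 1  # count how often the "expensive" call runs
        return len(self.items)


def transform_recomputing_end(model):
    i = 0
    while i < model.model_end():  # end recomputed on every iteration
        i += 1


def transform_caching_end(model):
    end = model.model_end()  # computed once, before the loop
    i = 0
    while i < end:
        i += 1


slow, fast = ModelSketch(range(1000)), ModelSketch(range(1000))
transform_recomputing_end(slow)
transform_caching_end(fast)
print(slow.end_calls, fast.end_calls)
```

Caching the end value is only safe when the loop body does not change the container, which matches the situation described above where the value serves purely as a loop end condition.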
Merge pull request #3945 from RosettaCommons/JackMaguire/FlatSetsAreNotSorted
Small MC HBNet touch-ups
- Addressing a flaw that @Haddox found in MC HBNet's saturation estimator for Nitrogen atoms.
- see HBNet.cc line 4169
- Adding sanity checks in debug mode
- see HBNet.cc line 3800
- Removing a small amount of redundant work in an inner loop
- see HBNet.cc line 4001
All of the test failures seem to be timeouts. I don't think any are tied to this PR but I am willing to take a closer look if you think one is fishy.