Revisions №60145

branch: master 「№60145」
Committed by: Jared Adolf-Bryfogle
GitHub commit link: 「4441cacd03a9ff91」「№3080」
Difference from previous tested commit: code diff
Commit date: 2018-04-10 11:51:20

Merge pull request #3080 from RosettaCommons/jadolfbr/simple_metrics

SimpleMetrics for analysis, filters, and features

This PR adds **SimpleMetrics** to Rosetta, which I'm hoping will be the way we write metrics and filters going forward. They are easy to write, easy to use, and (IMO) powerful, as you can use them to analyze data, use all of them for filtering, and/or create features reporter databases. I have not rewritten all of the existing filters, as they do some really crazy stuff. Additionally, we can use this to revive the scientific benchmarks.

These have all been tested.

Extras:
- MinMover now works with the new MoveMapFactory in RS
- utility to define/limit RS option xsd as a vector of strings
- All simple metrics have code_templates

ToDo:
- [x] Test all code
- [x] Write integration tests
- [x] Write unit test suite
- [x] Write Documentation

# SimpleMetrics

Base Class: **SimpleMetric**

Main SubClasses:
- **RealMetric**: Returns `Real`
- **StringMetric**: Returns `string`
- **CompositeRealMetric**: Returns `map< string, Real >`
- **CompositeStringMetric**: Returns `map< string, string >`

## SimpleMetric code
- `core/simple_metrics`
- `core/simple_metrics/metrics`
- `protocols/analysis/simple_metrics`

The two main methods of these classes are `calculate( const & pose ) const`, which returns a value (of the type listed above), and `void apply( & pose, prefix="", suffix="") const`, which runs calculate and adds the data to the pose.

- `calculate( const pose )`: The calculate method is basically the only method you will need to code in your derived simple metric. The return type depends on the subclass you are deriving from.
- `apply( pose, prefix="", suffix="")`: This method adds the data to the pose via setPoseExtraScore, so EVERY metric can be easily output into the score file. This apply method is defined in each main class, **so you do not need to write it and can always rely on it**. The metric name that is output as the score tag is `prefix+metric()+suffix`.
Additionally, each value from a composite metric is named using its key from the map.

## Implemented SimpleMetrics

### Main

**RMSDMetric**:
- Calculate the RMSD between an input (native) or reference pose and the current pose. Accepts a `ResidueSelector`. Works for proteins/ligands/glycans/etc. Can accept TWO `ResidueSelector`s in order to compare non-matching regions. More options are available code-wise. Many options are accepted as a string to determine HOW the calculation is done. The default is to be robust against non-matching atoms in a residue, allowing comparison of post-translational modifications and similar mutations (option `robust`). **DOES NOT SUPERIMPOSE** (nor should it).

RMSD Types (`rmsd_type` rosetta_script option)
```
rmsd_protein_bb_heavy,
rmsd_protein_bb_heavy_including_O,
rmsd_protein_bb_ca,
rmsd_sc_heavy,
rmsd_sc,
rmsd_all_heavy,
rmsd_all,
```

**DihedralDistanceMetric**:
- Return the normalized BB dihedral angle distance from directional statistics, in degrees. This was used for North/Dunbrack CDR clustering, but can be useful for comparing loops or regions of interest. It is a good internal comparison of structural change: lever-arm effects do not affect the measurement much, so natural fluctuation is compared well. It also does not require a superposition. Accepts a `ResidueSelector`. Works for proteins and glycans.

**TotalEnergyMetric**
- Returns the total energy of the score function. Can set a `ResidueSelector` to limit the calculation to only those residues (yes, the hbond energy is decomposed). Can set an input structure or reference pose to get **DELTAS**.

**CompositeEnergyMetric**
- Returns the energy of each nonzero score term in a score function. Can set a `ResidueSelector` to limit the calculation to only those residues (hbonds decomposed). Can set an input structure or reference pose to get **DELTAS**.
**SasaMetric**:
- Calculate the SASA of the pose or of a set of residues from a `ResidueSelector`.

### Utility

**TimingMetric**
- Safely output the time passed from construction until calculate/apply, in minutes. Options for hours. Useful for getting runtimes or averaging runtimes for protocols. Placing two of them around a mover or set of movers allows you to calculate the time spent between them.

**SelectedResiduesMetric**
- Output residues selected by a `ResidueSelector`, in either pose or PDB numbering. Useful for getting the pose numbers of residues of interest.

**SelectedResiduesPyMOLMetric**
- Output residues selected by a `ResidueSelector` as a PyMOL selection. Useful for very complex selections such as Layer Selection.

# RunSimpleMetrics

Run the metrics defined in the `<SIMPLE_METRICS>` block (or defined as sub tags, or set via code). The RunSimpleMetrics mover takes a list of metrics and runs them, adding the data to the pose for direct output into the score file using any set prefix/suffix. In this way, we can use them to show differences between movers or sets of movers if desired. JSON-format score files are recommended.

# SimpleMetricFilter

Takes a single metric to run as a filter (including strings and composites).

## General Use

You are required to give a `cutoff` value and a `comparison_type` (eq, ne, lt, gt, lt_or_eq, gt_or_eq) to control the behavior of the filter (basically, when to return True). We do not define the cutoffs within the metrics, as filters currently do. If you want to write a filter that does this, write a `SimpleMetric` and use it within a classic filter with a cutoff.

Comparison is done as `value comparison_type cutoff`. So if your `cutoff=4.0` for RMSDMetric and your `comparison_type` is lt, we return true if the value is less than 4.0.

## StringMetrics

String metrics only work with eq and ne, and this is checked. Instead of `cutoff`, StringMetrics require a `match` option to be set, which is simply a string.
In this way, you can filter unwanted SS, sequence changes, or some other metric.

## CompositeMetrics

Composite metrics require an additional option, `composite_action`. This can be `any`, `all`, or a specific composite type. For example, the `CompositeEnergyMetric` can return the energy value, or the energy delta relative to an input pose, for each energy term. If we set this to `any`, then we return TRUE if ANY composite value matches. `all` only returns TRUE if all of the values match our filter criteria. Additionally, we can give, for example, `chainbreak` to filter specifically on the chain break term.

# SimpleMetricFeatures

The SimpleMetricFeatures reporter simply takes a list of metrics. You can control the table that they get written to and can run the same set multiple times. If the table columns do not match on subsequent runs, we will exit with an informative error message.

## Documentation:
- https://www.rosettacommons.org/docs/wiki/scripting_documentation/RosettaScripts/RosettaScripts#rosettascript-sections_simple_metrics
- https://www.rosettacommons.org/docs/wiki/scripting_documentation/RosettaScripts/SimpleMetrics/SimpleMetrics
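
As a rough illustration of the `calculate`/`apply` split and of the `value comparison_type cutoff` rule described above, here is a minimal standalone C++ sketch. It is not the Rosetta code: `ToyPose`, `RealMetricSketch`, `ToyRmsdMetric`, `passes_cutoff`, and `passes_composite` are hypothetical names invented for this example, the pose is reduced to a map of extra scores, and exact floating-point equality for `eq`/`ne` as well as the behavior on an empty composite map are assumptions.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Toy stand-in for a pose (hypothetical): only the extra-score table matters here.
struct ToyPose {
    double rmsd_to_native = 0.0;                 // made-up payload for the demo
    std::map<std::string, double> extra_scores;  // what would reach the score file
};

// Real-valued metric: subclasses implement calculate(); apply() is shared.
class RealMetricSketch {
public:
    virtual ~RealMetricSketch() = default;
    virtual double calculate(const ToyPose& pose) const = 0;
    virtual std::string metric() const = 0;  // name used in the score tag

    // Runs calculate() and stores the value under prefix + metric() + suffix.
    void apply(ToyPose& pose, const std::string& prefix = "",
               const std::string& suffix = "") const {
        assert(!metric().empty());  // a metric needs a name to form a score tag
        pose.extra_scores[prefix + metric() + suffix] = calculate(pose);
    }
};

// A derived metric only has to provide calculate() (plus its name).
class ToyRmsdMetric : public RealMetricSketch {
public:
    double calculate(const ToyPose& pose) const override { return pose.rmsd_to_native; }
    std::string metric() const override { return "rmsd"; }
};

// Evaluates `value comparison_type cutoff` for the six comparison types.
// Assumption: eq/ne use exact floating-point comparison.
inline bool passes_cutoff(double value, const std::string& comparison_type, double cutoff) {
    if (comparison_type == "eq") return value == cutoff;
    if (comparison_type == "ne") return value != cutoff;
    if (comparison_type == "lt") return value < cutoff;
    if (comparison_type == "gt") return value > cutoff;
    if (comparison_type == "lt_or_eq") return value <= cutoff;
    if (comparison_type == "gt_or_eq") return value >= cutoff;
    throw std::invalid_argument("unknown comparison_type: " + comparison_type);
}

// Composite filtering: composite_action is "any", "all", or a specific key.
inline bool passes_composite(const std::map<std::string, double>& values,
                             const std::string& composite_action,
                             const std::string& comparison_type, double cutoff) {
    if (composite_action == "any" || composite_action == "all") {
        const bool want_all = (composite_action == "all");
        for (const auto& kv : values) {
            const bool ok = passes_cutoff(kv.second, comparison_type, cutoff);
            if (want_all && !ok) return false;  // one failure sinks "all"
            if (!want_all && ok) return true;   // one match satisfies "any"
        }
        // Assumption: an empty map passes "all" and fails "any".
        return want_all;
    }
    const auto it = values.find(composite_action);
    if (it == values.end()) {
        throw std::invalid_argument("no composite value named " + composite_action);
    }
    return passes_cutoff(it->second, comparison_type, cutoff);
}
```

In this sketch, calling `apply(pose, "pre_", "_post")` on a `ToyRmsdMetric` stores the value under the tag `pre_rmsd_post`, and `passes_cutoff(value, "lt", 4.0)` mirrors the `cutoff=4.0`, `comparison_type=lt` example from the General Use section.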

Rocco Moretti 7 years
It looks like the new simple_metric_features and simple_metric_filter integration tests are in a "perma-broke" state.

Jared Adolf-Bryfogle 7 years
Thanks Rocco, I'll fix this after the commit freeze.

Summary

...