Merge pull request #4997 from RosettaCommons/vmullig/non_recursive_refold
Attempt to remove recursion in internal coordinate updates
Our current internal coordinate update logic can result in call stacks thousands of layers deep. When stack size is limited, this can result in stack overflows. Andy and I have been running into this working with larger poses in threads on MacOS, where the stack size of non-primary threads is limited to 512 kB (something that it's not possible to override when using std::threads).
Switching to non-recursive logic will probably have benefits for performance even outside of a multi-threaded context, so let's give this a try.
This is complicated by the fact that, given a parent atom P and children C1...CN, the stub used to update C1 is modified by the updating operation _and then_ passed to C2 to update it (where it changes again). So the existing recursive logic was much easier than a non-recursive iterative approach, since it's not straightforward to store the data that are needed to compute each child. The new logic is to iterate over atoms at a given level of depth, updating those atoms' coordinates while constructing two lists: a list of unique stubs with which to update their children, and an ordered list of pairs of (child atom, pointer to stub to use to update). In the next iteration of batches of parents, the old children become the new parents and the old children's stublist becomes the new parents' stublist, parents' coordinates are updated in sequence (suitably modifying the stubs in the parent stublist), and a new children's stublist and children list is populated. This continues until a generation produces no children. This is summarized below:
Old:
- Start with root atom and its stub.
- Recursive function:
- Update current atom with stub (updating stub in the process).
- Construct child's starting stub.
- Loop over all children:
- Recursively call this function, passing in a child and the child's stub (both of which will be modified).
New:
- Allocate space for list of parents' stubs, ordered list of parents.
- Allocate space for list of children's stubs, ordered list of children.
- Put root atom and its stub into ordered list of parents and list of parents' stubs, respectively.
- Do:
- Loop over current parents:
- For each parent, update current atom with stub (updating stub, owned by parent stublist, in the process).
- Construct child's starting stub, and store it in the children's stub list.
- For each child, store pair of (child, pointer to child's stub in children's stub list) in children's list.
- Delete parents' stub list and list of current parents.
- Children's stub list and children's list become parents' stub list and parents list, respectively. (Done with dual buffers and pointer swap for efficiency).
- Break if new parents list is empty (i.e. no children from last round).
A few details:
- I'm avoiding owning pointers due to the cost of incrementing and decrementing reference counts.
- I am using raw pointers, but only for accessing objects that are owned by something else (where the lifetime is longer than the function in which I use raw pointers), which is no more dangerous than using references, or for objects owned by std::deques where the lifetime is longer than the raw pointers (again, no more dangerous than using references).
- Since appending to a deque doesn't invalidate references to existing elements, storing pointers to those elements is safe.
Tasks:
- [x] Switch internal logic to a non-recursive algorithm based on ordinary iteration.
- [x] Debug.
- [x] Check unit tests.
- [x] Check integration tests.
- [x] Check performance tests.
- [x] Check profile tests.
- [x] Check scientific tests --> simple_cycpep_predict unchanged.
- [x] Beauty.