
Revisions №60618

branch: master 「№60618」
Committed by: Andrew Leaver-Fay
GitHub commit link: 「443610ff4bc39f2b」 「№3792」
Difference from previous tested commit:  code diff
Commit date: 2019-02-11 11:45:47

Merge pull request #3792 from RosettaCommons/aleaverfay/jd3_fix_archive_crosstalk_deadlock

Fix deadlock bug in archive-to-archive communication in JD3

Previously, if archive 1 finished outputting all of its results, the master node could assign it to retrieve a result from archive 2, which had not yet finished outputting its own results. Archive 1 would then send a message to archive 2 with MPI_Send and block until archive 2 responded. While it was waiting, output work could get assigned to archive 1. When archive 2 got back to the master node, the master node would assign it to retrieve a result from archive 1. At that point archives 1 and 2 are both blocked in MPI_Send, each waiting on the other. This is deadlock: MPI_Send is not guaranteed to return until the corresponding MPI_Recv has been called on the remote host. Oops.

The solution is simple: do not allow archives to talk to each other until the very end of the simulation, when it is possible to guarantee that no work will arrive for archive 1 while it waits to hear from archive 2.
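The event sequence described above can be replayed in a small toy model. This is only an illustrative sketch, not the JD3 code and not an MPI program: the names `has_wait_cycle`, `replay`, and `defer_crosstalk` are hypothetical, and a blocking send is modelled simply as an edge in a "waiting on" graph. A cycle in that graph corresponds to two archives that are each stuck in a send to the other.

```python
# Toy model of the archive cross-talk deadlock (hypothetical names throughout;
# this does not call MPI and is not the Rosetta JD3 implementation).

def has_wait_cycle(waiting_on):
    """Return True if following 'blocked sending to' edges loops back on itself."""
    for start in waiting_on:
        seen = set()
        node = start
        while node in waiting_on:
            if node in seen:
                return True
            seen.add(node)
            node = waiting_on[node]
    return False


def replay(defer_crosstalk):
    """Replay the sequence from the commit message; return True on deadlock."""
    pending_outputs = {1: 0, 2: 1}  # archive 2 still has a result to write
    waiting_on = {}                 # rank -> rank it is blocked sending to
    deferred = []                   # (requester, source) pairs held until the end

    def request_retrieval(requester, source):
        if defer_crosstalk:
            deferred.append((requester, source))  # assumed fix: postpone
        else:
            waiting_on[requester] = source        # blocking send starts now

    request_retrieval(1, 2)   # archive 1 is idle, archive 2 is still busy
    pending_outputs[1] += 1   # output work is assigned to archive 1 meanwhile
    pending_outputs[2] -= 1   # archive 2 finishes and reports to the master
    request_retrieval(2, 1)   # the master pairs archive 2 with archive 1

    if has_wait_cycle(waiting_on):
        return True

    # End of the run: archives that are not blocked drain their output queues,
    # then deferred transfers run one at a time against a listening peer.
    for rank in pending_outputs:
        if rank not in waiting_on:
            pending_outputs[rank] = 0
    for requester, source in deferred:
        waiting_on[requester] = source
        if has_wait_cycle(waiting_on):
            return True
        del waiting_on[requester]  # the matching receive was posted; send returns
    return False


print(replay(defer_crosstalk=False))  # True: both archives block in a send
print(replay(defer_crosstalk=True))   # False: cross-talk waits until the end
```

The model assumes that, once cross-talk is deferred, the transfers are serviced one at a time; how the real job distributor orders them is not stated in the commit message.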

...