Revisions №60046

branch: master 「№60046」
Committed by: Vikram K. Mulligan
GitHub commit link: 「8458d94ef365c640」 「№2948」
Difference from previous tested commit: code diff
Commit date: 2018-02-15 20:31:05

Merge pull request #2948 from RosettaCommons/vmullig/fix_bluegene_crash_bug

Fix spiky memory usage in multi-threaded mode causing crashes on Blue Gene/Q system

The current implementation of threadsafe lazy loading errs on the side of avoiding long locks. As a result, threads can waste effort concurrently loading the same large objects from disk. This causes temporary spikes in memory usage, even though only one copy of each object is permanently stored.

The old scheme for loading rotamer libraries looked like this:
- obtain read lock
- check for existence of library, and return it if it exists, releasing read lock
- release read lock
- read library from disk (creating memory object) <-- Many threads might do this at once.
- obtain write lock
- add library to map if it isn't in there already <-- Only one thread does this. For the other threads, the memory object that was created is eventually discarded.
- release write lock
- obtain read lock
- return pointer stored in map, releasing read lock

This kept write-locking (which lets only one thread access the object) as short as possible. However, many threads might read a rotamer library from disk and set it up in memory before the first thread obtained its write lock and added the new rotamer library to the map of rotamer libraries.

This pull request revises the scheme slightly, so that it now looks like this:
- obtain read lock
- check for existence of library, and return it if it exists, releasing read lock
- release read lock
- obtain write lock
- check again for existence of library in map, and return it if it exists, releasing write lock
- read library from disk (creating memory object) <-- Only one thread does this now.
- add library to map
- release write lock
- obtain read lock
- return pointer stored in map, releasing read lock

With this change, more threads might sit idle for slightly longer while reads from disk take place. In exchange, every read from disk is productive, and every memory object created temporarily is actually stored and used. This avoids large spikes in memory usage from transiently duplicated memory objects. Those spikes were causing crashes on limited-memory systems such as the Blue Gene/Q system.

Tasks:
- [x] Fix the problem.
- [x] Confirm that rotamer libraries are only loaded once (by launching 40 threads on Jojo simultaneously and counting the number of reads of the OU3_TRP library).
- [x] Confirm that this addresses the memory spike and crash issue on the Blue Gene/Q system.
- [x] Check whether this has any significant impact on multithreaded performance on Blue Gene/Q. (It ought not to. Initialization might be very slightly slower, but it involves less disk access and fewer memory spikes. Runs after the lazily loaded objects have been loaded should be unaffected.)
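The revised scheme is a form of double-checked locking: the library's presence is checked once under a shared read lock, then checked again after taking the exclusive write lock. Only then does the thread read from disk. The Python sketch below illustrates this under several assumptions. Python's standard library has no reader-writer lock, so `ReadWriteLock` is a minimal hypothetical helper. `RotamerLibraryCache` and `fake_load_from_disk` are hypothetical stand-ins, not Rosetta's actual C++ classes or API. The demo mirrors the check described in the task list: 40 threads request `OU3_TRP` at once, and the sketch counts the disk reads.

```python
import threading
import time


class ReadWriteLock:
    """Hypothetical minimal reader-writer lock (not from Rosetta).

    Many readers may hold it at once; a writer holds it exclusively.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class RotamerLibraryCache:
    """Hypothetical lazily loaded, thread-safe map of library name -> library."""

    def __init__(self, load_from_disk):
        self._lock = ReadWriteLock()
        self._libraries = {}
        self._load_from_disk = load_from_disk

    def get(self, name):
        # Obtain read lock; return the library if it is already loaded.
        self._lock.acquire_read()
        try:
            if name in self._libraries:
                return self._libraries[name]
        finally:
            self._lock.release_read()
        # Obtain write lock: only one thread at a time gets past this point.
        self._lock.acquire_write()
        try:
            # Check again: another thread may have loaded it while we waited.
            if name in self._libraries:
                return self._libraries[name]
            # Read from disk while holding the write lock, so only one thread does this.
            self._libraries[name] = self._load_from_disk(name)
        finally:
            self._lock.release_write()
        # Obtain read lock and return the object stored in the map.
        self._lock.acquire_read()
        try:
            return self._libraries[name]
        finally:
            self._lock.release_read()


disk_reads = []
reads_lock = threading.Lock()


def fake_load_from_disk(name):
    """Hypothetical loader standing in for parsing a rotamer library file."""
    with reads_lock:
        disk_reads.append(name)
    time.sleep(0.01)  # simulate slow disk I/O so that the threads overlap
    return {"name": name, "data": [0.0] * 1000}


cache = RotamerLibraryCache(fake_load_from_disk)
threads = [threading.Thread(target=cache.get, args=("OU3_TRP",)) for _ in range(40)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(disk_reads))  # expected: 1
```

The trade-off matches the one described above. A slow disk read now happens while the write lock is held, so other threads wait. However, no thread builds a duplicate object that is later discarded. In C++, the same pattern is commonly expressed with `std::shared_mutex`, using `std::shared_lock` for reads and `std::unique_lock` for writes.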

...