Merge pull request #514 from RosettaCommons/roccomoretti/autoruntime
Add -run:maxruntime_bufferfactor option to JD2
On the cluster that I use, there's a hard time limit cutoff - exceed your requested time, and your job gets mercilessly killed and you get an email about it. This isn't necessarily good for output files, and if you're running a lot of jobs your email can be spammed if you misjudged timing.
You could use -run:maxruntime, which stops Rosetta with a clean exit once you exceed the setting - but you have to leave yourself plenty of headroom, as JD2 only checks the limit when launching a new job. This means it's easy to run over - particularly if the cluster node is slower than the machine you did your test run on.
Enter the new -run:maxruntime_bufferfactor option. This allows you to specify a real number which is used as a multiple of the estimated (currently average) job runtime. If the time left before the maxruntime value is less than that multiple, you'll exit cleanly with a "too little time remaining" message. The estimated job runtime is calculated from the actual runtimes of the jobs in the current run.
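As a rough illustration of the rule described above (not the actual JD2 code), the check can be sketched in Python. The function name `should_stop`, its parameters, and the handling of the "no jobs finished yet" case are assumptions made for this sketch; only the comparison of remaining time against bufferfactor times the average runtime comes from the description.

```python
from statistics import mean


def should_stop(elapsed_s, maxruntime_s, bufferfactor, completed_runtimes_s):
    """Return True if a new job should NOT be launched.

    Hypothetical helper for illustration only; this is not Rosetta's API.
    """
    if not completed_runtimes_s:
        # Assumption: with no finished jobs there is no runtime estimate,
        # so only the plain maxruntime limit is applied.
        return elapsed_s >= maxruntime_s
    # The estimate is (currently) the average of the runtimes seen so far.
    estimated_job_s = mean(completed_runtimes_s)
    remaining_s = maxruntime_s - elapsed_s
    # Stop when less than bufferfactor * estimate remains before the limit.
    return remaining_s < bufferfactor * estimated_job_s
```

The check is meant to be evaluated only at the point where a new job would be launched, which matches when JD2 looks at maxruntime.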
So if you're launching an eight hour run on a cluster, you can simply add "-run:maxruntime 28800 -run:maxruntime_bufferfactor 1.5" to your commandline/options, and if there isn't enough time left to complete 1.5 jobs (on average) before the 8 hours are up, Rosetta will exit cleanly (though with an error message). No timing test runs or mental math required - it should work if your jobs take 33 s or if they take 3333 s. (Rosetta should stop at 7h59m36s and 7h24m24s, respectively, since the check only happens between jobs.)
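The two stop times quoted above can be reproduced with a small simulation. This is a sketch under simplifying assumptions: every job takes exactly the same time (so the average equals the job runtime), the first job always runs because there is no estimate yet, and the check happens only between jobs. The function names are made up for this example.

```python
def simulated_stop_time(maxruntime_s, bufferfactor, job_runtime_s):
    """Elapsed seconds at which the run exits, assuming constant job runtimes."""
    # Assumption: the first job is launched unconditionally (no estimate yet).
    elapsed = job_runtime_s
    # Between jobs: launch another one only if at least
    # bufferfactor * (average runtime) remains before maxruntime.
    while maxruntime_s - elapsed >= bufferfactor * job_runtime_s:
        elapsed += job_runtime_s
    return elapsed


def hms(seconds):
    """Format whole seconds as e.g. 7h59m36s."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h{minutes:02d}m{secs:02d}s"


for runtime in (33, 3333):
    print(runtime, hms(simulated_stop_time(28800, 1.5, runtime)))
# prints:
# 33 7h59m36s
# 3333 7h24m24s
```

With 33 s jobs the last launch happens at 28743 s (57 s left, more than the 49.5 s buffer) and the run exits after that job finishes at 28776 s. With 3333 s jobs the last launch is at 23331 s and the exit is at 26664 s.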
This option is off by default, even when maxruntime is on, so no test changes are expected. The default behavior should stay the same for all Job Distributor types, although I'm not sure how well things work for MPI and the like when the new option is active.