Pull Request №454 RosettaCommons/RFdiffusion/main ← mooreneural/RFdiffusion/main
Merge: 2d0c003df46b9db41d119321f15403dec3716cd9←2c21e73dc63d2c1653f9c8b1e6461bbe8b76a917
perf/accuracy: Flash Attention, torch-native SO(3), cosine schedule, DDIM, analytical g(t), acos fix
----------------
Merge commit message:
fix: numerically stable Log_torch near theta=pi; add benchmark script
The original Log_torch used theta/(2*sin(theta)) * skew, with
skew = R - R^T, over the whole range [0, pi]. Near theta=pi, sin(theta) -> 0
and the trace of a float32 R no longer resolves theta, so the theta computed
from acos((trace(R) - 1)/2) diverges from the theta encoded in the skew
elements, producing up to 10x rotation-matrix error in the worst case.
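
To illustrate why the acos path is fragile near pi, here is a minimal sketch
(not code from this PR). It perturbs cos(theta) and sin(theta) by one float32
rounding step and compares the resulting angle error. The helper names and the
choice of 2**-23 as the perturbation are assumptions made for illustration.

```python
import math

# Assumption: one float32 rounding step near 1.0 stands in for the
# precision lost when R is stored in float32.
FLOAT32_EPS = 2.0 ** -23


def acos_angle_error(theta: float, cos_error: float = FLOAT32_EPS) -> float:
    """Angle error when cos(theta) is off by cos_error before acos."""
    perturbed = max(-1.0, min(1.0, math.cos(theta) + cos_error))
    return abs(math.acos(perturbed) - theta)


def asin_angle_error(theta: float, sin_error: float = FLOAT32_EPS) -> float:
    """Angle error for theta in (pi/2, pi] when sin(theta) is off by sin_error."""
    perturbed = max(0.0, min(1.0, math.sin(theta) + sin_error))
    return abs((math.pi - math.asin(perturbed)) - theta)


if __name__ == "__main__":
    near_pi = math.pi - 1e-4
    print("acos path error:", acos_angle_error(near_pi))
    print("asin path error:", asin_angle_error(near_pi))
```

The slope of acos is unbounded at -1, so a tiny error in the cosine becomes a
much larger error in theta, while asin has slope close to 1 near 0.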
Fix: for cos(theta) < 0, estimate theta via pi - asin(||skew||/2) instead,
where ||skew|| is the norm of the axis vector extracted from R - R^T.
This magnitude, 2*sin(theta), remains accurate in float32 even near pi,
avoiding the trace instability entirely. Fall back to the R+I decomposition
(R+I = 2*outer(n,n), with n the unit rotation axis) only for the exact-pi
case where skew -> 0.
All arithmetic is done in float64 on-device; result is cast back to input dtype.
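
The branch logic described above can be sketched in plain Python (float64,
single matrix, no torch). This is an illustrative reconstruction under stated
assumptions, not the Log_torch code in this PR: the names exp_so3, log_so3 and
round_trip_error, the eps threshold, and the sign-alignment step in the
fallback are all hypothetical.

```python
import math


def exp_so3(w):
    """Rodrigues formula: rotation vector (3 floats) -> 3x3 rotation matrix."""
    theta = math.sqrt(sum(k * k for k in w))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = (k / theta for k in w)
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + x * x * v, x * y * v - z * s, x * z * v + y * s],
        [y * x * v + z * s, c + y * y * v, y * z * v - x * s],
        [z * x * v - y * s, z * y * v + x * s, c + z * z * v],
    ]


def log_so3(R, eps=1e-6):
    """3x3 rotation matrix -> rotation vector, for theta in [0, pi]."""
    # Axis vector of R - R^T; equals 2*sin(theta)*n for unit axis n.
    skew = [R[2][1] - R[1][2], R[0][2] - R[2][0], R[1][0] - R[0][1]]
    skew_norm = math.sqrt(sum(k * k for k in skew))
    cos_theta = max(-1.0, min(1.0, (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0))
    sin_theta = min(1.0, skew_norm / 2.0)

    if cos_theta < 0.0:
        # Obtuse branch: take theta from the skew magnitude, not the trace.
        theta = math.pi - math.asin(sin_theta)
    else:
        theta = math.acos(cos_theta)

    if skew_norm > eps:
        # Generic case: the axis direction comes straight from the skew vector.
        return [theta * k / skew_norm for k in skew]
    if cos_theta > 0.0:
        # Near identity: theta / (2*sin(theta)) -> 1/2.
        return [0.5 * k for k in skew]

    # Near pi: R + I = 2*outer(n, n); use the column with the largest diagonal.
    M = [[R[i][j] + (1.0 if i == j else 0.0) for j in range(3)] for i in range(3)]
    j = max(range(3), key=lambda i: M[i][i])
    col = [M[i][j] for i in range(3)]
    norm = math.sqrt(sum(k * k for k in col))
    axis = [k / norm for k in col]
    # Assumption: resolve the +/- n ambiguity using whatever skew signal is left.
    if sum(a * k for a, k in zip(axis, skew)) < 0.0:
        axis = [-a for a in axis]
    return [theta * a for a in axis]


def round_trip_error(w):
    """max |dR| for R -> log_so3 -> exp_so3 -> R, starting from R = exp_so3(w)."""
    R = exp_so3(w)
    R2 = exp_so3(log_so3(R))
    return max(abs(R[i][j] - R2[i][j]) for i in range(3) for j in range(3))
```

For example, round_trip_error can be called with a rotation vector whose norm
is just below pi to exercise the asin branch, or with [math.pi, 0.0, 0.0] to
exercise the R+I fallback.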
Round-trip error R -> Log_torch -> Exp_torch -> R:
  Before fix: max|dR| = 9.83 near theta=pi (catastrophic)
  After fix:  max|dR| = 2.25e-04 over the full range, mean = 1.58e-07
Also adds scripts/benchmark_pr454.py for measuring PR #454 improvements.