Pull Request №454 RosettaCommons/RFdiffusion/main ← mooreneural/RFdiffusion/main
Merge: 2d0c003df46b9db41d119321f15403dec3716cd9←63f0e71f4edd2d79c46f04b603fe6e628680418c
perf/accuracy: Flash Attention, torch-native SO(3), cosine schedule, DDIM, analytical g(t), acos fix
----------------
Merge commit message:
perf/accuracy: Flash Attention, torch-native SO3, cosine schedule, DDIM, analytical g(t)
Attention (Attention_module.py):
- Replace hand-rolled einsum attention with F.scaled_dot_product_attention in
Attention, AttentionWithBias, and MSAColAttention. PyTorch dispatches to a fused
kernel such as Flash Attention when one is available on CUDA (20-40% speedup;
attention memory linear rather than quadratic in sequence length).
- AttentionWithBias passes the pairwise bias as attn_mask so it is folded into
the fused kernel rather than materializing a separate attention matrix.
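A minimal sketch of the bias-as-attn_mask idea, not the code in this PR. The tensor shapes and the helper names attention_einsum and attention_sdpa are assumptions made for illustration. A floating-point attn_mask is added to the scaled scores before the softmax, so the fused call should reproduce a hand-rolled biased attention up to rounding. Which fused backend is selected for a masked call depends on the PyTorch version and device.

```python
import math

import torch
import torch.nn.functional as F


def attention_einsum(q, k, v, bias):
    # Hand-rolled reference: softmax(q k^T / sqrt(d) + bias) v
    scale = 1.0 / math.sqrt(q.shape[-1])
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) * scale + bias
    return torch.einsum("bhqk,bhkd->bhqd", scores.softmax(dim=-1), v)


def attention_sdpa(q, k, v, bias):
    # A float attn_mask is added to the scaled scores inside the fused call.
    return F.scaled_dot_product_attention(q, k, v, attn_mask=bias)


torch.manual_seed(0)
B, H, L, D = 2, 4, 16, 8  # batch, heads, sequence length, head dim (arbitrary)
q, k, v = (torch.randn(B, H, L, D) for _ in range(3))
bias = torch.randn(B, H, L, L)  # stand-in for a pairwise bias

reference = attention_einsum(q, k, v, bias)
fused = attention_sdpa(q, k, v, bias)
max_diff = (reference - fused).abs().max().item()
```

Comparing the two outputs on a small random input is a cheap regression check before swapping the implementation in a real module.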
SO3 diffusion (igso3.py, diffusion.py, inference/utils.py):
- Add hat_batch(), Log_torch(), Exp_torch() -- on-device rotation ops using
the Rodrigues formula. Eliminates all scipy CPU round-trips during inference.
- Replace scipy_R calls in reverse_sample_vectorized() and diffuse_frames() with
the new torch-native equivalents (stay on GPU, no .cpu()/.numpy() transfers).
- Remove redundant scipy rotation normalization in get_next_frames(); rotation
matrices from rigid_from_3_points are already orthogonal.
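A sketch of on-device rotation ops built on the Rodrigues formula. The function names hat, exp_so3, and log_so3 are illustrative and are not claimed to match the hat_batch(), Exp_torch(), and Log_torch() added in this PR. The log map here assumes rotation angles in [0, pi).

```python
import torch


def hat(v):
    # Map axis-angle vectors (..., 3) to skew-symmetric matrices (..., 3, 3).
    x, y, z = v.unbind(-1)
    zero = torch.zeros_like(x)
    return torch.stack([
        torch.stack([zero, -z, y], -1),
        torch.stack([z, zero, -x], -1),
        torch.stack([-y, x, zero], -1),
    ], -2)


def exp_so3(v, eps=1e-8):
    # Rodrigues: R = I + sin(t)/t * K + (1 - cos(t))/t^2 * K^2, with t = |v|
    theta = v.norm(dim=-1, keepdim=True).clamp_min(eps)[..., None]
    K = hat(v)
    eye = torch.eye(3, dtype=v.dtype, device=v.device)
    return eye + torch.sin(theta) / theta * K + (1 - torch.cos(theta)) / theta**2 * (K @ K)


def log_so3(R, eps=1e-8):
    # Angle from the trace, axis from the antisymmetric part of R.
    trace = R.diagonal(dim1=-2, dim2=-1).sum(-1)
    theta = torch.acos(((trace - 1) / 2).clamp(-1.0, 1.0))
    w = torch.stack([
        R[..., 2, 1] - R[..., 1, 2],
        R[..., 0, 2] - R[..., 2, 0],
        R[..., 1, 0] - R[..., 0, 1],
    ], -1)  # equals 2 * sin(theta) * unit_axis
    scale = theta / (2 * torch.sin(theta)).clamp_min(eps)
    return scale[..., None] * w


torch.manual_seed(0)
angles = torch.linspace(0.1, 3.0, 5, dtype=torch.float64)
v = torch.randn(5, 3, dtype=torch.float64)
v = v / v.norm(dim=-1, keepdim=True) * angles[:, None]

R = exp_so3(v)
roundtrip_err = (log_so3(R) - v).abs().max().item()
orth_err = (R @ R.transpose(-1, -2) - torch.eye(3, dtype=torch.float64)).abs().max().item()
```

Everything stays on the device of the input tensor, so no .cpu()/.numpy() transfer is needed.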
Noise schedule (diffusion.py):
- Add cosine schedule (Nichol & Dhariwal, 2021). Enabled via
schedule_type="cosine"; b0/bT are ignored for this mode.
- Analytical g(t) for linear schedule: eliminates a per-step autograd call.
Formula: g(t) = sqrt(2 * sigma(t) * (min_b + t*(max_b - min_b))).
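A sketch of both schedule changes using only the standard library. The cosine part follows the alpha-bar definition from Nichol & Dhariwal (2021); how the PR maps it onto its own parameters is not shown here. The g(t) part assumes sigma(t) = min_sigma + t*min_b + t^2*(max_b - min_b)/2 for the linear schedule, so that sigma'(t) = min_b + t*(max_b - min_b) and g(t) = sqrt(d sigma^2/dt) gives the formula above. A finite difference stands in for the autograd call; the parameter values are illustrative only.

```python
import math


def cosine_alpha_bar(t, T, s=0.008):
    # alpha_bar(t) = f(t) / f(0), f(u) = cos(((u / T + s) / (1 + s)) * pi / 2) ** 2
    def f(u):
        return math.cos(((u / T + s) / (1 + s)) * math.pi / 2) ** 2
    return f(t) / f(0)


def cosine_betas(T, s=0.008, max_beta=0.999):
    # beta_t = 1 - alpha_bar(t) / alpha_bar(t - 1), clipped to max_beta
    return [
        min(1 - cosine_alpha_bar(t, T, s) / cosine_alpha_bar(t - 1, T, s), max_beta)
        for t in range(1, T + 1)
    ]


def sigma_linear(t, min_sigma, min_b, max_b):
    # ASSUMED form of sigma(t) for the linear schedule (see lead-in).
    return min_sigma + t * min_b + 0.5 * t**2 * (max_b - min_b)


def g_analytic(t, min_sigma, min_b, max_b):
    # g(t) = sqrt(2 * sigma(t) * sigma'(t))
    sigma = sigma_linear(t, min_sigma, min_b, max_b)
    return math.sqrt(2 * sigma * (min_b + t * (max_b - min_b)))


def g_numeric(t, min_sigma, min_b, max_b, h=1e-6):
    # Central finite difference of sigma(t)^2.
    def s2(u):
        return sigma_linear(u, min_sigma, min_b, max_b) ** 2
    return math.sqrt((s2(t + h) - s2(t - h)) / (2 * h))


params = dict(min_sigma=0.02, min_b=1.5, max_b=2.5)  # illustrative values only
gap = max(abs(g_analytic(t, **params) - g_numeric(t, **params)) for t in (0.1, 0.5, 0.9))
betas = cosine_betas(T=50)
```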
IGSO3 cache (diffusion.py):
- Add module-level _igso3_cache dict. Avoids repeated disk deserialization when
multiple Diffuser objects are created in the same process (batch inference).
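A sketch of the caching pattern, assuming the cached object is loaded with pickle and keyed by file path. The loader name load_igso3, the file name, and the load_count counter are hypothetical and exist only to show that the second call does not touch the disk.

```python
import os
import pickle
import tempfile

_igso3_cache = {}  # module-level: shared by every caller in the process
load_count = 0     # instrumentation for this sketch only


def load_igso3(path):
    # Deserialize once per path, then serve the same object from memory.
    global load_count
    if path not in _igso3_cache:
        with open(path, "rb") as fh:
            _igso3_cache[path] = pickle.load(fh)
        load_count += 1
    return _igso3_cache[path]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "igso3_vals.pkl")
    with open(path, "wb") as fh:
        pickle.dump({"cdf": [0.0, 0.5, 1.0]}, fh)
    first = load_igso3(path)
    second = load_igso3(path)  # served from _igso3_cache
```

Callers receive the same object, so anything that mutates it in place affects every other user of the cache.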
DDIM sampling (inference/utils.py):
- Add get_mu_xt_x0_ddim() implementing the deterministic DDIM update rule.
- Wire ddim=True flag through Denoise.__init__() -> get_next_pose() -> get_next_ca().
Setting ddim=True produces deterministic, lower-variance trajectories (no
noise is injected between steps) and is intended to allow inference with
fewer steps at comparable quality.
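A sketch of the deterministic DDIM update (eta = 0) on plain Python floats. The function name ddim_step and the alpha-bar values are illustrative; the PR's get_mu_xt_x0_ddim() operates on tensors and may differ in signature and details.

```python
import math


def ddim_step(x_t, x0_pred, alpha_bar_t, alpha_bar_prev):
    # Recover the noise implied by (x_t, x0_pred), then re-noise x0_pred to
    # the previous noise level without drawing any new random sample.
    out = []
    for xt, x0 in zip(x_t, x0_pred):
        eps = (xt - math.sqrt(alpha_bar_t) * x0) / math.sqrt(1 - alpha_bar_t)
        out.append(math.sqrt(alpha_bar_prev) * x0 + math.sqrt(1 - alpha_bar_prev) * eps)
    return out


x_t = [1.2, -0.7, 0.3]
x0_pred = [1.0, -1.0, 0.0]

step_a = ddim_step(x_t, x0_pred, alpha_bar_t=0.5, alpha_bar_prev=0.8)
step_b = ddim_step(x_t, x0_pred, alpha_bar_t=0.5, alpha_bar_prev=0.8)
final = ddim_step(x_t, x0_pred, alpha_bar_t=0.5, alpha_bar_prev=1.0)
```

Repeated calls with the same inputs give the same output, and stepping to alpha_bar_prev = 1 returns the predicted x0.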
Numerical stability (kinematics.py):
- Clamp the input to acos in get_ang() to [-1, 1] to prevent NaN when
floating-point rounding pushes the cosine slightly outside that range.
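A standard-library sketch of the failure mode and the clamp. The helper name safe_acos is illustrative. Note the difference in behavior: math.acos raises ValueError for an out-of-range input, whereas torch.acos returns NaN.

```python
import math
import sys


def safe_acos(c):
    # Clamp to the valid domain of acos before evaluating it.
    return math.acos(max(-1.0, min(1.0, c)))


# A cosine that rounding has pushed one ulp above 1.0.
just_over_one = 1.0 + sys.float_info.epsilon

try:
    math.acos(just_over_one)
    raw_failed = False
except ValueError:
    raw_failed = True  # out of domain without the clamp

angle = safe_acos(just_over_one)
opposite = safe_acos(-1.0 - 1e-12)
```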
Pull Request №452 RosettaCommons/RFdiffusion/main ← RosettaCommons/RFdiffusion/docs_video_tutorial_files
Merge: 9535f1938203a24937d7dadf0cb831d02cb5fc0e←92f3c4ca278165a390557fbfc5fc4e637695a35b
Adding files for the soon-to-be-released RFdiffusion video tutorial
----------------
Merge commit message:
Updated README
Added details about:
- the origins of the example used in the tutorial
- how the input file was generated
- how to install STRIDE
Pull Request №452 RosettaCommons/RFdiffusion/main ← RosettaCommons/RFdiffusion/docs_video_tutorial_files
Merge: 9535f1938203a24937d7dadf0cb831d02cb5fc0e←122a2157c1dd74d5737af18e5f6981cd5c37905a
Adding files for the soon-to-be-released RFdiffusion video tutorial
----------------
Merge commit message:
Adding files for the soon-to-be-released RFdiffusion video tutorial
The materials were created by Diego Lopez Mateos, Matthew Hvasta, and Kush Narang for the Tutorial Hackathon track of the 2026 Megathon event.
Pull Request №448 RosettaCommons/RFdiffusion/main ← haoyu-haoyu/RFdiffusion/refactor/extract-magic-numbers
Merge: 9535f1938203a24937d7dadf0cb831d02cb5fc0e←2a7aa2d3a49c4fd3b52bd4cb95e6cc61de90d0d1
refactor: extract magic numbers into named constants module
----------------
Merge commit message:
refactor: extract magic numbers into named constants module
Add rfdiffusion/constants.py centralizing magic numbers used across the
codebase, with documentation of each constant's meaning and provenance.
Replace inline values in 4 files:
- Cbeta reconstruction coefficients (-0.58273431, 0.56802827, -0.54067466)
used in util.py (2x), Embeddings.py, coords6d.py → CBETA_A/B/C
- Amino acid token indices (21=mask, 7=glycine) in run_inference.py
→ AA_MASK_TOKEN, AA_GLYCINE
Also documents additional constants (NO_CONTACT_DIST, CHAIN_BREAK_*,
SE3_*_SCALE, diffusion schedule params) for future refactoring.
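A sketch of how the named constants might be used at a call site. The constant names and values come from this message; the helper virtual_cbeta and the backbone coordinates are assumptions for illustration, and the real call sites operate on tensors.

```python
import math

# Values as listed above; names as introduced by rfdiffusion/constants.py.
CBETA_A = -0.58273431
CBETA_B = 0.56802827
CBETA_C = -0.54067466
AA_GLYCINE = 7
AA_MASK_TOKEN = 21


def sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def virtual_cbeta(n, ca, c):
    # Cb = A * (b x c) + B * b + C * c + Ca, with b = Ca - N and c = C - Ca
    b = sub(ca, n)
    c_vec = sub(c, ca)
    a = cross(b, c_vec)
    return tuple(CBETA_A * ai + CBETA_B * bi + CBETA_C * ci + cai
                 for ai, bi, ci, cai in zip(a, b, c_vec, ca))


# Illustrative backbone geometry with Ca at the origin (angstroms).
n, ca, c = (-0.525, 1.363, 0.0), (0.0, 0.0, 0.0), (1.526, 0.0, 0.0)
cb = virtual_cbeta(n, ca, c)
ca_cb_dist = math.dist(ca, cb)
```

With this geometry the reconstructed Cb sits roughly 1.5 angstroms from Ca, as expected for a carbon-carbon single bond.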
Pull Request №445 RosettaCommons/RFdiffusion/main ← haoyu-haoyu/RFdiffusion/fix/replace-deprecated-torch-cuda-amp
Merge: 9535f1938203a24937d7dadf0cb831d02cb5fc0e←a7cc837a05b7944bf06c22795d390adf8821b681
fix: replace deprecated torch.cuda.amp with torch.amp
----------------
Merge commit message:
fix: replace deprecated torch.cuda.amp with torch.amp
`torch.cuda.amp.autocast` and `torch.cuda.amp.GradScaler` were deprecated
in PyTorch 2.4 and will be removed in a future release. Replace them with
the device-explicit `torch.amp.autocast('cuda', ...)` and
`torch.amp.GradScaler('cuda', ...)` equivalents.
Files changed:
- rfdiffusion/Track_module.py (decorator on Str2Str.forward)
- env/SE3Transformer/se3_transformer/runtime/inference.py
- env/SE3Transformer/se3_transformer/runtime/training.py (2 instances)
Pull Request №446 RosettaCommons/RFdiffusion/main ← haoyu-haoyu/RFdiffusion/feat/add-input-validation
Merge: 9535f1938203a24937d7dadf0cb831d02cb5fc0e←ac7215240a48d6e41d78dc5ae18204d5a617115b
feat: add input validation for early error detection
----------------
Merge commit message:
feat: add input validation layer for early error detection
Add rfdiffusion/validation.py with validators for:
- PDB file existence and ATOM record format
- Contig string syntax (ranges, chain-residue specs)
- Model checkpoint existence
- Hotspot residue format (chain letter + number)
- Diffuser config parameters (T, partial_T bounds)
Validators are called in Sampler.initialize() and sample_init(), before
GPU allocation and model loading, so users get clear error messages
instead of cryptic tensor shape mismatches.
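A sketch of two such validators. The function names, the regular expression, and the exact bounds are assumptions for illustration; the checks in rfdiffusion/validation.py may be stricter or differ in detail.

```python
import re

# Assumed hotspot format: one chain letter followed by a residue number.
_HOTSPOT_RE = re.compile(r"[A-Za-z]\d+")


def validate_hotspots(hotspots):
    bad = [h for h in hotspots if not _HOTSPOT_RE.fullmatch(h)]
    if bad:
        raise ValueError(
            f"Invalid hotspot residue(s) {bad}: expected a chain letter "
            "followed by a residue number, e.g. 'A30'"
        )


def validate_partial_T(T, partial_T=None):
    # Assumed bounds: T >= 1 and, when given, 1 <= partial_T <= T.
    if T < 1:
        raise ValueError(f"T must be >= 1, got {T}")
    if partial_T is not None and not 1 <= partial_T <= T:
        raise ValueError(f"partial_T must be in [1, {T}], got {partial_T}")


validate_hotspots(["A30", "B7"])
validate_partial_T(T=50, partial_T=20)

errors = []
for check, args in ((validate_hotspots, (["30A"],)), (validate_partial_T, (50, 80))):
    try:
        check(*args)
    except ValueError as exc:
        errors.append(str(exc))
```

Because these checks only inspect strings and integers, they can run before any model weights are loaded.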
Pull Request №447 RosettaCommons/RFdiffusion/main ← haoyu-haoyu/RFdiffusion/perf/reduce-gpu-memory-usage
Merge: 9535f1938203a24937d7dadf0cb831d02cb5fc0e←5735aff0e6d512ffa26a4d787613820c2c9bf16f
perf: reduce GPU memory usage during inference
----------------
Merge commit message:
perf: reduce GPU memory usage during inference
- Move trajectory data (px0, x_t, seq, plddt) to CPU immediately after
each denoising step instead of accumulating it on GPU. This frees GPU
memory for the next forward pass and lowers peak memory usage by an
amount proportional to the number of diffusion steps.
- Remove two unused ComputeAllAtomCoords() instantiations in
get_next_ca() and get_next_pose() that were created every timestep
but never referenced, wasting memory and compute.
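A sketch of the first change on a toy loop. The function fake_denoise_step and the tensor shapes are placeholders, not the sampler's real API. It falls back to CPU when CUDA is unavailable, in which case the .cpu() call is a no-op.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def fake_denoise_step(x):
    # Stand-in for one model forward pass (hypothetical).
    return x * 0.9


x_t = torch.randn(8, 3, device=device)
trajectory = []

with torch.no_grad():
    for _ in range(5):
        x_t = fake_denoise_step(x_t)
        # Keep only a CPU copy, so the trajectory list does not hold
        # one device tensor per step.
        trajectory.append(x_t.detach().cpu())

stacked = torch.stack(trajectory)
```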
Pull Request №426 RosettaCommons/RFdiffusion/main ← RosettaCommons/RFdiffusion/updated_scaffoldguided_fix
Merge: ff20fbafefbdc9b9eb9423754fd418939c32e89e←ecf161b4e2579fdbb9f4a668ac39656bfbc2180a
Updated "Fixed issues with designing in scaffoldguided mode" original PR 386
----------------
Merge commit message:
Move cyclic_reses initialization to a helper function and call it for Sampler and ScaffoldedSampler
Pull Request №426 RosettaCommons/RFdiffusion/main ← RosettaCommons/RFdiffusion/updated_scaffoldguided_fix
Merge: ff20fbafefbdc9b9eb9423754fd418939c32e89e←723a66408c9a5f722812e7bbe13a116fa2fd4f42
Updated "Fixed issues with designing in scaffoldguided mode" original PR 386
----------------
Merge commit message:
Revert the flag-name change, so the flag remains scaffoldguided.scaffoldguided=True rather than scaffoldguided_enabled
Pull Request №410 RosettaCommons/RFdiffusion/main ← RosettaCommons/RFdiffusion/fix_test_diffusion
Merge: e22092420281c644b928e64d490044dfca4f9175←dd7643d6406cfbb1332eccaa36e6c0fc786251de
Fix workflow tests that are failing
----------------
Merge commit message:
Revert "Remove try block with except FileExistsError that isn't needed"
After running the tests, I found that the try/except block is needed after all.
This reverts commit ead721f32670eae379612d9e545690c3f288d251.