* 2024-04-22, [https://dl.acm.org/doi/abs/10.1145/3627703.3650085 Just-In-Time Checkpointing: Low Cost Error Recovery from Deep Learning Training Failures] | * 2024-04-22, [https://dl.acm.org/doi/abs/10.1145/3627703.3650085 Just-In-Time Checkpointing: Low Cost Error Recovery from Deep Learning Training Failures] |