Accelerate documentation
Start Here!
Please use the interactive tool below to get started with a particular feature of Accelerate and learn how to use it. For each feature, it provides a code diff, an explanation of what is going on, and useful links to explore further in the documentation.
Most code examples start from the following Python training loop before integrating Accelerate in some way:
```python
for batch in dataloader:
    optimizer.zero_grad()
    inputs, targets = batch
    inputs = inputs.to(device)
    targets = targets.to(device)
    outputs = model(inputs)
    loss = loss_function(outputs, targets)
    loss.backward()
    optimizer.step()
    scheduler.step()
```