00127 Distributed Training with 🤗 Accelerate on Windows 11
Preface
This post describes how to use 🤗 Accelerate for distributed training.
Hugging Face GitHub homepage: https://github.com/huggingface
In this tutorial, learn how to customize your native PyTorch training loop to enable training in a distributed environment.
Operating system: Windows 11 Home, Chinese edition
Reference documentation
Setup
Get started by installing 🤗 Accelerate:
    pip install accelerate
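Once installed, you can sanity-check the setup with the accelerate env command, which prints the detected platform and the current default configuration (handy for confirming what Accelerate sees on Windows):

    accelerate env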
Then import and create an Accelerator object. The Accelerator will automatically detect your type of distributed setup and initialize all the necessary components for training. You don’t need to explicitly place your model on a device.
    from accelerate import Accelerator

    accelerator = Accelerator()
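If you do need a device handle, for example for tensors you create by hand, the Accelerator exposes the device it selected; a minimal sketch:

    import torch

    # accelerator.device is the device Accelerate chose (CPU, a CUDA GPU, ...),
    # so manually created tensors can be placed consistently.
    x = torch.zeros(4, device=accelerator.device)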
Prepare to accelerate
The next step is to pass all the relevant training objects to the prepare method. This includes your training and evaluation DataLoaders, a model and an optimizer:
    train_dataloader, eval_dataloader, model, optimizer = accelerator.prepare(
        train_dataloader, eval_dataloader, model, optimizer
    )
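For concreteness, here is a minimal self-contained sketch of the objects being prepared; the toy linear model and random tensors are illustrative stand-ins, not part of the original tutorial:

    # Minimal sketch, assuming a toy regression task; every object below
    # is a stand-in for your real dataset, model, and optimizer.
    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from accelerate import Accelerator

    accelerator = Accelerator()

    dataset = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
    train_dataloader = DataLoader(dataset, batch_size=8, shuffle=True)
    eval_dataloader = DataLoader(dataset, batch_size=8)
    model = torch.nn.Linear(10, 1)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    # prepare() wraps each object for the detected setup and returns them
    # in the same order they were passed in.
    train_dataloader, eval_dataloader, model, optimizer = accelerator.prepare(
        train_dataloader, eval_dataloader, model, optimizer
    )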
Backward
The last addition is to replace the typical loss.backward() in your training loop with 🤗 Accelerate’s backward method:
    for epoch in range(num_epochs):
        for batch in train_dataloader:
            outputs = model(**batch)
            loss = outputs.loss
            accelerator.backward(loss)

            optimizer.step()
            optimizer.zero_grad()
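Since prepare() also shards the eval_dataloader, each process only sees part of the evaluation set, so per-process results need to be collected before computing metrics. A minimal sketch using accelerator.gather, continuing the transformers-style loop above:

    import torch

    # Each process evaluates its own shard; gather() collects the
    # per-process loss tensors so every process sees all of them.
    model.eval()
    total_loss = 0.0
    for batch in eval_dataloader:
        with torch.no_grad():
            outputs = model(**batch)
        losses = accelerator.gather(outputs.loss.reshape(1))
        total_loss += losses.sum().item()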
As you can see in the following code, you only need to add four additional lines of code to your training loop to enable distributed training!
    + from accelerate import Accelerator

    + accelerator = Accelerator()

    - device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
    - model.to(device)

    + train_dataloader, eval_dataloader, model, optimizer = accelerator.prepare(
    +     train_dataloader, eval_dataloader, model, optimizer
    + )

      for epoch in range(num_epochs):
          for batch in train_dataloader:
    -         batch = {k: v.to(device) for k, v in batch.items()}
              outputs = model(**batch)
              loss = outputs.loss
    -         loss.backward()
    +         accelerator.backward(loss)

              optimizer.step()
              optimizer.zero_grad()
Train
Once you’ve added the relevant lines of code, launch your training in a script or a notebook like Colaboratory.
Train with a script
If you are running your training from a script, run the following command to create and save a configuration file:
    accelerate config
Then launch your training with:
    accelerate launch train.py
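Values from the saved configuration can also be overridden per run on the command line; --num_processes is a real accelerate launch flag, and the value of 2 below is purely illustrative:

    # Launch with two processes for this run, overriding the saved config.
    accelerate launch --num_processes 2 train.py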
Train with a notebook
🤗 Accelerate can also run in a notebook if you’re planning on using Colaboratory’s TPUs. Wrap all the code responsible for training in a function, and pass it to notebook_launcher:
    from accelerate import notebook_launcher

    notebook_launcher(training_function)
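notebook_launcher can also forward positional arguments to your function and control how many processes it spawns; a minimal sketch, where training_function and its learning-rate argument are placeholders:

    from accelerate import notebook_launcher

    def training_function(learning_rate):
        # Build the Accelerator, prepare objects, and run the loop here.
        print(f"training with lr={learning_rate}")

    # args forwards positional arguments; num_processes=8 targets the
    # eight cores of a Colab TPU runtime (illustrative value).
    notebook_launcher(training_function, args=(3e-5,), num_processes=8)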
For more information about 🤗 Accelerate and its rich features, refer to the documentation.
Conclusion
The one hundred and twenty-seventh blog post is finished. So happy!!!!
Today, too, is a day full of hope.