You can clone the repo and use the code as you see fit, or follow the instructions in this article to create your own project.

- Home - List of features in this code example
- Users - List of ...
We build a 10K math preference dataset for Step-DPO, which can be downloaded from the following link. We use Qwen2, Qwen1.5, Llama-3, and DeepSeekMath models as the pre-trained weights and fine-tune ...
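Step-DPO trains on preference pairs over individual reasoning steps using the standard DPO objective. As a minimal sketch (not the repo's actual code; the function name and the assumption that per-step log-probabilities are already computed are illustrative), the per-pair loss is the negative log-sigmoid of the scaled margin between policy and reference log-ratios:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the (summed) log-probability the policy or the
    frozen reference model assigns to the chosen / rejected step.
    Loss = -log sigmoid(beta * margin), where the margin is how much
    more the policy prefers the chosen step than the reference does.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When policy and reference agree exactly, the margin is 0 and the
# loss is log(2); a positive margin drives the loss below log(2).
print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

The `beta` hyperparameter controls how strongly the policy is pulled away from the reference model; small values (e.g. 0.1) keep the fine-tuned model close to the pre-trained weights.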