Trainer Node Guide
How to train the model defined by the Task Creator/Admin
This guide provides step-by-step instructions for setting up SoraEngine's trainer node and automating the training process. By the end, you will have trained a Hugging Face model in a privacy-preserving fashion and contributed to a global model that is ready for inference.
Make sure you have completed the prerequisite tasks.
Connect your MetaMask wallet and log in to the dashboard.
You will see the API keys for the authenticated user. You will pass those API keys as arguments to the Client automation module.
At the moment, API key usage is disabled for testing purposes, so you can proceed directly to the next step.
Before proceeding, make sure you have cloned the client repository described in the "AI Layer Repo" section.
The repository includes predefined scripts that preprocess datasets for training different models.
In the dev environment, we use a micro version of nano mistral, which is a text-completion model. Use preprocess_nanoArticles.py to process the first 1000 lines of the dataset into the required format.
The dataset being processed comes from Hugging Face.
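As a rough illustration of this kind of preprocessing step, the sketch below reads the first 1000 records from any iterable of dictionaries and writes the non-empty ones to a JSON Lines file. It is not the contents of preprocess_nanoArticles.py: the function name `write_first_n_records`, the `text` field, and the JSON Lines output format are all assumptions made for the example, and the format the trainer actually expects may differ.

```python
import json
from itertools import islice
from pathlib import Path


def write_first_n_records(records, out_path, limit=1000, text_field="text"):
    """Read the first `limit` records and write the non-empty ones as JSON Lines.

    `records` is any iterable of dict-like rows. The field name and the
    output format are assumptions for this sketch, not SoraEngine's format.
    Returns the number of records written.
    """
    out_path = Path(out_path)
    # Create the output directory if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)

    written = 0
    with out_path.open("w", encoding="utf-8") as fh:
        # islice stops after `limit` records without loading the whole dataset.
        for record in islice(records, limit):
            text = str(record.get(text_field, "")).strip()
            if not text:
                continue  # skip rows with no usable text
            fh.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            written += 1
    return written
```

If the Hugging Face `datasets` library is installed, a streaming dataset such as `load_dataset("<dataset-name>", split="train", streaming=True)` could be passed as `records`, where `<dataset-name>` stands for whichever dataset the task uses.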
We have developed a streamlined automation script that handles all necessary configurations and initiates the trainer node, enabling it to connect with the aggregator and request tasks. The script requires the following parameters as arguments:
Client ID
Model name or path
Data path (location of the dataset)
Workspace directory (where configuration files will be retrieved)
Training mode (defined by the task creator) – SoraEngine supports standard SFT training, as well as efficient LoRA PEFT training and quantization
TrainingServer (defines the aggregator node endpoint to connect to)
SoraAccess keys (for authentication)
SoraBucketName (specifies the directory from which the client’s configuration files will be fetched)
This automation simplifies the process of setting up and running trainer nodes within the SoraEngine ecosystem. 🚀
In the dev/test environment, the automation can run without access keys. By default, configuration files are stored in the workspace/SoraWorkspace directory.
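To make the parameter list above concrete, here is a minimal sketch of how such a set of arguments could be declared with Python's `argparse`. The flag names, the function names, and the example values (including the endpoint `localhost:8080`) are hypothetical and chosen only for illustration; consult the automation script in the client repository for its real interface. The sketch assumes the access keys and bucket name are optional, matching the dev/test behavior described above, and uses workspace/SoraWorkspace as the default workspace.

```python
import argparse

# Default workspace location mentioned in this guide.
DEFAULT_WORKSPACE = "workspace/SoraWorkspace"


def build_parser():
    """Build a parser mirroring the parameters listed above (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Illustrative argument set for a trainer node launcher."
    )
    parser.add_argument("--client-id", required=True, help="Client ID")
    parser.add_argument("--model-name-or-path", required=True, help="Model name or path")
    parser.add_argument("--data-path", required=True, help="Location of the dataset")
    parser.add_argument("--workspace-dir", default=DEFAULT_WORKSPACE,
                        help="Directory from which configuration files are retrieved")
    # Left as a free-form string because the valid values are defined by the task creator.
    parser.add_argument("--training-mode", required=True,
                        help="Training mode defined by the task creator")
    parser.add_argument("--training-server", required=True,
                        help="Aggregator node endpoint to connect to")
    # Optional here because the dev/test environment runs without access keys.
    parser.add_argument("--sora-access-key", default=None, help="Access key for authentication")
    parser.add_argument("--sora-bucket-name", default=None,
                        help="Directory holding the client's configuration files")
    return parser


def parse_args(argv=None):
    """Parse `argv` (or sys.argv[1:] when None) into a namespace."""
    return build_parser().parse_args(argv)


# Example with made-up values; no access key is supplied, as in dev/test.
example = parse_args([
    "--client-id", "client-1",
    "--model-name-or-path", "path/to/model",
    "--data-path", "data/train.jsonl",
    "--training-mode", "lora",
    "--training-server", "localhost:8080",
])
print(example.workspace_dir)
```

Keeping the access key optional lets the same entry point serve both the dev/test setup and an authenticated deployment, at the cost of needing an explicit check wherever authentication is required.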