Architecture
SoraChain AI is built on the CMU (Collaborative Model Update) framework, a framework for hosting and training publicly available machine learning models on the blockchain. The goal is to democratize AI by allowing anyone to contribute data to improve the models, and anyone to use the models for free to get predictions.
The architecture is structured into three distinct layers, each responsible for specific functions. These layers are organized within subnets that work together to enable federated learning on a blockchain-based decentralized infrastructure. Each subnet contains three layers: the SORA Client, the SORA AGM Layer, and the Blockchain Layer. Here’s a detailed breakdown:
The SORA Client layer is where the federated learning process takes place at the edge, on the user’s device or machine. This layer is responsible for:
Data Locality: Keeps the data on the device (privacy-preserving), preventing it from being sent to a centralized server.
Model Updates: Instead of sharing data, this layer computes the model update locally by training on the edge device’s dataset.
Communication with Compute Server: Once the local model is updated, the Compute Client sends these updates to the Compute Server for aggregation and processing (a sketch of this local-update flow follows this list).
Security & Privacy: The client layer is designed to ensure that no raw data leaves the user’s device. It leverages secure aggregation techniques to protect sensitive data.
The SORA AGM Layer (the Compute Server) acts as the intermediary between all the Compute Clients and the Blockchain Layer. It is responsible for:
Model Aggregation: The server collects local model updates from the Compute Clients, aggregates them (e.g., by averaging the updates), and creates a global model update (see the aggregation sketch after this list).
Communication Management: The Compute Server coordinates communication between multiple clients within the subnet, ensuring synchronized model updates and preventing conflicts.
Scaling: This layer handles the scaling of computation across multiple nodes and ensures efficient processing even as the number of clients grows.
Resource Management: It dynamically allocates compute resources to handle varying workloads based on client activity and model complexity.
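As an illustration of the aggregation step, the sketch below averages client updates into a single global update using a FedAvg-style scheme weighted by each client's sample count. The function names and the weighting choice are assumptions for illustration, not the project's confirmed aggregation rule.

```python
import numpy as np

def aggregate_updates(client_updates, client_sample_counts):
    """Combine local updates into one global update, weighting clients by data size."""
    total = sum(client_sample_counts)
    weighted = [u * (n / total) for u, n in zip(client_updates, client_sample_counts)]
    return np.sum(weighted, axis=0)

def apply_global_update(global_weights, global_update):
    """Produce the next global model version from the aggregated update."""
    return global_weights + global_update

# Example: three clients in the subnet report updates of the same shape as the global model.
updates = [np.array([0.1, -0.2]), np.array([0.05, 0.0]), np.array([-0.02, 0.3])]
counts = [100, 40, 60]
new_global = apply_global_update(np.zeros(2), aggregate_updates(updates, counts))
```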
The Blockchain Layer is responsible for recording the metadata and state of the framework, ensuring transparency and traceability in the federated learning process. This layer handles:
Storing Model Updates: Each global model update generated by the Compute Server is logged on the blockchain, creating an immutable record of the training process (a sketch of such a record follows this list).
Metadata Management: Stores metadata related to model versions, training participants (clients), and the specific datasets involved (without revealing the data itself).
Incentive Mechanism: It manages the token-based incentive system, rewarding clients for contributing high-quality updates to the global model.
Decentralized Governance: This layer is designed to provide trust, transparency, and decentralized control over the federated learning process, ensuring that no single entity controls the training process.
State Tracking: Tracks the current state of model training and ensures consensus on which model version is the latest and most accurate.
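The sketch below illustrates the kind of record the Blockchain Layer could store for each global update: a hash of the aggregated weights plus metadata, rather than the weights or the underlying data. The field names and record shape are hypothetical; the actual on-chain schema and transaction calls are defined by the project's contracts.

```python
import hashlib
import json
import time

import numpy as np

def model_update_record(round_id, global_weights, participant_ids, version):
    """Build an immutable-style log entry for one aggregated (global) model update."""
    weights_hash = hashlib.sha256(global_weights.tobytes()).hexdigest()
    return {
        "round": round_id,
        "model_version": version,
        "weights_sha256": weights_hash,    # fingerprint of the global model, not the data
        "participants": participant_ids,   # which clients contributed (no raw data revealed)
        "timestamp": int(time.time()),
    }

# The serialized record (or its hash) would then be written to the chain;
# the exact contract call is project-specific and omitted here.
record = model_update_record(7, np.array([0.15, 0.1]), ["client-01", "client-17"], "v0.7.0")
print(json.dumps(record, indent=2))
```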
Within the project’s framework, there are multiple subnets, each containing the SORA Client, SORA AGM Layer, and Blockchain Layer. Each subnet operates as an independent unit, working together to create a decentralized, scalable network of federated learning nodes. Here’s how they work:
Subnet A: Focuses on data from a specific industry, e.g., healthcare, with clients on devices such as hospital systems or medical IoT devices. The Compute Server aggregates the model updates from these clients, and the Blockchain Layer stores the model’s history and ensures privacy.
Subnet B: Specializes in financial data, where edge devices such as mobile banking apps or stock market tracking systems send updates. The Compute Server ensures the models are correctly aggregated, and the Blockchain Layer records the transaction metadata.
Subnet C: Targets consumer IoT devices, where edge devices such as smart home systems contribute to the federated learning process. The Compute Server in this subnet manages a large volume of diverse, less powerful clients, while the Blockchain Layer tracks all model iterations.
Each subnet functions independently but contributes to the overall learning process, creating a decentralized, privacy-preserving AI model that leverages blockchain for transparency and federated learning for data security. A minimal sketch of how such subnets might be described follows.
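The configuration sketch below shows one way a subnet's three components could be described together. Every name and field here is a hypothetical illustration, not part of the SoraChain AI codebase.

```python
from dataclasses import dataclass, field

@dataclass
class SubnetConfig:
    """Hypothetical description of one subnet and its three layers."""
    subnet_id: str
    domain: str                                            # e.g. "healthcare", "finance", "consumer-iot"
    client_endpoints: list = field(default_factory=list)   # SORA Clients participating in this subnet
    agm_endpoint: str = ""                                  # SORA AGM Layer (Compute Server) address
    chain_contract: str = ""                                # Blockchain Layer contract/ledger identifier

SUBNETS = [
    SubnetConfig("subnet-a", "healthcare", ["hospital-01", "iot-monitor-07"], "agm-a", "0x..."),
    SubnetConfig("subnet-b", "finance", ["banking-app-42"], "agm-b", "0x..."),
    SubnetConfig("subnet-c", "consumer-iot", ["smart-home-13"], "agm-c", "0x..."),
]
```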
The framework aims to use efficient machine learning models like Perceptron, Naive Bayes, and Nearest Centroid Classifier to keep computational costs low. More complex models can also be integrated using off-chain computation and APIs.
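For example, classifiers that support incremental training keep per-update computation cheap on edge devices. The scikit-learn snippet below is only an illustration of the kinds of lightweight models mentioned above, using synthetic data; the project's own implementation may differ.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.naive_bayes import GaussianNB

# Tiny synthetic dataset standing in for a client's local data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Both classifiers support partial_fit, so a client can fold in new local data
# incrementally instead of retraining from scratch.
perceptron = Perceptron()
perceptron.partial_fit(X[:100], y[:100], classes=[0, 1])
perceptron.partial_fit(X[100:], y[100:])

nb = GaussianNB()
nb.partial_fit(X[:100], y[:100], classes=[0, 1])
nb.partial_fit(X[100:], y[100:])

print("Perceptron accuracy:", perceptron.score(X, y))
print("Naive Bayes accuracy:", nb.score(X, y))
```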
The code on GitHub showcases the framework and explores different incentive mechanisms for encouraging good data contributions.
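One incentive idea explored in such frameworks is rewarding a contribution in proportion to the improvement it produces on a held-out validation set. The sketch below shows that idea in isolation; the scoring rule and token amounts are illustrative assumptions, not the project's actual mechanism.

```python
def contribution_reward(accuracy_before, accuracy_after, tokens_per_point=10.0):
    """Reward a client in proportion to the validation-accuracy gain its update produced.

    Updates that do not improve (or that hurt) the global model earn nothing, which
    discourages low-quality or adversarial data contributions.
    """
    gain = accuracy_after - accuracy_before
    return max(0.0, gain * 100 * tokens_per_point)   # tokens per percentage point gained

# Example: an update lifts validation accuracy from 81% to 84% -> about 30 tokens.
print(contribution_reward(0.81, 0.84))
```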
In summary, SoraChain AI provides a framework for hosting and collaboratively improving machine learning models in a decentralized, transparent, and accessible way using blockchain technology.