FederatedLearningPOC
Trustless federated learning audit trail: verify ML training integrity on-chain
Problem Statement
Federated learning is a technique where multiple parties train a shared AI model on their own private data, sending only model updates (not raw data) to a central server for aggregation. This preserves privacy, but participants must trust the server to honestly aggregate updates and report results.

This project creates a trustless, verifiable audit trail for federated learning by recording every training round on 0G Chain with model hashes, client contributions, and decentralized storage references. Model weights live on 0G Storage (too large for on-chain), while metadata stays on-chain for permanent verification. Each training round logs:

- SHA-256 model hash for integrity verification
- 0G Storage root hashes pointing to model artifacts
- Client participation and contribution scores from actual training metrics
- Summary data for LLM-powered querying via 0G Compute

The key innovation is the hybrid architecture: on-chain verification provides "what happened, when, and who participated," while off-chain storage handles the heavy data. This enables full transparency without blockchain bloat. The 0G Compute integration allows natural language queries directly against training results stored in decentralized storage, with no need to trust a centralized API.

This solves real problems in collaborative ML: pharmaceutical companies can verify drug-discovery model training without exposing patient data, financial institutions can audit fraud-detection models for regulatory compliance, and AI researchers can prove training provenance for reproducibility.
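The per-round record described above can be sketched as a small data structure. The field names below are illustrative, not the actual contract schema, and the storage root is a placeholder for whatever the 0G Storage upload returns:

```python
import hashlib
from dataclasses import dataclass

@dataclass
class RoundRecord:
    """Hypothetical shape of the on-chain metadata for one training round."""
    round_id: int
    model_hash: str    # SHA-256 of the serialized model weights
    storage_root: str  # 0G Storage root hash of the model artifact
    participants: list # e.g. [{"client": "client-0", "samples": 5000}, ...]
    summary_cid: str   # pointer to the metrics summary for LLM querying

def model_fingerprint(weights_blob: bytes) -> str:
    """Integrity hash that anyone can recompute from the stored artifact."""
    return hashlib.sha256(weights_blob).hexdigest()

record = RoundRecord(
    round_id=1,
    model_hash=model_fingerprint(b"serialized-model-weights"),
    storage_root="0x...",  # placeholder: returned by the 0G Storage upload
    participants=[{"client": "client-0", "samples": 5000}],
    summary_cid="cid-round-1",
)
print(record.model_hash)
```

Because the hash is deterministic, the record alone is enough for a third party to check the stored artifact later without trusting the coordinator.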
Solution
Architecture: This is a proof of concept showing how to add trustless verification to federated learning using the Flower framework. Flower is production-ready software that integrates with PyTorch, TensorFlow, and other ML libraries. I used CIFAR-10 for testing, but the architecture works with any training workflow. The stack combines a Solidity smart contract on 0G Chain for verification records, 0G Storage for model artifacts, a Python server with Web3.py for coordination, and Node.js with 0G Compute for querying results.

Technical Integration: The Python server hooks into Flower's training lifecycle to create a verifiable audit trail. During training across 10 simulated clients, it captures model updates using FedAvg aggregation, uploads checkpoints to 0G Storage, records root hashes on-chain, and logs client contributions with actual metrics. Every training round produces cryptographic proof:

- SHA-256 hash of the model weights
- 0G Storage root hash pointing to the artifact
- List of participating clients with their sample counts
- Summary CID for the training metrics

This means anyone can verify what happened - which model was produced, who contributed, and what the results were - without trusting the coordinator.

0G Integration: The system leverages 0G's full stack for end-to-end verifiability. 0G Storage provides content-addressed model hosting where root hashes prove file integrity: you can download the model and verify it matches what was logged on-chain. The smart contract enforces sequential round IDs and stores all metadata immutably, creating a permanent audit trail. 0G Compute enables decentralized querying, so you can ask questions about training results without relying on a centralized API: the LLM fetches summary CIDs from the contract, downloads data from storage, and processes queries entirely on 0G infrastructure.
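The per-round flow (aggregate, hash, upload, record on-chain) can be sketched as below. FedAvg here is the standard sample-weighted average; `upload` and `submit` are hypothetical callbacks standing in for the real 0G Storage client and the Web3.py contract call, which this sketch does not reproduce:

```python
import hashlib
import json

def fedavg(updates):
    """Sample-weighted average of client weight vectors (FedAvg).

    updates: list of (weights: list[float], num_samples: int).
    """
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    agg = [0.0] * dim
    for weights, n in updates:
        for i in range(dim):
            agg[i] += weights[i] * n / total
    return agg

def finish_round(round_id, updates, upload, submit):
    """One verifiable round: aggregate, hash, upload, record on-chain.

    `upload` and `submit` are injected stand-ins for the storage client
    and the smart-contract transaction used by the real server.
    """
    model = fedavg(updates)
    blob = json.dumps(model).encode()
    model_hash = hashlib.sha256(blob).hexdigest()
    storage_root = upload(blob)  # -> 0G Storage root hash of the checkpoint
    participants = [{"samples": n} for _, n in updates]
    submit(round_id, model_hash, storage_root, participants)
    return model, model_hash

# Toy run with two clients and stubbed-out storage/chain calls.
calls = []
model, h = finish_round(
    round_id=1,
    updates=[([1.0, 2.0], 1), ([3.0, 4.0], 3)],
    upload=lambda blob: "0xroot",
    submit=lambda *args: calls.append(args),
)
print(model)  # client with 3x the samples pulls the average toward it
```

Injecting `upload` and `submit` keeps the audit logic testable without a live chain; in the real server those are the 0G Storage upload and a Web3.py contract transaction.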
The key innovation is separation of concerns: on-chain verification proves what happened (model hash, participants, timestamp), off-chain storage holds the actual data (models, summaries), and decentralized compute provides analysis. This creates a fully verifiable training pipeline where participants can prove their contributions, model consumers can verify provenance, and auditors can reconstruct the entire training history from blockchain records.
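The auditor side of this pipeline reduces to one check: the artifact fetched from 0G Storage must hash to exactly the value committed on-chain. A minimal sketch (the download itself is out of scope here):

```python
import hashlib

def verify_round(onchain_model_hash: str, downloaded_blob: bytes) -> bool:
    """Recompute the model hash from the downloaded artifact and compare
    it with the value the coordinator recorded on-chain."""
    return hashlib.sha256(downloaded_blob).hexdigest() == onchain_model_hash

blob = b"model-checkpoint-bytes"
committed = hashlib.sha256(blob).hexdigest()
print(verify_round(committed, blob))         # honest case -> True
print(verify_round(committed, blob + b"!"))  # tampered artifact -> False
```

Any mismatch proves either the stored artifact or the on-chain record was altered, which is what makes the coordinator trustless.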
Hackathon
ETHGlobal Buenos Aires
2025
Contributors
- ivandda
34 contributions