LLMBench

Verifiable on-chain drift benchmarking for Large Language Models

Problem Statement

This project was inspired by the following paper: https://arxiv.org/abs/2307.09009. The paper details how ChatGPT's behavior and performance have changed over time. Because these changes are unpredictable, companies find it difficult to integrate the models into their pipelines. The researchers therefore developed a set of benchmarks and ran them on two snapshots of OpenAI's models. In this project, the benchmarking process is made recurrent, and the results are stored on-chain for immutability and transparency.

Solution

The project is split into three modules:

Front-end: a one-page application built with Vue.js 3, showing the GPT-3.5 Turbo and GPT-4 benchmarks.

Smart Contract: a benchmark-storing contract deployed on Gnosis.

LLMDrift Scripts: scripts meant for Bacalhau that run the LLMDrift benchmarks on the current gpt-3.5-turbo and gpt-4 models and write the results to the Gnosis chain. These scripts are based on the "lchen001/LLMDrift" repo, developed by the researchers of the aforementioned paper.
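The flow described above (run a benchmark against a model, then persist the result so later runs can be compared) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `BenchmarkResult`, `run_benchmark`, `accuracy_drift`, and the `ask` callback are hypothetical names, the model call is stubbed out, and the SHA-256 digest only stands in for whatever record the real contract stores on Gnosis.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class BenchmarkResult:
    """One benchmark run (hypothetical record layout, not the contract's schema)."""
    model: str
    benchmark: str
    timestamp: int
    correct: int
    total: int

    @property
    def accuracy(self) -> float:
        # Fraction of cases answered correctly; 0.0 for an empty benchmark.
        return self.correct / self.total if self.total else 0.0

    def digest(self) -> str:
        # Deterministic hash of the record, usable as a compact fingerprint
        # that a contract could store or that a reader could verify against.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_benchmark(
    model: str,
    benchmark: str,
    cases: Iterable[Tuple[str, str]],
    ask: Callable[[str], str],
    timestamp: int,
) -> BenchmarkResult:
    """Score `ask` (a stand-in for a real model call) on (prompt, expected) pairs."""
    cases = list(cases)
    correct = sum(
        1
        for prompt, expected in cases
        if ask(prompt).strip().lower() == expected.strip().lower()
    )
    return BenchmarkResult(model, benchmark, timestamp, correct, len(cases))


def accuracy_drift(earlier: BenchmarkResult, later: BenchmarkResult) -> float:
    """Change in accuracy between two runs of the same benchmark."""
    return later.accuracy - earlier.accuracy
```

In a real deployment, `ask` would wrap a call to the model API and the resulting record would be submitted in a transaction to the benchmark contract; both steps are deliberately left out here because their exact interfaces are not described in this page.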

Hackathon

ETHGlobal Paris

2024

Contributors

codethazine
46 contributions