
AVA - AI Voice Agents

Create an agent that talks in any voice you can think of and share the proceeds with the voice owner

Problem Statement

AVA is designed to let anyone create an AI voice agent by retrieving audio from X and assembling an immutable audio dataset that produces a cryptographic record every time the agent handles a new text-to-speech request. Our goal is for AVA to become the foundational attribution and data-provenance stack for AI-generated voices on the agentic web, establishing a flywheel in which voice owners are compensated whenever a user launches an agent that features their voice.

Currently, AVA can:

  • retrieve content from X
  • analyze the audio and identify the individual speakers within a clip
  • harvest the audio data from the clip
  • generate an NFT collection to represent the audio dataset
  • suggest a name, symbol, and emoji for the dataset
  • generate a text-to-speech clip requested by the user
  • suggest a tokenomic model that splits the token supply with the voice owner
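The "cryptographic record per text-to-speech request" idea can be sketched as hashing a canonical serialization of the request. The field names and hashing scheme below are assumptions for illustration; AVA's actual on-chain record format is not specified in this write-up.

```python
import hashlib
import json

def tts_request_record(dataset_id: str, text: str, requester: str, timestamp: int) -> str:
    """Return a deterministic SHA-256 digest for a TTS request.

    Hypothetical record format: the real AVA schema may include more fields.
    Sorted keys and compact separators make the serialization canonical, so
    identical requests always hash to the same value.
    """
    payload = json.dumps(
        {"dataset": dataset_id, "text": text, "requester": requester, "ts": timestamp},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

record = tts_request_record("ava-dataset-1", "Hello world", "0xabc", 1738368000)
print(record)  # 64-character hex digest, stable for identical requests
```

A digest like this can be stored alongside the NFT-backed dataset so each generation event is verifiable after the fact.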

Solution

AVA uses the elizaOS/agent-twitter-client to retrieve media from X. Individual speakers are identified with pyannote's speaker-diarization-3.1 model, and once the user selects a speaker, the matching audio is extracted. The audio data is anchored to the Base Sepolia testnet through a custom NFT contract written with Hardhat, using the OpenZeppelin library and Alchemy's infrastructure. Text-to-speech requests are handled by Llasa-3B zero-shot voice cloning, which requires only a short clip of source audio. The agent is built on the Groq SDK, and much of the site design (such as the responsive krakenEffect background) can be attributed to Claude 3.5 Sonnet running via Cursor; it uses three.js and currently runs in React.
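The "select a speaker, then retrieve the correct audio" step can be sketched as filtering diarization output. In practice the turns come from pyannote's speaker-diarization-3.1 pipeline; the helper below assumes they have been flattened into (start, end, label) tuples, and the minimum-duration threshold is an illustrative assumption (very short turns make poor voice-cloning source audio).

```python
def segments_for_speaker(turns, speaker, min_duration=0.5):
    """Collect the (start, end) spans attributed to one speaker.

    `turns` mimics diarization output flattened into (start, end, label)
    tuples; spans shorter than `min_duration` seconds are dropped.
    """
    return [
        (start, end)
        for start, end, label in turns
        if label == speaker and (end - start) >= min_duration
    ]

# Illustrative diarization output for a two-speaker clip
turns = [
    (0.0, 4.2, "SPEAKER_00"),
    (4.2, 4.5, "SPEAKER_01"),  # too short to be useful source audio
    (4.5, 9.8, "SPEAKER_01"),
    (9.8, 12.0, "SPEAKER_00"),
]
print(segments_for_speaker(turns, "SPEAKER_01"))  # [(4.5, 9.8)]
```

The resulting spans can then be cut from the clip to build the dataset for the selected voice.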

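The suggested tokenomic model that splits the supply with the voice owner could look like the sketch below. The basis-point convention and the 50/50 default are assumptions for illustration; AVA's agent suggests the actual split per dataset.

```python
def split_supply(total_supply: int, voice_owner_bps: int = 5000) -> dict:
    """Split a token supply between the voice owner and the agent creator.

    Uses basis points (10_000 bps = 100%) and integer math, the common
    on-chain convention, to avoid floating-point rounding. The 5000 bps
    (50%) default is a hypothetical value, not AVA's documented split.
    """
    owner_share = total_supply * voice_owner_bps // 10_000
    return {"voice_owner": owner_share, "agent_creator": total_supply - owner_share}

print(split_supply(1_000_000_000))  # {'voice_owner': 500000000, 'agent_creator': 500000000}
```

Integer basis points keep the two shares summing exactly to the total supply, which matters if the split is later enforced in a token contract.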
Hackathon

Agentic Ethereum

2025

Prizes

  • 🏆

    AgentKit Pool Prize

    Coinbase Developer Platform

  • 🏆

    Create your Agentic Future (2nd place)

    Nethermind

Contributors