Doppelganger
Fighting AI slop with better data. Tokenized Memory Layer for AI Agents
Problem Statement
Project Description

CONCEPT

The core bottleneck for modern AI agents is not the model itself but the context it consumes. While Large Language Models (LLMs) are powerful, they suffer from hallucination and lack real-time access to specific, high-fidelity knowledge. Current solutions fall short: web scraping is messy, unstructured, and often legally gray, while simple prompting fails to capture deep domain expertise.

Doppelganger AI solves this with a pipeline that transforms raw data sources into structured "Context Streams." Users can convert any social channel or data feed (e.g., a Twitter profile, a Telegram group, a Substack) into a highly optimized vector store.

- Vectorized Signal: We don't just store text; we process it into semantic vector embeddings. This turns noisy social feeds into a structured database that AI agents can query programmatically for accurate retrieval-augmented generation (RAG).
- Proof of Human Context: In an era of AI-generated spam, we use Worldcoin to verify the creator's humanity. This certifies that the data stream is organic, high-value human thought: a premium asset for training and inference.
- Programmable Access: By minting these vector stores as NFTs on Filecoin Onchain Cloud, we replace API subscriptions with on-chain ownership. Users "lease" knowledge to agents by holding the NFT, creating a direct economy between data creators and AI developers.

This is a new category of infrastructure: a decentralized "Context Delivery Network" where anyone can mint and monetize a live stream of vectorized human knowledge.

PIPELINE MECHANICS

Doppelganger AI is a data processing engine designed to turn raw noise into high-value signal for AI agents. The workflow is automated, moving from raw ingestion to immutable, token-gated access.

Rules & Architecture
- Input Flexibility: Converts dynamic streams (Twitter, RSS, chat logs) into a static, queryable vector store. Twitter is currently live as the MVP data source.
- Processing Engine: Automated chunking and embedding generation using industry-standard models (a minimal sketch follows this section).
- Storage Layer: Heavy vector data is pinned on Filecoin Onchain Cloud for verifiable, persistent storage.
- Access Control: Token-gated via NFT. Only wallets holding the specific "Data Key" NFT can query the vector store API (an access-check sketch also follows this section).

Asset Type: The "Context Stream"
Instead of static files, users create live "Context Streams." Each stream represents a specific domain of knowledge.
- Raw Source: The user connects a live data source (e.g., "My Twitter Feed on Crypto Macroeconomics").
- Vectorization: The pipeline cleans the text, removes noise, and converts it into vector embeddings (mathematical representations of meaning).
- The Product: The output is not a JPEG but a vector store endpoint. AI agents can plug into this endpoint to "think" with the creator's specific knowledge base.

Asset Type: The Access NFT
These NFTs function as license keys for the data.
- Minting: Performed instantly via Privy embedded wallets inside the app.
- Utility: Grants the bearer the cryptographic right to query the vector dataset.
- Dynamic Metadata: The NFT metadata points to the immutable hash (CID) of the vector store on Filecoin.
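To make the Processing Engine step concrete, here is a minimal sketch of chunking and embedding, assuming the OpenAI text-embedding-3 family mentioned under AI & BACKEND below. The chunk sizes and helper names are illustrative, not the production pipeline.

```python
# Minimal chunk-and-embed sketch; model choice and chunk sizes are
# illustrative assumptions, not the production configuration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chunk_text(text: str, max_chars: int = 1000, overlap: int = 200) -> list[str]:
    """Split a cleaned feed into overlapping chunks so that retrieval
    returns self-contained passages rather than mid-thought fragments."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """Convert text chunks into semantic vector embeddings (one per chunk)."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=chunks,
    )
    return [item.embedding for item in response.data]
```

In this sketch, `embed_chunks(chunk_text(post_text))` yields the vectors that the indexer would then pin to storage alongside the raw chunks.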
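And a hedged sketch of the Access Control gate: before serving a query, the API can check on-chain that the caller holds the "Data Key" NFT. The RPC endpoint, contract address handling, and ABI fragment below are placeholder assumptions (a standard ERC-721 balanceOf via web3.py), not the deployed contract.

```python
# Token-gate check: does this wallet hold at least one access NFT?
# RPC URL and ABI fragment are placeholders for illustration.
from web3 import Web3

ERC721_BALANCE_ABI = [{
    "name": "balanceOf",
    "type": "function",
    "stateMutability": "view",
    "inputs": [{"name": "owner", "type": "address"}],
    "outputs": [{"name": "", "type": "uint256"}],
}]

w3 = Web3(Web3.HTTPProvider("https://api.calibration.node.glif.io/rpc/v1"))  # example FVM RPC

def holds_data_key(wallet: str, nft_contract: str) -> bool:
    """Return True if the wallet owns at least one 'Data Key' NFT."""
    contract = w3.eth.contract(
        address=Web3.to_checksum_address(nft_contract),
        abi=ERC721_BALANCE_ABI,
    )
    return contract.functions.balanceOf(Web3.to_checksum_address(wallet)).call() > 0
```

A production gate would also verify a wallet signature on the request, so a caller cannot simply claim an address they do not control.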
MINTING AND MONETIZATION

One of the main features of Doppelganger AI is the seamless bridge between Web2 social data and Web3 ownership. We have abstracted away the complex data engineering required to build vector databases, making it accessible to any user.

The "Mini-App" Experience
We integrated our solution directly into a Worldcoin Mini App to maximize accessibility and trust.
- Verify: The user logs in with World ID. This attaches a "Verified Human" badge to their data stream, increasing its value to AI developers who need clean data.
- Connect: The user links a data source (e.g., their Twitter handle).
- Process: Our backend ingests the history, generates embeddings, and indexes them.

Using Privy's embedded wallets, we built a Web2-style user experience in which even non-crypto-native users can mint NFTs and monetize their data.

The Knowledge Marketplace
Once minted, these Data NFTs create a liquid market for AI context.
- For Users: They turn their niche expertise and social footprint into a rent-generating asset.
- For Developers/Agents: Instead of scraping messy websites, they buy a "Context Stream" NFT. This gives their agent immediate, API-level access to structured, verified data (e.g., "the last 6 months of Alpha governance discussions," cleaned and vectorized).

HOW IT'S MADE

DESIGN
The interface is designed as a "No-Code Data Factory." We stripped away the complexity of vector databases (Pinecone, Milvus, etc.) and presented it as a simple "Connect & Mint" flow. The UI is mobile-first, optimized for the Worldcoin App ecosystem.

AI & BACKEND
The core logic is a high-throughput data pipeline:
- Ingestion Engine: Python-based workers that hook into social APIs to fetch live data streams.
- Embedding Pipeline: We use advanced embedding models (e.g., OpenAI text-embedding-3 or local models) to convert text chunks into vectors.
- RAG Interface: The system exposes a standardized API that accepts a natural-language query and returns the most relevant context chunks from the vector store (see the retrieval sketch after this section).

WEB3 & BLOCKCHAIN
The following tools were used to build the trust and ownership layer:
- Worldcoin (World ID): Used for Sybil resistance and data quality. We ensure that the "Knowledge Asset" originates from a real human, filtering out bot-generated noise, which is toxic for AI training.
- Privy: We used Privy to enable embedded wallets. This allows non-crypto natives to mint NFTs using social logins, removing the friction of seed phrases while keeping the experience self-custodial.
- Filecoin & IPFS:
  - Decentralized Storage: The actual vector indices (which can be large) are stored on Filecoin Onchain Cloud.
  - Compute-over-Data: We utilize the FVM (Filecoin Virtual Machine) to deploy the NFT smart contracts. These contracts map the NFT ID to the specific Data CID, ensuring that ownership of the token mathematically guarantees access to the data.
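To make the RAG Interface concrete, here is a minimal retrieval sketch: embed the natural-language query with the same model as the chunks, then rank stored chunks by cosine similarity. A real deployment would query the pinned vector index rather than an in-memory NumPy array; all names here are illustrative.

```python
# Retrieval sketch: rank stored chunks against a query embedding.
# In production this hits the vector index, not a NumPy array.
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, chunks, k=5):
    """Return the k stored chunks most similar to the query embedding."""
    q = np.asarray(query_vec)
    m = np.asarray(chunk_vecs)
    # Cosine similarity: dot products scaled by vector norms.
    sims = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q) + 1e-9)
    best = np.argsort(sims)[::-1][:k]
    return [(chunks[i], float(sims[i])) for i in best]
```

The query API would wrap a function like this behind the NFT gate shown earlier: verify the "Data Key," embed the question, return the top chunks.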
Solution
Doppelganger AI turns any social channel or data feed into a verified, queryable vector store. World ID proves the stream originates from a real human; a Python pipeline chunks, embeds, and indexes the content; the resulting index is pinned on Filecoin; and an FVM-deployed NFT contract maps each token to the data's CID, so holding the "Data Key" NFT is what grants an agent query access (resolution sketched below).
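Illustrative only: one way an agent holding the NFT might resolve its token to the data CID and fetch the index from a public IPFS gateway. The dataCID(tokenId) getter is a hypothetical name for the mapping described above, not a confirmed contract interface.

```python
# Hypothetical NFT-to-CID resolution; dataCID() is an assumed getter
# name for the contract's tokenId -> CID mapping, not a known ABI.
import requests
from web3 import Web3

DATA_CID_ABI = [{
    "name": "dataCID",
    "type": "function",
    "stateMutability": "view",
    "inputs": [{"name": "tokenId", "type": "uint256"}],
    "outputs": [{"name": "", "type": "string"}],
}]

def fetch_vector_store(w3: Web3, nft_contract: str, token_id: int) -> bytes:
    """Look up the immutable CID mapped to an NFT and pull the index bytes."""
    contract = w3.eth.contract(
        address=Web3.to_checksum_address(nft_contract), abi=DATA_CID_ABI
    )
    cid = contract.functions.dataCID(token_id).call()
    # Any public IPFS gateway can serve content pinned on Filecoin.
    return requests.get(f"https://ipfs.io/ipfs/{cid}", timeout=30).content
```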
Hackathon
ETHGlobal Buenos Aires
2025
Prizes
- 🏆 World Pool Prize (World)
Contributors
- andrewkrynin
1 contribution