Naughty Agents
web-of-trust reviewers blacklisting computer-controlling AI agents
Problem Statement
Naughty Agents is a decentralized, human-in-the-loop (HITL) security protocol designed to mitigate the emerging threat of AI agent hijacking, specifically targeting on-chain financial actions.
As AI agents gain autonomy over digital wallets (e.g., Coinbase Server Wallets) to perform tasks like trading, swapping, or paying for services, the risk of manipulation by malicious actors increases. A significant emerging threat vector is the "Malicious Image Patch" (MIP), an adversarial attack delivered through visual inputs (e.g., an agent viewing a compromised social media feed) (Source: Anthropic: Sleeper Agents). These attacks can hijack the agent's objective function, leading to unauthorized transactions and loss of funds.
Our solution creates a robust, on-chain "firewall" that verifies every transaction an agent proposes.
The protocol operates on a "Trust but Verify" principle, enforced at the Smart Contract Account (SCA) level. It uses an on-chain registry (the Blacklist) to instantly block known-malicious transactions. Unknown transactions are automatically reverted by the SCA and escalated to a decentralized network of human reviewers (the Review Oracle).
The system is powered by a crypto-economic model. Users pay a subscription fee for protection, which funds rewards for the reviewers. The integrity of the reviewer network is secured by a "Web of Trust" with a delegated slashing mechanism (simplified for the MVP), so that all participants are financially incentivized to act honestly. Naughty Agents makes on-chain AI safety a public good, secured by the community, for the community.
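The screening logic described above can be summarized as a small state machine. The following Python sketch is a hypothetical off-chain model, not the project's Solidity SecurityModule: byte strings stand in for action fingerprints, the class and method names are illustrative, and the approved set (actions that reviewers have cleared) is an assumption, since the text only specifies that blacklisted actions are blocked and unknown ones are reverted and escalated.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    EXECUTED = "executed"
    REVERTED_BLACKLISTED = "reverted: known-malicious (blacklist)"
    REVERTED_ESCALATED = "reverted: unknown, escalated to review"


@dataclass
class FirewallModel:
    """Hypothetical off-chain model of the SCA check; not the project's Solidity code."""
    blacklist: set[bytes] = field(default_factory=set)    # stand-in for the on-chain Blacklist
    approved: set[bytes] = field(default_factory=set)     # ASSUMPTION: actions reviewers cleared
    review_queue: list[bytes] = field(default_factory=list)  # stand-in for the Review Oracle queue

    def check(self, action_hash: bytes) -> Verdict:
        """Decide what the SCA does with one proposed action."""
        if action_hash in self.blacklist:
            return Verdict.REVERTED_BLACKLISTED  # block known-malicious actions instantly
        if action_hash in self.approved:
            return Verdict.EXECUTED
        # Unknown action: revert now and queue it (once) for human reviewers.
        if action_hash not in self.review_queue:
            self.review_queue.append(action_hash)
        return Verdict.REVERTED_ESCALATED

    def resolve(self, action_hash: bytes, malicious: bool) -> None:
        """Record a reviewer verdict (hypothetical step; the source does not specify it)."""
        if action_hash in self.review_queue:
            self.review_queue.remove(action_hash)
        (self.blacklist if malicious else self.approved).add(action_hash)


if __name__ == "__main__":
    fw = FirewallModel(blacklist={b"drain-wallet"})
    print(fw.check(b"drain-wallet").value)  # reverted: known-malicious (blacklist)
    print(fw.check(b"swap-usdc").value)     # reverted: unknown, escalated to review
    fw.resolve(b"swap-usdc", malicious=False)
    print(fw.check(b"swap-usdc").value)     # executed
```

Reverting unknown actions by default means a hijacked agent cannot execute a novel malicious action before a human has seen it. The trade-off is that legitimate but never-seen actions are also held back until review.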
Solution
Naughty Agents is a decentralized security protocol that acts as an on-chain firewall to prevent hijacked AI agents from draining user funds. We built a full-stack solution with on-chain enforcement at the Smart Contract Account (SCA) level.
Core Architecture & Tech
Our system combines a React frontend, Solidity smart contracts, and a Python agent simulator.
- Frontend: We bootstrapped a React/Vite app using @coinbase/create-cdp-app. User interactions are powered by Viem via the pre-configured CDP hooks, connecting directly to our smart contracts.
- On-Chain Logic (Solidity & Hardhat 3): The protocol's core is built on-chain.
  - Enforcement Layer: A user's UserSCA (Smart Contract Account) has a mandatory SecurityModule hook. This module intercepts every transaction, reverting malicious or unknown ones before they can execute.
  - Protocol Contracts: A WebOfTrust contract manages reviewer staking, an ActionRegistry stores the on-chain blacklist, and a ReviewOracle queues unknown transactions for human review.
- Agent Simulator: A Python script using web3.py simulates a hijacked agent proposing a malicious transaction, allowing us to demonstrate the protocol's real-time defense.
Leveraging Partner Technologies
We built our project on the Coinbase CDP stack, which was crucial for rapid development.
- Coinbase CDP Stack: Embedded Wallets provided seamless email-based onboarding and served as the designated Operator key for the user's SCA. The CDP Hooks and integrated Viem client drastically simplified frontend development.
- Hardhat 3: This was the backbone of our smart contract development. We used its Viem integration for robust, type-safe testing and Hardhat Ignition for streamlined, repeatable deployments.
- Base: Our protocol is designed for a low-cost L2 such as Base, because running on-chain security checks on every transaction would be too expensive on L1.
Notable Hackathon Hacks
To deliver a working MVP, we made two key simplifications:
- Simplified Slashing: We implemented an immediate, user-triggered slashing function in the WebOfTrust contract. This demonstrates the economic incentive model (skin in the game) without building a complex, multi-stage arbitration system.
- Deterministic Action Hashing: We created a unique "fingerprint" for any transaction by calculating keccak256(abi.encode(dest, value, data)). This simple but effective method allowed our ActionRegistry to easily identify and block known-malicious actions.
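The fingerprint formula above can be reproduced off-chain, for example to precompute blacklist entries or to check what the ActionRegistry will see. The following dependency-free Python sketch assumes the hash is taken over abi.encode(address dest, uint256 value, bytes data); those parameter types are our assumption, and the function names (keccak256, abi_encode_action, action_hash) are hypothetical. It implements Keccak-256 by hand because Python's hashlib.sha3_256 is NIST SHA3-256, which uses different padding and therefore produces different digests from Solidity's keccak256. In practice, a library such as web3.py (already used by the agent simulator) would be the more convenient choice.

```python
MASK64 = (1 << 64) - 1


def _rol(v: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    return ((v << n) | (v >> (64 - n))) & MASK64


def _rc_bit(t: int) -> int:
    """Output bit of the Keccak round-constant LFSR (x^8 + x^6 + x^5 + x^4 + 1)."""
    r = 1
    for _ in range(t % 255):
        r <<= 1
        if r & 0x100:
            r ^= 0x171
    return r & 1


def _rotation_offsets() -> list[list[int]]:
    """Rho rotation offsets r[x][y], derived as in the Keccak specification."""
    rot = [[0] * 5 for _ in range(5)]
    x, y = 1, 0
    for t in range(24):
        rot[x][y] = ((t + 1) * (t + 2) // 2) % 64
        x, y = y, (2 * x + 3 * y) % 5
    return rot


ROUND_CONSTANTS = [sum(_rc_bit(j + 7 * i) << ((1 << j) - 1) for j in range(7)) for i in range(24)]
ROTATIONS = _rotation_offsets()


def _keccak_f(a: list[int]) -> list[int]:
    """Keccak-f[1600] permutation on 25 lanes indexed x + 5*y."""
    for rc in ROUND_CONSTANTS:
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        a = [a[i] ^ d[i % 5] for i in range(25)]                      # theta
        b = [0] * 25
        for x in range(5):
            for y in range(5):                                         # rho + pi
                b[y + 5 * ((2 * x + 3 * y) % 5)] = _rol(a[x + 5 * y], ROTATIONS[x][y])
        for y in range(5):                                             # chi
            row = b[5 * y: 5 * y + 5]
            for x in range(5):
                a[x + 5 * y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        a[0] ^= rc                                                     # iota
    return a


def keccak256(data: bytes) -> bytes:
    """Keccak-256 as computed by Solidity's keccak256 (original Keccak padding)."""
    rate = 136  # bytes absorbed per permutation for a 256-bit output
    padded = bytearray(data) + b"\x01"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    state = [0] * 25
    for off in range(0, len(padded), rate):
        for i in range(rate // 8):
            state[i] ^= int.from_bytes(padded[off + 8 * i: off + 8 * i + 8], "little")
        state = _keccak_f(state)
    return b"".join(lane.to_bytes(8, "little") for lane in state[:4])


def abi_encode_action(dest: str, value: int, data: bytes) -> bytes:
    """Hypothetical mirror of abi.encode(address dest, uint256 value, bytes data)."""
    addr = bytes.fromhex(dest[2:] if dest.startswith("0x") else dest)
    if len(addr) != 20:
        raise ValueError("dest must be a 20-byte address")
    head = (addr.rjust(32, b"\x00")            # address, left-padded to 32 bytes
            + value.to_bytes(32, "big")        # uint256
            + (3 * 32).to_bytes(32, "big"))    # offset of the dynamic bytes tail
    tail = len(data).to_bytes(32, "big") + data + b"\x00" * (-len(data) % 32)
    return head + tail


def action_hash(dest: str, value: int, data: bytes) -> bytes:
    """Fingerprint = keccak256(abi.encode(dest, value, data))."""
    return keccak256(abi_encode_action(dest, value, data))


if __name__ == "__main__":
    print(keccak256(b"").hex())  # c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470
    # Illustrative action: 1 ETH plus calldata starting with the ERC-20 transfer selector.
    print(action_hash("0x" + "ab" * 20, 10**18, bytes.fromhex("a9059cbb")).hex())
```

Because the digest covers the destination, the value, and the full calldata, changing any of them (a different recipient, one extra wei, one different calldata byte) produces an unrelated fingerprint, so a blacklist entry matches only the exact action that was flagged.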
Hackathon
ETHGlobal New York 2025
Contributors
- kirilligum
11 contributions