gh0st.market
A privacy-first job network for verifiable web data collection
Problem Statement
The Problem We're Solving

The modern data economy runs on web data that lives behind logins, paywalls, and account gates—think Crunchbase company profiles, LinkedIn insights, SaaS analytics dashboards, or proprietary B2B databases. This data is incredibly valuable for investors, growth teams, AI training pipelines, and market intelligence platforms. But accessing it programmatically is a nightmare:

- Scraping is broken: Proxy networks get blocked, headless browsers get fingerprinted, CAPTCHAs multiply, and sites actively fight automation. You spend more time maintaining infrastructure than building products.
- Trust is zero: When you pay a data vendor, you have no idea whether the data is real, fabricated, or stale. There's no cryptographic proof that the data actually came from the claimed source.
- Compliance is risky: Fake accounts, credential sharing, and ToS violations create legal exposure. Enterprises can't touch these solutions.
- AI agents are stuck: The next generation of autonomous AI agents needs real-time web data, but agents can't authenticate into services or prove their outputs are genuine.

Our Solution: A Decentralized Data Access Layer with Cryptographic Proofs

gh0st.market creates a two-sided marketplace where data requesters connect with authorized operators (humans or AI agents) who already have legitimate access to target platforms. The magic? Every data delivery is accompanied by a zk-TLS proof—cryptographic evidence that the data genuinely came from the claimed source, without revealing credentials or session details.

How It Works

Requesters Define Jobs

A requester creates a Job Spec—a reusable template that defines:

- Target domain (e.g., crunchbase.com)
- URL pattern with placeholders (e.g., https://crunchbase.com/organization/{{slug}})
- Data extraction instructions for AI agents
- Expected output schema
- Validation rules

They then create individual Jobs referencing that spec, with concrete inputs (e.g., {slug: "anthropic"}) and an escrowed bounty in ETH or any ERC-20 token.
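Concretely, fulfilling a job means substituting those concrete inputs into the spec's URL pattern. A minimal TypeScript sketch of that substitution (the `renderNotarizeUrl` helper is hypothetical, not part of the protocol):

```typescript
// Hypothetical helper: fill a spec's {{placeholder}} URL pattern
// with a job's concrete inputs.
type JobInputs = Record<string, string>;

function renderNotarizeUrl(pattern: string, inputs: JobInputs): string {
  return pattern.replace(/\{\{(\w+)\}\}/g, (_, key: string) => {
    const value = inputs[key];
    if (value === undefined) throw new Error(`Missing input: ${key}`);
    return encodeURIComponent(value);
  });
}

// Example using the Crunchbase spec from the text:
const url = renderNotarizeUrl(
  "https://crunchbase.com/organization/{{slug}}",
  { slug: "anthropic" }
);
// → "https://crunchbase.com/organization/anthropic"
```

Keeping the inputs as a flat string map mirrors the on-chain `inputs` field, so the same JSON payload can drive both the contract record and the worker's navigation.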
The bounty is locked in the smart contract until work is verified.

Workers Approve & Execute

Workers—who already have subscriptions, accounts, or access to target platforms—browse available job specs and approve the ones they can fulfill. They set minimum bounty thresholds so they only see jobs worth their time.

The gh0st browser extension acts as their AI-powered work environment:

- Monitors the blockchain for new jobs matching approved specs
- Opens a dedicated worker tab and navigates to target URLs
- Collects the requested data fields
- Generates a zk-TLS proof via vlayer that cryptographically attests the HTTPS session really hit that domain and the response matches what's being submitted
- Submits the result + proof to the smart contract

Trustless Settlement

The JobRegistry smart contract:

- Verifies the zk-TLS proof against the spec's target domain
- Confirms the proof is valid
- Automatically releases the escrowed bounty to the worker
- Records the result payload on-chain

No middleman. No disputes. No trust required.

Technical Architecture

Smart Contracts (Solidity + Foundry)

- JobRegistry.sol: core protocol contract managing job specs, jobs, escrow, and proof-verified payouts
- ProofVerifier.sol: interface for vlayer zk-TLS verification (mock for the hackathon, production-ready interface)
- Supports multiple tokens: native ETH and any ERC-20 (USDC, WBTC, etc.)
- Gas-efficient batch queries: getJobSpecsRange() and getJobsRange() for frontend pagination
- Event-driven architecture: all state changes emit events for efficient indexing

Web Application (Next.js 16 + React 19)

- Dynamic Labs integration: seamless wallet connection with email/social fallbacks
- Dual-role UX: a single app serves both requesters and workers with a role toggle
- Real-time blockchain state: wagmi v3 + TanStack Query for reactive contract reads
- Requestor Dashboard: create specs, post jobs, track completion status
- Worker Dashboard: browse specs, approve with min-bounty filters, monitor active tasks

Browser Extension (Plasmo + TypeScript)

- Worker Engine: sophisticated state machine managing the job queue, auto-mode, and parallel execution
- Local Database: Drizzle ORM with SQLite for followed specs, active jobs, and earnings history
- Web ↔ Extension Protocol: typed message passing (GH0ST_*) for seamless integration
- vlayer Client: abstracted zk-TLS proof generation with a mock fallback for development
- Privacy-first: credentials never leave the worker's browser; only proofs are shared

zk-TLS Integration (vlayer)

The cryptographic backbone that makes this trustless:

- Proves a TLS session occurred with a specific domain
- Attests that response data matches what's being claimed
- Zero-knowledge: the verifier learns nothing about credentials, cookies, or session tokens
- On-chain verifiable: smart contracts can validate proofs directly

Why This Matters

For AI Agent Builders

gh0st.market is infrastructure for the agentic web. AI agents need real-time data from authenticated sources, but they can't hold credentials safely or prove their outputs are genuine. With gh0st, agents can:

- Request verified data from any web source
- Trust results without trusting the provider
- Pay programmatically via smart contracts

For Data Teams

Stop maintaining brittle scraping infrastructure. Instead:

- Post jobs describing what you need
- Get verified results with cryptographic provenance
- Pay only for successful, proven deliveries

For Workers & Operators

Monetize access you already have:

- Use subscriptions you're paying for anyway
- Set your own prices and filters
- Work anonymously—your identity is never revealed
- Let AI agents work for you in auto-mode

What Makes This a Strong Hackathon Project

Full-Stack Implementation: This isn't a mockup—it's a working system spanning smart contracts, a production-quality web app, and a browser extension with real state management.

Novel Architecture: Combining zk-TLS proofs with a job marketplace is genuinely new. We're not just "blockchain + scraping"—we're creating verifiable data infrastructure.

Real Market Need: The web data market is $5B+ and growing.
Every AI company, hedge fund, and growth team struggles with this problem. We're building picks and shovels for the AI gold rush.

Privacy-First Design: Both sides stay pseudonymous. Requesters don't reveal what data they're collecting at scale; workers don't reveal their credentials. Only proofs and payments hit the chain.

Extensible Foundation: The Job Spec system is a protocol primitive. Anyone can create specs for any website. The ecosystem grows organically as workers approve new domains.

AI-Native: Built from day one for AI agents to participate—as requesters posting jobs or as workers executing them autonomously.

The Vision

gh0st.market is the HTTP of authenticated web data. Just as APIs standardized how services talk to each other, we're standardizing how AI agents and applications access human-permissioned web data with cryptographic trust.

Imagine:

- AI assistants that can fetch your real portfolio data and prove it's accurate
- Market intelligence platforms with verifiable, real-time competitor data
- Research tools that can cite sources with cryptographic provenance
- Autonomous agents that earn revenue by monetizing access their operators already have

We're not building a scraping tool. We're building the trust layer for the web data economy.
Solution
How We Built gh0st.market: The Technical Deep Dive

Architecture Overview

gh0st.market is a three-part system that had to work together seamlessly: smart contracts handling escrow and verification, a web application for both requesters and workers, and a browser extension that actually executes jobs and generates proofs. Getting these pieces to communicate reliably was the core engineering challenge.

Smart Contracts: Foundry + Solidity

We chose Foundry over Hardhat for its speed and native Solidity testing. The contract architecture centers on two key abstractions.

The JobSpec / Job Pattern

Rather than having requesters define everything per-job, we separated templates (JobSpecs) from instances (Jobs):

```solidity
struct JobSpec {
    string mainDomain;         // "crunchbase.com"
    string notarizeUrl;        // "https://crunchbase.com/organization/{{orgSlug}}"
    string promptInstructions; // AI extraction instructions
    string outputSchema;       // Expected JSON schema
    string inputSchema;        // Placeholder types
    address creator;
    bool active;
}

struct Job {
    uint256 specId;       // Reference to template
    string inputs;        // Concrete values: {"orgSlug": "anthropic"}
    address token;        // ETH (address(0)) or ERC-20
    uint256 bounty;
    JobStatus status;
    string resultPayload;
    address worker;
}
```

This means anyone can create a reusable spec for a domain, and the ecosystem benefits from shared templates. Workers approve specs once, then see all matching jobs automatically.

Multi-Token Escrow

We wanted to support both native ETH and stablecoins (USDC) from day one:

```solidity
function createJob(CreateJobParams calldata params) external payable {
    if (params.token == address(0)) {
        // Native ETH - must match msg.value
        if (msg.value != params.bounty) revert InvalidBounty();
    } else {
        // ERC-20 - pull tokens via transferFrom
        if (msg.value != 0) revert TokenMismatch();
        IERC20(params.token).safeTransferFrom(msg.sender, address(this), params.bounty);
    }
    // ... create job
}
```

On payout, the same logic reverses: ETH via call{value} or ERC-20 via safeTransfer.
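The same branch shows up client-side when funding a job: a native-ETH bounty rides along as msg.value, while an ERC-20 bounty needs a prior approve so the contract can pull it via transferFrom. A minimal TypeScript sketch of that decision (the `FundingPlan` shape is illustrative, not the app's actual code):

```typescript
// address(0) is the contract's sentinel for native ETH.
const NATIVE_TOKEN = "0x0000000000000000000000000000000000000000";

interface FundingPlan {
  value: bigint;          // msg.value to attach to the createJob call
  needsApproval: boolean; // whether an ERC-20 approve must happen first
}

// Mirror of the contract's branch in createJob.
function planJobFunding(token: string, bounty: bigint): FundingPlan {
  if (token.toLowerCase() === NATIVE_TOKEN) {
    return { value: bounty, needsApproval: false };
  }
  // ERC-20: the contract pulls via transferFrom, so msg.value must be 0
  return { value: 0n, needsApproval: true };
}
```

A frontend would run this before submitting the transaction, prompting the user for an approve step only when `needsApproval` is true.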
We use OpenZeppelin's SafeERC20 and ReentrancyGuard to prevent the obvious attack vectors.

The Proof Verifier Interface

The contract calls an external IProofVerifier to validate zk-TLS proofs:

```solidity
interface IProofVerifier {
    function verifyProof(
        bytes calldata proof,
        string calldata targetDomain
    ) external view returns (bool valid);
}
```

Hacky but necessary: for the hackathon, our ProofVerifier.sol is a mock that always returns true. The interface is production-ready for vlayer integration; we just swap the implementation address. This let us build the full flow without blocking on proof generation complexity.

Batch Query Optimization

Frontends need to display lists of specs and jobs. Rather than N+1 RPC calls, we added range queries:

```solidity
function getJobSpecsRange(uint256 from, uint256 to) external view returns (JobSpec[] memory specs) {
    uint256 length = to - from;
    specs = new JobSpec[](length);
    for (uint256 i = 0; i < length; i++) {
        specs[i] = _specs[from + i];
    }
}
```

Two RPC calls (get the count, then get the range) instead of potentially hundreds.

Web Application: Next.js 16 + React 19 + wagmi v3

Why This Stack

- Next.js 16 with App Router for file-based routing and React Server Components
- React 19 for the latest concurrent features
- wagmi v3 + viem for type-safe contract interactions (wagmi v3 just released—we're on the bleeding edge)
- TanStack Query for caching and background refetching
- Dynamic Labs for wallet connection with social login fallbacks

wagmi CLI for Type Generation

This was a huge DX win. We use wagmi generate to produce fully-typed hooks from our contract ABIs:

```typescript
// wagmi.config.ts
export default defineConfig({
  out: 'src/generated.ts',
  contracts: [
    {
      name: 'JobRegistry',
      abi: jobRegistryAbi,
    },
  ],
});
```

Now useReadContract and useWriteContract have full TypeScript inference for function names, argument types, and return types. No more ABI typos at runtime.

Event-Based Data Fetching

Here's where it gets interesting.
We needed to show "all jobs created by this user," but the contract only stores jobs by ID, not by creator. Solution: query events.

```typescript
export function useUserJobs(userAddress: `0x${string}` | undefined) {
  const publicClient = usePublicClient();

  return useQuery({
    queryKey: ["userJobs", userAddress],
    queryFn: async () => {
      // Get all JobCreated events filtered by requester
      const logs = await publicClient.getLogs({
        address: JOB_REGISTRY_ADDRESS,
        event: parseAbiItem(
          "event JobCreated(uint256 indexed jobId, uint256 indexed specId, address indexed requester, address token, uint256 bounty)"
        ),
        args: { requester: userAddress },
        fromBlock: DEPLOYMENT_BLOCK, // Skip genesis blocks
        toBlock: "latest",
      });

      // Fetch full job details for each
      return Promise.all(
        logs.map(async (log) => {
          const job = await publicClient.readContract({
            address: JOB_REGISTRY_ADDRESS,
            abi: jobRegistryAbi,
            functionName: "getJob",
            args: [log.args.jobId],
          });
          return { ...job, id: log.args.jobId };
        })
      );
    },
  });
}
```

We store DEPLOYMENT_BLOCK in a generated config file so we don't scan from block 0 on Sepolia (which would time out).

Dynamic Labs Integration

Dynamic gives us wallet connection with a much better UX than raw RainbowKit:

```tsx
export function Web3Provider({ children }: { children: React.ReactNode }) {
  return (
    <DynamicContextProvider
      settings={{
        environmentId: process.env.NEXT_PUBLIC_DYNAMIC_ENV_ID!,
        walletConnectors: [EthereumWalletConnectors],
      }}
    >
      <DynamicWagmiConnector>
        <WagmiProvider config={wagmiConfig}>
          <QueryClientProvider client={queryClient}>
            {children}
          </QueryClientProvider>
        </WagmiProvider>
      </DynamicWagmiConnector>
    </DynamicContextProvider>
  );
}
```

Users can connect with MetaMask, WalletConnect, or even email—Dynamic handles embedded wallet creation. This dramatically lowers the barrier for non-crypto-native users.

Dual-Role Architecture

The same app serves requesters and workers.
We handle this with a role toggle that preserves navigation context:

```typescript
function getEquivalentPath(currentPath: string, newRole: Role): string {
  // /requestor/jobSpecs/123/jobs → /worker/jobSpecs/123/jobs
  const segments = currentPath.split('/');
  segments[1] = newRole;
  return segments.join('/');
}
```

Both roles see different sidebars and slightly different UIs, but share components like JobSpecCard and DashboardLayout.

Browser Extension: Plasmo + Worker Engine Architecture

This is where most of the complexity lives. The extension needs to:

- Store worker preferences (approved specs, min bounties)
- Listen for new jobs on-chain
- Execute jobs in a controlled browser tab
- Generate zk-TLS proofs
- Submit results back to the contract
- Communicate status to the web app in real time

Why Plasmo

Plasmo is a framework for building browser extensions with React. It handles:

- Manifest generation
- Hot reload during development
- TypeScript compilation
- Content script injection
- Background service worker bundling

We can write the popup as a React component and Plasmo handles the Chrome extension boilerplate.

Local Database with Drizzle ORM

Workers need persistent state that survives browser restarts. We use Drizzle ORM with SQLite (via sql.js compiled to WASM):

```typescript
// db/schema.ts
export const followedSpecs = sqliteTable("followed_specs", {
  id: integer("id").primaryKey({ autoIncrement: true }),
  specId: integer("spec_id").notNull(),
  walletAddress: text("wallet_address").notNull(),
  mainDomain: text("main_domain").notNull(),
  minBounty: real("min_bounty").default(0),
  autoClaim: integer("auto_claim", { mode: "boolean" }).default(false),
});

export const activeJobs = sqliteTable("active_jobs", {
  jobId: text("job_id").notNull().unique(),
  status: text("status", {
    enum: ["pending", "navigating", "collecting", "generating_proof", "submitting", "completed", "failed"],
  }).notNull().default("pending"),
  progress: integer("progress").default(0),
  // ...
});
```

This gives us type-safe queries and migrations without a server.

The Worker Engine State Machine

The core of the extension is workerEngine.ts—a state machine that orchestrates everything:

```typescript
export interface WorkerEngine {
  start(): void;                       // Begin listening for jobs
  stop(): void;                        // Pause everything
  openWorkerTab(): Promise<number>;    // Create dedicated execution tab
  setApprovedSpecs(specIds: Set<number>, minBountyBySpec: Map<number, number>): void;
  setAutoMode(enabled: boolean): void; // Auto-process queue
  processNextJob(): Promise<JobResult | null>; // Manual trigger
  getStatus(): WorkerStatus;

  // Event subscriptions
  onStatusChange(cb: (status: WorkerStatus) => void): () => void;
  onProgress(cb: (progress: JobProgress) => void): () => void;
  onJobComplete(cb: (result: JobResult) => void): () => void;
}
```

The engine coordinates three sub-modules:

- JobListener: polls the blockchain for new jobs matching approved specs
- JobQueue: priority queue of jobs waiting to be processed
- QueueProcessor: actually executes jobs in the worker tab

The Worker Tab Pattern: jobs execute in a dedicated browser tab (/worker/runner), not in a headless context. This is intentional—it uses the worker's real browser profile with real cookies and sessions. The extension controls this tab via chrome.tabs APIs, navigating it to target URLs and extracting data.

Web ↔ Extension Communication

The web app needs to know if the extension is installed, query worker preferences, and receive job progress updates.
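Since window.postMessage delivers every message on the page, not just ours, the first step on each side of that channel is filtering by the GH0ST_ prefix. A minimal sketch of such a guard (illustrative only, not the extension's actual code):

```typescript
interface Gh0stMessage {
  type: string;
  payload?: unknown;
}

// Narrow an arbitrary postMessage payload to the GH0ST_* protocol.
function isGh0stMessage(data: unknown): data is Gh0stMessage {
  return (
    typeof data === "object" &&
    data !== null &&
    typeof (data as { type?: unknown }).type === "string" &&
    (data as { type: string }).type.startsWith("GH0ST_")
  );
}
```

Anything failing the guard is simply ignored, so the protocol coexists with whatever other scripts happen to post messages on the page.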
We built a typed message protocol:

```typescript
// Message types with GH0ST_ prefix for namespacing
export type WebToExtensionMessage =
  | { type: "GH0ST_PING" }
  | { type: "GH0ST_START_JOB"; payload: StartJobPayload }
  | { type: "GH0ST_QUERY"; payload: QueryPayload }
  | { type: "GH0ST_FOLLOW_SPEC"; payload: FollowSpecPayload };

export type ExtensionToWebMessage =
  | { type: "GH0ST_PONG"; payload: { version: string } }
  | { type: "GH0ST_JOB_PROGRESS"; payload: JobProgressPayload }
  | { type: "GH0ST_JOB_COMPLETED"; payload: JobCompletedPayload };
```

Communication flows through a content script injected into the web app:

- Web App → window.postMessage → Content Script → chrome.runtime.sendMessage → Background Script
- Background Script → chrome.tabs.sendMessage → Content Script → window.postMessage → Web App

Hacky detail: we track which tabs have the web app open (a connectedTabs Set) so we can broadcast progress updates to all of them. When a tab closes, we clean it up via chrome.tabs.onRemoved.

vlayer Client Abstraction

For zk-TLS proof generation, we abstracted behind an interface:

```typescript
export interface IVlayerClient {
  generateProof(request: ProofRequest): Promise<ProofResult>;
  verifyProof(proof: string, domain: string): Promise<boolean>;
}

export function createVlayerClient(): IVlayerClient {
  const useMock = process.env.PLASMO_PUBLIC_USE_MOCK === 'true';

  if (useMock) {
    return new MockVlayerClient(); // Returns fake proofs instantly
  }
  return new VlayerClient({ clientId, secret }); // Real vlayer integration
}
```

The mock client lets us develop the full flow without vlayer credentials. In production, we swap in the real client: same interface, real proofs.

Notable Hacks & Clever Solutions

Generated Contract Config

Deployment addresses change between local Anvil and Sepolia.
We generate a config file at deploy time:

```typescript
// Generated by deploy script
export const JOB_REGISTRY_ADDRESS = "0x5FbDB2315678afecb367f032d93F642f64180aa3";
export const DEPLOYMENT_BLOCK = 12345678n;
export const CHAIN_ID = 11155111;
```

The web app imports this, so switching networks is just a redeploy + regenerate.

Mock Extension Mode for Development

Testing extension features without constantly building and installing the extension:

```typescript
// In the useExtensionStatus hook
if (localStorage.getItem("gh0st_extension_mock") === "true") {
  return {
    connected: true,
    version: "dev",
    activeTask: { jobId: "0x...", status: "collecting", progress: 45 },
  };
}
```

Set a localStorage flag and the web app pretends the extension is connected with an active task.

Optimistic UI with Event Refetch

When a user creates a job spec, we don't wait for blockchain confirmation to update the UI. We show a toast, then refetch via events once confirmed:

```typescript
useEffect(() => {
  if (isSpecCreated) {
    refetchSpecs(); // Re-query events to get the new spec
    setIsCreateSpecModalOpen(false);
  }
}, [isSpecCreated, refetchSpecs]);
```

Extension Setup Flow

The popup has two modes: setup (first run) and operational. We persist config in chrome.storage.local:

```typescript
export async function saveConfig(config: ExtensionConfig): Promise<void> {
  await chrome.storage.local.set({ gh0st_config: config });
}

export async function hasConfig(): Promise<boolean> {
  const result = await chrome.storage.local.get("gh0st_config");
  return !!result.gh0st_config;
}
```

On first open, workers enter their RPC URL, contract address, and a private key for signing submissions.
The key never leaves local storage.

Job Listener with Polling + Deduplication

The extension polls for new jobs but needs to avoid re-queueing jobs it has already seen:

```typescript
// In jobListener.ts
const seenJobIds = new Set<string>();

function onNewJob(job: Job) {
  const jobKey = job.id.toString();
  if (seenJobIds.has(jobKey)) return;
  seenJobIds.add(jobKey);

  // Check against approved specs and min bounty
  if (!approvedSpecIds.has(Number(job.specId))) return;
  const minBounty = minBountyBySpec.get(Number(job.specId)) || 0;
  if (parseFloat(formatEther(job.bounty)) < minBounty) return;

  onJobFound(job);
}
```

Partner Technologies

| Technology   | How It Helped                                                                                     |
|--------------|---------------------------------------------------------------------------------------------------|
| Dynamic Labs | Wallet connection with email/social fallback—critical for onboarding non-crypto users              |
| vlayer       | zk-TLS proof infrastructure—the cryptographic core that makes trustless verification possible      |
| Foundry      | Fast Solidity compilation and testing; forge test runs our full suite in ~2 seconds                |
| Plasmo       | Made browser extension development feel like building a React app instead of fighting Chrome APIs  |
| wagmi v3     | Type-safe contract hooks with TanStack Query integration—caught multiple bugs at compile time      |

What We'd Do Differently

- Use an indexer: event-based queries work but get slow as the job count grows. The Graph or Ponder would scale better.
- WebSocket subscriptions: polling for new jobs works but adds latency. A WebSocket connection to an RPC with eth_subscribe would be real-time.
- Multi-chain from day one: we hardcoded Sepolia. Abstracting chain config earlier would make multi-chain deployment trivial.

Lines of Code

- Contracts: ~600 lines of Solidity + ~600 lines of tests
- Web App: ~4,000 lines of TypeScript/React
- Extension: ~2,500 lines of TypeScript

All written in 48 hours.
We're proud of how complete the system is—not just a demo, but a working protocol with real escrow, real payments, and a real browser automation pipeline.
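As one small illustration of how that pipeline hangs together: the job statuses the extension persists (pending, navigating, collecting, generating_proof, submitting, completed, failed) imply a linear state machine. A minimal TypeScript sketch of a transition guard, where the allowed-transition map is our reading of the flow rather than the engine's actual code:

```typescript
type JobStatus =
  | "pending" | "navigating" | "collecting"
  | "generating_proof" | "submitting" | "completed" | "failed";

// Assumed happy path, with "failed" reachable from any in-flight state.
const NEXT: Record<JobStatus, JobStatus[]> = {
  pending: ["navigating", "failed"],
  navigating: ["collecting", "failed"],
  collecting: ["generating_proof", "failed"],
  generating_proof: ["submitting", "failed"],
  submitting: ["completed", "failed"],
  completed: [], // terminal
  failed: [],    // terminal
};

function canTransition(from: JobStatus, to: JobStatus): boolean {
  return NEXT[from].includes(to);
}
```

A guard like this keeps a queue processor from skipping steps (say, jumping from pending straight to submitting) even when events arrive out of order.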
Hackathon
ETHGlobal Buenos Aires
2025
Contributors
- wu-s-john
8 contributions