Dual-Agent Autonomous Trading System
Technical Whitepaper v1.0 // January 2026

1. Abstract

POCKET is a dual-agent autonomous trading system that combines reinforcement learning with large language model research to trade prediction markets on Polymarket. The system operates two distinct agents: an RL Agent for high-frequency 15-minute crypto markets, and an Opus Agent for longer-term event-driven markets requiring real-world research.

Key Innovation

Cross-market state fusion: exploiting information lag between fast markets (Binance futures) and slow markets (Polymarket) through real-time multi-source data fusion.

~$50K   Training PnL
34,730  Training Trades
2,500%  Training ROI

2. System Architecture

2.1 Dual-Agent Design

The system runs two independent agents that complement each other:

Component        RL Agent                         Opus Agent
Market Type      15-min crypto binary markets     All Polymarket events
Time Horizon     15 minutes                       Hours to days
Decision Engine  PPO neural network               Claude AI (Anthropic)
Data Sources     Binance + Polymarket orderbook   Web search + market data
Trade Frequency  Multiple trades per hour         One scan every 30 minutes

2.2 Infrastructure

                         POCKET SYSTEM

┌──────────────┐   ┌──────────────┐   ┌──────────────────────┐
│   BINANCE    │   │  POLYMARKET  │   │     WEB SOURCES      │
│   FUTURES    │   │     CLOB     │   │  (News, Research)    │
└──────┬───────┘   └──────┬───────┘   └──────────┬───────────┘
       │                  │                      │
       └────────┬─────────┴──────────────────────┘
                │
      ┌─────────▼─────────┐
      │    DATA FUSION    │
      │   18-dim state    │
      └─────────┬─────────┘
                │
       ┌────────┴─────────┐
       ▼                  ▼
┌──────────────┐   ┌──────────────┐
│   RL AGENT   │   │  OPUS AGENT  │
│  (PyTorch)   │   │   (Claude)   │
│              │   │              │
│  LSTM+Attn   │   │  Research +  │
│  PPO v3.5    │   │  Reasoning   │
└──────┬───────┘   └──────┬───────┘
       │                  │
       └────────┬─────────┘
                │
       ┌────────▼────────┐
       │    EXECUTION    │
       │ Polymarket API  │
       └────────┬────────┘
                │
       ┌────────▼────────┐
       │    SUPABASE     │──────▶ VERCEL DASHBOARD
       │   (Real-time)   │        (Public Monitoring)
       └─────────────────┘

3. RL Agent: Technical Deep Dive

3.1 State Space (18 Dimensions)

The RL agent observes an 18-dimensional state fused from multiple real-time sources:

Category        Features                                                   Source
Momentum        returns_1m, returns_5m, returns_10m                        Binance Futures
Order Flow      ob_imbalance_l1, ob_imbalance_l5, trade_flow, cvd_accel    Binance Futures
Microstructure  spread_pct, trade_intensity, large_trade_flag              Polymarket CLOB
Volatility      vol_5m, vol_expansion                                      Combined
Position        has_position, position_side, position_pnl, time_remaining  Internal State
Regime          vol_regime, trend_regime                                   Derived
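The fusion step can be sketched as a fixed feature ordering mapped into a vector. This is illustrative only: the feature names follow the table above, but the snapshot dictionary and `build_state` helper are assumptions, not the production fusion code.

```python
import numpy as np

# Feature order mirrors the state-space table; any upstream fetch/compute
# logic is omitted here and assumed to populate `snapshot`.
STATE_FEATURES = [
    # Momentum (Binance Futures)
    "returns_1m", "returns_5m", "returns_10m",
    # Order flow (Binance Futures)
    "ob_imbalance_l1", "ob_imbalance_l5", "trade_flow", "cvd_accel",
    # Microstructure (Polymarket CLOB)
    "spread_pct", "trade_intensity", "large_trade_flag",
    # Volatility (combined)
    "vol_5m", "vol_expansion",
    # Position (internal state)
    "has_position", "position_side", "position_pnl", "time_remaining",
    # Regime (derived)
    "vol_regime", "trend_regime",
]

def build_state(snapshot: dict) -> np.ndarray:
    """Fuse one multi-source snapshot into the 18-dim observation."""
    state = np.array([snapshot.get(f, 0.0) for f in STATE_FEATURES],
                     dtype=np.float32)
    assert state.shape == (18,)
    return state
```

Missing features default to 0.0 so the observation shape is always stable for the network.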

3.2 Neural Network Architecture

The V3.5 architecture uses LSTM temporal encoding with cross-market attention:

LSTMTemporalEncoder:
    Input: (batch, seq_len=10, features=18)
    LSTM: 2 layers, hidden_dim=64, dropout=0.1
    Output: 64-dim temporal embedding

CrossMarketAttention:
    Multi-head attention (4 heads) across 4 markets
    Captures inter-market correlations

Actor Network:
    [temporal(64) + attention(64)] → 128 → LayerNorm → ReLU
    → 64 → LayerNorm → ReLU → 3 (softmax)

Critic Network:
    [temporal(64) + attention(64)] → 128 → LayerNorm → ReLU
    → 64 → LayerNorm → ReLU → 1 (value)
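A minimal PyTorch sketch consistent with the dimensions above. The exact wiring of the production V3.5 model is not shown in this paper, so the layer composition here (how the attention query is formed, how embeddings are concatenated) is an assumption.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Illustrative LSTM + cross-market attention actor-critic (dims from 3.2)."""

    def __init__(self, n_features=18, hidden=64, n_actions=3):
        super().__init__()
        # 2-layer LSTM over the 10-step, 18-feature window
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2,
                            dropout=0.1, batch_first=True)
        # 4-head attention across per-market embeddings
        self.attn = nn.MultiheadAttention(hidden, num_heads=4,
                                          batch_first=True)

        def head(out_dim):
            return nn.Sequential(
                nn.Linear(hidden * 2, 128), nn.LayerNorm(128), nn.ReLU(),
                nn.Linear(128, 64), nn.LayerNorm(64), nn.ReLU(),
                nn.Linear(64, out_dim))

        self.actor = head(n_actions)
        self.critic = head(1)

    def forward(self, seq, market_embs):
        # seq: (batch, 10, 18); market_embs: (batch, 4, 64)
        _, (h, _) = self.lstm(seq)
        temporal = h[-1]                     # (batch, 64) temporal embedding
        query = temporal.unsqueeze(1)        # attend from this market's state
        ctx, _ = self.attn(query, market_embs, market_embs)
        fused = torch.cat([temporal, ctx.squeeze(1)], dim=-1)
        probs = torch.softmax(self.actor(fused), dim=-1)  # 3-way policy
        value = self.critic(fused).squeeze(-1)            # state value
        return probs, value
```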

3.3 PPO Hyperparameters

Parameter               Value  Notes
Learning Rate (Actor)   1e-4   Conservative for stability
Learning Rate (Critic)  3e-4   Higher for faster value learning
Gamma (γ)               0.95   Short horizon (15-min markets)
GAE Lambda              0.95   Advantage estimation
Clip Epsilon            0.2    PPO clipping
Entropy Coefficient     0.03   Allows sparse policy
Buffer Size             256    Fast adaptation
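For reference, the clip epsilon and entropy coefficient from the table enter the standard PPO clipped surrogate loss. The function below is an illustrative sketch of that loss, not the system's training code.

```python
import torch

CLIP_EPS = 0.2   # Clip Epsilon from the table
ENT_COEF = 0.03  # Entropy Coefficient from the table

def ppo_actor_loss(new_logp, old_logp, advantages, entropy):
    """Standard PPO clipped surrogate loss with an entropy bonus."""
    ratio = torch.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - CLIP_EPS, 1 + CLIP_EPS) * advantages
    # Maximize the clipped surrogate + entropy bonus -> minimize the negative
    return -(torch.min(unclipped, clipped).mean() + ENT_COEF * entropy.mean())
```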

3.4 Action Space

Action        Description
HOLD (0)      No action - wait for a better opportunity
BUY_UP (1)    Long YES token (bet price goes up)
BUY_DOWN (2)  Long NO token (bet price goes down)
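The action table maps directly onto an integer enum whose indices match the policy network's 3-way softmax output; the class name here is illustrative.

```python
from enum import IntEnum

class Action(IntEnum):
    """Discrete action space; indices match the actor's softmax output."""
    HOLD = 0      # wait for a better opportunity
    BUY_UP = 1    # long YES token
    BUY_DOWN = 2  # long NO token
```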

3.5 Reward Engineering

Key Breakthrough

Share-based PnL calculation that matches actual binary market economics:

shares = dollars / entry_price
pnl = (exit_price - entry_price) × shares

This amplifies returns from low-probability entries proportionally. Buy at 0.30 → 3.33 shares per dollar. Buy at 0.70 → 1.43 shares. Same price move, larger return at lower entries.
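The formula above can be checked with a small worked example (the helper function name is illustrative):

```python
def share_pnl(dollars: float, entry_price: float, exit_price: float) -> float:
    """Share-based PnL for a binary-market position (Section 3.5 formula)."""
    shares = dollars / entry_price
    return (exit_price - entry_price) * shares

# Same +0.10 price move, different entries:
low_entry  = share_pnl(100, 0.30, 0.40)  # 333.3 shares -> ~$33.33 profit
high_entry = share_pnl(100, 0.70, 0.80)  # 142.9 shares -> ~$14.29 profit
```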

4. Opus Agent: AI Research Engine

4.1 Overview

The Opus Agent uses Claude (Anthropic's LLM) to research and trade longer-term prediction markets. It scans all Polymarket markets, performs web research, estimates true probabilities, and identifies trading edges.

4.2 Research Pipeline

  1. Market Discovery: Scan Polymarket for liquid markets with reasonable time horizons
  2. Web Search: Gather real-time information from news, social media, and official sources
  3. AI Analysis: Claude analyzes market question, current odds, and web context
  4. Edge Calculation: Compare AI's probability estimate vs market price
  5. Execution: If edge > 8% and confidence > 60%, execute trade
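The edge-and-confidence gate in step 5 can be sketched as follows. Thresholds come from Section 4.3; the function itself and its signature are hypothetical, and the market discovery, web search, and Claude analysis steps are omitted.

```python
MIN_EDGE = 0.08        # Minimum Edge (8%)
MIN_CONFIDENCE = 0.60  # Minimum Confidence (60%)

def evaluate_market(market_price: float, ai_probability: float,
                    confidence: float):
    """Return a trade side if the AI estimate diverges enough from price."""
    edge = ai_probability - market_price  # signed edge on the YES outcome
    if confidence < MIN_CONFIDENCE or abs(edge) < MIN_EDGE:
        return None                       # no trade: edge or confidence too low
    return "BUY_YES" if edge > 0 else "BUY_NO"
```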

4.3 Trading Parameters

Parameter               Value
Minimum Edge            8%
Minimum Confidence      60%
Max Position Size       15% of bankroll
Min Time to Resolution  6 hours
Max Time to Resolution  30 days
Scan Interval           30 minutes

5. Training Results

5.1 Training Evolution

The RL agent evolved through five phases, each one fixing problems discovered in the previous:

Phase  Change                   Size  PnL     ROI
1      Shaped rewards (failed)  $5    $3.90   -
2      Sparse PnL only          $5    $10.93  55%
3      10x scale-up             $50   $23.10  12%
4      Share-based PnL          $500  $3,392  170%
5      LSTM + Attention         $500  ~$50K   2,500%

5.2 Key Insights

6. Risk Management

6.1 Exit Rules (RL Agent)

Rule           Value
Take Profit    15%
Stop Loss      10%
Time Stop      300 seconds
Min Hold Time  3 minutes
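The exit rules above can be sketched as a simple check. This is illustrative only: the precedence between the minimum hold time and the other rules is an assumption, as the paper does not specify how conflicts are resolved.

```python
TAKE_PROFIT = 0.15   # exit at +15%
STOP_LOSS = -0.10    # exit at -10%
TIME_STOP_S = 300    # hard time stop (300 seconds)
MIN_HOLD_S = 180     # minimum hold time (3 minutes)

def should_exit(pnl_pct: float, held_s: float) -> bool:
    """Apply Section 6.1 exit rules (min-hold-first ordering assumed)."""
    if held_s < MIN_HOLD_S:
        return False                        # assumed: min hold overrides TP/SL
    if held_s >= TIME_STOP_S:
        return True                         # time stop
    return pnl_pct >= TAKE_PROFIT or pnl_pct <= STOP_LOSS
```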

6.2 Position Sizing

7. Technology Stack

Component          Technology
RL Framework       PyTorch
LLM                Claude (Anthropic)
Execution          Polymarket CLOB API
Data Streaming     Binance WebSocket
Database           Supabase (PostgreSQL)
Dashboard          Vercel (static)
Real-time Updates  Supabase Realtime

8. Conclusion

POCKET demonstrates that combining reinforcement learning with large language model research creates a powerful autonomous trading system. The RL agent exploits short-term information lag in crypto prediction markets, while the Opus agent leverages AI reasoning for event-driven opportunities.

The system is fully autonomous, running 24/7 with real-time public monitoring. All trades are executed with real USDC on Polymarket, with performance transparently displayed on the live dashboard.

Live Dashboard

Monitor real-time performance at nostradopus.tech