POCKET is a dual-agent autonomous trading system that combines reinforcement learning with large language model research to trade prediction markets on Polymarket. The system operates two distinct agents: an RL Agent for high-frequency 15-minute crypto markets, and an Opus Agent for longer-term event-driven markets requiring real-world research.
Its core idea is cross-market state fusion: exploiting the information lag between fast markets (Binance futures) and slow markets (Polymarket) through real-time multi-source data fusion.
The system runs two independent agents that complement each other:
| Component | RL Agent | Opus Agent |
|---|---|---|
| Market Type | 15-min crypto binary markets | All Polymarket events |
| Time Horizon | 15 minutes | Hours to days |
| Decision Engine | PPO Neural Network | Claude AI (Anthropic) |
| Data Sources | Binance + Polymarket orderbook | Web search + market data |
| Trade Frequency | Multiple per hour | One scan every 30 minutes |
The RL agent observes an 18-dimensional state fused from multiple real-time sources:
| Category | Features | Source |
|---|---|---|
| Momentum | returns_1m, returns_5m, returns_10m | Binance Futures |
| Order Flow | ob_imbalance_l1, ob_imbalance_l5, trade_flow, cvd_accel | Binance Futures |
| Microstructure | spread_pct, trade_intensity, large_trade_flag | Polymarket CLOB |
| Volatility | vol_5m, vol_expansion | Combined |
| Position | has_position, position_side, position_pnl, time_remaining | Internal state |
| Regime | vol_regime, trend_regime | Derived |
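As a concrete illustration, the 18-dimensional observation could be assembled as below. The function and the dict-of-features inputs are hypothetical stand-ins for the project's actual feed objects; the field names follow the table above.

```python
import numpy as np

def build_state(binance, clob, position, regime):
    """Assemble the 18-dim observation from the fused sources.
    Each argument is a dict of precomputed features (hypothetical
    stand-ins for the real feed handlers)."""
    state = np.array([
        # Momentum (Binance Futures)
        binance["returns_1m"], binance["returns_5m"], binance["returns_10m"],
        # Order flow (Binance Futures)
        binance["ob_imbalance_l1"], binance["ob_imbalance_l5"],
        binance["trade_flow"], binance["cvd_accel"],
        # Microstructure (Polymarket CLOB)
        clob["spread_pct"], clob["trade_intensity"], clob["large_trade_flag"],
        # Volatility (combined)
        binance["vol_5m"], binance["vol_expansion"],
        # Position (internal state)
        position["has_position"], position["position_side"],
        position["position_pnl"], position["time_remaining"],
        # Regime (derived)
        regime["vol_regime"], regime["trend_regime"],
    ], dtype=np.float32)
    assert state.shape == (18,)
    return state
```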
The V3.5 architecture uses LSTM temporal encoding with cross-market attention:
```
LSTMTemporalEncoder:
    Input:  (batch, seq_len=10, features=18)
    LSTM:   2 layers, hidden_dim=64, dropout=0.1
    Output: 64-dim temporal embedding

CrossMarketAttention:
    Multi-head attention (4 heads) across 4 markets
    Captures inter-market correlations

Actor Network:
    [temporal(64) + attention(64)] → 128 → LayerNorm → ReLU
    → 64 → LayerNorm → ReLU → 3 (softmax)

Critic Network:
    [temporal(64) + attention(64)] → 128 → LayerNorm → ReLU
    → 64 → LayerNorm → ReLU → 1 (value)
```
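The architecture above can be sketched in PyTorch as follows. This is a reconstruction from the diagram, not the project's actual code: module names, the mean-pooling over markets, and the softmax placement are assumptions.

```python
import torch
import torch.nn as nn

class LSTMTemporalEncoder(nn.Module):
    """2-layer LSTM over the last 10 observations -> 64-dim embedding."""
    def __init__(self, n_features=18, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_dim, num_layers=2,
                            batch_first=True, dropout=0.1)

    def forward(self, x):            # x: (batch, 10, 18)
        out, _ = self.lstm(x)
        return out[:, -1]            # last timestep: (batch, 64)

class CrossMarketAttention(nn.Module):
    """4-head self-attention across per-market embeddings."""
    def __init__(self, dim=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, market_embs):  # (batch, n_markets=4, 64)
        out, _ = self.attn(market_embs, market_embs, market_embs)
        return out.mean(dim=1)       # pool over markets (assumed): (batch, 64)

def mlp_head(in_dim, out_dim):
    # 128 -> LayerNorm -> ReLU -> 64 -> LayerNorm -> ReLU -> out_dim
    return nn.Sequential(
        nn.Linear(in_dim, 128), nn.LayerNorm(128), nn.ReLU(),
        nn.Linear(128, 64), nn.LayerNorm(64), nn.ReLU(),
        nn.Linear(64, out_dim))

class ActorCritic(nn.Module):
    def __init__(self):
        super().__init__()
        self.actor = mlp_head(128, 3)   # 3 actions
        self.critic = mlp_head(128, 1)  # scalar state value

    def forward(self, temporal, attention):
        fused = torch.cat([temporal, attention], dim=-1)  # (batch, 128)
        return torch.softmax(self.actor(fused), dim=-1), self.critic(fused)
```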
| Parameter | Value | Notes |
|---|---|---|
| Learning Rate (Actor) | 1e-4 | Conservative for stability |
| Learning Rate (Critic) | 3e-4 | Higher for faster value learning |
| Gamma (γ) | 0.95 | Short horizon (15-min markets) |
| GAE Lambda (λ) | 0.95 | Advantage estimation |
| Clip Epsilon (ε) | 0.2 | PPO clipping |
| Entropy Coefficient | 0.03 | Allows sparse policy |
| Buffer Size | 256 | Fast adaptation |
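To make the hyperparameters concrete, here is a minimal sketch of the two pieces they parameterize: GAE advantage estimation and the clipped PPO actor loss. Function names are illustrative, not the project's actual code.

```python
import torch

def gae(rewards, values, gamma=0.95, lam=0.95):
    """Generalized Advantage Estimation.
    `values` carries one extra bootstrap entry beyond the last reward."""
    adv, running = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv

def ppo_actor_loss(new_logp, old_logp, advantages,
                   clip_eps=0.2, entropy_coef=0.03, entropy=None):
    """Clipped PPO surrogate with an optional entropy bonus."""
    ratio = torch.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    loss = -torch.min(unclipped, clipped).mean()
    if entropy is not None:
        loss = loss - entropy_coef * entropy.mean()
    return loss
```

The low γ = 0.95 discounts rewards beyond a few minutes heavily, which matches the 15-minute market horizon.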
| Action | Description |
|---|---|
| HOLD (0) | No action; wait for a better opportunity |
| BUY_UP (1) | Long YES token (bet price goes up) |
| BUY_DOWN (2) | Long NO token (bet price goes down) |
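In code, the action space could be represented as a small enum mapped to the token to buy (names here are illustrative, not the project's actual identifiers):

```python
from enum import IntEnum

class Action(IntEnum):
    HOLD = 0      # no trade
    BUY_UP = 1    # long the YES token
    BUY_DOWN = 2  # long the NO token

def token_for(action):
    """Map a policy action to the Polymarket token to buy (None = no trade)."""
    return {Action.HOLD: None,
            Action.BUY_UP: "YES",
            Action.BUY_DOWN: "NO"}[Action(action)]
```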
Share-based PnL calculation that matches actual binary-market economics:

```
shares = dollars / entry_price
pnl = (exit_price - entry_price) × shares
```

Because a dollar buys more shares at lower prices, the same absolute price move produces a larger return at lower entries: buying at 0.30 yields 3.33 shares per dollar, while buying at 0.70 yields only 1.43.
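A minimal sketch of this calculation (the function name is illustrative):

```python
def share_pnl(dollars, entry_price, exit_price):
    """Share-based PnL for a binary-market position."""
    shares = dollars / entry_price        # a dollar buys more shares at low prices
    return (exit_price - entry_price) * shares

# Same +0.10 price move, different entries:
share_pnl(1.0, 0.30, 0.40)  # 3.33 shares -> ~0.33 profit per dollar
share_pnl(1.0, 0.70, 0.80)  # 1.43 shares -> ~0.14 profit per dollar
```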
The Opus Agent uses Claude (Anthropic's LLM) to research and trade longer-term prediction markets. It scans all Polymarket markets, performs web research, estimates true probabilities, and identifies trading edges.
| Parameter | Value |
|---|---|
| Minimum Edge | 8% |
| Minimum Confidence | 60% |
| Max Position Size | 15% of bankroll |
| Min Time to Resolution | 6 hours |
| Max Time to Resolution | 30 days |
| Scan Interval | 30 minutes |
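The filters above amount to a simple gate before any trade is placed. The following sketch assumes edge is the absolute gap between the agent's estimated probability and the market price; the function names are hypothetical.

```python
def passes_filters(market_prob, estimated_prob, confidence, hours_to_resolution,
                   min_edge=0.08, min_conf=0.60, min_hours=6, max_hours=30 * 24):
    """Gate a candidate market on edge, confidence, and time to resolution."""
    edge = abs(estimated_prob - market_prob)
    return (edge >= min_edge
            and confidence >= min_conf
            and min_hours <= hours_to_resolution <= max_hours)

def max_position(bankroll, max_fraction=0.15):
    """Cap any single position at 15% of bankroll."""
    return bankroll * max_fraction
```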
The RL agent evolved through 5 phases, each phase fixing problems discovered in the previous one:
| Phase | Change | Size | PnL | ROI |
|---|---|---|---|---|
| 1 | Shaped rewards (failed) | $5 | $3.90 | - |
| 2 | Sparse PnL only | $5 | $10.93 | 55% |
| 3 | 10x scale up | $50 | $23.10 | 12% |
| 4 | Share-based PnL | $500 | $3,392 | 170% |
| 5 | LSTM + Attention | $500 | ~$50K | 2,500% |
| Rule | Value |
|---|---|
| Take Profit | 15% |
| Stop Loss | 10% |
| Time Stop | 300 seconds |
| Min Hold Time | 3 minutes |
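One way to read these rules together (an interpretation, since the table does not state precedence): the time stop always fires, the minimum hold suppresses early exits, and take-profit/stop-loss apply in between.

```python
def should_exit(pnl_pct, seconds_held,
                take_profit=0.15, stop_loss=0.10, time_stop=300, min_hold=180):
    """Exit logic sketch: hard time stop at 300 s, no discretionary
    exit before the 3-minute minimum hold, TP/SL otherwise."""
    if seconds_held >= time_stop:
        return True                  # time stop overrides everything
    if seconds_held < min_hold:
        return False                 # enforce minimum hold
    return pnl_pct >= take_profit or pnl_pct <= -stop_loss
```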
| Component | Technology |
|---|---|
| RL Framework | PyTorch |
| LLM | Claude (Anthropic) |
| Execution | Polymarket CLOB API |
| Data Streaming | Binance WebSocket |
| Database | Supabase (PostgreSQL) |
| Dashboard | Vercel (Static) |
| Real-time Updates | Supabase Realtime |
POCKET demonstrates that combining reinforcement learning with large language model research creates a powerful autonomous trading system. The RL agent exploits short-term information lag in crypto prediction markets, while the Opus agent leverages AI reasoning for event-driven opportunities.
The system is fully autonomous, running 24/7 with real-time public monitoring. All trades are executed with real USDC on Polymarket, with performance transparently displayed on the live dashboard.
Monitor real-time performance at nostradopus.tech