1
Raw Market Data Ingestion
L2 order book data received via TCP/UDP, upgraded to L3 granularity by a Rust-based market simulator with per-order tracking.
Rust
L2 → L3
KRX · HKEX · TWSE
reconstructed L3 data
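One way to picture the L2 → L3 upgrade is a per-price-level queue of synthetic orders inferred from changes in the aggregate level size. The sketch below is written in Python for readability, not in the Rust of the actual simulator. The `LevelQueue` class, the synthetic order IDs, and the rule that size decreases are consumed from the front of the queue are all assumptions made for illustration; a real feed cannot always distinguish cancels from trades this way.

```python
from collections import deque
from itertools import count


class LevelQueue:
    """Hypothetical per-price-level FIFO of synthetic orders inferred from L2 sizes."""

    def __init__(self):
        self.orders = deque()      # each entry is [synthetic_order_id, qty]
        self._ids = count(1)       # synthetic IDs, not exchange order IDs

    def total(self):
        return sum(qty for _, qty in self.orders)

    def apply_l2_update(self, new_total):
        """Reconcile the queue with a new aggregate size for this price level."""
        diff = new_total - self.total()
        if diff > 0:
            # Size grew: assume one new order joined the back of the queue.
            self.orders.append([next(self._ids), diff])
            return
        # Size shrank: assume the reduction is taken from the front (trade-like).
        remaining = -diff
        while remaining > 0 and self.orders:
            head = self.orders[0]
            take = min(head[1], remaining)
            head[1] -= take
            remaining -= take
            if head[1] == 0:
                self.orders.popleft()
```

Feeding the level sizes 50, 80, 60 in sequence leaves two synthetic orders of 30 each, since the 20-unit reduction is taken from the oldest order.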
2
Rust Execution Environment ↔ Python Agent
Each episode covers a full intraday session at 5-second steps. Rust validates every order against the current book state and tracks its queue position via L3 order IDs.
Rust ↔ Python
5s steps
Order validation
observation
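The validation and queue-tracking step can be sketched as follows, in Python for readability although the source places this logic on the Rust side. The `Order` and `Book` types, the function names, and the passive-only rule (orders that would cross the spread are rejected) are assumptions for illustration, not the project's actual rules.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: int
    side: str    # "buy" or "sell"
    price: int   # price in ticks
    qty: int


@dataclass
class Book:
    best_bid: int
    best_ask: int
    # price -> FIFO queue of (order_id, qty) resting at that level
    levels: dict = field(default_factory=dict)


def validate(order, book):
    """Reject malformed orders and (assumed rule) orders that would cross the spread."""
    if order.qty <= 0 or order.side not in ("buy", "sell"):
        return False
    if order.side == "buy":
        return order.price < book.best_ask
    return order.price > book.best_bid


def place(order, book):
    """Append the order to the back of the queue at its price level."""
    book.levels.setdefault(order.price, deque()).append((order.order_id, order.qty))


def queue_ahead(order_id, price, book):
    """Quantity resting ahead of order_id at its level, or None if it is not there."""
    ahead = 0
    for oid, qty in book.levels.get(price, ()):
        if oid == order_id:
            return ahead
        ahead += qty
    return None
```

With orders of 50 and 30 already resting at the best bid, a newly placed order at that price reports 80 units ahead of it.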
3
4-Head Observation Encoder
Observations decomposed into 4 semantically distinct groups, independently normalized, then concatenated.
Order Book
Market depth, bid/ask imbalance
Execution
Trade volume, price momentum
Job
Progress vs. time, constraints
Pending
Agent's open order positions
state vector
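The decompose, normalize, concatenate flow above can be sketched in a few lines. The group keys and the per-group z-score are assumptions: the source says only that the four groups are normalized independently, and a real system might use running statistics over time instead.

```python
import math

# Hypothetical keys for the four observation groups named above.
GROUPS = ("order_book", "execution", "job", "pending")


def zscore(values, eps=1e-8):
    """Zero-mean, unit-variance scaling of one non-empty feature group (assumed scheme)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / (std + eps) for v in values]


def encode(obs):
    """Normalize each group on its own, then concatenate in a fixed group order."""
    state = []
    for name in GROUPS:
        state.extend(zscore(obs[name]))
    return state
```

Because each group is scaled separately, a large-magnitude feature such as market depth does not swamp a small-magnitude one such as job progress in the concatenated state vector.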
4
Policy Network
CNN+LSTM (production) or Transformer (better cross-market transfer). Shared weights across markets with market conditioning input.
Temporal Backbone
CNN+LSTM or Transformer
Action Heads
Price level, order/cancel, qty
Value Head
V(s) for GAE in PPO
action → Rust validates
5
Action Execution
Order validated against book state, then placed and tracked via L3 order IDs.
6
Reward Signal
Triggered on fill: execution price vs. VWAP benchmark, plus quantity progress. Propagated to earlier steps via GAE.
trajectories
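A minimal sketch of a fill-triggered reward and the GAE pass that propagates it backward through a trajectory. The functional form of `fill_reward`, the `progress_weight` parameter, and the default `gamma` and `lam` values are assumptions; the source states only that the reward combines execution price vs. VWAP with quantity progress. The `gae` function follows the standard recursion, with delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) and A_t = delta_t + gamma * lam * A_{t+1}.

```python
def fill_reward(side, fill_price, fill_qty, vwap, total_qty, progress_weight=0.1):
    """Reward for one fill (assumed form): price improvement vs. VWAP plus progress."""
    sign = 1.0 if side == "buy" else -1.0
    share = fill_qty / total_qty                      # fraction of the job completed by this fill
    price_term = sign * (vwap - fill_price) / vwap * share
    return price_term + progress_weight * share


def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory, computed backward in time."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

Steps without a fill carry a reward of zero, so the backward recursion is what assigns credit to the earlier placement decisions that led to the fill.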
7
Parallel PPO Training
16–128 workers, synchronized updates. Multi-task across KRX, HKEX, TWSE with meta-learning.
8
Production Deployment
TorchScript traced → Rust inference. Validated via Python↔Rust output comparison.
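The Python↔Rust comparison can be as simple as a tolerance check on paired output vectors for the same inputs. The function name and the tolerance values below are assumptions, and how the Rust outputs are exported to Python is left out.

```python
import math


def outputs_match(py_out, rust_out, rel_tol=1e-5, abs_tol=1e-6):
    """True if both runtimes produced element-wise equal outputs within tolerance."""
    if len(py_out) != len(rust_out):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(py_out, rust_out)
    )
```

A tolerance is used instead of exact equality because floating-point results commonly differ in the last few bits between runtimes.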