Architecture Overview
The ML pipeline consists of three components:
- Feature Extraction (Rust): Extract LOB features in real time
- Model Training (Python): Train Mamba-LOB model on historical data
- Inference (Rust + ONNX): Run predictions with <1μs latency
Mamba-LOB Model
Mamba-LOB uses selective state space models instead of attention, achieving O(L) complexity and faster inference.
Model Architecture
python/training/models/mamba_lob.py:152-205
Mamba Block
The core Mamba block implements the selective state space scan:
python/training/models/mamba_lob.py:11-91
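To make the selective scan concrete, here is a minimal single-channel sketch in numpy. This is an illustration of the recurrence, not the actual implementation in `mamba_lob.py`; the shapes and the diagonal-A discretization are simplifying assumptions.

```python
import numpy as np

def selective_scan(u, delta, A, B, C):
    """Minimal selective state space scan (single channel, diagonal A).

    u:     (L,) input sequence
    delta: (L,) input-dependent step sizes (the "selective" part)
    A:     (N,) diagonal state transition
    B, C:  (L, N) input-dependent projection matrices
    Returns y: (L,) output sequence.
    """
    L, N = B.shape
    x = np.zeros(N)                      # hidden state
    y = np.empty(L)
    for t in range(L):
        # Discretize A and B with the input-dependent step delta[t]
        dA = np.exp(delta[t] * A)        # (N,)
        dB = delta[t] * B[t]             # (N,)
        x = dA * x + dB * u[t]           # O(N) state update per step
        y[t] = C[t] @ x                  # project state to output
    return y
```

Each step does O(N) work, so the full sequence costs O(L·N); there is no L×L attention matrix anywhere, which is where the linear complexity in the table below comes from.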
Why Mamba vs Transformer?
| Feature | Transformer | Mamba-LOB |
|---|---|---|
| Complexity | O(L²) | O(L) |
| Inference latency | ~5-10μs | <1μs |
| Parameters | 500K-1M | ~300K |
| Memory | Higher | Lower |
| Processing | Parallel (attention) | Sequential scan (hardware-optimized) |
Mamba's advantages come from:
- Linear-complexity selective scan
- No attention computation
- Efficient state updates
- Better hardware utilization
Training Pipeline
Data Preparation
Convert market data to ML training samples.
Training Script
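A hedged sketch of this conversion, assuming a per-tick feature matrix and a mid-price series (the actual feature schema and labeling thresholds live in the training code and may differ):

```python
import numpy as np

def make_samples(features, mid_prices, seq_len=100, horizon=10, threshold=1e-4):
    """Slice a feature stream into (sequence, label) training samples.

    features:   (T, F) per-tick LOB feature matrix
    mid_prices: (T,) mid-price series used for labeling
    Labels: 0 = down, 1 = flat, 2 = up, based on the return over `horizon` ticks.
    """
    X, y = [], []
    for t in range(seq_len, len(features) - horizon):
        ret = (mid_prices[t + horizon] - mid_prices[t]) / mid_prices[t]
        label = 2 if ret > threshold else (0 if ret < -threshold else 1)
        X.append(features[t - seq_len:t])   # trailing window of features
        y.append(label)
    return np.asarray(X), np.asarray(y)
```

The sliding window means consecutive samples overlap heavily; in practice you would also want a train/validation split that respects time ordering to avoid lookahead leakage.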
Train the model using the provided training script:
python/training/train.py:183-302
Training Configuration
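A hypothetical configuration might look like the following; the key names and values here are illustrative assumptions, not the actual options accepted by `train.py`:

```python
# Hypothetical training configuration; key names are illustrative,
# not the actual options accepted by train.py.
config = {
    "model": {
        "d_model": 64,
        "n_layers": 4,
        "d_state": 16,       # SSM state dimension N
        "n_classes": 3,      # down / flat / up
    },
    "training": {
        "batch_size": 256,
        "lr": 1e-3,
        "epochs": 50,
        "seq_len": 100,
    },
    "loss": {
        "spread_cost_bps": 1.0,   # assumed round-trip spread cost
        "entropy_weight": 0.1,
    },
}
```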
Transaction Cost-Aware Loss
The model uses a custom loss function that accounts for trading costs:
- Only predict when confident (low entropy)
- Focus on larger price moves (higher PnL potential)
- Account for spread and slippage costs
python/training/models/mamba_lob.py:259-317
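The three properties above can be sketched as a single loss term. This is a hedged stand-in for the actual loss in `mamba_lob.py`: cross-entropy weighted by cost-adjusted move size, plus an entropy penalty; the weighting scheme and cost constants are assumptions.

```python
import numpy as np

def cost_aware_loss(probs, labels, returns, spread_cost=1e-4, entropy_weight=0.1):
    """Sketch of a transaction cost-aware classification loss.

    probs:   (B, 3) predicted class probabilities (down / flat / up)
    labels:  (B,) true direction classes
    returns: (B,) realized forward returns, used to weight larger moves
    """
    eps = 1e-9
    # Cross-entropy on the true class.
    ce = -np.log(probs[np.arange(len(labels)), labels] + eps)
    # Weight by |return| minus spread cost, so moves too small to clear
    # costs contribute little to the loss.
    pnl_weight = np.maximum(np.abs(returns) - spread_cost, 0.0)
    # Entropy penalty: reward confident (low-entropy) predictions.
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    return np.mean(ce * pnl_weight + entropy_weight * entropy)
```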
ONNX Export
Export Trained Model
- Dynamic batch size support
- Constant folding optimization
- FP16/FP32 precision options
python/training/models/mamba_lob.py:320-354
ONNX Optimization
Optimize the exported model.
Rust Inference
Run inference in production using ONNX Runtime (planned, not yet implemented).
Integration with Trading Strategy
Performance Optimization
Inference Latency
Target latencies:
- Feature extraction: <1μs
- ONNX inference: <1μs
- Total prediction: <2μs
Optimization Techniques
- Model quantization (FP16)
- Sequence buffer reuse:
  - Keep features in a circular buffer
  - Only extract new features, not the entire sequence
- ONNX Runtime optimization
- Batch inference (if latency allows):
  - Accumulate N predictions
  - Run batch inference
  - Trade latency for throughput
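The sequence buffer reuse technique can be sketched as follows. This is a minimal illustration (the production path is Rust); `seq_len` and `n_features` are assumed values:

```python
from collections import deque
import numpy as np

class FeatureBuffer:
    """Circular buffer so each tick appends one feature vector
    instead of re-extracting the whole sequence."""

    def __init__(self, seq_len=100, n_features=40):
        self.seq_len = seq_len
        # deque with maxlen silently drops the oldest entry on overflow
        self.buf = deque(maxlen=seq_len)

    def push(self, feature_vec):
        self.buf.append(np.asarray(feature_vec, dtype=np.float32))

    def ready(self):
        return len(self.buf) == self.seq_len

    def as_input(self):
        # Stack into a (1, seq_len, n_features) array ready for inference.
        return np.stack(self.buf)[None, :, :]
```

Each tick costs one append plus one stack, rather than recomputing `seq_len` feature vectors.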
Benchmarking
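A simple latency harness like the one below can verify the targets above. The matrix-multiply "model" is a stand-in; substitute the real inference call:

```python
import time
import numpy as np

def benchmark(fn, x, warmup=100, iters=1000):
    """Measure average per-call latency of an inference function in μs.

    Warmup iterations exclude one-time costs (JIT, cache fills) from the
    measured average.
    """
    for _ in range(warmup):
        fn(x)
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1e6  # microseconds per call

# Stand-in "model": a single matrix multiply over a (seq_len, features) input.
w = np.random.randn(40, 3)
latency_us = benchmark(lambda x: x @ w, np.random.randn(100, 40))
```

For sub-microsecond targets, prefer measuring a tight loop as above rather than timing single calls, where clock resolution dominates.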
Model Versioning
Manage multiple model versions.
Model Monitoring
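One simple versioning scheme is a directory per version with the model artifact and its metadata side by side. The layout and function below are a hypothetical sketch, not the project's actual registry:

```python
import json
from pathlib import Path

def save_model_version(model_dir, onnx_bytes, metadata):
    """Store a model under a monotonically increasing version directory,
    alongside metadata (training date, metrics, feature schema)."""
    model_dir = Path(model_dir)
    existing = [int(p.name[1:]) for p in model_dir.glob("v*")
                if p.name[1:].isdigit()]
    version = max(existing, default=0) + 1
    vdir = model_dir / f"v{version}"
    vdir.mkdir(parents=True)
    (vdir / "model.onnx").write_bytes(onnx_bytes)
    (vdir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return version
```

Keeping metadata next to the artifact makes rollbacks and A/B comparisons auditable: the inference side can pin a version directory instead of a mutable "latest" path.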
Track model performance in production.
Next Steps
- Feature Extraction: learn about LOB feature engineering
- Strategy Development: build ML-powered strategies
- Backtesting: backtest ML strategies
- Performance: optimize inference latency