Overview

NanoARB exposes Prometheus metrics for comprehensive monitoring of trading performance, latency, and system health. The monitoring stack includes Prometheus for metrics collection and Grafana for visualization.

Architecture

The monitoring stack consists of:
  • NanoARB Engine: Exposes metrics on port 9090
  • Prometheus: Scrapes metrics every 1 second and stores time-series data
  • Grafana: Provides real-time dashboards and alerting

Setup

Starting the Monitoring Stack

# Full stack (engine + monitoring)
cd docker
docker compose up -d

# Monitoring only (for local development)
docker compose -f docker-compose-monitoring.yml up -d

Accessing Dashboards

Once the stack is running:
  • Grafana dashboards: http://localhost:3000 (default credentials admin / nanoarb)
  • Prometheus UI: http://localhost:9091
  • Raw engine metrics: http://localhost:9090/metrics

Prometheus Configuration

The Prometheus configuration in docker/prometheus.yml:
global:
  scrape_interval: 1s        # Scrape every second for HFT
  evaluation_interval: 1s
  external_labels:
    monitor: 'nanoarb'

scrape_configs:
  - job_name: 'nanoarb'
    static_configs:
      - targets: ['host.docker.internal:9090']
    scrape_interval: 1s      # High-frequency scraping
    metrics_path: /metrics
The 1-second scrape interval is optimized for high-frequency trading. For longer-term monitoring, increase to 5s or 15s to reduce storage requirements.
Data retention:
  • Default: 30 days (configured via --storage.tsdb.retention.time=30d)
  • Adjust in docker-compose.yml if you need longer retention
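The scrape-interval trade-off above can be sketched numerically. This is illustrative arithmetic only (actual disk usage depends on TSDB compression, typically on the order of 1-2 bytes per sample):

```python
# Samples ingested per metric per day at a given scrape interval.
SECONDS_PER_DAY = 86_400

def samples_per_day(scrape_interval_s: float) -> int:
    return int(SECONDS_PER_DAY / scrape_interval_s)

for interval in (1, 5, 15):
    print(f"{interval}s interval: {samples_per_day(interval):,} samples per metric per day")
```

Moving from 1s to 15s scraping cuts ingestion per metric by a factor of 15, at the cost of temporal resolution.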

Available Metrics

NanoARB exposes the metrics registered in the MetricsRegistry defined in crates/nano-gateway/src/metrics.rs:

Trading Metrics

Metric                 Type     Description
nanoarb_orders_total   Counter  Total number of orders submitted
nanoarb_fills_total    Counter  Total number of fills received
nanoarb_position       Gauge    Current net position in contracts
nanoarb_pnl            Gauge    Current profit/loss in dollars
nanoarb_events_total   Counter  Total events processed by the engine

Latency Metrics

All latency metrics are recorded in nanoseconds with histogram buckets:
Metric                          Type       Description
nanoarb_inference_latency_ns    Histogram  ML model inference time
nanoarb_order_latency_ns        Histogram  Order submission latency
nanoarb_book_update_latency_ns  Histogram  Order book update processing time
nanoarb_event_latency_ns        Histogram  Event processing latency
Histogram buckets:
  • Range: 100ns to ~100ms
  • Exponential buckets with factor of 2 (20 buckets total)
  • Enables percentile queries (p50, p95, p99)
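A quick sketch of what that bucket layout implies, assuming the first boundary sits at 100ns (check crates/nano-gateway/src/metrics.rs for the exact configuration):

```python
# Exponential histogram boundaries: start at 100 ns, factor of 2,
# 20 buckets, as described above. (Assumed starting boundary.)
buckets = [100 * 2 ** i for i in range(20)]

print(buckets[:4])   # [100, 200, 400, 800]
print(buckets[-1])   # 52428800 ns, i.e. roughly 52 ms
```

With these assumptions the top boundary lands in the tens-of-milliseconds range; everything slower falls into the implicit +Inf bucket.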

Example Queries

# Orders per minute
rate(nanoarb_orders_total[1m]) * 60
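What that query computes, sketched in Python (the counter samples below are hypothetical):

```python
# rate(...[1m]) is the per-second increase of a counter over the
# window; multiplying by 60 converts it to a per-minute figure.
def orders_per_minute(count_start: float, count_end: float, window_s: float) -> float:
    per_second = (count_end - count_start) / window_s
    return per_second * 60.0

# 1542 -> 1662 orders over a 60 s window
print(orders_per_minute(1542, 1662, 60.0))  # 120.0 orders/min
```

Note that `rate()` also handles counter resets, which this sketch ignores.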

Grafana Dashboards

The default dashboard is located at grafana/dashboards/main.json and includes:

1. Key Performance Indicators (Top Row)

  • P&L: Current profit/loss in dollars
  • Position: Current net position
  • Orders/min: Order submission rate
  • Fills/min: Fill execution rate

2. Equity Curve

Real-time visualization of cumulative P&L:
nanoarb_pnl
Shows your trading performance over time, helping identify profitable and unprofitable periods.

3. Inference Latency

Tracks ML model performance with percentiles:
  • p50 (median): Typical inference time
  • p95: 95th percentile - most requests complete within this time
  • p99: 99th percentile - the tail latency to watch when optimizing
# p50
histogram_quantile(0.50, sum(rate(nanoarb_inference_latency_ns_bucket[1m])) by (le))

# p95
histogram_quantile(0.95, sum(rate(nanoarb_inference_latency_ns_bucket[1m])) by (le))

# p99
histogram_quantile(0.99, sum(rate(nanoarb_inference_latency_ns_bucket[1m])) by (le))
For HFT strategies, target p99 inference latency under 1 microsecond (1000ns). Higher latencies may result in adverse selection.
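A simplified sketch of the linear interpolation that histogram_quantile() performs over cumulative bucket counts (the counts below are hypothetical):

```python
# Approximate histogram_quantile(): find the bucket whose cumulative
# count crosses the target rank, then interpolate linearly inside it.
def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """buckets: sorted (upper_bound_ns, cumulative_count) pairs."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            frac = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * frac
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Hypothetical cumulative counts per `le` boundary (in ns)
samples = [(100, 245), (200, 489), (400, 723), (800, 1000)]
print(round(histogram_quantile(0.99, samples)))  # 786 ns
```

This is why percentile accuracy depends on bucket granularity: within a bucket, the estimate is only a linear guess.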

4. Position Over Time

Tracks position changes throughout the trading session:
nanoarb_position
Useful for:
  • Identifying position accumulation
  • Monitoring inventory risk
  • Verifying position flattening at session end

5. Event Processing Rate

Monitors system throughput:
rate(nanoarb_events_total[1m])
Sustained high event rates (>10,000 events/sec) may indicate:
  • Heavy market data processing
  • Potential bottlenecks in event loop
  • Need for performance optimization

Dashboard Configuration

The dashboard is provisioned automatically in grafana/provisioning/:
grafana/
├── provisioning/
│   ├── dashboards/
│   │   └── dashboards.yml     # Dashboard provider config
│   └── datasources/
│       └── datasources.yml    # Prometheus datasource
└── dashboards/
    └── main.json               # Main trading dashboard

Adding Custom Panels

  1. Navigate to Grafana: http://localhost:3000
  2. Open Dashboard: “NanoARB Trading Dashboard”
  3. Add Panel: Click “Add panel” in top-right
  4. Configure Query: Use Prometheus queries from examples above
  5. Save Dashboard: Exports to JSON for version control

Custom Dashboard Example

{
  "title": "Sharpe Ratio (Rolling 1h)",
  "targets": [
    {
      "expr": "(avg_over_time(nanoarb_pnl[1h]) - avg_over_time(nanoarb_pnl[1h] offset 1h)) / stddev_over_time(nanoarb_pnl[1h])",
      "legendFormat": "Sharpe"
    }
  ],
  "type": "stat"
}
Approximates a rolling Sharpe-style ratio: the change in mean P&L between adjacent 1-hour windows, scaled by the P&L standard deviation.
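The panel expression can be mirrored in Python to sanity-check it. The P&L series below are made up; `pstdev` (population standard deviation) matches the semantics of Prometheus's stddev_over_time:

```python
# Change in mean P&L between two adjacent windows, divided by the
# recent window's standard deviation. Series values are hypothetical.
from statistics import mean, pstdev

def rolling_sharpe(recent: list[float], previous: list[float]) -> float:
    return (mean(recent) - mean(previous)) / pstdev(recent)

recent = [2400.0, 2450.0, 2500.0, 2450.0]
previous = [2300.0, 2350.0, 2400.0, 2350.0]
print(round(rolling_sharpe(recent, previous), 2))  # 2.83
```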

Alerting

Configure Grafana Alerts

  1. Create Alert Rule: In Grafana, go to Alerting → Alert rules → New alert rule
  2. Define Condition: For example, alert when P&L drops below -$10,000:
     nanoarb_pnl < -10000
  3. Configure Notification: Set up notification channels (Slack, email, PagerDuty)
  4. Save and Test: Test the alert and save the configuration

Example Alert Rules

Alert when p99 inference latency exceeds 10 microseconds:
histogram_quantile(0.99, 
  sum(rate(nanoarb_inference_latency_ns_bucket[1m])) by (le)
) > 10000
Alert when position exceeds 80% of max:
abs(nanoarb_position) > 40  # Assuming max_position = 50
Alert on significant drawdown (requires additional calculation):
(max_over_time(nanoarb_pnl[1d]) - nanoarb_pnl) > 50000
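The drawdown rule above in plain terms: the highest P&L seen in the lookback window minus the current P&L. The series here is hypothetical:

```python
# Drawdown = peak P&L over the window minus the latest P&L value.
def drawdown(pnl_series: list[float]) -> float:
    return max(pnl_series) - pnl_series[-1]

pnl = [10_000.0, 65_000.0, 40_000.0, 12_000.0]
print(drawdown(pnl))  # 53000.0 -> would fire the > 50000 alert
```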

Metrics Export

Raw Metrics Format

View raw Prometheus metrics:
curl http://localhost:9090/metrics
Example output:
# HELP nanoarb_orders_total Total number of orders submitted
# TYPE nanoarb_orders_total counter
nanoarb_orders_total 1542

# HELP nanoarb_pnl Current P&L in dollars
# TYPE nanoarb_pnl gauge
nanoarb_pnl 2450.75

# HELP nanoarb_inference_latency_ns Model inference latency in nanoseconds
# TYPE nanoarb_inference_latency_ns histogram
nanoarb_inference_latency_ns_bucket{le="100"} 245
nanoarb_inference_latency_ns_bucket{le="200"} 489
nanoarb_inference_latency_ns_bucket{le="400"} 723
...
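If you want to pull single values out of that output in a script, a minimal parser is enough. This sketch handles only plain (unlabeled) gauge/counter samples and ignores HELP/TYPE lines, labels, and histogram series:

```python
# Minimal parser for unlabeled samples in Prometheus text format.
def parse_plain_samples(text: str) -> dict[str, float]:
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" in line:
            continue  # skip comments, blanks, and labeled series
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples

raw = """\
# TYPE nanoarb_pnl gauge
nanoarb_pnl 2450.75
# TYPE nanoarb_orders_total counter
nanoarb_orders_total 1542
"""
print(parse_plain_samples(raw))
```

For anything beyond quick checks, prefer querying Prometheus itself rather than scraping the raw endpoint.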

Export to CSV

Use Prometheus API to export historical data:
# Export P&L for last hour
curl -G http://localhost:9091/api/v1/query_range \
  --data-urlencode 'query=nanoarb_pnl' \
  --data-urlencode 'start='$(date -u -d '1 hour ago' +%s) \
  --data-urlencode 'end='$(date -u +%s) \
  --data-urlencode 'step=1s' \
  | jq -r '.data.result[0].values[] | @csv' > pnl.csv
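The jq step at the end of that pipeline can also be done in Python, which is handy if you post-process the data anyway. The response dict below is a hypothetical, truncated example of the query_range JSON shape:

```python
# Flatten a Prometheus query_range response's [timestamp, value]
# pairs into CSV rows. The response below is a made-up example.
import csv
import io

def values_to_csv(response: dict) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf)
    for ts, value in response["data"]["result"][0]["values"]:
        writer.writerow([ts, value])
    return buf.getvalue()

response = {"data": {"result": [{"metric": {"__name__": "nanoarb_pnl"},
                                 "values": [[1700000000, "2450.75"],
                                            [1700000001, "2451.10"]]}]}}
print(values_to_csv(response))
```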

Performance Monitoring

Key Metrics to Watch

  • Inference Latency: target <1μs p99. Critical for strategy competitiveness.
  • Order Latency: target <100μs p99. Impacts fill probability.
  • Event Processing Rate: target >50k events/sec. Indicates system capacity.
  • Fill Ratio: target >80%. Measures execution quality.
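The fill ratio can be derived from the nanoarb_fills_total and nanoarb_orders_total counters over a window (the counts below are hypothetical):

```python
# Fill ratio = fills / orders over a window; guard against a
# zero-order window.
def fill_ratio(fills: int, orders: int) -> float:
    return fills / orders if orders else 0.0

print(round(fill_ratio(1300, 1542), 3))  # ~0.843, above the 80% target
```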

Latency Optimization

If latencies are too high:
  1. Check CPU pinning: Ensure process runs on isolated cores
  2. Review event loop: Look for blocking operations
  3. Profile code: Use perf or flamegraph to identify hotspots
  4. Optimize model: Reduce inference complexity
See Production Deployment for optimization techniques.

Troubleshooting

Metrics Not Appearing

# Check if engine is exposing metrics
curl http://localhost:9090/metrics

# Check Prometheus targets
open http://localhost:9091/targets

# Verify Prometheus is scraping
docker compose logs prometheus

Dashboard Not Loading

# Check Grafana logs
docker compose logs grafana

# Verify datasource connection
curl http://admin:nanoarb@localhost:3000/api/datasources

# Restart Grafana
docker compose restart grafana

High Memory Usage

Prometheus stores metrics in memory. Reduce retention or scrape interval:
# In docker-compose.yml
command:
  - '--storage.tsdb.retention.time=7d'  # Reduce from 30d
  - '--storage.tsdb.retention.size=1GB' # Add size limit

Next Steps