Linus Kendall, Kateryna Shyndina

Streaming Solana's history: the open path to backfill indexing

Updated 30 Apr 2026

TL;DR

Compliance, analytics, and research workloads, and any team that needs single-digit-millisecond reads and arbitrary filters, will always benefit from owning an index of Solana history
The most popular path, RPC polling, is slow, serial, rate-limited, blind to its own coverage, and expensive because cost scales with every call
Newer RPC methods like getTransactionsForAddress solve part of the challenge, but they don't scale up to chain-wide indexing
The better alternative is Triton's Old Faithful: the only complete, verified, public Solana ledger
Anza's Jetstreamer replays it at up to 2.7M TPS, feeding blocks, transactions, rewards, and entries straight into your local plugin. Compared to RPC polling, it's:
- Easier: known slot coverage, no query loops, no guessing at missing data
- Faster: pick your slot ranges and run many streams in parallel, up to the limit of your own wire speed
- Cheaper: run it yourself against the Old Faithful archive, or use managed Faithful streams to cut overhead
We currently host the archives you can use with Jetstreamer for free, as a public good (subject to change)
Pair it with Fumarole reliable streams for ongoing indexing at the tip

Accessing Solana history

Solana's ledger spans trillions of rows and grows by millions of transactions every day, making fast, flexible access to it genuinely hard. Much of our RPC 2.0 work is aimed at fixing that, and we're open-sourcing projects like Hydrant to make affordable, fast history available to everyone.

But some workloads will always need more than that: local reads in single-digit milliseconds, joins across arbitrary fields, filters on any program or account, deterministic replays, and a pipeline they control end to end.

Typical workloads that need their own copy of history:

Compliance and tax systems running pattern detection across years of transfers
Analytics platforms computing PnL for every trading pair on a DEX
Risk engines replaying market conditions
On-chain researchers building datasets for papers or models

If you're building one of these, the default assumption is that polling JSON-RPC in a loop is the best (or only) way to do it.

In practice, it breaks down on every axis that matters for indexing:

Throttled: RPS limits bottleneck you regardless of your actual bandwidth
Unverifiable coverage: data gaps you can't detect or backfill after the fact
Linear cost: every block or tx is a paid call, scaling your bill with the history size

Newer RPC methods like getTransactionsForAddress patch this for specific lookups (one address, narrow windows), but they only solve part of the challenge. They don't scale up to indexing the whole chain: at that scale they remain serial and become costly.
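To put rough numbers on the throttling and linear-cost points, here is a back-of-envelope sketch. It assumes one RPC call per slot, a hypothetical limit of 50 requests per second, and a hypothetical per-call price; only the 432,000-slot epoch length comes from Solana itself. Skipped slots mean the real block count is somewhat lower, so treat the result as an upper bound.

```rust
/// Seconds in a day, used to convert durations.
const SECS_PER_DAY: f64 = 86_400.0;

/// Slots in one Solana mainnet epoch.
const SLOTS_PER_EPOCH: u64 = 432_000;

/// Days needed to fetch `calls` blocks serially when the provider caps you
/// at `rps` requests per second (hypothetical limit, one block per call).
fn polling_days(calls: u64, rps: f64) -> f64 {
    calls as f64 / rps / SECS_PER_DAY
}

/// Total bill when every call costs `price_per_call` (hypothetical pricing).
fn polling_cost(calls: u64, price_per_call: f64) -> f64 {
    calls as f64 * price_per_call
}

/// Number of calls needed to cover `epochs` full epochs, one call per slot.
fn calls_for_epochs(epochs: u64) -> u64 {
    epochs * SLOTS_PER_EPOCH
}
```

Under these assumptions, `polling_days(calls_for_epochs(1), 50.0)` comes to 0.1 days (about 2.4 hours) for a single epoch, and 800 epochs come to roughly 80 days. More bandwidth doesn't shorten that, because the request limit is the bottleneck.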

The streaming (better) path

There's a much better path, and it's been available for years: stream archived blocks and transactions directly into your indexer at wire speed.

It's completely open source, carries no dependencies, costs nothing beyond your own hardware, and almost nobody talks about it (which is why it might be new to you).

It takes two tools to make the streaming path work:

Old Faithful (built by Triton One)
- The only complete, verified, and public Solana ledger
- Every block and transaction, packaged into hash-verifiable, content-addressed CAR files
- Organised by epoch, available via public HTTP and S3-compatible mirrors
- Free public archive for the community (subject to change)

Jetstreamer (built by Anza)
- The Rust toolkit for streaming Old Faithful at production scale
- Plugin framework with built-in ClickHouse batching and server
- Parallel multi-threaded replay, auto-sized to your hardware
- Pluggable archive backends: public HTTP mirror or your own S3

By using them together, you get a complete, ready-to-run indexing pipeline at no cost:

Query by slot, signature, or content hash, or stream every transaction and block across any slot range (cross-epoch included) with guaranteed coverage between slots
Get a verified, gap-free chain from genesis with parent/child links checked, and PoH validated epoch by epoch
Filter server-side by accounts, votes, or failed transactions
Use any destination: bundled ClickHouse, Geyser plugin, or a custom Jetstreamer plugin you build
Out-of-the-box support for program and instruction tracking plugins
Deterministic, bit-identical replay against any fixed range, guaranteed by CAR files
Parallel multi-threaded stream at 2.7M TPS (benchmarked on a 64-core, 30 Gbps host)
Runs on your own hardware against the free public archive, no per-query credits or rate limits
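To make the "parent/child links checked" guarantee concrete, here is a minimal sketch of that kind of check over a sequence of block headers. The `Header` struct and helper names are hypothetical stand-ins, not Old Faithful's or Jetstreamer's types, and the sketch covers only link continuity, not PoH validation or CAR hash verification.

```rust
/// Minimal stand-in for a block header; field names are illustrative.
#[derive(Debug, Clone)]
struct Header {
    slot: u64,
    parent_slot: u64,
    blockhash: String,
    previous_blockhash: String,
}

/// Convenience constructor for the examples below.
fn header(slot: u64, parent_slot: u64, blockhash: &str, previous_blockhash: &str) -> Header {
    Header {
        slot,
        parent_slot,
        blockhash: blockhash.to_owned(),
        previous_blockhash: previous_blockhash.to_owned(),
    }
}

/// Returns the index of the first block whose parent link does not match
/// the block before it, or `None` when the sequence is a gap-free chain.
/// Skipped slots are fine: a child only has to point at the previous *block*.
fn first_broken_link(chain: &[Header]) -> Option<usize> {
    chain
        .windows(2)
        .position(|pair| {
            let (parent, child) = (&pair[0], &pair[1]);
            child.parent_slot != parent.slot
                || child.previous_blockhash != parent.blockhash
        })
        .map(|i| i + 1)
}
```

A chain of blocks at slots 10, 11, and 13 passes as long as the block at slot 13 names slot 11 as its parent. If it names slot 12 instead, a block is missing and the function reports index 2. This is the kind of gap that, per the list earlier, RPC polling leaves you unable to detect.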

How it works

You run Jetstreamer on your own hardware. It pulls archived ledger data from Old Faithful, decodes it, and dispatches typed events to your plugin. Here's the pipeline in five steps:

Runner takes your input. A single epoch, or an arbitrary slot range that can span epochs. Parallel mode spreads the decode across worker threads for the highest throughput, while sequential mode keeps writes in slot order.
Firehose pulls CAR files and indexes from Old Faithful. It defaults to the public HTTP mirror at files.old-faithful.net, but you can also pull from your own S3-compatible bucket.
Multi-threaded workers decode the CARs into events. Each worker takes a chunk of the slot range, unpacks the binary IPLD structures into typed Rust events, and feeds them into your plugin. Five event types flow out: Block, PossibleLeaderSkipped, Transaction, Reward, Entry. Each event carries its full metadata (signatures, account keys, instructions, log messages), so the plugin doesn't need to re-parse anything.
Each event fires the matching hook on your plugin. The Plugin trait exposes five methods (on_epoch, on_block, on_transaction, on_reward, on_entry), so you can write a custom plugin to filter, transform, and persist the data. Two plugins ship bundled (Program Tracking, Instruction Tracking).
Your plugin writes rows to your sink. ClickHouse is wired in by default: Jetstreamer spawns a local ClickHouse server out of the bin/ directory, and the bundled cargo clickhouse-server and cargo clickhouse-client aliases handle starting and inspecting it. Set a DSN to point at an external cluster instead, or use a custom plugin to write to Postgres, Parquet, etc.
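The steps above can be modelled in a few lines of standard-library Rust. This is a simplified sketch, not Jetstreamer's implementation: `split_range`, `Event`, `Hooks`, `dispatch`, and `TxCounter` are hypothetical names, the way chunks are sized is an assumption, and skipped-leader events are simply ignored here.

```rust
use std::ops::Range;

/// Split a half-open slot range into at most `workers` contiguous chunks,
/// so each worker can decode its own part of the range.
fn split_range(range: Range<u64>, workers: u64) -> Vec<Range<u64>> {
    let total = range.end.saturating_sub(range.start);
    if total == 0 || workers == 0 {
        return Vec::new();
    }
    let workers = workers.min(total);
    let (base, extra) = (total / workers, total % workers);
    let mut chunks = Vec::with_capacity(workers as usize);
    let mut start = range.start;
    for i in 0..workers {
        // The first `extra` chunks take one additional slot.
        let len = base + u64::from(i < extra);
        chunks.push(start..start + len);
        start += len;
    }
    chunks
}

/// Simplified stand-ins for the event types named above.
enum Event {
    Block { slot: u64 },
    PossibleLeaderSkipped { slot: u64 },
    Transaction { slot: u64, signature: String },
    Reward { slot: u64 },
    Entry { slot: u64 },
}

/// Hooks with default no-op bodies: an indexer overrides only what it needs.
trait Hooks {
    fn on_block(&mut self, _slot: u64) {}
    fn on_transaction(&mut self, _slot: u64, _signature: &str) {}
    fn on_reward(&mut self, _slot: u64) {}
    fn on_entry(&mut self, _slot: u64) {}
}

/// Route one decoded event to the matching hook.
fn dispatch(event: &Event, hooks: &mut impl Hooks) {
    match event {
        Event::Block { slot } => hooks.on_block(*slot),
        Event::Transaction { slot, signature } => hooks.on_transaction(*slot, signature),
        Event::Reward { slot } => hooks.on_reward(*slot),
        Event::Entry { slot } => hooks.on_entry(*slot),
        // Ignored in this sketch.
        Event::PossibleLeaderSkipped { .. } => {}
    }
}

/// Example indexer that only cares about transactions.
#[derive(Default)]
struct TxCounter {
    seen: usize,
}

impl Hooks for TxCounter {
    fn on_transaction(&mut self, _slot: u64, _signature: &str) {
        self.seen += 1;
    }
}
```

For example, `split_range(0..10, 3)` yields the chunks `0..4`, `4..7`, and `7..10`. Feeding a mixed list of events through `dispatch` with a `TxCounter` increments `seen` only for the transaction events.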

Get started

Add jetstreamer to your Cargo.toml, implement the Plugin trait for your indexing logic, and point the JetstreamerRunner at Old Faithful's public archive. All hook methods have default no-op implementations; override only the ones your indexer needs:

💡 At the time of writing, jetstreamer v0.5.1 has an open compile-time issue caused by a reqwest version conflict in one of its dependencies. A two-line Cargo.toml fix is staged in PR #48. Until that's merged, you can apply the patch locally for a clean build.

use jetstreamer::{
    firehose::{epochs, TransactionData},
    plugin::{Plugin, PluginFuture},
    JetstreamerRunner,
};
use std::sync::Arc;
// `clickhouse` must also be listed as a dependency in Cargo.toml for this import
use clickhouse::Client;

struct MyIndexer;

impl Plugin for MyIndexer {
    fn name(&self) -> &'static str { "my-indexer" }

    // Fires for each replayed transaction; hooks you don't override stay no-ops
    fn on_transaction<'a>(
        &'a self,
        _thread_id: usize,
        _db: Option<Arc<Client>>,
        tx: &'a TransactionData,
    ) -> PluginFuture<'a> {
        Box::pin(async move {
            // filter, decode, and write to your store
            println!("sig={}", tx.signature);
            Ok::<(), Box<dyn std::error::Error + Send + Sync + 'static>>(())
        })
    }
}

fn main() {
    // Slot bounds of epoch 800; the returned end slot is inclusive
    let (start_slot, end_inclusive) = epochs::epoch_to_slot_range(800);

    JetstreamerRunner::new()
        .with_plugin(Box::new(MyIndexer))
        .with_threads(4)
        // pass end_inclusive + 1 so the epoch's last slot is covered
        .with_slot_range_bounds(start_slot, end_inclusive + 1)
        .with_clickhouse_dsn("https://your-clickhouse-dsn") // optional
        .run()
        .expect("runner completed");
}
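The epoch-to-slot conversion in `main` is simple arithmetic. The sketch below reproduces it with a hypothetical helper, `epoch_bounds`, assuming a fixed epoch length of 432,000 slots. It illustrates the mapping and is not the crate's own `epoch_to_slot_range` implementation.

```rust
/// Slots in one Solana mainnet epoch (assumed fixed for every epoch here).
const SLOTS_PER_EPOCH: u64 = 432_000;

/// First and last slot of `epoch`, both inclusive.
fn epoch_bounds(epoch: u64) -> (u64, u64) {
    let start = epoch * SLOTS_PER_EPOCH;
    (start, start + SLOTS_PER_EPOCH - 1)
}

/// The same epoch as a half-open range: the end is one past the last slot.
fn epoch_half_open(epoch: u64) -> std::ops::Range<u64> {
    let (start, end_inclusive) = epoch_bounds(epoch);
    start..end_inclusive + 1
}
```

Under that assumption, `epoch_bounds(800)` gives slots 345,600,000 through 346,031,999. Adding 1 to the inclusive end, as the example above does, yields an upper bound one past the epoch's last slot.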

Or skip the custom plugin entirely and run with one of the bundled plugins from the CLI:

# Replay epoch 800 using the bundled instruction-tracking plugin
cargo run --release -- 800 --with-plugin instruction-tracking

# Custom slot range with more parallelism
JETSTREAMER_THREADS=8 cargo run --release -- 358560000:367631999
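To see what the custom range in the second command covers, the following sketch parses a `start:end` argument and maps it to epochs. `parse_slot_range` and `epochs_spanned` are hypothetical helpers, not the CLI's actual parser. They assume 432,000-slot epochs and treat the end slot as inclusive.

```rust
/// Slots in one Solana mainnet epoch (assumed fixed for every epoch here).
const SLOTS_PER_EPOCH: u64 = 432_000;

/// Parse "start:end" into a pair of slots; returns `None` when the argument
/// is malformed or the range is reversed.
fn parse_slot_range(arg: &str) -> Option<(u64, u64)> {
    let (start, end) = arg.split_once(':')?;
    let start: u64 = start.trim().parse().ok()?;
    let end: u64 = end.trim().parse().ok()?;
    (start <= end).then_some((start, end))
}

/// First and last epoch touched by an inclusive slot range.
fn epochs_spanned(start: u64, end_inclusive: u64) -> (u64, u64) {
    (start / SLOTS_PER_EPOCH, end_inclusive / SLOTS_PER_EPOCH)
}
```

Applied to `358560000:367631999`, this gives epochs 830 through 850: the range starts on the first slot of epoch 830 and ends on the last slot of epoch 850.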

You now have a backfilled, self-owned, open-spec index of Solana's history.

To keep indexing forward, switch to Fumarole reliable streams, a persistent, high-availability gRPC stream of live Geyser events with at-least-once delivery and subscribers that resume cleanly after reconnects.
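At-least-once delivery means the same event can arrive more than once, for example after a reconnect, so writes to your index should be idempotent. Here is a minimal sketch of that pattern, keyed on the transaction signature. `DedupSink` is a hypothetical in-memory stand-in for your store and does not use the Fumarole client API, which isn't shown here.

```rust
use std::collections::HashSet;

/// In-memory sink that stores each signature at most once.
#[derive(Default)]
struct DedupSink {
    seen: HashSet<String>,
    rows: Vec<(u64, String)>,
}

impl DedupSink {
    /// Returns `true` when the row was new and stored, `false` for a redelivery.
    fn write(&mut self, slot: u64, signature: &str) -> bool {
        // `insert` returns false when the signature was already present.
        if !self.seen.insert(signature.to_owned()) {
            return false;
        }
        self.rows.push((slot, signature.to_owned()));
        true
    }
}
```

The same check covers the handoff from backfill to live streaming. If the backfilled range and the live stream overlap by a few slots, the overlapping transactions are written once and the repeats are dropped.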
