Complete guide to Solana streaming and Yellowstone gRPC

TL;DR

  • By 2021, standard RPC polling (repeatedly asking the chain for data) had become too slow for HFT, MEV, and high-performance indexing. By the time you get the data, it’s already stale
  • Solana recognised this limit and added the Geyser plugin system to the validator client (now Agave). This lets the node emit data directly from memory, enabling the ecosystem to build plugins that load into the node and stream its state outward
  • Triton pioneered the use of gRPC for this interface, building the Yellowstone gRPC plugin. It allows you to subscribe to the specific data you need and receive it the moment it arrives
  • Soon after, other providers and the wider ecosystem adopted Yellowstone, moving from pull → push for any workload that needs fresh, structured data: trading, analytics, DEXs, RFQ engines, indexers, explorers, and portfolio-management tools
  • We never run streaming services on voting validators. We use dedicated, non-voting nodes specifically for streaming to ensure the stream never competes with consensus (or other RPC requests) for resources
  • Streaming’s benefits include minimal latency and data loss, lightweight payloads (strongly typed Protobufs), the end of “429 Too Many Requests” errors, and much more

Introduction

If you are building a trading bot, a real-time indexer, or a DEX on Solana, you have likely hit the "RPC Wall." You hammer an endpoint with getAccountInfo every 200ms, only to get rate-limited or, worse, realise your data is 200ms old by the time it reaches your application logic.

Unlike EVM chains, Solana moves at (almost) the speed of light. With a 400ms block time, there’s no such thing as being only a little late.

At Triton One, our mission is IBRL – Increase Bandwidth, Reduce Latency. Providing high-performance open-source streaming solutions for the community is one of the ways we achieve it.

This guide walks through how streaming on Solana works, what the Yellowstone stack looks like, how to choose a provider when streaming is your primary workload, when to use RPC vs streaming (and which streaming tools to consider), and how to get started.

How does the Yellowstone Geyser plugin work?

At its core, Yellowstone gRPC (or Dragon's Mouth, as we call it) is a piece of software that compiles down to a dynamic shared library (a .so file on Linux).

It utilises the Geyser Plugin Interface provided by Solana Labs. When a validator client starts up, we pass it a configuration flag telling it to load our library into its own memory space. Once loaded, Yellowstone isn't just "watching" the node; it becomes part of the node. It registers callbacks (hooks) directly with the node’s AccountsDb and Bank.
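
To make the loading step concrete, here’s a minimal sketch of a plugin config, with an assumed path and port (exact fields vary across yellowstone-grpc versions): libpath points at the compiled library, and the grpc block tells the plugin where to expose its stream.

{
  "libpath": "/path/to/libyellowstone_grpc_geyser.so",
  "grpc": {
    "address": "0.0.0.0:10000"
  }
}

You then point the validator at this file on startup via the --geyser-plugin-config flag.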

Here’s the step-by-step of the data flow:

  • Whenever the node processes a transaction or updates an account state in memory, it triggers a callback to our plugin
  • We capture the raw data (accounts, slots, blocks, and transactions) before it is even written to disk
  • Instead of waiting for a client to request it, we immediately serialise it to Protobuf and push it out via a gRPC interface opened by the plugin
  • You receive the data stream (at the commitment level of your choice: processed, confirmed, or finalised)

This bypasses the entire JSON-RPC HTTP layer, eliminating request parsing, JSON serialisation overhead, and polling loops. You subscribe once; the node keeps sending data until you disconnect (unsubscribe).

Where do we run it?

We don’t run Yellowstone on voting validators, nor on nodes serving general RPC calls, and we advise our customers with dedicated nodes to do the same: keep streaming nodes streaming-only. The goal of streaming is to deliver data immediately, and we want to ensure that speed isn’t compromised by “noisy neighbours” spamming heavy RPC calls.

Instead, we run dedicated streaming nodes: full bare-metal Solana nodes that follow the cluster and verify the ledger, but don’t vote.

  • Optimised for data. Tuned specifically to handle the massive I/O load of streaming
  • No contention. Your gRPC subscription never competes with consensus for resources
  • Isolation. We keep streaming nodes physically and logically isolated from general RPC traffic
  • Network topology. We run these nodes next to validators in top-tier data centres to shave every last microsecond of physical latency, optimising the path from validators to RPCs and from there to you

How to run Triton’s Geyser plugin yourself?

While we sell managed infrastructure, we believe the core tools should be shared. We do this to empower more innovation on the network, protect the ecosystem from single points of failure, and give you alternatives to proprietary SDKs from the moment you start building.

We maintain the suite so you can tinker with it or bootstrap it anytime.

Yellowstone streaming ecosystem

"Yellowstone" isn't just one engine; it's a suite of open-source tools we built to handle different customer needs. Each project is named after a geyser at Yellowstone National Park because, like a geyser, this infrastructure manages high-pressure, high-speed streams. We’ll focus on its 4 core components:

1. Dragon’s Mouth (aka Yellowstone gRPC)

What it is: Triton’s original ultra-low latency Geyser-fed gRPC interface.
How it works: It connects directly to the node's memory, ingests raw bank data, outputs strongly typed Protobufs, and immediately sends them over the gRPC connection. This is also our "source of truth" for other Yellowstone tools.
Use it for: the absolute lowest possible latency. This is for HFT, MEV, RFQ desks, liquidation engines, and arbitrage traders identifying market inefficiencies.

This was our first product in the suite. As revolutionary as it was, it solved only the problems of speed and wasted bandwidth, and only for backends.
But at Triton, being the serial problem-solvers we are, we wanted to help everyone. So we kept shipping.

Solana shreds 
    │  
    │  
    ▼
[Validator / Follower] 
    │ Geyser callbacks  
    │ 
    ▼  
[Dragon's Mouth plugin] 
    │ 
    │ Protobuf over HTTP/2  
    ▼  
[gRPC streams → traders / indexers / infrastructure]  

2. Whirligig WebSockets

What it is: A high-performance WebSocket proxy that brings the benefits of Dragon's Mouth to frontends.
How it works: It stands between the Dragon’s Mouth plugin and your client, ingesting the high-speed gRPC stream and translating it into standard Solana JSON-RPC WebSocket messages.
Use it for: Ultra-low latency frontends (live feeds for DEXs, dApps, and wallets).

You are probably wondering: why build another WebSocket interface if Solana JSON-RPC already has one?

The standard WebSocket implementation was a great first draft in Solana’s early days, but at Solana’s massive throughput and breakneck speed, standard WebSockets became unreliable under load: performance degraded and connections frequently dropped. They were also slow, internally waiting until the end of each slot rather than streaming updates as they occur within the slot.

Whirligig solved all of these issues by moving streaming logic outside of Agave, while maintaining complete backward compatibility.

[Dragon’s Mouth stack]  
    │
    │ gRPC  
    ▼ 
[Whirligig proxy] 
    │ 
    │ WebSocket (Solana WS API)  
    ▼  
[browser / dApp]  
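
Because Whirligig keeps full backward compatibility, any client that already speaks Solana’s WebSocket API works unchanged. Here’s a minimal sketch, assuming a placeholder endpoint URL, that subscribes to account updates with the stock accountSubscribe method:

// Whirligig speaks the standard Solana WS API, so this is plain
// JSON-RPC over WebSocket; the endpoint URL below is a placeholder
const ws = new WebSocket("wss://your-endpoint.rpcpool.com/whirligig");

ws.onopen = () => {
  ws.send(
    JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "accountSubscribe",
      params: [
        "675kPX9MHTjS2zt1qfr1NYHuzeLXfQM9H24wFSUt1Mp8", // account to watch
        { encoding: "base64", commitment: "processed" },
      ],
    }),
  );
};

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data.toString());
  if (msg.method === "accountNotification") {
    console.log("Account update at slot", msg.params.result.context.slot);
  }
};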

3. Fumarole reliable streams

What it is: A high-availability, persistent, multiplexed streaming service.
How it works: It connects to multiple upstream Dragon's Mouth nodes, aggregates the data, removes duplicates (so you don't process the same block twice), sends it over a gRPC connection, and tracks your position in the stream.
Use it for: Whole blocks or confirmed commitment levels, for indexers, analytics, lending protocols, and compliance systems that demand 100% data completeness.

We built Fumarole to end the struggle with streaming reliability and eliminate the need for complex backfilling and redundancy logic. Data gaps happen for two reasons:

  • Node restarts: validators patch and reboot
  • Client disconnects: your server goes down, or the network blips

At Triton, we eliminate the first via health checks, redundancy, and immediate auto-failover. But the second is just as important.

So we created a service that doesn’t chase the “perfect scenario” but is designed around the fact that disconnects happen. Fumarole remembers the exact moment of disconnect, and once you reconnect, you resume precisely where you left off, with no backfilling required.

[DM node A] [DM node B] [DM node C]  
  └─────── gRPC events ───────┘  
                  │
                  │
                  ▼  
              [Fumarole]  
         merge + dedupe + log  
                  │ 
                  │  gRPC (cursor)  
                  ▼  
    [indexers / DeFi / analytics]  
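
The cursor in the diagram is the key idea. The sketch below is purely illustrative: connect and the cursor shape are hypothetical stand-ins, not the actual Fumarole client API. What matters is the pattern: checkpoint the last processed position and offer it back on reconnect.

import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Hypothetical local checkpoint; the real service tracks your
// position server-side, but the resume pattern is the same
const CURSOR_FILE = "./stream.cursor";

type Update = { cursor: string; payload: unknown };

// `connect` is a hypothetical stand-in for a Fumarole subscription:
// given an optional cursor, it yields updates from that point on
async function consume(
  connect: (cursor?: string) => AsyncIterable<Update>,
) {
  for (;;) {
    const cursor = existsSync(CURSOR_FILE)
      ? readFileSync(CURSOR_FILE, "utf8")
      : undefined;
    try {
      for await (const update of connect(cursor)) {
        handleUpdate(update.payload);
        // Checkpoint after processing: a crash replays at most one
        // update and never skips any
        writeFileSync(CURSOR_FILE, update.cursor);
      }
    } catch {
      // Transient disconnect: loop and resume from the saved cursor
    }
  }
}

function handleUpdate(payload: unknown) {
  console.log("update:", payload);
}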

4. Old Faithful historical streams

What it is: Historical data streaming via gRPC.
How it works: It takes data from our massive historical archive and streams it via gRPC as if it were happening live.
Use it for: Backfilling databases, compliance auditing, taxes, and booting up new indexers.

While developing Old Faithful (the only complete, verified, and public Solana archive), we realised the same gRPC interface could be used to replay history. This solves the "cold start" problem for protocols that rely on streaming: you can replay the entire chain history through the exact same pipe you use for live data.

[warehouse nodes]
      │  
      │ snapshots  
      ▼  
[Old Faithful CAR builder] 
      │ 
      │ CAR archives  
      ▼  
[storage: disk / S3 / IPFS]  
   │                    │  
   │ RPC/gRPC (history) │ Geyser replay  
   ▼                    ▼  
[apps needing history]  [Historical streams → indexers]  

Shred streaming

If you’re building on Solana, you’ve probably heard of shred streaming. Simply put, shreds are the atomic units of a block: fixed-size, erasure-coded packets of block data that a leader cuts from batched transactions and broadcasts to other validators, who use them to reconstruct entries and replay the full block.

Highly staked validators are the first to receive shreds, while standard RPC nodes sit much further downstream in Turbine. If you let nature (or Solana) take its course, you’re waiting an extra 15-80 ms (or even >200 ms for certain slots) for propagation hops and block reconstruction before you see anything.

But at Triton, we bypass the wait through:

  • Colocation: our streaming nodes are colocated with highly staked validators, gaining an edge from rebroadcast shreds whenever those validators have leader slots
  • Global shred distribution network: we automatically distribute all shreds that hit our edge routers through our global network of low-latency links to all our streaming nodes, so they see shreds as soon as they enter our network

By optimising the ingestion step, we eliminate the physical hops and give you a real, measurable speed advantage.

[leader]  
   │
   │ shreds via Turbine  
   ▼  
[high-stake Triton validator]  
   │
   │ early shreds + replay  
   ▼  
[relay nodes]  
   │
   │ low-latency hop  
   ▼  
[streaming nodes]  
   │
   │ gRPC / WS  
   ▼  
[HFT / RFQ / bots / indexers]  

Decision matrix: when to use what

By now, you’ve probably realised that “streaming” on Solana isn’t one thing; it’s a family of patterns that solve different problems. The only question left is which tool to use, and when.

To make that easier, here’s a simple decision matrix mapping scenarios to the Yellowstone (and non-Yellowstone) component that fits best:

| If you are | Use | Because |
| --- | --- | --- |
| HFT / MEV / RFQ engines | Dragon’s Mouth gRPC | You need to be first to see state updates and react to them. Every millisecond of overhead matters. |
| Mission-critical indexer | Fumarole gRPC | You need 100% data completeness; network instability or client restarts can’t lead to missed slots. |
| Frontend / dApp UI | Whirligig WebSockets | You need real-time updates for user interfaces to feel instant. |
| Backfills / audits / cold starts | Old Faithful (gRPC replay) | You need to replay history as a stream: backfill databases, run compliance/tax pipelines, or boot new indexers using the exact same interface as live data. |

When is Solana RPC polling enough?

Before you tear down your infrastructure, let's talk about when you should stick with RPC polling. You likely don’t need streaming if:

  • There's no need for real-time data
  • Your ops are mostly happening off-chain
  • Requests are infrequent (once every few minutes/hours)
  • Your requests don’t follow a pattern; they’re one-off, and you wouldn’t “subscribe” to any of them for a long period

Industries that usually stick to RPC:

  • Centralised exchanges (one-off reads)
  • Minting / burning (transactional / one-off)
  • RWA / tokenisation (infrequent updates)
  • DePIN (high volume but often batched)
  • Infrastructure (ephemeral sessions)

How to choose an RPC provider

Speed matters. A lot. But without the other pillars in place, it won’t take you very far. Here’s the complete checklist to keep in mind when choosing a provider for streaming-heavy systems:

24/7 uptime

It doesn't matter if your p99 is 90ms if the service goes down during market volatility. You need to be sure your infrastructure will stay responsive no matter what.

Data freshness

Some providers will serve you cached data at breakneck speeds, but if that data is stale (more likely when cached), acting on it is useless at best and costly at worst.

Feature depth

Many providers simply host our open-source Yellowstone plugin. That's great (we built it for the community), but if you need advanced capabilities like persistent, high-availability streaming (Fumarole) or enhanced WebSockets (Whirligig), you want the team that constantly ships, improves, and pioneers these tools.

Engineering support

Hosting a plugin and debugging it are two different things. When you hit an edge case in production, you want support from the engineers who actually build and maintain the code, not a reseller.

Vendor lock-in

Vendor lock-in is easy to ignore when things are going well and very hard to fix in an emergency (changed terms, unmet scaling needs, or surprise overages). Before you commit to any RPC provider, check whether the core components are open source and confirm that you can run the same stack elsewhere. The more of your pipeline you can’t move without rewriting, the more dependent you are on the provider’s goodwill.

Pricing

If you are building an analytics dashboard, trading bot, explorer, DEX, or liquidator, streaming will be a major expense.

Some providers hide this with “credits” that get expensive at scale, or gate streaming behind high-tier plans. Here’s how the most popular Solana providers compare when it comes to streaming:

Comparison (price per GB of bandwidth streamed):

| Provider | Model | Approx. cost |
| --- | --- | --- |
| Triton One | Usage-based | $0.08 / GB |
| Helius | Credits | $0.15 / GB (or 30,000 credits) |
| Quicknode | Add-ons marketplace | Starts at $499, heavily limited |
| Alchemy | Credits | $0.08 / GB |
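
To put those rates in perspective: streaming 1 TB (1,024 GB) in a month comes to roughly $82 at $0.08/GB versus roughly $154 at $0.15/GB, before any plan minimums or overage charges kick in.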

Implementation guide

We don’t believe in proprietary lock-in. The same clients we use in production are open source, so you can read the code, adapt it to your stack, and self-host your own instance.

Rust example

The gold standard for performance:

use {
    futures::StreamExt,
    std::collections::HashMap,
    yellowstone_grpc_client::{ClientTlsConfig, GeyserGrpcClient},
    yellowstone_grpc_proto::prelude::{
        CommitmentLevel, SubscribeRequest, SubscribeRequestFilterSlots,
        SubscribeRequestFilterAccounts, subscribe_update::UpdateOneof,
    },
};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Connect to the gRPC endpoint (TLS via the system's root certificates)
    let mut client = GeyserGrpcClient::build_from_shared("https://your-endpoint.rpcpool.com")?
        .x_token(Some("your-x-token".to_string()))?
        .tls_config(ClientTlsConfig::new().with_native_roots())?
        .connect()
        .await?;

    // Build subscription request
    let mut slots = HashMap::new();
    slots.insert(
        "client".to_string(),
        SubscribeRequestFilterSlots {
            filter_by_commitment: Some(true),
            interslot_updates: Some(false),
        },
    );

    let mut accounts = HashMap::new();
    accounts.insert(
        "client".to_string(),
        SubscribeRequestFilterAccounts {
            account: vec![], // specific account pubkeys
            owner: vec!["675kPX9MHTjS2zt1qfr1NYHuzeLXfQM9H24wFSUt1Mp8".to_string()], // Raydium AMM
            filters: vec![],
            nonempty_txn_signature: None,
        },
    );

    let request = SubscribeRequest {
        slots,
        accounts,
        commitment: Some(CommitmentLevel::Processed as i32),
        ..Default::default()
    };

    // Subscribe and handle updates
    let (mut _tx, mut stream) = client.subscribe_with_request(Some(request)).await?;

    while let Some(message) = stream.next().await {
        match message?.update_oneof {
            Some(UpdateOneof::Slot(slot)) => {
                println!("Slot: {}, Status: {}", slot.slot, slot.status);
            }
            Some(UpdateOneof::Account(account)) => {
                println!("Account update at slot: {}", account.slot);
            }
            _ => {}
        }
    }

    Ok(())
}

TypeScript example

We recently released a NAPI upgrade for the grpc-js client for TypeScript. This increases throughput by 4x compared to standard JS implementations, removing the single-thread bottleneck (read the deep dive).

import Client, {
  CommitmentLevel,
  SubscribeRequest,
} from "@triton-one/yellowstone-grpc";

async function main() {
  // Connect to the gRPC endpoint
  const client = new Client("https://your-endpoint.rpcpool.com", "your-x-token", {
    grpcMaxDecodingMessageSize: 64 * 1024 * 1024, // 64MiB
  });

  await client.connect();

  // Subscribe to events
  const stream = await client.subscribe();

  // Handle stream events
  const streamClosed = new Promise<void>((resolve, reject) => {
    stream.on("error", (error) => {
      reject(error);
      stream.end();
    });
    stream.on("end", resolve);
    stream.on("close", resolve);
  });

  // Process incoming data
  stream.on("data", (data) => {
    if (data.slot) {
      console.log(`Slot: ${data.slot.slot}, Status: ${data.slot.status}`);
    }
    if (data.account) {
      console.log(`Account update at slot: ${data.account.slot}`);
    }
  });

  // Build and send subscription request
  const request: SubscribeRequest = {
    accounts: {
      client: {
        account: [],
        owner: ["675kPX9MHTjS2zt1qfr1NYHuzeLXfQM9H24wFSUt1Mp8"], // Raydium AMM
        filters: [],
      },
    },
    slots: {
      client: { filterByCommitment: true },
    },
    transactions: {},
    transactionsStatus: {},
    entry: {},
    blocks: {},
    blocksMeta: {},
    commitment: CommitmentLevel.PROCESSED,
    accountsDataSlice: [],
    ping: undefined,
  };

  await new Promise<void>((resolve, reject) => {
    stream.write(request, (err) => (err ? reject(err) : resolve()));
  });

  await streamClosed;
}

main().catch(console.error);

Streaming vs RPC performance comparison

We’ve spent years obsessing over these microseconds, so you don’t have to. Here’s how the main approaches compare at a high level.

| Metric | Standard RPC (polling) | Native RPC WebSockets | Yellowstone gRPC (Dragon’s Mouth) |
| --- | --- | --- | --- |
| Latency* | p90 ~150ms for slots | p90 ~10ms for slots, ~374ms for accounts | p90 ~5ms for slots, ~215ms for accounts |
| Payload size | High (JSON + Base64) | Medium (JSON) | Low (Protobuf binary) |
| Reliability | Medium (rate limits, retries) | Low–Medium (fragile connections) | High, especially with Fumarole persistence |
| Backpressure | None (client can spam) | Limited | Native HTTP/2 + gRPC flow control |
| Complexity | Low | Medium | Higher; requires Protobuf and gRPC tooling |
| Best for | Simple apps, occasional reads | Low-traffic UIs, “ok” real-time UX | HFT, MEV, indexers, high-traffic and high-volume dApps |

*Measured from a Triton mainnet RPC endpoint

Start building

Streaming is how you keep up with Solana’s throughput and turn microseconds into an edge instead of a liability.

Whether you use Triton’s managed services or bootstrap your own Dragon’s Mouth and Whirligig stack on bare metal, you get:

  • Open-source building blocks
  • Production-grade performance and reliability
  • A path that doesn’t vendor-lock you