TL;DR
- Streaming via Yellowstone-gRPC means sending a subscribe request with an explicit pubkey list, and re-sending it every time the set changes
- That wastes bandwidth, forces a full filter rebuild on every change, and pushes high-volume subscribers into multi-connection sharding workarounds
- We fixed it by adding a new field that carries your tracked accounts as a CompressedFilterSet probabilistic filter
- Insert and remove are O(1) (constant time), so updates skip the full filter rebuild
- Filter size drops ~10x at every account count, allowing a single stream to handle workloads that previously took multiple connections
- Available today in yellowstone-grpc, with Rust API ready, TypeScript coming soon
When the account list size becomes the bottleneck
When you stream account data via Yellowstone-gRPC, you tell the server which accounts you want by listing them in the subscribe request.
The list is sent explicitly on the wire; at 1M accounts that adds up to ~44 MB per request! As accounts open and close, the tracked set shifts, and you need to re-upload the full list every time, multiplying your bandwidth consumption.
Anyone running a large, fast-changing account list runs into this: aggregators tracking pools, bots watching position accounts, and so on. The common workaround is sharding: opening multiple connections that each carry a slice of the list. While that fixes the per-connection filter size, it also creates three new problems:
- Coordination: each add or remove must target the right shard, and shards drift in size, so you need to rebalance
- Reconnects: every dropped connection means N reconnects, possibly out of order, with different gaps per shard
- Slot ordering: slot N can arrive on shard 3 before shard 1 finishes slot N-1, so you need to handle buffering and reordering across streams client-side
The solution
We added a new field to the subscribe request: a cuckoo filter, implemented as a custom Rust module inside yellowstone-grpc-proto, that compactly represents your tracked accounts and lets the server filter updates without ever seeing the full list of pubkeys.
Instead of storing your full 32-byte pubkeys, the filter stores small fingerprints, each one a hash of the pubkey truncated to a few bits. For every account update Solana produces, our gRPC server hashes the incoming pubkey and checks whether the resulting fingerprint is already in the filter.
If it isn't, the pubkey is definitely not in your set, so the server drops the update. If it is, the server forwards the update to your stream, with a small chance (under 1%) that two different pubkeys happened to share a fingerprint and produced a false positive.
Your client already keeps the exact list of accounts you care about in memory and checks every incoming update against it, because that's how you route updates to the right logical account in your app. If the server returns a false positive, your client catches it during the same check and drops it.
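As a toy sketch of those two checks (illustrative only; the real server-side structure is a cuckoo table with buckets and evictions, not a flat set of fingerprints):

```rust
use std::collections::HashSet;

// Server side, probabilistic: only truncated fingerprints are known.
// Absent => the pubkey is definitely not tracked, drop the update.
// Present => probably tracked, forward it; a different pubkey sharing the
// fingerprint slips through as a rare false positive.
fn server_forwards(wire_fingerprints: &HashSet<u16>, update_fingerprint: u16) -> bool {
    wire_fingerprints.contains(&update_fingerprint)
}

// Client side, exact: you already hold the full pubkeys, so the same lookup
// that routes an update to the right account also drops false positives.
fn client_keeps(tracked: &HashSet<[u8; 32]>, update_pubkey: &[u8; 32]) -> bool {
    tracked.contains(update_pubkey)
}
```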
You pay the build cost once: a 2M-account filter takes ~390 ms on a release build at subscribe time. After that, every insert, remove, and resend stays cheap: mutations are O(1) until capacity fills up, and re-serialising the filter is a flat memcpy of the underlying state rather than a recomputation.
The subscription payload shrinks ~10x across every account set size we benchmarked:
| Tracked accounts | CompressedFilterSet | Explicit pubkey list |
|---|---|---|
| 1,000 | ~4 KiB | ~44 KB |
| 10,000 | ~32 KiB | ~440 KB |
| 100,000 | ~256 KiB | ~4.4 MB |
| 1,000,000 | ~4 MiB | ~44 MB |
| 2,000,000 | ~8 MiB | ~88 MB |
Getting started
The Rust API is available today in yellowstone-grpc (PR #732) with TypeScript coming soon.
To get started, build a CompressedFilterSet from your tracked pubkeys, and attach it to a subscribe request.
use yellowstone_grpc_proto::cuckoo::CompressedAccountFilterSet;
use yellowstone_grpc_proto::geyser::SubscribeRequest;
let mut accounts = CompressedAccountFilterSet::with_capacity(2_000_000)?;
for pk in my_tracked_pubkeys() {
    accounts.insert(*pk)?;
}
let mut req = SubscribeRequest::default();
accounts.insert_into_subscribe_request(&mut req, "tracked");
Send req over the gRPC stream the same way you've always sent a SubscribeRequest.
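For context, here is a hedged sketch of that send using the stock yellowstone-grpc-client; the endpoint URL is a placeholder, and the builder and method names (build_from_shared, subscribe_with_request) may differ between client versions, so check the crate docs for yours:

```rust
use futures::StreamExt;
use yellowstone_grpc_client::GeyserGrpcClient;

// Assumed client API; adjust to the yellowstone-grpc-client version you use.
let mut client = GeyserGrpcClient::build_from_shared("https://your-endpoint.example:443")?
    .connect()
    .await?;

// Keep the sink half: the initial request goes through it now,
// updated requests go through it later.
let (mut subscribe_tx, mut stream) = client.subscribe_with_request(Some(req)).await?;

while let Some(update) = stream.next().await {
    let update = update?;
    // handle account updates here
}
```

Assuming the sink half accepts follow-up SubscribeRequests (via futures' SinkExt::send), it is also how you push the resend described next.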
When your tracked set changes, mutate the local set and resend:
accounts.insert(new_pubkey)?;
accounts.remove(old_pubkey)?;
accounts.insert_into_subscribe_request(&mut req, "tracked");
// resend req
When account updates flow back, filter out false positives in one line:
if accounts.contains(&incoming_pubkey) {
    // this update is for one of your tracked accounts
}
contains on the client is exact (backed by a HashSet); the probabilistic part lives only on the wire.
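If you need to pull the pubkey out of the raw update first, here is a hedged sketch; the field names follow the public geyser proto, while the 32-byte conversion and the contains signature on the filter set are assumptions:

```rust
use yellowstone_grpc_proto::geyser::{subscribe_update::UpdateOneof, SubscribeUpdate};

// Extract the 32-byte account pubkey from an incoming update so it can be
// checked against your exact local set.
fn updated_pubkey(update: &SubscribeUpdate) -> Option<[u8; 32]> {
    match update.update_oneof.as_ref()? {
        UpdateOneof::Account(acc) => {
            let info = acc.account.as_ref()?;
            info.pubkey.as_slice().try_into().ok()
        }
        _ => None,
    }
}
```

Then `if let Some(pk) = updated_pubkey(&update) { if accounts.contains(&pk) { /* route it */ } }` keeps only genuinely tracked accounts and silently drops the rare false positive.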
Three design choices keep the proto flexible, so we can change the filter without breaking your code:
- Seed in the filter: each filter records the seed it was built with, so we can pick a new default seed later without it affecting your application
- Hash function name in the filter: today it's SipHash-2-4. If we add a faster hash later, today's applications keep using SipHash until they upgrade
- SipHash-2-4 specifically: Rust's built-in hash function changes output between compiler versions, so two applications built with different Rust versions would send different filter bytes for the same pubkey. SipHash-2-4 is an open spec with the same output everywhere, so a TypeScript client produces the same filter bytes as a Rust one
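For illustration, fixed-key SipHash-2-4 fingerprinting looks roughly like this; the siphasher crate, the key values, and the 16-bit fingerprint width are assumptions for the sketch, not the library's actual parameters:

```rust
use std::hash::Hasher;
use siphasher::sip::SipHasher24; // any spec-compliant SipHash-2-4 works the same

// Hypothetical fixed keys; the real filter ships its own seed in the message.
const K0: u64 = 0x0123_4567_89ab_cdef;
const K1: u64 = 0xfedc_ba98_7654_3210;

fn fingerprint(pubkey: &[u8; 32]) -> u16 {
    let mut h = SipHasher24::new_with_keys(K0, K1);
    h.write(pubkey);
    // Truncate the 64-bit hash to a small fingerprint (width is illustrative).
    (h.finish() & 0xffff) as u16
}
```

Because the keys and the algorithm are fixed by spec rather than by the compiler, the same pubkey yields the same fingerprint from any build or any language.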
If you don't have a Triton endpoint yet, self-onboarding takes only 2 minutes: