Building the Go Driver for Apache Gremlin

February 27, 2026

What I learned contributing a Go client to Apache TinkerPop — graph traversal, connection pooling, and getting a PR merged into the ASF.

go, open-source, apache, graph-databases

Graph databases don't get enough attention. When you're modelling relationships — who knows whom, what depends on what, how A connects to B through C — a graph is often the most honest data structure. TinkerPop is the Apache project that tries to unify all the graph databases under one traversal language: Gremlin.

I came to TinkerPop through work at BitQuill Technologies, where we were building tooling on top of Amazon Neptune. Neptune speaks Gremlin. We needed a Go client that could speak it too.

The Problem

There was no official Go driver. Python, Java, JavaScript: those existed. Go did not. There were community forks floating around, some abandoned, some subtly broken. We needed something reliable, so we decided to write one and contribute it upstream.

How Gremlin Works Over the Wire

Gremlin Server communicates over WebSockets. You send a request containing a Gremlin traversal string (or bytecode), and the server replies with a serialized response. TinkerPop supports two serialization formats: GraphSON (JSON-based, the older default) and GraphBinary (a compact binary format introduced later). The Go driver targets GraphBinary: it's faster to parse and more type-precise, which matters when you're mapping Go types onto graph values.

The tricky part isn't the WebSocket handshake. It's the response handling. TinkerPop responses can be partial: a single query might come back in multiple response chunks, each carrying the requestId of the request it answers. You need a multiplexer: a way to keep multiple in-flight requests alive, matching incoming chunks to the right caller.

In Go, this maps naturally onto goroutines and channels. Each outbound request gets a resultChannel chan []byte. The reader goroutine fans incoming frames out to the right channel by requestId. The caller blocks on its channel until it receives all the chunks, then deserializes.

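// connection wraps one WebSocket and tracks the requests in flight on it.
type connection struct {
    conn    *websocket.Conn             // underlying WebSocket connection
    pending map[uuid.UUID]chan []byte   // result channel per in-flight requestId
    mu      sync.Mutex                  // guards pending
}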
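The dispatch logic described above can be sketched roughly as follows. This is a minimal illustration, not the driver's actual code: the frame type, its final flag, and the string request ID are hypothetical stand-ins for whatever the deserialized response carries, and the names mux, register, dispatch, and collect are invented for the example.

```go
package main

import (
	"errors"
	"sync"
)

// frame is a hypothetical, already-decoded response chunk. A real driver
// would parse these fields out of the serialized WebSocket message.
type frame struct {
	requestID string // stand-in for the UUID request id
	payload   []byte
	final     bool // true on the last chunk of a response (assumed signal)
}

// mux routes incoming frames to whichever caller issued the request.
type mux struct {
	mu      sync.Mutex
	pending map[string]chan frame
}

func newMux() *mux {
	return &mux{pending: make(map[string]chan frame)}
}

// register should be called before the request is written to the socket,
// so a fast reply cannot arrive ahead of its channel.
func (m *mux) register(requestID string) <-chan frame {
	ch := make(chan frame, 16)
	m.mu.Lock()
	m.pending[requestID] = ch
	m.mu.Unlock()
	return ch
}

// dispatch is called by the single reader goroutine for every frame.
// The final frame removes the entry and closes the caller's channel.
func (m *mux) dispatch(f frame) error {
	m.mu.Lock()
	ch, ok := m.pending[f.requestID]
	if ok && f.final {
		delete(m.pending, f.requestID)
	}
	m.mu.Unlock()
	if !ok {
		return errors.New("no pending request for id " + f.requestID)
	}
	ch <- f
	if f.final {
		close(ch)
	}
	return nil
}

// collect blocks until the channel is closed and returns the payloads
// in arrival order, ready for deserialization.
func collect(ch <-chan frame) [][]byte {
	var chunks [][]byte
	for f := range ch {
		chunks = append(chunks, f.payload)
	}
	return chunks
}
```

The send in dispatch happens outside the lock so a slow caller cannot block register for other requests. With a small buffer it can still stall the reader, which is a trade-off a production driver would need to handle more carefully.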

Connection Pooling

A single WebSocket connection is a bottleneck if you're running concurrent queries. The driver needed a connection pool — a fixed set of long-lived connections, with requests distributed across them.

The interesting design question in Go is what primitive to reach for. sync.Pool is tempting but wrong: it's designed for short-lived reusable objects (like buffers) and makes no guarantees about when pooled items are discarded — actively harmful for connections you need to keep alive and healthy. The right structure is a bounded set of connections with a proper checkout/checkin mechanism, where callers block (up to a timeout) when all connections are busy rather than spawning unbounded new ones.

The Windows Problem

One of the more painful moments: CI was green on Linux, failing on Windows. The culprit was line endings. Windows uses CRLF (\r\n); Linux uses LF (\n). Test fixtures and generated files that looked fine locally were arriving at the Windows runner with mangled line endings, causing string comparisons to silently fail.
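A common way to make such comparisons robust is to normalize line endings before comparing. The helpers below are a hypothetical sketch of that idea, not necessarily the fix that went into the driver's tests.

```go
package main

import "strings"

// normalizeNewlines converts CRLF and bare CR line endings to LF.
func normalizeNewlines(s string) string {
	s = strings.ReplaceAll(s, "\r\n", "\n")
	return strings.ReplaceAll(s, "\r", "\n")
}

// equalIgnoringLineEndings compares two strings after normalizing both,
// so a fixture checked out with CRLF still matches LF output.
func equalIgnoringLineEndings(a, b string) bool {
	return normalizeNewlines(a) == normalizeNewlines(b)
}
```

Pinning fixture files to LF with a .gitattributes rule (for example `* text=auto eol=lf`) is another option that addresses the problem at checkout time instead of in the test code.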

This was also the first time I set up Docker-based CI properly — spinning up a Gremlin Server container as part of the test pipeline so integration tests ran against a real server, not a mock. Getting that working across Linux and Windows runners, with consistent line ending handling, was more yak-shaving than expected. But it's the right shape for a driver: test against the actual thing.

Writing the Serializer

Writing the Go serializer for GraphBinary meant mapping every wire type to its closest Go equivalent, and getting the edge cases right.

Go's type system doesn't map cleanly onto graph types. An int in Go is platform-width, but GraphBinary distinguishes Int32 from Int64; send the wrong one and the server either rejects it or silently coerces it to the wrong type. The same goes for float32 vs float64, UUID encoding, and nullable vs zero-value: every type needs a decision, and the serializer had to be explicit about all of them.

Performance mattered too. Serialization sits on every read and write path. The implementation leans on bytes.Buffer and careful allocation — avoid unnecessary copies, don't allocate where you can reuse.

Getting Merged

The Apache Software Foundation has a formal contribution process. JIRA ticket, mailing list discussion, code review from existing committers. The review was thorough — TinkerPop committers know their codebase and held the Go driver to the same standards as the Java reference implementation.

The back-and-forth took a few months. The serializer's type mapping was the sharpest area of review: exact wire compatibility with the Java reference implementation had to be demonstrated for every supported type. Test coverage gaps and API surface questions came up too. But it shipped. The Go driver is part of TinkerPop 3.6.

Getting that first ASF commit merged felt genuinely meaningful. Open source at that scale, software that sits underneath many major graph databases, is a different thing from a personal project. Someone you'll never meet will depend on something you wrote. That's worth caring about.