Part 2: Blockchain Analytics is Tricky at Scale

By Coinbase Special Investigations Team

In our last post we walked through the basics of blockchain analytics and attribution. In this follow-up post, we will demonstrate how powerful blockchain analytics is and how tricky it can get at scale. We’ll start by reviewing some of the common methods for scaling blockchain analytics, which are used to fortify compliance programs and bolster sanctions controls.

1. Commonspend

Blockchain analytics software relies on heuristics: rules for detecting patterns in address activity. The primary heuristic applied to all UTXO (Unspent Transaction Output) blockchains, such as Bitcoin, Litecoin and their forks, is the commonspend heuristic.

It works as follows. Take the address 1P354Tw8VaSteYph84ext3f4fAYnSJQGuZ, which appears in this YouTube video showing a deposit to LocalBitcoins. From the video, we know this address belongs to LocalBitcoins and is an individual’s deposit address.

In this transaction we see that our LocalBitcoins address appears as one of the inputs:

We know that 1P354Tw8VaSteYph84ext3f4fAYnSJQGuZ belongs to LocalBitcoins. We also know that for this address and others to spend funds together as inputs of the same transaction, the sender must hold the private keys for every input address. We can therefore reason that all input addresses in this transaction belong to LocalBitcoins, and that they can be clustered together.
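The clustering step can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used by any particular analytics tool: it assumes each transaction has already been parsed into a dictionary with an "inputs" list of address strings (a hypothetical format), and the addr_B through addr_E names are placeholders rather than real addresses. A union-find (disjoint-set) structure is one common way to merge addresses that are linked transitively through shared transactions.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure keyed by address string."""

    def __init__(self):
        self.parent = {}

    def find(self, addr):
        # Register unseen addresses as their own root.
        self.parent.setdefault(addr, addr)
        root = addr
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node straight at the root.
        while self.parent[addr] != root:
            self.parent[addr], addr = root, self.parent[addr]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_by_commonspend(transactions):
    """Group addresses that appear together as inputs of any transaction."""
    uf = UnionFind()
    for tx in transactions:
        inputs = tx["inputs"]  # assumed format: list of address strings
        for addr in inputs:
            uf.find(addr)  # make sure single-input addresses are registered
        for other in inputs[1:]:
            uf.union(inputs[0], other)
    clusters = defaultdict(set)
    for addr in list(uf.parent):
        clusters[uf.find(addr)].add(addr)
    return list(clusters.values())


# Placeholder data: only the first address comes from the post.
transactions = [
    {"inputs": ["1P354Tw8VaSteYph84ext3f4fAYnSJQGuZ", "addr_B", "addr_C"]},
    {"inputs": ["addr_C", "addr_D"]},
    {"inputs": ["addr_E"]},
]
clusters = cluster_by_commonspend(transactions)
print(sorted(len(c) for c in clusters))  # [1, 4]
```

Note that addr_D never shares a transaction with the LocalBitcoins address directly. It joins the cluster because it was spent together with addr_C, which is how a single known address can grow into a very large cluster.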

Some block explorers automatically apply the commonspend heuristic to their analysis. For example, if you take a look at our original address in CryptoID or WalletExplorer, you’ll see that it belongs to a cluster of 990k+ addresses.

This heuristic remains a cornerstone of blockchain analytics. In fact, the most popular blockchain analytics tools already apply the commonspend heuristic to all Bitcoin addresses before they even know what the attributions for the addresses are.

But heuristics, even ones as straightforward as commonspend, can’t always be trusted.

2. Commonspend isn’t always common

So when does the commonspend heuristic not apply? Consider this transaction:

The above transaction has multiple inputs and multiple outputs. This is a more complex type of transaction, referred to as a coinjoin. Several users who don’t necessarily know each other might decide to participate together in a coinjoin transaction, pooling all their funds together. This is often done through dedicated privacy software such as the Samourai or Wasabi wallets.

The coinjoin above obfuscates the flow of funds by paying out to seemingly random output addresses. It also renders any commonspend-based analysis ineffective, because the inputs belong to different parties, even though each party that participated in the coinjoin still gets out the same amount of Bitcoin that they originally put in (minus the fee paid to the service). Demixing such transactions is difficult (but not always impossible), and it is just one example of defeating commonspend.
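One practical consequence is that a clustering pipeline needs some way to set aside likely coinjoins before applying commonspend. The sketch below is an illustrative assumption, not a method described in this post: it flags transactions that have several inputs and several outputs of exactly the same value, a pattern commonly seen in coinjoins. The transaction format, the function name and the threshold of 3 are all hypothetical, and a real detector would need more signals to avoid false positives and false negatives.

```python
from collections import Counter


def looks_like_coinjoin(tx, min_equal_outputs=3):
    """Rough check: several inputs and several outputs of identical value.

    The threshold is an arbitrary assumption chosen for illustration.
    """
    if len(tx["inputs"]) < min_equal_outputs:
        return False
    # Count how many outputs share each value (values in satoshis).
    value_counts = Counter(out["value"] for out in tx["outputs"])
    return max(value_counts.values(), default=0) >= min_equal_outputs


# Placeholder transactions in an assumed, pre-parsed format.
mixed = {
    "inputs": ["addr_A", "addr_B", "addr_C"],
    "outputs": [
        {"address": "addr_X", "value": 10_000_000},
        {"address": "addr_Y", "value": 10_000_000},
        {"address": "addr_Z", "value": 10_000_000},
        {"address": "addr_W", "value": 37_512},  # change
    ],
}
ordinary = {
    "inputs": ["addr_A", "addr_B"],
    "outputs": [
        {"address": "addr_P", "value": 5_000_000},
        {"address": "addr_Q", "value": 123_456},
    ],
}

print(looks_like_coinjoin(mixed))     # True
print(looks_like_coinjoin(ordinary))  # False

# Only transactions that pass the check are fed to commonspend clustering.
clusterable = [tx for tx in (mixed, ordinary) if not looks_like_coinjoin(tx)]
```

Skipping flagged transactions trades coverage for safety: some legitimate multi-input payments may be excluded, but unrelated users are less likely to be merged into one cluster.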

3. Bringing it all together

Now that we’ve learned about ground truth, evidence quality, deconflictions, misattributions, and what commonspend is, let’s walk through how it all comes together in identifying addresses belonging to illicit entities, like the 25k addresses we discussed in our previous blog post.

The Office of Foreign Assets Control (OFAC), a regulatory agency in the US responsible for sanctions enforcement, published a notice designating about 100 addresses, as well as the entities they belong to. So, how did we go from under a hundred to over 25 thousand addresses?

3E7YbpXuhh3CWFks1jmvWoV8y5DvsfzE6 was one of the addresses designated by OFAC as belonging to Chatex, a Russian Telegram bot that allows users to exchange crypto:

An official government website is a pretty reliable source of information, giving us confidence in the evidence quality. Now we need to assess each address to identify whether it’s part of a larger group of addresses (i.e. a cluster) controlled by an entity. Using the commonspend heuristic, we can associate the 3E7YbpX…vsfzE6 address with a group of over 25k addresses. You too can verify this using a public block explorer, such as CryptoID:

After some additional checks we confirmed that all of these addresses belong to Chatex. And since the entity was sanctioned by OFAC, we are required to block the respective transactions. It is worth noting that our list of blocked addresses is significantly larger. It includes other sanctioned entities as well as designated individuals. We also engage in proactive work to identify sanctioned activity originating from various jurisdictions, including Russia. But that’s a subject for another blog post…
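For readers who want to see the expansion from a handful of designated addresses to a full blocklist in code, here is a minimal Python sketch. It assumes the clusters have already been computed and are available as sets of address strings; the function names, data layout and addr_X through addr_Z placeholders are hypothetical and do not describe our internal tooling. It also leaves out the additional checks mentioned above, which in practice happen before any cluster is treated as confirmed.

```python
def expand_designated(designated, clusters):
    """Expand each entity's designated addresses to their full clusters.

    designated: dict mapping entity name -> list of designated addresses
    clusters:   list of sets of addresses (e.g. produced by commonspend)
    """
    blocklist = {}
    for entity, seeds in designated.items():
        seed_set = set(seeds)
        expanded = set(seed_set)
        for cluster in clusters:
            # A cluster is pulled in if it contains any designated address.
            if not seed_set.isdisjoint(cluster):
                expanded |= cluster
        blocklist[entity] = expanded
    return blocklist


def touches_blocklist(tx_addresses, blocklist):
    """Return the entities whose expanded address set appears in a transaction."""
    return sorted(
        entity
        for entity, addrs in blocklist.items()
        if not addrs.isdisjoint(tx_addresses)
    )


# The Chatex address comes from the OFAC designation; the rest are placeholders.
designated = {"Chatex": ["3E7YbpXuhh3CWFks1jmvWoV8y5DvsfzE6"]}
clusters = [
    {"3E7YbpXuhh3CWFks1jmvWoV8y5DvsfzE6", "addr_X", "addr_Y"},
    {"addr_Z"},
]

blocklist = expand_designated(designated, clusters)
print(len(blocklist["Chatex"]))                            # 3
print(touches_blocklist(["addr_Y", "addr_Q"], blocklist))  # ['Chatex']
print(touches_blocklist(["addr_Z"], blocklist))            # []
```

Keeping the entity name alongside each expanded set makes it possible to report which designation a blocked transaction relates to, rather than only that it was blocked.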


Part 2: Blockchain Analytics is Tricky at Scale was originally published in The Coinbase Blog on Medium.