Part 1: Blockchain Analytics is More of an Art Than Science

By Coinbase Special Investigations Team

Intro

Bitcoin and many other cryptocurrencies are often referred to as pseudonymous. Everyone can view records on a public ledger, but not necessarily know who’s behind each address or transaction. But what does pseudonymity look like in practice? How are cryptocurrencies tracked? And can you really unmask someone on the blockchain? Let’s find out.

The public nature of blockchains allows for a certain degree of predictive analysis, enabling researchers to associate addresses and transactions with entities and sometimes individuals. Anybody can look at a blockchain, but what makes a difference is the accurate interpretation of this public data and its corroboration with information gathered externally. Once combined, such data can be used for blockchain analytics.

Blockchain analytics is widely used for market intelligence, trend analysis, and investigations, among other emerging uses. The main objective of blockchain analytics is attribution: linking specific assets and events to particular entities or even individuals.

Attributing ownership, however, is often nuanced because outside observers can only infer it, and the strength of that inference depends on factors such as the availability and quality of the evidence. Evidence here means proof that an address indeed belongs to an individual or entity. Unless you own an address yourself, it is very difficult to say with absolute certainty who owns it. This is why it’s fitting to consider blockchain analytics more of an art than a science.

Let’s understand the basics of blockchain analytics and learn why attribution is often more complicated than it looks.

Attribution Basics

Can you tell what entity this address belongs to:

1JxXMEbYX6juuEK7QPe6CxGXywQ91ZB5mZ?

Is it an exchange? Is it a darknet market? Or maybe a private (otherwise known as an unhosted) wallet? To answer this question we need to dig for some ground truth.

1. Ground Truth Evidence

A search for truth often starts with plain googling or crowd-sourced sites like BitcoinAbuse.com.

Websites like BitcoinAbuse.com can be used by anyone to anonymously report BTC addresses linked to suspicious activity. Sadly, the reliability of such information can be very low. According to Blockchain.com, our address of interest received over 767 BTC. WalletExplorer.com implies this address is linked to a large offshore cryptocurrency exchange, and commercial blockchain analytics tools corroborate this, identifying the address as belonging to that exchange.

So what about the nature of the activity? Is the exchange user involved in ransomware?

Further research connects this address to an exchanger called Coinguru.pw:

Coinguru allows users to swap between various cryptocurrencies, providing nothing more than an email address.

At this point you’re probably asking yourself: so who does this address belong to?

  • The BitcoinAbuse crowd-reported ransomware operator?
  • A large offshore cryptocurrency exchange?
  • Coinguru?
  • …all of the above?!

Well, the answer is complicated.

We have first-hand evidence of 1JxXMEbYX6juuEK7QPe6CxGXywQ91ZB5mZ being used by Coinguru, an exchange service operating an account on a large offshore cryptocurrency exchange. Exchangers like Coinguru often use bigger platforms’ infrastructure to reduce costs and get access to liquidity. We refer to these as nested services. They also cater to users who might not want to go to the trouble of creating their own accounts on an exchange. In fact, some nefarious actors may use these services to cash out illicit funds.

For labeling purposes, it would suffice to say this is an exchange-owned address. If a regulator or a law enforcement agency investigating ransomware-related transactions decides to enquire about the details, the cryptocurrency exchange will refer them to Coinguru, which would be best positioned to provide further information on specific transactions.
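One way to record this kind of layered ownership is to keep the platform hosting an address and the nested service operating on top of it as separate fields, with crowd reports stored as unverified claims. The sketch below is a minimal illustration in Python. The class and field names (AddressAttribution, hosted_by, operated_by, unverified_reports) are hypothetical and are not taken from any particular analytics tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AddressAttribution:
    """Hypothetical record separating who hosts an address from who operates it."""
    address: str
    hosted_by: Optional[str] = None    # platform whose infrastructure holds the address
    operated_by: Optional[str] = None  # nested service using an account on that platform
    unverified_reports: List[str] = field(default_factory=list)  # crowd-sourced claims

    def label(self) -> str:
        """Coarse label for tooling: the hosting platform, if known."""
        return self.hosted_by or "unattributed"

    def point_of_contact(self) -> str:
        """Who is best placed to answer questions about specific transactions."""
        return self.operated_by or self.hosted_by or "unknown"


record = AddressAttribution(
    address="1JxXMEbYX6juuEK7QPe6CxGXywQ91ZB5mZ",
    hosted_by="large offshore exchange",
    operated_by="Coinguru",
    unverified_reports=["ransomware (crowd-reported)"],
)

print(record.label())             # large offshore exchange
print(record.point_of_contact())  # Coinguru
```

Keeping the crowd report in its own list means it stays visible to an investigator without being promoted to an attribution.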

2. Evidence quality and standard of proof

Evidence can vary in quality, and blockchain analytics is no exception. Sometimes you might stumble upon a “smoking gun”, but it’s more likely you will need to spend time corroborating incomplete, circumstantial, fragmented or outright misleading evidence. Nevertheless, even the weakest evidence can hint at a particular activity or the entity behind it.

As we’ve already witnessed, crowd-reported sources such as BitcoinAbuse sit at the bottom of the reliability ladder. That does not mean they should be fully discounted, but evidence leading to attribution of crypto addresses is best gathered directly from the source. In the case of exchange services, the source would be their website displaying a deposit address.

The ultimate attribution comes from the ability to interact with the service, earning such evidence the highest confidence score. However, this is often prohibited, especially when investigating activities such as terror funding (TF). In cases like these, research shifts into the world of open source intelligence (OSINT). Much can be learned from aggregator websites, online forums, chat groups, mobile communication platforms, hidden domains on the Tor network and information scraped automatically by third-party vendors. But even the best evidence is not helpful without proper investigative tools.
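The reliability ladder described in this section can be sketched as an ordered enumeration. This is only an illustration: the numeric ranks, the names, and in particular the placement of OSINT between crowd reports and the service’s own website are assumptions made for the example, not an established scoring scheme.

```python
from enum import IntEnum
from typing import List, Optional


class EvidenceSource(IntEnum):
    """Illustrative ranking only; the numeric values are assumptions."""
    CROWD_REPORT = 1        # e.g. anonymous abuse reports
    OSINT = 2               # forums, chat groups, aggregator sites
    SERVICE_WEBSITE = 3     # address displayed by the service itself
    DIRECT_INTERACTION = 4  # address obtained by interacting with the service


def strongest(evidence: List[EvidenceSource]) -> Optional[EvidenceSource]:
    """Return the most reliable source type present, or None if there is no evidence."""
    return max(evidence, default=None)


collected = [EvidenceSource.CROWD_REPORT, EvidenceSource.SERVICE_WEBSITE]
best = strongest(collected)
print(best.name if best else "no evidence")  # SERVICE_WEBSITE
```

A ranking like this only orders the types of source. It does not replace corroboration, since a weak source can still point in the right direction.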

3. Deconflicting misattribution

Blockchain investigation tools include blockchain analytics software, private and open source databases, search engines, etc. The best investigative practice is to combine a mix of these tools, including commercially available software, and corroborate evidence using independent sources. Sometimes, however, those sources can offer conflicting information.

For instance, consider this address: 1N9SxKeNvFoBFuFKEDU8yFCwPwoeHqgmhu.

Imagine an investigator receiving intelligence linking this address to the sale of Child Sexual Abuse Material (CSAM). Attribution of this address will vary depending on which blockchain analytics tool you consult: some don’t have it labeled at all, while others attribute it to a merchant service. Open source research confirms this particular service allowed users to upload files and sell them for various cryptocurrencies. Addresses like the one above were generated for every user and were all connected to different types of activity, depending on what an individual user was buying.

While some uploads to this merchant service have been benign, some were identified as illicit, according to the Internet Watch Foundation (IWF), a non-profit combating the distribution of CSAM. Reportedly, the same merchant service was also used for ransomware decryptor key uploads. So, can the address of interest belong both to an illicit vendor and to the merchant service? Yes.

The correct way to attribute this service in a blockchain analytics tool would be to take all of the known addresses associated with the service and label them accordingly. Then, as a result of investigating individual addresses and their related activities, specific labels should be applied in accordance with documented findings. Labeling the whole service as illicit would be a misattribution. It can negatively impact tools and services that rely on blockchain analytics data, such as transaction monitoring systems or law enforcement subpoenas, leading to increased false positive alerts and erroneous leads.
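The two-step labeling approach described above can be sketched as follows: every known address first receives a neutral service-level label, and a specific label is added only to an individual address, and only together with a documented source. The address strings, label names and the add_finding helper are hypothetical placeholders for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical placeholders; real data would be the service's known addresses.
SERVICE_LABEL = "merchant service"
service_addresses: List[str] = ["addr_A", "addr_B", "addr_C"]

# Step 1: every known address gets the neutral service-level label.
labels: Dict[str, Set[str]] = defaultdict(set)
for addr in service_addresses:
    labels[addr].add(SERVICE_LABEL)


def add_finding(addr: str, finding: str, source: str,
                log: List[dict]) -> None:
    """Step 2: attach a specific label only with a documented finding."""
    if not source:
        raise ValueError("a specific label needs a documented source")
    labels[addr].add(finding)
    log.append({"address": addr, "label": finding, "source": source})


audit_log: List[dict] = []
add_finding("addr_B", "illicit vendor", "case file (hypothetical)", audit_log)

print(sorted(labels["addr_A"]))  # ['merchant service']
print(sorted(labels["addr_B"]))  # ['illicit vendor', 'merchant service']
```

Because the specific label is attached to one address rather than to the service, the other addresses do not inherit it, which avoids the false positives that a service-wide illicit label would cause.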

4. The unknown unknowns

Back in October 2019, a Medium article was published with a flashy title: “Huge Ethereum Mixer”. A Russian data scientist analyzed ETH flows between February and September 2017, claiming that “…68% of total Ethereum transaction value [is] controlled by one system… Funds come and leave within one hour, and addresses are never used again.” The researcher spent a great deal of effort analyzing the behavior of the “mixer”, its transaction patterns, and share of total transactions across Ethereum over time. At the center of the article was this diagram:

Notice how most large exchanges at the time are present: Kraken, Poloniex, Bitfinex, etc. Can you guess which one(s) are missing?

Hopefully, at this point it’s fairly evident that an external observer cannot possibly gain a full picture or claim 100% confidence in attribution. Keep in mind, when it comes to blockchain, everyone is an external observer, with the exception of addresses you control.
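To make the limits of behavioral analysis concrete, the pattern quoted above (funds arriving and leaving within one hour, at addresses that are never used again) can be expressed as a simple filter. The following is a minimal sketch over made-up toy data. The Transfer type, its field names and the way the one-hour window is encoded are illustrative assumptions, not the original researcher’s method.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Transfer:
    address: str
    direction: str       # "in" or "out"
    timestamp: datetime


def pass_through_addresses(transfers: List[Transfer],
                           window: timedelta = timedelta(hours=1)) -> List[str]:
    """Addresses with exactly one inflow and one outflow, within `window`."""
    by_addr: Dict[str, List[Transfer]] = {}
    for t in transfers:
        by_addr.setdefault(t.address, []).append(t)

    flagged = []
    for addr, ts in by_addr.items():
        ins = [t for t in ts if t.direction == "in"]
        outs = [t for t in ts if t.direction == "out"]
        if len(ins) == 1 and len(outs) == 1:
            gap = outs[0].timestamp - ins[0].timestamp
            if timedelta(0) <= gap <= window:
                flagged.append(addr)
    return flagged


# Toy data: both addresses and timestamps are invented for the example.
t0 = datetime(2017, 3, 1, 12, 0)
toy = [
    Transfer("0xaaa", "in", t0),
    Transfer("0xaaa", "out", t0 + timedelta(minutes=20)),
    Transfer("0xbbb", "in", t0),
    Transfer("0xbbb", "out", t0 + timedelta(days=2)),
]
print(pass_through_addresses(toy))  # ['0xaaa']
```

The filter describes behavior only. It says nothing about who controls the flagged addresses, which is exactly the gap an external observer cannot close from on-chain data alone.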

Stay tuned for the second part, where we’ll dive deeper into examples of how blockchain analytics can both enlighten and confuse.


Part 1: Blockchain Analytics is More of an Art Than Science was originally published in The Coinbase Blog on Medium.