
Programming languages prevent mainstream DeFi

Asset-oriented programming makes fundamental functions native to the programming language. DeFi needs more of that to improve security.

Decentralized finance (DeFi) is growing fast. Total value locked, a measure of money managed by DeFi protocols, has grown from $10 billion to a little more than $40 billion over the last two years after peaking at $180 billion.

Total value locked in DeFi as of Nov. 2022. Source: DefiLlama

The elephant in the room? More than $10 billion was lost to hacks and exploits in 2021 alone. Feeding that elephant: Today’s smart contract programming languages fail to provide adequate features to create and manage assets — also known as “tokens.” For DeFi to become mainstream, programming languages must provide asset-oriented features to make DeFi smart contract development more secure and intuitive.

Current DeFi programming languages have no concept of assets

Solutions that could help reduce DeFi’s perennial hacks include auditing code. To an extent, audits work. Of the 10 largest DeFi hacks in history (give or take), nine of the projects weren’t audited. But throwing more resources at the problem is like putting more engines in a car with square wheels: it can go a bit faster, but there is a fundamental problem at play.

The problem: Programming languages used for DeFi today, such as Solidity, have no concept of what an asset is. Assets such as tokens and nonfungible tokens (NFTs) exist only as a variable (numbers that can change) in a smart contract such as with Ethereum’s ERC-20. The protections and validations that define how the variable should behave, e.g., that it shouldn’t be spent twice, it shouldn’t be drained by an unauthorized user, that transfers should always balance and net to zero — all need to be implemented by the developer from scratch, for every single smart contract.

Related: Developers could have prevented crypto's 2022 hacks if they took basic security measures

As smart contracts get more complex, so do the required protections and validations. People are human. Mistakes happen. Bugs happen. Money gets lost.

A case in point: Compound, one of the most blue-chip of DeFi protocols, was exploited to the tune of $80 million in September 2021. Why? The smart contract contained a “>” instead of a “>=.”

The knock-on effect

For smart contracts to interact with one another, such as a user swapping a token with a different one, messages are sent to each of the smart contracts to update their list of internal variables.

The result is a complex balancing act. Ensuring that all interactions with the smart contract are handled correctly falls entirely on the DeFi developer. Since there are no innate guardrails built into Solidity and the Ethereum Virtual Machine (EVM), DeFi developers must design and implement all the required protections and validations themselves.

Related: Developers need to stop crypto hackers or face regulation in 2023

So DeFi developers spend nearly all their time making sure their code is secure. And double-checking it — and triple checking it — to the extent that some developers report that they spend up to 90% of their time on validations and testing and only 10% of their time building features and functionality.

With the majority of developer time spent battling unsecure code, compounded with a shortage of developers, how has DeFi grown so quickly? Apparently, there is demand for self-sovereign, permissionless and automated forms of programmable money, despite the challenges and risks of providing it today. Now, imagine how much innovation could be unleashed if DeFi developers could focus their productivity on features and not failures. The kind of innovation that might allow a fledgling $46 billion industry to disrupt an industry as large as, well, the $468 trillion of global finance.

Total assets of global financial institutions from 2002 to 2020. Source: Statista

Innovation and safety

The key to DeFi being both innovative and safe stems from the same source: Give developers an easy way to create and interact with assets and make assets and their intuitive behavior a native feature. Any asset created should always behave predictably and in line with common sense financial principles.

In the asset-oriented programming paradigm, creating an asset is as easy as calling a native function. The platform knows what an asset is: .initial_supply_fungible(1000) creates a fungible token with a fixed supply of 1000 (beyond supply, many more token configuration options are available as well) while functions such as .take and .put take tokens from somewhere and put them elsewhere.

Instead of developers writing complex logic instructing smart contracts to update lists of variables with all the error-checking that entails, in asset-oriented programming, operations that anyone would intuitively expect as fundamental to DeFi are native functions of the language. Tokens can’t be lost or drained because asset-oriented programming guarantees they can’t.
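To make the contrast concrete, here is a rough, illustrative Python sketch of the idea (our own illustration, not any real smart contract language; the Vault and Bucket names are hypothetical stand-ins for what an asset-oriented runtime would enforce natively, loosely echoing the .initial_supply_fungible, .take and .put functions above):

# Illustrative sketch only: models how an asset-oriented runtime could make
# conservation of tokens a property of the platform rather than developer code.
# The class and method names here are hypothetical.

class Bucket:
    """A quantity of tokens in transit. It cannot be duplicated or silently discarded."""
    def __init__(self, amount: int):
        if amount < 0:
            raise ValueError("token amounts cannot be negative")
        self.amount = amount

class Vault:
    """Tokens at rest. take() and put() always move whole Buckets, so transfers net to zero."""
    def __init__(self, initial_supply_fungible: int = 0):
        self.amount = initial_supply_fungible

    def take(self, amount: int) -> Bucket:
        if amount > self.amount:
            raise ValueError("insufficient tokens")  # enforced by the runtime, not the dapp
        self.amount -= amount
        return Bucket(amount)

    def put(self, bucket: Bucket) -> None:
        self.amount += bucket.amount
        bucket.amount = 0  # a Bucket can only be deposited once

# A 1,000-token fixed supply; moving 100 tokens can only happen as a balanced take/put pair.
treasury = Vault(initial_supply_fungible=1000)
user = Vault()
user.put(treasury.take(100))
assert treasury.amount + user.amount == 1000  # supply is conserved by construction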

This is how you get both innovation and safety in DeFi. And this is how you change the perception of the mainstream public from one where DeFi is the wild west to one where DeFi is where you have to put your savings, as otherwise, you’re losing out.

Ben Far is head of partnerships at RDX Works, the core developer of the Radix protocol. Prior to RDX Works, he held managerial positions at PwC and Deloitte, where he served clients on matters relating to the governance, audit, risk management and regulation of financial technology. He holds a bachelor of arts in geography and economics and a master’s degree in mapping software and analytics from the University of Leeds.

The author, who disclosed his identity to Cointelegraph, used a pseudonym for this article. This article is for general information purposes and is not intended to be and should not be taken as legal or investment advice. The views, thoughts, and opinions expressed here are the author’s alone and do not necessarily reflect or represent the views and opinions of Cointelegraph.


DEX accidentally hits ‘kill-switch’ on mainnet, locking 660,000 USDC inside

The deployment of a program upgrade went terribly wrong as a fateful “Solana program close” command stopped OptiFi's platform indefinitely.

A decentralized cryptocurrency options exchange (DEX) cut its own life short after unwittingly executing a command that closed its mainnet program and made it irrecoverable.

OptiFi informed users that its platform had come to an unceremonious end after its development team tried to update its code on Aug. 29. According to the portfolio-margining derivatives DEX, the incident also locked up some 660,000 USD Coin (USDC) on-chain.

OptiFi has pledged to compensate users for funds lost to the error; the bulk of the locked-up USDC reportedly belonged to one of its team members. The company has also urged other developers working on the Solana blockchain to be wary of the ramifications of the ‘solana program close’ command.

The platform unpacked the series of events that led to the sudden closure of its mainnet in a Medium post, which began with an attempt to deploy an update to its Solana program code on the mainnet.

Due to what the team described as bad network status, the deployment took longer than usual, and the command was canceled. However, a buffer address was created that received Solana (SOL) tokens that the team wanted to recover.

Related: Aave community proposes to temporarily suspend ETH lending before the Merge

In the past, the team had managed to recover SOL tokens from buffer accounts without using seed phrases by closing the program. The approach initially looked to have worked after executing the command, as the team recovered the SOL, allowing an attempt to deploy the program a second time.

An error message was returned indicating that the program had been closed and could not be re-deployed unless a new program id was used. Discussions with a Solana core developer confirmed the team's fears that it would not be able to redeploy the program with its previous id.

“Here it turned out that we didn’t really understand the impact and risk of this closing program command line. ‘solana program close’ is actually for closing the program permanently and sending the SOL tokens in the buffer account used by the program back to the recipient wallet.”

The OptiFi team has called on the Solana development community to explore a two-step confirmation when running the ‘solana program close’ command and to warn users about the consequences of using it.


How Coinbase interviews for engineering roles

Coinbase is on a mission to increase economic freedom in the world. In order to achieve this mission, we need to build a high-performing team. Our top priority is attracting and retaining great talent, and we take extraordinary measures to have exceptional people in every seat. This post is designed to give candidates a sneak preview of what we look for in our technical interview process.

In this post we’ll focus on what we look out for in an engineering candidate. Some of this advice will apply to other positions too, but it’s most useful if you want to join us as an engineer.

When joining the interview process you’ll progress through a series of stages. In each stage we’ll assess you in different ways to ensure the role you’re interviewing for is a good mutual fit. While the exercises and questions you face will vary, we always look out for the Coinbase cultural tenets:

  • Clear communication
  • Efficient execution
  • Act like an owner
  • Top talent
  • Championship team
  • Customer focus
  • Repeatable innovation
  • Positive energy
  • Continuous learning

Learn more about these tenets here. You may not get an opportunity to display all of these qualities at every interview stage, but this will give you a good idea of what we are looking for. When we assess your performance we will do so almost exclusively through the lens of these tenets.

The interview stages are (typically but not always):

  • an initial chat with someone from HR about the role
  • one 60 minute pair programming problem
  • one or two 60 minute engineering manager interviews
  • one or two 60 or 90 minute pair programming interviews
  • one or two 60 minute system design interviews

You will need to perform well at all stages to get an offer, so it’s important to prepare for each interview stage. That said, most people that are unsuccessful in their Coinbase interview loop fail on the pair programming stages. Let’s start there.

Pair Programming

In the pair programming interview(s) you will work through a problem with one of our engineers. To start, your interviewer will provide you with a short brief of the problem. Now it’s up to you to code a solution to the problem.

It’s not enough to solve the problem to pass this stage. We are not looking for a minimal Leetcode-style optimal solution. We are looking for evidence that you are able to produce production-grade code. As a result, we assess both the end result and how you got to the result, giving credit for both components. If you get stuck on a bug, how do you overcome it? Do you know your tooling well? Do you use the debugger with a breakpoint, or do you change random lines of code until it works? Is there a method to how you approach a coding problem?

We will look beyond the completeness and correctness of your solution. We will assess the quality and maintainability of your code, too. Is your code idiomatic for your chosen language? Is it easy to read through and understand? What about variable naming? Do you leverage the tooling that is available to you in your IDE and terminal? How can we be confident that your code is correct? Did you test it?

How well do you understand the problem? Do you ask relevant clarifying questions? How well do you take the interviewer’s feedback?

Don’t be discouraged if you do not reach the very end of the problem. We design our interview problems to fill more than the allotted time. The interviewer will stop the interview either after 90 minutes have passed or when they are confident in their assessment. Ending an interview early is not always a bad sign.

Most candidates who fail the interview do so because their code or process isn’t good enough. We do not typically fail you for an incomplete solution.

Let’s look at a practical example. Suppose the problem is:

Given a list that contains all integers from 1 to n - 1. The list is not sorted. One integer occurs twice in this list. Write a function that determines which.

Here’s the first example solution (there are errors!):

def duplicates(integers):
    """duplicates takes a list of integers and returns the first duplicate value or None if all values are unique"""
    if not isinstance(integers, list):
        raise ArgumentError("expected list as input")
    sorted_integers = integers.sort()
    previous_value = nil
    for x in sorted_integers:
        if x == previous_value:
            return x
        previous_value = x
    return None

def test_duplicates():
    assert duplicates([]) == None, "empty array is considered unique"
    assert duplicates([1, 2]) == None, "array of unique values returns None"
    assert duplicates([1, 2, 2]) == 2, "duplicate returns the duplicate integer"
    assert duplicates([1, 2, 2, 1]) == 2, "multiple duplicates returns the first duplicate"

And the second solution (there are errors here, too!):

def dupilcateIntegers(l ):
 n = len(l)
 return sum(l) - ((len(l)+1) * len(l))/2

The first solution doesn’t actually solve the problem, but the author seems to have thought about error handling and input validation. The candidate has thought about the problem and its edge cases. Furthermore, the solution attempts to solve for a larger and more general class of the same problem. They’ve added a short docstring, and the code is generally well formatted and idiomatic Python. We would be inclined to consider the first solution a pass. There’s a bug in the implementation and the algorithm is not optimal, yet the code is maintainable and generally well structured (with tests, too!). This solution is good.

The second solution is correct and optimal, yet this candidate would not pass the interview. The formatting is sloppy, and there are spelling mistakes and unused variables. The code itself is terse and difficult to understand. This candidate would probably be rejected.
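For reference only (this is our own sketch, not part of the original interview guide), a corrected and more readable version of the arithmetic approach might look like this, assuming the list holds the integers 1 through n - 1 plus exactly one duplicate:

def find_duplicate(integers):
    """Return the single duplicated value in a list containing 1..n-1 plus one repeat.

    The list has n elements, so the expected sum of the distinct values 1..n-1 is
    n * (n - 1) / 2; whatever exceeds that is the duplicate.
    """
    if not isinstance(integers, list) or len(integers) < 2:
        raise ValueError("expected a list containing at least one duplicate")
    n = len(integers)
    return sum(integers) - n * (n - 1) // 2

def test_find_duplicate():
    assert find_duplicate([1, 2, 2]) == 2
    assert find_duplicate([3, 1, 3, 2]) == 3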

Finally, also keep in mind that you have only 90 minutes to complete the problem. Our problems don’t really have any tricks in them, and the naive solution is typically good enough. We won’t ask you to invert a binary tree, but we will ask you to solve a simplified version of a real life problem. We’re looking for production-like code, not hacky solutions.

So how would you best prepare for the pair programming interview with us? Don’t focus too much on grinding Leetcode. It’s better to focus on the fundamentals. Learn your editor, your debugger, and your language. Practice writing well formatted and well structured code with relevant method and variable names, good state management and error handling.

System Design

In our system design interview you will be asked to design the general architecture of a real-world service. For example: How would you design a Twitter feed?

The brief is typically short, and it’s up to you to ask the interviewer for clarifications around the requirements.

Don’t dive too deeply into any one specific aspect of the design (unless asked by the interviewer). It’s better to keep it general and give a specific example of a technology you know well that would be a good fit for the use case at hand. Example: “For this service an RDBMS would be a good choice, because we don’t know exactly what the queries will look like in advance. I would choose MariaDB.”

Be sure to address the entire problem, and if you’re unsure if you’ve covered everything ask the interviewer to clarify, or if there’s anything they’d like you to expand upon.

If you are unsure about the specifics of a particular component in your design, it’s best to try to let your interviewer know and to tell them how you would go about finding the answer. Don’t wing it — being wrong with confidence is a negative signal, whereas humility is a positive signal. A good answer might be: “I don’t know if the query pattern and other requirements fit well with an SQL database here, but I have the most experience with MariaDB so it would be my default choice. However, before making a decision I would have to research what its performance might look like in this specific case. I’d also research some NoSQL alternatives like MongoDB and perhaps also a wide-column store like Cassandra.”

You’ll be assessed on your ability to explore the requirements, and how well your design might perform in real life. Do you take scalability into account? How about error handling and recovery? Do you design for change? Have you thought about observability? You’ll also be assessed on how well you communicate your design and thoughts to the interviewer.

General Tips

During our interview process, we look for signals that help us understand whether there is a skill match but more importantly a cultural fit. Some of the signals we look for:

  1. Be honest — Honesty always pays. If you’ve seen the question before, best to let your interviewer know so that an alternate question can be discussed. Similarly, exaggerating current scope/responsibilities is considered a red flag.
  2. Speak your mind — Even if the question might seem difficult or you need time to think, vocalize your thoughts so that the interviewer can help you along. It’s not as important to get the right answer as it is to have a reasonable thought process.
  3. Understand before responding — It’s best to listen and understand the question before responding. If you’re unsure, ask to clarify or state your assumptions. We aren’t looking for a quick response but always appreciate a thoughtful response. If the question isn’t clear the first time, feel free to request the interviewer to repeat it.
  4. Good Setup — As a remote-first company, we conduct our interviews virtually on Google Meet. Take the meeting at a place where the internet connection and your audio/video are good. It’s better to reschedule in advance if your setup isn’t tip-top. Finally, test your microphone and camera an hour before joining the call. We keep our cameras on for the entire interview and expect you to do the same.
  5. Be Prepared — We advise that you go through the links your recruiter shares with you as part of the interview invite. They contain information about how everyone at Coinbase operates and what they value.
  6. Ask what’s on your mind — Our panelists always leave time for the candidates to ask questions. Take that opportunity to ask questions that would help you decide on your Coinbase journey rather than asking generic questions (most of which are answered on our blog). You will interview with engineers and managers so tailor your questions to the unique perspectives offered by each role.
  7. Crypto or Industry knowledge — Unless you are specifically interviewing for a role that requires deep crypto/blockchain knowledge (your recruiter would be able to share this with you), we aren’t looking for this knowledge as a mandatory skill. As long as you are willing to learn, we want to talk to you — even if you are entirely new to crypto. We all were new at one point too!

Thanks for taking the time to learn a little bit more about our interview process. If you are interested in building the future of finance, have a look at our open roles here. Good luck!


How Coinbase interviews for engineering roles was originally published in The Coinbase Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.


How we scaled data streaming at Coinbase using AWS MSK

By: Dan Moore, Eric Sun, LV Lu, Xinyu Liu

Tl;dr: Coinbase is leveraging AWS’ Managed Streaming for Kafka (MSK) for ultra low latency, seamless service-to-service communication, data ETLs, and database Change Data Capture (CDC). Engineers from our Data Platform team will further present this work at AWS’ November 2021 Re:Invent conference.

Abstract

At Coinbase, we ingest billions of events daily from user, application, and crypto sources across our products. Clickstream data is collected via web and mobile clients and ingested into Kafka using a home-grown Ruby and Golang SDK. In addition, Change Data Capture (CDC) streams from a variety of databases are powered via Kafka Connect. One major consumer of these Kafka messages is our data ETL pipeline, which transmits data to our data warehouse (Snowflake) for further analysis by our Data Science and Data Analyst teams. Moreover, internal services across the company (like our Prime Brokerage and real time Inventory Drift products) rely on our Kafka cluster for running mission-critical, low-latency (sub 10 msec) applications.

With AWS-managed Kafka (MSK), our team has mitigated the day-to-day Kafka operational overhead of broker maintenance and recovery, allowing us to concentrate our engineering time on core business demands. We have found scaling up/out Kafka clusters and upgrading brokers to the latest Kafka version simple and safe with MSK. This post outlines our core architecture and the complete tooling ecosystem we’ve developed around MSK.

Configuration and Benefits of MSK

Config:

  • TLS authenticated cluster
  • 30 broker nodes across multiple AZs to protect against full AZ outage
  • Multi-cluster support
  • ~17TB storage/broker
  • 99.9% monthly uptime SLA from AWS

Benefits:

Since MSK is AWS managed, one of the biggest benefits is that we’re able to avoid having internal engineers actively maintain ZooKeeper / broker nodes. This has saved us 100+ hours of engineering work as AWS handles all broker security patch updates, node recovery, and Kafka version upgrades in a seamless manner. All broker updates are done in a rolling fashion (one broker node is updated at a time), so no user read/write operations are impacted.

Moreover, MSK offers flexible networking configurations. Our cluster has tight security group ingress rules around which services can communicate directly with ZooKeeper or MSK broker node ports. Integration with Terraform allows for seamless broker additions, disk space increases, and configuration updates to our cluster without any downtime.

Finally, AWS has offered excellent MSK Enterprise support, meeting with us on several occasions to answer thorny networking and cluster auth questions.

Performance:

We reduced our end-to-end (e2e) latency (time taken to produce, store, and consume an event) by ~95% when switching from Kinesis (~200 msec e2e latency) to Kafka (<10 msec e2e latency). Our Kafka stack’s p50 e2e latency for payloads up to 100KB averages <10 msec (in line with LinkedIn, the company originally behind Kafka, as a benchmark). This opens doors for ultra-low-latency applications like our Prime Brokerage service. A full latency breakdown from stress tests on our prod cluster, by payload size, is presented below:

Proprietary Kafka Security Service (KSS)

What is it?

Our Kafka Security Service (KSS) houses all topic Access Control Lists (ACLs). On deploy, it automatically syncs all topic read/write ACL changes with MSK’s ZooKeeper nodes; effectively, this is how we’re able to control read/write access to individual Kafka topics at the service level.

KSS also signs Certificate Signing Requests (CSRs) using the AWS ACM API. To do this, we leverage our internal Service-to-Service authentication (S2S) framework, which gives us a trustworthy service_id from the client; we then use that service_id as the Distinguished Name in the signed certificate we return to the user.

With a signed certificate whose Distinguished Name matches one’s service_id, MSK can easily determine via TLS auth whether a given service should be allowed to read from or write to a particular topic. If the service is not allowed (according to our acl.yml file and the ACLs set in ZooKeeper) to perform a given action, an error will occur on the client side and no Kafka read/write operations will occur.
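As a rough illustration of the client side of this flow (our own sketch, not Coinbase’s internal code), the snippet below generates a keypair and a CSR whose Distinguished Name carries the service_id, using the open-source Python cryptography library; the service_id value is a made-up placeholder:

# Minimal sketch: build a CSR whose Common Name carries the caller's service_id,
# which a signing service like KSS could then embed in the issued certificate.
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

service_id = "example-consumer-service"  # hypothetical; in practice supplied by S2S auth

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
csr = (
    x509.CertificateSigningRequestBuilder()
    .subject_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, service_id)]))
    .sign(key, hashes.SHA256())
)

# PEM-encoded CSR to send to the signing service; the private key never leaves the host.
csr_pem = csr.public_bytes(serialization.Encoding.PEM)
key_pem = key.private_bytes(
    serialization.Encoding.PEM,
    serialization.PrivateFormat.PKCS8,
    serialization.NoEncryption(),
)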

Also Required

Parallel to KSS, we built a custom Kafka sidecar Docker container that: 1) plugs simply into one’s existing docker-compose file; 2) auto-generates CSRs on bootup and calls KSS to get signed certs; and 3) stores the credentials in a Docker shared volume on the user’s service, which can be used when instantiating a Kafka producer/consumer client so TLS auth can occur.

Rich Data Stream Tooling

We’ve extended our core Kafka cluster with the following powerful tools:

Kafka Connect

This is a distributed cluster of EC2 nodes (AWS autoscaling group) that performs Change Data Capture (CDC) on a variety of database systems. Currently, we’re leveraging the MongoDB, Snowflake, S3, and Postgres source/sink connectors. Many other connectors are available open source through Confluent here.
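For orientation, registering a source connector with a Kafka Connect cluster is typically a single call to its REST API. The sketch below is illustrative only; the host, database, collection, and topic prefix are hypothetical, and the exact option set depends on the specific connector:

# Illustrative sketch: register a CDC source connector via the Kafka Connect REST API.
# Endpoint, database names, and topic prefix are hypothetical placeholders.
import json
import urllib.request

connector = {
    "name": "example-mongo-cdc",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "tasks.max": "1",
        "connection.uri": "mongodb://example-mongo:27017",
        "database": "payments",
        "collection": "transactions",
        "topic.prefix": "cdc",
    },
}

req = urllib.request.Request(
    "http://kafka-connect.internal:8083/connectors",  # hypothetical Connect endpoint
    data=json.dumps(connector).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())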

Kafdrop

We’re leveraging the open-source Kafdrop product for first-class topic/partition offset monitoring and inspecting user consumer lags: source code here

Cruise Control

This is another open-source project, which provides automatic partition rebalancing to keep our cluster load / disk space even across all broker nodes: source code here

Confluent Schema Registry

We use Confluent’s open-source Schema Registry to store versioned proto definitions (widely used alongside gRPC at Coinbase): source code here

Internal Kafka SDK

Critical to our streaming stack is a custom Golang Kafka SDK developed internally, built on the segmentio/kafka-go library. The internal SDK is integrated with our Schema Registry so that proto definitions are automatically registered/updated on producer writes. Moreover, the SDK gives users the following benefits out of the box (a rough sketch of the header and wire-format conventions follows the list below):

  • Consumer can automatically deserialize based on magic byte and matching SR record
  • Message provenance headers (such as service_id, event_time, event_type) which help conduct end-to-end audits of event stream completeness and latency metrics
  • These headers also accelerate message filtering and routing by avoiding the penalty of deserializing the entire payload
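The internal Golang SDK isn’t public, but the conventions above are easy to picture. The following illustrative Python sketch (our own, using the kafka-python client rather than the internal SDK) shows provenance headers plus the Confluent Schema Registry wire format of a magic byte followed by a 4-byte schema ID; the topic name and schema ID are assumptions:

# Illustrative sketch only (Python, not the internal Golang SDK): produce a message
# with provenance headers and the Confluent wire-format prefix (magic byte 0x00
# followed by a 4-byte big-endian schema ID) so consumers can pick the right schema.
import struct
import time

from kafka import KafkaProducer  # pip install kafka-python

SCHEMA_ID = 42  # hypothetical ID returned by the Schema Registry on registration

def frame(payload: bytes, schema_id: int) -> bytes:
    return b"\x00" + struct.pack(">I", schema_id) + payload

producer = KafkaProducer(bootstrap_servers=["localhost:9092"])
producer.send(
    "clickstream.events",  # hypothetical topic
    value=frame(b"<serialized proto bytes>", SCHEMA_ID),
    headers=[
        ("service_id", b"web-clickstream"),
        ("event_time", str(int(time.time() * 1000)).encode()),
        ("event_type", b"page_view"),
    ],
)
producer.flush()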

Streaming SDK

Beyond Kafka, we may still need to make use of other streaming solutions, including Kinesis, SNS, and SQS. We introduced a unified Streaming-SDK to address the following requirements:

  • Delivering a single event to multiple destinations, often described as ‘fanout’ or ‘mirroring’. For instance, sending the same message simultaneously to a Kafka topic and an SQS queue (see the sketch after this list)
  • Receiving messages from one Kafka topic, emitting new messages to another topic or even a Kinesis stream as the result of data processing
  • Supporting dynamic message routing, for example, messages can failover across multiple Kafka clusters or AWS regions
  • Offering optimized configurations for each streaming platform to minimize human mistakes, maximize throughput and performance, and alert users of misconfigurations
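As a concrete picture of the fanout requirement, the sketch below (again our own illustration in Python, not the Streaming-SDK itself, which is written in Golang) publishes one event to both a Kafka topic and an SQS queue; the topic name, queue URL, and region are placeholders:

# Illustrative fanout sketch: deliver the same event to Kafka and SQS.
# Topic, queue URL, and region are hypothetical placeholders.
import json

import boto3
from kafka import KafkaProducer

event = {"event_type": "order_created", "order_id": "123"}
payload = json.dumps(event).encode()

producer = KafkaProducer(bootstrap_servers=["localhost:9092"])
producer.send("orders.events", value=payload)  # hypothetical topic
producer.flush()

sqs = boto3.client("sqs", region_name="us-east-1")
sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/orders-events",  # placeholder
    MessageBody=json.dumps(event),
)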

Upcoming

On the horizon is integration with our Delta Lake which will fuel more performant, timely data ETLs for our data analyst and data science teams. Beyond that, we have the capacity to 3x the number of broker nodes in our prod cluster (30 -> 90 nodes) as internal demand increases — that is a soft limit which can be increased via an AWS support ticket.

Takeaways

Overall, we’ve been quite pleased with AWS MSK. The automatic broker recovery during security patches, maintenance, and Kafka version upgrades, along with the advanced broker/topic-level monitoring metrics around disk space usage and broker CPU, has saved us hundreds of hours provisioning and maintaining broker and ZooKeeper nodes on our own. Integration with Terraform has made initial cluster configuration, deployment, and configuration updates relatively painless (use 3 AZs for your cluster to make it more resilient and prevent impact from a full-AZ outage).

Performance has exceeded expectations, with sub 10msec latencies opening doors for ultra high-speed applications. Uptime of the cluster has been sound, surpassing the 99.9% SLA given by AWS. Moreover, when any security patches take place, it’s always done in a rolling broker fashion, so no read/write operations are impacted (set default topic replication factor to 3, so that min in-sync replicas is 2 even with node failure).

We’ve found building on top of MSK highly extensible, having integrated Kafka Connect, Confluent Schema Registry, Kafdrop, Cruise Control, and more without issue. Ultimately, MSK has been beneficial both for our engineers maintaining the system (less overhead maintaining nodes) and for unlocking our internal users and services with the power of ultra-low-latency data streaming.

If you’re excited about designing and building highly-scalable data platform systems or working with cutting-edge blockchain data sets (data science, data analytics, ML), come join us on our mission building the world’s open financial system: careers page.


How we scaled data streaming at Coinbase using AWS MSK was originally published in The Coinbase Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.


Binance Smart Chain Creates a $10 Million Bug Bounty Fund to Tighten Protocol Security

The Binance Smart Chain has launched a $10 million bug bounty fund for projects building on top of the protocol, called “Priority One.” The initiative aims to keep the blockchain network secure by encouraging bug bounty hunters and ethical hackers.

Binance Smart Chain Reveals $10 Million Bug Bounty Fund ‘Priority One’

Security experts and bug […]


Production Threshold Signing Service

By Anika Raghuvanshi, Software Engineer on the Crypto Engineering Team

When generating keys to secure customer funds, we take many precautions to ensure keys cannot be stolen. Cryptocurrency wallets are associated with two keys: a secret or private key, known only to the wallet holder, and a public key, known to the world.¹ To send funds from a wallet, the wallet owner produces a valid digital signature, which requires signing a message (a transaction in this case) with their private key. If a malicious party gains access to the private key, they could steal the wallet’s funds. For most customers, Coinbase has the responsibility of protecting private keys to make sure funds remain secure and out of reach of attackers.

Reusable Keys

During key generation, we segment private keys into shares using Shamir Secret Sharing (SSS) and delete the full key for extra security. Each share is held by a different party, and no individual party has full access to the private key. For a long time, there was one way to create a signature: reassemble private key shares to sign a transaction. Therefore, for our wallets to maintain the highest level of security, we only used an address once. If 1 BTC needed to be withdrawn from a key that stores 100 BTC, the remaining 99 BTC would be sent to a new private key during the withdrawal to ensure that we were not storing funds at a potentially vulnerable address.
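For intuition, here is a minimal, illustrative Python sketch of Shamir Secret Sharing over a prime field; the modulus, share count, and threshold are arbitrary choices for the example, and production key management relies on carefully audited implementations rather than code like this:

# Minimal sketch of Shamir Secret Sharing (SSS) over a prime field, for illustration only.
import secrets

PRIME = 2**255 - 19  # example prime modulus (the particular prime is not important here)

def split(secret: int, n: int, t: int):
    """Split `secret` into n shares, any t of which reconstruct it."""
    # Random polynomial f of degree t-1 with f(0) = secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret from t shares."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * xj) % PRIME
                den = (den * (xj - xi)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

shares = split(123456789, n=5, t=3)
assert reconstruct(shares[:3]) == 123456789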

However, one-time-use addresses have limits. In addition to the overhead required with continuing to generate keys, an even stronger need for securely reusing keys came along with cryptocurrency staking. Staking generally requires multiple uses of a single long-term address. We needed a way to generate valid digital signatures without reconstructing the private key.

Multiparty computation (MPC) saved the day. MPC protocols allow multiple parties to compute a function together, revealing no other information besides the output. Threshold signing, a specific use of MPC, permits individual parties to collaborate and produce a digital signature without reconstituting the original, composite private key. In practice, this means that rather than parties uploading their private key shares, they individually sign a transaction with their key share and upload a partial signature. These partial signatures² are combined to create the valid signature, which is published to the blockchain.³ Key shares are never uploaded by parties nor combined, therefore maintaining the highest security while allowing keys to be reused.

Threshold Signing Service

We applied MPC to create Threshold Signing Service (TSS). Different cryptocurrency assets use different digital signature algorithms. We will focus the rest of this article on the TSS protocol for Ed25519 signatures. There are five phases for this protocol:

  • Party Key Generation. Creates long-term public and private keys in a trusted environment for parties who will participate in signing. Each party’s private keys are loaded onto Hardware Security Modules (HSMs), which prevent anyone from using the private keys without physical access to the HSM.
  • Key Generation. Creates a set of TSS keys and divides the keys using SSS. Uses the public keys produced in party key generation to encrypt each signing key share to the party who will receive it.
  • Nonce Generation. Round 1 of 2 of the signing protocol. Participants in this round generate nonce values and send them to all other parties.
  • Partial Sign. Round 2 of 2 of the signing protocol. Participants use the nonce shares received from other parties and their signing key share to generate partial signatures.
  • Generate Final Signature. Combine partial signatures into the final result.

The first two phases occur rarely (once in the lifetime of a signing key). The final three phases repeat every time a transaction, which we call a message, is signed. The next section takes a technical deep dive into the signing phases of the protocol.

Ed25519 Signatures

Ed25519 is the EdDSA signature scheme parameterized with SHA-512 and Curve25519; G denotes the curve’s base point and q the base point order. Given a message m to be signed and a private key k (with public key K = kG), a signature is produced as follows:
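One standard way to write these steps, with H denoting SHA-512 and ‖ denoting concatenation, is the following sketch:

r = H(k ‖ m) mod q (derive a per-message nonce)
R = rG (nonce public key)
c = H(R ‖ K ‖ m) mod q (challenge)
s = (r + c·k) mod q (signature scalar)

The signature is the pair (R, s); a verifier checks that sG = R + cK.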

Our threshold signing protocol is an adaptation of the threshold Schnorr signature scheme by Gennaro, Jarecki, Krawczyk, and Rabin.

In the protocol, participants generate both the nonce r and signature s in a distributed fashion without reconstituting the underlying private key. In Round 1, participants produce and distribute nonce shares rᵢ. In Round 2, participants compute the composite nonce r from the nonce shares rᵢ and produce partial signatures sᵢ, which the server combines to produce the composite signature s. The final signature is identical to the signature which would be produced by combining secret shares and signing the original message with the composite private key.

Nonce Generation

In Round 1, participants use the message m and the key share kᵢ to do the following

After t participants have completed the nonce generation, signing begins.

Nonce Aggregation

After t nonces have been posted, participants perform Round 2 by aggregating the nonce shares to derive the composite nonce. Each participant i performs:

Partial Signatures

Participants create partial signatures⁴, which can be combined to generate the signature s.

Each participant i performs:
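A sketch of this step, inferred from the linearity argument below: each participant combines its nonce share rᵢ and key share kᵢ with the common challenge c = H(R ‖ K ‖ m) to produce

sᵢ = (rᵢ + c·kᵢ) mod q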

Signature Aggregation

Finally, the server aggregates the components into a signature that can be verified with the public key K.

Below is an example that combines two partial signatures without Shamir sharing:
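Sketching that example with two additive shares, where k = k₁ + k₂ and r = r₁ + r₂:

s₁ + s₂ = (r₁ + c·k₁) + (r₂ + c·k₂) = (r₁ + r₂) + c·(k₁ + k₂) = r + c·k = s (mod q)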

As long as the challenge, c, is the same for both signatures, the nonce and private key shares behave linearly under addition. Due to this property, we can apply the standard Shamir’s reconstruction to the sᵢ values to construct s:
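Concretely, for a signing set S of at least t participants with share indices xᵢ, standard Lagrange interpolation at zero gives

s = Σ λᵢ·sᵢ mod q (sum over participants i in S), where λᵢ = ∏ xⱼ / (xⱼ − xᵢ) mod q (product over j in S with j ≠ i).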

This result, along with nonce public key R, is a valid signature. The server verifies the signature (R, s) using message m and public key K and checks the nonce value R has not been previously used.

From Concept to Production

Deploying a production-level system involves solving for certain real-world problems. For example, getting attention from human participants can take a long time (on the order of hours). Since the protocol supports human participants, we run the risk of delaying transaction approval for too long. For example, the cryptocurrency Algorand has a short validation time: just a few hours. If we cannot compute a signature over a transaction in this time frame, partial progress must be discarded. Two of our design decisions help reduce the burden for humans approving transactions: 1. Rounds are asynchronous, meaning participants do not have to participate at the same exact time, and 2. Each round requires a threshold of parties, but the two rounds do not need to be completed by the same set of participants.

Another challenge is the issue of storing secret information, since the devices used to store key shares and participate in the signing protocol could be lost or broken. This led to a model where parties are relatively stateless: the only state they have is a small amount of long term storage on HSMs, which are highly secure, portable, and durable. Participants in the signing protocol do not communicate with each other: a centralized server stores all artifacts, such as nonce shares and partial signatures. A natural concern to come up is what happens if a centralized server is compromised — we investigate this threat and other threat models in the following section.

Security

TSS has two primary security goals:

  1. Without the private key, an attacker should not be able to generate a valid signature over an unauthorized message. This is known as existential unforgeability under adaptive chosen message attack and is the expected security for a digital signature scheme.
  2. Private key privacy after the key generation ceremony. This should protect against an attacker who has access to some (less than threshold t) private key shares.

Below are some attacks we considered and defenses we created for selected threat models.

Server Trust. The centralized server transfers messages between participants.

  • Attack: Unauthorized access to data.
    Defense: All secret data uploaded to the server must be encrypted to the intended participant. The server has no access to private keys, so cannot access data in the messages it relays.
  • Attack: Manipulation of data.
    Defense: Participants validate every piece of data provided by the server, and halt the protocol if they detect data modification.
  • Attack: Data loss.
    Defense: Data from key generation ceremonies are backed up through our disaster-recovery processes (not discussed in detail here) and can be recovered in the case of server data loss.

Participant Trust. Participants hold onto key shares and participate in the signing protocol.

  • Attack: Individual participant performing existential forgery.
    Defense: Participants have the critical responsibility of validating data provided by the server. Any participant can abort the protocol and trigger incident response procedures if malicious activity is detected.
  • Attack: Participants colluding to perform existential forgery.
    Defense: Since any single participant can halt the protocol if they detect malicious activity, the protocol requiring t participants remains secure with up to t - 1 malicious participants. An additional protection against collusion is supporting a hybrid participant model: a combination of humans and servers. We use weighted secret sharing to ensure that every signing requires participation from both human and server participants. This further increases the barrier for compromising t participants.
  • Attack: Nonce Reuse. A well-known attack for Schnorr-style signatures is using the same nonce multiple times on different messages. This leads to a trivial recovery of private keys, compromising our second security goal and opening the door for an immediate loss of funds.
    Defense: Nonce shares encode information about the message, making it simple to detect nonce reuse upon decryption. Having a two-round protocol allows participants to authenticate nonce shares in Round 2 and abort if they detect nonce reuse.

Key Issuer Trust. The key issuer generates private keys and distributes key shares.

  • Attack: Exfiltrating secrets from a key generation ceremony.
    Defense: The process of generating keys is a fully documented and audited process, which requires that all plaintext of keys are destroyed before the culmination of the key generation ceremony.

Through this threat modeling, we see that a malicious entity needs to compromise a threshold of participants to have any chance of stealing funds. We carefully calibrate the human and server participant thresholds with this in mind to ensure that our funds maintain the highest level of security.

Summary

TSS for signing payloads over the Ed25519 curve is an elegant and simple service: it takes as input a message payload and produces a valid signature, after a series of signers interact with the two-round protocol. This service provides a solution for reusing cold keys without bringing them online. Through translating a full cryptographic protocol to a real life service, we created TSS, the first production-level threshold signing service in existence to secure billions of dollars in assets.

Footnotes

  1. More precisely, the public key is used to derive the wallet’s address, which is known to the public.
  2. that are invalid in isolation
  3. The blockchain does not have to be aware of nor support special logic for threshold signatures, since the signature created is the same signature created using the composite private key.
  4. Using the message as associated data for encryption. This ensures that this nonce can only be used for signing this message. This defends against nonce-replay attacks that lead to private key recovery.

Read more about Cryptography at Coinbase.

If you are interested in the complexity of applying threshold cryptography, Coinbase is hiring Cryptographers and Software Engineers.

This website contains links to third-party websites or other content for information purposes only (“Third-Party Sites”). The Third-Party Sites are not under the control of Coinbase, Inc., and its affiliates (“Coinbase”), and Coinbase is not responsible for the content of any Third-Party Site, including without limitation any link contained in a Third-Party Site, or any changes or updates to a Third-Party Site. Coinbase is not responsible for webcasting or any other form of transmission received from any Third-Party Site. Coinbase is providing these links to you only as a convenience, and the inclusion of any link does not imply endorsement, approval or recommendation by Coinbase of the site or any association with its operators.

All images herein are by Coinbase.


Production Threshold Signing Service was originally published in The Coinbase Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.
