Type encoding (SCALE)
Substrate uses a lightweight and efficient encoding and decoding program to optimize how data is sent and received over the network. The program used to serialize and deserialize data is called the SCALE codec, with SCALE being an acronym for simple concatenated aggregate little-endian.
The SCALE codec is a critical component for communication between the runtime and the outer node.
It is designed for high-performance, copy-free encoding and decoding of data in resource-constrained execution environments like the Substrate WebAssembly runtime.
The SCALE codec is not self-describing in any way.
It assumes the decoding context has all type knowledge about the encoded data.
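To illustrate what "not self-describing" means in practice, here is a minimal hand-written sketch using only the Rust standard library (it does not use the parity-scale-codec API). The helper names `decode_u16` and `decode_u8_bool` are hypothetical. The same two bytes decode to different values depending on which type the decoder assumes:

```rust
/// Interpret two bytes as a fixed-width little-endian u16.
fn decode_u16(bytes: [u8; 2]) -> u16 {
    u16::from_le_bytes(bytes)
}

/// Interpret the same two bytes as a (u8, bool) tuple.
/// Returns None if the second byte is not a valid boolean (0x00 or 0x01).
fn decode_u8_bool(bytes: [u8; 2]) -> Option<(u8, bool)> {
    let flag = match bytes[1] {
        0x00 => false,
        0x01 => true,
        _ => return None,
    };
    Some((bytes[0], flag))
}
```

Given the bytes `0x2a00`, the first function yields the integer 42 while the second yields the tuple `(42, false)`. Nothing in the bytes themselves says which reading is intended, so the decoder has to know the type in advance.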
Front-end libraries maintained by Parity use the parity-scale-codec crate, which is a Rust implementation of the SCALE codec, to encode and decode interactions between RPCs and the runtime.
The SCALE codec is advantageous for Substrate and blockchain systems because:
- It is lightweight relative to generic serialization frameworks like serde, which add significant boilerplate that can bloat the size of the binary.
- It does not use the Rust standard library (libstd), making it compatible with no_std environments that compile to Wasm, such as the Substrate runtime.
- It is built to have great support in Rust for deriving codec logic for new types using #[derive(Encode, Decode)].
It's important to define the encoding scheme used in Substrate rather than reuse an existing Rust codec library because this codec needs to be re-implemented on other platforms and languages that want to support interoperability among Substrate blockchains.
The following table shows how the Rust implementation of the Parity SCALE codec encodes different types.
SCALE codec examples of different types
| Type | Description | Example SCALE decoded value | SCALE encoded value |
|---|---|---|---|
| Fixed-width integers | Basic integers are encoded using a fixed-width little-endian (LE) format. | signed 8-bit integer 69 | 0x45 |
| | | unsigned 16-bit integer 42 | 0x2a00 |
| | | unsigned 32-bit integer 16777215 | 0xffffff00 |
| Compact/general integers [1] | A "compact" or general integer encoding is sufficient for encoding large integers (up to 2**536) and is more efficient at encoding most values than the fixed-width version. (Though for single-byte values, the fixed-width integer is never worse.) | unsigned integer 0 | 0x00 |
| | | unsigned integer 1 | 0x04 |
| | | unsigned integer 42 | 0xa8 |
| | | unsigned integer 69 | 0x1501 |
| | | unsigned integer 65535 | 0xfeff0300 |
| | | BigInt(100000000000000) | 0x0b00407a10f35a |
| Boolean | Boolean values are encoded using the least significant bit of a single byte. | false | 0x00 |
| | | true | 0x01 |
| Results [2] | Results are commonly used enumerations which indicate whether certain operations were successful or unsuccessful. | Ok(42) | 0x002a |
| | | Err(false) | 0x0100 |
| Options | One or zero values of a particular type. | Some | 0x01 followed by the encoded value |
| | | None | 0x00 (no further bytes) |
| Vectors (lists, series, sets) | A collection of same-typed values is encoded, prefixed with a compact encoding of the number of items, followed by each item's encoding concatenated in turn. | Vector of unsigned 16-bit integers: [4, 8, 15, 16, 23, 42] | 0x18040008000f00100017002a00 |
| Strings | Strings are vectors of bytes (Vec<u8>) containing a valid UTF-8 sequence. | | |
| Tuples | A fixed-size series of values, each with a possibly different but predetermined and fixed type. This is simply the concatenation of each encoded value. | Tuple of compact unsigned integer and boolean: (3, false) | 0x0c00 |
| Structs | For structures, the values are named, but the names are irrelevant for the encoding: only the order of the fields matters. All containers store elements consecutively. The order of the elements within a container is not fixed; it depends on the container and cannot be relied on at decoding. This implicitly means that decoding a byte array into a structure that enforces an order and then re-encoding it could produce a different byte array than the original. | A SortedVecAsc<u8> structure that always has byte-elements in ascending order: SortedVecAsc::from([3, 5, 2, 8]) | [3, 2, 5, 8] |
| Enumerations (tagged-unions) | A fixed number of variants, each mutually exclusive and potentially implying a further value or series of values. The first byte encodes the index of the variant. Any further bytes encode the data that the variant implies. Thus, no more than 256 variants are supported. | Int(42) and Bool(true) where enum IntOrBool { Int(u8), Bool(bool) } | 0x002a and 0x0101 |
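Several rows of the table can be reproduced with a few lines of standard-library Rust. The following is a hand-rolled sketch for illustration only, not the parity-scale-codec implementation. All function names are hypothetical, and the length prefix helper deliberately handles only collections with fewer than 64 items (single-byte compact mode):

```rust
/// Hypothetical helper: compact length prefix, single-byte mode only.
/// Panics for lengths of 64 or more, which need the longer compact modes.
fn encode_short_len(len: usize) -> u8 {
    assert!(len < 64, "this sketch only handles single-byte compact lengths");
    (len as u8) << 2
}

/// Vector of u16: compact length prefix, then each item in little-endian order.
fn encode_vec_u16(items: &[u16]) -> Vec<u8> {
    let mut out = vec![encode_short_len(items.len())];
    for item in items {
        out.extend_from_slice(&item.to_le_bytes());
    }
    out
}

/// Option<u8>: 0x01 followed by the value for Some, a single 0x00 for None.
fn encode_option_u8(value: Option<u8>) -> Vec<u8> {
    match value {
        Some(v) => vec![0x01, v],
        None => vec![0x00],
    }
}

/// The example enumeration from the table.
enum IntOrBool {
    Int(u8),
    Bool(bool),
}

/// Enumeration: one byte for the variant index, then the variant's data.
fn encode_int_or_bool(value: &IntOrBool) -> Vec<u8> {
    match value {
        IntOrBool::Int(n) => vec![0x00, *n],
        IntOrBool::Bool(b) => vec![0x01, *b as u8],
    }
}
```

Fixed-width integers need no helper at all: `42u16.to_le_bytes()` already produces the little-endian bytes `0x2a00`. Tuples and structs are then just the concatenation of the encodings of their fields in order.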
The SCALE codec has been implemented in other languages, including:
- AssemblyScript: LimeChain/as-scale-codec
- C: MatthewDarnell/cScale
- C++: soramitsu/scale-codec-cpp
- JavaScript: polkadot-js/api
- Dart: leonardocustodio/polkadart
- Haskell: airalab/hs-web3
- Golang: itering/scale.go
- Java: emeraldpay/polkaj
- Python: polkascan/py-scale-codec
- Ruby: wuminzhe/scale_rb
- TypeScript: parity-scale-codec-ts, scale-ts, soramitsu/scale-codec-js-library, subsquid/scale-codec
[1] Compact/general integers are encoded with the two least significant bits denoting the mode:
- 0b00: single-byte mode; the upper six bits are the LE encoding of the value (valid only for values 0 to 63).
- 0b01: two-byte mode; the upper six bits and the following byte are the LE encoding of the value (valid only for values 64 to 2**14-1).
- 0b10: four-byte mode; the upper six bits and the following three bytes are the LE encoding of the value (valid only for values 2**14 to 2**30-1).
- 0b11: big-integer mode; the upper six bits store the number of bytes that follow, minus four (so the number of following bytes is the value of the upper six bits plus four). The value is contained, LE encoded, in the bytes following. The final (most significant) byte must be non-zero. Valid only for values 2**30 to 2**536-1.
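The four compact modes can be sketched as a single encoder. This is an illustrative, standard-library-only version and not the parity-scale-codec implementation. The name `encode_compact` is hypothetical, and the sketch takes a `u128`, so it covers at most 16 value bytes rather than the full range that big-integer mode allows:

```rust
/// Compact-encode an unsigned integer, choosing the mode from its magnitude.
fn encode_compact(value: u128) -> Vec<u8> {
    if value < (1 << 6) {
        // 0b00: single-byte mode, value in the upper six bits.
        vec![(value as u8) << 2]
    } else if value < (1 << 14) {
        // 0b01: two-byte mode, shifted value plus mode bits, little-endian.
        (((value as u16) << 2) | 0b01).to_le_bytes().to_vec()
    } else if value < (1 << 30) {
        // 0b10: four-byte mode, shifted value plus mode bits, little-endian.
        (((value as u32) << 2) | 0b10).to_le_bytes().to_vec()
    } else {
        // 0b11: big-integer mode. Count the significant bytes so that the
        // final (most significant) byte emitted is non-zero.
        let len = 16 - (value.leading_zeros() / 8) as usize;
        // Upper six bits of the prefix hold (number of following bytes - 4).
        let mut out = vec![(((len - 4) as u8) << 2) | 0b11];
        out.extend_from_slice(&value.to_le_bytes()[..len]);
        out
    }
}
```

For example, 69 does not fit in six bits, so it falls into two-byte mode: `(69 << 2) | 0b01` is 277, or `0x0115`, which is written little-endian as `0x1501`. The value 100000000000000 needs six bytes, so its prefix byte is `((6 - 4) << 2) | 0b11`, which is `0x0b`.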
[2] Results are encoded as:
- 0x00 if the operation was successful, followed by the encoded value.
- 0x01 if the operation was unsuccessful, followed by the encoded error.
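As a small sketch of this rule, the following standard-library Rust encodes and decodes a `Result<u8, bool>`, the type implied by the `Ok(42)` and `Err(false)` examples. The function names are hypothetical and this is not the parity-scale-codec API:

```rust
/// Encode a Result<u8, bool>: a tag byte, then the encoded value or error.
fn encode_result(value: Result<u8, bool>) -> Vec<u8> {
    match value {
        Ok(v) => vec![0x00, v],
        Err(e) => vec![0x01, e as u8],
    }
}

/// Decode exactly two bytes back into a Result<u8, bool>.
/// Returns None for an unknown tag, an invalid boolean, or a wrong length.
fn decode_result(bytes: &[u8]) -> Option<Result<u8, bool>> {
    match bytes {
        [0x00, v] => Some(Ok(*v)),
        [0x01, 0x00] => Some(Err(false)),
        [0x01, 0x01] => Some(Err(true)),
        _ => None,
    }
}
```

The decoder has to be told that the payload types are `u8` and `bool`. The tag byte only distinguishes success from failure and carries no information about what follows.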