Getting started with desert-rust

desert-rust is a binary serialization library for Rust. It focuses on compact binary data while still allowing compatible changes to structs and enums over time.

The Rust crate is the counterpart of the original Scala desert library. The wire format is intentionally similar, but the API is shaped around Rust traits, derive macros, feature flags, and explicit error handling.

Installation

Add the public crate to Cargo.toml:

[dependencies]
desert_rust = "0.1.8"

Additional codecs are controlled with crate features:

[dependencies]
desert_rust = { version = "0.1.8", features = ["uuid", "chrono", "url"] }

Feature flags exposed by the public crate are:

  • bigdecimal
  • bit-vec
  • chrono
  • mac_address
  • nonempty-collections
  • serde-json
  • url
  • uuid

The current desert_core default features already enable bigdecimal, chrono, uuid, nonempty-collections, and serde-json. Features such as url, mac_address, and bit-vec must be enabled explicitly.

Serialize and deserialize a known type

The most direct API works with a Vec<u8> or bytes::Bytes:

use desert_rust::{deserialize, serialize_to_byte_vec, Result};

fn main() -> Result<()> {
    let bytes = serialize_to_byte_vec(&"Hello world".to_string())?;
    let value: String = deserialize(&bytes)?;

    assert_eq!(value, "Hello world");
    Ok(())
}

This works because String implements both BinarySerializer and BinaryDeserializer. Their combination is named BinaryCodec.

Derive codecs for your data

For structs and enums, derive BinaryCodec:

use desert_rust::{deserialize, serialize_to_byte_vec, BinaryCodec, Result};

#[derive(Debug, PartialEq, BinaryCodec)]
struct Point {
    x: i32,
    y: i32,
}

#[derive(Debug, PartialEq, BinaryCodec)]
enum Command {
    Move { to: Point },
    Label(String),
    Stop,
}

fn main() -> Result<()> {
    let command = Command::Move {
        to: Point { x: 10, y: -5 },
    };

    let bytes = serialize_to_byte_vec(&command)?;
    let decoded: Command = deserialize(&bytes)?;

    assert_eq!(decoded, command);
    Ok(())
}

The derive macro generates trait implementations and metadata needed for schema evolution. There is no runtime reflection or registration step for statically known types.

Top-level helper functions

The main helper functions are:

  • serialize(value, output) writes to any BinaryOutput.
  • serialize_with_options(value, output, options) does the same with explicit options.
  • serialize_to_byte_vec(value) returns Vec<u8>.
  • serialize_to_bytes(value) returns bytes::Bytes.
  • deserialize::<T>(input) reads T from a byte slice.
  • deserialize_with_options::<T>(input, options) reads with explicit options.

Scala compatibility option

Rust char normally serializes as a variable-length Unicode scalar value. The Scala library encoded characters as 16-bit Unicode units. Use Options::scala_compatible() when reading or writing data that must match the Scala character encoding:

use desert_rust::{
    deserialize_with_options, serialize_to_byte_vec_with_options, Options, Result,
};

fn main() -> Result<()> {
    let options = Options::scala_compatible();
    let bytes = serialize_to_byte_vec_with_options(&'x', options.clone())?;
    let decoded: char = deserialize_with_options(&bytes, options)?;

    assert_eq!(decoded, 'x');
    Ok(())
}

If a character cannot be represented as a single 16-bit unit in this mode, serialization fails with Error::UnsupportedCharacter.
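
The failure condition can be pictured with a standard-library sketch. This is not the crate's implementation: `encode_char_u16` is a hypothetical helper, and writing the 16-bit unit in big-endian order is an assumption based on how the other fixed-width integers in this book are encoded.

```rust
/// Hypothetical helper: encode a `char` as a single 16-bit UTF-16 unit.
/// Returns `None` when the character would need a surrogate pair, which is
/// the situation where the library reports `Error::UnsupportedCharacter`.
fn encode_char_u16(c: char) -> Option<[u8; 2]> {
    if c.len_utf16() == 1 {
        // A char that fits in one UTF-16 unit is at most U+FFFF,
        // so the narrowing cast cannot lose information.
        Some((c as u32 as u16).to_be_bytes()) // big-endian order is an assumption
    } else {
        None
    }
}
```

Characters in the Basic Multilingual Plane such as 'x' or 'λ' fit; anything above U+FFFF does not.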

Building this book locally

The byte-layout examples in this book are generated by the mdbook-desert preprocessor. Build it first, then put Cargo’s debug binary directory on PATH while running mdBook:

cargo build -p desert_book
PATH="$PWD/target/debug:$PATH" mdbook build book

Where to go next

Read Codecs and derivation for built-in types and custom codec implementations. Read Data model evolution before persisting data long term or sending it between independently deployed versions.

Codecs and derivation

A codec is the pair of traits that defines how a type is written and read:

use desert_rust::{BinaryDeserializer, BinarySerializer};

trait BinaryCodec: BinarySerializer + BinaryDeserializer {}

In the crate, BinaryCodec is provided through a blanket implementation: any type implementing both BinarySerializer and BinaryDeserializer automatically implements BinaryCodec.

Built-in codecs

The desert_rust crate re-exports the core implementations. The always available codecs include:

  • integers: u8, i8, u16, i16, u32, i32, u64, i64, u128, i128, usize, isize
  • non-zero integers from std::num
  • floats: f32, f64
  • bool, (), char, String, str
  • std::time::Duration
  • Option<T> and Result<T, E>
  • std::ops::Bound<T> and Range<T>
  • bytes::Bytes
  • arrays, Vec<T>, VecDeque<T>, LinkedList<T>
  • HashSet<T>, BTreeSet<T>, HashMap<K, V>, BTreeMap<K, V>
  • Box<T>, Rc<T>, Arc<T>, references, and PhantomData<T>
  • std::net::IpAddr
  • tuples from arity 1 to 8

Feature flags control codecs for third-party types:

| Feature | Types |
|---|---|
| bigdecimal | bigdecimal::BigDecimal, bigdecimal::num_bigint::BigInt |
| bit-vec | bit_vec::BitVec |
| chrono | chrono dates, times, offsets, chrono_tz::Tz |
| mac_address | mac_address::MacAddress |
| nonempty-collections | nonempty_collections::NEVec<T> |
| serde-json | serde_json::Value |
| url | url::Url |
| uuid | uuid::Uuid |

The facade currently pulls in the desert_core default feature set, so bigdecimal, chrono, uuid, nonempty-collections, and serde-json are enabled by default. Enable bit-vec, mac_address, or url explicitly when you need those codecs.

The following byte-layout examples, produced by the same mdbook-desert generator as the rest of this book, cover the optional third-party codecs:

uuid

let value = uuid::Uuid::from_bytes([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F, 0x10]
0102030405060708090A0B0C0D0E0F10
  • uuid: raw UUID bytes

chrono::NaiveDate

let value = chrono::NaiveDate::from_ymd_opt(2024, 6, 22).unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0xE8, 0x0F, 0x06, 0x16]
E80F0616
  • year (E80F): year as var_u32
  • month (06): month byte
  • day (16): day byte

chrono::NaiveTime

let value = chrono::NaiveTime::from_hms_nano_opt(9, 30, 5, 125).unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x09, 0x1E, 0x05, 0x7D]
091E057D
  • hour (09): hour byte
  • minute (1E): minute byte
  • second (05): second byte
  • nanos (7D): nanosecond fraction as var_u32

bigdecimal

let value: bigdecimal::BigDecimal = "123.45".parse().unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x0C, 0x31, 0x32, 0x33, 0x2E, 0x34, 0x35]
0C3132332E3435
  • length (0C): byte length encoded as var_i32
  • decimal (3132332E3435): UTF-8 bytes

bit-vec

let value = bit_vec::BitVec::from_bytes(&[0b1010_0000]);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x01, 0xA0]
01A0
  • length (01): byte count for packed bits
  • bits (A0): packed bit payload

mac_address

let value = mac_address::MacAddress::new([0, 17, 34, 51, 68, 85]);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x00, 0x11, 0x22, 0x33, 0x44, 0x55]
001122334455
  • mac: six raw address bytes

url

let value = url::Url::parse("https://desert-rust.vigoo.dev/").unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x3C, 0x68, 0x74, 0x74, 0x70, 0x73, 0x3A, 0x2F, 0x2F, 0x64, 0x65, 0x73, 0x65, 0x72, 0x74, 0x2D, 0x72, 0x75, 0x73, 0x74, 0x2E, 0x76, 0x69, 0x67, 0x6F, 0x6F, 0x2E, 0x64, 0x65, 0x76, 0x2F]
3C68747470733A2F2F6465736572742D727573742E7669676F6F2E6465762F
  • length (3C): byte length encoded as var_i32
  • URL (remaining 30 bytes): UTF-8 bytes

serde_json

let value = serde_json::json!({ "a": 1 });
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x07, 0x7B, 0x22, 0x61, 0x22, 0x3A, 0x31, 0x7D]
077B2261223A317D
  • length (07): JSON byte count as var_u32
  • JSON (7B2261223A317D): compact JSON bytes

nonempty-collections

let value = nonempty_collections::NEVec::try_from_vec(vec![1u8, 2, 3]).unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x03, 0x01, 0x02, 0x03]
03010203
  • length (03): non-empty byte count as var_u32
  • bytes (010203): raw byte payload

Primitive representation

Fixed-width numeric types are written in big-endian byte order:

use desert_rust::{serialize_to_byte_vec, Result};

fn main() -> Result<()> {
    assert_eq!(serialize_to_byte_vec(&100u16)?, vec![0, 100]);
    assert_eq!(serialize_to_byte_vec(&100u32)?, vec![0, 0, 0, 100]);
    Ok(())
}
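
The same byte order can be reproduced with the standard library alone. The sketch below is only an illustration of big-endian layout; `be_u16` and `be_i32` are hypothetical helper names and do not go through the crate's serializers.

```rust
/// Big-endian bytes of a u16: most significant byte first.
fn be_u16(value: u16) -> Vec<u8> {
    value.to_be_bytes().to_vec()
}

/// Big-endian bytes of an i32, using two's complement for negative values.
fn be_i32(value: i32) -> Vec<u8> {
    value.to_be_bytes().to_vec()
}

/// Reverse direction: rebuild an i32 from four big-endian bytes.
fn i32_from_be(bytes: [u8; 4]) -> i32 {
    i32::from_be_bytes(bytes)
}
```

Under this layout `be_u16(1000)` is `[0x03, 0xE8]` and `be_i32(42)` is `[0x00, 0x00, 0x00, 0x2A]`, matching the i32 and u16 layouts in this section.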

bool is encoded as a single byte: 0 for false, 1 for true. String and str are encoded as a variable-length signed byte count followed by UTF-8 bytes.

i32

let bytes = desert_rust::serialize_to_byte_vec(&42i32)?;
[0x00, 0x00, 0x00, 0x2A]
0000002A
  • i32: fixed-width big-endian signed integer

u16

let bytes = desert_rust::serialize_to_byte_vec(&1000u16)?;
[0x03, 0xE8]
03E8
  • u16: fixed-width big-endian unsigned integer

bool

let bytes = desert_rust::serialize_to_byte_vec(&true)?;
[0x01]
01
  • true: one byte, 1 for true and 0 for false

unit

let bytes = desert_rust::serialize_to_byte_vec(&())?;
[]

char

let bytes = desert_rust::serialize_to_byte_vec(&'λ')?;
[0xBB, 0x07]
BB07
  • code point: Unicode scalar value written as var_u32

String

let bytes = desert_rust::serialize_to_byte_vec(&"desert".to_string())?;
[0x0C, 0x64, 0x65, 0x73, 0x65, 0x72, 0x74]
0C646573657274
  • length (0C): byte length encoded as var_i32
  • UTF-8 (646573657274): UTF-8 bytes

Option<T> and Result<T, E> start with a single tag byte and then write only the payload selected by that tag:

Option::Some

let bytes = desert_rust::serialize_to_byte_vec(&Some(7i32))?;
[0x01, 0x00, 0x00, 0x00, 0x07]
0100000007
  • Some (01): presence marker
  • value (00000007): inner i32 payload

Option::None

let bytes = desert_rust::serialize_to_byte_vec(&Option::<i32>::None)?;
[0x00]
00
  • None: absence marker

Result::Ok

let value: Result<i32, String> = Ok(7);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x01, 0x00, 0x00, 0x00, 0x07]
0100000007
  • Ok (01): result marker
  • value (00000007): success payload

Result::Err

let value: Result<i32, String> = Err("no".to_string());
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x00, 0x04, 0x6E, 0x6F]
00046E6F
  • Err (00): result marker
  • length (04): byte length encoded as var_i32
  • error (6E6F): UTF-8 bytes
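
The tag-then-payload shape for Option<i32> can be sketched with the standard library. This mirrors the Option layouts shown above rather than calling the crate; `encode_option_i32` and `decode_option_i32` are hypothetical names.

```rust
/// Write the tag byte, then the payload only when a value is present.
fn encode_option_i32(value: Option<i32>) -> Vec<u8> {
    match value {
        Some(v) => {
            let mut out = vec![1u8]; // presence marker
            out.extend_from_slice(&v.to_be_bytes()); // inner i32, big-endian
            out
        }
        None => vec![0u8], // absence marker, no payload follows
    }
}

/// Read the tag byte and use it to decide whether a payload follows.
/// The outer `Option` reports malformed input for this sketch.
fn decode_option_i32(bytes: &[u8]) -> Option<Option<i32>> {
    match bytes {
        [0] => Some(None),
        [1, a, b, c, d] => Some(Some(i32::from_be_bytes([*a, *b, *c, *d]))),
        _ => None,
    }
}
```

Because the reader learns from the tag alone which payload to expect, the absent case costs exactly one byte.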

Vec<u8>, [u8], [u8; N], bytes::Bytes, and NEVec<u8> use an optimized byte-block encoding: a variable-length unsigned length followed by raw bytes. This is intentionally compatible with the Scala library’s byte chunk format.

Vec<u8>

let bytes = desert_rust::serialize_to_byte_vec(&vec![1u8, 2, 3, 4])?;
[0x04, 0x01, 0x02, 0x03, 0x04]
0401020304
  • length (04): var_u32 byte count
  • bytes (01020304): raw byte payload

bytes::Bytes

let value = bytes::Bytes::from_static(b"abc");
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x03, 0x61, 0x62, 0x63]
03616263
  • length (03): var_u32 byte count
  • bytes (616263): raw byte payload

Collections

All generic iterable collection codecs share the same representation:

  • If the iterator reports an exact size, desert writes that size as a variable-length signed integer and then all elements.
  • If the size is not known, desert writes -1, then each element prefixed by a 1 byte, then a final 0 byte.

Because the representation is shared, many collection changes are binary compatible. For example, a Vec<i32> can be read as a LinkedList<i32>, and a BTreeSet<i32> can be read as a HashSet<i32>, as long as the target collection’s type constraints are satisfied.
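
Both framing modes can be sketched for i32 items using only the standard library. The helper names are hypothetical and the code is a model of the description above, not the crate's implementation; it assumes the count is a zig-zag encoded base-128 varint, as the other var_i32 examples in this book suggest.

```rust
/// Zig-zag map a signed value, then emit it as a base-128 varint.
fn write_var_i32(out: &mut Vec<u8>, value: i32) {
    let mut zz = ((value << 1) ^ (value >> 31)) as u32;
    while zz >= 0x80 {
        out.push((zz as u8 & 0x7F) | 0x80); // low 7 bits plus continuation bit
        zz >>= 7;
    }
    out.push(zz as u8);
}

/// Exact-size framing: the count, then every element.
fn encode_known_size(items: &[i32]) -> Vec<u8> {
    let mut out = Vec::new();
    write_var_i32(&mut out, items.len() as i32);
    for item in items {
        out.extend_from_slice(&item.to_be_bytes());
    }
    out
}

/// Unknown-size framing: -1, then each element prefixed by 1, then a final 0.
fn encode_unknown_size(items: impl Iterator<Item = i32>) -> Vec<u8> {
    let mut out = Vec::new();
    write_var_i32(&mut out, -1);
    for item in items {
        out.push(1); // item-present marker
        out.extend_from_slice(&item.to_be_bytes());
    }
    out.push(0); // end marker
    out
}
```

Neither framing mentions the container type, which is why a reader is free to collect the items into a different collection.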

Vec<i32>

let bytes = desert_rust::serialize_to_byte_vec(&vec![1i32, 2, 3])?;
[0x06, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x03]
06000000010000000200000003
  • count (06): exact-size iterable count as var_i32
  • item 0 (00000001): first i32
  • item 1 (00000002): second i32
  • item 2 (00000003): third i32

BTreeMap<String, i32>

let value = std::collections::BTreeMap::from([
    ("a".to_string(), 1i32),
    ("b".to_string(), 2i32),
]);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x04, 0x00, 0x02, 0x61, 0x00, 0x00, 0x00, 0x01, 0x00, 0x02, 0x62, 0x00, 0x00, 0x00, 0x02]
040002610000000100026200000002
  • count (04): exact-size iterable count as var_i32
  • entry (00): tuple marker for key/value pair
  • key a (0261): string key, length plus UTF-8
  • value 1 (00000001): i32 map value
  • entry (00): tuple marker for key/value pair
  • key b (0262): string key, length plus UTF-8
  • value 2 (00000002): i32 map value

(i32, bool)

let bytes = desert_rust::serialize_to_byte_vec(&(42i32, true))?;
[0x00, 0x00, 0x00, 0x00, 0x2A, 0x01]
000000002A01
  • version (00): tuple payload marker compatible with version-0 structs
  • field 0 (0000002A): first tuple item
  • field 1 (01): second tuple item

Deriving structs

Use #[derive(BinaryCodec)] for ordinary named-field structs:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
struct User {
    id: u64,
    name: String,
    email: Option<String>,
}

The generated format starts with a version byte. Version 0 structs are compatible with tuples of the same field order and arity, which allows simple tuple-to-struct migrations.

derived struct

#[derive(Debug, Clone, PartialEq, desert_rust::BinaryCodec)]
struct User {
    id: u32,
    name: String,
    email: Option<String>,
}

let value = User {
    id: 7,
    name: "Ada".to_string(),
    email: None,
};
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x00, 0x00, 0x00, 0x00, 0x07, 0x06, 0x41, 0x64, 0x61, 0x00]
00000000070641646100
  • version (00): version-0 struct marker
  • id (00000007): u32 field
  • name length (06): string byte count as var_i32
  • name UTF-8 (416461): string bytes
  • email (00): Option::None marker

For generic types, the derive macro adds serializer and deserializer bounds for generic parameters:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
struct Wrapper<T> {
    value: T,
}

Deriving enums

Enums are encoded as a constructor id followed by constructor payload data:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum Event {
    Started,
    Message(String),
    Moved { x: i32, y: i32 },
}

Constructor ids are assigned from the enum variant order, skipping transient variants. Adding new variants at the end is compatible with old data, but old code cannot read values using the new variant.

derived enum

#[derive(Debug, Clone, PartialEq, desert_rust::BinaryCodec)]
enum Event {
    Started,
    Message(String),
    Moved { x: i32, y: i32 },
}

let bytes = desert_rust::serialize_to_byte_vec(&Event::Message("hi".to_string()))?;
[0x00, 0x01, 0x00, 0x04, 0x68, 0x69]
000100046869
  • version (00): outer enum version
  • constructor (01): variant id as var_u32
  • case version (00): variant payload version
  • length (04): byte length encoded as var_i32
  • payload (6869): UTF-8 bytes

You can ask the derive macro to assign constructor ids by sorted variant name:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
#[desert(sorted_constructors)]
enum StableByName {
    B,
    A,
}

Use this only when all versions agree on the same naming scheme. Reordering without sorted_constructors changes constructor ids and breaks compatibility.
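
The two id assignment schemes can be modelled with a small standard-library sketch. `Variant` and `assign_ids` are hypothetical names, not part of the derive macro. Ids starting at 0 is consistent with Message having constructor id 1 in the layout above; plain lexicographic ordering for sorted_constructors is an assumption.

```rust
/// Hypothetical model of an enum variant as seen by a derive macro.
struct Variant {
    name: &'static str,
    transient: bool,
}

/// Assign ids 0, 1, 2, ... to the non-transient variants, either in source
/// order or, when `sorted` is true, in lexicographic name order (assumed).
fn assign_ids(variants: &[Variant], sorted: bool) -> Vec<(&'static str, u32)> {
    let mut names: Vec<&'static str> = variants
        .iter()
        .filter(|v| !v.transient) // transient variants never receive an id
        .map(|v| v.name)
        .collect();
    if sorted {
        names.sort();
    }
    names.into_iter().zip(0u32..).collect()
}
```

In this model, swapping B and A in the source changes the ids in source-order mode but leaves them untouched in sorted mode, which is the property the attribute is meant to provide.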

Transparent wrappers

Single-field structs can be encoded exactly as their inner type:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
#[desert(transparent)]
struct UserId(u64);

This is the Rust equivalent of using the Scala wrapper derivation. It is useful when a primitive value is promoted to a domain-specific newtype without changing the wire format.

Transparent enum variants are also supported for unit variants and single-field variants:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum Value {
    #[desert(transparent)]
    Text(String),
    Structured { value: String },
}

The transparent variant still has an enum constructor id. The attribute affects how the variant payload is encoded.

Transient fields and variants

A transient field is not serialized. It must provide a default expression used when deserializing:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
struct Cached {
    value: String,
    #[transient(None::<usize>)]
    cached_len: Option<usize>,
}

Transient enum variants are not assigned constructor ids. Serializing such a variant returns Error::SerializingTransientConstructor.

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum State {
    Stored,
    #[transient]
    RuntimeOnly,
}

Transient variants can be inserted or removed without shifting the ids of stored variants.

Custom codecs

Implement BinarySerializer and BinaryDeserializer manually when the derived format is not appropriate:

use desert_rust::{
    BinaryDeserializer, BinaryOutput, BinarySerializer, DeserializationContext,
    Result, SerializationContext,
};

#[derive(Debug, PartialEq)]
struct Lowercase(String);

impl BinarySerializer for Lowercase {
    fn serialize<Output: BinaryOutput>(
        &self,
        context: &mut SerializationContext<Output>,
    ) -> Result<()> {
        self.0.to_lowercase().serialize(context)
    }
}

impl BinaryDeserializer for Lowercase {
    fn deserialize(context: &mut DeserializationContext<'_>) -> Result<Self> {
        Ok(Self(String::deserialize(context)?))
    }
}

The enum derive macro can also wrap a single-field variant through a custom type. The wrapper type must be constructible from a borrowed value in the shape expected by the macro, so this is mainly useful for specialized string wrappers.

String deduplication

Normal String serialization does not deduplicate values. This keeps schema evolution safe: when an older reader skips a newly added string field, it does not accidentally miss a string id assignment needed by a later field.

For streams where the writer and reader agree that deduplication is safe, wrap values in DeduplicatedString:

use bytes::BytesMut;
use desert_rust::{
    BinaryDeserializer, BinarySerializer, DeduplicatedString, DeserializationContext,
    Options, Result, SerializationContext,
};

fn main() -> Result<()> {
    let mut output = SerializationContext::new(BytesMut::new(), Options::default());

    DeduplicatedString("same".to_string()).serialize(&mut output)?;
    DeduplicatedString("same".to_string()).serialize(&mut output)?;

    let bytes = output.into_output();
    let mut input = DeserializationContext::new(&bytes, Options::default());

    let first = DeduplicatedString::deserialize(&mut input)?.0;
    let second = DeduplicatedString::deserialize(&mut input)?.0;

    assert_eq!(first, second);
    Ok(())
}

The first occurrence is encoded like a normal string. Later occurrences in the same serialization context are encoded as a negative id.

DeduplicatedString

let mut context = desert_rust::SerializationContext::new(
    Vec::new(),
    desert_rust::Options::default(),
);
desert_rust::DeduplicatedString("same".to_string()).serialize(&mut context)?;
desert_rust::DeduplicatedString("same".to_string()).serialize(&mut context)?;
let bytes = context.into_output();
[0x08, 0x73, 0x61, 0x6D, 0x65, 0x01]
0873616D6501
  • first length (08): first occurrence uses normal string length
  • first UTF-8 (73616D65): first string bytes
  • repeat id (01): negative string id encoded as var_i32
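
A minimal model of the bookkeeping behind this layout, using only the standard library, might look as follows. `StringTable` is a hypothetical type, not the crate's context state. It assumes ids start at 1 and are written negated, which is consistent with the 0x01 repeat byte above but is not confirmed by this book.

```rust
use std::collections::HashMap;

/// Zig-zag map a signed value, then emit it as a base-128 varint.
fn write_var_i32(out: &mut Vec<u8>, value: i32) {
    let mut zz = ((value << 1) ^ (value >> 31)) as u32;
    while zz >= 0x80 {
        out.push((zz as u8 & 0x7F) | 0x80);
        zz >>= 7;
    }
    out.push(zz as u8);
}

/// Hypothetical per-stream state: every string seen so far and its id.
struct StringTable {
    ids: HashMap<String, i32>,
}

impl StringTable {
    fn new() -> Self {
        Self { ids: HashMap::new() }
    }

    /// First occurrence: normal length plus UTF-8 bytes, and remember an id.
    /// Later occurrences: only the negated id.
    fn write(&mut self, out: &mut Vec<u8>, s: &str) {
        if let Some(&id) = self.ids.get(s) {
            write_var_i32(out, -id);
        } else {
            let id = self.ids.len() as i32 + 1; // assumption: ids start at 1
            self.ids.insert(s.to_string(), id);
            write_var_i32(out, s.len() as i32);
            out.extend_from_slice(s.as_bytes());
        }
    }
}
```

The reader must build the same table in the same order, which is why skipping a deduplicated field during schema evolution is unsafe.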

Binary input/output

The high-level helpers are enough for most use cases, but the core library is built around two low-level traits:

  • BinaryOutput writes bytes.
  • BinaryInput reads bytes and can skip regions.

Serialization and deserialization contexts implement these traits and add shared state for features such as string deduplication, reference tracking, options, and ADT evolution.

High-level output helpers

Use serialize_to_byte_vec when you want a Vec<u8>:

use desert_rust::{serialize_to_byte_vec, Result};

fn main() -> Result<()> {
    let bytes = serialize_to_byte_vec(&42i32)?;
    assert_eq!(bytes, vec![0, 0, 0, 42]);
    Ok(())
}

Use serialize_to_bytes when you want bytes::Bytes:

use desert_rust::{serialize_to_bytes, Result};

fn main() -> Result<()> {
    let bytes = serialize_to_bytes(&"hello".to_string())?;
    assert_eq!(&bytes[..], &[10, b'h', b'e', b'l', b'l', b'o']);
    Ok(())
}

The string length byte is 10 because signed variable integers use zig-zag encoding internally. The logical string length is 5.
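
Reading the prefix back shows the arithmetic in the other direction. The sketch below is a standard-library model of the string layout, with hypothetical names `read_var_i32` and `decode_string`; it assumes the usual base-128 varint with zig-zag mapping and is not the crate's decoder.

```rust
/// Read a base-128 varint (at most 5 bytes) and undo the zig-zag mapping.
/// Returns the decoded value and the number of bytes consumed.
fn read_var_i32(input: &[u8]) -> Option<(i32, usize)> {
    let mut raw: u32 = 0;
    for (i, &byte) in input.iter().enumerate().take(5) {
        raw |= u32::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            // Zig-zag decode: 10 -> 5, 1 -> -1, 127 -> -64.
            let value = ((raw >> 1) as i32) ^ -((raw & 1) as i32);
            return Some((value, i + 1));
        }
    }
    None // ran out of input or exceeded 5 bytes
}

/// Decode a length-prefixed UTF-8 string; `None` on malformed input.
fn decode_string(input: &[u8]) -> Option<String> {
    let (len, used) = read_var_i32(input)?;
    if len < 0 {
        return None; // a negative length is not a plain string in this sketch
    }
    let body = input.get(used..used + len as usize)?;
    String::from_utf8(body.to_vec()).ok()
}
```

For the bytes in the example above, the prefix 10 decodes to the logical length 5 and the remaining five bytes are the UTF-8 text.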

Choosing a Vec<u8> helper

serialize_to_byte_vec starts with a small default capacity and serializes the value once. This is usually the right choice for small payloads, such as single commands, individual events, request/response fragments, or anything likely to fit within the default 128-byte buffer.

For larger values, repeated Vec growth can become visible. There are three ways to avoid that:

  • Use serialize_to_byte_vec_with_capacity when you already know a good capacity estimate.
  • Use serialize_into_byte_vec when serializing repeatedly and you can reuse an existing buffer.
  • Use serialize_to_byte_vec_exact when the value is probably large but the caller does not know its serialized size.

serialize_to_byte_vec_exact first computes the exact serialized length with serialized_size, then serializes into a Vec<u8> allocated with that capacity:

use desert_rust::{serialize_to_byte_vec_exact, Result};

fn main() -> Result<()> {
    let batch = (0..10_000).collect::<Vec<u32>>();
    let bytes = serialize_to_byte_vec_exact(&batch)?;

    assert!(!bytes.is_empty());
    Ok(())
}

This is a two-pass tradeoff. It avoids growth for large, one-off values, but it does more work for small values. If the value fits in the default buffer, serialize_to_byte_vec is normally faster. If you already know the size or can reuse a buffer, the capacity or reusable-buffer helpers are usually faster than the exact helper.
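
The tradeoff can be made concrete with a standard-library model of the two passes for a slice of u32 values. The function names are hypothetical, and the size formula assumes the generic iterable framing described in this book (a zig-zag varint count followed by four bytes per item); it does not reproduce how serialized_size is implemented.

```rust
/// Number of bytes a base-128 varint needs for `value`.
fn var_u32_len(mut value: u32) -> usize {
    let mut len = 1;
    while value >= 0x80 {
        value >>= 7;
        len += 1;
    }
    len
}

/// Pass 1: walk the value only to measure it.
fn measured_size(items: &[u32]) -> usize {
    let zigzag_count = (items.len() as u32) << 1; // zig-zag of a non-negative count
    var_u32_len(zigzag_count) + 4 * items.len()
}

/// Pass 2: allocate once with the measured capacity, then write.
fn encode_exact(items: &[u32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(measured_size(items));
    let mut zz = (items.len() as u32) << 1;
    while zz >= 0x80 {
        out.push((zz as u8 & 0x7F) | 0x80);
        zz >>= 7;
    }
    out.push(zz as u8);
    for item in items {
        out.extend_from_slice(&item.to_be_bytes());
    }
    out
}
```

The measuring pass touches every element a second time, which is the extra work that makes this approach slower for values that would have fit in a small default buffer anyway.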

Writing to a custom output

The generic serialize function accepts any BinaryOutput. The built-in outputs are:

  • Vec<u8>
  • bytes::BytesMut
  • SizeCalculator

SizeCalculator computes the number of bytes that would be written without storing them:

use desert_rust::{serialize, Result, SizeCalculator};

fn main() -> Result<()> {
    let output = serialize(&1234i32, SizeCalculator::new())?;
    assert_eq!(output.size(), 4);
    Ok(())
}

To write to a new destination, implement BinaryOutput:

use desert_rust::{BinaryOutput, Result};

struct CountingOutput {
    count: usize,
}

impl BinaryOutput for CountingOutput {
    fn write_u8(&mut self, _value: u8) {
        self.count += 1;
    }

    fn write_bytes(&mut self, bytes: &[u8]) {
        self.count += bytes.len();
    }
}

The trait provides default implementations for fixed-width integers, floats, variable-length integers, and compressed byte blocks.

Reading input

The public top-level deserialize helper reads from &[u8]:

use desert_rust::{deserialize, Result};

fn main() -> Result<()> {
    let value: i32 = deserialize(&[0, 0, 0, 42])?;
    assert_eq!(value, 42);
    Ok(())
}

For low-level code, SliceInput borrows bytes and OwnedInput owns a Vec<u8>:

use desert_rust::{BinaryInput, Result, SliceInput};

fn main() -> Result<()> {
    let mut input = SliceInput::new(&[0, 0, 0, 42]);
    let value = input.read_i32()?;

    assert_eq!(value, 42);
    Ok(())
}

Most custom deserializers should use DeserializationContext rather than SliceInput directly, because the context also carries options and shared state:

use desert_rust::{BinaryDeserializer, DeserializationContext, Options, Result};

fn main() -> Result<()> {
    let mut context = DeserializationContext::new(&[0, 0, 0, 42], Options::default());
    let value = i32::deserialize(&mut context)?;

    assert_eq!(value, 42);
    Ok(())
}

Variable-length integers

BinaryOutput::write_var_u32 writes a u32 in 1 to 5 bytes. Smaller values use fewer bytes:

use desert_rust::{BinaryOutput, Result};

fn main() -> Result<()> {
    let mut bytes = Vec::new();
    bytes.write_var_u32(1);
    bytes.write_var_u32(4096);

    assert_eq!(bytes, vec![1, 128, 32]);
    Ok(())
}

write_var_i32 uses zig-zag encoding before writing the value as var_u32. This keeps small negative values compact too:

use desert_rust::BinaryOutput;

let mut bytes = Vec::new();
bytes.write_var_i32(-1);
assert_eq!(bytes, vec![1]);

The library uses variable-length integers for lengths, ids, and ADT evolution metadata. Fixed-width Rust integer codecs such as i32 still use fixed-width big-endian bytes.
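
For readers who want to check the byte values by hand, here is a standard-library sketch of both encodings. The functions are local re-implementations named after the library methods, not the library code itself, and they assume the common base-128 layout (seven payload bits per byte, high bit set when more bytes follow), which agrees with the examples in this section.

```rust
/// Emit `value` in 7-bit groups, least significant group first.
fn write_var_u32(out: &mut Vec<u8>, mut value: u32) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7F) | 0x80); // continuation bit set
        value >>= 7;
    }
    out.push(value as u8); // final group, continuation bit clear
}

/// Zig-zag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... before writing.
fn write_var_i32(out: &mut Vec<u8>, value: i32) {
    write_var_u32(out, ((value << 1) ^ (value >> 31)) as u32);
}

/// Convenience wrappers returning a fresh buffer.
fn var_u32(value: u32) -> Vec<u8> {
    let mut out = Vec::new();
    write_var_u32(&mut out, value);
    out
}

fn var_i32(value: i32) -> Vec<u8> {
    let mut out = Vec::new();
    write_var_i32(&mut out, value);
    out
}
```

A practical consequence of the zig-zag step is that one byte covers the signed range -64 to 63, so any length below 64 costs a single byte.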

var_u32

let mut bytes = Vec::new();
bytes.write_var_u32(16_384);
[0x80, 0x80, 0x01]
808001
  • var_u32: 7-bit groups with continuation bits

var_i32

let mut bytes = Vec::new();
bytes.write_var_i32(-64);
[0x7F]
7F
  • zig-zag: signed value zig-zag encoded, then written as var_u32

Compression helpers

BinaryOutput::write_compressed stores:

  1. the uncompressed length as var_u32
  2. the compressed length as var_u32
  3. the deflate-compressed bytes

BinaryInput::read_compressed reverses that representation:

use desert_rust::{BinaryInput, BinaryOutput, OwnedInput, Result};

fn main() -> Result<()> {
    let data = b"hello hello hello";
    let mut bytes = Vec::new();

    bytes.write_compressed(data, Default::default())?;

    let mut input = OwnedInput::new(bytes);
    let decoded = input.read_compressed()?;

    assert_eq!(decoded, data);
    Ok(())
}

Compression is a low-level helper. The built-in type codecs do not compress their payloads automatically.

compressed block

let data = b"hello hello hello";
let mut bytes = Vec::new();
bytes.write_compressed(data, flate2::Compression::fast())?;
[0x11, 0x09, 0xCB, 0x48, 0xCD, 0xC9, 0xC9, 0x57, 0x40, 0x22, 0x01]
1109CB48CDC9C957402201
  • plain len (11): uncompressed byte count as var_u32
  • deflate len (09): compressed byte count as var_u32
  • deflate (CB48CDC9C957402201): deflate payload

Byte blocks and iterables

Byte-oriented containers use a compact block format: a var_u32 byte count followed by the bytes. Generic iterable containers use an item format that can also represent streams with no exact size hint.

Vec<u8>

let bytes = desert_rust::serialize_to_byte_vec(&vec![1u8, 2, 3, 4])?;
[0x04, 0x01, 0x02, 0x03, 0x04]
0401020304
  • length (04): var_u32 byte count
  • bytes (01020304): raw byte payload

unknown-size iterable

let mut bytes = Vec::new();
let mut iter = [1i32, 2].into_iter().filter(|value| *value > 0);
desert_rust::serialize_iterator(
    &mut iter,
    &mut desert_rust::SerializationContext::new(&mut bytes, desert_rust::Options::default()),
)?;
[0x01, 0x01, 0x00, 0x00, 0x00, 0x01, 0x01, 0x00, 0x00, 0x00, 0x02, 0x00]
010100000001010000000200
  • unknown (01): -1 count marker as var_i32
  • item (01): item-present marker
  • value (00000001): first i32
  • item (01): item-present marker
  • value (00000002): second i32
  • end (00): end marker

Options

Options currently controls Scala-compatible character encoding:

use desert_rust::Options;

let default_options = Options::default();
let scala_options = Options::scala_compatible();

assert!(!default_options.chars_as_u16);
assert!(scala_options.chars_as_u16);

Pass options through serialize_with_options, serialize_to_byte_vec_with_options, serialize_to_bytes_with_options, or deserialize_with_options.

Context state

SerializationContext and DeserializationContext also hold per-stream state. Built-in uses include:

  • string ids for DeduplicatedString
  • reference ids for custom reference-aware codecs
  • nested buffer stacks used by ADT evolution encoding

If you implement a custom codec and only need ordinary values, call the existing BinarySerializer and BinaryDeserializer implementations. Touching context state directly is an advanced use case.

Data model evolution

One of desert’s main goals is allowing stored or transmitted data to survive controlled changes to Rust data types.

Evolution support is generated by #[derive(BinaryCodec)] for structs and enums. The derive macro writes a compact version header and enough metadata for older and newer versions to skip, default, or reinterpret fields where that is safe.

Compatibility rules in short

Compatible changes:

  • Convert a multi-field tuple into a struct with the same field order.
  • Wrap a value in a #[desert(transparent)] single-field struct.
  • Replace one collection type with another collection type using the same item representation.
  • Add a struct field with FieldAdded.
  • Make a field optional with FieldMadeOptional.
  • Remove a field with FieldRemoved, with the limits described below.
  • Make a field transient with FieldMadeTransient plus #[transient(default)].
  • Add an enum variant at the end of the enum.
  • Insert or remove #[transient] enum variants.

Breaking or risky changes:

  • Rename a field without preserving the serialized field name. The current Rust derive macro does not have a field rename attribute.
  • Reorder enum variants without #[desert(sorted_constructors)].
  • Remove an enum variant that has already been serialized.
  • Change a field’s type unless the old and new types intentionally share a binary representation.
  • Use DeduplicatedString in evolvable fields unless all readers and writers agree on the exact stream shape.

Tuples and structs

Version-0 structs are compatible with tuples of the same arity and field order:

use desert_rust::{deserialize, serialize_to_byte_vec, BinaryCodec, Result};

#[derive(Debug, PartialEq, BinaryCodec)]
struct Point {
    x: i32,
    y: i32,
}

fn main() -> Result<()> {
    let bytes = serialize_to_byte_vec(&(10, 20))?;
    let point: Point = deserialize(&bytes)?;

    assert_eq!(point, Point { x: 10, y: 20 });
    Ok(())
}

This is useful for moving from positional values to named records.

Transparent wrappers

Transparent single-field structs are encoded exactly like the inner field:

use desert_rust::{deserialize, serialize_to_byte_vec, BinaryCodec, Result};

#[derive(Debug, PartialEq, BinaryCodec)]
#[desert(transparent)]
struct UserId(u64);

fn main() -> Result<()> {
    let bytes = serialize_to_byte_vec(&42u64)?;
    let id: UserId = deserialize(&bytes)?;

    assert_eq!(id, UserId(42));
    Ok(())
}

This lets a model evolve from primitive values to domain newtypes without changing stored bytes.

Collections

Generic collection codecs use a shared iterable format, so many collection changes are compatible:

use desert_rust::{deserialize, serialize_to_byte_vec, Result};
use std::collections::LinkedList;

fn main() -> Result<()> {
    let bytes = serialize_to_byte_vec(&vec![1i32, 2, 3])?;
    let values: LinkedList<i32> = deserialize(&bytes)?;

    assert_eq!(values.into_iter().collect::<Vec<_>>(), vec![1, 2, 3]);
    Ok(())
}

Set and map compatibility depends on the target type’s Eq, Hash, or Ord requirements and on whether duplicate values make sense.

Adding a field

When a struct receives a new field, record it with FieldAdded and provide the default expression used when reading old data:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
struct ProductV1 {
    name: String,
    price: i32,
}

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
#[desert(evolution(FieldAdded("in_stock", true)))]
struct ProductV2 {
    name: String,
    in_stock: bool,
    price: i32,
}

With this change:

  • ProductV2 can read ProductV1 data and uses true for in_stock.
  • ProductV1 can read ProductV2 data by skipping the added field.

The added field does not have to be the last field in the Rust struct. The evolution metadata records which generation introduced it.

Making a field optional

A field can be changed from T to Option<T>:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
#[desert(evolution(
    FieldAdded("in_stock", true),
    FieldMadeOptional("price")
))]
struct ProductV3 {
    name: String,
    in_stock: bool,
    price: Option<i32>,
}

With this change:

  • New code reads old data as Some(old_value).
  • Old code can read new data if the option is Some(value).
  • Old code cannot read new data if the option is None, because it expected a non-optional field value.

This can be used as an intermediate migration before removing a field.

Removing a field

A removed field must be recorded:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
#[desert(evolution(
    FieldAdded("in_stock", true),
    FieldMadeOptional("price"),
    FieldRemoved("price")
))]
struct ProductV4 {
    name: String,
    in_stock: bool,
}

With this change:

  • New code can read old data by skipping price.
  • Old code can read new data only if the removed field had already become optional, in which case it is read as None.
  • Old code that expects a non-optional removed field cannot read new data.

Transient fields

Adding a new transient field does not change the binary representation:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
struct WithCache {
    value: String,
    #[transient(None::<usize>)]
    cached_len: Option<usize>,
}

If an existing serialized field becomes transient, record it with FieldMadeTransient and keep a transient default expression:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
#[desert(evolution(
    FieldAdded("in_stock", true),
    FieldMadeOptional("price"),
    FieldRemoved("price"),
    FieldMadeTransient("name")
))]
struct ProductV5 {
    #[transient("unknown".to_string())]
    name: String,
    in_stock: bool,
}

FieldMadeTransient behaves like FieldRemoved for the wire format. New code can read old data and uses the transient default. Older code cannot read the new data if it requires the missing field.

Evolving enums

Enums are encoded by constructor id. Constructor ids follow source order unless #[desert(sorted_constructors)] is used.

Adding a new variant at the end keeps previously written values readable:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum EventV1 {
    Started,
    Message(String),
}

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum EventV2 {
    Started,
    Message(String),
    Stopped,
}

With this change:

  • EventV2 can read old EventV1 values.
  • EventV1 can read EventV2 data only when the stored constructor id also existed in EventV1.
  • EventV1 cannot read EventV2::Stopped.

Enum variants with fields can have their own evolution steps:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum Event {
    #[desert(evolution(FieldAdded("source", "api".to_string())))]
    Message { text: String, source: String },
}

Transient variants are not assigned constructor ids:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum RuntimeState {
    Stored,
    #[transient]
    InMemoryOnly,
}

Serializing RuntimeState::InMemoryOnly fails. The benefit is that such variants can be inserted or removed without shifting persistent constructor ids.
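A minimal sketch of why transient variants protect persistent constructor ids. It models id assignment in source order with the standard library only; the `assign_ids` helper, the `Archived` variant, and the choice to start ids at 0 are assumptions made for illustration, not the derive macro's implementation:

```rust
/// Hypothetical model of constructor id assignment: every persistent variant
/// receives the next id in source order, and transient variants receive none.
/// Starting at 0 is an assumption made for this sketch.
fn assign_ids<'a>(variants: &[(&'a str, bool)]) -> Vec<(&'a str, Option<u32>)> {
    let mut next_id = 0u32;
    variants
        .iter()
        .map(|&(name, transient)| {
            if transient {
                // Transient variants never reach the wire, so they get no id.
                (name, None)
            } else {
                let id = next_id;
                next_id += 1;
                (name, Some(id))
            }
        })
        .collect()
}
```

In this model, inserting a transient variant between two persistent ones leaves the ids of both persistent variants unchanged.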

Evolution encoding

For derived structs and non-transparent enum variant payloads, desert writes:

  1. a version byte
  2. for non-zero versions, compact evolution metadata describing chunks and removed or optional fields
  3. the field data split into generation chunks

Version 0 data has no extra evolution metadata beyond the version byte. This keeps initial records compact and is why version-0 structs and tuples can share the same format.

PointV1

#[derive(Debug, Clone, PartialEq, desert_rust::BinaryCodec)]
#[desert(evolution())]
struct PointV1 {
    x: i32,
    y: i32,
}

let bytes = desert_rust::serialize_to_byte_vec(&PointV1 { x: 10, y: 20 })?;

Serialized bytes:

[0x00, 0x00, 0x00, 0x00, 0x0A, 0x00, 0x00, 0x00, 0x14]

  • 00: version (version 0)
  • 0000000A: x (first field)
  • 00000014: y (second field)
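The PointV1 bytes can be checked by hand. The following sketch decodes the nine bytes with the standard library only. `decode_point_v1` is a hypothetical helper written for this page, not part of desert_rust, and it assumes the layout of this particular dump: one version byte followed by two big-endian i32 values.

```rust
/// Hypothetical helper: splits a version-0 PointV1 record into
/// (version, x, y), following the byte dump shown for PointV1.
fn decode_point_v1(bytes: &[u8]) -> Option<(u8, i32, i32)> {
    // One version byte plus two four-byte fields.
    if bytes.len() != 9 {
        return None;
    }
    let version = bytes[0];
    // Each i32 occupies four bytes, most significant byte first.
    let x = i32::from_be_bytes([bytes[1], bytes[2], bytes[3], bytes[4]]);
    let y = i32::from_be_bytes([bytes[5], bytes[6], bytes[7], bytes[8]]);
    Some((version, x, y))
}
```

Applied to the bytes above, the helper yields version 0, x = 10, and y = 20.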

When a field is added, the new generation is written as a later chunk. Older readers can skip chunks they do not know about. Newer readers can detect missing chunks and use defaults from FieldAdded.

PointV2 adds label

#[derive(Debug, Clone, PartialEq, desert_rust::BinaryCodec)]
#[desert(evolution(FieldAdded("label", "origin".to_string())))]
struct PointV2 {
    x: i32,
    label: String,
    y: i32,
}

let value = PointV2 { x: 10, label: "origin".to_string(), y: 20 };
let bytes = desert_rust::serialize_to_byte_vec(&value)?;

Serialized bytes:

[0x01, 0x10, 0x0E, 0x00, 0x00, 0x00, 0x0A, 0x00, 0x00, 0x00, 0x14, 0x0C, 0x6F, 0x72, 0x69, 0x67, 0x69, 0x6E]

  • 01: version (version 1)
  • 10: v0 size (byte length of original-field chunk)
  • 0E: v1 size (byte length of added-field chunk)
  • 0000000A: x (version-0 field)
  • 00000014: y (version-0 field)
  • 0C: label length (added string length)
  • 6F726967696E: label UTF-8 (added field data)
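The size bytes in the PointV2 dump are consistent with a zigzag-style integer encoding: 0x10 decodes to 8 (two i32 fields), 0x0E to 7 (one length byte plus six UTF-8 bytes), and 0x0C to 6 (the length of "origin"). The sketch below assumes that encoding and only handles values that fit in a single byte. All function names are hypothetical and none of this is the library's reader. It shows how a reader that only knows x and y can locate the original chunk and skip the added one:

```rust
/// Zigzag decoding maps 0, 1, 2, 3, 4, ... to 0, -1, 1, -2, 2, ...
fn zigzag_decode(raw: u32) -> i32 {
    ((raw >> 1) as i32) ^ -((raw & 1) as i32)
}

/// Decodes a one-byte chunk size. Bytes with the high bit set are rejected
/// because this sketch does not model multi-byte integers.
fn chunk_len(header: u8) -> Option<usize> {
    if header & 0x80 != 0 {
        return None;
    }
    let n = zigzag_decode(u32::from(header));
    if n < 0 { None } else { Some(n as usize) }
}

/// Splits a version-1 record into its (v0, v1) generation chunks.
fn split_chunks<'a>(bytes: &'a [u8]) -> Option<(&'a [u8], &'a [u8])> {
    if bytes.len() < 3 || bytes[0] != 1 {
        return None;
    }
    let v0_len = chunk_len(bytes[1])?;
    let v1_len = chunk_len(bytes[2])?;
    let body = &bytes[3..];
    if body.len() != v0_len + v1_len {
        return None;
    }
    Some(body.split_at(v0_len))
}

/// A reader that only knows x and y: it decodes the v0 chunk and skips v1.
fn read_x_y(bytes: &[u8]) -> Option<(i32, i32)> {
    let (v0, _skipped_v1) = split_chunks(bytes)?;
    if v0.len() != 8 {
        return None;
    }
    let x = i32::from_be_bytes([v0[0], v0[1], v0[2], v0[3]]);
    let y = i32::from_be_bytes([v0[4], v0[5], v0[6], v0[7]]);
    Some((x, y))
}
```

Because the chunk sizes are stored up front, the older reader never has to understand the contents of the chunk it skips.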

If a later version makes the field optional, the metadata records the field position. Older readers can still read Some(value) as the original field type; None is only understood by readers that know the optional step.

PointV3 makes label optional

#[derive(Debug, Clone, PartialEq, desert_rust::BinaryCodec)]
#[desert(evolution(
    FieldAdded("label", "origin".to_string()),
    FieldMadeOptional("label")
))]
struct PointV3 {
    x: i32,
    label: Option<String>,
    y: i32,
}

let value = PointV3 { x: 10, label: None, y: 20 };
let bytes = desert_rust::serialize_to_byte_vec(&value)?;

Serialized bytes:

[0x02, 0x10, 0x02, 0x01, 0x01, 0x00, 0x00, 0x00, 0x0A, 0x00, 0x00, 0x00, 0x14, 0x00]

  • 02: version (version 2)
  • 10: v0 size (byte length of original-field chunk)
  • 02: v1 size (byte length of optional-field chunk)
  • 0101: optional (FieldMadeOptional marker plus field position)
  • 0000000A: x (version-0 field)
  • 00000014: y (version-0 field)
  • 00: label (Option::None marker)

When a field is removed, the metadata records the removed field's name. Readers that still know the field can then treat it as None if it is optional, or fail with a clear error if it is required.

PointV4 removes label

#[derive(Debug, Clone, PartialEq, desert_rust::BinaryCodec)]
#[desert(evolution(
    FieldAdded("label", "origin".to_string()),
    FieldMadeOptional("label"),
    FieldRemoved("label")
))]
struct PointV4 {
    x: i32,
    y: i32,
}

let bytes = desert_rust::serialize_to_byte_vec(&PointV4 { x: 10, y: 20 })?;

Serialized bytes:

[0x03, 0x10, 0x00, 0x03, 0x0A, 0x6C, 0x61, 0x62, 0x65, 0x6C, 0x03, 0x01, 0x00, 0x00, 0x00, 0x0A, 0x00, 0x00, 0x00, 0x14]

  • 03: version (version 3)
  • 10: v0 size (byte length of original-field chunk)
  • 00: v1 size (removed field leaves no data in the added chunk)
  • 03: removed (FieldRemoved marker for optional step)
  • 0A: name length (removed field name length)
  • 6C6162656C: name UTF-8 (removed field name)
  • 03: removed (FieldRemoved marker)
  • 01: name ref (deduplicated reference to field name)
  • 0000000A: x (version-0 field)
  • 00000014: y (version-0 field)
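Read together, the PointV2, PointV3, and PointV4 dumps suggest that each header entry after the version byte is a zigzag integer: non-negative values are chunk sizes, 0x01 (which decodes to -1) introduces a FieldMadeOptional entry, and 0x03 (which decodes to -2) introduces a FieldRemoved entry. This is an inference from these dumps, not a format specification. The `HeaderEntry` type and `classify` function are hypothetical, and the sketch only covers one-byte entries:

```rust
/// Hypothetical classification of a one-byte evolution header entry,
/// inferred from the byte dumps on this page.
#[derive(Debug, PartialEq)]
enum HeaderEntry {
    /// A generation chunk and its byte length.
    ChunkSize(usize),
    /// Marker seen before a field position in the PointV3 dump.
    MadeOptional,
    /// Marker seen before a field name in the PointV4 dump.
    Removed,
    /// Any other negative value; this sketch does not interpret it.
    Unknown(i32),
}

/// Zigzag decoding maps 0, 1, 2, 3, 4, ... to 0, -1, 1, -2, 2, ...
fn zigzag_decode(raw: u32) -> i32 {
    ((raw >> 1) as i32) ^ -((raw & 1) as i32)
}

fn classify(header_byte: u8) -> HeaderEntry {
    match zigzag_decode(u32::from(header_byte)) {
        n if n >= 0 => HeaderEntry::ChunkSize(n as usize),
        -1 => HeaderEntry::MadeOptional,
        -2 => HeaderEntry::Removed,
        other => HeaderEntry::Unknown(other),
    }
}
```

Under this reading, the PointV4 header is one 8-byte chunk, one empty chunk, and two removal entries, which matches the three evolution steps declared on the struct.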

Keeping evolution safe

Use these practices for long-lived formats:

  • Append evolution steps. Do not rewrite old evolution history.
  • Keep field names stable.
  • Prefer Option<T> as a compatibility bridge before removing a field.
  • Keep enum constructor order stable, or opt into sorted constructors from the first released version.
  • Add roundtrip and cross-version compatibility tests for every model that is persisted or sent across process boundaries.

Rust and Scala differences

The Rust library follows the same design goals as the original Scala desert library: compact binary data, ADT support, and schema evolution. The APIs are different because Rust and Scala expose different language tools.

This page is a guide for readers who know the Scala documentation and want to understand what changed in desert-rust.

Crate layout

The Rust project has three workspace crates:

  • desert_rust: public facade crate, re-exporting the core library and derive macro
  • desert_core: serialization, deserialization, state, codecs, and evolution logic
  • desert_macro: #[derive(BinaryCodec)]

Scala has separate modules for core codecs, derivation implementations, and ecosystem integrations such as Akka, Pekko, Cats, ZIO, and Shardcake. Those integration modules do not exist in the Rust project.

Dependency model

Scala users choose between derivation modules such as Shapeless or ZIO Schema. Rust users depend on desert_rust and use one derive macro:

use desert_rust::BinaryCodec;

#[derive(BinaryCodec)]
struct Point {
    x: i32,
    y: i32,
}

Third-party Rust codecs are controlled by Cargo feature flags, for example uuid, chrono, and serde-json.

Codec discovery

Scala uses implicit BinaryCodec[T] values. Rust uses trait implementations:

  • BinarySerializer writes a type.
  • BinaryDeserializer reads a type.
  • BinaryCodec is implemented automatically when both are present.

Rust has no implicit values to look up. If a generic function needs a codec, it states that requirement as a trait bound, which the compiler checks:

use desert_rust::{BinarySerializer, Result, serialize_to_byte_vec};

fn encode<T: BinarySerializer>(value: &T) -> Result<Vec<u8>> {
    serialize_to_byte_vec(value)
}

Error handling

Scala APIs return an effect or an Either-like result depending on the module. Rust APIs return desert_rust::Result<T>, whose error type is desert_rust::Error:

use desert_rust::{deserialize, Error};

let result: Result<i32, Error> = deserialize(&[0, 0, 0, 1]);

Derivation attributes

Scala evolution uses annotations such as @evolutionSteps and @transientField. Rust uses derive helper attributes:

use desert_rust::BinaryCodec;

#[derive(BinaryCodec)]
#[desert(evolution(
    FieldAdded("description", Some("new".to_string())),
    FieldMadeOptional("description")
))]
struct Item {
    name: String,
    description: Option<String>,
    #[transient(0usize)]
    cached_hash: usize,
}

The Rust derive macro currently supports:

  • #[desert(evolution(...))]
  • FieldAdded("field", default_expr)
  • FieldMadeOptional("field")
  • FieldRemoved("field")
  • FieldMadeTransient("field")
  • #[transient(default_expr)] on fields
  • #[transient] on enum variants
  • #[desert(transparent)] on single-field structs and enum variants
  • #[desert(sorted_constructors)] on enums

Character encoding

The Scala library encodes characters as 16-bit Unicode values. By default, the Rust library encodes char as a Unicode scalar value using a variable-length unsigned integer.

Use Options::scala_compatible() when Rust must read or write data compatible with Scala character encoding:

use desert_rust::{Options, serialize_to_byte_vec_with_options};

fn main() -> desert_rust::Result<()> {
    let bytes = serialize_to_byte_vec_with_options(&'A', Options::scala_compatible())?;
    assert_eq!(bytes, vec![0, 65]);
    Ok(())
}

In Scala-compatible mode, characters that need more than one UTF-16 code unit (code points above U+FFFF) cannot be serialized as char.
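The restriction can be checked before serializing. The sketch below models the 16-bit encoding with the standard library only. `scala_char_bytes` is a hypothetical helper, not a desert_rust function, and the big-endian byte order is taken from the `vec![0, 65]` result shown for 'A':

```rust
/// Hypothetical model of the Scala-compatible char encoding: returns the two
/// big-endian bytes of the single UTF-16 code unit, or None when the
/// character would need a surrogate pair.
fn scala_char_bytes(c: char) -> Option<[u8; 2]> {
    if c.len_utf16() != 1 {
        return None;
    }
    // For a single-unit character the code unit equals the scalar value.
    let unit = c as u32 as u16;
    Some(unit.to_be_bytes())
}
```

A caller could use a check like `c.len_utf16() == 1` to decide up front whether a value is safe to send to a Scala reader as a char, or whether it should be sent as a String instead.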

Strings and byte arrays

Normal strings use the same basic shape: a compact byte length followed by UTF-8. DeduplicatedString also follows the same idea as Scala: first occurrences are written normally, repeated occurrences in the same context are written as negative ids.
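A minimal model of the deduplication idea, using only the standard library. The `StringTable` and `Written` types are hypothetical, and the id numbering (first string gets id 1, repeats are written as the negated id) is an assumption chosen for illustration, not the crate's documented scheme:

```rust
use std::collections::HashMap;

/// What the model would put on the wire for one string.
#[derive(Debug, PartialEq)]
enum Written {
    /// First occurrence: the string is written normally.
    Full(String),
    /// Repeated occurrence: only a negative id is written.
    Ref(i32),
}

/// Hypothetical per-context table of strings that were already written.
struct StringTable {
    ids: HashMap<String, i32>,
}

impl StringTable {
    fn new() -> Self {
        StringTable { ids: HashMap::new() }
    }

    fn write(&mut self, s: &str) -> Written {
        if let Some(&id) = self.ids.get(s) {
            return Written::Ref(-id);
        }
        // Assumed numbering: ids start at 1 and grow with each new string.
        let id = self.ids.len() as i32 + 1;
        self.ids.insert(s.to_string(), id);
        Written::Full(s.to_string())
    }
}
```

The table belongs to one serialization context, so a reference is only meaningful to a reader that has already seen the first occurrence in the same context.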

Raw byte blocks in Rust are represented by Vec<u8>, [u8], [u8; N], bytes::Bytes, and NEVec<u8>. These use a compact unsigned length plus raw bytes for compatibility with Scala byte chunks.

Collections

Both implementations use a shared iterable representation so collection type changes can be compatible. Rust collection compatibility is constrained by Rust trait bounds: for example, deserializing into HashSet<T> requires T: Eq + Hash, while deserializing into BTreeSet<T> requires T: Ord.
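The effect of those bounds can be illustrated without the library. In this sketch a `Vec<T>` stands in for the items decoded from the shared iterable representation, and the two helper functions are hypothetical:

```rust
use std::collections::{BTreeSet, HashSet};
use std::hash::Hash;

/// Collecting decoded items into a HashSet needs T: Eq + Hash.
fn into_hash_set<T: Eq + Hash>(items: Vec<T>) -> HashSet<T> {
    items.into_iter().collect()
}

/// Collecting decoded items into a BTreeSet needs T: Ord.
fn into_btree_set<T: Ord>(items: Vec<T>) -> BTreeSet<T> {
    items.into_iter().collect()
}
```

Both helpers drop duplicate items, so data written from a list with repeated values does not roundtrip unchanged when it is read back as a set.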

Type registry

The Scala library has a type registry for serializing values whose concrete type is not known statically.

The Rust crate does not currently expose an equivalent type registry API. For now, model closed sets of runtime alternatives as enums, or define an application-level tag plus custom BinarySerializer and BinaryDeserializer implementations.

Ecosystem integrations

Scala-specific modules described in the original documentation are not part of desert-rust:

  • Akka and Pekko serializers
  • Cats and Cats Effect codecs
  • ZIO effect wrappers and codecs
  • ZIO Prelude API
  • Shardcake serializer

Rust integration points are currently lower-level: implement the codec traits, use the byte helper functions, and integrate those bytes with your framework of choice.

Compatibility expectation

The projects are similar, but not every Rust type has a Scala equivalent and not every Scala codec has a Rust equivalent. For cross-language data, prefer a small golden dataset and test it from both sides. Pay special attention to:

  • char encoding and Options::scala_compatible()
  • enabled Rust feature flags
  • enum constructor order
  • byte collection formats
  • derived evolution history

Type registry

The original Scala desert library includes a type registry for cases where the concrete type is not known at compile time. A serialized value carries a compact type id, and the registry maps that id back to a codec during deserialization.

desert-rust does not currently expose an equivalent public type registry API. This page remains in the book as a placeholder because the type registry is an important concept in the Scala documentation and a likely future Rust feature.

What to use today

For closed sets of alternatives, use an enum:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum Message {
    Ping,
    Rename { id: u64, name: String },
    Delete { id: u64 },
}

This is the most idiomatic Rust option when all possible variants are known to the crate defining the protocol.

For open sets, define an application-level tag and implement the codec traits manually:

use desert_rust::{
    BinaryDeserializer, BinaryOutput, BinarySerializer, DeserializationContext,
    Error, Result, SerializationContext,
};

trait PluginMessage {}

struct TextMessage(String);
impl PluginMessage for TextMessage {}

enum AnyMessage {
    Text(TextMessage),
}

impl BinarySerializer for AnyMessage {
    fn serialize<Output: BinaryOutput>(
        &self,
        context: &mut SerializationContext<Output>,
    ) -> Result<()> {
        match self {
            AnyMessage::Text(TextMessage(text)) => {
                1u32.serialize(context)?;
                text.serialize(context)
            }
        }
    }
}

impl BinaryDeserializer for AnyMessage {
    fn deserialize(context: &mut DeserializationContext<'_>) -> Result<Self> {
        match u32::deserialize(context)? {
            1 => Ok(AnyMessage::Text(TextMessage(String::deserialize(context)?))),
            other => Err(Error::InvalidConstructorId {
                constructor_id: other,
                type_name: "AnyMessage".to_string(),
            }),
        }
    }
}

Keep ids stable once data has been written. If an id is retired, leave it reserved so newer variants do not take over old meanings.
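One way to keep retired ids reserved is to leave them in the tag table with an explicit error. This is a sketch with hypothetical tag names and values; it uses only the standard library and a plain String error instead of desert_rust::Error:

```rust
/// Application-level tags. The names and values are illustrative.
const TAG_TEXT: u32 = 1;
/// Retired tag for a hypothetical legacy message. It stays listed so that
/// no newer message type is ever assigned the value 2.
const TAG_RETIRED_LEGACY: u32 = 2;
const TAG_BINARY: u32 = 3;

#[derive(Debug, PartialEq)]
enum MessageKind {
    Text,
    Binary,
}

/// Maps a stored tag to a message kind, rejecting retired and unknown tags
/// with different messages so old data is easy to diagnose.
fn resolve_tag(tag: u32) -> Result<MessageKind, String> {
    match tag {
        TAG_TEXT => Ok(MessageKind::Text),
        TAG_BINARY => Ok(MessageKind::Binary),
        TAG_RETIRED_LEGACY => Err(format!("tag {} is retired and no longer readable", tag)),
        other => Err(format!("unknown tag {}", other)),
    }
}
```

Keeping the retired constant in the source makes the reservation visible in code review, which is harder to guarantee with a comment alone.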

Difference from Scala

In Scala, the type registry is part of the public library API and is used by integrations such as actor serializers. In Rust, framework integration is currently expected to happen at the byte boundary: serialize a statically known type, or define your own dynamic envelope type as shown above.