Codecs and derivation

A codec is the pair of traits that defines how a type is written and read:

use desert_rust::{BinaryDeserializer, BinarySerializer};

trait BinaryCodec: BinarySerializer + BinaryDeserializer {}

In the crate this is a blanket trait: any type implementing both serializer and deserializer automatically implements BinaryCodec.
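The blanket-trait pattern can be sketched with the standard library alone. The traits below (`Serializer`, `Deserializer`, `Codec`) are hypothetical stand-ins with simplified signatures, not the crate's real definitions, which take serialization and deserialization contexts.

```rust
// Hypothetical, simplified stand-ins for the crate's serializer and
// deserializer traits (the real ones operate on context objects).
trait Serializer {
    fn write(&self, out: &mut Vec<u8>);
}

trait Deserializer: Sized {
    fn read(input: &[u8]) -> Option<Self>;
}

// The combined trait adds no methods of its own.
trait Codec: Serializer + Deserializer {}

// Blanket implementation: any type with both halves is a Codec.
impl<T: Serializer + Deserializer> Codec for T {}

impl Serializer for bool {
    fn write(&self, out: &mut Vec<u8>) {
        out.push(u8::from(*self));
    }
}

impl Deserializer for bool {
    fn read(input: &[u8]) -> Option<Self> {
        match input {
            [0] => Some(false),
            [1] => Some(true),
            _ => None,
        }
    }
}

// Accepts `bool` only because the blanket impl made it a `Codec`.
fn round_trip<T: Codec>(value: &T) -> Option<T> {
    let mut buffer = Vec::new();
    value.write(&mut buffer);
    T::read(&buffer)
}
```

No explicit `impl Codec for bool` is written anywhere; the bound on `round_trip` is satisfied by the blanket implementation alone.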

Built-in codecs

The desert_rust crate re-exports the core implementations. The always-available codecs include:

  • integers: u8, i8, u16, i16, u32, i32, u64, i64, u128, i128, usize, isize
  • non-zero integers from std::num
  • floats: f32, f64
  • bool, (), char, String, str
  • std::time::Duration
  • Option<T> and Result<T, E>
  • std::ops::Bound<T> and Range<T>
  • bytes::Bytes
  • arrays, Vec<T>, VecDeque<T>, LinkedList<T>
  • HashSet<T>, BTreeSet<T>, HashMap<K, V>, BTreeMap<K, V>
  • Box<T>, Rc<T>, Arc<T>, references, and PhantomData<T>
  • std::net::IpAddr
  • tuples from arity 1 to 8

Feature flags control codecs for third-party types:

Feature               Types
bigdecimal            bigdecimal::BigDecimal, bigdecimal::num_bigint::BigInt
bit-vec               bit_vec::BitVec
chrono                chrono dates, times, offsets, chrono_tz::Tz
mac_address           mac_address::MacAddress
nonempty-collections  nonempty_collections::NEVec<T>
serde-json            serde_json::Value
url                   url::Url
uuid                  uuid::Uuid

The facade currently pulls in the desert_core default feature set, so bigdecimal, chrono, uuid, nonempty-collections, and serde-json are enabled by default. Enable bit-vec, mac_address, or url explicitly when you need those codecs.

The byte layouts below show how each optional third-party codec encodes a sample value:

uuid

let value = uuid::Uuid::from_bytes([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F, 0x10]
0102030405060708090A0B0C0D0E0F10
  • uuid (all 16 bytes): raw UUID bytes

chrono::NaiveDate

let value = chrono::NaiveDate::from_ymd_opt(2024, 6, 22).unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0xE8, 0x0F, 0x06, 0x16]
E80F0616
  • year (E80F): year as var_u32
  • month (06): month byte
  • day (16): day byte

chrono::NaiveTime

let value = chrono::NaiveTime::from_hms_nano_opt(9, 30, 5, 125).unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x09, 0x1E, 0x05, 0x7D]
091E057D
  • hour (09): hour byte
  • minute (1E): minute byte
  • second (05): second byte
  • nanos (7D): nanosecond fraction as var_u32

bigdecimal

let value: bigdecimal::BigDecimal = "123.45".parse().unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x0C, 0x31, 0x32, 0x33, 0x2E, 0x34, 0x35]
0C3132332E3435
  • length (0C): byte length encoded as var_i32
  • decimal (3132332E3435): UTF-8 bytes

bit-vec

let value = bit_vec::BitVec::from_bytes(&[0b1010_0000]);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x01, 0xA0]
01A0
  • length (01): byte count for packed bits
  • bits (A0): packed bit payload

mac_address

let value = mac_address::MacAddress::new([0, 17, 34, 51, 68, 85]);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x00, 0x11, 0x22, 0x33, 0x44, 0x55]
001122334455
  • mac (all 6 bytes): six raw address bytes

url

let value = url::Url::parse("https://desert-rust.vigoo.dev/").unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x3C, 0x68, 0x74, 0x74, 0x70, 0x73, 0x3A, 0x2F, 0x2F, 0x64, 0x65, 0x73, 0x65, 0x72, 0x74, 0x2D, 0x72, 0x75, 0x73, 0x74, 0x2E, 0x76, 0x69, 0x67, 0x6F, 0x6F, 0x2E, 0x64, 0x65, 0x76, 0x2F]
3C68747470733A2F2F6465736572742D727573742E7669676F6F2E6465762F
  • length (3C): byte length encoded as var_i32
  • URL (remaining 30 bytes): UTF-8 bytes

serde_json

let value = serde_json::json!({ "a": 1 });
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x07, 0x7B, 0x22, 0x61, 0x22, 0x3A, 0x31, 0x7D]
077B2261223A317D
  • length (07): JSON byte count as var_u32
  • JSON (7B2261223A317D): compact JSON bytes

nonempty-collections

let value = nonempty_collections::NEVec::try_from_vec(vec![1u8, 2, 3]).unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x03, 0x01, 0x02, 0x03]
03010203
  • length (03): non-empty byte count as var_u32
  • bytes (010203): raw byte payload

Primitive representation

Fixed-width numeric types are written in big-endian byte order:

use desert_rust::{serialize_to_byte_vec, Result};

fn main() -> Result<()> {
    assert_eq!(serialize_to_byte_vec(&100u16)?, vec![0, 100]);
    assert_eq!(serialize_to_byte_vec(&100u32)?, vec![0, 0, 0, 100]);
    Ok(())
}

bool is encoded as a single byte: 0 for false, 1 for true. String and str are encoded as a variable-length signed byte count followed by UTF-8 bytes.
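The string layout can be reproduced with a short standard-library sketch. It assumes the variable-length signed count is a zigzag-mapped integer written in 7-bit groups, which is consistent with the byte examples in this chapter but is not taken from the crate's source; the helper names are hypothetical.

```rust
/// Writes `value` in 7-bit groups, least significant group first, setting
/// the high bit on every byte except the last. (Assumed varint layout.)
fn write_var_u32(mut value: u32, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push(((value & 0x7F) as u8) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Zigzag-maps a signed value so that 0, -1, 1, -2 become 0, 1, 2, 3,
/// then writes the result as an unsigned varint. (Assumed var_i32 layout.)
fn write_var_i32(value: i32, out: &mut Vec<u8>) {
    let zigzag = ((value << 1) ^ (value >> 31)) as u32;
    write_var_u32(zigzag, out);
}

/// Signed byte count followed by the UTF-8 bytes.
fn encode_string(text: &str) -> Vec<u8> {
    let mut out = Vec::new();
    write_var_i32(text.len() as i32, &mut out);
    out.extend_from_slice(text.as_bytes());
    out
}

/// A single byte: 0 for false, 1 for true.
fn encode_bool(value: bool) -> Vec<u8> {
    vec![u8::from(value)]
}
```

Under these assumptions a six-byte string gets the length prefix 0x0C, because zigzag maps 6 to 12.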

i32

let bytes = desert_rust::serialize_to_byte_vec(&42i32)?;
[0x00, 0x00, 0x00, 0x2A]
0000002A
  • i32 (0000002A): fixed-width big-endian signed integer

u16

let bytes = desert_rust::serialize_to_byte_vec(&1000u16)?;
[0x03, 0xE8]
03E8
  • u16 (03E8): fixed-width big-endian unsigned integer

bool

let bytes = desert_rust::serialize_to_byte_vec(&true)?;
[0x01]
01
  • true (01): one byte, 1 for true and 0 for false

unit

let bytes = desert_rust::serialize_to_byte_vec(&())?;
[]

char

let bytes = desert_rust::serialize_to_byte_vec(&'λ')?;
[0xBB, 0x07]
BB07
  • code point (BB07): Unicode scalar value written as var_u32
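To see where BB07 comes from, here is a hedged sketch of a var_u32 encoder and decoder. It assumes the usual 7-bits-per-byte layout with a continuation bit, which matches the bytes shown in this chapter; the function names are hypothetical and this is not the crate's implementation.

```rust
/// Encodes `value` in 7-bit groups, least significant first; the high bit
/// marks "more bytes follow". (Assumed layout.)
fn encode_var_u32(mut value: u32) -> Vec<u8> {
    let mut out = Vec::new();
    while value >= 0x80 {
        out.push(((value & 0x7F) as u8) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
    out
}

/// Decodes one value and returns it with the number of bytes consumed.
/// Returns None if no terminating byte appears within five bytes.
fn decode_var_u32(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut value = 0u32;
    for (index, &byte) in bytes.iter().enumerate().take(5) {
        value |= u32::from(byte & 0x7F) << (7 * index);
        if (byte & 0x80) == 0 {
            return Some((value, index + 1));
        }
    }
    None
}
```

The scalar value of 'λ' is U+03BB (955), which needs two groups: the low seven bits with the continuation bit give 0xBB, and the remaining bits give 0x07.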

String

let bytes = desert_rust::serialize_to_byte_vec(&"desert".to_string())?;
[0x0C, 0x64, 0x65, 0x73, 0x65, 0x72, 0x74]
0C646573657274
  • length (0C): byte length encoded as var_i32
  • UTF-8 (646573657274): UTF-8 bytes

Option<T> and Result<T, E> start with a single tag byte and then write only the payload selected by that tag:

Option::Some

let bytes = desert_rust::serialize_to_byte_vec(&Some(7i32))?;
[0x01, 0x00, 0x00, 0x00, 0x07]
0100000007
  • Some (01): presence marker
  • value (00000007): inner i32 payload

Option::None

let bytes = desert_rust::serialize_to_byte_vec(&Option::<i32>::None)?;
[0x00]
00
  • None (00): absence marker

Result::Ok

let value: Result<i32, String> = Ok(7);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x01, 0x00, 0x00, 0x00, 0x07]
0100000007
  • Ok (01): result marker
  • value (00000007): success payload

Result::Err

let value: Result<i32, String> = Err("no".to_string());
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x00, 0x04, 0x6E, 0x6F]
00046E6F
  • Err (00): result marker
  • length (04): byte length encoded as var_i32
  • error (6E6F): UTF-8 bytes
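The tag-then-payload shape can be sketched for Option<i32> without the crate. The functions below are hypothetical helpers that mirror the bytes shown above (tag 1 followed by a big-endian i32, or tag 0 alone); Result follows the same shape with 1 for Ok and 0 for Err.

```rust
/// Tag byte 1 plus the big-endian payload for Some, tag byte 0 alone for None.
fn encode_option_i32(value: Option<i32>) -> Vec<u8> {
    match value {
        Some(inner) => {
            let mut out = vec![1];
            out.extend_from_slice(&inner.to_be_bytes());
            out
        }
        None => vec![0],
    }
}

/// Reads the tag first and only then decides whether a payload follows.
/// The outer Option reports malformed input.
fn decode_option_i32(bytes: &[u8]) -> Option<Option<i32>> {
    match bytes {
        [0] => Some(None),
        [1, a, b, c, d] => Some(Some(i32::from_be_bytes([*a, *b, *c, *d]))),
        _ => None,
    }
}
```

Because the tag decides which payload follows, None costs exactly one byte regardless of the inner type.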

Vec<u8>, [u8], [u8; N], bytes::Bytes, and NEVec<u8> use an optimized byte-block encoding: a variable-length unsigned length followed by raw bytes. This is intentionally compatible with the Scala library’s byte chunk format.

Vec<u8>

let bytes = desert_rust::serialize_to_byte_vec(&vec![1u8, 2, 3, 4])?;
[0x04, 0x01, 0x02, 0x03, 0x04]
0401020304
  • length (04): var_u32 byte count
  • bytes (01020304): raw byte payload

bytes::Bytes

let value = bytes::Bytes::from_static(b"abc");
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x03, 0x61, 0x62, 0x63]
03616263
  • length (03): var_u32 byte count
  • bytes (616263): raw byte payload
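A minimal sketch of the byte-block layout, assuming the unsigned length uses the common 7-bit varint form (consistent with the examples above, not confirmed from the crate's source). `encode_byte_block` is a hypothetical helper name.

```rust
/// Unsigned varint length followed by the payload, copied as-is.
fn encode_byte_block(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 5);
    let mut length = payload.len() as u32;
    // 7-bit groups, least significant first, high bit as continuation flag.
    while length >= 0x80 {
        out.push(((length & 0x7F) as u8) | 0x80);
        length >>= 7;
    }
    out.push(length as u8);
    // No per-element framing: the bytes are written in one block.
    out.extend_from_slice(payload);
    out
}
```

The length prefix only grows beyond one byte for payloads of 128 bytes or more; under this assumption a 200-byte payload is prefixed with C8 01.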

Collections

All generic iterable collection codecs share the same representation:

  • If the iterator reports an exact size, desert writes that size as a variable-length signed integer and then all elements.
  • If the size is not known, desert writes -1, then each element prefixed by a 1 byte, then a final 0 byte.

Because the representation is shared, many collection changes are binary compatible. For example, a Vec<i32> can be read as a LinkedList<i32>, and a BTreeSet<i32> can be read as a HashSet<i32>, as long as the target collection’s type constraints are satisfied.
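Both forms can be sketched for i32 elements with the standard library. The sketch assumes the count (and the -1 marker) is a zigzag varint, as the other examples in this chapter suggest; the function names are hypothetical.

```rust
/// Zigzag varint, assumed to be the var_i32 layout used for counts.
fn write_var_i32(value: i32, out: &mut Vec<u8>) {
    let mut zigzag = ((value << 1) ^ (value >> 31)) as u32;
    while zigzag >= 0x80 {
        out.push(((zigzag & 0x7F) as u8) | 0x80);
        zigzag >>= 7;
    }
    out.push(zigzag as u8);
}

/// Known size: the count, then every element back to back.
fn encode_known_size<I: ExactSizeIterator<Item = i32>>(items: I) -> Vec<u8> {
    let mut out = Vec::new();
    write_var_i32(items.len() as i32, &mut out);
    for item in items {
        out.extend_from_slice(&item.to_be_bytes());
    }
    out
}

/// Unknown size: -1, then each element behind a 1 byte, then a closing 0 byte.
fn encode_unknown_size<I: Iterator<Item = i32>>(items: I) -> Vec<u8> {
    let mut out = Vec::new();
    write_var_i32(-1, &mut out);
    for item in items {
        out.push(1);
        out.extend_from_slice(&item.to_be_bytes());
    }
    out.push(0);
    out
}
```

Nothing in either form records which collection type produced the bytes, which is why feeding a Vec or a LinkedList with the same elements into `encode_known_size` yields identical output.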

Vec<i32>

let bytes = desert_rust::serialize_to_byte_vec(&vec![1i32, 2, 3])?;
[0x06, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x03]
06000000010000000200000003
  • count (06): exact-size iterable count as var_i32
  • item 0 (00000001): first i32
  • item 1 (00000002): second i32
  • item 2 (00000003): third i32

BTreeMap<String, i32>

let value = std::collections::BTreeMap::from([
    ("a".to_string(), 1i32),
    ("b".to_string(), 2i32),
]);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x04, 0x00, 0x02, 0x61, 0x00, 0x00, 0x00, 0x01, 0x00, 0x02, 0x62, 0x00, 0x00, 0x00, 0x02]
040002610000000100026200000002
  • count (04): exact-size iterable count as var_i32
  • entry (00): tuple marker for key/value pair
  • key a (0261): string key, length plus UTF-8
  • value 1 (00000001): i32 map value
  • entry (00): tuple marker for key/value pair
  • key b (0262): string key, length plus UTF-8
  • value 2 (00000002): i32 map value

(i32, bool)

let bytes = desert_rust::serialize_to_byte_vec(&(42i32, true))?;
[0x00, 0x00, 0x00, 0x00, 0x2A, 0x01]
000000002A01
  • version (00): tuple payload marker compatible with version-0 structs
  • field 0 (0000002A): first tuple item
  • field 1 (01): second tuple item

Deriving structs

Use #[derive(BinaryCodec)] for ordinary named-field structs:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
struct User {
    id: u64,
    name: String,
    email: Option<String>,
}

The generated format starts with a version byte. Version 0 structs are compatible with tuples of the same field order and arity, which allows simple tuple-to-struct migrations.

derived struct

#[derive(Debug, Clone, PartialEq, desert_rust::BinaryCodec)]
struct User {
    id: u32,
    name: String,
    email: Option<String>,
}

let value = User {
    id: 7,
    name: "Ada".to_string(),
    email: None,
};
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x00, 0x00, 0x00, 0x00, 0x07, 0x06, 0x41, 0x64, 0x61, 0x00]
00000000070641646100
  • version (00): version-0 struct marker
  • id (00000007): u32 field
  • name length (06): string byte count as var_i32
  • name UTF-8 (416461): string bytes
  • email (00): Option::None marker
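The tuple compatibility of version-0 structs can be illustrated with a hand-written sketch. `Position` and both encoder functions are hypothetical; they only mimic the layout described here (a 0 version byte followed by the fields in declaration order) and are not what the derive macro actually emits.

```rust
// Hypothetical struct with the same field order and arity as (i32, bool).
struct Position {
    x: i32,
    visible: bool,
}

/// Tuple layout: marker byte 0, then each item in order.
fn encode_tuple(value: (i32, bool)) -> Vec<u8> {
    let mut out = vec![0];
    out.extend_from_slice(&value.0.to_be_bytes());
    out.push(u8::from(value.1));
    out
}

/// Version-0 struct layout: version byte 0, then each field in order.
fn encode_struct(value: &Position) -> Vec<u8> {
    let mut out = vec![0];
    out.extend_from_slice(&value.x.to_be_bytes());
    out.push(u8::from(value.visible));
    out
}
```

Since neither layout stores field names, data written as `(42, true)` is byte-for-byte what a `Position { x: 42, visible: true }` would produce in this sketch.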

For generic types, the derive macro adds serializer and deserializer bounds for generic parameters:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
struct Wrapper<T> {
    value: T,
}

Deriving enums

Enums are encoded as a constructor id followed by constructor payload data:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum Event {
    Started,
    Message(String),
    Moved { x: i32, y: i32 },
}

Constructor ids are assigned from the enum variant order, skipping transient variants. Adding new variants at the end is compatible with old data, but old code cannot read values that use a new variant.

derived enum

#[derive(Debug, Clone, PartialEq, desert_rust::BinaryCodec)]
enum Event {
    Started,
    Message(String),
    Moved { x: i32, y: i32 },
}

let bytes = desert_rust::serialize_to_byte_vec(&Event::Message("hi".to_string()))?;
[0x00, 0x01, 0x00, 0x04, 0x68, 0x69]
000100046869
  • version (00): outer enum version
  • constructor (01): variant id as var_u32
  • case version (00): variant payload version
  • length (04): byte length encoded as var_i32
  • payload (6869): UTF-8 bytes

You can ask the derive macro to assign constructor ids by sorted variant name:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
#[desert(sorted_constructors)]
enum StableByName {
    B,
    A,
}

Use this only when all versions agree on the same naming scheme. Reordering without sorted_constructors changes constructor ids and breaks compatibility.
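The effect of the two id schemes can be modeled with plain strings. This is a sketch under the assumption that sorted assignment uses ordinary lexicographic order of the variant names; `assign_ids` and `id_of` are hypothetical helpers, not part of the crate.

```rust
/// Returns (variant name, constructor id) pairs, either in declaration
/// order or in sorted-name order (assumed lexicographic).
fn assign_ids(variants: &[&str], sorted_constructors: bool) -> Vec<(String, u32)> {
    let mut names: Vec<&str> = variants.to_vec();
    if sorted_constructors {
        names.sort();
    }
    names
        .iter()
        .enumerate()
        .map(|(id, name)| (name.to_string(), id as u32))
        .collect()
}

/// Looks up the id a given variant would receive.
fn id_of(variants: &[&str], sorted_constructors: bool, wanted: &str) -> Option<u32> {
    assign_ids(variants, sorted_constructors)
        .into_iter()
        .find(|(name, _)| name.as_str() == wanted)
        .map(|(_, id)| id)
}
```

With declaration order, swapping `B` and `A` in the source changes the id of `B` from 0 to 1; with sorted assignment, `B` keeps id 1 in either arrangement.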

Transparent wrappers

Single-field structs can be encoded exactly as their inner type:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
#[desert(transparent)]
struct UserId(u64);

This is the Rust equivalent of using the Scala wrapper derivation. It is useful when a primitive value is promoted to a domain-specific newtype without changing the wire format.
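A rough sketch of what "exactly as their inner type" means for the bytes. The encoder functions are hypothetical illustrations: the transparent form reuses the u64 encoding unchanged, while the non-transparent form is assumed to carry the leading version byte that derived structs normally have.

```rust
struct UserId(u64);

/// Fixed-width big-endian encoding of the inner value.
fn encode_u64(value: u64) -> Vec<u8> {
    value.to_be_bytes().to_vec()
}

/// Transparent wrapper: the bytes of the inner value, nothing else.
fn encode_transparent(id: &UserId) -> Vec<u8> {
    encode_u64(id.0)
}

/// Ordinary derived struct (assumed layout): version byte 0, then the field.
fn encode_versioned(id: &UserId) -> Vec<u8> {
    let mut out = vec![0];
    out.extend(encode_u64(id.0));
    out
}
```

In this model, data written as a bare u64 can be read back as a transparent `UserId` and vice versa, whereas the versioned form is one byte longer and therefore not interchangeable.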

Transparent enum variants are also supported for unit variants and single-field variants:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum Value {
    #[desert(transparent)]
    Text(String),
    Structured { value: String },
}

The transparent variant still has an enum constructor id. The attribute affects how the variant payload is encoded.

Transient fields and variants

A transient field is not serialized. It must provide a default expression used when deserializing:

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
struct Cached {
    value: String,
    #[transient(None::<usize>)]
    cached_len: Option<usize>,
}

Transient enum variants are not assigned constructor ids. Serializing such a variant returns Error::SerializingTransientConstructor.

use desert_rust::BinaryCodec;

#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum State {
    Stored,
    #[transient]
    RuntimeOnly,
}

Transient variants can be inserted or removed without shifting the ids of stored variants.
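The id-skipping rule can be modeled in a few lines. `Variant` and `assign_ids` are hypothetical, as is the `Archived` variant used in the usage note; the sketch only illustrates that transient variants do not consume an id.

```rust
// Hypothetical description of one enum variant.
struct Variant {
    name: &'static str,
    transient: bool,
}

/// Walks the variants in declaration order and hands out consecutive ids,
/// giving transient variants no id at all.
fn assign_ids(variants: &[Variant]) -> Vec<(&'static str, Option<u32>)> {
    let mut next_id = 0u32;
    variants
        .iter()
        .map(|variant| {
            if variant.transient {
                (variant.name, None)
            } else {
                let id = next_id;
                next_id += 1;
                (variant.name, Some(id))
            }
        })
        .collect()
}
```

Given `Stored`, a transient `RuntimeOnly`, and `Archived`, the stored variants receive ids 0 and 1, and they keep those ids if `RuntimeOnly` is deleted.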

Custom codecs

Implement BinarySerializer and BinaryDeserializer manually when the derived format is not appropriate:

use desert_rust::{
    BinaryDeserializer, BinaryOutput, BinarySerializer, DeserializationContext,
    Result, SerializationContext,
};

#[derive(Debug, PartialEq)]
struct Lowercase(String);

impl BinarySerializer for Lowercase {
    fn serialize<Output: BinaryOutput>(
        &self,
        context: &mut SerializationContext<Output>,
    ) -> Result<()> {
        self.0.to_lowercase().serialize(context)
    }
}

impl BinaryDeserializer for Lowercase {
    fn deserialize(context: &mut DeserializationContext<'_>) -> Result<Self> {
        Ok(Self(String::deserialize(context)?))
    }
}

The enum derive macro can also wrap a single-field variant through a custom type. The wrapper type must be constructible from a borrowed value in the shape expected by the macro, so this is mainly useful for specialized string wrappers.

String deduplication

Normal String serialization does not deduplicate values. This keeps schema evolution safe: when an older reader skips a newly added string field, it does not accidentally miss a string id assignment needed by a later field.

For streams where the writer and reader agree that deduplication is safe, wrap values in DeduplicatedString:

use bytes::BytesMut;
use desert_rust::{
    BinaryDeserializer, BinarySerializer, DeduplicatedString, DeserializationContext,
    Options, Result, SerializationContext,
};

fn main() -> Result<()> {
    let mut output = SerializationContext::new(BytesMut::new(), Options::default());

    DeduplicatedString("same".to_string()).serialize(&mut output)?;
    DeduplicatedString("same".to_string()).serialize(&mut output)?;

    let bytes = output.into_output();
    let mut input = DeserializationContext::new(&bytes, Options::default());

    let first = DeduplicatedString::deserialize(&mut input)?.0;
    let second = DeduplicatedString::deserialize(&mut input)?.0;

    assert_eq!(first, second);
    Ok(())
}

The first occurrence is encoded like a normal string. Later occurrences in the same serialization context are encoded as a negative id.

DeduplicatedString

let mut context = desert_rust::SerializationContext::new(
    Vec::new(),
    desert_rust::Options::default(),
);
desert_rust::DeduplicatedString("same".to_string()).serialize(&mut context)?;
desert_rust::DeduplicatedString("same".to_string()).serialize(&mut context)?;
let bytes = context.into_output();
[0x08, 0x73, 0x61, 0x6D, 0x65, 0x01]
0873616D6501
  • first length (08): first occurrence uses normal string length
  • first UTF-8 (73616D65): first string bytes
  • repeat id (01): negative string id encoded as var_i32
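The deduplication scheme can be approximated with a small writer that remembers the strings it has seen. This is a sketch under two assumptions that are not confirmed by the text: ids are handed out from 0 in order of first occurrence, and a repeat of id n is written as the var_i32 value -(n + 1). `DedupWriter` is a hypothetical type; only the single-string case is checked against the bytes shown above.

```rust
use std::collections::HashMap;

// Hypothetical writer that tracks already-seen strings per context.
struct DedupWriter {
    out: Vec<u8>,
    ids: HashMap<String, i32>,
}

impl DedupWriter {
    fn new() -> Self {
        DedupWriter { out: Vec::new(), ids: HashMap::new() }
    }

    /// Zigzag varint, assumed to be the var_i32 layout.
    fn write_var_i32(&mut self, value: i32) {
        let mut zigzag = ((value << 1) ^ (value >> 31)) as u32;
        while zigzag >= 0x80 {
            self.out.push(((zigzag & 0x7F) as u8) | 0x80);
            zigzag >>= 7;
        }
        self.out.push(zigzag as u8);
    }

    fn write_string(&mut self, text: &str) {
        let existing = self.ids.get(text).copied();
        match existing {
            // Repeat: a negative number stands in for the whole string.
            Some(id) => self.write_var_i32(-(id + 1)),
            // First occurrence: normal string layout, and remember its id.
            None => {
                let id = self.ids.len() as i32;
                self.ids.insert(text.to_string(), id);
                self.write_var_i32(text.len() as i32);
                self.out.extend_from_slice(text.as_bytes());
            }
        }
    }
}
```

A non-negative leading value always means "a string of this length follows", so a reader can tell the two cases apart from the sign alone.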