Getting started with desert-rust
desert-rust is a binary serialization library for Rust. It focuses on compact
binary data while still allowing compatible changes to structs and enums over
time.
The Rust crate is the counterpart of the original Scala desert library. The
wire format is intentionally similar, but the API is shaped around Rust traits,
derive macros, feature flags, and explicit error handling.
Installation
Add the public crate to Cargo.toml:
[dependencies]
desert_rust = "0.1.8"
Additional codecs are controlled with crate features:
[dependencies]
desert_rust = { version = "0.1.8", features = ["uuid", "chrono", "url"] }
Feature flags exposed by the public crate are:
- bigdecimal
- bit-vec
- chrono
- mac_address
- nonempty-collections
- serde-json
- url
- uuid
The current desert_core default features already enable bigdecimal,
chrono, uuid, nonempty-collections, and serde-json. Features such as
url, mac_address, and bit-vec must be enabled explicitly.
Serialize and deserialize a known type
The most direct API works with a Vec<u8> or bytes::Bytes:
use desert_rust::{deserialize, serialize_to_byte_vec, Result};
fn main() -> Result<()> {
let bytes = serialize_to_byte_vec(&"Hello world".to_string())?;
let value: String = deserialize(&bytes)?;
assert_eq!(value, "Hello world");
Ok(())
}
This works because String implements both BinarySerializer and
BinaryDeserializer. Their combination is named BinaryCodec.
Derive codecs for your data
For structs and enums, derive BinaryCodec:
use desert_rust::{deserialize, serialize_to_byte_vec, BinaryCodec, Result};
#[derive(Debug, PartialEq, BinaryCodec)]
struct Point {
    x: i32,
    y: i32,
}
#[derive(Debug, PartialEq, BinaryCodec)]
enum Command {
    Move { to: Point },
    Label(String),
    Stop,
}
fn main() -> Result<()> {
    let command = Command::Move {
        to: Point { x: 10, y: -5 },
    };
    let bytes = serialize_to_byte_vec(&command)?;
    let decoded: Command = deserialize(&bytes)?;
    assert_eq!(decoded, command);
    Ok(())
}
The derive macro generates trait implementations and metadata needed for schema evolution. There is no runtime reflection or registration step for statically known types.
Top-level helper functions
The main helper functions are:
- serialize(value, output) writes to any BinaryOutput.
- serialize_with_options(value, output, options) does the same with explicit options.
- serialize_to_byte_vec(value) returns Vec<u8>.
- serialize_to_bytes(value) returns bytes::Bytes.
- deserialize::<T>(input) reads T from a byte slice.
- deserialize_with_options::<T>(input, options) reads with explicit options.
Scala compatibility option
Rust char normally serializes as a variable-length Unicode scalar value.
The Scala library encoded characters as 16-bit Unicode units. Use
Options::scala_compatible() when reading or writing data that must match the
Scala character encoding:
use desert_rust::{
deserialize_with_options, serialize_to_byte_vec_with_options, Options, Result,
};
fn main() -> Result<()> {
let options = Options::scala_compatible();
let bytes = serialize_to_byte_vec_with_options(&'x', options.clone())?;
let decoded: char = deserialize_with_options(&bytes, options)?;
assert_eq!(decoded, 'x');
Ok(())
}
If a character cannot be represented as a single 16-bit unit in this mode,
serialization fails with Error::UnsupportedCharacter.
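As a rough illustration of the single 16-bit unit rule, the following std-only sketch checks whether a char fits in one UTF-16 code unit. The helper name char_as_single_u16 and the big-endian byte order are assumptions made for this example; they are not part of the desert_rust API.

```rust
/// Hypothetical helper, not a desert_rust function.
/// Returns the two bytes of a char that fits in one UTF-16 code unit,
/// or None when the char would need a surrogate pair.
/// Big-endian byte order is an assumption of this sketch.
fn char_as_single_u16(c: char) -> Option<[u8; 2]> {
    if c.len_utf16() == 1 {
        // Safe to truncate: the scalar value is at most 0xFFFF here.
        let unit = (c as u32) as u16;
        Some(unit.to_be_bytes())
    } else {
        None
    }
}
```

Under these assumptions, char_as_single_u16('x') yields Some([0x00, 0x78]), while a character outside the Basic Multilingual Plane such as '\u{1F600}' yields None, which corresponds to the case where the library reports Error::UnsupportedCharacter.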
Building this book locally
The byte-layout examples in this book are generated by the mdbook-desert
preprocessor. Build it first, then put Cargo’s debug binary directory on
PATH while running mdBook:
cargo build -p desert_book
PATH="$PWD/target/debug:$PATH" mdbook build book
Where to go next
Read Codecs and derivation for built-in types and custom codec implementations. Read Data model evolution before persisting data long term or sending it between independently deployed versions.
Codecs and derivation
A codec is the pair of traits that defines how a type is written and read:
use desert_rust::{BinaryDeserializer, BinarySerializer};
trait BinaryCodec: BinarySerializer + BinaryDeserializer {}
In the crate this is a blanket trait: any type implementing both serializer and
deserializer automatically implements BinaryCodec.
Built-in codecs
The desert_rust crate re-exports the core implementations. The always
available codecs include:
- integers: u8, i8, u16, i16, u32, i32, u64, i64, u128, i128, usize, isize
- non-zero integers from std::num
- floats: f32, f64
- bool, (), char, String, str
- std::time::Duration
- Option<T> and Result<T, E>
- std::ops::Bound<T> and Range<T>
- bytes::Bytes
- arrays, Vec<T>, VecDeque<T>, LinkedList<T>
- HashSet<T>, BTreeSet<T>, HashMap<K, V>, BTreeMap<K, V>
- Box<T>, Rc<T>, Arc<T>, references, and PhantomData<T>
- std::net::IpAddr
- tuples from arity 1 to 8
Feature flags control codecs for third-party types:
| Feature | Types |
|---|---|
| bigdecimal | bigdecimal::BigDecimal, bigdecimal::num_bigint::BigInt |
| bit-vec | bit_vec::BitVec |
| chrono | chrono dates, times, offsets, chrono_tz::Tz |
| mac_address | mac_address::MacAddress |
| nonempty-collections | nonempty_collections::NEVec<T> |
| serde-json | serde_json::Value |
| url | url::Url |
| uuid | uuid::Uuid |
The facade currently pulls in the desert_core default feature set, so
bigdecimal, chrono, uuid, nonempty-collections, and serde-json are
enabled by default. Enable bit-vec, mac_address, or url explicitly when
you need those codecs.
The following byte-layout examples, generated by the mdbook-desert preprocessor, show how the optional third-party codecs encode sample values:
uuid
let value = uuid::Uuid::from_bytes([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F, 0x10]
| Bytes | Field | Description |
|---|---|---|
| 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 | uuid | raw UUID bytes |
chrono::NaiveDate
let value = chrono::NaiveDate::from_ymd_opt(2024, 6, 22).unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0xE8, 0x0F, 0x06, 0x16]
| Bytes | Field | Description |
|---|---|---|
| E8 0F | year | year as var_u32 |
| 06 | month | month byte |
| 16 | day | day byte |
chrono::NaiveTime
let value = chrono::NaiveTime::from_hms_nano_opt(9, 30, 5, 125).unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x09, 0x1E, 0x05, 0x7D]
| Bytes | Field | Description |
|---|---|---|
| 09 | hour | hour byte |
| 1E | minute | minute byte |
| 05 | second | second byte |
| 7D | nanos | nanosecond fraction as var_u32 |
bigdecimal
let value: bigdecimal::BigDecimal = "123.45".parse().unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x0C, 0x31, 0x32, 0x33, 0x2E, 0x34, 0x35]
| Bytes | Field | Description |
|---|---|---|
| 0C | length | byte length encoded as var_i32 |
| 31 32 33 2E 34 35 | decimal | UTF-8 bytes |
bit-vec
let value = bit_vec::BitVec::from_bytes(&[0b1010_0000]);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x01, 0xA0]
| Bytes | Field | Description |
|---|---|---|
| 01 | length | byte count for packed bits |
| A0 | bits | packed bit payload |
mac_address
let value = mac_address::MacAddress::new([0, 17, 34, 51, 68, 85]);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x00, 0x11, 0x22, 0x33, 0x44, 0x55]
| Bytes | Field | Description |
|---|---|---|
| 00 11 22 33 44 55 | mac | six raw address bytes |
url
let value = url::Url::parse("https://desert-rust.vigoo.dev/").unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x3C, 0x68, 0x74, 0x74, 0x70, 0x73, 0x3A, 0x2F, 0x2F, 0x64, 0x65, 0x73, 0x65, 0x72, 0x74, 0x2D, 0x72, 0x75, 0x73, 0x74, 0x2E, 0x76, 0x69, 0x67, 0x6F, 0x6F, 0x2E, 0x64, 0x65, 0x76, 0x2F]
| Bytes | Field | Description |
|---|---|---|
| 3C | length | byte length encoded as var_i32 |
| 68 74 74 70 73 3A 2F 2F 64 65 73 65 72 74 2D 72 75 73 74 2E 76 69 67 6F 6F 2E 64 65 76 2F | URL | UTF-8 bytes |
serde_json
let value = serde_json::json!({ "a": 1 });
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x07, 0x7B, 0x22, 0x61, 0x22, 0x3A, 0x31, 0x7D]
| Bytes | Field | Description |
|---|---|---|
| 07 | length | JSON byte count as var_u32 |
| 7B 22 61 22 3A 31 7D | JSON | compact JSON bytes |
nonempty-collections
let value = nonempty_collections::NEVec::try_from_vec(vec![1u8, 2, 3]).unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x03, 0x01, 0x02, 0x03]
| Bytes | Field | Description |
|---|---|---|
| 03 | length | non-empty byte count as var_u32 |
| 01 02 03 | bytes | raw byte payload |
Primitive representation
Fixed-width numeric types are written in big-endian byte order:
use desert_rust::{serialize_to_byte_vec, Result};
fn main() -> Result<()> {
assert_eq!(serialize_to_byte_vec(&100u16)?, vec![0, 100]);
assert_eq!(serialize_to_byte_vec(&100u32)?, vec![0, 0, 0, 100]);
Ok(())
}
bool is encoded as a single byte: 0 for false, 1 for true.
String and str are encoded as a variable-length signed byte count followed
by UTF-8 bytes.
i32
let bytes = desert_rust::serialize_to_byte_vec(&42i32)?;
[0x00, 0x00, 0x00, 0x2A]
| Bytes | Field | Description |
|---|---|---|
| 00 00 00 2A | i32 | fixed-width big-endian signed integer |
u16
let bytes = desert_rust::serialize_to_byte_vec(&1000u16)?;
[0x03, 0xE8]
| Bytes | Field | Description |
|---|---|---|
| 03 E8 | u16 | fixed-width big-endian unsigned integer |
bool
let bytes = desert_rust::serialize_to_byte_vec(&true)?;
[0x01]
| Bytes | Field | Description |
|---|---|---|
| 01 | true | one byte: 1 for true, 0 for false |
unit
let bytes = desert_rust::serialize_to_byte_vec(&())?;
[]
char
let bytes = desert_rust::serialize_to_byte_vec(&'λ')?;
[0xBB, 0x07]
| Bytes | Field | Description |
|---|---|---|
| BB 07 | code point | Unicode scalar value written as var_u32 |
String
let bytes = desert_rust::serialize_to_byte_vec(&"desert".to_string())?;
[0x0C, 0x64, 0x65, 0x73, 0x65, 0x72, 0x74]
| Bytes | Field | Description |
|---|---|---|
| 0C | length | byte length encoded as var_i32 |
| 64 65 73 65 72 74 | UTF-8 | UTF-8 bytes |
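
The fixed-width and bool layouts shown above can be reproduced with the standard library alone. The following is a minimal sketch for checking byte order by hand, not the crate's implementation; the helper names u16_be, u32_be, i32_be and bool_byte are hypothetical.

```rust
// Std-only sketch of the fixed-width layouts; helper names are hypothetical.

/// Big-endian bytes of a u16, most significant byte first.
fn u16_be(value: u16) -> Vec<u8> {
    value.to_be_bytes().to_vec()
}

/// Big-endian bytes of a u32.
fn u32_be(value: u32) -> Vec<u8> {
    value.to_be_bytes().to_vec()
}

/// Big-endian bytes of an i32.
fn i32_be(value: i32) -> Vec<u8> {
    value.to_be_bytes().to_vec()
}

/// A bool as a single byte: 1 for true, 0 for false.
fn bool_byte(value: bool) -> u8 {
    u8::from(value)
}
```

For example, u16_be(1000) gives [0x03, 0xE8] and i32_be(42) gives [0x00, 0x00, 0x00, 0x2A], matching the documented layouts.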
Option<T> and Result<T, E> start with a single tag byte and then write only
the payload selected by that tag:
Option::Some
let bytes = desert_rust::serialize_to_byte_vec(&Some(7i32))?;
[0x01, 0x00, 0x00, 0x00, 0x07]
| Bytes | Field | Description |
|---|---|---|
| 01 | Some | presence marker |
| 00 00 00 07 | value | inner i32 payload |
Option::None
let bytes = desert_rust::serialize_to_byte_vec(&Option::<i32>::None)?;
[0x00]
| Bytes | Field | Description |
|---|---|---|
| 00 | None | absence marker |
Result::Ok
let value: Result<i32, String> = Ok(7);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x01, 0x00, 0x00, 0x00, 0x07]
| Bytes | Field | Description |
|---|---|---|
| 01 | Ok | result marker |
| 00 00 00 07 | value | success payload |
Result::Err
let value: Result<i32, String> = Err("no".to_string());
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x00, 0x04, 0x6E, 0x6F]
| Bytes | Field | Description |
|---|---|---|
| 00 | Err | result marker |
| 04 | length | byte length encoded as var_i32 |
| 6E 6F | error | UTF-8 bytes |
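
The tag-then-payload layout of Option<i32> can be modelled in a few lines of plain Rust. This sketch only mirrors the documented bytes for Some(7) and None; the function names are hypothetical and the real codec is generic over the payload type.

```rust
/// Hypothetical model of the Option<i32> layout: one tag byte,
/// followed by the big-endian payload only when the tag is 1.
fn encode_option_i32(value: Option<i32>) -> Vec<u8> {
    match value {
        Some(inner) => {
            let mut out = vec![1u8];
            out.extend_from_slice(&inner.to_be_bytes());
            out
        }
        None => vec![0u8],
    }
}

/// Reverse of encode_option_i32. The outer None signals malformed input.
fn decode_option_i32(bytes: &[u8]) -> Option<Option<i32>> {
    match bytes {
        [0] => Some(None),
        [1, a, b, c, d] => Some(Some(i32::from_be_bytes([*a, *b, *c, *d]))),
        _ => None,
    }
}
```

Because None writes no payload at all, an absent value costs exactly one byte in this model.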
Vec<u8>, [u8], [u8; N], bytes::Bytes, and NEVec<u8> use an optimized
byte-block encoding: a variable-length unsigned length followed by raw bytes.
This is intentionally compatible with the Scala library’s byte chunk format.
Vec<u8>
let bytes = desert_rust::serialize_to_byte_vec(&vec![1u8, 2, 3, 4])?;
[0x04, 0x01, 0x02, 0x03, 0x04]
| Bytes | Field | Description |
|---|---|---|
| 04 | length | var_u32 byte count |
| 01 02 03 04 | bytes | raw byte payload |
bytes::Bytes
let value = bytes::Bytes::from_static(b"abc");
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x03, 0x61, 0x62, 0x63]
| Bytes | Field | Description |
|---|---|---|
| 03 | length | var_u32 byte count |
| 61 62 63 | bytes | raw byte payload |
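
To make the byte-block layout concrete, here is a std-only sketch that writes an unsigned variable-length count followed by the raw bytes, and reads it back. It assumes the usual 7-bit-group encoding with a continuation bit, which is consistent with the var_u32 examples in this book; the function names are hypothetical and this is not the crate's code.

```rust
/// Writes a u32 as 7-bit groups, least significant group first,
/// setting the high bit on every byte except the last.
fn write_var_u32(out: &mut Vec<u8>, mut value: u32) {
    while value >= 0x80 {
        out.push(((value & 0x7F) as u8) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Hypothetical model of the byte-block format: length, then raw bytes.
fn encode_byte_block(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 5);
    write_var_u32(&mut out, payload.len() as u32);
    out.extend_from_slice(payload);
    out
}

/// Reads a block written by encode_byte_block.
/// Returns None if the length prefix or the payload is truncated.
fn decode_byte_block(input: &[u8]) -> Option<&[u8]> {
    let mut len: u32 = 0;
    for (index, byte) in input.iter().enumerate().take(5) {
        len |= u32::from(*byte & 0x7F) << (7 * index);
        if (*byte & 0x80) == 0 {
            let start = index + 1;
            return input.get(start..start + len as usize);
        }
    }
    None
}
```

In this model a 4-byte payload costs 5 bytes in total, and the length prefix only grows to two bytes once the payload reaches 128 bytes.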
Collections
All generic iterable collection codecs share the same representation:
- If the iterator reports an exact size, desert writes that size as a variable-length signed integer and then all elements.
- If the size is not known, desert writes -1, then each element prefixed by a 1 byte, then a final 0 byte.
Because the representation is shared, many collection changes are binary
compatible. For example, a Vec<i32> can be read as a LinkedList<i32>, and a
BTreeSet<i32> can be read as a HashSet<i32>, as long as the target
collection’s type constraints are satisfied.
Vec<i32>
let bytes = desert_rust::serialize_to_byte_vec(&vec![1i32, 2, 3])?;
[0x06, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x03]
| Bytes | Field | Description |
|---|---|---|
| 06 | count | exact-size iterable count as var_i32 |
| 00 00 00 01 | item 0 | first i32 |
| 00 00 00 02 | item 1 | second i32 |
| 00 00 00 03 | item 2 | third i32 |
BTreeMap<String, i32>
let value = std::collections::BTreeMap::from([
("a".to_string(), 1i32),
("b".to_string(), 2i32),
]);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x04, 0x00, 0x02, 0x61, 0x00, 0x00, 0x00, 0x01, 0x00, 0x02, 0x62, 0x00, 0x00, 0x00, 0x02]
| Bytes | Field | Description |
|---|---|---|
| 04 | count | exact-size iterable count as var_i32 |
| 00 | entry | tuple marker for key/value pair |
| 02 61 | key a | string key: length plus UTF-8 |
| 00 00 00 01 | value 1 | i32 map value |
| 00 | entry | tuple marker for key/value pair |
| 02 62 | key b | string key: length plus UTF-8 |
| 00 00 00 02 | value 2 | i32 map value |
(i32, bool)
let bytes = desert_rust::serialize_to_byte_vec(&(42i32, true))?;
[0x00, 0x00, 0x00, 0x00, 0x2A, 0x01]
| Bytes | Field | Description |
|---|---|---|
| 00 | version | tuple payload marker compatible with version-0 structs |
| 00 00 00 2A | field 0 | first tuple item |
| 01 | field 1 | second tuple item |
Deriving structs
Use #[derive(BinaryCodec)] for ordinary named-field structs:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
struct User {
id: u64,
name: String,
email: Option<String>,
}
The generated format starts with a version byte. Version 0 structs are
compatible with tuples of the same field order and arity, which allows simple
tuple-to-struct migrations.
derived struct
#[derive(Debug, Clone, PartialEq, desert_rust::BinaryCodec)]
struct User {
id: u32,
name: String,
email: Option<String>,
}
let value = User {
id: 7,
name: "Ada".to_string(),
email: None,
};
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x00, 0x00, 0x00, 0x00, 0x07, 0x06, 0x41, 0x64, 0x61, 0x00]
| Bytes | Field | Description |
|---|---|---|
| 00 | version | version-0 struct marker |
| 00 00 00 07 | id | u32 field |
| 06 | name length | string byte count as var_i32 |
| 41 64 61 | name UTF-8 | string bytes |
| 00 | email | Option::None marker |
For generic types, the derive macro adds serializer and deserializer bounds for generic parameters:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
struct Wrapper<T> {
value: T,
}
Deriving enums
Enums are encoded as a constructor id followed by constructor payload data:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum Event {
Started,
Message(String),
Moved { x: i32, y: i32 },
}
Constructor ids are assigned from the enum variant order, skipping transient variants. Adding new variants at the end is compatible with old data, but old code cannot read values using the new variant.
derived enum
#[derive(Debug, Clone, PartialEq, desert_rust::BinaryCodec)]
enum Event {
Started,
Message(String),
Moved { x: i32, y: i32 },
}
let bytes = desert_rust::serialize_to_byte_vec(&Event::Message("hi".to_string()))?;
[0x00, 0x01, 0x00, 0x04, 0x68, 0x69]
| Bytes | Field | Description |
|---|---|---|
| 00 | version | outer enum version |
| 01 | constructor | variant id as var_u32 |
| 00 | case version | variant payload version |
| 04 | length | byte length encoded as var_i32 |
| 68 69 | payload | UTF-8 bytes |
You can ask the derive macro to assign constructor ids by sorted variant name:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
#[desert(sorted_constructors)]
enum StableByName {
B,
A,
}
Use this only when all versions agree on the same naming scheme. Reordering
without sorted_constructors changes constructor ids and breaks compatibility.
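The effect of variant order on constructor ids can be illustrated with a small std-only model. This sketch assumes ids start at 0 (consistent with the Event::Message example, where the second variant has id 1) and that sorted_constructors uses plain string ordering of variant names; the function constructor_ids is hypothetical and does not reflect how the derive macro is implemented.

```rust
/// Hypothetical model of constructor id assignment.
/// Each input pair is (variant name, is_transient).
/// Transient variants are skipped and receive no id.
fn constructor_ids(variants: &[(&str, bool)], sorted: bool) -> Vec<(String, u32)> {
    let mut names: Vec<&str> = variants
        .iter()
        .filter(|(_, transient)| !*transient)
        .map(|(name, _)| *name)
        .collect();
    if sorted {
        // Assumption: plain lexicographic ordering of variant names.
        names.sort();
    }
    names
        .into_iter()
        .enumerate()
        .map(|(id, name)| (name.to_string(), id as u32))
        .collect()
}
```

With the variants B and A in source order, this model assigns B the id 0 and A the id 1; with sorting enabled the ids swap. Swapping the two variants in the source changes the unsorted result but not the sorted one, which is the compatibility property the attribute is meant to provide.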
Transparent wrappers
Single-field structs can be encoded exactly as their inner type:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
#[desert(transparent)]
struct UserId(u64);
This is the Rust equivalent of using the Scala wrapper derivation. It is useful when a primitive value is promoted to a domain-specific newtype without changing the wire format.
Transparent enum variants are also supported for unit variants and single-field variants:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum Value {
#[desert(transparent)]
Text(String),
Structured { value: String },
}
The transparent variant still has an enum constructor id. The attribute affects how the variant payload is encoded.
Transient fields and variants
A transient field is not serialized. It must provide a default expression used when deserializing:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
struct Cached {
value: String,
#[transient(None::<usize>)]
cached_len: Option<usize>,
}
Transient enum variants are not assigned constructor ids. Serializing such a
variant returns Error::SerializingTransientConstructor.
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum State {
Stored,
#[transient]
RuntimeOnly,
}
Transient variants can be inserted or removed without shifting the ids of stored variants.
Custom codecs
Implement BinarySerializer and BinaryDeserializer manually when the derived
format is not appropriate:
use desert_rust::{
    BinaryDeserializer, BinaryOutput, BinarySerializer, DeserializationContext,
    Result, SerializationContext,
};
#[derive(Debug, PartialEq)]
struct Lowercase(String);
impl BinarySerializer for Lowercase {
    // Normalize to lowercase on write, then reuse the String codec.
    fn serialize<Output: BinaryOutput>(
        &self,
        context: &mut SerializationContext<Output>,
    ) -> Result<()> {
        self.0.to_lowercase().serialize(context)
    }
}
impl BinaryDeserializer for Lowercase {
    // Read the payload back with the String codec and wrap it.
    fn deserialize(context: &mut DeserializationContext<'_>) -> Result<Self> {
        Ok(Self(String::deserialize(context)?))
    }
}
The enum derive macro can also wrap a single-field variant through a custom type. The wrapper type must be constructible from a borrowed value in the shape expected by the macro, so this is mainly useful for specialized string wrappers.
String deduplication
Normal String serialization does not deduplicate values. This keeps schema
evolution safe: when an older reader skips a newly added string field, it does
not accidentally miss a string id assignment needed by a later field.
For streams where the writer and reader agree that deduplication is safe, wrap
values in DeduplicatedString:
use bytes::BytesMut;
use desert_rust::{
    BinaryDeserializer, BinarySerializer, DeduplicatedString, DeserializationContext,
    Options, Result, SerializationContext,
};
fn main() -> Result<()> {
    let mut output = SerializationContext::new(BytesMut::new(), Options::default());
    DeduplicatedString("same".to_string()).serialize(&mut output)?;
    DeduplicatedString("same".to_string()).serialize(&mut output)?;
    let bytes = output.into_output();
    let mut input = DeserializationContext::new(&bytes, Options::default());
    let first = DeduplicatedString::deserialize(&mut input)?.0;
    let second = DeduplicatedString::deserialize(&mut input)?.0;
    assert_eq!(first, second);
    Ok(())
}
The first occurrence is encoded like a normal string. Later occurrences in the same serialization context are encoded as a negative id.
DeduplicatedString
let mut context = desert_rust::SerializationContext::new(
Vec::new(),
desert_rust::Options::default(),
);
desert_rust::DeduplicatedString("same".to_string()).serialize(&mut context)?;
desert_rust::DeduplicatedString("same".to_string()).serialize(&mut context)?;
let bytes = context.into_output();
[0x08, 0x73, 0x61, 0x6D, 0x65, 0x01]
| Bytes | Field | Description |
|---|---|---|
| 08 | first length | first occurrence uses normal string length |
| 73 61 6D 65 | first UTF-8 | first string bytes |
| 01 | repeat id | negative string id encoded as var_i32 |
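
The deduplication scheme can be approximated with a std-only model that keeps a table of strings already written. The id numbering here is an assumption inferred from the single documented example (the first distinct string is later referenced as -1); numbering of further strings, and the name encode_deduplicated, are hypothetical.

```rust
use std::collections::HashMap;

/// Zig-zag encodes a signed value, then writes it as 7-bit groups.
fn write_var_i32(out: &mut Vec<u8>, value: i32) {
    let mut zigzag = ((value << 1) ^ (value >> 31)) as u32;
    while zigzag >= 0x80 {
        out.push(((zigzag & 0x7F) as u8) | 0x80);
        zigzag >>= 7;
    }
    out.push(zigzag as u8);
}

/// Hypothetical model of a deduplicating string stream.
/// First occurrence: positive length plus UTF-8 bytes.
/// Repeat: the negated id of the earlier occurrence.
/// Assumption: the k-th distinct string (counting from 1) has id k.
fn encode_deduplicated(values: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut ids: HashMap<&str, i32> = HashMap::new();
    for &value in values {
        match ids.get(value).copied() {
            Some(id) => write_var_i32(&mut out, -id),
            None => {
                let id = ids.len() as i32 + 1;
                ids.insert(value, id);
                write_var_i32(&mut out, value.len() as i32);
                out.extend_from_slice(value.as_bytes());
            }
        }
    }
    out
}
```

The model also shows why deduplication is unsafe when a reader skips fields: the id table is built from everything written so far, so a reader that skips a first occurrence has no entry to resolve a later negative id against.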
Binary input/output
The high-level helpers are enough for most use cases, but the core library is built around two low-level traits:
- BinaryOutput writes bytes.
- BinaryInput reads bytes and can skip regions.
Serialization and deserialization contexts implement these traits and add shared state for features such as string deduplication, reference tracking, options, and ADT evolution.
High-level output helpers
Use serialize_to_byte_vec when you want a Vec<u8>:
use desert_rust::{serialize_to_byte_vec, Result};
fn main() -> Result<()> {
let bytes = serialize_to_byte_vec(&42i32)?;
assert_eq!(bytes, vec![0, 0, 0, 42]);
Ok(())
}
Use serialize_to_bytes when you want bytes::Bytes:
use desert_rust::{serialize_to_bytes, Result};
fn main() -> Result<()> {
let bytes = serialize_to_bytes(&"hello".to_string())?;
assert_eq!(&bytes[..], &[10, b'h', b'e', b'l', b'l', b'o']);
Ok(())
}
The string length byte is 10 because signed variable-length integers use zig-zag
encoding, which maps a non-negative value n to 2n. The logical string length is 5.
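The relationship between the logical length and the length byte can be checked with a small std-only model of the string layout. This is a sketch under the assumptions stated in this chapter (zig-zag, then 7-bit groups); encode_string is a hypothetical name, not a crate function.

```rust
/// Zig-zag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
fn zigzag(value: i32) -> u32 {
    ((value << 1) ^ (value >> 31)) as u32
}

/// Hypothetical model of the String layout:
/// zig-zag length as a variable-length integer, then UTF-8 bytes.
fn encode_string(value: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(value.len() + 5);
    let mut length = zigzag(value.len() as i32);
    while length >= 0x80 {
        out.push(((length & 0x7F) as u8) | 0x80);
        length >>= 7;
    }
    out.push(length as u8);
    out.extend_from_slice(value.as_bytes());
    out
}
```

One consequence of this model is that the length prefix stays a single byte only for strings shorter than 64 bytes, because zig-zag doubles non-negative values before the 7-bit grouping is applied.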
Choosing a Vec<u8> helper
serialize_to_byte_vec starts with a small default capacity and serializes the
value once. This is usually the right choice for small payloads, such as single
commands, individual events, request/response fragments, or anything likely to
fit within the default 128-byte buffer.
For larger values, repeated Vec growth can become visible. There are three
ways to avoid that:
- Use serialize_to_byte_vec_with_capacity when you already know a good capacity estimate.
- Use serialize_into_byte_vec when serializing repeatedly and you can reuse an existing buffer.
- Use serialize_to_byte_vec_exact when the value is probably large but the caller does not know its serialized size.
serialize_to_byte_vec_exact first computes the exact serialized length with
serialized_size, then serializes into a Vec<u8> allocated with that capacity:
use desert_rust::{serialize_to_byte_vec_exact, Result};
fn main() -> Result<()> {
let batch = (0..10_000).collect::<Vec<u32>>();
let bytes = serialize_to_byte_vec_exact(&batch)?;
assert!(!bytes.is_empty());
Ok(())
}
This is a two-pass tradeoff. It avoids growth for large, one-off values, but it
does more work for small values. If the value fits in the default buffer,
serialize_to_byte_vec is normally faster. If you already know the size or can
reuse a buffer, the capacity or reusable-buffer helpers are usually faster than
the exact helper.
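The two-pass idea can be sketched without the crate: run the same writing code once against a counter and once against a pre-sized buffer. The Sink trait, Counter type, and function names below are hypothetical stand-ins for BinaryOutput and SizeCalculator, and the payload omits the collection length prefix for brevity.

```rust
/// Hypothetical stand-in for an output abstraction.
trait Sink {
    fn put(&mut self, bytes: &[u8]);
}

/// Stores the bytes.
impl Sink for Vec<u8> {
    fn put(&mut self, bytes: &[u8]) {
        self.extend_from_slice(bytes);
    }
}

/// Counts bytes without storing them.
struct Counter(usize);

impl Sink for Counter {
    fn put(&mut self, bytes: &[u8]) {
        self.0 += bytes.len();
    }
}

/// The single writing routine shared by both passes.
fn write_batch<S: Sink>(items: &[u32], sink: &mut S) {
    for item in items {
        sink.put(&item.to_be_bytes());
    }
}

/// First pass: measure only.
fn exact_size(items: &[u32]) -> usize {
    let mut counter = Counter(0);
    write_batch(items, &mut counter);
    counter.0
}

/// Second pass: write into a buffer allocated once with the measured size.
fn encode_exact(items: &[u32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(exact_size(items));
    write_batch(items, &mut out);
    out
}
```

The sketch makes the tradeoff visible: write_batch runs twice, so the approach only pays off when avoiding reallocation saves more than the extra traversal costs.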
Writing to a custom output
The generic serialize function accepts any BinaryOutput. The built-in
outputs are:
- Vec<u8>
- bytes::BytesMut
- SizeCalculator
SizeCalculator computes the number of bytes that would be written without
storing them:
use desert_rust::{serialize, Result, SizeCalculator};
fn main() -> Result<()> {
let output = serialize(&1234i32, SizeCalculator::new())?;
assert_eq!(output.size(), 4);
Ok(())
}
To write to a new destination, implement BinaryOutput:
use desert_rust::{BinaryOutput, Result};
struct CountingOutput {
    count: usize,
}
impl BinaryOutput for CountingOutput {
    fn write_u8(&mut self, _value: u8) {
        self.count += 1;
    }
    fn write_bytes(&mut self, bytes: &[u8]) {
        self.count += bytes.len();
    }
}
The trait provides default implementations for fixed-width integers, floats, variable-length integers, and compressed byte blocks.
Reading input
The public top-level deserialize helper reads from &[u8]:
use desert_rust::{deserialize, Result};
fn main() -> Result<()> {
let value: i32 = deserialize(&[0, 0, 0, 42])?;
assert_eq!(value, 42);
Ok(())
}
For low-level code, SliceInput borrows bytes and OwnedInput owns a Vec<u8>:
use desert_rust::{BinaryInput, Result, SliceInput};
fn main() -> Result<()> {
let mut input = SliceInput::new(&[0, 0, 0, 42]);
let value = input.read_i32()?;
assert_eq!(value, 42);
Ok(())
}
Most custom deserializers should use DeserializationContext rather than
SliceInput directly, because the context also carries options and shared
state:
use desert_rust::{BinaryDeserializer, DeserializationContext, Options, Result};
fn main() -> Result<()> {
let mut context = DeserializationContext::new(&[0, 0, 0, 42], Options::default());
let value = i32::deserialize(&mut context)?;
assert_eq!(value, 42);
Ok(())
}
Variable-length integers
BinaryOutput::write_var_u32 writes a u32 in 1 to 5 bytes. Smaller values use
fewer bytes:
use desert_rust::{BinaryOutput, Result};
fn main() -> Result<()> {
let mut bytes = Vec::new();
bytes.write_var_u32(1);
bytes.write_var_u32(4096);
assert_eq!(bytes, vec![1, 128, 32]);
Ok(())
}
write_var_i32 uses zig-zag encoding before writing the value as var_u32.
This keeps small negative values compact too:
use desert_rust::BinaryOutput;
let mut bytes = Vec::new();
bytes.write_var_i32(-1);
assert_eq!(bytes, vec![1]);
The library uses variable-length integers for lengths, ids, and ADT evolution
metadata. Fixed-width Rust integer codecs such as i32 still use fixed-width
big-endian bytes.
var_u32
let mut bytes = Vec::new();
bytes.write_var_u32(16_384);
[0x80, 0x80, 0x01]
| Bytes | Field | Description |
|---|---|---|
| 80 80 01 | var_u32 | 7-bit groups with continuation bits |
var_i32
let mut bytes = Vec::new();
bytes.write_var_i32(-64);
[0x7F]
| Bytes | Field | Description |
|---|---|---|
| 7F | zig-zag | signed value zig-zag encoded, then var_u32 |
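
For readers who want to verify these byte sequences by hand, the following std-only sketch implements the two encodings as described: 7-bit groups with a continuation bit, and zig-zag for signed values. It is an independent model that reproduces the documented outputs, not the crate's source; all function names are hypothetical.

```rust
/// Encodes a u32 as 1 to 5 bytes: 7 payload bits per byte,
/// least significant group first, high bit set on all but the last byte.
fn encode_var_u32(mut value: u32) -> Vec<u8> {
    let mut out = Vec::with_capacity(5);
    while value >= 0x80 {
        out.push(((value & 0x7F) as u8) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
    out
}

/// Decodes a value written by encode_var_u32.
/// Returns the value and the number of bytes consumed,
/// or None if no terminating byte appears within 5 bytes.
fn decode_var_u32(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    for (index, byte) in bytes.iter().enumerate().take(5) {
        result |= u32::from(*byte & 0x7F) << (7 * index);
        if (*byte & 0x80) == 0 {
            return Some((result, index + 1));
        }
    }
    None
}

/// Zig-zag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
fn zigzag_encode(value: i32) -> u32 {
    ((value << 1) ^ (value >> 31)) as u32
}

/// Inverse of zigzag_encode.
fn zigzag_decode(value: u32) -> i32 {
    ((value >> 1) as i32) ^ -((value & 1) as i32)
}

/// Signed variant: zig-zag first, then the unsigned encoding.
fn encode_var_i32(value: i32) -> Vec<u8> {
    encode_var_u32(zigzag_encode(value))
}
```

In this model -64 is the most negative value that still fits in one byte and 63 is the largest positive one, since both map to zig-zag values below 128.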
Compression helpers
BinaryOutput::write_compressed stores:
- the uncompressed length as var_u32
- the compressed length as var_u32
- the deflate-compressed bytes
BinaryInput::read_compressed reverses that representation:
use desert_rust::{BinaryInput, BinaryOutput, OwnedInput, Result};
fn main() -> Result<()> {
let data = b"hello hello hello";
let mut bytes = Vec::new();
bytes.write_compressed(data, Default::default())?;
let mut input = OwnedInput::new(bytes);
let decoded = input.read_compressed()?;
assert_eq!(decoded, data);
Ok(())
}
Compression is a low-level helper. The built-in type codecs do not compress their payloads automatically.
compressed block
let data = b"hello hello hello";
let mut bytes = Vec::new();
bytes.write_compressed(data, flate2::Compression::fast())?;
[0x11, 0x09, 0xCB, 0x48, 0xCD, 0xC9, 0xC9, 0x57, 0x40, 0x22, 0x01]
| Bytes | Field | Description |
|---|---|---|
| 11 | plain len | uncompressed byte count as var_u32 |
| 09 | deflate len | compressed byte count as var_u32 |
| CB 48 CD C9 C9 57 40 22 01 | deflate | deflate payload |
Byte blocks and iterables
Byte-oriented containers use a compact block format: a var_u32 byte count
followed by the bytes. Generic iterable containers use an item format that can
also represent streams with no exact size hint.
Vec<u8>
let bytes = desert_rust::serialize_to_byte_vec(&vec![1u8, 2, 3, 4])?;
[0x04, 0x01, 0x02, 0x03, 0x04]
| Bytes | Field | Description |
|---|---|---|
| 04 | length | var_u32 byte count |
| 01 02 03 04 | bytes | raw byte payload |
unknown-size iterable
let mut bytes: Vec<u8> = Vec::new();
let mut iter = [1i32, 2].into_iter().filter(|value| *value > 0);
desert_rust::serialize_iterator(
    &mut iter,
    &mut desert_rust::SerializationContext::new(&mut bytes, desert_rust::Options::default()),
)?;
[0x01, 0x01, 0x00, 0x00, 0x00, 0x01, 0x01, 0x00, 0x00, 0x00, 0x02, 0x00]
| Bytes | Field | Description |
|---|---|---|
| 01 | unknown | -1 count marker as var_i32 |
| 01 | item | item-present marker |
| 00 00 00 01 | value | first i32 |
| 01 | item | item-present marker |
| 00 00 00 02 | value | second i32 |
| 00 | end | end marker |
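
The two iterable layouts can be compared side by side with a std-only model restricted to i32 items. It follows the rules described in this book (a zig-zag count when the size is known; otherwise -1, a 1 byte before each item, and a closing 0 byte) and reproduces the documented byte sequences. The function names are hypothetical and the real codec is generic.

```rust
/// Zig-zag encodes a signed value, then writes it as 7-bit groups.
fn write_var_i32(out: &mut Vec<u8>, value: i32) {
    let mut zigzag = ((value << 1) ^ (value >> 31)) as u32;
    while zigzag >= 0x80 {
        out.push(((zigzag & 0x7F) as u8) | 0x80);
        zigzag >>= 7;
    }
    out.push(zigzag as u8);
}

/// Known size: count first, then the items back to back.
fn encode_known_size(items: &[i32]) -> Vec<u8> {
    let mut out = Vec::new();
    write_var_i32(&mut out, items.len() as i32);
    for item in items {
        out.extend_from_slice(&item.to_be_bytes());
    }
    out
}

/// Unknown size: -1 marker, a 1 byte before each item, then a final 0 byte.
fn encode_unknown_size(items: impl Iterator<Item = i32>) -> Vec<u8> {
    let mut out = Vec::new();
    write_var_i32(&mut out, -1);
    for item in items {
        out.push(1);
        out.extend_from_slice(&item.to_be_bytes());
    }
    out.push(0);
    out
}
```

In this model the unknown-size form costs one extra byte per item plus the terminator, which is the price of not needing the count up front.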
Options
Options currently controls Scala-compatible character encoding:
use desert_rust::Options;
let default_options = Options::default();
let scala_options = Options::scala_compatible();
assert!(!default_options.chars_as_u16);
assert!(scala_options.chars_as_u16);
Pass options through serialize_with_options,
serialize_to_byte_vec_with_options, serialize_to_bytes_with_options, or
deserialize_with_options.
Context state
SerializationContext and DeserializationContext also hold per-stream state.
Built-in uses include:
- string ids for DeduplicatedString
- reference ids for custom reference-aware codecs
- nested buffer stacks used by ADT evolution encoding
If you implement a custom codec and only need ordinary values, call the existing
BinarySerializer and BinaryDeserializer implementations. Touching context
state directly is an advanced use case.
Data model evolution
One of desert’s main goals is allowing stored or transmitted data to survive controlled changes to Rust data types.
Evolution support is generated by #[derive(BinaryCodec)] for structs and
enums. The derive macro writes a compact version header and enough metadata for
older and newer versions to skip, default, or reinterpret fields where that is
safe.
Compatibility rules in short
Compatible changes:
- Convert a multi-field tuple into a struct with the same field order.
- Wrap a value in a #[desert(transparent)] single-field struct.
- Replace one collection type with another collection type using the same item representation.
- Add a struct field with FieldAdded.
- Make a field optional with FieldMadeOptional.
- Remove a field with FieldRemoved, with the limits described below.
- Make a field transient with FieldMadeTransient plus #[transient(default)].
- Add an enum variant at the end of the enum.
- Insert or remove #[transient] enum variants.
Breaking or risky changes:
- Rename a field without preserving the serialized field name. The current Rust derive macro does not have a field rename attribute.
- Reorder enum variants without #[desert(sorted_constructors)].
- Remove an enum variant that has already been serialized.
- Change a field’s type unless the old and new types intentionally share a binary representation.
- Use DeduplicatedString in evolvable fields unless all readers and writers agree on the exact stream shape.
Tuples and structs
Version-0 structs are compatible with tuples of the same arity and field order:
use desert_rust::{deserialize, serialize_to_byte_vec, BinaryCodec, Result};
#[derive(Debug, PartialEq, BinaryCodec)]
struct Point {
x: i32,
y: i32,
}
fn main() -> Result<()> {
let bytes = serialize_to_byte_vec(&(10, 20))?;
let point: Point = deserialize(&bytes)?;
assert_eq!(point, Point { x: 10, y: 20 });
Ok(())
}
This is useful for moving from positional values to named records.
Transparent wrappers
Transparent single-field structs are encoded exactly like the inner field:
use desert_rust::{deserialize, serialize_to_byte_vec, BinaryCodec, Result};
#[derive(Debug, PartialEq, BinaryCodec)]
#[desert(transparent)]
struct UserId(u64);
fn main() -> Result<()> {
let bytes = serialize_to_byte_vec(&42u64)?;
let id: UserId = deserialize(&bytes)?;
assert_eq!(id, UserId(42));
Ok(())
}
This lets a model evolve from primitive values to domain newtypes without changing stored bytes.
Collections
Generic collection codecs use a shared iterable format, so many collection changes are compatible:
use desert_rust::{deserialize, serialize_to_byte_vec, Result};
use std::collections::LinkedList;
fn main() -> Result<()> {
let bytes = serialize_to_byte_vec(&vec![1i32, 2, 3])?;
let values: LinkedList<i32> = deserialize(&bytes)?;
assert_eq!(values.into_iter().collect::<Vec<_>>(), vec![1, 2, 3]);
Ok(())
}
Set and map compatibility depends on the target type’s Eq, Hash, or Ord
requirements and on whether duplicate values make sense.
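The caveat about duplicates can be seen with standard collections alone. The sketch below does not involve desert at all; it only shows what happens when the same sequence of items is collected into different std collection types, which is the semantic difference to keep in mind when switching the target type. How the library's set deserializers treat duplicate items is not specified here, and the helper names are hypothetical.

```rust
use std::collections::{BTreeSet, HashSet, LinkedList};

/// Keeps every item in its original order.
fn into_linked_list(items: &[i32]) -> LinkedList<i32> {
    items.iter().copied().collect()
}

/// Drops duplicates; iteration order is unspecified.
fn into_hash_set(items: &[i32]) -> HashSet<i32> {
    items.iter().copied().collect()
}

/// Drops duplicates and iterates in ascending order.
fn into_btree_set(items: &[i32]) -> BTreeSet<i32> {
    items.iter().copied().collect()
}
```

Starting from the items 3, 1, 3, the list keeps all three values, while both sets keep only two, so a round trip back to a Vec would not restore the original data.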
Adding a field
When a struct receives a new field, record it with FieldAdded and provide the
default expression used when reading old data:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
struct ProductV1 {
name: String,
price: i32,
}
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
#[desert(evolution(FieldAdded("in_stock", true)))]
struct ProductV2 {
name: String,
in_stock: bool,
price: i32,
}
With this change:
- ProductV2 can read ProductV1 data and uses true for in_stock.
- ProductV1 can read ProductV2 data by skipping the added field.
The added field does not have to be the last field in the Rust struct. The evolution metadata records which generation introduced it.
Making a field optional
A field can be changed from T to Option<T>:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
#[desert(evolution(
FieldAdded("in_stock", true),
FieldMadeOptional("price")
))]
struct ProductV3 {
name: String,
in_stock: bool,
price: Option<i32>,
}
With this change:
- New code reads old data as Some(old_value).
- Old code can read new data if the option is Some(value).
- Old code cannot read new data if the option is None, because it expected a non-optional field value.
This can be used as an intermediate migration before removing a field.
Removing a field
A removed field must be recorded:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
#[desert(evolution(
FieldAdded("in_stock", true),
FieldMadeOptional("price"),
FieldRemoved("price")
))]
struct ProductV4 {
name: String,
in_stock: bool,
}
With this change:
- New code can read old data by skipping price.
- Old code can read new data only if the removed field had already become optional, in which case it is read as None.
- Old code that expects a non-optional removed field cannot read new data.
Transient fields
Adding a new transient field does not change the binary representation:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
struct WithCache {
value: String,
#[transient(None::<usize>)]
cached_len: Option<usize>,
}
If an existing serialized field becomes transient, record it with
FieldMadeTransient and keep a transient default expression:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
#[desert(evolution(
FieldAdded("in_stock", true),
FieldMadeOptional("price"),
FieldRemoved("price"),
FieldMadeTransient("name")
))]
struct ProductV5 {
#[transient("unknown".to_string())]
name: String,
in_stock: bool,
}
FieldMadeTransient behaves like FieldRemoved for the wire format. New code
can read old data and uses the transient default. Older code cannot read the new
data if it requires the missing field.
Evolving enums
Enums are encoded by constructor id. Constructor ids follow source order unless
#[desert(sorted_constructors)] is used.
Adding a new variant at the end keeps previously written values readable:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum EventV1 {
Started,
Message(String),
}
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum EventV2 {
Started,
Message(String),
Stopped,
}
With this change:
- EventV2 can read old EventV1 values.
- EventV1 can read EventV2 data only when the stored constructor id also existed in EventV1.
- EventV1 cannot read EventV2::Stopped.
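A hand-written model of the old reader shows why the last case fails. This sketch assumes constructor ids start at 0 and follow source order, and it simplifies the payload to a plain string; decode_event_v1 is a hypothetical function, not the code generated by the derive macro.

```rust
#[derive(Debug, PartialEq)]
enum EventV1 {
    Started,
    Message(String),
}

/// Hypothetical old reader: it only knows the ids that existed in EventV1.
/// Assumed ids: 0 = Started, 1 = Message, 2 = Stopped (added in EventV2).
fn decode_event_v1(constructor_id: u32, payload: &str) -> Result<EventV1, String> {
    match constructor_id {
        0 => Ok(EventV1::Started),
        1 => Ok(EventV1::Message(payload.to_string())),
        other => Err(format!("unknown constructor id {other} for EventV1")),
    }
}

fn main() {
    assert_eq!(decode_event_v1(0, ""), Ok(EventV1::Started));
    assert_eq!(decode_event_v1(1, "hi"), Ok(EventV1::Message("hi".to_string())));
    // EventV2::Stopped would arrive with an id that EventV1 never had.
    assert!(decode_event_v1(2, "").is_err());
}
```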
Enum variants with fields can have their own evolution steps:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum Event {
#[desert(evolution(FieldAdded("source", "api".to_string())))]
Message { text: String, source: String },
}
Transient variants are not assigned constructor ids:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum RuntimeState {
Stored,
#[transient]
InMemoryOnly,
}
Serializing RuntimeState::InMemoryOnly fails. The benefit is that such variants
can be inserted or removed without shifting persistent constructor ids.
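The effect on id assignment can be sketched as follows. This is a model under assumptions, not the macro's implementation: assign_constructor_ids is hypothetical, ids are assumed to start at 0, and the Archived variant is invented for the illustration.

```rust
/// Hypothetical model: walk the variants in source order and give an id only
/// to the persistent ones. Each entry is (variant name, is_transient).
fn assign_constructor_ids(variants: &[(&str, bool)]) -> Vec<Option<u32>> {
    let mut next_id = 0u32;
    variants
        .iter()
        .map(|&(_, transient)| {
            if transient {
                None // transient variants never reach the wire
            } else {
                let id = next_id;
                next_id += 1;
                Some(id)
            }
        })
        .collect()
}

fn main() {
    let before = [("Stored", false), ("Archived", false)];
    let after = [("Stored", false), ("InMemoryOnly", true), ("Archived", false)];
    assert_eq!(assign_constructor_ids(&before), vec![Some(0), Some(1)]);
    // Inserting a transient variant leaves the persistent ids unchanged.
    assert_eq!(assign_constructor_ids(&after), vec![Some(0), None, Some(1)]);
}
```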
Evolution encoding
For derived structs and non-transparent enum variant payloads, desert writes:
- a version byte
- for non-zero versions, compact evolution metadata describing chunks and removed or optional fields
- the field data split into generation chunks
Version 0 data has no extra evolution metadata beyond the version byte. This
keeps initial records compact and is why version-0 structs and tuples can share
the same format.
PointV1
#[derive(Debug, Clone, PartialEq, desert_rust::BinaryCodec)]
#[desert(evolution())]
struct PointV1 {
x: i32,
y: i32,
}
let bytes = desert_rust::serialize_to_byte_vec(&PointV1 { x: 10, y: 20 })?;
[0x00, 0x00, 0x00, 0x00, 0x0A, 0x00, 0x00, 0x00, 0x14]
- 00: version (version 0)
- 00 00 00 0A: x (first field)
- 00 00 00 14: y (second field)
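To make the layout concrete, the nine bytes can be decoded by hand with the standard library. This is a sketch based only on the bytes shown here (a version byte followed by two big-endian 32-bit integers); decode_point_v0 is a hypothetical helper, and real code would call deserialize instead.

```rust
/// Decodes the version-0 layout shown above: one version byte, then x and y
/// as big-endian i32 values. Returns None for any other shape.
fn decode_point_v0(bytes: &[u8]) -> Option<(i32, i32)> {
    if bytes.len() != 9 || bytes[0] != 0 {
        return None;
    }
    let x = i32::from_be_bytes([bytes[1], bytes[2], bytes[3], bytes[4]]);
    let y = i32::from_be_bytes([bytes[5], bytes[6], bytes[7], bytes[8]]);
    Some((x, y))
}

fn main() {
    let bytes = [0x00, 0x00, 0x00, 0x00, 0x0A, 0x00, 0x00, 0x00, 0x14];
    assert_eq!(decode_point_v0(&bytes), Some((10, 20)));
}
```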
When a field is added, the new generation is written as a later chunk. Older
readers can skip chunks they do not know about. Newer readers can detect missing
chunks and use defaults from FieldAdded.
PointV2 adds label
#[derive(Debug, Clone, PartialEq, desert_rust::BinaryCodec)]
#[desert(evolution(FieldAdded("label", "origin".to_string())))]
struct PointV2 {
x: i32,
label: String,
y: i32,
}
let value = PointV2 { x: 10, label: "origin".to_string(), y: 20 };
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x01, 0x10, 0x0E, 0x00, 0x00, 0x00, 0x0A, 0x00, 0x00, 0x00, 0x14, 0x0C, 0x6F, 0x72, 0x69, 0x67, 0x69, 0x6E]
- 01: version (version 1)
- 10: v0 size (byte length of original-field chunk)
- 0E: v1 size (byte length of added-field chunk)
- 00 00 00 0A: x (version-0 field)
- 00 00 00 14: y (version-0 field)
- 0C: label length (added string length)
- 6F 72 69 67 69 6E: label UTF-8 (added field data)
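The chunk structure can be walked by hand to see how an older reader skips the added data. The sketch below relies on an inference from the example bytes rather than on a format specification: the size bytes 0x10, 0x0E, and 0x0C match a zigzag-style signed variable-length integer (8, 7, and 6). Only single-byte lengths are handled, and zigzag, chunk_len, split_chunks, and read_label are hypothetical helpers.

```rust
/// Decodes a single-byte zigzag value (encoding inferred from the example).
fn zigzag(byte: u8) -> i32 {
    ((byte >> 1) as i32) ^ -((byte & 1) as i32)
}

/// Interprets one byte as a non-negative length; multi-byte varints and
/// negative markers are outside this sketch.
fn chunk_len(byte: u8) -> Option<usize> {
    if byte & 0x80 != 0 {
        return None;
    }
    let n = zigzag(byte);
    if n >= 0 { Some(n as usize) } else { None }
}

/// Splits version-1 data into (version-0 chunk, version-1 chunk).
fn split_chunks(bytes: &[u8]) -> Option<(&[u8], &[u8])> {
    if bytes.len() < 3 || bytes[0] != 1 {
        return None;
    }
    let v0_len = chunk_len(bytes[1])?;
    let v1_len = chunk_len(bytes[2])?;
    let body = &bytes[3..];
    if body.len() != v0_len + v1_len {
        return None;
    }
    Some(body.split_at(v0_len))
}

/// Reads the length-prefixed UTF-8 label from the version-1 chunk.
fn read_label(v1_chunk: &[u8]) -> Option<&str> {
    let (&len_byte, rest) = v1_chunk.split_first()?;
    let len = chunk_len(len_byte)?;
    std::str::from_utf8(rest.get(..len)?).ok()
}

fn main() {
    let bytes = [
        0x01, 0x10, 0x0E, 0x00, 0x00, 0x00, 0x0A, 0x00, 0x00, 0x00, 0x14,
        0x0C, 0x6F, 0x72, 0x69, 0x67, 0x69, 0x6E,
    ];
    let (v0, v1) = split_chunks(&bytes).expect("well-formed example data");
    // An older reader only needs v0 and can ignore v1 entirely.
    assert_eq!(v0, &[0x00, 0x00, 0x00, 0x0A, 0x00, 0x00, 0x00, 0x14][..]);
    // A newer reader also reads the added field from v1.
    assert_eq!(read_label(v1), Some("origin"));
}
```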
If a later version makes the field optional, the metadata records the field
position. Older readers can still read Some(value) as the original field type;
None is only understood by readers that know the optional step.
PointV3 makes label optional
#[derive(Debug, Clone, PartialEq, desert_rust::BinaryCodec)]
#[desert(evolution(
FieldAdded("label", "origin".to_string()),
FieldMadeOptional("label")
))]
struct PointV3 {
x: i32,
label: Option<String>,
y: i32,
}
let value = PointV3 { x: 10, label: None, y: 20 };
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x02, 0x10, 0x02, 0x01, 0x01, 0x00, 0x00, 0x00, 0x0A, 0x00, 0x00, 0x00, 0x14, 0x00]
- 02: version (version 2)
- 10: v0 size (byte length of original-field chunk)
- 02: v1 size (byte length of optional-field chunk)
- 01 01: optional (FieldMadeOptional marker plus field position)
- 00 00 00 0A: x (version-0 field)
- 00 00 00 14: y (version-0 field)
- 00: label (Option::None marker)
When a field is removed, the metadata records the removed field name so readers
that still know the field can either treat it as None if it is optional, or
fail clearly if it is required.
PointV4 removes label
#[derive(Debug, Clone, PartialEq, desert_rust::BinaryCodec)]
#[desert(evolution(
FieldAdded("label", "origin".to_string()),
FieldMadeOptional("label"),
FieldRemoved("label")
))]
struct PointV4 {
x: i32,
y: i32,
}
let bytes = desert_rust::serialize_to_byte_vec(&PointV4 { x: 10, y: 20 })?;
[0x03, 0x10, 0x00, 0x03, 0x0A, 0x6C, 0x61, 0x62, 0x65, 0x6C, 0x03, 0x01, 0x00, 0x00, 0x00, 0x0A, 0x00, 0x00, 0x00, 0x14]
- 03: version (version 3)
- 10: v0 size (byte length of original-field chunk)
- 00: v1 size (removed field leaves no data in the added chunk)
- 03: removed (FieldRemoved marker for optional step)
- 0A: name length (removed field name length)
- 6C 61 62 65 6C: name UTF-8 (removed field name)
- 03: removed (FieldRemoved marker)
- 01: name ref (deduplicated reference to field name)
- 00 00 00 0A: x (version-0 field)
- 00 00 00 14: y (version-0 field)
Keeping evolution safe
Use these practices for long-lived formats:
- Append evolution steps. Do not rewrite old evolution history.
- Keep field names stable.
- Prefer Option<T> as a compatibility bridge before removing a field.
- Keep enum constructor order stable, or opt into sorted constructors from the first released version.
- Add roundtrip and cross-version compatibility tests for every model that is persisted or sent across process boundaries.
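One way to structure such a test is to compare freshly encoded bytes against a stored golden array and report the first differing offset. The sketch below is standard-library only: encode_point_v0 is a hand-written stand-in for the real serializer (in a project it would be replaced by serialize_to_byte_vec), first_mismatch is a hypothetical helper, and the golden bytes are taken from the PointV1 example in the Evolution encoding section.

```rust
/// Golden bytes for PointV1 { x: 10, y: 20 }.
const GOLDEN_POINT_V1: [u8; 9] = [0x00, 0x00, 0x00, 0x00, 0x0A, 0x00, 0x00, 0x00, 0x14];

/// Stand-in encoder for the version-0 layout: version byte, then big-endian fields.
fn encode_point_v0(x: i32, y: i32) -> Vec<u8> {
    let mut out = vec![0u8];
    out.extend_from_slice(&x.to_be_bytes());
    out.extend_from_slice(&y.to_be_bytes());
    out
}

/// Returns the offset of the first difference, or None when the inputs match.
fn first_mismatch(actual: &[u8], golden: &[u8]) -> Option<usize> {
    if actual == golden {
        return None;
    }
    let differing = actual.iter().zip(golden).position(|(a, b)| a != b);
    // If one input is a prefix of the other, the mismatch is at the shorter length.
    Some(differing.unwrap_or_else(|| actual.len().min(golden.len())))
}

fn main() {
    let actual = encode_point_v0(10, 20);
    assert_eq!(first_mismatch(&actual, &GOLDEN_POINT_V1), None);
}
```

Keeping the golden arrays in version control means an accidental format change shows up as a failing test rather than as unreadable stored data.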
Rust and Scala differences
The Rust library follows the same design goals as the original Scala desert
library: compact binary data, ADT support, and schema evolution. The APIs are
different because Rust and Scala expose different language tools.
This page is a guide for readers who know the Scala documentation and want to
understand what changed in desert-rust.
Crate layout
Rust has three workspace crates:
- desert_rust: public facade crate, re-exporting the core library and derive macro
- desert_core: serialization, deserialization, state, codecs, and evolution logic
- desert_macro: #[derive(BinaryCodec)]
Scala has separate modules for core codecs, derivation implementations, and ecosystem integrations such as Akka, Pekko, Cats, ZIO, and Shardcake. Those integration modules do not exist in the Rust project.
Dependency model
Scala users choose between derivation modules such as Shapeless or ZIO Schema.
Rust users depend on desert_rust and use one derive macro:
use desert_rust::BinaryCodec;
#[derive(BinaryCodec)]
struct Point {
x: i32,
y: i32,
}
Third-party Rust codecs are controlled by Cargo feature flags, for example
uuid, chrono, and serde-json.
Codec discovery
Scala uses implicit BinaryCodec[T] values. Rust uses trait implementations:
- BinarySerializer writes a type.
- BinaryDeserializer reads a type.
- BinaryCodec is implemented automatically when both are present.
There is no implicit search at runtime. If a generic function needs a codec, it uses Rust trait bounds:
use desert_rust::{BinarySerializer, Result, serialize_to_byte_vec};
fn encode<T: BinarySerializer>(value: &T) -> Result<Vec<u8>> {
serialize_to_byte_vec(value)
}
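For readers coming from Scala implicits, the "implemented automatically" part corresponds to a Rust blanket implementation. The sketch below illustrates that language mechanism with deliberately different, hypothetical trait names (Encode, Decode, Codec); it is not the definition of the desert-rust traits.

```rust
/// Hypothetical stand-ins for a serializer and a deserializer trait.
trait Encode {
    fn encode(&self, out: &mut Vec<u8>);
}

trait Decode: Sized {
    fn decode(input: &[u8]) -> Option<Self>;
}

/// The combined trait has no methods of its own.
trait Codec: Encode + Decode {}

/// Blanket implementation: every type with both halves is a Codec.
impl<T: Encode + Decode> Codec for T {}

impl Encode for u8 {
    fn encode(&self, out: &mut Vec<u8>) {
        out.push(*self);
    }
}

impl Decode for u8 {
    fn decode(input: &[u8]) -> Option<Self> {
        input.first().copied()
    }
}

/// The bound is resolved at compile time; nothing is looked up at runtime.
fn roundtrip<T: Codec>(value: &T) -> Option<T> {
    let mut buffer = Vec::new();
    value.encode(&mut buffer);
    T::decode(&buffer)
}

fn main() {
    assert_eq!(roundtrip(&42u8), Some(42));
}
```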
Error handling
Scala APIs return an effect or an Either-like result depending on the module.
Rust APIs return desert_rust::Result<T>, whose error type is
desert_rust::Error:
use desert_rust::{deserialize, Error};
let result: Result<i32, Error> = deserialize(&[0, 0, 0, 1]);
Derivation attributes
Scala evolution uses annotations such as @evolutionSteps and
@transientField. Rust uses derive helper attributes:
use desert_rust::BinaryCodec;
#[derive(BinaryCodec)]
#[desert(evolution(
FieldAdded("description", Some("new".to_string())),
FieldMadeOptional("description")
))]
struct Item {
name: String,
description: Option<String>,
#[transient(0usize)]
cached_hash: usize,
}
The Rust derive macro currently supports:
- #[desert(evolution(...))]
  - FieldAdded("field", default_expr)
  - FieldMadeOptional("field")
  - FieldRemoved("field")
  - FieldMadeTransient("field")
- #[transient(default_expr)] on fields
- #[transient] on enum variants
- #[desert(transparent)] on single-field structs and enum variants
- #[desert(sorted_constructors)] on enums
Character encoding
The Scala library encoded characters as 16-bit Unicode values. The Rust default
encodes char as a Unicode scalar value using a variable-length unsigned
integer.
Use Options::scala_compatible() when Rust must read or write data compatible
with Scala character encoding:
use desert_rust::{Options, serialize_to_byte_vec_with_options};
fn main() -> desert_rust::Result<()> {
let bytes = serialize_to_byte_vec_with_options(&'A', Options::scala_compatible())?;
assert_eq!(bytes, vec![0, 65]);
Ok(())
}
In Scala-compatible mode, characters outside a single UTF-16 code unit cannot be
serialized as char.
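The restriction follows from how a 16-bit encoding works, which a standard-library sketch can show. encode_char_16 is a hypothetical helper, and the big-endian byte order is inferred from the [0, 65] result in the example above; the library's own check may differ in detail.

```rust
/// Encodes a char as one 16-bit unit, or returns None when the char would
/// need a UTF-16 surrogate pair.
fn encode_char_16(c: char) -> Option<[u8; 2]> {
    if c.len_utf16() != 1 {
        return None;
    }
    // Safe to narrow: a single UTF-16 unit means the scalar value fits in u16.
    Some((c as u32 as u16).to_be_bytes())
}

fn main() {
    assert_eq!(encode_char_16('A'), Some([0, 65]));
    // U+1F600 lies outside the Basic Multilingual Plane.
    assert_eq!(encode_char_16('\u{1F600}'), None);
}
```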
Strings and byte arrays
Normal strings use the same basic shape: a compact byte length followed by
UTF-8. DeduplicatedString also follows the same idea as Scala: first
occurrences are written normally, repeated occurrences in the same context are
written as negative ids.
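The deduplication idea can be modelled with a small string table. This is a conceptual sketch using only the standard library: StringTable and Token are hypothetical, and the id numbering (first string referenced as -1, second as -2) is an assumption made for the illustration, not a statement about the wire format.

```rust
use std::collections::HashMap;

/// What a writer would emit for one string (hypothetical model).
#[derive(Debug, PartialEq)]
enum Token {
    Literal(String), // first occurrence: written in full
    Ref(i32),        // repeat: written as a negative id
}

#[derive(Default)]
struct StringTable {
    ids: HashMap<String, i32>,
}

impl StringTable {
    fn write(&mut self, s: &str) -> Token {
        if let Some(id) = self.ids.get(s) {
            return Token::Ref(-*id);
        }
        // Assumed numbering: ids count up from 1 in order of first appearance.
        let id = self.ids.len() as i32 + 1;
        self.ids.insert(s.to_string(), id);
        Token::Literal(s.to_string())
    }
}

fn main() {
    let mut table = StringTable::default();
    assert_eq!(table.write("label"), Token::Literal("label".to_string()));
    assert_eq!(table.write("label"), Token::Ref(-1));
    assert_eq!(table.write("other"), Token::Literal("other".to_string()));
    assert_eq!(table.write("other"), Token::Ref(-2));
}
```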
Raw byte blocks in Rust are represented by Vec<u8>, [u8], [u8; N],
bytes::Bytes, and NEVec<u8>. These use a compact unsigned length plus raw
bytes for compatibility with Scala byte chunks.
Collections
Both implementations use a shared iterable representation so collection type
changes can be compatible. Rust collection compatibility is constrained by Rust
trait bounds: for example, deserializing into HashSet<T> requires T: Eq + Hash, while deserializing into BTreeSet<T> requires T: Ord.
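The bounds mentioned here are the ordinary standard-library requirements for building each collection, as the following sketch shows. It does not use desert-rust; into_hash_set and into_btree_set are hypothetical helpers that stand in for deserializing the same sequence of elements into different targets.

```rust
use std::collections::{BTreeSet, HashSet};
use std::hash::Hash;

/// The same element sequence collected into a hash-based set.
fn into_hash_set<T: Eq + Hash>(items: Vec<T>) -> HashSet<T> {
    items.into_iter().collect()
}

/// The same element sequence collected into an ordered set.
fn into_btree_set<T: Ord>(items: Vec<T>) -> BTreeSet<T> {
    items.into_iter().collect()
}

fn main() {
    let stored = vec![3, 1, 2, 1];
    assert_eq!(into_hash_set(stored.clone()).len(), 3);
    let ordered: Vec<i32> = into_btree_set(stored).into_iter().collect();
    assert_eq!(ordered, vec![1, 2, 3]);
}
```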
Type registry
The Scala library has a type registry for serializing values whose concrete type is not known statically.
The Rust crate does not currently expose an equivalent type registry API. For
now, model closed sets of runtime alternatives as enums, or define an
application-level tag plus custom BinarySerializer and BinaryDeserializer
implementations.
Ecosystem integrations
Scala-specific modules described in the original documentation are not part of
desert-rust:
- Akka and Pekko serializers
- Cats and Cats Effect codecs
- ZIO effect wrappers and codecs
- ZIO Prelude API
- Shardcake serializer
Rust integration points are currently lower-level: implement the codec traits, use the byte helper functions, and integrate those bytes with your framework of choice.
Compatibility expectation
The projects are similar, but not every Rust type has a Scala equivalent and not every Scala codec has a Rust equivalent. For cross-language data, prefer a small golden dataset and test it from both sides. Pay special attention to:
- char encoding and Options::scala_compatible()
- enabled Rust feature flags
- enum constructor order
- byte collection formats
- derived evolution history
Type registry
The original Scala desert library includes a type registry for cases where the
concrete type is not known at compile time. A serialized value carries a compact
type id, and the registry maps that id back to a codec during deserialization.
desert-rust does not currently expose an equivalent public type registry API.
This page remains in the book as a placeholder because the type registry is an
important concept in the Scala documentation and a likely future Rust feature.
What to use today
For closed sets of alternatives, use an enum:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum Message {
Ping,
Rename { id: u64, name: String },
Delete { id: u64 },
}
This is the most idiomatic Rust option when all possible variants are known to the crate defining the protocol.
For open sets, define an application-level tag and implement the codec traits manually:
use desert_rust::{
BinaryDeserializer, BinaryOutput, BinarySerializer, DeserializationContext,
Error, Result, SerializationContext,
};
trait PluginMessage {}
struct TextMessage(String);
impl PluginMessage for TextMessage {}
enum AnyMessage {
Text(TextMessage),
}
impl BinarySerializer for AnyMessage {
fn serialize<Output: BinaryOutput>(
&self,
context: &mut SerializationContext<Output>,
) -> Result<()> {
match self {
AnyMessage::Text(TextMessage(text)) => {
1u32.serialize(context)?;
text.serialize(context)
}
}
}
}
impl BinaryDeserializer for AnyMessage {
fn deserialize(context: &mut DeserializationContext<'_>) -> Result<Self> {
match u32::deserialize(context)? {
1 => Ok(AnyMessage::Text(TextMessage(String::deserialize(context)?))),
other => Err(Error::InvalidConstructorId {
constructor_id: other,
type_name: "AnyMessage".to_string(),
}),
}
}
}
Keep ids stable once data has been written. If an id is retired, leave it reserved so newer variants do not take over old meanings.
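When the set of message types is not known to a single crate, the tag-to-decoder mapping can live in an application-level table. The sketch below is one possible shape using only the standard library: Registry, Decoder, decode_text, and the describe method are all hypothetical, payloads are treated as raw UTF-8 for brevity, and none of this is an API provided by desert-rust.

```rust
use std::collections::HashMap;

trait PluginMessage {
    fn describe(&self) -> String;
}

struct TextMessage(String);

impl PluginMessage for TextMessage {
    fn describe(&self) -> String {
        format!("text: {}", self.0)
    }
}

/// A decoder turns a payload into some message, or None if it is malformed.
type Decoder = fn(&[u8]) -> Option<Box<dyn PluginMessage>>;

struct Registry {
    decoders: HashMap<u32, Decoder>,
}

impl Registry {
    fn new() -> Self {
        Registry { decoders: HashMap::new() }
    }

    /// Registers a decoder; returns false if the tag is already taken,
    /// so an existing id can never change meaning.
    fn register(&mut self, tag: u32, decoder: Decoder) -> bool {
        if self.decoders.contains_key(&tag) {
            return false;
        }
        self.decoders.insert(tag, decoder);
        true
    }

    fn decode(&self, tag: u32, payload: &[u8]) -> Option<Box<dyn PluginMessage>> {
        self.decoders.get(&tag).and_then(|decoder| decoder(payload))
    }
}

fn decode_text(payload: &[u8]) -> Option<Box<dyn PluginMessage>> {
    let text = std::str::from_utf8(payload).ok()?;
    Some(Box::new(TextMessage(text.to_string())))
}

fn main() {
    let mut registry = Registry::new();
    assert!(registry.register(1, decode_text));
    assert!(!registry.register(1, decode_text)); // tag 1 stays reserved
    let message = registry.decode(1, b"hello").expect("tag 1 is registered");
    assert_eq!(message.describe(), "text: hello");
    assert!(registry.decode(2, b"hello").is_none());
}
```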
Difference from Scala
In Scala, the type registry is part of the public library API and is used by integrations such as actor serializers. In Rust, framework integration is currently expected to happen at the byte boundary: serialize a statically known type, or define your own dynamic envelope type as shown above.