Codecs and derivation
A codec is the pair of traits that defines how a type is written and read:
use desert_rust::{BinaryDeserializer, BinarySerializer};
trait BinaryCodec: BinarySerializer + BinaryDeserializer {}
In the crate this is a blanket trait: any type that implements both
BinarySerializer and BinaryDeserializer automatically implements BinaryCodec.
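The blanket-implementation pattern can be sketched with stand-in traits. The names `Ser`, `De`, `Codec`, and `Flag` below are hypothetical placeholders chosen for illustration; they are not the actual desert_rust definitions, and the real trait methods have different signatures.

```rust
// Stand-in traits (hypothetical, not the real desert_rust definitions).
trait Ser {
    fn ser(&self) -> Vec<u8>;
}

trait De: Sized {
    fn de(bytes: &[u8]) -> Option<Self>;
}

// The combined trait adds no methods of its own.
trait Codec: Ser + De {}

// Blanket impl: every type that has both halves is a Codec automatically.
impl<T: Ser + De> Codec for T {}

struct Flag(bool);

impl Ser for Flag {
    fn ser(&self) -> Vec<u8> {
        vec![self.0 as u8]
    }
}

impl De for Flag {
    fn de(bytes: &[u8]) -> Option<Self> {
        match bytes {
            [0] => Some(Flag(false)),
            [1] => Some(Flag(true)),
            _ => None,
        }
    }
}

// Accepts Flag only because the blanket impl made it a Codec.
fn round_trip<T: Codec>(value: &T) -> Option<T> {
    T::de(&value.ser())
}
```

No explicit `impl Codec for Flag` is needed; implementing the two halves is enough.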
Built-in codecs
The desert_rust crate re-exports the core implementations. The following
codecs are always available:
- integers: u8, i8, u16, i16, u32, i32, u64, i64, u128, i128, usize, isize
- non-zero integers from std::num
- floats: f32, f64
- bool, (), char, String, str
- std::time::Duration
- Option<T> and Result<T, E>
- std::ops::Bound<T> and Range<T>
- bytes::Bytes
- arrays, Vec<T>, VecDeque<T>, LinkedList<T>
- HashSet<T>, BTreeSet<T>, HashMap<K, V>, BTreeMap<K, V>
- Box<T>, Rc<T>, Arc<T>, references, and PhantomData<T>
- std::net::IpAddr
- tuples from arity 1 to 8
Feature flags control codecs for third-party types:
| Feature | Types |
|---|---|
| bigdecimal | bigdecimal::BigDecimal, bigdecimal::num_bigint::BigInt |
| bit-vec | bit_vec::BitVec |
| chrono | chrono dates, times, offsets, chrono_tz::Tz |
| mac_address | mac_address::MacAddress |
| nonempty-collections | nonempty_collections::NEVec<T> |
| serde-json | serde_json::Value |
| url | url::Url |
| uuid | uuid::Uuid |
The facade currently pulls in the desert_core default feature set, so
bigdecimal, chrono, uuid, nonempty-collections, and serde-json are
enabled by default. Enable bit-vec, mac_address, or url explicitly when
you need those codecs.
The same generator is used for optional third-party codecs:
uuid
let value = uuid::Uuid::from_bytes([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F, 0x10]
| Bytes | Field | Description |
|---|---|---|
| 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 | uuid | raw UUID bytes |
chrono::NaiveDate
let value = chrono::NaiveDate::from_ymd_opt(2024, 6, 22).unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0xE8, 0x0F, 0x06, 0x16]
| Bytes | Field | Description |
|---|---|---|
| E8 0F | year | year as var_u32 |
| 06 | month | month byte |
| 16 | day | day byte |
chrono::NaiveTime
let value = chrono::NaiveTime::from_hms_nano_opt(9, 30, 5, 125).unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x09, 0x1E, 0x05, 0x7D]
| Bytes | Field | Description |
|---|---|---|
| 09 | hour | hour byte |
| 1E | minute | minute byte |
| 05 | second | second byte |
| 7D | nanos | nanosecond fraction as var_u32 |
bigdecimal
let value: bigdecimal::BigDecimal = "123.45".parse().unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x0C, 0x31, 0x32, 0x33, 0x2E, 0x34, 0x35]
| Bytes | Field | Description |
|---|---|---|
| 0C | length | byte length encoded as var_i32 |
| 31 32 33 2E 34 35 | decimal | UTF-8 bytes |
bit-vec
let value = bit_vec::BitVec::from_bytes(&[0b1010_0000]);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x01, 0xA0]
| Bytes | Field | Description |
|---|---|---|
| 01 | length | byte count for packed bits |
| A0 | bits | packed bit payload |
mac_address
let value = mac_address::MacAddress::new([0, 17, 34, 51, 68, 85]);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x00, 0x11, 0x22, 0x33, 0x44, 0x55]
| Bytes | Field | Description |
|---|---|---|
| 00 11 22 33 44 55 | mac | six raw address bytes |
url
let value = url::Url::parse("https://desert-rust.vigoo.dev/").unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x3C, 0x68, 0x74, 0x74, 0x70, 0x73, 0x3A, 0x2F, 0x2F, 0x64, 0x65, 0x73, 0x65, 0x72, 0x74, 0x2D, 0x72, 0x75, 0x73, 0x74, 0x2E, 0x76, 0x69, 0x67, 0x6F, 0x6F, 0x2E, 0x64, 0x65, 0x76, 0x2F]
| Bytes | Field | Description |
|---|---|---|
| 3C | length | byte length encoded as var_i32 |
| 68 74 74 70 73 3A 2F 2F 64 65 73 65 72 74 2D 72 75 73 74 2E 76 69 67 6F 6F 2E 64 65 76 2F | URL | UTF-8 bytes |
serde_json
let value = serde_json::json!({ "a": 1 });
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x07, 0x7B, 0x22, 0x61, 0x22, 0x3A, 0x31, 0x7D]
| Bytes | Field | Description |
|---|---|---|
| 07 | length | JSON byte count as var_u32 |
| 7B 22 61 22 3A 31 7D | JSON | compact JSON bytes |
nonempty-collections
let value = nonempty_collections::NEVec::try_from_vec(vec![1u8, 2, 3]).unwrap();
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x03, 0x01, 0x02, 0x03]
| Bytes | Field | Description |
|---|---|---|
| 03 | length | non-empty byte count as var_u32 |
| 01 02 03 | bytes | raw byte payload |
Primitive representation
Fixed-width numeric types are written in big-endian byte order:
use desert_rust::{serialize_to_byte_vec, Result};
fn main() -> Result<()> {
    assert_eq!(serialize_to_byte_vec(&100u16)?, vec![0, 100]);
    assert_eq!(serialize_to_byte_vec(&100u32)?, vec![0, 0, 0, 100]);
    Ok(())
}
bool is encoded as a single byte: 0 for false, 1 for true.
String and str are encoded as a variable-length signed byte count followed
by UTF-8 bytes.
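The byte values shown on this page (for example 0x0C as the length prefix of a six-byte string) are consistent with a zigzag-mapped integer written in 7-bit groups, low group first. The following is a minimal sketch under that assumption; the helper names `write_var_u32`, `write_var_i32`, and `write_string` are illustrative and are not part of the desert_rust API.

```rust
/// Unsigned varint: 7 bits per byte, low group first; the high bit means "more follows".
fn write_var_u32(mut value: u32, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7F) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Signed varint: zigzag-map the value (0, -1, 1, -2 become 0, 1, 2, 3),
/// then write the result as an unsigned varint.
fn write_var_i32(value: i32, out: &mut Vec<u8>) {
    let zigzag = ((value << 1) ^ (value >> 31)) as u32;
    write_var_u32(zigzag, out);
}

/// String layout described above: signed variable-length byte count, then UTF-8 bytes.
fn write_string(text: &str, out: &mut Vec<u8>) {
    write_var_i32(text.len() as i32, out);
    out.extend_from_slice(text.as_bytes());
}
```

Under these assumptions a length of 6 becomes the single byte 0x0C, and the code point of 'λ' (U+03BB) becomes the two bytes BB 07, matching the examples on this page.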
i32
let bytes = desert_rust::serialize_to_byte_vec(&42i32)?;
[0x00, 0x00, 0x00, 0x2A]
| Bytes | Field | Description |
|---|---|---|
| 00 00 00 2A | i32 | fixed-width big-endian signed integer |
u16
let bytes = desert_rust::serialize_to_byte_vec(&1000u16)?;
[0x03, 0xE8]
| Bytes | Field | Description |
|---|---|---|
| 03 E8 | u16 | fixed-width big-endian unsigned integer |
bool
let bytes = desert_rust::serialize_to_byte_vec(&true)?;
[0x01]
| Bytes | Field | Description |
|---|---|---|
| 01 | true | one byte: 1 for true, 0 for false |
unit
let bytes = desert_rust::serialize_to_byte_vec(&())?;
[]
char
let bytes = desert_rust::serialize_to_byte_vec(&'λ')?;
[0xBB, 0x07]
| Bytes | Field | Description |
|---|---|---|
| BB 07 | code point | Unicode scalar value written as var_u32 |
String
let bytes = desert_rust::serialize_to_byte_vec(&"desert".to_string())?;
[0x0C, 0x64, 0x65, 0x73, 0x65, 0x72, 0x74]
| Bytes | Field | Description |
|---|---|---|
| 0C | length | byte length encoded as var_i32 |
| 64 65 73 65 72 74 | UTF-8 | UTF-8 bytes |
Option<T> and Result<T, E> start with a single tag byte and then write only
the payload selected by that tag:
Option::Some
let bytes = desert_rust::serialize_to_byte_vec(&Some(7i32))?;
[0x01, 0x00, 0x00, 0x00, 0x07]
| Bytes | Field | Description |
|---|---|---|
| 01 | Some | presence marker |
| 00 00 00 07 | value | inner i32 payload |
Option::None
let bytes = desert_rust::serialize_to_byte_vec(&Option::<i32>::None)?;
[0x00]
| Bytes | Field | Description |
|---|---|---|
| 00 | None | absence marker |
Result::Ok
let value: Result<i32, String> = Ok(7);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x01, 0x00, 0x00, 0x00, 0x07]
| Bytes | Field | Description |
|---|---|---|
| 01 | Ok | result marker |
| 00 00 00 07 | value | success payload |
Result::Err
let value: Result<i32, String> = Err("no".to_string());
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x00, 0x04, 0x6E, 0x6F]
| Bytes | Field | Description |
|---|---|---|
| 00 | Err | result marker |
| 04 | length | byte length encoded as var_i32 |
| 6E 6F | error | UTF-8 bytes |
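The tag-then-payload layout of Option and Result can be sketched by hand for i32 payloads. This is an illustrative re-implementation, not desert_rust code; the function names are hypothetical, and the error type is simplified to i32 to keep the sketch short.

```rust
/// Option<i32>: tag 0 for None, tag 1 followed by the big-endian payload for Some.
fn encode_option(value: Option<i32>) -> Vec<u8> {
    match value {
        None => vec![0],
        Some(inner) => {
            let mut out = vec![1];
            out.extend_from_slice(&inner.to_be_bytes());
            out
        }
    }
}

/// Result<i32, i32>: tag 1 for Ok, tag 0 for Err, then only the selected payload.
fn encode_result(value: Result<i32, i32>) -> Vec<u8> {
    let (tag, payload) = match value {
        Ok(inner) => (1u8, inner),
        Err(inner) => (0u8, inner),
    };
    let mut out = vec![tag];
    out.extend_from_slice(&payload.to_be_bytes());
    out
}

/// Reads the tag first, then decodes only the payload that the tag selects.
/// The outer Option signals malformed input.
fn decode_option(input: &[u8]) -> Option<Option<i32>> {
    match input {
        [0] => Some(None),
        [1, a, b, c, d] => Some(Some(i32::from_be_bytes([*a, *b, *c, *d]))),
        _ => None,
    }
}
```

Because the tag comes first, a reader never has to look at bytes for the variant that was not written.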
Vec<u8>, [u8], [u8; N], bytes::Bytes, and NEVec<u8> use an optimized
byte-block encoding: a variable-length unsigned length followed by raw bytes.
This is intentionally compatible with the Scala library’s byte chunk format.
Vec<u8>
let bytes = desert_rust::serialize_to_byte_vec(&vec![1u8, 2, 3, 4])?;
[0x04, 0x01, 0x02, 0x03, 0x04]
| Bytes | Field | Description |
|---|---|---|
| 04 | length | var_u32 byte count |
| 01 02 03 04 | bytes | raw byte payload |
bytes::Bytes
let value = bytes::Bytes::from_static(b"abc");
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x03, 0x61, 0x62, 0x63]
| Bytes | Field | Description |
|---|---|---|
| 03 | length | var_u32 byte count |
| 61 62 63 | bytes | raw byte payload |
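The byte-block layout (unsigned variable-length count, then the raw bytes) can be sketched as follows. The varint format is assumed to use 7-bit groups with the low group first, which agrees with the single-byte lengths shown above; the helper names are illustrative and are not desert_rust functions.

```rust
/// Unsigned varint: 7 bits per byte, low group first; the high bit means "more follows".
fn write_var_u32(mut value: u32, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7F) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Returns the decoded value and the number of bytes consumed.
fn read_var_u32(input: &[u8]) -> Option<(u32, usize)> {
    let mut value = 0u32;
    for (index, byte) in input.iter().enumerate().take(5) {
        value |= u32::from(byte & 0x7F) << (7 * index);
        if byte & 0x80 == 0 {
            return Some((value, index + 1));
        }
    }
    None
}

/// Byte block: length prefix, then the payload copied as-is.
fn encode_byte_block(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 1);
    write_var_u32(payload.len() as u32, &mut out);
    out.extend_from_slice(payload);
    out
}

/// Reads the length prefix and returns a borrowed view of the payload.
fn decode_byte_block(input: &[u8]) -> Option<&[u8]> {
    let (len, used) = read_var_u32(input)?;
    input.get(used..used + len as usize)
}
```

The payload is copied without per-element framing, which is what makes this cheaper than the generic collection encoding.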
Collections
All generic iterable collection codecs share the same representation:
- If the iterator reports an exact size, desert writes that size as a variable-length signed integer and then all elements.
- If the size is not known, desert writes -1, then each element prefixed by a 1 byte, then a final 0 byte.
Because the representation is shared, many collection changes are binary
compatible. For example, a Vec<i32> can be read as a LinkedList<i32>, and a
BTreeSet<i32> can be read as a HashSet<i32>, as long as the target
collection’s type constraints are satisfied.
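The two shapes described above can be sketched for i32 elements. This is a hand-written illustration rather than desert_rust code: the helper names are hypothetical, the signed varint is assumed to be zigzag-mapped with 7-bit groups, and the -1 marker is assumed to be written with that same varint.

```rust
use std::collections::LinkedList;

/// Signed varint: zigzag mapping, then 7-bit groups with the low group first.
fn write_var_i32(value: i32, out: &mut Vec<u8>) {
    let mut raw = ((value << 1) ^ (value >> 31)) as u32;
    while raw >= 0x80 {
        out.push((raw as u8 & 0x7F) | 0x80);
        raw >>= 7;
    }
    out.push(raw as u8);
}

fn read_var_i32(input: &[u8], pos: &mut usize) -> Option<i32> {
    let mut raw = 0u32;
    for shift in (0u32..35).step_by(7) {
        let byte = *input.get(*pos)?;
        *pos += 1;
        raw |= u32::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some(((raw >> 1) as i32) ^ -((raw & 1) as i32));
        }
    }
    None
}

fn read_i32(input: &[u8], pos: &mut usize) -> Option<i32> {
    let b = input.get(*pos..*pos + 4)?;
    *pos += 4;
    Some(i32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Known size: count, then the elements.
fn encode_sized(items: &[i32]) -> Vec<u8> {
    let mut out = Vec::new();
    write_var_i32(items.len() as i32, &mut out);
    for item in items {
        out.extend_from_slice(&item.to_be_bytes());
    }
    out
}

/// Unknown size: -1, a 1 byte before each element, and a closing 0 byte.
fn encode_unsized(items: impl Iterator<Item = i32>) -> Vec<u8> {
    let mut out = Vec::new();
    write_var_i32(-1, &mut out);
    for item in items {
        out.push(1);
        out.extend_from_slice(&item.to_be_bytes());
    }
    out.push(0);
    out
}

/// Accepts either shape and yields the elements in order.
fn decode(input: &[u8]) -> Option<Vec<i32>> {
    let mut pos = 0;
    let count = read_var_i32(input, &mut pos)?;
    let mut items = Vec::new();
    if count >= 0 {
        for _ in 0..count {
            items.push(read_i32(input, &mut pos)?);
        }
    } else {
        loop {
            let marker = *input.get(pos)?;
            pos += 1;
            match marker {
                0 => break,
                1 => items.push(read_i32(input, &mut pos)?),
                _ => return None,
            }
        }
    }
    Some(items)
}

/// The reader picks the target collection; the bytes do not name one.
fn decode_linked_list(input: &[u8]) -> Option<LinkedList<i32>> {
    Some(decode(input)?.into_iter().collect())
}
```

Since the bytes carry only a count (or markers) and the elements, data written from a slice can be collected into a LinkedList on the reading side.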
Vec<i32>
let bytes = desert_rust::serialize_to_byte_vec(&vec![1i32, 2, 3])?;
[0x06, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x03]
| Bytes | Field | Description |
|---|---|---|
| 06 | count | exact-size iterable count as var_i32 |
| 00 00 00 01 | item 0 | first i32 |
| 00 00 00 02 | item 1 | second i32 |
| 00 00 00 03 | item 2 | third i32 |
BTreeMap<String, i32>
let value = std::collections::BTreeMap::from([
    ("a".to_string(), 1i32),
    ("b".to_string(), 2i32),
]);
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x04, 0x00, 0x02, 0x61, 0x00, 0x00, 0x00, 0x01, 0x00, 0x02, 0x62, 0x00, 0x00, 0x00, 0x02]
| Bytes | Field | Description |
|---|---|---|
| 04 | count | exact-size iterable count as var_i32 |
| 00 | entry | tuple marker for key/value pair |
| 02 61 | key a | string key: length plus UTF-8 |
| 00 00 00 01 | value 1 | i32 map value |
| 00 | entry | tuple marker for key/value pair |
| 02 62 | key b | string key: length plus UTF-8 |
| 00 00 00 02 | value 2 | i32 map value |
(i32, bool)
let bytes = desert_rust::serialize_to_byte_vec(&(42i32, true))?;
[0x00, 0x00, 0x00, 0x00, 0x2A, 0x01]
| Bytes | Field | Description |
|---|---|---|
| 00 | version | tuple payload marker compatible with version-0 structs |
| 00 00 00 2A | field 0 | first tuple item |
| 01 | field 1 | second tuple item |
Deriving structs
Use #[derive(BinaryCodec)] for ordinary named-field structs:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
struct User {
    id: u64,
    name: String,
    email: Option<String>,
}
The generated format starts with a version byte. Version 0 structs are
compatible with tuples of the same field order and arity, which allows simple
tuple-to-struct migrations.
derived struct
#[derive(Debug, Clone, PartialEq, desert_rust::BinaryCodec)]
struct User {
    id: u32,
    name: String,
    email: Option<String>,
}
let value = User {
    id: 7,
    name: "Ada".to_string(),
    email: None,
};
let bytes = desert_rust::serialize_to_byte_vec(&value)?;
[0x00, 0x00, 0x00, 0x00, 0x07, 0x06, 0x41, 0x64, 0x61, 0x00]
| Bytes | Field | Description |
|---|---|---|
| 00 | version | version-0 struct marker |
| 00 00 00 07 | id | u32 field |
| 06 | name length | string byte count as var_i32 |
| 41 64 61 | name UTF-8 | string bytes |
| 00 | email | Option::None marker |
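The tuple compatibility of version-0 structs follows from both shapes writing the same bytes: a leading 0 byte, then the fields in order. The sketch below encodes the struct and the matching tuple by hand to make that visible. It does not use desert_rust; the helper names are hypothetical and the string length is assumed to be a zigzag-mapped varint.

```rust
struct User {
    id: u32,
    name: String,
    email: Option<String>,
}

/// Signed varint: zigzag mapping, then 7-bit groups with the low group first.
fn write_var_i32(value: i32, out: &mut Vec<u8>) {
    let mut raw = ((value << 1) ^ (value >> 31)) as u32;
    while raw >= 0x80 {
        out.push((raw as u8 & 0x7F) | 0x80);
        raw >>= 7;
    }
    out.push(raw as u8);
}

fn write_string(text: &str, out: &mut Vec<u8>) {
    write_var_i32(text.len() as i32, out);
    out.extend_from_slice(text.as_bytes());
}

/// Shared layout: version byte 0, then each field in declaration order.
fn encode_fields(id: u32, name: &str, email: Option<&str>) -> Vec<u8> {
    let mut out = vec![0];
    out.extend_from_slice(&id.to_be_bytes());
    write_string(name, &mut out);
    match email {
        Some(text) => {
            out.push(1);
            write_string(text, &mut out);
        }
        None => out.push(0),
    }
    out
}

fn encode_user(user: &User) -> Vec<u8> {
    encode_fields(user.id, &user.name, user.email.as_deref())
}

fn encode_tuple(tuple: &(u32, String, Option<String>)) -> Vec<u8> {
    encode_fields(tuple.0, &tuple.1, tuple.2.as_deref())
}
```

Field names never reach the wire in this layout, so only field order and arity have to match.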
For generic types, the derive macro adds BinarySerializer and BinaryDeserializer bounds to the generic parameters:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
struct Wrapper<T> {
    value: T,
}
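What such a bound looks like can be sketched with a stand-in trait. `Ser` below is a hypothetical placeholder, not a desert_rust trait, and the hand-written impl only approximates the serializer half of what a derive macro might generate; the leading 0 byte mirrors the version-0 struct layout shown earlier.

```rust
// Stand-in serializer trait (hypothetical, not the desert_rust trait).
trait Ser {
    fn ser(&self, out: &mut Vec<u8>);
}

impl Ser for u16 {
    fn ser(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_be_bytes());
    }
}

impl Ser for bool {
    fn ser(&self, out: &mut Vec<u8>) {
        out.push(*self as u8);
    }
}

struct Wrapper<T> {
    value: T,
}

// The `T: Ser` bound is the part a derive has to add: without it,
// `self.value.ser(out)` would not type-check for an arbitrary T.
impl<T: Ser> Ser for Wrapper<T> {
    fn ser(&self, out: &mut Vec<u8>) {
        out.push(0); // version byte, assumed to match the version-0 struct layout
        self.value.ser(out);
    }
}

fn to_bytes<T: Ser>(value: &T) -> Vec<u8> {
    let mut out = Vec::new();
    value.ser(&mut out);
    out
}
```

The bound also composes: `Wrapper<Wrapper<bool>>` is serializable because the inner wrapper satisfies `Ser` through the same impl.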
Deriving enums
Enums are encoded as a constructor id followed by constructor payload data:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum Event {
    Started,
    Message(String),
    Moved { x: i32, y: i32 },
}
Constructor ids are assigned from the enum variant order, skipping transient variants. Adding new variants at the end is compatible with old data, but old code cannot read values using the new variant.
derived enum
#[derive(Debug, Clone, PartialEq, desert_rust::BinaryCodec)]
enum Event {
    Started,
    Message(String),
    Moved { x: i32, y: i32 },
}
let bytes = desert_rust::serialize_to_byte_vec(&Event::Message("hi".to_string()))?;
[0x00, 0x01, 0x00, 0x04, 0x68, 0x69]
| Bytes | Field | Description |
|---|---|---|
| 00 | version | outer enum version |
| 01 | constructor | variant id as var_u32 |
| 00 | case version | variant payload version |
| 04 | length | byte length encoded as var_i32 |
| 68 69 | payload | UTF-8 bytes |
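The enum layout can be reproduced by hand for the Message case. Only that case is confirmed by the bytes on this page; the layouts used below for Started and Moved (a case version byte followed by the fields) are assumptions made by analogy. The helper names are hypothetical and the varints are assumed to use 7-bit groups with zigzag mapping for signed values.

```rust
enum Event {
    Started,
    Message(String),
    Moved { x: i32, y: i32 },
}

fn write_var_u32(mut value: u32, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7F) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

fn write_string(text: &str, out: &mut Vec<u8>) {
    // Signed length: zigzag-map, then write as an unsigned varint.
    let len = text.len() as i32;
    write_var_u32(((len << 1) ^ (len >> 31)) as u32, out);
    out.extend_from_slice(text.as_bytes());
}

/// Ids follow declaration order, starting at 0.
fn constructor_id(event: &Event) -> u32 {
    match event {
        Event::Started => 0,
        Event::Message(_) => 1,
        Event::Moved { .. } => 2,
    }
}

fn encode_event(event: &Event) -> Vec<u8> {
    let mut out = vec![0]; // outer enum version
    write_var_u32(constructor_id(event), &mut out);
    out.push(0); // case version (assumed to be present for every variant)
    match event {
        Event::Started => {}
        Event::Message(text) => write_string(text, &mut out),
        Event::Moved { x, y } => {
            out.extend_from_slice(&x.to_be_bytes());
            out.extend_from_slice(&y.to_be_bytes());
        }
    }
    out
}
```

Appending a variant gives it the next unused id, which is why existing data stays readable while older readers fail on the id they do not know.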
You can ask the derive macro to assign constructor ids by sorted variant name:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
#[desert(sorted_constructors)]
enum StableByName {
    B,
    A,
}
Use this only when all versions agree on the same naming scheme. Reordering
without sorted_constructors changes constructor ids and breaks compatibility.
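The difference between the two id schemes can be sketched with a small function over variant names. `assign_ids` is a hypothetical helper, not part of the derive macro, and plain lexicographic sorting is an assumption; the exact ordering rule used by sorted_constructors is not spelled out here.

```rust
/// Assigns constructor ids starting at 0, either in declaration order
/// or in sorted-name order.
fn assign_ids(variants: &[&str], sorted: bool) -> Vec<(String, u32)> {
    let mut names: Vec<String> = variants.iter().map(|name| name.to_string()).collect();
    if sorted {
        names.sort();
    }
    names.into_iter().zip(0u32..).collect()
}
```

With declaration order, `["B", "A"]` gives B the id 0; with sorting, A gets id 0 regardless of where it appears in the source, so reordering the variants leaves the ids unchanged.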
Transparent wrappers
Single-field structs can be encoded exactly as their inner type:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
#[desert(transparent)]
struct UserId(u64);
This is the Rust equivalent of using the Scala wrapper derivation. It is useful when a primitive value is promoted to a domain-specific newtype without changing the wire format.
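The effect of a transparent wrapper on the wire format can be sketched by hand. The functions below are illustrative and do not use desert_rust; the versioned variant is included only for contrast and assumes that a non-transparent single-field struct follows the version-0 layout (a leading 0 byte, then the field).

```rust
struct UserId(u64);

/// Fixed-width big-endian encoding of the inner value.
fn encode_u64(value: u64) -> Vec<u8> {
    value.to_be_bytes().to_vec()
}

/// Transparent: the wrapper contributes no bytes of its own.
fn encode_transparent(id: &UserId) -> Vec<u8> {
    encode_u64(id.0)
}

/// For contrast: a versioned struct layout adds a leading version byte.
fn encode_versioned(id: &UserId) -> Vec<u8> {
    let mut out = vec![0];
    out.extend_from_slice(&encode_u64(id.0));
    out
}
```

Because the transparent form is byte-identical to the bare u64, data written before the newtype was introduced stays readable.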
Transparent enum variants are also supported for unit variants and single-field variants:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum Value {
    #[desert(transparent)]
    Text(String),
    Structured { value: String },
}
The transparent variant still has an enum constructor id. The attribute affects how the variant payload is encoded.
Transient fields and variants
A transient field is not serialized. Its attribute must supply a default expression, which is used to populate the field during deserialization:
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
struct Cached {
    value: String,
    #[transient(None::<usize>)]
    cached_len: Option<usize>,
}
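The behaviour of a transient field can be sketched with a hand-written encoder and decoder. This is an illustration of the idea, not the code the derive macro generates: the function names are hypothetical, and the exact bytes desert_rust produces for this struct are assumed to follow the version-0 struct layout.

```rust
struct Cached {
    value: String,
    cached_len: Option<usize>, // transient: never written
}

/// Signed varint: zigzag mapping, then 7-bit groups with the low group first.
fn write_var_i32(value: i32, out: &mut Vec<u8>) {
    let mut raw = ((value << 1) ^ (value >> 31)) as u32;
    while raw >= 0x80 {
        out.push((raw as u8 & 0x7F) | 0x80);
        raw >>= 7;
    }
    out.push(raw as u8);
}

fn read_var_i32(input: &[u8], pos: &mut usize) -> Option<i32> {
    let mut raw = 0u32;
    for shift in (0u32..35).step_by(7) {
        let byte = *input.get(*pos)?;
        *pos += 1;
        raw |= u32::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some(((raw >> 1) as i32) ^ -((raw & 1) as i32));
        }
    }
    None
}

fn encode_cached(cached: &Cached) -> Vec<u8> {
    let mut out = vec![0]; // version byte (assumed layout)
    write_var_i32(cached.value.len() as i32, &mut out);
    out.extend_from_slice(cached.value.as_bytes());
    // cached_len is deliberately skipped.
    out
}

fn decode_cached(input: &[u8]) -> Option<Cached> {
    if *input.first()? != 0 {
        return None;
    }
    let mut pos = 1;
    let len = read_var_i32(input, &mut pos)?;
    if len < 0 {
        return None;
    }
    let bytes = input.get(pos..pos + len as usize)?;
    let value = String::from_utf8(bytes.to_vec()).ok()?;
    // The transient field is filled from its default expression.
    Some(Cached { value, cached_len: None::<usize> })
}
```

A round trip therefore resets cached_len to the default, whatever value it held before serialization.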
Transient enum variants are not assigned constructor ids. Serializing such a
variant returns Error::SerializingTransientConstructor.
use desert_rust::BinaryCodec;
#[derive(Debug, Clone, PartialEq, BinaryCodec)]
enum State {
    Stored,
    #[transient]
    RuntimeOnly,
}
Transient variants can be inserted or removed without shifting the ids of stored variants.
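Why stored ids do not shift can be sketched by numbering only the non-transient variants. `stored_ids` is a hypothetical helper, and the `Archived` variant in the usage note is invented for illustration; neither comes from desert_rust.

```rust
/// Takes (variant name, is_transient) pairs in declaration order and
/// numbers only the stored variants, starting at 0.
fn stored_ids(variants: &[(&str, bool)]) -> Vec<(String, u32)> {
    variants
        .iter()
        .filter(|variant| !variant.1)
        .map(|variant| variant.0.to_string())
        .zip(0u32..)
        .collect()
}
```

With the variants Stored, RuntimeOnly (transient), and Archived, the result is Stored = 0 and Archived = 1, the same as if RuntimeOnly were not declared at all.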
Custom codecs
Implement BinarySerializer and BinaryDeserializer manually when the derived
format is not appropriate:
use desert_rust::{
    BinaryDeserializer, BinaryOutput, BinarySerializer, DeserializationContext,
    Result, SerializationContext,
};
#[derive(Debug, PartialEq)]
struct Lowercase(String);
impl BinarySerializer for Lowercase {
    fn serialize<Output: BinaryOutput>(
        &self,
        context: &mut SerializationContext<Output>,
    ) -> Result<()> {
        self.0.to_lowercase().serialize(context)
    }
}
impl BinaryDeserializer for Lowercase {
    fn deserialize(context: &mut DeserializationContext<'_>) -> Result<Self> {
        Ok(Self(String::deserialize(context)?))
    }
}
The enum derive macro can also wrap a single-field variant through a custom type. The wrapper type must be constructible from a borrowed value in the shape expected by the macro, so this is mainly useful for specialized string wrappers.
String deduplication
Normal String serialization does not deduplicate values. This keeps schema
evolution safe: when an older reader skips a newly added string field, it does
not accidentally miss a string id assignment needed by a later field.
For streams where the writer and reader agree that deduplication is safe, wrap
values in DeduplicatedString:
use bytes::BytesMut;
use desert_rust::{
    BinaryDeserializer, BinarySerializer, DeduplicatedString, DeserializationContext,
    Options, Result, SerializationContext,
};
fn main() -> Result<()> {
    let mut output = SerializationContext::new(BytesMut::new(), Options::default());
    DeduplicatedString("same".to_string()).serialize(&mut output)?;
    DeduplicatedString("same".to_string()).serialize(&mut output)?;
    let bytes = output.into_output();
    let mut input = DeserializationContext::new(&bytes, Options::default());
    let first = DeduplicatedString::deserialize(&mut input)?.0;
    let second = DeduplicatedString::deserialize(&mut input)?.0;
    assert_eq!(first, second);
    Ok(())
}
The first occurrence is encoded like a normal string. Later occurrences in the same serialization context are encoded as a negative id.
DeduplicatedString
let mut context = desert_rust::SerializationContext::new(
    Vec::new(),
    desert_rust::Options::default(),
);
desert_rust::DeduplicatedString("same".to_string()).serialize(&mut context)?;
desert_rust::DeduplicatedString("same".to_string()).serialize(&mut context)?;
let bytes = context.into_output();
[0x08, 0x73, 0x61, 0x6D, 0x65, 0x01]
| Bytes | Field | Description |
|---|---|---|
| 08 | first length | first occurrence uses normal string length |
| 73 61 6D 65 | first UTF-8 | first string bytes |
| 01 | repeat id | negative string id encoded as var_i32 |
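The deduplication scheme can be sketched with a small writer that remembers the strings it has already emitted. `DedupWriter` is a hypothetical type, not the desert_rust implementation. The numbering is an assumption: the first distinct string is given id 1 and a repeat is written as the negated id, which reproduces the 0x01 byte above, but how the library numbers later distinct strings is not shown on this page.

```rust
use std::collections::HashMap;

/// Signed varint: zigzag mapping, then 7-bit groups with the low group first.
fn write_var_i32(value: i32, out: &mut Vec<u8>) {
    let mut raw = ((value << 1) ^ (value >> 31)) as u32;
    while raw >= 0x80 {
        out.push((raw as u8 & 0x7F) | 0x80);
        raw >>= 7;
    }
    out.push(raw as u8);
}

struct DedupWriter {
    ids: HashMap<String, i32>,
    out: Vec<u8>,
}

impl DedupWriter {
    fn new() -> Self {
        DedupWriter { ids: HashMap::new(), out: Vec::new() }
    }

    fn write(&mut self, text: &str) {
        match self.ids.get(text).copied() {
            // Seen before: write only the negated id.
            Some(id) => write_var_i32(-id, &mut self.out),
            // First occurrence: remember it, then write it like a normal string.
            None => {
                let id = self.ids.len() as i32 + 1; // assumed numbering
                self.ids.insert(text.to_string(), id);
                write_var_i32(text.len() as i32, &mut self.out);
                self.out.extend_from_slice(text.as_bytes());
            }
        }
    }
}
```

A non-negative length and a negative id share one varint slot, so a reader can tell the two cases apart from the sign alone. It also shows why skipping a deduplicated field is unsafe: a reader that skips the first occurrence never records the id that later references rely on.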