Codify-rs: Embedding Complex Data Structures at Compile Time Without Serialization Overhead

Hook

What if your application could access complex configuration data at startup with no parsing, no file I/O, and no serde, just plain structs built by code compiled directly into your binary?

Context

Most Rust applications that need to bundle data face a familiar trade-off. They can embed raw files with include_bytes! and parse them at runtime, which means paying the parsing cost on every startup. Or they can hand-write verbose initialization code that is error-prone and tedious to maintain. Libraries like serde have made runtime deserialization elegant, but they cannot remove that per-startup cost.

Codify-rs takes a different approach. Instead of serializing data to JSON/TOML/YAML and deserializing it at runtime, it generates actual Rust initialization code during the build. Your structs and enums are serialized into human-readable Rust code that is compiled directly into your binary. The result? There is no runtime parsing and no parsing dependency in your final executable, and your data structures are built by ordinary compiled code. Heap-backed fields such as String, Vec, and HashMap still have to be allocated at runtime, but nothing has to be read from disk or parsed. This is particularly valuable for applications with large configuration schemas, lookup tables, or resource definitions that never change after compilation.
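To make the idea concrete, here is a std-only sketch of the underlying technique, independent of the library. It uses a hypothetical ToInitCode trait (not codify's actual API) that renders a value as the Rust source of an expression that rebuilds it. The DatabaseConfig type mirrors the example used later in this article.

```rust
use std::collections::HashMap;

/// Hypothetical trait (not the codify crate's API): renders a value as the
/// Rust source of an expression that rebuilds it.
trait ToInitCode {
    fn to_init_code(&self) -> String;
}

// Integers: emit a typed literal such as `20u32`, so the generated code
// does not depend on type inference at the include site.
macro_rules! impl_int {
    ($($t:ident),*) => {
        $(impl ToInitCode for $t {
            fn to_init_code(&self) -> String {
                format!("{}{}", self, stringify!($t))
            }
        })*
    };
}
impl_int!(i32, u32, u64);

impl ToInitCode for bool {
    fn to_init_code(&self) -> String {
        self.to_string()
    }
}

impl ToInitCode for String {
    fn to_init_code(&self) -> String {
        // Debug formatting yields a quoted, escaped literal that is valid Rust.
        format!("String::from({:?})", self)
    }
}

impl<T: ToInitCode> ToInitCode for Vec<T> {
    fn to_init_code(&self) -> String {
        let items: Vec<String> = self.iter().map(|item| item.to_init_code()).collect();
        format!("vec![{}]", items.join(", "))
    }
}

impl<V: ToInitCode> ToInitCode for HashMap<String, V> {
    fn to_init_code(&self) -> String {
        // HashMap iteration order differs between runs; sorting by key keeps
        // the generated file identical from build to build.
        let mut entries: Vec<(&String, &V)> = self.iter().collect();
        entries.sort_by(|a, b| a.0.cmp(b.0));
        let items: Vec<String> = entries
            .iter()
            .map(|(k, v)| format!("({}, {})", k.to_init_code(), v.to_init_code()))
            .collect();
        format!("HashMap::from([{}])", items.join(", "))
    }
}

struct DatabaseConfig {
    url: String,
    pool_size: u32,
    timeout_ms: u64,
}

// The per-struct impl that a derive macro would typically write for you.
impl ToInitCode for DatabaseConfig {
    fn to_init_code(&self) -> String {
        format!(
            "DatabaseConfig {{ url: {}, pool_size: {}, timeout_ms: {} }}",
            self.url.to_init_code(),
            self.pool_size.to_init_code(),
            self.timeout_ms.to_init_code(),
        )
    }
}
```

The per-type impls are the part a code generator automates. The rest of the pipeline is just writing the resulting string to a file and compiling it.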

Technical Insight

Codify-rs operates through a two-phase architecture built on Rust's build script system. During the build phase, your build.rs script constructs values of types that implement the Codify trait and uses it to generate .rs files containing initialization code. These files are written to OUT_DIR, and the include! macro pulls them into your source when the main crate is compiled.
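Stripped of the library, phase one is just writing a string of Rust code to a file. The following is a minimal std-only sketch. The helper name write_generated is hypothetical, and the generated code is passed in as a string rather than produced by codify.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical build.rs helper for phase one: write generated Rust source
/// into `out_dir` (Cargo's OUT_DIR in a real build script) and return the
/// file's path so it can be logged or inspected.
fn write_generated(out_dir: &Path, file_name: &str, code: &str) -> io::Result<PathBuf> {
    let path = out_dir.join(file_name);
    fs::write(&path, code)?;
    Ok(path)
}

// Phase one, in build.rs (Cargo sets OUT_DIR while the build script runs):
//
//     let out_dir = PathBuf::from(std::env::var("OUT_DIR").unwrap());
//     write_generated(&out_dir, "config_generated.rs", &generated_code).unwrap();
//
// Phase two, in the application (env! reads OUT_DIR at compile time):
//
//     let config = include!(concat!(env!("OUT_DIR"), "/config_generated.rs"));
```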

Here's a practical example. Imagine you have a configuration schema with nested structures:

use std::collections::HashMap;

#[derive(Codify)]
struct AppConfig {
    database: DatabaseConfig,
    features: Vec<Feature>,
    constants: HashMap<String, i32>,
}

#[derive(Codify)]
struct DatabaseConfig {
    url: String,
    pool_size: u32,
    timeout_ms: u64,
}

#[derive(Codify)]
struct Feature {
    name: String,
    enabled: bool,
}

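In your build.rs, you'd instantiate this configuration and generate the code. Because build.rs is compiled as a separate program, it needs access to the same type definitions as your crate. One common approach is to keep them in a single file and pull it into both places with a #[path = "..."] module: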

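use codify::Codify;
use std::collections::HashMap;
use std::env;
use std::fs::File;
use std::path::PathBuf;

// AppConfig, DatabaseConfig and Feature must be visible here as well:
// build.rs is compiled separately from your crate.

fn main() {
    let config = AppConfig {
        database: DatabaseConfig {
            url: "postgres://localhost/myapp".to_string(),
            pool_size: 20,
            timeout_ms: 5000,
        },
        features: vec![
            Feature { name: "analytics".to_string(), enabled: true },
            Feature { name: "experimental".to_string(), enabled: false },
        ],
        constants: HashMap::from([
            ("max_retries".to_string(), 3),
            ("batch_size".to_string(), 100),
        ]),
    };

    // Cargo sets OUT_DIR for every package that has a build script.
    let out_dir = PathBuf::from(env::var("OUT_DIR").expect("OUT_DIR is set by Cargo"));
    let config_file = out_dir.join("config_generated.rs");

    let mut file = File::create(&config_file).expect("failed to create generated file");
    // Writes the initialization expression. The method name and signature follow
    // this article's usage, so check them against the crate's documentation.
    config.codify(&mut file).expect("failed to write generated code");
}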

The generated config_generated.rs file contains pure Rust initialization code that looks something like:

AppConfig {
    database: DatabaseConfig {
        url: String::from("postgres://localhost/myapp"),
        pool_size: 20u32,
        timeout_ms: 5000u64,
    },
    features: vec![
        Feature { name: String::from("analytics"), enabled: true },
        Feature { name: String::from("experimental"), enabled: false },
    ],
    constants: HashMap::from([
        (String::from("max_retries"), 3i32),
        (String::from("batch_size"), 100i32),
    ]),
}

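In your application code, you include this generated file. Because AppConfig owns a String, a Vec, and a HashMap, all of which allocate, it cannot be stored in a const. A lazily initialized static (std::sync::LazyLock, stable since Rust 1.80) builds it once, on first access. The generated code names HashMap unqualified, so HashMap must be in scope at the include site:

use std::collections::HashMap;
use std::sync::LazyLock;

// AppConfig, DatabaseConfig and Feature must be defined or imported here too.
static APP_CONFIG: LazyLock<AppConfig> =
    LazyLock::new(|| include!(concat!(env!("OUT_DIR"), "/config_generated.rs")));

fn main() {
    // Built from compiled Rust code on first access; nothing is parsed at runtime
    println!("Pool size: {}", APP_CONFIG.database.pool_size);
}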

The beauty of this approach is that the Rust compiler performs all validation. If your generated code references a type that doesn't exist, or if field types don't match, you get a compile-time error rather than a runtime panic. The generated code is also human-readable, making it easy to debug build issues.

One particularly clever aspect is how this approach integrates with Rust's const evaluation. When every field is const-constructible (integers, bools, &'static str, and fixed-size arrays or slices of these), the generated code can be used in const or static items. The data is then laid out in the binary itself instead of being built on the heap at startup. Types such as String, Vec, and HashMap allocate, so they cannot be constructed in a const. Traditional deserialization cannot offer this at all, because it has to run at runtime.
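As a sketch of what const-compatible generated data could look like, the hypothetical FeatureFlag type below stores its name as a &'static str instead of a String. That substitution is what allows the whole table to live in a const.

```rust
/// Hypothetical const-friendly counterpart of `Feature`: `&'static str`
/// replaces `String`, so no heap allocation is needed.
#[derive(Debug, Clone, Copy)]
struct FeatureFlag {
    name: &'static str,
    enabled: bool,
}

// Code in this shape (literals, static strings, slices) is what a generator
// would have to emit for the data to be usable in a `const` item.
const FEATURE_FLAGS: &[FeatureFlag] = &[
    FeatureFlag { name: "analytics", enabled: true },
    FeatureFlag { name: "experimental", enabled: false },
];

// Because the table is a const, it can even be checked at compile time.
const _: () = assert!(!FEATURE_FLAGS.is_empty());

/// Look up a flag in the embedded table; nothing is parsed or allocated.
fn is_enabled(name: &str) -> Option<bool> {
    FEATURE_FLAGS.iter().find(|f| f.name == name).map(|f| f.enabled)
}
```

Changing the name field back to String would make the const item fail to compile, because String::from cannot be called in a const context.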

The library's design favors explicitness over magic. Code expanded by a procedural macro stays hidden inside the compiler, but the generated .rs files sit in OUT_DIR (under the target directory) and can be inspected directly. This transparency makes debugging straightforward and shows exactly what code is being compiled into your binary.

Gotcha

The most immediate limitation is the build-time requirement. Codify-rs is designed exclusively for data that's known at compile time. If your configuration needs to be swapped without recompilation, such as different settings for development and production environments, you'll need a hybrid approach or should stay with runtime deserialization. The library doesn't support dynamic data loading, which is a feature for some use cases and a dealbreaker for others.
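The following sketches one such hybrid. It is an assumption, not a feature of the library: the embedded value serves as a compiled-in default, and an optional runtime value can override it. The function names and the APP_POOL_SIZE environment variable are hypothetical.

```rust
use std::str::FromStr;

/// Use the compiled-in `embedded` value unless a runtime override is present
/// and parses; a malformed override falls back to the embedded default.
fn with_override<T: FromStr>(embedded: T, runtime: Option<&str>) -> T {
    runtime
        .and_then(|raw| raw.trim().parse::<T>().ok())
        .unwrap_or(embedded)
}

/// Example wiring: the embedded pool size can be overridden per environment
/// through the (hypothetical) APP_POOL_SIZE variable, without recompiling.
fn pool_size(embedded: u32) -> u32 {
    let from_env = std::env::var("APP_POOL_SIZE").ok();
    with_override(embedded, from_env.as_deref())
}
```

Falling back silently on a malformed override is a design choice. A production setup might prefer to fail loudly instead.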

Integration requires more boilerplate than simpler alternatives. You need to set up a build.rs file, understand Rust's build script system, and create proxy source files that include the generated code. For developers unfamiliar with build scripts, this is a steep learning curve compared to just adding #[derive(Deserialize)] to a struct. The documentation is also sparse: the repository points to a single external example, which isn't enough to adopt the library confidently in production. With only one star on GitHub and a package name (codify_hoijui) that differs from the repository name, this appears to be a personal fork with minimal community validation. Don't expect Stack Overflow answers or community support if you hit issues.

There's also a compile-time cost. You save runtime parsing, but build times can grow: the build script runs as part of the build, and rustc must then compile the generated code, which can be slow for very large data structures. For applications that are rebuilt frequently during development, this can become annoying. Generated initialization code can also make the binary larger than the same data stored in a compact or compressed serialized form, though for small datasets the difference is usually negligible.
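Rebuild churn can be reduced with Cargo's standard rerun-if-changed directive. When a build script emits no rerun-if directives, Cargo reruns it whenever any file in the package changes. When it does emit them, Cargo reruns it only when a listed input changes. In the minimal sketch below, the helper name and the src/config_types.rs path are hypothetical.

```rust
/// Hypothetical build.rs helper: the directive lines that tell Cargo to rerun
/// the build script only when one of `inputs` changes.
fn rerun_directives(inputs: &[&str]) -> Vec<String> {
    inputs
        .iter()
        .map(|input| format!("cargo:rerun-if-changed={input}"))
        .collect()
}

// In build.rs, print these before generating any code
// (src/config_types.rs stands in for a hypothetical shared-types file):
//
// fn main() {
//     for line in rerun_directives(&["build.rs", "src/config_types.rs"]) {
//         println!("{line}");
//     }
//     // ... build the config value and write config_generated.rs ...
// }
```

This limits how often the generator runs. It does not reduce the time rustc spends compiling the generated code itself.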

Verdict

Use if: You have static configuration, lookup tables, or resource definitions that are fixed at build time and should be embedded in your binary without any runtime parsing. You're comfortable with Rust's build script system and value compile-time validation over runtime flexibility. Your project is experimental or internal, where bleeding-edge dependencies are acceptable. You're embedding data that's expensive to parse at runtime (such as complex nested structures or large constant tables), and startup performance is critical.

Skip if: You need to change configuration without recompiling, require environment-specific settings at runtime, or prefer runtime flexibility. You want a mature, well-documented library with community support and proven production usage. You're building a library that others will depend on (pushing build script complexity onto downstream users is poor ergonomics). A simple include_str! plus serde would suffice for your use case. You're uncomfortable being an early adopter of low-star experimental crates. For most projects, the pragmatic choice remains runtime deserialization with serde: it's proven and flexible, and its performance cost is negligible unless you have specific startup-time requirements.