Compile-Time Data Embedding in Rust: How codify-rs Eliminates Runtime Parsing Overhead


Hook

What if your application could load a 10MB configuration file in zero nanoseconds? That’s not hyperbole—it’s what happens when you move deserialization from runtime to compile-time.

Context

Every Rust application that embeds data faces a fundamental trade-off: simplicity versus performance. The standard approach—using include_str! or include_bytes! combined with serde deserialization at startup—is elegant but carries a cost. That JSON config file, YAML resource map, or TOML lookup table must be parsed every single time your binary launches, consuming CPU cycles and delaying initialization.

For most applications, this overhead is negligible. But in performance-critical contexts—CLI tools that must start instantly, embedded systems with constrained resources, or applications with massive configuration datasets—those milliseconds matter. The holy grail is having your data already exist in memory as initialized Rust structs the moment your program starts. This is exactly what codify-rs attempts to solve: transforming compile-time file I/O into runtime-ready data structures through automatic code generation.

Technical Insight

[System architecture diagram, auto-generated. Build time: external data files (JSON/YAML/TOML) are deserialized by the build.rs script via serde; the Codify trait's codify method emits generated Rust source, written to an OUT_DIR file (codegen.rs). Compile time: the main crate pulls that file in with the include! macro. Runtime: the compiled binary holds the data as static, already-initialized structs in memory.]

The codify-rs architecture centers on the Codify trait, which defines a single method: codify(&self) -> String. This method takes any Rust type and returns valid Rust source code that would reconstruct that exact value. The library provides implementations for common standard library types—primitives, String, Vec, HashMap, and Option—and allows you to derive it for custom types.
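The trait's shape can be sketched in a few lines. This is an illustrative reconstruction based on the description above, not the crate's actual source; the exact impls and formatting choices are assumptions:

```rust
/// Sketch of the trait described above: turn a value into Rust source
/// code that reconstructs it. (Illustrative; not copied from codify-rs.)
pub trait Codify {
    fn codify(&self) -> String;
}

impl Codify for u32 {
    fn codify(&self) -> String {
        // Suffix the literal so the generated code is unambiguously typed.
        format!("{}u32", self)
    }
}

impl Codify for String {
    fn codify(&self) -> String {
        // `{:?}` produces a quoted, escaped string literal.
        format!("String::from({:?})", self)
    }
}

impl<T: Codify> Codify for Vec<T> {
    fn codify(&self) -> String {
        let items: Vec<String> = self.iter().map(Codify::codify).collect();
        format!("vec![{}]", items.join(", "))
    }
}
```

A derived implementation for a custom struct would then just concatenate the codified fields into a struct literal.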

The workflow operates in three stages during your build process. First, your build.rs script loads data from external files (typically using serde to deserialize JSON, YAML, or other formats into Rust structs). Second, you invoke the codify() method on these structs, generating Rust source code as a string. Third, you write this generated code to a file in OUT_DIR and use include! to incorporate it into your crate. Here’s a concrete example:

// build.rs (codify, serde, and serde_json belong under [build-dependencies])
use codify::Codify;
use serde::Deserialize;
use std::fs;
use std::path::PathBuf;

#[derive(Deserialize, Codify)]
struct AppConfig {
    name: String,
    max_connections: u32,
    features: Vec<String>,
}

fn main() {
    // Load and deserialize at build time (this code runs in the build script)
    let config_str = fs::read_to_string("config.json").unwrap();
    let config: AppConfig = serde_json::from_str(&config_str).unwrap();
    
    // Generate Rust initialization code; String and Vec construction
    // is not const-evaluable, so wrap the value in a lazy static
    let code = format!(
        "pub static CONFIG: std::sync::LazyLock<AppConfig> = \
         std::sync::LazyLock::new(|| {});",
        config.codify()
    );
    
    // Write to OUT_DIR
    let out_dir = PathBuf::from(std::env::var("OUT_DIR").unwrap());
    fs::write(out_dir.join("config.rs"), code).unwrap();
    
    // Trigger rebuild if source file changes
    println!("cargo:rerun-if-changed=config.json");
}

In your main crate, you simply include the generated code:

// src/main.rs
// The AppConfig type must also be defined (or re-exported) here,
// since the generated code spells out an AppConfig literal.
include!(concat!(env!("OUT_DIR"), "/config.rs"));

fn main() {
    // CONFIG is ready to use; no parsing happens at runtime
    println!("App: {}", CONFIG.name);
    println!("Max connections: {}", CONFIG.max_connections);
}

The generated code in config.rs looks something like this:

pub static CONFIG: std::sync::LazyLock<AppConfig> =
    std::sync::LazyLock::new(|| AppConfig {
        name: String::from("MyApp"),
        max_connections: 100u32,
        features: vec![
            String::from("logging"),
            String::from("metrics"),
        ],
    });

Notice that this isn’t JSON or any serialized format: it’s plain Rust code. When your program touches CONFIG, no parser runs; the values are produced directly by compiled initialization code. The deserialization happened during compilation, not at runtime.

The real power emerges with complex nested structures. Consider a lookup table mapping error codes to descriptions across multiple languages. With traditional runtime deserialization, you’d parse a potentially large JSON file on every startup. With codify-rs, that entire structure, nested HashMaps and thousands of strings, exists as initialization code in your binary. The performance difference scales with data size: a 100KB config file might take around 5ms to parse at every launch, while with codify-rs no parsing happens at runtime at all.
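It is worth seeing what such generated output could look like for a nested map. HashMap construction is not const-evaluable, so a plausible emission strategy (assumed here for illustration, not verified against codify-rs's actual output) is a block of insert calls behind a lazily initialized static:

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

// Illustrative sketch of generated code for an error-code lookup table
// mapping codes to per-language messages. Built once on first access;
// no text format is ever parsed.
pub static ERROR_MESSAGES: LazyLock<HashMap<u16, HashMap<&'static str, &'static str>>> =
    LazyLock::new(|| {
        let mut by_code = HashMap::new();

        let mut e404 = HashMap::new();
        e404.insert("en", "Not Found");
        e404.insert("de", "Nicht gefunden");
        by_code.insert(404u16, e404);

        let mut e500 = HashMap::new();
        e500.insert("en", "Internal Server Error");
        e500.insert("de", "Interner Serverfehler");
        by_code.insert(500u16, e500);

        by_code
    });
```

With thousands of entries the shape stays the same; only the number of insert calls grows, which is also where the generated-source-size gotcha below comes from.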

The library’s design also handles Rust’s type system gracefully. Because it generates actual Rust code, you get compile-time type checking of your embedded data. If your data file contains invalid values that don’t match your struct definition, the build fails immediately. This is superior to runtime deserialization errors that might only surface in production under specific conditions.

One clever aspect is how codify-rs handles owned versus borrowed data. The generated code uses String::from() and the vec! macro to construct owned values held in a static, so your embedded data can be used throughout your application without lifetime gymnastics. Note that constructing those owned values still allocates once when the data is first built; it is the parsing step, not allocation, that disappears.

Gotcha

The first limitation hits you immediately: ergonomics. Setting up codify-rs requires creating a build.rs, carefully managing the OUT_DIR environment variable, writing generated files, and using include! macros with concat! and env!. This is significantly more boilerplate than a procedural macro approach would require. For developers accustomed to deriving traits with a single attribute, the ceremony feels heavyweight.

Incremental compilation becomes problematic with data-heavy projects. Any change to your source data files triggers a full rebuild of everything that depends on the generated code. During active development where you’re frequently tweaking configuration values, this rebuild penalty accumulates. The cargo:rerun-if-changed directive helps, but you’re still regenerating and recompiling code rather than just hot-reloading data.

The library’s minimal adoption (1 GitHub star) is a red flag for production use. This suggests limited battle-testing, potentially undiscovered edge cases, and no community ecosystem of examples or extensions. You’re essentially betting on unmaintained or early-stage code. More established alternatives with larger communities provide better long-term stability guarantees.

Additionally, the generated code size can balloon with large datasets: a 1MB JSON file might expand to 3MB of Rust source code with all the String::from() calls and struct initialization syntax, increasing compilation time and potentially binary size if not optimized away by the compiler.

Verdict

Use if: You have static configuration or lookup data that’s read-only, relatively stable during development, and large enough that runtime deserialization measurably impacts startup performance (think 10ms+). You’re building performance-critical CLI tools, embedded systems, or applications where every millisecond of initialization time matters. You’re comfortable with build script complexity and understand the trade-offs of compile-time code generation.

Skip if: Your data changes frequently during development (the rebuild overhead will drive you mad), your data is small enough that runtime parsing takes microseconds anyway, you prioritize ergonomics over micro-optimizations, or you need a battle-tested solution with community support.

Consider rust-embed for file embedding with more polish, or simply accept the runtime parsing cost with serde and lazy_static; for most applications, that’s the right trade-off.
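For comparison, the runtime-parsing baseline the verdict recommends is only a few lines. In this sketch a toy key=value format stands in for JSON so the example stays dependency-free; real code would include_str! a JSON file and hand it to serde_json::from_str instead, and std::sync::LazyLock plays the role the lazy_static crate used to:

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

// Data embedded in the binary as text; in a real project this would be
// include_str!("config.json") parsed with serde_json (assumed setup).
static RAW: &str = "name=MyApp\nmax_connections=100";

// Parsed exactly once, on first access. Every run of the program pays
// this parsing cost, which is the trade-off codify-rs exists to avoid.
static CONFIG: LazyLock<HashMap<String, String>> = LazyLock::new(|| {
    RAW.lines()
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect()
});
```

For small data, this single lazy parse is typically microseconds, which is exactly why the verdict calls it the right default for most applications.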
