
Building a Privacy-First File Organizer with On-Device AI Models


Hook

Your file manager knows everything about you—your tax documents, personal photos, medical records—yet most AI-powered organizers ship that data straight to cloud APIs. What if the AI ran entirely on your machine instead?

Context

File organization has been a persistent problem since personal computers stored more than a handful of documents. We’ve cycled through solutions: manual folder hierarchies that decay into chaos, search-based approaches that require remembering filenames, tagging systems that demand discipline most users lack, and rule-based automation that breaks when life doesn’t follow patterns.

The recent wave of AI-powered organization tools promised intelligence—understanding context, extracting meaning, creating taxonomies automatically. But they came with a Faustian bargain: upload your files to someone else’s servers, trust their privacy policies, hope they don’t get breached. For documents containing financial data, medical information, or proprietary work, that’s a non-starter. Local File Organizer takes a different path: sophisticated AI analysis running entirely on your hardware, never touching the internet, processing files where they already live.

Technical Insight

System architecture (auto-generated diagram): a Directory Scanner feeds file paths into the Content Extractor (PyMuPDF/python-docx for documents, pytesseract OCR for images); extracted text and image data are routed through the Nexa SDK to Llama3.2 3B GGUF (categories and metadata) and LLaVA-v1.6 GGUF (image descriptions); a Multiprocess Analyzer passes the resulting folder hierarchy and new filenames to the Filesystem Reorganizer, which either shows a dry-run preview or executes the moves into the reorganized directory.

Local File Organizer’s architecture centers on orchestrating two specialized models through the Nexa SDK: Llama3.2 3B for text understanding and LLaVA-v1.6 (built on Vicuna-7B) for vision-language tasks. Both run as quantized GGUF models, trading some accuracy for the ability to execute on consumer hardware without dedicated GPUs.

The processing pipeline begins with content extraction using format-specific libraries. For PDFs, it uses PyMuPDF (fitz) to extract both text and embedded images. Word documents get parsed through python-docx. Images pass through Tesseract OCR to capture any embedded text before being analyzed by the vision model. Here’s the core extraction logic for PDFs:

import fitz  # PyMuPDF

def extract_pdf_content(pdf_path):
    """Return the PDF's full text plus any embedded images."""
    text_content = []
    images = []
    
    with fitz.open(pdf_path) as doc:
        for page_num, page in enumerate(doc):
            # Extract text
            text_content.append(page.get_text())
            
            # Extract embedded images
            image_list = page.get_images(full=True)
            for img_index, img in enumerate(image_list):
                xref = img[0]  # cross-reference number of the image object
                base_image = doc.extract_image(xref)
                image_bytes = base_image["image"]
                images.append((page_num, img_index, image_bytes))
    
    return "\n".join(text_content), images

Once content is extracted, the tool constructs prompts for the language model that request structured categorization. Rather than asking for freeform descriptions, it constrains the model to return JSON with specific fields—category, subcategory, and suggested filename. This structured output makes downstream processing deterministic:

import json

def categorize_with_llm(content, file_type):
    snippet = content[:2000]  # truncate to stay within the context window
    prompt = f"""Analyze this {file_type} content and provide categorization.
    Return JSON with: category, subcategory, suggested_name, summary.
    
    Content: {snippet}
    """
    
    response = nexa_model.generate(prompt, temperature=0.3)
    return json.loads(response)  # raises ValueError if the model emits invalid JSON

The multimodal aspect is where things get interesting. For images, the tool first attempts OCR to extract any text (receipts, screenshots, scanned documents), then passes both the image and extracted text to LLaVA. This dual-input approach lets the vision model understand context beyond pure visual content—a screenshot of code can be categorized by the programming language detected in the text, not just visual features.
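A sketch of that dual-input prompt construction follows. The function names are illustrative, `ocr_func` would typically be `pytesseract.image_to_string` applied to the opened image, and `vision_model.generate` stands in for whatever LLaVA inference call the Nexa SDK exposes:

```python
def build_image_prompt(ocr_text, max_chars=1000):
    """Merge a generic vision prompt with any OCR text so the model
    sees both modalities (illustrative sketch, not the tool's exact code)."""
    prompt = ("Describe and categorize this image. "
              "Return JSON with: category, subcategory, suggested_name.")
    if ocr_text:
        # Truncate OCR output so long scans don't crowd out the image tokens
        prompt += "\n\nText found in the image:\n" + ocr_text[:max_chars]
    return prompt

def analyze_image(image_path, ocr_func, vision_model):
    # OCR first: receipts, screenshots, and scans often carry decisive text
    ocr_text = ocr_func(image_path).strip()
    # Placeholder for the LLaVA-v1.6 call via the Nexa SDK
    return vision_model.generate(build_image_prompt(ocr_text), image=image_path)
```

Keeping the prompt builder separate from the model call makes the text path easy to unit-test without loading a 7B vision model.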

The reorganization phase uses the AI-generated metadata to construct a new directory structure. Rather than moving files immediately, it builds an in-memory plan mapping old paths to new paths, allowing for dry-run previews. Users can review the proposed changes before committing:

import os
import shutil

class ReorganizationPlan:
    def __init__(self, base_path):
        self.base_path = base_path  # root of the reorganized tree
        self.moves = []  # List of (src, dst) tuples
        self.new_dirs = set()
        
    def add_move(self, src_path, category, subcategory, new_name):
        dst_dir = os.path.join(self.base_path, category, subcategory)
        dst_path = os.path.join(dst_dir, new_name)
        
        self.new_dirs.add(dst_dir)
        self.moves.append((src_path, dst_path))
        
    def execute(self, dry_run=False):
        if dry_run:
            for src, dst in self.moves:
                print(f"Would move: {src} -> {dst}")
            return
            
        # Create directory structure
        for dir_path in self.new_dirs:
            os.makedirs(dir_path, exist_ok=True)
            
        # Execute moves
        for src, dst in self.moves:
            shutil.move(src, dst)

Performance optimization comes through Python’s multiprocessing module. Each file gets analyzed independently, so the tool spawns worker processes matching CPU core count. Each worker loads its own model instance (memory-intensive but parallelizable), processes files from a queue, and returns categorization results. This architecture scales linearly with cores until memory becomes the bottleneck—running multiple 3B and 7B model instances quickly consumes 16-32GB of RAM.
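That worker pattern can be sketched with `multiprocessing.Pool` and a per-process initializer. The model load is stubbed out here, since the real tool would load a multi-gigabyte GGUF instance per worker via the Nexa SDK, and the `categorize` method is a hypothetical name:

```python
import multiprocessing as mp

class _StubModel:
    """Stand-in for a loaded GGUF model; the real load is memory-heavy."""
    def categorize(self, path):
        return {"category": "unsorted", "source": path}

_model = None  # one model instance per worker process

def _init_worker(model_path):
    # Runs once per spawned worker; each process pays the full load cost
    global _model
    _model = _StubModel()  # real code: load model_path via the Nexa SDK

def _analyze(path):
    return path, _model.categorize(path)

def analyze_files(paths, model_path="model.gguf", workers=None):
    workers = workers or mp.cpu_count()
    with mp.Pool(workers, initializer=_init_worker,
                 initargs=(model_path,)) as pool:
        # imap_unordered returns results as workers finish, not in order
        return dict(pool.imap_unordered(_analyze, paths))
```

The initializer is what makes the memory cost multiply: capping `workers` below `cpu_count()` is the practical knob on 16GB machines.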

The Nexa SDK abstracts model loading and inference, handling GGUF file parsing and quantization transparently. Under the hood, it uses llama.cpp bindings for CPU inference with architecture-specific optimizations (AVX2, NEON). This means the same Python code runs on x86 Linux, ARM macOS, and Windows without modification, though performance varies dramatically by processor architecture.

Gotcha

The privacy-first architecture that makes this tool appealing also creates its biggest limitation: speed. Processing 1,000 files can take hours depending on your hardware. Each PDF might require 5-10 seconds of inference time, images with OCR take even longer, and running multiple model instances to parallelize hits memory limits quickly. On an M1 MacBook with 16GB RAM, you’re realistically looking at 2-3 parallel workers before swapping kills performance. This isn’t a tool you run continuously in the background—it’s for periodic cleanup sessions.

Model hallucinations create practical problems. Language models are probabilistic, and even with structured output prompts, they occasionally generate invalid JSON or nonsensical categories. A vacation photo might get categorized as a work document because OCR picked up text from a landmark sign. The tool lacks confidence scores or fallback logic—bad categorizations end up in weird folder hierarchies. There’s also no handling of edge cases like filename conflicts (what happens when two files get the same AI-generated name?) or circular references in existing folder structures. You’ll want to run dry-run mode first and carefully review the plan, which somewhat defeats the promise of automated intelligence.
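To illustrate the filename-conflict gap, a collision-safe destination check — hypothetical, not part of the tool — could be as small as:

```python
import os

def resolve_collision(dst_path):
    """Append ' (n)' before the extension until no file exists at the
    path. A fallback the tool currently lacks; sketch only."""
    if not os.path.exists(dst_path):
        return dst_path
    root, ext = os.path.splitext(dst_path)
    n = 1
    while os.path.exists(f"{root} ({n}){ext}"):
        n += 1
    return f"{root} ({n}){ext}"
```

Calling this on every planned destination before `shutil.move` would turn a silent overwrite into a predictable rename.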

Verdict

Use if: You’re organizing personal documents with sensitive information (financial, medical, legal), have a modern multi-core CPU with 16GB+ RAM, care more about privacy than processing speed, and are comfortable with Python toolchains and model downloads. This excels for one-time cleanup of messy download folders or migrating archived documents into sensible structures without cloud exposure.

Skip if: You need real-time organization, work with large media libraries (video/audio unsupported), require guaranteed accuracy for mission-critical filing, or run resource-constrained hardware. Also skip if you’re not technical enough to troubleshoot model loading issues or Python dependencies—this isn’t a polished commercial product with support. The privacy guarantee is compelling, but only if you accept the speed/convenience tradeoff.
