Building a $5/Month AI News Pipeline with Multi-Tier LLM Fallbacks


Hook

Most AI news aggregators cost $50-100/month in API fees. This one runs for the price of a coffee, learns your editorial preferences, and keeps working even when individual data sources or LLMs fail.

Context

If you're building an AI agent or automation platform, staying current with AI news isn't optional—it's table stakes. But the traditional approaches fall short: RSS readers dump everything into a firehose, manual curation doesn't scale, and commercial aggregation services charge enterprise prices for features you don't need.

The openclaw-newsroom pipeline was built to solve this specific problem for OpenClaw, an AI agent platform. The creator needed automated news scanning across multiple sources (RSS, Reddit, Twitter, GitHub, web search), intelligent deduplication so the same story doesn't resurface daily, and LLM-powered curation that learns which stories actually matter. The constraint? Keep it under $5/month and make it reliable enough to run unsupervised. The result is a Python pipeline that treats both data sources and LLMs as unreliable infrastructure—assume failure, design for graceful degradation.

Technical Insight

The architecture centers on three persistent layers: deduplication memory, editorial profiles, and decision logs. Unlike stateless pipelines that forget everything between runs, openclaw-newsroom maintains SQLite databases that track what you've seen, what you've approved, and why.

The deduplication system is particularly elegant. Rather than fuzzy-comparing every new article against the entire archive, a cost that grows with every run, it uses a two-stage approach. First, an exact URL match catches obvious duplicates. Second, for new URLs, it compares the article title against titles from a recent lookback window using fuzzy matching, treating anything above 80% similarity as a duplicate. This catches the same story republished across different domains:

# Simplified deduplication logic from the pipeline
import sqlite3
from datetime import datetime, timedelta
from difflib import SequenceMatcher

def is_duplicate(url, title, lookback_days=7):
    conn = sqlite3.connect('dedup.db')
    try:
        c = conn.cursor()

        # Stage 1: exact URL match
        c.execute('SELECT id FROM articles WHERE url = ?', (url,))
        if c.fetchone():
            return True

        # Stage 2: fuzzy title match against recent articles only.
        # Bind the cutoff as text explicitly; sqlite3's implicit datetime
        # adapter (which used this same format) is deprecated since Python 3.12.
        cutoff_date = (datetime.now() - timedelta(days=lookback_days)).isoformat(' ')
        c.execute('SELECT title FROM articles WHERE scanned_at > ?', (cutoff_date,))

        for (existing_title,) in c.fetchall():
            similarity = SequenceMatcher(None, title.lower(), existing_title.lower()).ratio()
            if similarity > 0.80:
                return True

        return False
    finally:
        # Always release the connection, whichever branch returned
        conn.close()

The 7-day lookback window is crucial. Go longer and you risk marking legitimate follow-up stories as duplicates ("GPT-5 Announced" vs "GPT-5 Now Available"). Go shorter and you miss cross-posted articles that appear on different sites days apart. Seven days empirically captures the news cycle for AI topics without creating false positives.
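The two stages and the lookback window can be exercised end to end against an in-memory database. This is a minimal sketch, not the pipeline's actual code: the table schema, the epoch-seconds `scanned_at` column, and the names `open_store`, `record`, and `seen_recently` are all assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from difflib import SequenceMatcher

# Hypothetical schema; the article does not show the real one.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id INTEGER PRIMARY KEY,
    url TEXT UNIQUE NOT NULL,
    title TEXT NOT NULL,
    scanned_at REAL NOT NULL  -- Unix epoch seconds (an assumption)
)
"""

def open_store(path=":memory:"):
    """Open (or create) the dedup store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def record(conn, url, title, scanned_at):
    """Remember an article; a URL that is already stored is left untouched."""
    conn.execute(
        "INSERT OR IGNORE INTO articles (url, title, scanned_at) VALUES (?, ?, ?)",
        (url, title, scanned_at.timestamp()),
    )
    conn.commit()

def seen_recently(conn, url, title, now, lookback_days=7, threshold=0.80):
    """Two-stage check: exact URL first, then fuzzy title within the window."""
    if conn.execute("SELECT 1 FROM articles WHERE url = ?", (url,)).fetchone():
        return True
    cutoff = (now - timedelta(days=lookback_days)).timestamp()
    rows = conn.execute(
        "SELECT title FROM articles WHERE scanned_at > ?", (cutoff,)
    ).fetchall()
    return any(
        SequenceMatcher(None, title.lower(), old.lower()).ratio() > threshold
        for (old,) in rows
    )

now = datetime(2025, 1, 10, tzinfo=timezone.utc)
store = open_store()
record(store, "https://a.example/story", "OpenAI releases new model",
       now - timedelta(days=1))
```

With this setup, the same headline arriving from a different domain the next day is flagged as a duplicate, while the same headline arriving 30 days later falls outside the window and passes through. Passing `now` explicitly, rather than reading the clock inside the function, is a choice made here to keep the window logic easy to test.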

The multi-tier LLM failover demonstrates production-grade resilience thinking. The pipeline's primary LLM is Gemini Flash Lite—Google's cheapest model, optimized for speed over capability. When Flash Lite is unavailable or rate-limited, it falls back to Grok via OpenRouter, then finally to standard Gemini Flash. This isn't just redundancy; each tier has different cost and capability profiles:

# Simplified LLM failover chain (client objects and helper functions not shown)
def curate_articles(articles, editorial_profile):
    # Tiers are tried in order: primary first, then the two fallbacks
    llm_chain = [
        ('gemini-flash-lite', gemini_flash_lite_client, 0.01),  # $/1k tokens
        ('grok-beta', openrouter_client, 0.05),
        ('gemini-flash', gemini_flash_client, 0.02),
    ]

    for model_name, client, cost_per_1k in llm_chain:
        try:
            prompt = build_curation_prompt(articles, editorial_profile)
            response = client.generate(
                prompt=prompt,
                max_tokens=2000,
                temperature=0.3  # Lower temp for consistent editorial decisions
            )
            log_curation_decision(model_name, response, cost_per_1k)
            return parse_curation_response(response)
        except Exception as e:
            # Any failure (outage, rate limit, unparseable reply) falls through to the next tier
            log_error(f'{model_name} failed: {e}')

    # All LLMs failed - return top articles by quality score
    return fallback_scoring(articles)

Notice the final fallback: if all three LLMs are down, the pipeline doesn't crash. It returns articles sorted by a keyword-based quality score (matches against a curated list of important terms like "breakthrough," "launches," "acquisition"). The curation won't be as nuanced, but the system keeps running.
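A keyword-based last resort like the one described above might look roughly like this. It is a sketch under assumptions: only the name `fallback_scoring` and the three example terms come from the article; the full term list, the `top_n` cutoff, and the dictionary shape of an article are hypothetical.

```python
# The three example terms named in the article; the pipeline's real list is not shown.
IMPORTANT_TERMS = ("breakthrough", "launches", "acquisition")

def keyword_score(title, terms=IMPORTANT_TERMS):
    """Count how many important terms appear in the title (case-insensitive)."""
    text = title.lower()
    return sum(1 for term in terms if term in text)

def fallback_scoring(articles, top_n=10):
    """Rank articles by keyword score, highest first, and keep the top N.

    sorted() is stable, so articles with equal scores keep their scan order.
    """
    ranked = sorted(articles, key=lambda a: keyword_score(a["title"]), reverse=True)
    return ranked[:top_n]

articles = [
    {"title": "Weekly roundup"},
    {"title": "Startup launches model after acquisition"},
    {"title": "A breakthrough in reasoning"},
]
top_two = fallback_scoring(articles, top_n=2)
```

Here `top_two` holds the launch/acquisition story followed by the breakthrough story, and the roundup is dropped. Plain substring matching is crude, but it needs no network call, which is the point of a fallback that runs when every LLM is unreachable.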

The editorial learning loop is where this gets interesting for long-term use. After presenting curated articles to the human operator, the system logs which were approved, which were rejected, and looks for patterns. Did you reject three articles about hardware announcements? The editorial profile notes that and deprioritizes similar stories in future scans:

# Nightly editorial profile update
import sqlite3

def update_editorial_profile():
    conn = sqlite3.connect('editorial.db')
    c = conn.cursor()

    # Analyze last 30 days of decisions
    c.execute('''
        SELECT article_category,
               SUM(CASE WHEN approved THEN 1 ELSE 0 END) as approved,
               SUM(CASE WHEN rejected THEN 1 ELSE 0 END) as rejected
        FROM decisions
        WHERE decision_date > date('now', '-30 days')
        GROUP BY article_category
    ''')
    rows = c.fetchall()
    conn.close()

    # Categories without a 2:1 signal either way stay out of the profile (neutral)
    profile = {}
    for category, approved, rejected in rows:
        if rejected > approved * 2:  # Strong rejection signal
            profile[category] = 'deprioritize'
        elif approved > rejected * 2:  # Strong approval signal
            profile[category] = 'prioritize'

    # Feed updated profile into next LLM curation prompt
    save_profile(profile)

This isn't machine learning in the formal sense—there's no gradient descent or neural network. It's a simpler heuristic system that trades sophistication for transparency and debuggability. You can open the SQLite database and see exactly why the system started deprioritizing certain topics.
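To make the "feed the profile into the next prompt" step concrete, here is one way it could work. The function name `build_curation_prompt` appears in the failover snippet, but its body is not shown in the article, so the prompt wording and the `category`/`title` article fields below are assumptions.

```python
def build_curation_prompt(articles, editorial_profile):
    """Turn the saved profile into plain-language instructions for the LLM."""
    prefer = sorted(c for c, v in editorial_profile.items() if v == "prioritize")
    avoid = sorted(c for c, v in editorial_profile.items() if v == "deprioritize")

    lines = ["You are curating AI news for a human editor."]
    if prefer:
        lines.append("Favor these categories: " + ", ".join(prefer) + ".")
    if avoid:
        lines.append("Rank these categories lower: " + ", ".join(avoid) + ".")

    lines.append("Articles:")
    for number, article in enumerate(articles, start=1):
        lines.append(f"{number}. [{article['category']}] {article['title']}")
    return "\n".join(lines)

profile = {"hardware": "deprioritize", "research": "prioritize"}
prompt = build_curation_prompt(
    [{"category": "research", "title": "New reasoning benchmark"}],
    profile,
)
```

Because the profile ends up as ordinary sentences in the prompt, the same transparency argument applies: you can print the prompt and read exactly what the model was told to favor or demote.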

The zero-dependency design deserves attention. Every script uses only Python's standard library. No pip install, no virtual environments, no dependency version conflicts. This makes deployment trivial (copy files, run script) and eliminates an entire class of maintenance burden. The trade-off? You're manually implementing functionality that libraries provide. The RSS parser uses xml.etree, the HTTP client uses urllib, the fuzzy matching uses difflib. More code to maintain, but that code never breaks because a transitive dependency five layers deep updated incompatibly.
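As an illustration of what standard-library-only code looks like in practice, the sketch below parses an RSS 2.0 feed with `xml.etree.ElementTree` and fetches it with `urllib`. It is not taken from the repository: the function names, the User-Agent string, and the output shape are assumptions, and it handles only plain RSS `<item>` elements (Atom feeds use namespaced `<entry>` elements and would need extra handling).

```python
import xml.etree.ElementTree as ET
from urllib.request import Request, urlopen

def parse_rss(xml_text):
    """Extract title/link pairs from RSS 2.0 markup (str or bytes)."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if title and link:  # skip malformed entries instead of failing the scan
            items.append({"title": title, "url": link})
    return items

def fetch_rss(url, timeout=10):
    """Download a feed and parse it; network and XML errors propagate to the caller."""
    request = Request(url, headers={"User-Agent": "newsroom-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return parse_rss(response.read())

SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Feed</title>
<item><title> First story </title><link>https://example.com/1</link></item>
<item><title>No link here</title></item>
</channel></rss>"""

parsed = parse_rss(SAMPLE)
```

Parsing the sample yields a single entry for "First story"; the second item is skipped because it has no link. A library such as feedparser would cover far more feed quirks, which is exactly the maintenance trade-off described above.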

Gotcha

The OpenClaw coupling is non-negotiable. This pipeline assumes it's running inside OpenClaw's workspace, with access to OpenClaw's channel delivery system and cron scheduling. You could theoretically extract the scripts and run them standalone, but you'd need to manually wire up scheduling (write your own cron jobs), storage (create workspace directories), and delivery (figure out where to send the curated articles). It's not a standalone tool—it's an OpenClaw subsystem that happens to be published separately.

The Twitter/X integration is inherently fragile. The pipeline can use either the 'bird' CLI tool (an unofficial third-party Twitter client) or the official API with a paid key. Neither option is great. The bird CLI breaks whenever Twitter changes its web interface, and the official API costs $100/month minimum for useful access levels. In practice, this means Twitter is your least reliable data source. If Twitter news is critical for you, this pipeline will frustrate you. The resilience design means the pipeline continues without Twitter, but you'll miss stories.
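The "continue without Twitter" behavior comes down to isolating each source so one failure cannot abort the scan. A minimal sketch of that pattern follows; the function name `scan_all` and the shape of the `sources` mapping are hypothetical, not the pipeline's actual interface.

```python
def scan_all(sources):
    """Run every source fetcher; collect articles and record failures by name.

    `sources` maps a source name to a zero-argument callable returning a list
    of articles. A failing source is logged in `failures` and skipped.
    """
    articles, failures = [], {}
    for name, fetch in sources.items():
        try:
            articles.extend(fetch())
        except Exception as exc:  # any source error is non-fatal by design
            failures[name] = str(exc)
    return articles, failures

def fetch_rss_stub():
    return [{"title": "Example story", "url": "https://example.com/1"}]

def fetch_twitter_stub():
    raise RuntimeError("bird CLI failed")

found, failed = scan_all({"rss": fetch_rss_stub, "twitter": fetch_twitter_stub})
```

In this run `found` contains the RSS story and `failed` records the Twitter error, so the scan completes with partial coverage. Returning the failures, rather than only logging them, would also let a caller surface "Twitter was skipped" in the delivered digest.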

Quality scoring is keyword-based, not semantic. The system doesn't understand that "OpenAI announces Sora" and "Sam Altman unveils video generation model" are related—it just matches keywords and counts social metrics. This creates two failure modes: semantically important articles without magic keywords get low scores, and clickbait with the right keywords gets artificially boosted. The LLM curation layer mitigates this somewhat, but the initial filtering happens before the LLM sees articles, so you might lose good stories early in the pipeline.

Verdict

Use if: You're already running OpenClaw and want automated AI news monitoring, you value cost efficiency ($5/month is genuinely achievable with the tiered LLM approach), you're comfortable with Python and SQLite for debugging when things go sideways, and you prefer systems that degrade gracefully over systems that fail catastrophically. The editorial learning loop makes this compelling for long-term use: it actually gets better at predicting what you care about.

Skip if: You're not using OpenClaw (the integration cost exceeds that of building from scratch), you need real-time news alerts (2-hour scan intervals are baked in), Twitter coverage is non-negotiable (the integration is too unreliable), or you want semantic understanding of article relationships (the keyword-based approach can't deliver that). Also skip if you're uncomfortable with the zero-dependency philosophy. Sometimes pulling in well-tested libraries is the right call, and this codebase intentionally chooses not to.