Shortest: Writing E2E Tests in Plain English with Claude AI

Hook

What if your QA engineer could write end-to-end tests without touching a single CSS selector or XPath expression? Shortest makes this possible by translating plain English into Playwright actions using Anthropic’s Claude AI.

Context

End-to-end testing has long been a maintenance burden. You write a test suite with carefully crafted selectors, then a designer renames a CSS class and 40 tests break. Traditional frameworks like Playwright and Cypress require developers to think like browsers: identifying elements, waiting for network requests, handling asynchronous state. This creates a barrier: non-technical team members can’t contribute to test coverage, and developers spend more time fixing brittle selectors than building features.

Shortest flips this model. It layers Anthropic’s Claude API on top of Playwright to interpret test instructions written in natural language. Instead of page.locator('[data-testid="login-button"]').click(), you write "Login to the app using email and password" and let the AI figure out which buttons to click. With over 5,500 GitHub stars, the framework promises something radical: tests that survive UI refactors and can be written by anyone who can describe user behavior.

Technical Insight

[Figure: System architecture (auto-generated). Groups: External Services, Shortest Framework. Components: Natural Language Test, Shortest Orchestrator, Claude AI API, Playwright Engine, Browser/Application, Hooks & Callbacks. Labeled flows: test description + context; API request with instructions; interprets & returns actions; browser automation commands; executes actions; screenshots & state; test lifecycle events; custom assertions.]

Under the hood, Shortest is a TypeScript testing framework that acts as an orchestration layer between your natural language instructions and Playwright’s browser automation. When you run a test, Shortest sends the test description and its context to Claude’s API. The model interprets the instructions and appears to return actions that Shortest carries out through Playwright.
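
The observe, ask, act cycle described above can be sketched in a few lines. This is an illustration only, not Shortest’s actual implementation: the BrowserAction, Model, and Browser types and the runInstruction function are hypothetical names, and the calls are kept synchronous for brevity where a real system would await a network request and a browser.

```typescript
// Hypothetical types for illustration only; these are NOT Shortest's internals.
type BrowserAction =
  | { kind: "click"; target: string }
  | { kind: "type"; target: string; text: string }
  | { kind: "done"; passed: boolean; reason: string };

interface Model {
  // Given the instruction and the latest page observation, propose one action.
  nextAction(instruction: string, observation: string): BrowserAction;
}

interface Browser {
  // Describe the current page (a real system might send a screenshot instead).
  observe(): string;
  // Carry out a click or type action.
  perform(action: BrowserAction): void;
}

interface RunResult {
  passed: boolean;
  reason: string;
  steps: number;
}

// Observe -> ask the model -> act, until the model reports a verdict
// or the step budget runs out.
function runInstruction(
  instruction: string,
  model: Model,
  browser: Browser,
  maxSteps = 10,
): RunResult {
  for (let step = 0; step < maxSteps; step++) {
    const action = model.nextAction(instruction, browser.observe());
    if (action.kind === "done") {
      return { passed: action.passed, reason: action.reason, steps: step };
    }
    browser.perform(action);
  }
  return { passed: false, reason: "step budget exhausted", steps: maxSteps };
}

// Scripted stand-ins so the loop can run without a network or a browser.
const performed: BrowserAction[] = [];
const fakeBrowser: Browser = {
  observe: () => `page after ${performed.length} action(s)`,
  perform: (action) => {
    performed.push(action);
  },
};
const script: BrowserAction[] = [
  { kind: "type", target: "email field", text: "user@example.com" },
  { kind: "click", target: "login button" },
  { kind: "done", passed: true, reason: "dashboard is visible" },
];
const exhausted: BrowserAction = { kind: "done", passed: false, reason: "script exhausted" };
const fakeModel: Model = {
  // Replay the script: one entry per action already performed.
  nextAction: () => script[performed.length] ?? exhausted,
};

const result = runInstruction("Login to the app", fakeModel, fakeBrowser);
console.log(result);
```

The step budget is the design choice worth noting: because the model decides when a test is finished, a loop like this needs an upper bound so that a confused model fails the test instead of running (and billing) indefinitely.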

The setup is surprisingly minimal. After running npx @antiwork/shortest init, you get a config file where you specify your test pattern and AI provider:

import type { ShortestConfig } from "@antiwork/shortest";

export default {
  headless: false,
  baseUrl: "http://localhost:3000",
  testPattern: "**/*.test.ts",
  ai: {
    provider: "anthropic",
  },
} satisfies ShortestConfig;

Tests are TypeScript files that use the shortest() function. The simplest form is pure natural language:

import { shortest } from "@antiwork/shortest";

shortest("Login to the app using email and password", {
  username: process.env.GITHUB_USERNAME,
  password: process.env.GITHUB_PASSWORD,
});

The framework passes those environment variables as context, so the AI knows what credentials to use. What’s clever is that you’re not locked into pure AI execution. Shortest supports callback functions using .after() hooks for scenarios where you need deterministic assertions:

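import { shortest } from "@antiwork/shortest";
import { eq } from "drizzle-orm";
// Assumed: `db` and `users` come from your project's Drizzle setup.
// The two paths below are placeholders, not paths from the Shortest docs.
import { db } from "./db";
import { users } from "./schema";
// Assumed: `expect` is available as a global in the test environment.

shortest("Login to the app using username and password", {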
  username: process.env.USERNAME,
  password: process.env.PASSWORD,
}).after(async ({ page }) => {
  const clerkId = await page.evaluate(() => {
    return window.localStorage.getItem("clerk-user");
  });

  if (!clerkId) {
    throw new Error("User not found in database");
  }

  const [user] = await db
    .select()
    .from(users)
    .where(eq(users.clerkId, clerkId))
    .limit(1);

  expect(user).toBeDefined();
});

This hybrid approach is the architecture’s strongest design decision. AI handles the brittle selector logic, while you write traditional code for database queries and complex assertions. You’re not replacing Playwright—you’re using Claude as a selector engine.

The framework also supports test chaining for reusable flows:

const loginAsLawyer = "login as lawyer with valid credentials";
const loginAsContractor = "login as contractor with valid credentials";
const allAppActions = ["send invoice to company", "view invoices"];

shortest([loginAsLawyer, ...allAppActions]);
shortest([loginAsContractor, ...allAppActions]);

Each step in the array executes sequentially. This compositional pattern lets you build complex scenarios from simple, readable building blocks.
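
When the number of roles grows, the same composition can be generated instead of spelled out. The sketch below is plain TypeScript and does not call Shortest at all; buildScenarios is a hypothetical helper, not part of the framework, and it only produces the step arrays you would then pass to shortest([...]).

```typescript
// Step descriptions taken from the article's example.
const loginAsLawyer = "login as lawyer with valid credentials";
const loginAsContractor = "login as contractor with valid credentials";
const allAppActions = ["send invoice to company", "view invoices"];

// Hypothetical helper (not part of Shortest): prefix the shared steps
// with each login step, producing one ordered step list per role.
function buildScenarios(logins: string[], sharedSteps: string[]): string[][] {
  return logins.map((login) => [login, ...sharedSteps]);
}

const scenarios = buildScenarios([loginAsLawyer, loginAsContractor], allAppActions);

// Each entry is an ordered list of steps for one role.
for (const steps of scenarios) {
  console.log(steps.join(" -> "));
}
```

Keeping the step lists as ordinary arrays means they can be built, filtered, or reviewed with regular code before any test runs.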

For API testing, Shortest offers two approaches. You can use the APIRequest class for structured requests or just describe the test in natural language:

shortest(`
  Test the API GET endpoint ${API_BASE_URI}/users with query parameter { "active": true }
  Expect the response to contain only active users
`);
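
For comparison, here is what that natural language expectation amounts to when written as a deterministic check. This is a sketch under stated assumptions: the User shape, the withQuery and containsOnlyActiveUsers helpers, and the example URL are all hypothetical, since the article does not specify the /users payload or the value of API_BASE_URI. It does not use Shortest’s APIRequest class.

```typescript
// Hypothetical response shape; the article does not specify the /users payload.
interface User {
  id: number;
  active: boolean;
}

// Build a query string by hand so the sketch needs only built-in functions.
function withQuery(baseUrl: string, params: Record<string, string | boolean>): string {
  const query = Object.keys(params)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(String(params[key]))}`)
    .join("&");
  return query ? `${baseUrl}?${query}` : baseUrl;
}

// Deterministic form of "expect the response to contain only active users".
// Note: an empty list passes vacuously, which may or may not be what you want.
function containsOnlyActiveUsers(users: User[]): boolean {
  return users.every((user) => user.active);
}

// Placeholder base URL standing in for API_BASE_URI.
const usersUrl = withQuery("http://localhost:3000/api/users", { active: true });
console.log(usersUrl);
```

A check like this belongs in traditional code when the exact rule matters, for example whether an empty response should count as a pass.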

The framework even handles GitHub 2FA authentication by accepting TOTP secrets in the configuration, demonstrating that it can tackle authentication flows that typically require complex Playwright scripts.

Running tests is straightforward: pnpm shortest for all tests, or target specific files with pnpm shortest login.test.ts. The CLI supports line number targeting (login.test.ts:23) and headless mode for CI/CD integration.

Gotcha

The elephants in the room are cost and non-determinism. Every test execution hits Anthropic’s API, potentially multiple times per test. For a large test suite running on every commit, this could mean thousands of API calls daily. The README doesn’t discuss pricing implications, but API usage is billed per token, so the bill for a team running 100 tests per deployment grows with both suite size and deployment frequency, whereas traditional Playwright tests add no per-run API cost.
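
To see how the cost scales, it helps to write the arithmetic down. The sketch below is a back-of-the-envelope estimator, not a pricing reference: every input value, including the per-token prices, is a placeholder assumption chosen to keep the numbers round, and the CostInputs shape and estimateMonthlyCostUsd function are hypothetical names. Substitute your own measured token counts and the provider’s current rates.

```typescript
// Every value passed in below is a placeholder assumption, not a measured
// figure or a published price.
interface CostInputs {
  testsPerRun: number; // tests executed per deployment
  runsPerMonth: number; // deployments (or CI runs) per month
  apiCallsPerTest: number; // model round trips per test
  inputTokensPerCall: number; // prompt tokens per call
  outputTokensPerCall: number; // completion tokens per call
  usdPerMillionInputTokens: number;
  usdPerMillionOutputTokens: number;
}

function estimateMonthlyCostUsd(c: CostInputs): number {
  const callsPerMonth = c.testsPerRun * c.runsPerMonth * c.apiCallsPerTest;
  const inputCost = ((callsPerMonth * c.inputTokensPerCall) / 1e6) * c.usdPerMillionInputTokens;
  const outputCost = ((callsPerMonth * c.outputTokensPerCall) / 1e6) * c.usdPerMillionOutputTokens;
  return inputCost + outputCost;
}

const monthlyCostUsd = estimateMonthlyCostUsd({
  testsPerRun: 100,
  runsPerMonth: 200,
  apiCallsPerTest: 10,
  inputTokensPerCall: 2000,
  outputTokensPerCall: 200,
  usdPerMillionInputTokens: 1, // placeholder, not a real price
  usdPerMillionOutputTokens: 5, // placeholder, not a real price
});
console.log(monthlyCostUsd); // 600 with the placeholder inputs above
```

The useful takeaway is the shape of the formula, not the total: cost is a product of tests, runs, and calls per test, so doubling any one of them doubles the bill.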

Non-determinism is the second major concern. The AI’s interpretation of UI state may vary between runs: a button that is unambiguous in one execution might be ambiguous in another if a loading state or animation is in progress. Traditional E2E scripts are deterministic in what they attempt, since the same code issues the same actions. With Shortest, AI decision-making sits in the test execution path, so the same natural language instruction might occasionally produce different behavior. Debugging failures also becomes harder because there is no selector logic to inspect, as there is with traditional locators. For critical user flows that need guaranteed reproducibility, weigh this unpredictability before adopting the tool.
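
One way to make this concern measurable is to run the same test several times against an unchanged build and classify the outcomes. The sketch below assumes you have already collected pass/fail results by some means; classifyRuns and passRate are hypothetical helpers and use no Shortest API.

```typescript
type Stability = "stable-pass" | "stable-fail" | "flaky";

// Fraction of runs that passed. true = pass, false = fail.
function passRate(outcomes: boolean[]): number {
  if (outcomes.length === 0) {
    throw new Error("need at least one run");
  }
  const passes = outcomes.filter((passed) => passed).length;
  return passes / outcomes.length;
}

// Classify one test from repeated runs against the SAME build.
// Any mix of passes and failures on identical input counts as flaky.
function classifyRuns(outcomes: boolean[]): Stability {
  const rate = passRate(outcomes);
  if (rate === 1) return "stable-pass";
  if (rate === 0) return "stable-fail";
  return "flaky";
}

const loginRuns = [true, true, false, true];
console.log(classifyRuns(loginRuns), passRate(loginRuns));
```

A test that lands in the flaky bucket on an unchanged build is a candidate for a more specific instruction or for a deterministic assertion in traditional code.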

Verdict

Use Shortest if you’re a startup or small team that values development velocity over test execution cost, especially if non-developers need to contribute to test coverage. It’s perfect for projects where UI changes frequently and you’re tired of fixing broken selectors, or when you need rapid test creation for MVP validation. The hybrid approach makes it practical for teams that want natural language for navigation flows but traditional assertions for business logic.

Skip it if you’re running large test suites where API costs would be prohibitive, need absolute determinism for regulated industries or critical payment flows, or work in air-gapped environments without internet access. Also skip if your team has strong E2E automation expertise and values millisecond-level performance: traditional Playwright will always be faster and more predictable.

Shortest isn’t replacing conventional testing frameworks; it’s carving out a niche for teams where test maintainability and authoring speed matter more than execution cost and determinism.
