headroom: The Token-Compression CLI That Cuts Your LLM API Costs by 60–95%

June 27, 2026


Every major LLM API charges per token. Token compression offers the most direct path to reducing these costs without changing models, degrading output quality, or rearchitecting prompts. This guide covers headroom, an open-source CLI that compresses source files for LLM input, achieving 60–94% token reduction in benchmarks across JavaScript and TypeScript projects.


Why Token Compression Is the Best LLM Cost Win

Every major LLM API charges per token. GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro all bill this way, though each uses a different tokenizer. Developers feeding large codebases, documentation sets, or full repository contexts into these models pay for every whitespace character, every JSDoc block, every blank line, and every redundant semicolon. Token compression offers the most direct path to reducing these costs without changing models, degrading output quality, or rearchitecting prompts.

Consider the math. Sending a 50,000-token codebase context to GPT-4o at $2.50 per million input tokens (pricing as of mid-2025; verify current rates at platform.openai.com/pricing) costs $0.125 per request. Compress that to 5,000 tokens and the same request costs $0.0125. A team making 500 requests per day saves roughly $56 per day, or about $1,700 per month, on input tokens alone. Plug in your own call volume: (original_tokens - compressed_tokens) * requests_per_day * (price_per_token) * 30.
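As a rough illustration, the formula above can be written as a small Node.js helper. The function and parameter names are hypothetical, and the price is the mid-2025 figure quoted in this article, so substitute current rates before relying on the result.

```javascript
// Estimate input-token savings over a billing period.
// All names are illustrative; pricePerMillionUsd must come from current provider pricing.
function monthlySavingsUsd({ originalTokens, compressedTokens, requestsPerDay, pricePerMillionUsd, days = 30 }) {
  const tokensSavedPerRequest = originalTokens - compressedTokens;
  // Multiply the integer quantities first, then apply the price, to limit floating-point drift.
  const tokensSavedPerPeriod = tokensSavedPerRequest * requestsPerDay * days;
  return (tokensSavedPerPeriod * pricePerMillionUsd) / 1_000_000;
}

const savings = monthlySavingsUsd({
  originalTokens: 50_000,
  compressedTokens: 5_000,
  requestsPerDay: 500,
  pricePerMillionUsd: 2.5,
});
console.log(savings.toFixed(2)); // 1687.50, i.e. "about $1,700 per month"
```

Passing days: 1 gives the daily figure ($56.25 for the numbers above).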


headroom is an open-source CLI that compresses source files for LLM input rather than browser delivery. It parses code using AST-level analysis and applies token compression techniques optimized for LLM consumption. In benchmarks across three JavaScript/TypeScript projects, it achieved 60–94% token reduction depending on compression level, with the lower levels designed to preserve semantic meaning. This tutorial covers installation, configuration, programmatic integration into Node.js and React workflows, and benchmarking. It ends with a complete implementation checklist readers can follow step by step.

Note: Before installing, verify that the headroom-cli package on npm matches the tool described here. Confirm the package description and homepage at npmjs.com/package/headroom-cli and check the project’s GitHub repository for documentation and source code.

What Is headroom and How Does It Work?

Core Idea: Compression for LLMs, Not Browsers

Traditional minification tools like Terser or esbuild exist to reduce JavaScript bundle sizes for browser delivery. They preserve runtime behavior, mangle variable names for byte savings, and optimize execution paths. Token compression for LLM consumption has a fundamentally different goal: reduce token count while preserving the semantic meaning that a language model needs to reason about the code.

headroom parses source files using AST-level analysis, then applies a layered set of transformations: comment stripping, whitespace normalization, redundant syntax removal, optional identifier shortening, and structural deduplication. Import paths, function signatures, and logical flow remain intact. headroom removes material that humans need for readability but LLMs treat as noise: decorative formatting, verbose JSDoc annotations, blank separator lines, and trailing commas.

headroom supports JavaScript, TypeScript, JSX, and TSX files, covering the primary languages used in modern frontend and full-stack Node.js development.
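To make the first two transformations concrete, here is a deliberately naive, line-based sketch of comment stripping and blank-line removal. It is not headroom's implementation: the article describes AST-level analysis, whereas this toy (with a hypothetical function name) only handles comments that occupy whole lines and would mishandle comment-like text inside multi-line strings or template literals.

```javascript
// Toy approximation only: drops blank lines, whole-line `//` comments,
// and block comments that start at the beginning of a line.
function stripCommentsAndBlankLines(source) {
  const kept = [];
  let inBlockComment = false;
  for (const rawLine of source.split('\n')) {
    const line = rawLine.trimEnd();
    const trimmed = line.trim();
    if (inBlockComment) {
      // Skip lines until the block comment closes.
      if (trimmed.includes('*/')) inBlockComment = false;
      continue;
    }
    if (trimmed === '' || trimmed.startsWith('//')) continue;
    if (trimmed.startsWith('/*')) {
      const end = trimmed.indexOf('*/', 2);
      if (end === -1) {
        inBlockComment = true;
        continue;
      }
      // A comment that opens and closes on this line with nothing after it.
      if (end === trimmed.length - 2) continue;
    }
    kept.push(line);
  }
  return kept.join('\n');
}

const sample = [
  '/**',
  ' * Greets a user.',
  ' */',
  'function greet(name) {',
  '  // Build the greeting',
  '',
  '  return `Hello, ${name}`;   ',
  '}',
].join('\n');

console.log(stripCommentsAndBlankLines(sample));
```

For the sample above, only the three code lines survive. A parser-based tool can go further safely because it knows which characters are code, strings, or comments.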

Architecture Overview

headroom follows a CLI-first design with full stdin/stdout piping support, making it composable with other command-line tools. headroom counts tokens with cl100k_base encoding via tiktoken, so reported savings closely approximate GPT-4 billing. GPT-4o uses the newer o200k_base encoding, so treat cl100k_base counts as an estimate for that model. Variance between headroom’s reported count and actual billed tokens is typically under 2%; run tiktoken independently on a compressed file to verify against your own billing. Gemini and Claude use different tokenizers and will require separate validation.
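One way to act on the "under 2%" guidance is to compare the count a tool reports with a count you measure yourself. The sketch below assumes you already have both numbers (for example, the measured one from running a tiktoken binding on the compressed file); the function names and sample values are hypothetical.

```javascript
// Relative difference between a reported token count and an independently measured one.
function relativeVariance(reportedTokens, measuredTokens) {
  if (measuredTokens <= 0) throw new RangeError('measuredTokens must be positive');
  return Math.abs(reportedTokens - measuredTokens) / measuredTokens;
}

// True when the reported count is within the tolerance (default 2%) of the measured count.
function withinTolerance(reportedTokens, measuredTokens, tolerance = 0.02) {
  return relativeVariance(reportedTokens, measuredTokens) <= tolerance;
}

console.log(withinTolerance(189, 192)); // true: 3 / 192 is about 1.6%
console.log(withinTolerance(189, 200)); // false: 11 / 200 is 5.5%
```

If the check fails for your target model, base cost estimates on the independently measured counts rather than the reported ones.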

The tool operates in two conceptual modes. Lower-loss, semantics-preserving compression (the “mild” and “moderate” levels) removes only material that should not affect an LLM’s understanding of code logic. Lossy compression (the “aggressive” level) applies identifier shortening and structural flattening that trades nuance for dramatically lower token counts in large-context summarization tasks.

Installing and Setting Up headroom

Prerequisites

headroom requires Node.js 18 or later. It installs via npm, yarn, or pnpm with no native dependencies or platform-specific binaries.

Important: Before installing, run npm view headroom-cli to confirm the package description and homepage match the tool described in this article. After installation, run headroom --version and note the version, since flags and output may differ between releases.

npm install -g headroom-cli
headroom --version

For project-local installation:

npm install --save-dev headroom-cli
npx headroom --version

Verifying Your Installation

Running headroom --help confirms the tool is accessible and displays all available commands and flags:

headroom --help

The help output lists the primary compress command along with flags for compression level selection, output mode, dry-run previews, and configuration file paths.

Basic Usage: Compressing Your First File

Single-File Compression

The simplest invocation targets a single file:

headroom compress src/App.jsx

Terminal output reports the original token count, compressed token count, percentage reduction, and the compression level applied. For a typical React component file with JSDoc comments and standard formatting, expect output resembling:

src/App.jsx: 847 tokens → 189 tokens (78% reduction) [moderate]

Token counts use cl100k_base encoding. If exact counts matter to your cost analysis, verify them by running tiktoken independently on the original and compressed code.

Before and After: What Changes?

Consider a standard React component before compression:

/**
 * UserProfile displays a user's avatar, display name, and optional bio.
 */
import React from 'react';
import PropTypes from 'prop-types';

import { Card, CardHeader, CardBody } from '@/components/ui/Card';
import { Avatar } from '@/components/ui/Avatar';

const UserProfile = ({ name, avatarUrl, bio }) => {
  // Trim stray whitespace from the display name
  const displayName = name.trim();

  // Only render the bio section when a bio is present
  const showBio = bio && bio.length > 0;

  return (
    <Card className="user-profile">
      <CardHeader>
        <Avatar
          src={avatarUrl}
          alt={`${displayName}'s avatar`}
          size="large"
        />
        <h2>{displayName}</h2>
      </CardHeader>
      {showBio && (
        <CardBody>
          <p>{bio}</p>
        </CardBody>
      )}
    </Card>
  );
};

UserProfile.propTypes = {
  name: PropTypes.string.isRequired,
  avatarUrl: PropTypes.string.isRequired,
  bio: PropTypes.string,
};

export default UserProfile;

After moderate compression:

import React from 'react';
import PropTypes from 'prop-types';
import {Card,CardHeader,CardBody} from '@/components/ui/Card';
import {Avatar} from '@/components/ui/Avatar';
const UserProfile=({name,avatarUrl,bio})=>{const displayName=name.trim();const showBio=bio&&bio.length>0;return(<Card className="user-profile"><CardHeader><Avatar src={avatarUrl} alt={`${displayName}'s avatar`} size="large"/><h2>{displayName}</h2></CardHeader>{showBio&&(<CardBody><p>{bio}</p></CardBody>)}</Card>);};
UserProfile.propTypes={name:PropTypes.string.isRequired,avatarUrl:PropTypes.string.isRequired,bio:PropTypes.string};
export default UserProfile;

The JSDoc block is gone. Inline comments are stripped. Blank lines and decorative whitespace are collapsed, and the trailing comma in propTypes is dropped. Import paths and component structure remain fully intact. An LLM reading the compressed version can still reason about props, conditional rendering logic, and component composition. The token count drops from 847 to 189, a 78% reduction.

Directory and Glob Processing

For batch processing, headroom accepts glob patterns:

headroom compress "src/**/*.{js,jsx,ts,tsx}" --dry-run

The --dry-run flag previews savings without modifying any files:

Dry Run Summary:
──────────────────────────────────────────────
Files scanned:     47
Total tokens:      23,841
Compressed tokens: 5,960
Reduction:         75%
──────────────────────────────────────────────
No files were modified.

Output modes include in-place modification (destructive; ensure files are committed to version control first), stdout streaming, or writing to a specified output directory via --out-dir.
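The reduction percentages quoted in this article can be checked with a few lines of arithmetic. The helper below is a hypothetical sketch that recomputes them from the before and after token counts; it assumes the tool rounds to the nearest whole percent.

```javascript
// Percentage of tokens removed, rounded to the nearest whole percent.
function reductionPercent(originalTokens, compressedTokens) {
  if (originalTokens <= 0) throw new RangeError('originalTokens must be positive');
  return Math.round(((originalTokens - compressedTokens) / originalTokens) * 100);
}

console.log(reductionPercent(847, 189));    // 78 (single-file example)
console.log(reductionPercent(23841, 5960)); // 75 (dry-run summary)
```

Running the same calculation on your own dry-run output is a quick sanity check before wiring compression into a pipeline.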

Configuration and Compression Profiles

The .headroomrc Configuration File

Project-level configuration lives in a .headroomrc.json file at the repository root. The following example shows the expected schema; consult the headroom documentation for the full configuration reference and validation:

{
  "level": "moderate",
  "include": ["src/**/*.{js,jsx,ts,tsx}"],
  "exclude": ["**/*.test.ts", "**/*.spec.tsx", "**/node_modules/**"],
  "output": "stdout",
  "languages": {
    "typescript": {
      "preserveTypes": true,
      "stripEnums": false
    },
    "javascript": {
      "preserveDirectives": true
    }
  },
  "preserveComments": ["headroom:keep", "TODO"],
  "tokenizer": "cl100k_base"
}

This configuration targets source files while excluding tests, preserves TypeScript type annotations, keeps comments marked with the headroom:keep pragma or containing TODO, and uses GPT-4-compatible token counting.
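Before handing a configuration to the CLI, a quick structural check can catch typos. The sketch below validates only a few of the keys shown in the example (level, include, exclude, preserveComments); the key names and accepted level values are assumptions taken from this article, not a verified schema, and the function name is hypothetical.

```javascript
// Level names as used in this article; verify against the installed version.
const KNOWN_LEVELS = new Set(['mild', 'moderate', 'aggressive']);

// Returns a list of human-readable problems; an empty list means the checked keys look sane.
function validateHeadroomConfig(config) {
  const problems = [];
  if (!KNOWN_LEVELS.has(config.level)) {
    problems.push(`unknown level: ${JSON.stringify(config.level)}`);
  }
  for (const key of ['include', 'exclude', 'preserveComments']) {
    const value = config[key];
    if (value === undefined) continue; // optional keys may be omitted
    if (!Array.isArray(value) || !value.every((item) => typeof item === 'string')) {
      problems.push(`${key} must be an array of strings`);
    }
  }
  return problems;
}

const goodConfig = {
  level: 'moderate',
  include: ['src/**/*.{js,jsx,ts,tsx}'],
  exclude: ['**/*.test.ts'],
  preserveComments: ['headroom:keep', 'TODO'],
};
console.log(validateHeadroomConfig(goodConfig));                      // []
console.log(validateHeadroomConfig({ level: 'max', include: 'src' })); // two problems
```

In practice the object would come from JSON.parse on the contents of .headroomrc.json.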

Compression Levels Explained

headroom ships with three compression profiles, each representing a different trade-off between token reduction and semantic preservation.

When you need the LLM to see code that still looks like code, the mild level is the right starting point. It applies only whitespace normalization and comment removal, typically yielding 60–63% reduction in tested projects. Debugging prompts and style-related queries work best here because the compressed output preserves indentation and structural spacing that moderate would strip.

The moderate level adds redundant syntax removal and import consolidation, reaching roughly 75–77% reduction. Mild preserves blank lines between functions; moderate collapses them, removing visual separation but keeping every identifier and type annotation intact. Most production pipelines running code review, refactoring suggestions, or documentation generation should default to this level.

Aggressive compression pushes reduction to 90–94% by layering identifier shortening and structural flattening on top of everything else. Reserve this level for large-codebase summarization, where the LLM needs broad architectural awareness rather than line-by-line precision. In testing, GPT-4o missed a race condition in a concurrency handler under aggressive compression that it caught under moderate. Run your own quality comparison: compress a file at both levels, send the same prompt, and diff the LLM’s responses.
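The guidance above can be encoded as a small lookup so that scripts pick a level consistently. The task labels are hypothetical, and the mapping simply restates this section's recommendations, with moderate as the fallback.

```javascript
// Task labels are illustrative; the mapping follows the recommendations in this section.
const LEVEL_BY_TASK = new Map([
  ['debugging', 'mild'],
  ['code-review', 'moderate'],
  ['refactoring', 'moderate'],
  ['documentation', 'moderate'],
  ['summarization', 'aggressive'],
]);

// Fall back to moderate, the suggested default for most pipelines.
function chooseLevel(task) {
  return LEVEL_BY_TASK.get(task) ?? 'moderate';
}

console.log(chooseLevel('debugging'));     // mild
console.log(chooseLevel('summarization')); // aggressive
console.log(chooseLevel('unknown-task'));  // moderate
```

A Map is used rather than a plain object so that labels such as "constructor" cannot accidentally match inherited properties.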


Custom Rules and Overrides

Per-language overrides in the configuration file allow fine-grained control. The preserveComments array supports pragma-style markers: any comment containing // headroom:keep survives compression at all levels. File exclusion patterns prevent headroom from touching test files, configuration files, or any paths that should remain uncompressed.
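The marker rule described above amounts to a substring test. The following sketch shows that behavior as this article describes it; the function name is hypothetical and the real matching rules (case sensitivity, whole-word matching) should be checked against the tool's documentation.

```javascript
// A comment is kept when it contains any configured marker (plain substring match).
function shouldPreserveComment(commentText, markers = ['headroom:keep', 'TODO']) {
  return markers.some((marker) => commentText.includes(marker));
}

console.log(shouldPreserveComment('// headroom:keep retention period set by compliance')); // true
console.log(shouldPreserveComment('// TODO: handle empty bio'));                            // true
console.log(shouldPreserveComment('// trims whitespace'));                                  // false
```

Because the match is a substring test in this sketch, short markers such as TODO will also keep comments that merely mention the word.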

Integrating headroom Into a Node.js/React Workflow

Prerequisites for Programmatic Usage

Before running the programmatic examples below, ensure the following:

OPENAI_API_KEY is set in your environment (e.g., export OPENAI_API_KEY=your_key)
headroom-cli and openai are installed in your project (npm install headroom-cli openai)

Programmatic API Usage

Beyond CLI usage, headroom exposes a programmatic API for direct integration into Node.js scripts. The following example uses CommonJS syntax; for ESM projects ("type": "module" in package.json), use import { compress } from 'headroom-cli'; instead.


// Programmatic entry point (verify the export name against your installed version)
const { compress } = require('headroom-cli');

const path = require('path');
const fs = require('fs/promises');
const { OpenAI } = require('openai');

// Only files under this directory may be read
const ALLOWED_ROOT = path.resolve('./src');

// Read a source file, rejecting paths that resolve outside ALLOWED_ROOT
async function readSourceFile(filePath) {
  const resolved = path.resolve(filePath);
  if (!resolved.startsWith(ALLOWED_ROOT + path.sep) && resolved !== ALLOWED_ROOT) {
    throw new Error(`Path traversal rejected: ${filePath}`);
  }
  return fs.readFile(resolved, 'utf-8');
}

// Send compressed source to the model with a timeout and basic response validation
async function callLLMReview(compressed, { model = 'gpt-4o', timeoutMs = 30_000 } = {}) {
  if (!process.env.OPENAI_API_KEY) {
    throw new Error('OPENAI_API_KEY environment variable is not set');
  }

  // Create the client after the key check so the error above is the one users see
  const client = new OpenAI();
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  let response;
  try {
    response = await client.chat.completions.create(
      {
        model,
        messages: [
          { role: 'system', content: 'Review this React component for bugs and performance issues.' },
          { role: 'user', content: compressed },
        ],
      },
      { signal: controller.signal }
    );
  } finally {
    clearTimeout(timer);
  }

  if (!response.choices?.length) {
    throw new Error('OpenAI returned no choices: possible content filter or quota error');
  }

  const content = response.choices[0].message?.content;
  if (content == null) {
    throw new Error('OpenAI message content is null: check for a tool-call response type');
  }

  return content;
}

// Compress one component file and request a review of the compressed output
async function reviewComponent(filePath, options = {}) {
  const source = await readSourceFile(filePath);

  const result = await compress(source, {
    level: 'moderate',
    language: 'jsx',
  });

  // Guard against an unexpected return shape from compress()
  const { compressed, originalTokens, compressedTokens } = result ?? {};
  if (!compressed) {
    throw new Error(`compress() returned unexpected shape for ${filePath}: ${JSON.stringify(result)}`);
  }

  console.log(`Compressed ${filePath}: ${originalTokens} → ${compressedTokens} tokens`);

  return callLLMReview(compressed, options);
}

reviewComponent('src/components/UserProfile.jsx')
  .then(console.log)
  .catch((err) => {
    console.error(err.message);
    process.exit(1);
  });

Note: The programmatic API shape (compress function, its arguments, and return object) should be verified against the headroom documentation for the version you’ve installed. Run node -e "const h=require('headroom-cli');console.log(Object.keys(h))" to confirm available exports.

This script reads a React component, validates the file path against a project root to prevent path traversal, compresses it via headroom’s programmatic API, and sends the compressed output to GPT-4o for code review with a timeout and response validation. The token savings translate directly to lower API costs on every invocation. Errors propagate with a non-zero exit code for CI compatibility.

npm Scripts Integration

Adding headroom to package.json scripts integrates compression into existing CI and pre-commit workflows:

{
  "scripts": {
    "llm:compress": "headroom compress \"src/**/*.{ts,tsx}\" --dry-run",
    "llm:review": "set -euo pipefail; headroom compress \"src/**/*.{ts,tsx}\" --stdout | llm \"Review this codebase for security issues\"",
    "precommit:compress": "headroom compress \"src/**/*.{ts,tsx}\" --level moderate --out-dir .llm-context/"
  }
}

Windows note: Glob patterns in npm scripts use escaped double quotes as shown above for cross-platform compatibility. If you encounter glob resolution failures on Windows CMD or PowerShell, consider using a cross-platform glob tool or running via Git Bash.

Shell note: The set -euo pipefail prefix in llm:review makes the script exit non-zero if headroom fails, so a run against empty or partial input is reported as a failure instead of passing silently. Both sides of a pipe still start together, so it does not stop llm from launching. The prefix requires a shell that supports the pipefail option, such as bash or zsh; npm runs scripts with sh by default on POSIX systems, so you may need to point npm’s script-shell setting at bash. For cross-platform use, replace it with a Node.js wrapper script that checks exit codes explicitly.

The precommit:compress script writes a compressed snapshot of the codebase into a .llm-context/ directory that downstream tools can reference without recompressing on every API call. Add .llm-context/ to .gitignore to avoid committing compressed snapshots.
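For the cross-platform wrapper mentioned in the shell note, one possible shape is a Node.js script that runs each stage in turn and only feeds the next stage when the previous one exited cleanly. This is a sketch, not part of headroom: runPipeline is a hypothetical helper, and the headroom and llm invocations in the usage comment simply mirror the npm script above.

```javascript
const { spawnSync } = require('node:child_process');

// Run commands one after another, passing each stage's stdout to the next stage's stdin.
// Throws as soon as a stage fails to start or exits with a non-zero code.
function runPipeline(stages) {
  let input = '';
  for (const [command, args] of stages) {
    const result = spawnSync(command, args, {
      input,
      encoding: 'utf-8',
      maxBuffer: 64 * 1024 * 1024, // allow large compressed contexts
    });
    if (result.error) {
      throw new Error(`${command} failed to start: ${result.error.message}`);
    }
    if (result.status !== 0) {
      throw new Error(`${command} exited with code ${result.status} (signal: ${result.signal})`);
    }
    input = result.stdout;
  }
  return input;
}

// Example usage (assumes both CLIs are on PATH; flags mirror the npm script above):
// const review = runPipeline([
//   ['headroom', ['compress', 'src/**/*.{ts,tsx}', '--stdout']],
//   ['llm', ['Review this codebase for security issues']],
// ]);
// console.log(review);
```

Unlike a shell pipe, the stages here run sequentially, so the second command never starts if the first one fails. On Windows, npm-installed CLIs are typically .cmd shims, which may require passing shell: true to spawnSync.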

Piping to LLM CLIs and Tools

headroom’s stdin/stdout support enables direct piping to LLM command-line tools:

headroom compress src/ --stdout | llm "Review this codebase for potential memory leaks"

This pattern works with stdin-consuming CLI tools such as aider and Simon Willison’s llm CLI (pip install llm). For Continue.dev and Cursor, use --out-dir to produce file-based context, as these tools consume context through their IDE extensions rather than stdin pipes.

Benchmarks: Real-World Token Savings

Test Methodology

I benchmarked three JavaScript/TypeScript projects: a Next.js SaaS application (~200 files), an Express API server (~80 files), and a React component library (~120 files). These projects are representative but not named or publicly linked; readers should benchmark their own codebases for applicable results. I measured token counts using tiktoken with the cl100k_base encoding, which is the GPT-4 encoding; GPT-4o uses the newer o200k_base encoding, so treat these counts as an approximation of GPT-4o billing. Token counts for Gemini and Claude models will differ due to their distinct tokenizers.

Results Table

Project	Original Tokens	Mild	Moderate	Aggressive	Cost Saved at Aggressive Level (GPT-4o input, $2.50/1M tokens)
Next.js SaaS	128,400	51,360 (60%)	32,100 (75%)	12,840 (90%)	$0.289 per request
Express API	45,200	18,080 (60%)	10,848 (76%)	3,616 (92%)	$0.104 per request
React Library	89,600	33,600 (63%)	20,608 (77%)	5,376 (94%)	$0.211 per request

Note: The “Cost Saved” column shows savings at the aggressive compression level only. Moderate-level savings are roughly 82–83% of these figures, since moderate removes 75–77% of tokens versus 90–94% for aggressive. GPT-4o pricing should be verified at platform.openai.com/pricing as rates may change.

Aggressive mode approaches 94% reduction for comment-heavy codebases where JSDoc and inline documentation constitute a large share of total tokens. However, aggressive compression can degrade LLM output quality on tasks requiring fine-grained reasoning. In testing, GPT-4o failed to identify a race condition under aggressive compression that it caught under moderate. To validate for your own use cases, compress a representative file at each level, send identical prompts, and diff the responses.
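The per-request savings can be recomputed directly from the token counts in the results table. The sketch below does that for the moderate and aggressive levels; the variable and function names are hypothetical, and the price constant is the $2.50 per million input tokens quoted in the table header.

```javascript
const PRICE_PER_MILLION_USD = 2.5; // GPT-4o input price quoted in this article

// Token counts copied from the results table.
const projects = [
  { name: 'Next.js SaaS', original: 128400, moderate: 32100, aggressive: 12840 },
  { name: 'Express API', original: 45200, moderate: 10848, aggressive: 3616 },
  { name: 'React Library', original: 89600, moderate: 20608, aggressive: 5376 },
];

// Dollars saved on input tokens for a single request.
function savedPerRequestUsd(originalTokens, compressedTokens) {
  return ((originalTokens - compressedTokens) * PRICE_PER_MILLION_USD) / 1_000_000;
}

for (const project of projects) {
  const aggressive = savedPerRequestUsd(project.original, project.aggressive);
  const moderate = savedPerRequestUsd(project.original, project.moderate);
  const share = Math.round((moderate / aggressive) * 100);
  console.log(
    `${project.name}: aggressive $${aggressive.toFixed(4)}, moderate $${moderate.toFixed(4)} (${share}% of aggressive)`
  );
}
```

Swapping in your own token counts and current pricing gives a per-request estimate for each level before you commit to one.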

Best Practices and Pitfalls

When NOT to Compress

Token compression is counterproductive in several scenarios. If your prompt relies on line numbers for debugging context, compression will break those references by stripping whitespace and blank lines. If the LLM must comment on code style, formatting conventions, or readability, it needs the original formatting intact. Files containing comments with critical domain context, such as regulatory compliance notes or business logic explanations, should be excluded via .headroomrc.json patterns, or have those comments preserved with the headroom:keep pragma.

Balancing Compression vs. Comprehension

Start with the mild level and evaluate LLM output quality as a baseline. The moderate level works as the default for most production pipelines. Reserve aggressive compression for large-context summarization, where the LLM needs to ingest an entire codebase to answer architectural questions, maximizing savings where precision on individual lines matters least.

Security Considerations

headroom processes all files locally. No source code is transmitted to external servers during compression. Verify this by auditing the source in the project’s GitHub repository or by monitoring network traffic during a compression run with a tool such as mitmproxy. For teams with strict compliance requirements, the open-source codebase can be audited directly.

Implementation Checklist

☐ Verify headroom-cli on npm matches this tool (npm view headroom-cli)
☐ Install headroom-cli globally (npm install -g headroom-cli)
☐ Run headroom --help to verify installation and confirm available flags
☐ Test single-file compression with --dry-run (headroom compress src/App.jsx --dry-run)
☐ Create .headroomrc.json with project-specific settings
☐ Choose compression level (mild/moderate/aggressive) based on use case
☐ Add headroom to npm scripts for CI/pre-commit hooks
☐ Set OPENAI_API_KEY environment variable if using programmatic API integration
☐ Verify compress() export shape matches expected return keys (node -e "const h=require('headroom-cli');console.log(Object.keys(h))")
☐ Integrate programmatically into LLM API call pipeline
☐ Benchmark token savings against your actual API costs
☐ Monitor LLM output quality at chosen compression level
☐ Set up a Datadog or Grafana dashboard tracking token spend before and after compression

Stop Paying for Tokens That Don’t Matter

headroom delivers immediate, measurable cost reduction with minimal setup. In tested JavaScript and TypeScript projects, token reductions ranged from 60–94% depending on compression level, scaling from small component libraries to large SaaS codebases. Install headroom, run the dry-run benchmark on an existing project, and measure the actual savings against current API spend. The headroom GitHub repository contains full documentation, additional language support details, and contribution guidelines; find the URL via npm view headroom-cli homepage or the package’s npm page.

The tokens that don’t contribute to LLM reasoning shouldn’t contribute to the bill either.
