
Rule-Based vs. Neural Compression: When to Use Each

LLMLingua and similar neural approaches achieve higher compression, but at a cost. We explain the trade-offs and when each mode is the right choice.

ziptoken Engineering

ziptoken offers two compression modes: a fast, deterministic rule-based engine and an optional LLMLingua-powered neural mode. Choosing between them depends on your latency budget and compression goals.

Rule-based (default)

  • Latency: <5ms per call
  • Typical savings: 25–45%
  • Best for: High-throughput production workloads, customer-facing products, RAG pipelines

Neural (LLMLingua mode)

  • Latency: 100–400ms per call
  • Typical savings: 55–70%
  • Best for: Batch jobs, offline processing, maximising savings when latency is acceptable
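To make the distinction concrete, here is a toy rule-based pass in the spirit of the default mode. The rules below (whitespace collapsing, filler-word removal) are illustrative assumptions, not ziptoken's actual ruleset; the point is that pure string rules are deterministic and run in microseconds, which is why the mode fits inside a <5ms budget.

```python
import re

# Illustrative filler words; ziptoken's real rules are not shown here.
FILLERS = {"please", "kindly", "very", "really", "just", "basically"}

def rule_based_compress(prompt: str) -> str:
    # Collapse runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", prompt).strip()
    # Drop filler words that rarely change model behaviour.
    words = [w for w in text.split(" ") if w.lower().strip(".,") not in FILLERS]
    return " ".join(words)

def savings(before: str, after: str) -> float:
    # Rough proxy: character reduction as a fraction of the original.
    return 1 - len(after) / len(before)

prompt = "Please  summarise this   report, and really just keep it basically short."
compressed = rule_based_compress(prompt)
```

Neural mode works differently: a small language model scores each token's information content and drops low-value tokens, which is where the extra latency (and the extra savings) comes from.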

Recommendation

Use rule-based for any user-facing request. Switch to LLMLingua for nightly batch summarisation jobs, document processing pipelines, or fine-tuning dataset preparation, where an extra few hundred milliseconds per call is acceptable.
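This recommendation can be sketched as a simple routing rule keyed on the caller's latency budget. The `Mode` enum, threshold, and function name below are hypothetical, not a ziptoken API; the threshold uses neural mode's worst-case latency from the table above.

```python
from enum import Enum

class Mode(Enum):
    RULE_BASED = "rule-based"   # <5 ms/call, 25-45% savings
    NEURAL = "neural"           # 100-400 ms/call, 55-70% savings

def pick_mode(latency_budget_ms: float) -> Mode:
    # Neural mode can take up to ~400 ms per call; only choose it when
    # the caller can absorb that worst case (batch or offline jobs).
    return Mode.NEURAL if latency_budget_ms >= 400 else Mode.RULE_BASED

pick_mode(50)      # user-facing request -> Mode.RULE_BASED
pick_mode(60_000)  # nightly batch job   -> Mode.NEURAL
```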
