Stop Paying LLM Prices for Redundant Time-Series Data

Optimize LLM costs with efficient time-series data serialization

Published December 31, 2025

Tags: LLMs, Time-Series Data, Data Serialization, AI Efficiency

Large Language Models are becoming part of everyday data pipelines. But once you start feeding them time-series data, a quiet inefficiency shows up fast: tokens.

If you are sending metrics, logs, sensor readings, or financial ticks into LLMs using JSON or CSV, you are almost certainly paying more than you should in cost, latency, and lost context window.

This post introduces TSLN (Time-Series Lean Notation): a text-based serialization format designed for how LLMs actually read numbers, not how humans assume they do.

The Core Problem: LLMs Don't Read Numbers Like Humans

Time-series data is structurally simple. Most values change slowly. Timestamps advance predictably. Fields repeat.

Humans immediately see the pattern. LLMs don't.

Modern tokenizers were built for language, not numeric streams. Two adjacent numbers that look similar to us may tokenize very differently to a model. Repeating structure gets re-tokenized every time. Verbose syntax quietly eats context.
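
You can see this directly by counting tokens. The sketch below uses the community tiktoken-go port; the import path and Encode signature are taken from its documentation, but verify them against the repository (and note that it fetches encoding files at runtime):

```go
package main

import (
	"fmt"

	"github.com/pkoukk/tiktoken-go"
)

func main() {
	// cl100k_base is the tokenizer used by several OpenAI models.
	enc, err := tiktoken.GetEncoding("cl100k_base")
	if err != nil {
		panic(err)
	}
	// Numerically adjacent values can split into different token
	// boundaries and counts; similarity on paper does not mean
	// similarity after tokenization.
	for _, s := range []string{"1699999999", "1700000000", "42.1", "42.13"} {
		fmt.Printf("%-12q -> %d tokens\n", s, len(enc.Encode(s, nil, nil)))
	}
}
```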

At small scale, this is annoying. At production scale, it becomes an infrastructure cost.

A Math Analogy That Actually Fits

If you've taken calculus or physics, you already understand the idea behind TSLN.

Think about a moving object:

  • Position tells you where it is
  • Velocity tells you how position changes
  • Acceleration tells you how velocity changes

Now map that to time-series data:

  • Raw values are position
  • Differences between values are velocity (delta)
  • Differences between deltas are acceleration (delta-of-delta)

Most time-series data lives very close to zero acceleration. Values drift. Timestamps tick forward regularly. That structure is incredibly compressible, provided you represent it correctly.
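
A tiny worked example makes this concrete. The Go sketch below is illustrative only (it is not taken from the TSLN libraries); it computes deltas and delta-of-deltas for a run of one-second timestamps:

```go
package main

import "fmt"

// deltas returns successive differences of xs: the "velocity" of the series.
func deltas(xs []int64) []int64 {
	out := make([]int64, 0, len(xs)-1)
	for i := 1; i < len(xs); i++ {
		out = append(out, xs[i]-xs[i-1])
	}
	return out
}

func main() {
	// Unix timestamps one second apart: constant velocity.
	ts := []int64{1700000000, 1700000001, 1700000002, 1700000003}

	d := deltas(ts) // velocity:     [1 1 1]
	dd := deltas(d) // acceleration: [0 0]

	fmt.Println(d, dd)
	// Four 10-digit numbers collapse into one anchor value plus a run
	// of tiny, highly repetitive deltas.
}
```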

TSLN is essentially what happens when you apply this math intuition to data serialization, and then make it tokenization-aware.

Why JSON Is the Wrong Baseline

JSON is great for APIs and configuration. It is not great for streaming numeric data into LLMs.

Every row repeats keys. Every timestamp is fully restated. Precision is often higher than needed. None of this helps the model reason better, but all of it increases token count.
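
Three one-second CPU readings show the shape of the problem (the field names here are made up for illustration):

```json
[
  {"timestamp": "2025-01-01T00:00:00Z", "cpu_percent": 42.1},
  {"timestamp": "2025-01-01T00:00:01Z", "cpu_percent": 42.3},
  {"timestamp": "2025-01-01T00:00:02Z", "cpu_percent": 42.2}
]
```

Between rows, only the final second of the timestamp and a single reading change; everything else is repetition.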

Traditional compression doesn't fully solve this either. LLM APIs operate on text, not binary blobs. By the time compressed data is base64-encoded, which inflates it by roughly a third and yields high-entropy text the model cannot interpret, much of the advantage is gone.
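
You can measure this yourself with the standard library. A minimal sketch (the sample row and sizes are illustrative, not a benchmark):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
	"strings"
)

func main() {
	// 100 near-identical JSON rows: about as compressible as data gets.
	row := `{"timestamp":"2025-01-01T00:00:00Z","cpu_percent":42.1}` + "\n"
	raw := strings.Repeat(row, 100)

	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write([]byte(raw))
	zw.Close()

	b64 := base64.StdEncoding.EncodeToString(buf.Bytes())

	fmt.Printf("raw JSON:     %d bytes\n", len(raw))
	fmt.Printf("gzip:         %d bytes\n", buf.Len())
	fmt.Printf("base64(gzip): %d chars\n", len(b64))
	// The byte count shrinks, but the result is opaque noise to the
	// model, and high-entropy base64 text spends tokens inefficiently.
}
```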

TSLN takes a different approach: optimize before tokenization, not after.

What TSLN Does Differently

TSLN is still plain text. You can read it, diff it, stream it, and debug it without special tools. The difference is what gets written.

Instead of repeating full values, TSLN stores changes. Instead of repeating structure, it declares schema once. Instead of sending large floating-point numbers, it sends smaller, bounded deltas that tokenize consistently.
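
To make that concrete, here is a schematic of the same three CPU readings from earlier, with the schema declared once and subsequent rows carried as deltas. This is only an illustration of the idea, not the actual TSLN grammar; see the released libraries for the real syntax:

```text
schema: timestamp, cpu_percent
base:   2025-01-01T00:00:00Z, 42.1
row:    +1s, +0.2
row:    +1s, -0.1
```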

The result is data that is mathematically equivalent, semantically intact, and far cheaper for LLMs to consume.

In early benchmarks, the same datasets serialized with TSLN used up to ~80% fewer tokens than JSON. That reduction directly translates into lower cost, lower latency, and larger effective context windows.

TSLN sits underneath AI systems that consume time-series data.

If you already have metrics, logs, or streams, and you want to analyze them with LLMs or other AI models, TSLN makes that data cheaper and cleaner to move.

The same format can also feed non-LLM systems that benefit from compact, structured time-series input.

Open Source and Available Today

We recently released Go and Node.js implementations of TSLN under the MIT License.

These libraries handle encoding and decoding, support streaming use cases, and are designed to drop into existing pipelines with minimal friction. Both repositories are available on GitHub.

The goal is simple: make it easy to experiment, measure, and decide whether token-efficient serialization matters in your workload.

We have already integrated TSLN into our terminal and browser applications.

We are actively expanding our evaluation across larger datasets, more diverse time-series patterns, and multiple models and tokenizers.

We will release a comprehensive whitepaper with deeper benchmarks, methodology, and real-world experiments very soon.