Tokens and Why They Matter
Better titled as: Why do I need to swipe my credit card to build with AI?
Estimated reading time: 5 minutes
Recently, I decided to run an experiment: build an application using AI agents. Cost hadn't even registered when I started, because I assumed I could do this for free. I began with a simple prompt but quickly noticed warnings in the chat window like:
“X tokens left” or “Y messages remaining.” Then came the dreaded: “Upgrade to a paid tier NOW!”
That led me down a rabbit hole—why do tokens matter so much when building with AI?
I had seen token counts in my ChatGPT window before, but after paying for the Pro plan I only occasionally hit the message limit. So I never thought much about it, until I found myself spending $15, $20… a gazillion USD just to build a simple application. That experience led to this post.
So, here’s my rabbit hole of the week: Tokens.
So, what the heck is a Token?
A token is a chunk of text that an AI model processes. While you might think of AI interactions in terms of sentences or words, AI models break the text down into smaller pieces. This allows them to:
Handle language more efficiently
Process unfamiliar words
Reduce memory usage
Think of it like chopping vegetables for a recipe. Instead of dealing with a whole carrot, you slice it into smaller pieces, making it easier to cook and mix. AI does the same by breaking down text into manageable chunks.
For example:
"Hello!" → 2 tokens ("Hello" + "!")
"ChatGPT is great!" → 4 tokens ("ChatGPT", " is", " great", "!")
Different AI models tokenize text differently—even spaces and punctuation count as tokens.
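If you want to see this in action, here's a minimal Python sketch using OpenAI's open-source tiktoken library. It loads one specific encoding (cl100k_base); other models ship their own vocabularies, so the exact counts you get will vary by model:

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by several OpenAI chat models;
# other models use different vocabularies and split text differently.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["Hello!", "AI is great!", "ChatGPT is great!"]:
    token_ids = enc.encode(text)
    # Decode each id on its own to see exactly how the text was chopped up.
    pieces = [enc.decode([tid]) for tid in token_ids]
    print(f"{text!r} -> {len(token_ids)} tokens: {pieces}")
```

Run it and you'll notice that common words survive as single tokens, while a coined name like "ChatGPT" often gets chopped into several sub-word pieces. That's the "process unfamiliar words" trick from the list above.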
Why Do Tokens Cost Money?
You might wonder: why does generating text cost anything at all? Unlike traditional software, where actions like clicking a button or running a script are virtually free, AI inference is computationally expensive.
Here’s why:
AI models require powerful GPUs and cloud servers to process each request.
Every token needs to be processed, stored, and referenced, which takes computing resources.
Longer context windows = more memory and processing costs.
Essentially, when you send a request to an AI model, you’re renting expensive hardware for a few seconds. This is why even free-tier users have strict token limits.
Why This Matters
Since AI providers charge based on tokens, the more tokens you use, the more expensive each interaction becomes. Long prompts, detailed responses, and lengthy conversations consume more tokens, which means higher costs for the user.
Token-Based vs. Message-Based Pricing: What’s the Deal?
AI providers have two common pricing structures:
Token-Based Pricing (Most Common)
✅ Used by OpenAI (GPT models), Anthropic (Claude), Mistral, and others
✅ Charges based on the total number of tokens processed (input + output)
✅ You pay for everything the model reads and everything it generates
Message-Based Pricing (Less Common)
✅ Some chatbot platforms charge per conversation or per message
✅ Bundles AI usage into a flat rate per interaction rather than per token
✅ May cap the number of messages or complexity of responses
Why Some Applications Can Absorb Token Costs Within Message-Based Pricing
A few business models make flat-rate pricing work despite per-token costs underneath:
High-volume consumer chatbots: Predefined responses limit token use per message.
Subscription-based SaaS models: Unlimited messaging balances high- and low-usage customers.
Optimized AI agents: Structured, pre-processed text minimizes token use per message.
Advertising-supported models: Monetization through ads or upselling covers token costs.
Both models have trade-offs, but token-based pricing is the most transparent since it directly ties cost to processing power.
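To see how token-based billing adds up in practice, here's a back-of-the-envelope sketch. The rates are invented placeholders, not any provider's actual prices (check the pricing page of whatever service you use); the only realistic detail baked in is that output tokens usually cost more than input tokens:

```python
# Hypothetical rates in USD per 1M tokens. NOT real prices; every
# provider publishes its own, and output typically costs more than input.
INPUT_RATE_PER_M = 3.00
OUTPUT_RATE_PER_M = 15.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one API call under token-based pricing."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# One chat turn: a detailed prompt plus a fairly long answer.
one_turn = request_cost(input_tokens=1_500, output_tokens=800)
print(f"One turn:  ${one_turn:.4f}")   # fractions of a cent

# An AI agent that loops a few hundred times quietly multiplies that.
print(f"200 turns: ${200 * one_turn:.2f}")
```

A single turn looks like pocket change; an agent that re-reads a growing conversation on every loop is how my "simple application" turned into real money.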
Why This Matters for You
For builders and developers, costs are an expected part of the process; they know they will need to pay for usage at some point. For everyday users, though, this can come as a shock. Many casual users assume AI interactions are free or covered by a flat-rate subscription, only to hit token limits unexpectedly. That lack of awareness breeds frustration when conversations suddenly stop, or when an upgrade is suddenly required to keep using a service that seemed unlimited.
If you're using AI agents for automation, content generation, or building a chatbot, understanding token usage is crucial for:
Controlling costs—preventing unexpected charges.
Optimizing responses—balancing detail vs. token efficiency.
Avoiding forced upgrades—not accidentally hitting a paywall too soon.
How Ignoring This Can Push You Into a Higher Pricing Tier
Many platforms offer free or hobby-tier plans with token limits. If you're unaware of token consumption:
Your AI assistant might stop responding mid-conversation due to hitting a token cap.
You might burn through your free-tier quota in just a few prompts.
Your project might suddenly require an upgrade to continue working.
In other words, what starts as a small experiment can quickly force you to bump up to a paid plan just to keep going.
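One cheap defense is to count tokens before you send anything. Here's a minimal pre-flight check, again using tiktoken; the remaining-quota number and the check_budget helper are made up for illustration (plug in whatever limit your plan actually gives you):

```python
import tiktoken

FREE_TIER_QUOTA = 1_000  # hypothetical tokens left on a free tier

enc = tiktoken.get_encoding("cl100k_base")

def check_budget(prompt: str, expected_output_tokens: int = 500) -> None:
    """Warn before sending a request that might blow through the quota."""
    needed = len(enc.encode(prompt)) + expected_output_tokens
    if needed > FREE_TIER_QUOTA:
        print(f"Careful: ~{needed} tokens needed, only {FREE_TIER_QUOTA} left.")
    else:
        print(f"OK: ~{needed} of {FREE_TIER_QUOTA} tokens.")

check_budget("Build me a full-stack app with auth, tests, and CI. " * 50)
```

The output estimate is a guess (you can't know the reply length in advance), but even a rough check beats finding out mid-conversation.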
Be Specific, But Not Too Specific: The Catch-22
AI models work best with clear, detailed prompts—the more context you provide, the better the output. But here’s the irony:
More details = More tokens.
More tokens = Higher cost.
If you’re on a free or low-cost tier, you might try to be thorough in your instructions, only to find that your long, well-structured prompts push you past the free limit.
This creates a paradox:
You’re told to be explicit and detailed in your instructions.
But the moment you do, you get cut off or hit a paywall.
For example, if you build a chatbot and provide well-structured, multi-step instructions, your message might eat up your entire free-tier allocation in one go. That means you’ll be forced to upgrade to continue experimenting.
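You can put numbers on this trade-off. A quick comparison sketch, with both prompts invented for illustration:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

verbose = (
    "I would really appreciate it if you could possibly take a moment to "
    "write for me, in a clear and friendly tone, a short summary of the "
    "following article, making sure to keep it concise and readable."
)
concise = "Summarize this article in 3 sentences."

for name, prompt in [("verbose", verbose), ("concise", concise)]:
    print(f"{name}: {len(enc.encode(prompt))} tokens")
```

Same request, a fraction of the tokens. The trick is cutting filler while keeping the context the model genuinely needs, which is what the next section is about.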
How to Optimize for Cost-Efficiency
If you want to make the most of your tokens without jumping into paid plans too soon, here are a few strategies:
Be concise—get to the point in your prompts while keeping critical details.
Use structured formats—bullet points, numbered lists, and precise wording help.
Limit unnecessary words—AI models don’t need flowery language; they need clarity.
Optimize outputs—set response length expectations (e.g., “Reply in 100 words”).
Use memory-efficient approaches—some platforms allow storing context outside the AI session (see the sketch after this list).
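For longer sessions, the biggest lever is usually the conversation history you resend with every request. Here's a sketch of one simple strategy: keep only the most recent messages that fit a fixed token budget. The budget, the message format, and the trim_history helper are all assumptions for illustration, not any platform's API:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def trim_history(messages: list[str], budget: int = 2_000) -> list[str]:
    """Keep the most recent messages that fit within `budget` tokens."""
    kept, used = [], 0
    for msg in reversed(messages):       # walk from newest to oldest
        cost = len(enc.encode(msg))
        if used + cost > budget:
            break                        # older messages get dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))          # restore chronological order

history = ["(a long older message) " * 100, "(the recent question)"]
print(trim_history(history, budget=150))  # only the recent message survives
```

Fancier approaches summarize old messages instead of dropping them, but the principle is the same: every token you resend is a token you pay for.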
Future trends
As AI adoption grows, pricing models may evolve. Here are a few possible trends:
Subscription-based models—flat-rate plans that cover a certain level of usage.
Per-task pricing—charging based on complexity rather than tokens.
Hybrid models—a mix of token-based and fixed pricing tiers.
Open-source models—self-hosted AI solutions to avoid cloud costs.
However, as long as AI requires heavy computing resources, some form of usage-based pricing will likely remain.
Final thoughts
At the end of the day, AI isn’t just a magical oracle that runs on goodwill—it requires computing power, infrastructure, and plenty of tokens to keep conversations going. Whether you’re a builder, a developer, or just a casual user, understanding how tokens work can save you from unexpected paywalls and help you make smarter decisions about your AI interactions.
So, the next time you find yourself staring at a “You’ve reached your token limit” message, just remember: intelligence isn’t free, and AI needs its “ahem” fair “ahem” share of swipes at your credit card. Choose wisely, optimize where you can, and keep building.