Nic. Hernandez

I Felt Like a Fraud. So I Built a GPT Model.


The fraud feeling

When I tell people I built a Large Language Model from scratch, they assume I’m a machine learning engineer.

I’m not. I’m an obsessive content gonk who got tired of pretending to understand what I was working with.

Every day I was reading about AI, thinking about how to optimize content for AI, how to advise clients on AI strategy — I realized I had no idea what was actually happening when a prompt hit a model. Fluent in the vocabulary but illiterate in the mechanics. That bothered me more than I admitted.

So in February 2026 I opened Sebastian Raschka’s Build a Large Language Model From Scratch and decided to just do it. I didn’t know the difference between a token and an embedding. I’d never trained a model. I’d written thousands of words about how transformer architecture is the magic sauce of this whole operation without understanding what a transformer actually does.

Now I’ve built a GPT model, trained it, fine-tuned it for classification, and taught it to follow instructions.

Here’s what I learned — and why it changed how I think about every piece of content I write.

Why I did this

The honest answer is that I felt like a fraud.

Not in a dramatic way — I was good at my job. But “good at my job” meant I could talk about things like large language models convincingly without being able to explain what made them work. I knew the outputs. I didn’t know the machine.

This matters more now than it did two years ago. AI Content Operations is a real category of work, and the roles that pay well aren’t looking for people who can use ChatGPT — they’re looking for people who can work alongside engineers, evaluate model behavior, and explain technical concepts to non-technical stakeholders. That requires a different kind of understanding than prompt engineering tips on LinkedIn.

The question that finally pushed me to start was simple: what actually happens when I type a prompt?

I didn’t have a good answer. So I went and found one.


The 5 Things That Blew My Mind

1. Tokens Are Not Words. I assumed LLMs process words. “Language” is words. It reads words. It outputs words. Clean and simple, right?

Wrong. LLMs process tokens — subword chunks that don’t map cleanly to how humans read text. “Understanding” might be two tokens: “under” and “standing.” “The” is one token. “Antidisestablishmentarianism” is four or five. Punctuation gets its own tokens. A space before a word changes the tokenization.

This isn’t trivia. It affects everything:

I started using a tokenizer viewer to see how my prompts actually break down. It immediately changed how I write them. The model doesn’t see words the way I do — it sees chunks, and those chunks shape how it processes everything downstream.
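Real GPT tokenizers use byte-pair encoding (the book uses OpenAI's tiktoken library), but the core idea of breaking text into subword chunks can be sketched with a toy greedy longest-match tokenizer over an invented vocabulary:

```python
# Toy subword tokenizer: greedy longest-match against a made-up vocabulary.
# A simplification of BPE, just to show words splitting into chunks.
VOCAB = {"under", "standing", "the", "anti", "dis",
         "establish", "ment", "arian", "ism"}

def tokenize(text: str, vocab=VOCAB) -> list[str]:
    """At each position, grab the longest vocabulary entry that matches."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try longest match first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])          # unknown character: emit as-is
            i += 1
    return tokens

print(tokenize("understanding"))
# ['under', 'standing']
print(tokenize("antidisestablishmentarianism"))
# ['anti', 'dis', 'establish', 'ment', 'arian', 'ism']
```

The vocabulary here is invented for illustration; a real BPE vocabulary is learned from data, which is exactly why tokenization looks arbitrary from a human reader's point of view.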

2. Attention Is Literally About Attention. I’d been nodding along to “transformer attention mechanisms” for two years without understanding what that meant. Turns out it means exactly what it sounds like.

When the model generates text, it doesn’t just look at the previous word — it looks at all the previous words and calculates how much attention to pay to each one. Write “The cat sat on the mat because it was tired,” and the attention mechanism figures out that “it” refers to “cat” and not “mat.” Not magic, but math — this is a learned pattern based on billions of examples.

What this means practically:

I stopped writing prompts and hoping the model would figure it out. The mechanism is powerful, but it needs clear inputs.
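The "how much attention to pay to each previous word" calculation is just scaled dot products pushed through a softmax. A minimal sketch with invented 2-d vectors, where "it"'s query happens to point the same way as "cat"'s key:

```python
import math

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention: score the query against every key,
    then normalize the scores into weights that sum to 1."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    return softmax(scores)

# Toy vectors (assumed for illustration): "it"'s query aligns with "cat"'s
# key, so "cat" gets most of the attention weight.
query_it = [1.0, 0.2]
keys = {"cat": [0.9, 0.1], "mat": [0.1, 0.9]}
weights = attention_weights(query_it, list(keys.values()))
print(dict(zip(keys, weights)))  # "cat" weighted well above "mat"
```

In a real transformer the query and key vectors are learned projections of token embeddings, and this happens in parallel across many heads and layers; the arithmetic, though, is exactly this.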

3. “Training” Means Pattern Prediction, Not Fact Learning.

This one reframed everything.

I assumed training meant teaching a model facts. Show it enough data, it “knows” things. The reality: LLMs are trained to predict the next token. That’s it. Not to be accurate. Not to be truthful. To predict what text sounds right given what came before.

This explains so much that confused me before:

I stopped treating LLM outputs as information and started treating them as pattern completions. When I prompt a model, I’m not asking for facts — I’m asking what text typically follows a question like mine. That single reframe changed how I verify and use everything it produces.
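You can see "predict what typically follows" in miniature with a bigram counter, which is the same objective an LLM trains on, stripped of the neural network:

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count which token follows which in the training text."""
    counts = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        counts[a][b] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent continuation: plausible, not necessarily true."""
    return counts[token].most_common(1)[0][0]

corpus = "the cat sat on the mat the cat slept".split()
model = train_bigram(corpus)
print(predict_next(model, "the"))  # 'cat' (follows 'the' twice; 'mat' only once)
```

The model doesn't "know" anything about cats; it reproduces the statistically likeliest continuation of its training data. Scale that idea up by billions of parameters and you get fluent text with the same fundamental property: pattern completion, not fact retrieval.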

4. There Are Three Different Types of “Training”. When people say they’re “training a model,” they could mean three completely different things:

Pre-training — The model learns to predict the next token from massive amounts of unlabeled text: books, websites, code, everything. The result is a model that generates coherent text but won’t follow instructions. Ask it a question and it might just… keep generating text in the style of your question.

Fine-tuning — The model learns a specific task from labeled examples: classify this, summarize that, detect sentiment here. The result is a model that’s very good at one narrow thing.

Instruction tuning — The model learns to follow directions from instruction-response pairs. The result is a model that can chat, answer questions, and complete tasks the way you expect.

Same architecture. Different training. Completely different behavior. A pre-trained base model completing “The capital of France is…” is doing something fundamentally different from ChatGPT answering “What’s the capital of France?” — even if the output looks similar. One is pattern completion. The other is instruction following.

5. Chat Models Are Just Base Models with Extra Training. I thought ChatGPT was a different kind of thing from GPT-3. Different architecture and technology, categorically more advanced, locked in a black box.

It’s not. It’s the same architecture with more training. The recipe:

  1. Start with a pre-trained model
  2. Collect thousands of instruction-response pairs
  3. Train until the model learns: instruction → appropriate response
  4. Optionally apply RLHF (Reinforcement Learning from Human Feedback) to align outputs with human preferences

That’s it. Same model. More training. Bake at 450º for 25 minutes and out comes a chatbot. (Not literally.)

The implication that hit me hardest: this means you could take a base model and instruction-tune it for a specific domain. Want a model that’s deeply specialized in your industry, your clients, your content? You don’t need to build from scratch — you need good instruction-response pairs and a fine-tuning run. That’s accessible in a way “build a GPT” isn’t.
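What does an instruction-response pair actually look like as training data? Raschka's book uses an Alpaca-style prompt template; a sketch of the formatting step (field names assumed):

```python
def format_instruction(entry: dict) -> str:
    """Flatten an instruction-response pair into a single training string,
    in the Alpaca-style template commonly used for instruction tuning."""
    text = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{entry['instruction']}\n\n"
    )
    if entry.get("input"):                      # optional extra context
        text += f"### Input:\n{entry['input']}\n\n"
    return text + f"### Response:\n{entry['output']}"

example = {
    "instruction": "Classify the sentiment of the sentence.",
    "input": "I loved this book.",
    "output": "Positive",
}
print(format_instruction(example))
```

Fine-tuning then just continues next-token training on thousands of strings like this, until "### Response:" reliably triggers an answer rather than more question.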

The pattern across all five: The mechanics of AI were so foreign to me that it felt like magic. It isn’t.

Every behavior that seemed mysterious — understanding context, following instructions, generating coherent text — has a concrete mechanism behind it. Tokens. Attention. Pattern prediction. Task-specific training. The magic dissolves when you see the gears (math). And once you see them, you can work with the machine instead of just hoping the black box does what you want.

There’s a sixth thing that deserves its own article: emergent capabilities — the phenomenon where models trained on enough data start doing things they were never explicitly trained to do. One of the strangest and most fascinating parts of all of this, and I’m still wrapping my head around it.


How This Changed How I Work

What my workflow looked like:

Write a prompt. Get a mediocre result. Change a word. Get a different mediocre result. Change the format. Get something better. No idea why. Like spinning knobs in a dark room. I guess we move on.

That was my entire process. Trial and error with no theory behind it. Sometimes I’d get great output and couldn’t reproduce it. Sometimes I’d get garbage and couldn’t diagnose why. The model felt like a temperamental coworker — sometimes helpful, sometimes not, never predictable.

When outputs were wrong, I had no framework for fixing them. Hallucination? Off-topic response? Inconsistent tone? My only tool was “try again.” I couldn’t tell you whether the problem was in my prompt, the model’s training, or something else entirely.

And having to talk about AI with stakeholders or engineers? I could feel the gap. I’d use the right words — “transformer architecture,” “attention mechanism,” “training data” — but I was parroting, not explaining. I knew the engineers could tell. I could tell.

What my workflow looks like now:

The first thing that changed was how I write prompts. I stopped thinking of them as questions and started thinking of them as inputs to a prediction engine.

Writing prompts: When I understand that the model is paying attention to all previous tokens and using them to predict the next one, prompt design stops being guesswork. I know that putting the instruction at the end of a long context might weaken its attention signal. I know that providing examples shifts the model’s prediction toward a pattern I want. I know that being explicit about relationships between ideas gives the attention mechanism stronger signals to work with.

I’m not a better prompt writer because I learned a secret formula. I’m a better prompt writer because I understand what the model is actually doing with the words I give it.

Debugging outputs: When a model hallucinates now, I don’t just try again. I ask: is the model predicting something plausible that happens to be wrong? If so, I need to provide the correct information in the prompt and constrain the output to use it. That’s a training limitation, not a prompt problem — and the fix is different.

When the tone shifts mid-response, I think about attention. Is the model attending more to a later instruction than an earlier one? I restructure the prompt to put the most important constraints where they’ll get the strongest attention signal.

When outputs are repetitive, I recognize it as a prediction loop — the model is generating tokens that reinforce the pattern it’s already producing. The fix is to break the pattern, not to rephrase the question.

Talking about AI: This is where the biggest shift happened. When an engineer says “we’re fine-tuning the base model on our documentation corpus,” I now know exactly what that means. I know what they’ll get (a model good at one specific thing) and what they won’t get (general instruction following). I can ask relevant questions about their training data, their evaluation metrics, their deployment strategy.

I’m not pretending anymore. That’s the difference.

Evaluating tools: Every AI content tool claims to “understand your brand voice” or “optimize for SEO.” Before, I couldn’t assess those claims. Now I can ask: what model is this? Is it fine-tuned or prompted? What’s in the system prompt? Is the “optimization” happening in the model or in post-processing? The answers tell me whether the tool is doing something meaningful or wrapping GPT-4 in a branded interface.


The Real Lesson: The difference between guessing and debugging is understanding the mechanism. I went from “the model is temperamental” to “the model is doing exactly what it was trained to do — I just didn’t understand what that was.”

Every change in my workflow traces back to one of the five things I learned. Tokens changed how I think about costs and limits. Attention changed how I structure prompts. Pattern prediction changed how I evaluate outputs. The three training phases changed how I think about different tools. And knowing that chat models are just base models with extra training changed what I think is possible.

None of this made me an engineer. But it made me someone who can work with engineers — and that turns out to be the more valuable skill.


What This Means for Content Professionals

The Skill Gap

Here’s the problem: most content professionals who work with AI don’t understand how AI works.

They can use ChatGPT. They can write prompts. They can generate content, iterate on outputs, and build workflows around AI tools. But ask them what a token is, how attention works, why models hallucinate, or what the difference is between pre-training and fine-tuning — blank stares.

I know because I was one of them. And I was good at my job. But “good at my job” had a ceiling, and that ceiling was determined by how much I didn’t understand about the tools I was using every day.

The job market already sees this gap. Look at the postings. AI Content Operations roles don’t want people who can use ChatGPT — everyone can do that. They want people who can evaluate model behavior, work alongside ML teams, and explain technical concepts to non-technical stakeholders. One posting I looked at — Technical Writer at a healthcare AI company — asks for Python skills and the ability to “keep pace with emerging AI technologies.” That’s not about using AI tools. That’s about understanding them.

The “AI skills” resume problem. Everyone’s resume says “proficient in AI tools” now. It means nothing. It’s the 2026 version of “proficient in Microsoft Office.”

“Built a GPT model from scratch using PyTorch” means something different. It’s specific, verifiable, and it signals a depth of understanding that no amount of AI tool certifications can match.

The Opportunity

AI Content Operations is a real, growing category. It’s not a buzzword. Companies are building teams dedicated to managing AI-generated content at scale, evaluating model outputs for accuracy and brand alignment, maintaining quality across hundreds of AI-assisted workflows, and bridging the gap between engineering and editorial.

These roles pay well because they require a rare combination: content skills AND technical literacy. Most content people don’t have the technical side. Most technical people don’t have the content side. The overlap is small, and the demand is growing.

Technical writers who understand AI are in demand. Every AI company needs documentation — API docs, model cards, developer guides, internal knowledge bases, user-facing explanations. And the best technical documentation comes from people who’ve struggled to understand the thing they’re explaining.

I know this because I struggled. I read explanations of attention mechanisms that assumed I knew linear algebra. I read documentation that used terms I hadn’t learned yet. I read tutorials that skipped the “why” and jumped straight to the “how.” The experience of being confused by bad documentation is exactly what drives me to write the opposite.

Content strategists who can work with ML teams are rare. ML teams speak in tokens, parameters, epochs, and loss functions. Content teams speak in brand voice, user journeys, and editorial calendars. Someone who can sit in a room with both groups and translate between them isn’t a nice-to-have — they’re a force multiplier.

What You Can Do

You don’t need to become an engineer. Let me say that clearly because it’s the thing that almost stopped me: you don’t need to implement ML systems in production. You need to understand them well enough to work with the people who do.

That means learning five concepts — tokens, attention, the three training phases, why hallucinations happen, and how context windows work. Not the math behind them. Not the implementation. Just enough to have intelligent conversations and make informed decisions. That’s a weekend of reading, not a computer science degree.

Then build something small. Raschka’s book is the most accessible entry point I’ve found — written for people who aren’t engineers, well-commented code, no GPU required. Google Colab can handle the compute for free. The act of building something, even a toy model that generates gibberish, changes your relationship with the technology in a way that reading about it never does. You stop seeing AI as magic and start seeing it as machinery.

Then learn to explain it. That’s the actual superpower. Engineers can build the model. If you can sit in a room with a VP of Marketing, an ML engineer, and a product manager and translate between all three — that’s the job. That’s not an exaggeration. It’s the market signal.


Should You Build an LLM?

Who This Is For

Content professionals who work with AI daily. If your job involves prompting, evaluating, or managing AI-generated content, understanding the mechanics makes you materially better at it. Not theoretically better — practically better. You’ll write better prompts, diagnose problems faster, and have more credible conversations about AI. The investment pays for itself in workflow efficiency alone.

People applying to AI-adjacent roles. Technical writing at AI companies. Content operations. Developer relations. Product marketing for ML products. In interviews, being able to say “I built a GPT model from scratch” changes the conversation from “do you know AI?” to “how deeply do you understand it?” It’s a proof-of-work credential that no amount of LinkedIn Learning certificates can replicate.

Anyone tired of pretending. If you’ve been nodding along to transformer architecture discussions without understanding them, this cures that. If you’ve written content about AI without knowing how it works, this closes the gap. The relief of actually understanding — of moving from parroting to explaining — is worth the effort.

Who This Isn’t For

People who just want to use AI tools better. You don’t need to build an LLM to write good prompts or use ChatGPT effectively. There are plenty of resources for prompt engineering and AI-assisted workflows that don’t require understanding tokenization. If your goal is “use AI better at my current job,” there are faster paths.

People looking for a quick credential. 25 days isn’t nothing. Some days I spent two hours. Some days I spent five. Some days I spent hours on a single concept, then realized I’d been overthinking it the whole time. If you want something you can finish in a weekend, this isn’t it.

People who freeze at the sight of code. Full honesty: there is code in this book. Python. PyTorch. It’s accessible — well-commented, well-explained — but it’s code. If that’s a hard no for you, this will be a harder climb.

That said, I froze at the sight of code too. And I got through it. So maybe this belongs in the “who this is for” column after all.

What You Actually Need

A computer with internet access. Google Colab is free and gives you cloud compute. No GPU required.

Basic Python familiarity. Helpful but not required — the book explains what you need as you go.

Willingness to feel dumb for a while. That’s the real prerequisite. Some concepts clicked immediately. Others took days. I stared at the attention mechanism for an embarrassingly long time before it made sense. Then I realized I’d been overcomplicating it and it was literally about attention. That’s the process.

Patience with yourself. The book builds on itself — each chapter assumes you understood the last one. Don’t skip ahead. Don’t rush. The foundation matters.

What you don’t need: a math degree (Raschka explains the math), prior ML experience (I had none), or your imposter syndrome to pass first (I had plenty and it almost stopped me — don’t let it).

The Real Question

The question isn’t “can you do this?” You can. I’m proof — a content person with an English degree who thought a token was a kind of word.

The real question is “what changes when you do?”

For me, it changed how I write prompts (strategic instead of random), how I debug outputs (diagnostic instead of reactive), how I talk about AI (specific instead of vague), and how I think about my career (there’s a real skill gap I can fill, and I have evidence that I’ve filled it).

In a job market where everyone claims “AI skills,” actually understanding how it works is the differentiator. Not because it makes you better at using AI — though it does — but because it makes you credible in the rooms where the decisions get made.

25 days ago I didn’t know the difference between a token and an embedding. Now I’ve built a GPT model from scratch. The magic isn’t gone — it’s just been replaced by understanding. And that’s better.

