Claude AI Workflow  ·  Real Case Study

How to Use Claude AI Effectively

Most people are using Claude wrong. They treat it like a chatbot. I use it like a production system — and it ships hours of real work every week.

By Scott Willis · April 10, 2026 · ~14 min read
Tags: How to Use Claude AI · Claude AI Workflow · Claude Best Practices · Claude vs ChatGPT · Claude Prompting · AI Iteration Loop · System Design

This guide is for people who already opened a Claude tab, got an answer that almost worked, and walked away thinking AI is overhyped. It's not overhyped. You're using it like a search engine when you should be using it like a junior engineer who never sleeps.

Below is the exact workflow I use to ship production code, file pro se legal documents, run a serverless honeypot, and generate organic content with zero ad spend. The same loop, every time, across every domain.

What is Claude AI? Claude is a large language model built by Anthropic for reasoning, coding, long-form writing, and tool use. The most effective way to use Claude is not "better prompting" — it's wrapping the model in a structured workflow with real artifacts, real tests, and tight feedback loops. Prompting is Stage 1. System design is Stage 2. This guide is about Stage 2.

What Most People Get Wrong About Claude

Three failure modes account for almost every "Claude isn't that useful" complaint I see online:

If you fix only those three things, your Claude experience improves more than switching models ever will.

My Actual Claude Workflow (Step-by-Step)

People keep asking me for my "prompt template." There isn't one. The thing that matters is the loop, not the wording. Here it is, in full:

Define the outcome → Claude drafts → Test against reality → Specific feedback → Ship or repeat.

The critical insight: most people stop at step 2. They get a draft and try to use it. The real value is in steps 3 and 4 — testing against your actual environment and giving feedback precise enough that the next iteration converges instead of wandering. That's it. That's the engine. Below is what each step actually means in production.

1. Define the outcome (before you type a single word)

This is the single most important step and the one people skip. You must know exactly what "done" looks like in production reality. Not "a good query." Not "a legal-looking brief." Done means: copy-paste into the live environment with zero errors, or filed with zero procedural defects, or a function that logs real probes and blocks nothing legitimate.

Write the definition in plain English first: "After this loop, I will have a single script that replaces the legacy ones, runs in under two seconds on a 10k-row dataset, and requires zero user training." A model is a world-class pattern-matcher. If the target pattern is fuzzy, the output is fuzzy. Clarity here is the entire game.

2. Claude drafts

Feed the crystal-clear outcome plus all the real data — never summaries. Paste the actual query text, the actual docket entries, the actual log lines, the actual analytics exports. State your hard constraints up front. One-shot draft. No hand-holding yet. Let it go full creative. You'll fix it in the next steps.

3. Test against reality (this is where 95% of AI users fail)

You become the merciless QA department. Run the code in a real dev environment. File the draft in test mode. Deploy the function and hit it with real traffic. Check the indexing tools for the new content.

Document every failure with surgical precision. Do not say "it's wrong." Say: "Line 47 throws an invalid handle on the temp buffer because the query uses an alias that only exists in the older code path."

4. Specific feedback (the convergence engine)

This is the art. Your feedback must be so precise that the next draft cannot possibly make the same mistake.

You are training the model on your exact domain reality in real time. That training only works if your descriptions are reproducible.
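One way to keep feedback reproducible is to force every failure into the same few fields before it goes back to Claude. The sketch below is only an illustration: `Failure`, `as_feedback`, and `is_actionable` are hypothetical names, the word-count check is a crude heuristic, and the `repro` text in the example is invented for demonstration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    location: str  # where it broke, e.g. "line 47"
    observed: str  # what actually happened
    cause: str     # why, as precisely as you can state it
    repro: str     # the exact step that reproduces it

    def as_feedback(self) -> str:
        """Render the failure as one paste-ready feedback sentence."""
        return (
            f"At {self.location}: {self.observed}. "
            f"Cause: {self.cause}. Reproduce with: {self.repro}"
        )


def is_actionable(failure: Failure) -> bool:
    """Crude heuristic: every field needs at least two words of detail."""
    fields = (failure.location, failure.observed, failure.cause, failure.repro)
    return all(len(value.split()) >= 2 for value in fields)


good = Failure(
    location="line 47",
    observed="throws an invalid handle on the temp buffer",
    cause="the query uses an alias that only exists in the older code path",
    repro="run the script against the dev database",  # hypothetical step
)
vague = Failure(location="?", observed="it's wrong", cause="", repro="")
```

Here `is_actionable(good)` passes and `is_actionable(vague)` does not, which is the point: a report that cannot fill all four fields is not ready to send.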

5. Ship or repeat

Two choices only. Ship = it passes your production test with zero caveats. Repeat = hand Claude the new failure data, take the revised draft, and go straight back to step 3. No "maybe one more prompt." The loop is sacred.

You keep repeating until the output is production-ready every single time. That's why the audit work delivered real measurable gains, why the filings forced substantive responses from experienced opposing counsel, and why the honeypot logs hostile probes while serving clean content to humans.
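The five steps can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not the tooling used for the projects in this article: `run_loop`, `LoopResult`, and `max_rounds` are hypothetical names, and the `draft` callable stands in for whatever call to Claude you actually make.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    shipped: bool
    rounds: int
    artifact: str
    failures: List[str] = field(default_factory=list)


def run_loop(
    outcome: str,
    draft: Callable[[str, List[str]], str],
    test: Callable[[str], List[str]],
    max_rounds: int = 5,
) -> LoopResult:
    """Draft, test against reality, feed failures back, then ship or repeat."""
    feedback: List[str] = []
    failures: List[str] = []
    artifact = ""
    for round_no in range(1, max_rounds + 1):
        artifact = draft(outcome, feedback)  # step 2: model drafts
        failures = test(artifact)            # step 3: test against reality
        if not failures:
            return LoopResult(True, round_no, artifact)  # step 5: ship
        feedback = feedback + failures       # step 4: specific feedback
    return LoopResult(False, max_rounds, artifact, failures)


# Stand-ins for a real model call and a real test harness.
def fake_draft(outcome: str, feedback: List[str]) -> str:
    return "SELECT 1" if feedback else "SELEC 1"


def fake_test(artifact: str) -> List[str]:
    if artifact.startswith("SELECT"):
        return []
    return ["Line 1: 'SELEC' is not a valid keyword"]


result = run_loop("a query that runs", fake_draft, fake_test)
```

The design choice worth noting is that feedback accumulates across rounds, so earlier failures stay in front of the model instead of being forgotten after one iteration.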

Why This Loop Crushes Every Other AI Workflow

Pro Tips to Make Your Loop Tighter

That's the entire methodology. It's boring, unsexy, and brutally effective — exactly like real engineering. Everything else in this article is what happens when you run this loop, hard, against real problems.

Professional: A 27% Performance Gain Across Hundreds of Database Views

I work in enterprise distribution software that runs on a relational database. Custom views accumulate over years. Reports get slow. The interesting question is never "is this view slow" — it's "is the join order wrong, is there a non-sargable predicate, is the index even being used."
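To make "non-sargable predicate" concrete, here is a small sketch using SQLite, chosen only because it ships with Python; the article does not name the database engine used in the audit, and the `orders` table and `idx_orders_date` index are hypothetical. Wrapping the indexed column in a function forces a scan, while an equivalent range predicate lets the planner search the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, total REAL)"
)
conn.execute("CREATE INDEX idx_orders_date ON orders (order_date)")


def plan(sql: str) -> str:
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 is the detail text


# Non-sargable: the function call hides order_date from the index.
non_sargable = plan(
    "SELECT total FROM orders WHERE substr(order_date, 1, 4) = '2024'"
)

# Sargable: the same filter expressed as a range on the bare column.
sargable = plan(
    "SELECT total FROM orders "
    "WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'"
)

print(non_sargable)
print(sargable)
```

The first plan reports a scan of the table and the second reports a search using `idx_orders_date`. The exact wording of the plan text varies by SQLite version, and other engines expose the same information through their own plan tools.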

Working with Claude on the audit, I batched the changes into discrete change sets and treated each one as a unit of work with a baseline, a hypothesis, and a rollback. Two patterns came up over and over:

300+ views audited · ~27% aggregate gain · 70 change sets · 99%+ match rate

None of those patterns are clever. The leverage was running the loop fast enough that we could touch dozens of change sets in the time a single audit normally takes. Claude wasn't doing the optimization — I was, with Claude as the rubber duck that could also write the rewrite. If you want the reverse-direction story — pulling data out of complex systems for real reporting — that's exactly what we build at the SiegeStack ETL Showcase.
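Treating each change set as a unit with a baseline, a hypothesis, and a rollback can be captured in a few lines. This is a sketch under assumptions: the `ChangeSet` fields, the 5% keep threshold, and the timings in the example are all invented for illustration and are not figures from the audit.

```python
from dataclasses import dataclass


@dataclass
class ChangeSet:
    name: str
    hypothesis: str
    baseline_ms: float    # timing before the change
    candidate_ms: float   # timing after the change
    results_match: bool   # did the rewritten view return the same rows?

    @property
    def gain_pct(self) -> float:
        """Percentage improvement relative to the baseline."""
        return (self.baseline_ms - self.candidate_ms) * 100 / self.baseline_ms

    def decision(self, min_gain_pct: float = 5.0) -> str:
        """Keep only changes that are both correct and measurably faster."""
        if not self.results_match:
            return "rollback"
        return "keep" if self.gain_pct >= min_gain_pct else "rollback"


# Invented numbers, for illustration only.
faster = ChangeSet("cs-01", "range predicate can use the index", 800.0, 600.0, True)
wrong = ChangeSet("cs-02", "reordered joins", 800.0, 400.0, False)
```

With these inputs `faster.decision()` returns `"keep"` and `wrong.decision()` returns `"rollback"`: a faster view that returns different rows is a regression, not a gain.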

The other professional win was a scan-to-ship workflow for a high-volume warehouse: a fragile suite of nine spreadsheet macros, replaced by a single script that watches a folder, queries the inventory database, and matches scans at 99%+. Co-authored end-to-end with Claude, deployed in production, no humans in the loop.
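The shape of a watch-a-folder, query-the-database, match-the-scans script might look like the sketch below. It is not the production script: the `inventory` table, the `sku` column, the `*.txt` scan files, and the function names are all assumptions made for illustration, and SQLite stands in for the real inventory database.

```python
import sqlite3
from pathlib import Path
from typing import Dict, Iterable, List, Set, Tuple


def match_scans(
    conn: sqlite3.Connection, codes: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split scanned codes into those found in inventory and those not."""
    matched: List[str] = []
    unmatched: List[str] = []
    for code in codes:
        code = code.strip()
        if not code:
            continue  # ignore blank lines in the scan file
        row = conn.execute(
            "SELECT 1 FROM inventory WHERE sku = ?", (code,)
        ).fetchone()
        (matched if row else unmatched).append(code)
    return matched, unmatched


def poll_once(
    folder: Path, conn: sqlite3.Connection, seen: Set[Path]
) -> Dict[str, float]:
    """Process scan files not handled yet; return the match rate per file."""
    rates: Dict[str, float] = {}
    for path in sorted(folder.glob("*.txt")):
        if path in seen:
            continue
        matched, unmatched = match_scans(conn, path.read_text().splitlines())
        total = len(matched) + len(unmatched)
        rates[path.name] = len(matched) / total if total else 0.0
        seen.add(path)
    return rates
```

Calling `poll_once` on a timer gives a simple polling watcher. Tracking the match rate per file is what lets you notice when the rate drops below the level you expect.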

Legal: Pro Se Litigation

This is where the "Claude as smart junior" framing matters most.

An opposing attorney filed a false address for me with the court, one I hadn't lived at in over a decade, and did so 103 days after their own team personally served me at my actual location. A default judgment based on that filing stripped me of custody for over 490 days without my knowledge. When I found out, I filed pro se.

The defense moved to dismiss under attorney immunity. Their lawyer had been practicing for thirty years. I had Claude.

Claude helped me do three things that mattered:

The motion to dismiss was denied. The case is set for trial. There's an active criminal investigation and complaints have been filed with the relevant professional licensing bodies.

The verification rule: Claude fabricates citations. Not often, but enough that you must independently verify every case, every statute, every quote against the actual source before it goes in a filing. If you can't be bothered to do that, don't use AI for legal work. The cost of a hallucinated citation in front of a judge is your credibility — and credibility is the only currency a pro se litigant has.
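The verification rule can be enforced mechanically with a checklist that refuses to clear a draft until a human has signed off on every citation. This is a minimal sketch: `Citation`, `unverified`, and `ready_to_file` are hypothetical names, and the citation strings are placeholders rather than real authorities.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Citation:
    text: str
    verified_against_source: bool = False  # set only after reading the source
    verified_by: str = ""                  # the human who checked it


def unverified(citations: List[Citation]) -> List[str]:
    """Return every citation that lacks a human sign-off."""
    return [
        c.text
        for c in citations
        if not (c.verified_against_source and c.verified_by)
    ]


def ready_to_file(citations: List[Citation]) -> bool:
    """A draft is fileable only when nothing remains unverified."""
    return not unverified(citations)


draft_citations = [
    Citation("Placeholder case A", verified_against_source=True, verified_by="me"),
    Citation("Placeholder statute B"),  # suggested by the model, not yet checked
]
```

In this example `ready_to_file(draft_citations)` is false until the second entry is checked against its source. The code does not verify anything itself; it only makes a skipped check impossible to overlook.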

Personal Project: A Honeypot at safesapcrtx.org

I wanted to understand how attackers actually behave by watching them — not by reading about it. So I built safesapcrtx.org: a static site that looks like a vulnerable WordPress install, deployed on Netlify's free tier, with edge functions acting as a serverless firewall and Supabase logging every probe.

Total monthly cost: $0. Daily intake: 80+ probes.

The probe paths are exactly what you'd expect: /wp-login.php, /xmlrpc.php, /admin/.env, /shell.php, /backup.sql, /phpmyadmin/. Each one tells you something about the attacker's playbook — credential stuffing, environment variable hunting, webshell discovery, database dump fishing. The interesting work isn't catching probes — it's clustering them into actor groups by behavior.
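Classifying probes and clustering sources by behavior can be sketched in a few lines. The real classifier runs as a Netlify edge function; the Python below only illustrates the idea. Which path belongs to which category is my assumption (the text lists paths and playbooks without pairing them), the function names are hypothetical, and the IP addresses come from the reserved documentation ranges.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Assumed pairing of the probe paths with the playbook categories.
PROBE_CATEGORIES = {
    "/wp-login.php": "credential_stuffing",
    "/xmlrpc.php": "credential_stuffing",
    "/admin/.env": "env_hunting",
    "/shell.php": "webshell_discovery",
    "/backup.sql": "db_dump_fishing",
    "/phpmyadmin/": "db_dump_fishing",
}


def classify(path: str) -> str:
    """Map a requested path to a probe category, ignoring any query string."""
    return PROBE_CATEGORIES.get(path.lower().split("?", 1)[0], "other")


def cluster_by_behavior(
    events: Iterable[Tuple[str, str]]
) -> Dict[FrozenSet[str], List[str]]:
    """Group source IPs that probed exactly the same set of categories."""
    per_ip: Dict[str, Set[str]] = defaultdict(set)
    for ip, path in events:
        per_ip[ip].add(classify(path))
    clusters: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for ip, categories in per_ip.items():
        clusters[frozenset(categories)].append(ip)
    return {cats: sorted(ips) for cats, ips in clusters.items()}


events = [
    ("192.0.2.1", "/admin/.env"),
    ("192.0.2.1", "/shell.php"),
    ("198.51.100.7", "/shell.php"),
    ("198.51.100.7", "/admin/.env"),
    ("203.0.113.9", "/wp-login.php"),
]
clusters = cluster_by_behavior(events)
```

Two sources that hit the same categories in any order land in the same cluster, which is a rough first cut at grouping actors by playbook rather than by address.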

Claude helped me build the edge function classifier, the Supabase schema, and the dashboard. Two weekends of work, end to end. The honeypot is live and writing data right now. The repo is github.com/iamnotcheckingit-cyber/-safe-sapcr-texas.

Zero Ad Spend: The Strangest Validation

I have never paid for an ad. Not for SiegeStack, not for the honeypot site, not for anything. But complete strangers independently paid for advertising campaigns promoting safesapcrtx.org. They thought the content mattered enough to spend their own money on it.

That is the strangest validation I've ever received as a builder. It also taught me something concrete: the content has to be good enough to make someone else want to amplify it. That bar is much higher than "good enough to ship," and it's the only bar that matters for organic growth.

The Honest Limitations

Everything above is real. So is everything below. Anyone selling Claude as a magic co-founder is lying.

Compaction is not lossless

When the context window fills, Claude summarizes the older parts of the conversation. That summary drops nuance and edge cases. You will lose things. Plan for it: checkpoint your decisions in external files, restart sessions deliberately, don't rely on Claude to remember what happened ten thousand tokens ago.
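Checkpointing decisions to an external file can be very small. The sketch below assumes a JSON Lines file, and `record_decision` and `session_preamble` are hypothetical helpers. The idea is to append each decision as it is made, then paste the rendered preamble at the top of a fresh session.

```python
import json
from pathlib import Path


def record_decision(log_path: Path, decision: str, reason: str) -> None:
    """Append one decision to the checkpoint file as a JSON line."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"decision": decision, "reason": reason}) + "\n")


def session_preamble(log_path: Path) -> str:
    """Render all recorded decisions as text to paste into a new session."""
    if not log_path.exists():
        return "No prior decisions recorded."
    lines = ["Decisions already made (do not revisit without asking):"]
    for raw in log_path.read_text(encoding="utf-8").splitlines():
        if not raw.strip():
            continue  # tolerate blank lines in the file
        entry = json.loads(raw)
        lines.append(f"- {entry['decision']} (why: {entry['reason']})")
    return "\n".join(lines)
```

Because the file lives outside the conversation, nothing in it is lost when the context is summarized, and restarting a session deliberately becomes cheap.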

Confidently wrong, roughly 20% of the time

Claude will produce answers that sound right and aren't. Citations get fabricated. Working code gets rewritten unnecessarily. The human is the verification layer — always. If you don't have the domain expertise to catch the wrong answers, you cannot use Claude safely for that domain.

Instructions get ignored in long sessions

"Don't change X" gets violated. "Use the function we wrote earlier" gets ignored. Re-state critical constraints in every meaningful turn. Treat the assistant as memoryless even when it isn't.

Not actually a stateful partner

Default chat has no cross-session memory. What feels like continuity is you re-establishing context every time. That's a workflow you build, not a feature the model gives you.
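Since continuity is something you rebuild yourself, one simple habit is to prepend the critical constraints to every meaningful message instead of trusting the session to retain them. This is a minimal sketch, and `with_constraints` is a hypothetical helper rather than part of any Anthropic SDK.

```python
from typing import List


def with_constraints(message: str, constraints: List[str]) -> str:
    """Prefix a message with the hard rules that must survive every turn."""
    if not constraints:
        return message
    header = "Hard constraints (still in force):\n" + "\n".join(
        f"- {rule}" for rule in constraints
    )
    return f"{header}\n\n{message}"


constraints = ["Don't change X", "Use the function we wrote earlier"]
turn = with_constraints("Now add input validation.", constraints)
print(turn)
```

The repetition costs a few tokens per turn, which is a small price compared with untangling a draft that quietly violated a rule stated once, much earlier.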

The honest framing: Claude is a smart but unreliable junior, not a co-founder. The leverage comes from how you wrap it, not from the model alone.

From Prompting to System Design

This is the actual leap, and it's the part most articles miss. Stage 1 thinking is "how do I write a better prompt." Stage 2 thinking is "how do I build a system around the model." Power users live in Stage 2.

Don't:

Do:

The real formula: Claude + external memory + structured workflows = leverage. The honeypot, the database audit, the legal filings — none of those were "I asked Claude a question." Each one was a system: scoped tasks, real artifacts, tight feedback loops, verification at every step.

Claude vs ChatGPT: When to Use Each

People keep asking me "Claude vs ChatGPT — which one should I use?" The honest answer is both, for different things. They're not interchangeable. After running serious work through both daily, here's the split I actually use:

Use Claude when:

Use ChatGPT when:

My personal split is roughly 80/20 Claude/ChatGPT for production work and roughly 50/50 for exploration. Both have their place. Anyone telling you "X is better than Y, period" is selling you something. Anthropic's documentation is also worth reading directly — most people skip it, and most "Claude tips" articles you see are downstream of it.

Common Mistakes That Break Claude

If your Claude session feels like it's going nowhere, check this list. Nine times out of ten one of these is the cause:

Claude Workflow Template (Copy/Paste)

Here's the literal template I paste at the top of any new serious Claude session. Steal it. Modify it. The point isn't the wording — it's that you've forced yourself to answer every question in it before you start prompting.

# Session brief

## Outcome (definition of done)
- [What artifact will exist when this session is over?]
- [What test will it pass?]
- [Where will it be deployed/filed/shipped?]

## Constraints (hard rules)
- [Language / framework / version]
- [Things you must NOT change]
- [Style / tone / format requirements]

## Context (real artifacts only)
- [Paste the actual code / log / document / data — never summaries]

## Prior failures (so we don't repeat them)
- [What did the last attempt get wrong?]
- [What was the precise reason?]

## Verification plan
- [How will I test this output before shipping?]
- [What command / step proves it works?]

That's it. Five sections. If you can't fill all five in before you start prompting, you're not ready to prompt — go figure out the missing piece first. I can't overstate how much faster work moves once you internalize this.
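If you keep the brief in a file, a short check can confirm that all five sections are filled in before you paste it. This is a sketch that assumes the `##` headings from the template above. `brief_problems` is a hypothetical helper, and it only detects bullets still in the untouched `- [ ... ]` placeholder form.

```python
import re
from typing import Dict, List

REQUIRED_SECTIONS = [
    "Outcome",
    "Constraints",
    "Context",
    "Prior failures",
    "Verification plan",
]


def brief_problems(brief: str) -> List[str]:
    """List missing, empty, or still-placeholder sections in a session brief."""
    found: Dict[str, str] = {}
    # Everything before the first "## " heading (the title) is discarded.
    for chunk in re.split(r"^## ", brief, flags=re.MULTILINE)[1:]:
        title, _, body = chunk.partition("\n")
        found[title.strip()] = body.strip()

    problems: List[str] = []
    for name in REQUIRED_SECTIONS:
        body = next(
            (text for title, text in found.items() if title.startswith(name)),
            None,
        )
        if body is None:
            problems.append(f"missing section: {name}")
        elif not body:
            problems.append(f"empty section: {name}")
        elif re.search(r"^- \[.*\]\s*$", body, flags=re.MULTILINE):
            problems.append(f"unfilled placeholder in: {name}")
    return problems
```

An empty list means the brief is complete enough to start. Anything else names the section you still need to think through.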

Frequently Asked Questions

What is Claude AI used for?

Claude is used for long-form writing, code generation and refactoring, structured reasoning over long documents, legal and technical drafting, data analysis, and as the LLM behind agent / tool-use workflows. Its strength is holding coherent context over many thousands of tokens, which makes it well-suited for production engineering work and complex documents.

Is Claude better than ChatGPT?

Neither is universally better. Claude tends to win on long-context work, careful editing, and conservative tone. ChatGPT tends to win on fast breadth-first ideation and multi-modal features. Most serious users keep both open and route work to whichever fits the task. See the comparison section above for the split I use.

How do you write good prompts for Claude?

The best Claude prompts aren't clever — they're specific. State the outcome, paste real artifacts (not summaries), list hard constraints, and include any prior failures so the model doesn't repeat them. Then iterate with precise corrections, not vague "try again" prompts. The Claude workflow template above is exactly this, formalized.

Can Claude replace developers?

No. Claude amplifies developers — it doesn't replace them. Without domain expertise to verify output, catch hallucinations, and define what "done" actually means, you get confident-sounding garbage. The human is the quality gate. The 27% performance gain in this case study only happened because a human knew which patterns mattered and could test the rewrites against a real database.

What's the biggest mistake people make with Claude?

Treating it like a search engine instead of a junior collaborator. They paste a vague question, take the first answer, and walk away. The leverage is in the loop — defining done, drafting, testing against reality, giving precise feedback, and repeating until production-ready. Skip the loop and you're just chatting with an autocomplete.

Does Claude have memory between sessions?

Default chat has no persistent cross-session memory. What feels like continuity is you re-establishing context every time. Serious workflows externalize memory in project files, system prompts, or the Anthropic Memory Tool. Treat the chat as stateless and your reliability goes way up.

What I'd Tell Someone Starting Today

You're the expert. Claude amplifies domain knowledge. If you don't have the knowledge, you get confident-sounding garbage. The human is the quality gate.

Specificity compounds. One precise correction saves five rounds of guessing. Invest the time to describe exactly what's wrong.

Ship real things. The gap between "playing with AI" and "using AI" is whether something goes into production. File the motion. Deploy the script. Push the site live.

Side projects matter. My honeypot made me better at log analysis at work. My legal filings made me a better writer. Everything cross-pollinates.

AI levels the playing field. A pro se litigant produced filings that survived a motion to dismiss brought by an attorney with thirty years of practice. That wasn't possible five years ago. It is now. Use it.