Case Study · AI Partnership
Working with Claude
A real case study in AI partnership across enterprise SQL, pro se litigation, threat intelligence, and content — with the unsexy parts left in.
Most "working with AI" articles fall into one of two camps: hype that pretends Claude is your co-founder, or skepticism that pretends nothing has changed. Neither matches what I actually do every day. This is the honest version — what's worked, what hasn't, and why the leap isn't about better prompting at all.
I'm an enterprise distribution software specialist by day, a pro se litigant by necessity, and a builder by habit. Claude has been a daily partner across all of it. Here's what that actually looks like.
The Iteration Loop
People keep asking me for my "prompt template." There isn't one. The thing that matters is the loop, not the wording. Here it is, in full:
Define the outcome → Claude drafts → Test against reality → Specific feedback → Ship or repeat.
The critical insight: most people stop at step 2. They get a draft and try to use it. The real value is in steps 3 and 4 — testing against your actual environment and giving feedback precise enough that the next iteration converges instead of wandering. That's it. That's the engine. Below is what each step actually means in production.
1. Define the outcome (before you type a single word)
This is the single most important step and the one people skip. You must know exactly what "done" looks like in production reality. Not "a good query." Not "a legal-looking brief." Done means: copy-paste into the live environment with zero errors, or filed with zero procedural defects, or a function that logs real probes and blocks nothing legitimate.
Write the definition in plain English first: "After this loop, I will have a single script that replaces the legacy ones, runs in under two seconds on a 10k-row dataset, and requires zero user training." A model is a world-class pattern-matcher. If the target pattern is fuzzy, the output is fuzzy. Clarity here is the entire game.
2. Claude drafts
Feed the crystal-clear outcome plus all the real data — never summaries. Paste the actual query text, the actual docket entries, the actual log lines, the actual analytics exports. State your hard constraints up front. Ask for a one-shot draft, no hand-holding yet, with full creative latitude. You'll fix it in the next steps.
3. Test against reality (this is where most people stop)
You become the merciless QA department. Run the code in a real dev environment. File the draft in test mode. Deploy the function and hit it with real traffic. Check the indexing tools for the new content.
Document every failure with surgical precision. Do not say "it's wrong." Say: "Line 47 throws an invalid handle on the temp buffer because the query uses an alias that only exists in the older code path."
4. Specific feedback (the convergence engine)
This is the art. Your feedback must be so precise that the next draft cannot possibly make the same mistake.
- Bad: "Make it better."
- Good: "The script fails on records where the invoice number contains a hyphen because you assumed all invoice numbers are numeric. Add a string check and handle that case explicitly."
- Even better: "Reproduce the exact error I just saw, then fix it, then show me the before/after diff so I can verify."
You are training the model on your exact domain reality in real time. That training only works if your descriptions are reproducible.
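The invoice example above translates into a concrete fix. Here's a hedged sketch of what the corrected code might look like — `normalize_invoice` is a hypothetical helper, assuming the original draft just called `int()` on every invoice number:

```python
def normalize_invoice(raw: str) -> str:
    """Normalize an invoice identifier for matching.

    The hypothetical first draft assumed all invoice numbers were numeric
    and crashed on values like "INV-2024-0071". After the specific
    feedback above, it checks for non-numeric input explicitly.
    """
    raw = raw.strip()
    if raw.isdigit():
        return str(int(raw))  # purely numeric: drop leading zeros
    # Hyphenated or alphanumeric IDs: uppercase, keep structure intact
    return raw.upper()
```

Notice that the fix encodes the feedback verbatim: the failure mode ("contains a hyphen") became an explicit branch, so the next iteration cannot regress on that input class.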
5. Ship or repeat
Two choices only. Ship = it passes your production test with zero caveats. Repeat = go straight back to step 3 with the new failure data. No "maybe one more prompt." The loop is sacred.
You keep repeating until the output is production-ready every single time. That's why the audit work delivered real measurable gains, why the filings forced substantive responses from experienced opposing counsel, and why the honeypot logs hostile probes while serving clean content to humans.
Why This Loop Beats Other AI Workflows
- It forces you to stay the domain expert. You define done. You test reality. The model can't hide behind vagueness because you removed it.
- It turns hallucinations into fuel. Every failure becomes training data for the next iteration.
- It compounds across domains. The more loops you run on legal filings, the faster honeypot analysis converges, because the feedback discipline you build transfers.
- It's model-agnostic. Swap Claude for Grok, Gemini, GPT — the loop still works because the human is the constant.
Pro Tips to Make Your Loop Tighter
- Keep a "war room" context file. Paste it at the start of every new session: definition of done, previous failures, system constraints. Externalized memory beats hoping the model remembers.
- Build a personal pattern library. After every successful ship, save the final output plus the exact feedback that got it there. That library is more valuable than any prompt template you'll find online.
- Time-box every loop to 15–20 minutes. If it isn't converging, the outcome wasn't defined clearly enough. Go back to step 1. Don't keep prompting.
That's the entire methodology. It's boring, unsexy, and brutally effective — exactly like real engineering. Everything else in this article is what happens when you run this loop, hard, against real problems.
Professional: A 27% Performance Gain Across Hundreds of Database Views
I work in enterprise distribution software that runs on a relational database. Custom views accumulate over years. Reports get slow. The interesting question is never "is this view slow" — it's "is the join order wrong, is there a non-sargable predicate, is the index even being used."
Working with Claude on the audit, I batched the changes into discrete change sets and treated each one as a unit of work with a baseline, a hypothesis, and a rollback. Two patterns came up over and over:
- Date math wrapped around an indexed column in a WHERE clause kills sargability. Rewriting it so the column appears bare on one side makes the index usable again.
- Scalar functions in SELECT lists silently force row-by-row execution. Inlining them is ugly but fast.
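The first pattern is mechanical enough to flag automatically. Here's a hedged sketch of the kind of triage helper that fits this audit — a regex pass over view text, purely illustrative (a real audit would parse the SQL properly rather than regex it, and the pattern list is an assumption, not the actual tooling):

```python
import re

# Patterns where a function wraps a column inside WHERE, which defeats
# index seeks (non-sargable). Simplified and illustrative only.
NON_SARGABLE = [
    r"WHERE\s+\w+\s*\(\s*[\w.]+\s*\)\s*[=<>]",  # e.g. WHERE YEAR(order_date) = 2024
    r"WHERE\s+CONVERT\s*\(",                     # e.g. WHERE CONVERT(date, order_date) = ...
]

def flag_non_sargable(view_sql: str) -> bool:
    """Return True if the view text matches a known non-sargable pattern."""
    return any(re.search(p, view_sql, re.IGNORECASE) for p in NON_SARGABLE)
```

A flagged view then gets the manual treatment: rewrite so the indexed column appears bare on one side of the comparison (`order_date >= '2024-01-01' AND order_date < '2025-01-01'` instead of `YEAR(order_date) = 2024`), baseline it, and verify the index is actually used.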
None of those patterns are clever. The leverage was running the loop fast enough that we could touch dozens of change sets in the time a single audit normally takes. Claude wasn't doing the optimization — I was, with Claude as the rubber duck that could also write the rewrite. If you want the reverse-direction story — pulling data out of complex systems for real reporting — that's exactly what we build at the SiegeStack ETL Showcase.
The other professional win was a scan-to-ship workflow for a high-volume warehouse: a fragile suite of nine spreadsheet macros, replaced by a single script that watches a folder, queries the inventory database, and matches scans at 99%+. Co-authored end-to-end with Claude, deployed in production, no humans in the loop.
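The shape of that script is simple. A minimal sketch of the watch-and-match core — names are hypothetical, the in-memory dict stands in for the real inventory database query, and the production version obviously handles retries, malformed rows, and archiving:

```python
import csv
import time
from pathlib import Path

def match_scans(scan_file, inventory):
    """Match scanned barcodes in one dropped file against inventory on hand.

    `inventory` maps barcode -> quantity; in production this would be a
    database query, not a dict. Returns (barcode, matched) pairs.
    """
    results = []
    with scan_file.open(newline="") as f:
        for row in csv.reader(f):
            barcode = row[0].strip()
            results.append((barcode, inventory.get(barcode, 0) > 0))
    return results

def watch(folder, inventory, poll_seconds=2.0):
    """Poll a drop folder and process each new scan file exactly once."""
    seen = set()
    while True:
        for path in Path(folder).glob("*.csv"):
            if path not in seen:
                seen.add(path)
                yield path, match_scans(path, inventory)
        time.sleep(poll_seconds)
```

One script, one folder, one loop — which is exactly why it could replace nine interlocking macros without user training.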
Legal: Pro Se Litigation
This is where the "Claude as smart junior" framing matters most.
An opposing attorney filed a false address with the court using an address I hadn't lived at in over a decade — 103 days after their own team personally served me at my actual location. A default judgment based on that filing stripped me of custody for over 490 days without my knowledge. When I found out, I filed pro se.
The defense moved to dismiss under attorney immunity. Their lawyer had been practicing for thirty years. I had Claude.
Claude helped me do three things that mattered:
- Surfaced controlling precedent. The line of cases that define when conduct falls outside attorney immunity. I read every one of them in full before citing any of them.
- Framed the dispositive fact. The 103-day timeline between personal service at my actual address and the false filing is the entire case. Claude didn't find that — I did — but Claude helped me articulate why that fact alone defeats the immunity defense as a matter of law.
- Drafted the structural skeleton. Causes of action, statutory citations, prayer for relief. I rewrote substantially every paragraph, but starting from a reasonable structure saved me weeks.
The motion to dismiss was denied. The case is set for trial. There's an active criminal investigation and complaints have been filed with the relevant professional licensing bodies.
Personal Project: A Honeypot at safesapcrtx.org
I wanted to understand how attackers actually behave by watching them — not by reading about it. So I built safesapcrtx.org: a static site that looks like a vulnerable WordPress install, deployed on Netlify's free tier, with edge functions acting as a serverless firewall and Supabase logging every probe.
Total monthly cost: $0. Daily intake: 80+ probes.
The probe paths are exactly what you'd expect: /wp-login.php, /xmlrpc.php, /admin/.env, /shell.php, /backup.sql, /phpmyadmin/. Each one tells you something about the attacker's playbook — credential stuffing, environment variable hunting, webshell discovery, database dump fishing. The interesting work isn't catching probes — it's clustering them into actor groups by behavior.
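The first pass of that clustering is a lookup. Here's a simplified sketch of the kind of path-to-playbook mapping the edge-function classifier performs — the categories mirror the ones above, but the exact rules and signature lists are illustrative assumptions, not the deployed code:

```python
# Map probe paths to the attacker playbook they reveal.
# Signature lists are illustrative, not exhaustive.
PLAYBOOK = {
    "credential stuffing":       ("/wp-login.php", "/xmlrpc.php"),
    "environment variable hunt": ("/.env", "/admin/.env"),
    "webshell discovery":        ("/shell.php", "/cmd.php"),
    "database dump fishing":     ("/backup.sql", "/dump.sql", "/phpmyadmin/"),
}

def classify_probe(path):
    """Return the playbook category for a probed path, first match wins."""
    for category, signatures in PLAYBOOK.items():
        if any(path.endswith(sig) or path.startswith(sig) for sig in signatures):
            return category
    return "unclassified"
```

Per-request classification like this is what makes the actor-group clustering possible later: once every probe carries a category and a timestamp, grouping by behavior is a query, not a research project.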
Claude helped me build the edge function classifier, the Supabase schema, and the dashboard. Two weekends of work, end to end. The honeypot is live and writing data right now. The repo is github.com/iamnotcheckingit-cyber/-safe-sapcr-texas.
Zero Ad Spend: The Strangest Validation
I have never paid for an ad. Not for SiegeStack, not for the honeypot site, not for anything. But strangers — people I have never met — independently paid for advertising campaigns promoting safesapcrtx.org. They thought the content mattered enough to spend their own money on it.
That is the strangest validation I've ever received as a builder. It also taught me something concrete: the content has to be good enough to make someone else want to amplify it. That bar is much higher than "good enough to ship," and it's the only bar that matters for organic growth.
The Honest Limitations
Everything above is real. So is everything below. Anyone selling Claude as a magic co-founder is lying.
Compaction is not lossless
When the context window fills, Claude summarizes the older parts of the conversation. That summary drops nuance and edge cases. You will lose things. Plan for it: checkpoint your decisions in external files, restart sessions deliberately, don't rely on Claude to remember what happened ten thousand tokens ago.
Confidently wrong, roughly 20% of the time
Claude will produce answers that sound right and aren't. Citations get fabricated. Working code gets rewritten unnecessarily. The human is the verification layer — always. If you don't have the domain expertise to catch the wrong answers, you cannot use Claude safely for that domain.
Instructions get ignored in long sessions
"Don't change X" gets violated. "Use the function we wrote earlier" gets ignored. Re-state critical constraints in every meaningful turn. Treat the assistant as memoryless even when it isn't.
Not actually a stateful partner
Default chat has no cross-session memory. What feels like continuity is you re-establishing context every time. That's a workflow you build, not a feature the model gives you.
From Prompting to System Design
This is the actual leap, and it's the part most articles miss. Stage 1 thinking is "how do I write a better prompt." Stage 2 thinking is "how do I build a system around the model." Power users live in Stage 2.
Don't:
- Write giant kitchen-sink prompts
- Run long messy chat threads until compaction breaks them
- Trust the model to remember what happened last week
- Skip verification because the answer "sounds right"
Do:
- Break work into discrete phases with checkpoints
- Externalize memory: docs, files, system prompts, MCP tools
- Restart sessions with clean, curated context
- Use real tools (file I/O, APIs, databases) — not just chat
- Verify every claim against reality before shipping
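Externalized memory in practice can be as simple as a bootstrap step that rebuilds context from files at the start of each session. A minimal sketch — the file names are hypothetical, not a prescribed layout:

```python
from pathlib import Path

def build_session_context(war_room_dir):
    """Assemble a fresh session preamble from external memory files.

    Reads, in order: definition of done, known failures, system
    constraints. Missing files are skipped so partial war rooms
    still work. File names are illustrative.
    """
    sections = []
    for name in ("definition_of_done.md", "failures.md", "constraints.md"):
        path = Path(war_room_dir) / name
        if path.exists():
            sections.append(f"## {name}\n{path.read_text().strip()}")
    return "\n\n".join(sections)
```

Paste the result at the top of every new session and the model starts from your checkpointed state instead of a blank slate — which is the whole point of treating it as memoryless.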
Claude + external memory + structured workflows = leverage. The honeypot, the SQL audit, the legal filings — none of those were "I asked Claude a question." Each one was a system: scoped tasks, real artifacts, tight feedback loops, verification at every step.

What I'd Tell Someone Starting Today
You're the expert. Claude amplifies domain knowledge. If you don't have the knowledge, you get confident-sounding garbage. The human is the quality gate.
Specificity compounds. One precise correction saves five rounds of guessing. Invest the time to describe exactly what's wrong.
Ship real things. The gap between "playing with AI" and "using AI" is whether something goes into production. File the motion. Deploy the script. Push the site live.
Side projects matter. My honeypot made me better at log analysis at work. My legal filings made me a better writer. Everything cross-pollinates.
AI levels the playing field. A pro se litigant produced filings that survived a motion to dismiss from a 30-year attorney. That wasn't possible five years ago. It is now. Use it.