Case Study  ·  AI Partnership

Working with Claude

A real case study in AI partnership across enterprise SQL, pro se litigation, threat intelligence, and content — with the unsexy parts left in.

By Scott Willis · April 10, 2026 · ~12 min read
Claude AI · AI Workflow · Iteration Loop · Pro Se Litigation · Threat Intel · System Design

Most "working with AI" articles fall into one of two camps: hype that pretends Claude is your co-founder, or skepticism that pretends nothing has changed. Neither matches what I actually do every day. This is the honest version — what's worked, what hasn't, and why the leap isn't about better prompting at all.

I'm an enterprise distribution software specialist by day, a pro se litigant by necessity, and a builder by habit. Claude has been a daily partner across all of it. Here's what that actually looks like.

The Iteration Loop

People keep asking me for my "prompt template." There isn't one. The thing that matters is the loop, not the wording. Here it is, in full:

Define the outcome → Claude drafts → Test against reality → Specific feedback → Ship or repeat.

The critical insight: most people stop at step 2. They get a draft and try to use it. The real value is in steps 3 and 4 — testing against your actual environment and giving feedback precise enough that the next iteration converges instead of wandering. That's it. That's the engine. Below is what each step actually means in production.
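The five steps above can be sketched as a driver loop. Everything here is illustrative: `draft` and `test` are stand-ins for whatever your real workflow does (calling the model, running the test suite), and the toy functions exist only to show the loop converging.

```python
def run_loop(outcome, draft, test, max_rounds=10):
    """Run define -> draft -> test -> feedback until the artifact
    passes its production test or the round budget runs out."""
    feedback = None
    for _ in range(max_rounds):
        artifact = draft(outcome, feedback)   # step 2: Claude drafts
        failures = test(artifact)             # step 3: test against reality
        if not failures:                      # step 5: ship
            return artifact
        # step 4: feedback precise enough that the next draft converges
        feedback = "; ".join(failures)
    raise RuntimeError(f"no convergence after {max_rounds} rounds")

# Toy usage: the "draft" converges once told exactly what is wrong.
def toy_draft(outcome, feedback):
    return outcome.upper() if feedback else outcome

def toy_test(artifact):
    return [] if artifact.isupper() else ["output must be uppercase"]

print(run_loop("ship it", toy_draft, toy_test))  # → SHIP IT
```

The point of the sketch is the shape, not the code: failures flow back in as input, and "ship" is a binary gate, not a judgment call.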

1. Define the outcome (before you type a single word)

This is the single most important step and the one people skip. You must know exactly what "done" looks like in production reality. Not "a good query." Not "a legal-looking brief." Done means: copy-paste into the live environment with zero errors, or filed with zero procedural defects, or a function that logs real probes and blocks nothing legitimate.

Write the definition in plain English first: "After this loop, I will have a single script that replaces the legacy ones, runs in under two seconds on a 10k-row dataset, and requires zero user training." A model is a world-class pattern-matcher. If the target pattern is fuzzy, the output is fuzzy. Clarity here is the entire game.

2. Claude drafts

Feed the crystal-clear outcome plus all the real data — never summaries. Paste the actual query text, the actual docket entries, the actual log lines, the actual analytics exports. State your hard constraints up front. One-shot draft. No hand-holding yet. Let it go full creative. You'll fix it in the next steps.

3. Test against reality (this is where most AI users fail)

You become the merciless QA department. Run the code in a real dev environment. File the draft in test mode. Deploy the function and hit it with real traffic. Check the indexing tools for the new content.

Document every failure with surgical precision. Do not say "it's wrong." Say: "Line 47 throws an invalid handle on the temp buffer because the query uses an alias that only exists in the older code path."

4. Specific feedback (the convergence engine)

This is the art. Your feedback must be so precise that the next draft cannot possibly make the same mistake.

You are training the model on your exact domain reality in real time. That training only works if your descriptions are reproducible.

5. Ship or repeat

Two choices only. Ship = it passes your production test with zero caveats. Repeat = go straight back to step 3 with the new failure data. No "maybe one more prompt." The loop is sacred.

You keep repeating until the output is production-ready every single time. That's why the audit work delivered real measurable gains, why the filings forced substantive responses from experienced opposing counsel, and why the honeypot logs hostile probes while serving clean content to humans.


That's the entire methodology. It's boring, unsexy, and brutally effective — exactly like real engineering. Everything else in this article is what happens when you run this loop, hard, against real problems.

Professional: A 27% Performance Gain Across Hundreds of Database Views

I work in enterprise distribution software that runs on a relational database. Custom views accumulate over years. Reports get slow. The interesting question is never "is this view slow" — it's "is the join order wrong, is there a non-sargable predicate, is the index even being used."
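The non-sargable-predicate problem can be demonstrated in miniature with SQLite's `EXPLAIN QUERY PLAN`. The table and column names here are made up; the article's production system is a different relational database, but the principle (a function wrapped around an indexed column disables the index) carries over.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, ship_date TEXT)")
con.execute("CREATE INDEX idx_ship_date ON orders (ship_date)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan text in the fourth column.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# Non-sargable: the function hides the column, forcing a full scan.
print(plan("SELECT * FROM orders WHERE substr(ship_date, 1, 4) = '2026'"))

# Sargable rewrite: a range predicate on the bare column uses the index.
print(plan("SELECT * FROM orders "
           "WHERE ship_date >= '2026-01-01' AND ship_date < '2027-01-01'"))
```

The first plan reports a scan of `orders`; the second reports a search using `idx_ship_date`. That before/after check is exactly the kind of baseline-plus-hypothesis test each change set needs.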

Working with Claude on the audit, I batched the changes into discrete change sets and treated each one as a unit of work with a baseline, a hypothesis, and a rollback. Two patterns came up over and over: non-sargable predicates that kept existing indexes from being used, and join orders that forced full scans.

300+ views audited · ~27% aggregate gain · 70 change sets · 99%+ match rate

None of those patterns are clever. The leverage was running the loop fast enough that we could touch dozens of change sets in the time a single audit normally takes. Claude wasn't doing the optimization — I was, with Claude as the rubber duck that could also write the rewrite. If you want the reverse-direction story — pulling data out of complex systems for real reporting — that's exactly what we build at the SiegeStack ETL Showcase.

The other professional win was a scan-to-ship workflow for a high-volume warehouse: a fragile suite of nine spreadsheet macros, replaced by a single script that watches a folder, queries the inventory database, and matches scans at 99%+. Co-authored end-to-end with Claude, deployed in production, no humans in the loop.
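A minimal sketch of that scan-to-ship pattern, with everything hedged: the real system's folder paths, file formats, and inventory query are not described in the article, so the watch directory, CSV layout, and stand-in `load_inventory` here are all illustrative.

```python
import csv
import time
from pathlib import Path

WATCH_DIR = Path("scans_inbox")  # hypothetical drop folder

def load_inventory():
    # Stand-in for the real inventory-database query: barcode -> on-hand qty.
    return {"SKU-1001": 5, "SKU-1002": 0}

def match_scan_file(path, inventory):
    """Return (matched, unmatched) barcode lists for one scan file."""
    matched, unmatched = [], []
    with path.open(newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            barcode = row[0].strip()
            (matched if inventory.get(barcode, 0) > 0 else unmatched).append(barcode)
    return matched, unmatched

def watch(poll_seconds=2, once=False):
    """Poll the folder and process each new scan file exactly once."""
    seen = set()
    while True:
        for path in sorted(WATCH_DIR.glob("*.csv")):
            if path.name in seen:
                continue
            seen.add(path.name)
            matched, unmatched = match_scan_file(path, load_inventory())
            print(f"{path.name}: {len(matched)} matched, {len(unmatched)} unmatched")
        if once:
            return
        time.sleep(poll_seconds)
```

The design choice worth noting is the same one the article implies: one script, one folder, one matching rule, so there is nothing for a human to babysit.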

Legal: Pro Se Litigation

This is where the "Claude as smart junior" framing matters most.

An opposing attorney filed a false address with the court using an address I hadn't lived at in over a decade — 103 days after their own team personally served me at my actual location. A default judgment based on that filing stripped me of custody for over 490 days without my knowledge. When I found out, I filed pro se.

The defense moved to dismiss under attorney immunity. Their lawyer had been practicing for thirty years. I had Claude.

Claude helped me do three things that mattered: research the attorney-immunity doctrine the defense leaned on, draft filings clean enough to survive procedural scrutiny, and pressure-test every argument before it went in.

The motion to dismiss was denied. The case is set for trial. There's an active criminal investigation and complaints have been filed with the relevant professional licensing bodies.

The verification rule: Claude fabricates citations. Not often, but enough that you must independently verify every case, every statute, every quote against the actual source before it goes in a filing. If you can't be bothered to do that, don't use AI for legal work. The cost of a hallucinated citation in front of a judge is your credibility — and credibility is the only currency a pro se litigant has.

Personal Project: A Honeypot at safesapcrtx.org

I wanted to understand how attackers actually behave by watching them — not by reading about it. So I built safesapcrtx.org: a static site that looks like a vulnerable WordPress install, deployed on Netlify's free tier, with edge functions acting as a serverless firewall and Supabase logging every probe.

Total monthly cost: $0. Daily intake: 80+ probes.

The probe paths are exactly what you'd expect: /wp-login.php, /xmlrpc.php, /admin/.env, /shell.php, /backup.sql, /phpmyadmin/. Each one tells you something about the attacker's playbook — credential stuffing, environment variable hunting, webshell discovery, database dump fishing. The interesting work isn't catching probes — it's clustering them into actor groups by behavior.
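The clustering idea can be sketched as a simple path classifier. The real edge function runs on Netlify (in JavaScript/TypeScript) and the category names below are my own labels for the playbooks the article describes, so treat this as an illustration of the logic, not the deployed code.

```python
# Map each attacker playbook to the probe-path prefixes that signal it.
PLAYBOOKS = {
    "credential_stuffing": ("/wp-login.php", "/xmlrpc.php"),
    "env_hunting": ("/admin/.env", "/.env"),
    "webshell_discovery": ("/shell.php",),
    "db_dump_fishing": ("/backup.sql", "/phpmyadmin/"),
}

def classify(path: str) -> str:
    """Assign a request path to an attacker-playbook bucket."""
    for playbook, prefixes in PLAYBOOKS.items():
        if any(path.startswith(p) for p in prefixes):
            return playbook
    return "unclassified"

for probe in ["/wp-login.php", "/admin/.env", "/shell.php", "/backup.sql"]:
    print(probe, "→", classify(probe))
```

Grouping probes by playbook rather than by raw path is what turns a log of 80+ daily hits into a handful of actor behaviors you can actually reason about.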

Claude helped me build the edge function classifier, the Supabase schema, and the dashboard. Two weekends of work, end to end. The honeypot is live and writing data right now. The repo is github.com/iamnotcheckingit-cyber/-safe-sapcr-texas.

Zero Ad Spend: The Strangest Validation

I have never paid for an ad. Not for SiegeStack, not for the honeypot site, not for anything. But strangers, people completely unknown to me, independently paid for advertising campaigns promoting safesapcrtx.org. They thought the content mattered enough to spend their own money on it.

That is the strangest validation I've ever received as a builder. It also taught me something concrete: the content has to be good enough to make someone else want to amplify it. That bar is much higher than "good enough to ship," and it's the only bar that matters for organic growth.

The Honest Limitations

Everything above is real. So is everything below. Anyone selling Claude as a magic co-founder is lying.

Compaction is not lossless

When the context window fills, Claude summarizes the older parts of the conversation. That summary drops nuance and edge cases. You will lose things. Plan for it: checkpoint your decisions in external files, restart sessions deliberately, don't rely on Claude to remember what happened ten thousand tokens ago.
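The "checkpoint your decisions in external files" habit can be as small as an append-only log that a fresh session gets primed with. The file name and format here are assumptions, not anything the article specifies.

```python
import datetime
from pathlib import Path

LOG = Path("decisions.md")  # hypothetical external-memory file

def checkpoint(decision: str) -> None:
    """Append one dated decision line; paste this file into new sessions."""
    stamp = datetime.date.today().isoformat()
    with LOG.open("a") as f:
        f.write(f"- {stamp}: {decision}\n")

checkpoint("Keep the legacy alias code path until change set 41 ships.")
print(LOG.read_text())
```

The value is not the code; it is that the decisions survive compaction because they live outside the context window.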

Confidently wrong, roughly 20% of the time

Claude will produce answers that sound right and aren't. Citations get fabricated. Working code gets rewritten unnecessarily. The human is the verification layer — always. If you don't have the domain expertise to catch the wrong answers, you cannot use Claude safely for that domain.

Instructions get ignored in long sessions

"Don't change X" gets violated. "Use the function we wrote earlier" gets ignored. Re-state critical constraints in every meaningful turn. Treat the assistant as memoryless even when it isn't.

Not actually a stateful partner

Default chat has no cross-session memory. What feels like continuity is you re-establishing context every time. That's a workflow you build, not a feature the model gives you.

The honest framing: Claude is a smart but unreliable junior, not a co-founder. The leverage comes from how you wrap it, not from the model alone.

From Prompting to System Design

This is the actual leap, and it's the part most articles miss. Stage 1 thinking is "how do I write a better prompt." Stage 2 thinking is "how do I build a system around the model." Power users live in Stage 2.

Don't:
- Chase the perfect prompt template.
- Treat the chat session as long-term memory.
- Paste summaries when you have the real data.

Do:
- Checkpoint decisions and constraints in external files.
- Scope every task to one testable outcome.
- Verify each output against the real environment before trusting it.

The real formula: Claude + external memory + structured workflows = leverage. The honeypot, the SQL audit, the legal filings — none of those were "I asked Claude a question." Each one was a system: scoped tasks, real artifacts, tight feedback loops, verification at every step.

What I'd Tell Someone Starting Today

You're the expert. Claude amplifies domain knowledge. If you don't have the knowledge, you get confident-sounding garbage. The human is the quality gate.

Specificity compounds. One precise correction saves five rounds of guessing. Invest the time to describe exactly what's wrong.

Ship real things. The gap between "playing with AI" and "using AI" is whether something goes into production. File the motion. Deploy the script. Push the site live.

Side projects matter. My honeypot made me better at log analysis at work. My legal filings made me a better writer. Everything cross-pollinates.

AI levels the playing field. A pro se litigant produced filings that survived a motion to dismiss from a 30-year attorney. That wasn't possible five years ago. It is now. Use it.