12 May 2026 · 4 min read

Post #1

Claude tests 35 times until I get sick of it

The real story of working with AI isn't the smart answers. It's the obsessive checks.

This morning I was staring at my revenue data. 86 NULLs across 13,750 records. Just over half a percent. Not much, but enough to make me wonder: which companies have no revenue, according to my data?

First check showed weird names. ExxonMobil. Kroger. Ecolab. Universal Music Group. All multinationals with obviously substantial revenue.

My first reaction: bug. Something's wrong in my ingest. Time to review the code.

But before I did, Claude did what Claude always does.

Claude has a SQL query ready before I can think

"Erik, before we review the code, let's check if those NULLs are randomly distributed or follow a pattern. Paste this query into Supabase."

SELECT EXTRACT(YEAR FROM fiscal_date) as year,
       EXTRACT(QUARTER FROM fiscal_date) as quarter,
       COUNT(*) as null_count
FROM fundamentals
WHERE revenue IS NULL
GROUP BY 1, 2
ORDER BY null_count DESC;

I copy. I paste. I run. I get a table back. One row stands out: 45 NULLs in Q4 2019. The rest spread across many quarters, one or two each.

I send it to Claude. Answer comes back immediately: "Not a bug. This is the COVID-corruption pattern from early 2020. Many financial data providers never properly processed Q4 2019 numbers because of the March chaos. Which 45 tickers exactly?"

Next query. Paste, run, output back.
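I won't reproduce Claude's exact lines, but it was roughly this, assuming the fundamentals table carries a ticker column next to the fields from the first query:

-- which tickers are behind the 45 Q4 2019 NULLs?
SELECT ticker, fiscal_date
FROM fundamentals
WHERE revenue IS NULL
  AND EXTRACT(YEAR FROM fiscal_date) = 2019
  AND EXTRACT(QUARTER FROM fiscal_date) = 4
ORDER BY ticker;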

"A mix of sectors. Classic random pattern. Confirms my hypothesis. For your use case it doesn't matter: Value uses recent quarters, Quality uses 5-year averages, any 2019 backtest forward-filled. No fix needed."

Three minutes. Done.

How I would have worked before

Before, and by "before" I mean before this project, I would have done one of a few things with those 86 NULLs. Ignore them. Or spend a day digging into what was going on. Or, and this is what most people do, manually check three tickers, decide it's good enough, and move on.

In none of those scenarios would I have known within three minutes that it wasn't a bug, that it was a specific historical pattern, and that it didn't affect my system.

Because my brain doesn't work that fast.

The difference isn't in the tools

Everyone thinks: AI is smart, so things move faster. True. But the real difference isn't in thinking speed. It's in the discipline of checking.

Claude has a SQL query ready before I can think. Not because Claude is smarter than me. Because Claude has no problem writing it. No coffee needed, no "let me gather my thoughts", no annoyance at having already written a SQL query twenty minutes earlier.

And that's exactly the discipline I didn't have. In my previous working life I built things without this level of inspection. Something worked. Done. Next. The idea that you could test every assumption instead of trusting it, that just wasn't something you did. Not because skipping it was smart, but because testing everything was so much work that nobody did it.

Now it's no longer work. It takes three minutes.

Tests 35 times until you get sick of it

My way of working with Claude is literally documented in instruction files in my project. One of the rules: when in doubt, write a check first. A test first. Show the data first.

And Claude follows it. Sometimes too well.

Sometimes I just want to move on. "It works now, keep going." And then Claude says: "Before we move on, let's check whether the input is actually an ints array and not a floats array. Paste this 5-line test." And there I sit, sighing. Yes Claude, I know, ints. Probably. Maybe.
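Claude's actual five lines live in the pipeline code, and I won't pretend to quote them. But the same reflex translated to SQL, checking whether a column that should hold whole numbers secretly holds fractions, looks something like this:

-- how many revenue values aren't whole numbers?
SELECT COUNT(*) AS non_integer_rows
FROM fundamentals
WHERE revenue IS NOT NULL
  AND revenue <> FLOOR(revenue);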

Three times a day I get a little sick of it.

But every single time, every time, that one test or that one check finds something I'd overlooked. A weird null. A type mismatch. An edge case with Dutch stocks that's different from American ones. Something that in another life would have been a week of Friday-afternoon debugging.

What makes this different from developing the old way

Back when I ran Kadenza, a data consultancy, I saw plenty of projects where things went into production that nobody had tested. It worked. On one machine. For one customer. At one moment. And then it broke on the first edge case, and an email arrived with the subject line "URGENT".

With AI that doesn't change automatically. AI can also just spit out code that ends up in production untested. The difference is in how you instruct the AI. In the rules you document. In the discipline you enforce.

Claude tests at lightning speed. I analyze slowly. Together we hit a tempo no senior developer can match alone. Not because I'm so smart. Because the cost of checking has changed. And that changes what you can afford to test.

That's the real story of working with AI. Not the smart answers. The obsessive checks.
