Strategy5 min read

5 questions to ask your AI vendor before signing

You have 3 AI vendors on your shortlist. 5 questions, 2 minutes, to find out which one actually delivers expertise.

Share

You have 3 AI vendors on your shortlist, all of them talk about transformation and cite client cases you can't verify. You no longer know who's telling the truth.

Here are 5 questions, asked in order, that take less than 2 minutes total to validate the expertise level of your AI vendor:

This is the qualification framework that the best American AI teams have been using since April 2026. It comes from the writings of Garry Tan, CEO of Y Combinator. The vocabulary is still in English, and that's a good thing. When a consultant uses these terms correctly, they signal that they read doctrine at the source.

Question 1: where do your skill files live, and can we see them versioned?

A skill file is a markdown document that teaches an AI how to do something, not what to do. The what comes from context. The how lives in the file.

The test reveals the asset you're buying. If your vendor shows you a git repository with structured, dated, versioned markdown files, you're buying a durable asset you can migrate tomorrow morning if you change tools. If they answer that "everything is in the platform" or "in the team's shared Notion account", you're paying for something that will disappear the day they leave.

At Albus Factory, I have 95 skill files. One per competency. All self-contained in a repository. If I change my tech stack tomorrow, the skill files remain valid. That's the definition of an investment that lasts.

Question 2: how many lines of code is your harness?

The harness is the program that runs the LLM. 4 functions only. Running the model in a loop. Reading and writing files. Managing context. Enforcing security rules. That's it. Everything else lives in skill files.

Tan targets 200 lines for a clean harness. If your vendor's answer is in the thousands, you're in the red zone. That means they've stacked dozens of tools, multiple connections, god-tools that do everything. Result documented by Tan: an operation that should take 100 milliseconds takes 15 seconds. 75 times slower. 3 times more tokens. 3 times more chance of failure.

If your vendor talks about sophisticated orchestration, advanced middleware layers, a proprietary platform they built for you — you're not paying for AI that works. You're paying for technical friction that will need to be dismantled later.

Question 3: what document describes the routing between your agents?

When a request comes into the system, who handles it? Which documents are consulted? According to what rules? The answer to these three questions fits in a document called a resolver. It's the equivalent of your AI company's org chart.

Without a resolver, you don't have an AI system — you have a pile of tools sitting next to each other and the vague hope that they'll coordinate.

The concrete test: ask your vendor for the document that describes who does what. If the answer is confusing, or if you're shown a Notion that lists tools without specifying trigger rules, you're in a risk zone.

Question 4: where do you draw the line between latent work and deterministic work?

There are two types of work an AI system can do. Latent work requires judgment. Deterministic work requires precision.

Reading an email and understanding the sender's intent is latent. Counting the emails received today is deterministic. Synthesizing a conversation to extract a decision is latent. Calculating the total invoiced this quarter is deterministic.

The trap is asking an LLM to do deterministic work. It can do it. It will often give you a plausible answer. But it will be wrong from time to time, and you won't know when. Counting, sorting, calculating — these tasks must be done by code, not by a model.

When you're shown an agent that produces a financial report, ask where the boundary lies. Which part is calculated by deterministic code. Which part is synthesized by the LLM. If the answer is vague, if you're told "the LLM handles everything", the report will be correct 80% of the time. Enough for a demo, dangerous for a decision.

At Albus, the rule is non-negotiable. Judgment (who to contact, what angle, what priority) goes through the LLM. Email sending, counting, scheduling are 100% deterministic. We never ask the model to count or sort.

Question 5: what do you call diarization in what you deliver?

Diarization is the production of structured profiles from masses of unstructured documents. The system reads everything on a subject, retains contradictions, identifies what has changed, and produces a synthetic judgment page. Not a summary — an analyst brief.

When you're shown an assistant that synthesizes client conversations or internal documents, ask whether it's synthesis or diarization. Synthesis tells you what was said. Diarization tells you what should worry you, what has changed since last time, what contradicts what the sales rep said two weeks ago. The first costs a lot for very little. The second changes how you make decisions.

If your vendor doesn't know how to answer this question, you know they're delivering summaries and calling it intelligence.

The 2-minute test

There's your framework. 5 questions, in order, to ask at your next meeting with an AI vendor.

Where do your skill files live and can we see them versioned. How many lines of code is your harness. What document describes the routing between your agents. How do you distinguish latent work from deterministic work in an engagement. And what do you call diarization in what you deliver.

This architectural doctrine is recent and very few people in France — or anywhere in Europe — have taken the time to learn it yet.

Share