Dex
7 min read · By Dean Craftsman

The three questions that separate agentic IT from chatbot copilots

Every vendor will call their product agentic at your next demo. Three questions tell you whether it actually investigates, executes, and acts under policy.

Every vendor at your next IT demo will call their product "agentic." The word has gone from technical term to marketing default in about a year, and it now describes everything from a true autonomous engineer to a chatbot with a slightly better system prompt. That's a problem for the buyer, because the gap between those two things is the gap between a tool that does the work and a tool that asks your team to.

This post hands you a filter. Three questions you can ask in the room, before the demo ends, that tell you which category a product actually belongs to — regardless of what the slide deck says. Each one isolates a capability that's expensive to fake and easy to verify. Ask all three and the polish falls away.

Why "agentic" stopped meaning anything

The word "agentic" used to draw a real line: software that investigates, plans, and executes on its own, versus software that responds when spoken to. That line still exists. The marketing around it doesn't anymore.

The reason is simple. "Agentic" tests well with buyers, so every product wants the label, and most of the work to earn it happens in places a demo doesn't show — the permission model, the execution layer, the policy engine. A chatbot that surfaces a knowledge-base article and a system that resets an account against Entra ID under a delegated permission can look identical for the ninety seconds they're on screen. The difference only shows up when something has to actually change in your environment.

So stop evaluating the label. Evaluate the three behaviors underneath it. For a fuller treatment of the category itself, see our working definition of agentic IT — this post is the field test.

[Figure: a three-row grid comparing a chatbot copilot against an agentic IT engineer across three questions: whether it investigates, whether it executes, and whether every action requires an explicit policy.]

Question one: Does it investigate, or just respond?

A copilot responds to what's in front of it. An agentic IT engineer investigates what's actually happening.

Watch how the product handles an ambiguous request. A user says "I can't get into SharePoint." A copilot takes that string and answers it — it returns a help article on SharePoint access, or asks the user to clarify, or drafts a ticket. It is reasoning about the sentence. An autonomous IT engineer treats the sentence as a symptom and goes looking for the cause: it checks the user's group memberships, the site's permission inheritance, whether a Conditional Access policy is blocking the session, whether the license that grants access lapsed last night. It gathers context from the environment before it decides anything.
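To make "gathers context before it decides" concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Environment` snapshot, its fields, and the finding names stand in for whatever directory and SharePoint queries a real product would run, and none of it is Dex's or Microsoft's API. It also skips the permission-inheritance check for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Hypothetical snapshot of tenant state; a real system would query live APIs."""
    group_members: dict[str, set[str]] = field(default_factory=dict)  # group -> users
    site_groups: dict[str, set[str]] = field(default_factory=dict)    # site -> groups with access
    blocked_sessions: set[str] = field(default_factory=set)           # users blocked by Conditional Access
    licensed_users: set[str] = field(default_factory=set)             # users with an active license


def investigate_access(env: Environment, user: str, site: str) -> list[str]:
    """Return every cause found for 'user cannot open site', not just the first."""
    findings = []
    if user not in env.licensed_users:
        findings.append("license_lapsed")
    if user in env.blocked_sessions:
        findings.append("conditional_access_block")
    allowed_groups = env.site_groups.get(site, set())
    in_allowed_group = any(user in env.group_members.get(g, set()) for g in allowed_groups)
    if not in_allowed_group:
        findings.append("missing_group_membership")
    return findings


env = Environment(
    group_members={"finance-team": {"alice"}},
    site_groups={"finance-site": {"finance-team"}},
    licensed_users={"alice"},
)
print(investigate_access(env, "bob", "finance-site"))
```

The point of the sketch is the shape, not the checks themselves: the input is a user and a resource, and the output is a list of causes found in the environment, rather than a reply to the sentence the user typed.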

The real-world consequence of getting this wrong is the escalation that didn't need to happen. A tool that only responds will confidently hand back an answer to the wrong problem — and the ticket bounces to a human anyway, now with a layer of false confidence on top. Investigation is what lets a system resolve the cases that don't match a script, which is most of the interesting ones. In the room, ask: show me what it does with a vague request that doesn't name the actual cause. If it answers the words instead of the problem, it responds. It doesn't investigate.

Question two: Does it execute, or just suggest?

This is the question that separates the categories most cleanly, because it's the one a demo is most carefully staged to blur.

A copilot produces output: a suggested fix, a drafted reply, a recommended next step. Then a human reads it, decides, and performs the action. That human is still doing the work — the copilot just made the keystrokes faster. An agentic IT engineer performs the action itself. It resets the account, assigns the license, fixes the group membership, reconfigures the policy — against the real backend system, end to end, and closes the request. Nobody picks up where it left off.

The tell is what happens after the tool "finishes." Ask directly: when this is done, does the change exist in my tenant, or does someone on my team still have to go make it? If a human still has to act, you bought a faster way to suggest work, not a way to do it. The consequence of getting this wrong is the one buyers feel six months in: the tool deflected nothing. Containment metrics look great but the queue is unchanged, because every "resolution" was really a recommendation that a human still had to execute. Suggesting is cheap. Executing safely, against production, with the right permissions, is the hard part, and it's the part that actually removes the ticket. Dex does this across L1 through L3: routine resets and access work, plus the deeper Tier 2 and Tier 3 troubleshooting and configuration that used to require a senior engineer.
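The suggest-versus-execute distinction fits in a few lines of Python. This is a sketch under stated assumptions: `FakeTenant` is an in-memory stand-in for a real backend, and the function names and the "E3" license label are placeholders, not any vendor's API.

```python
class FakeTenant:
    """In-memory stand-in for a real backend; hypothetical, for illustration only."""

    def __init__(self):
        self.licenses = {}  # user -> license name

    def assign_license(self, user, license_name):
        self.licenses[user] = license_name

    def get_license(self, user):
        return self.licenses.get(user)


def suggest_fix(user, license_name):
    """Copilot behavior: describe the change and leave the tenant untouched."""
    return f"Recommend assigning {license_name} to {user}."


def execute_fix(tenant, user, license_name):
    """Agentic behavior: make the change, then read it back to confirm it landed."""
    tenant.assign_license(user, license_name)
    return tenant.get_license(user) == license_name


tenant = FakeTenant()
suggest_fix("bob", "E3")
print(tenant.get_license("bob"))         # state after a suggestion
print(execute_fix(tenant, "bob", "E3"))  # performs and verifies the change
print(tenant.get_license("bob"))         # state after execution
```

After `suggest_fix` the tenant state is unchanged, and after `execute_fix` the change exists and has been verified by reading it back. That read-back is the code version of asking whether the change exists in your tenant.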

Question three: Does every action require an explicit policy?

The first two questions establish whether a product is autonomous. This one establishes whether its autonomy is safe to point at your environment — and it's the question most "agentic" demos can't survive.

Autonomy without enforcement is a liability, not a feature. A system that can execute real changes needs a hard answer to "what stops it from doing the wrong thing?" There are two kinds of answer, and they are not close to equivalent. The weak answer is instructions — the system is told, in its prompt, to stay within certain bounds. That's a guardrail made of language, and language can be argued with: an ambiguous request or a prompt-injection attack can talk a model out of its own instructions. The strong answer is enforcement at the execution layer — every action must match an explicit, structured policy, checked in code below the model, where no prompt can reach it. No matching policy, no action. Dex's policy engine works this way: a six-layer model from global down to runtime, enforced deterministically, so that prompt injection can't bypass it because the safety doesn't live in the prompt.
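A default-deny gate of this kind can be sketched in Python. The `Policy` shape, the action names, and the `enforce` function are assumptions made for illustration. This is not Dex's policy engine and it does not model the six layers, only the core rule that an action with no matching policy never runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical structured policy: which action is allowed on which targets."""
    action: str
    allowed_targets: frozenset[str]


class PolicyDenied(Exception):
    """Raised when no explicit policy matches a proposed action."""


def enforce(policies: list[Policy], action: str, target: str) -> Policy:
    """Default-deny gate: return the matching policy or refuse. No model text is consulted."""
    for policy in policies:
        if policy.action == action and target in policy.allowed_targets:
            return policy
    raise PolicyDenied(f"no policy permits {action!r} on {target!r}")


def run_action(policies, action, target, handler):
    """Execute handler only after the gate passes; the handler never runs on a denial."""
    enforce(policies, action, target)
    return handler(target)


policies = [Policy("reset_password", frozenset({"alice", "bob"}))]

print(run_action(policies, "reset_password", "alice", lambda t: f"reset {t}"))
try:
    run_action(policies, "grant_admin_role", "alice", lambda t: f"admin {t}")
except PolicyDenied as err:
    print("refused:", err)
```

Note what the gate takes as input: a structured action and a target, never the conversation. However the request was worded, `enforce` only sees `("grant_admin_role", "alice")`, which is why an injected prompt has nothing to argue with.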

The consequence of getting this wrong is the breach you can't audit. A tool governed by instructions might grant an admin role it shouldn't, or act on the wrong user's account, and you'd find out from the logs after the fact — if it logged at all. Ask the vendor: is the guardrail a prompt instruction or a code-level policy, and can you show me the audit trail for an action it refused to take? The answer tells you whether you're looking at governed autonomy or a confident model with good intentions. For how Dex enforces this and what the audit trail looks like, see our security model.
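As a rough picture of what to ask for, here is a Python sketch of an audit trail that records refusals as well as successes. The field names and the `audit_record` helper are illustrative assumptions, not a real product's schema.

```python
import json
from datetime import datetime, timezone


def audit_record(actor: str, action: str, target: str, allowed: bool, reason: str) -> str:
    """Build one JSON audit line; field names are illustrative, not a real product schema."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "target": target,
        "decision": "allowed" if allowed else "refused",
        "reason": reason,
    })


audit_log: list[str] = []

# Refusals are logged just like successes, so a reviewer can see what was blocked.
audit_log.append(audit_record("agent", "reset_password", "alice", True, "matched policy reset_password"))
audit_log.append(audit_record("agent", "grant_admin_role", "alice", False, "no matching policy"))

refused = [json.loads(line) for line in audit_log if json.loads(line)["decision"] == "refused"]
print(refused[0]["action"])
```

A vendor with code-level enforcement should be able to produce the equivalent of that `refused` list on request. If only successful actions are logged, there is no way to check that the guardrail ever fired.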

How the three questions work together

The questions are sequential, and they filter in order. A product that fails question one, responding instead of investigating, never reaches the interesting cases, so the other two barely matter. A product that passes one but fails two investigates well and then hands you a recommendation, which is a smarter copilot, not an engineer. A product that passes one and two but fails three is the dangerous one: genuinely autonomous, genuinely unenforced, and a real risk against a production tenant.
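The ordering can be written down as a short decision function. This is only a scoring aid for the demo room, and the category labels are shorthand for the three failure cases described above, not industry terms.

```python
def classify(investigates: bool, executes: bool, code_level_policy: bool) -> str:
    """Apply the three questions in order and stop at the first failure."""
    if not investigates:
        return "chatbot copilot"
    if not executes:
        return "smarter copilot"
    if not code_level_policy:
        return "unenforced autonomy"
    return "agentic IT"


print(classify(True, True, False))
```

Because the checks run in order, a product that fails question one is classified on that alone, whatever it claims on the other two.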

Only a system that investigates, executes, and enforces a code-level policy on every action is doing the thing the word "agentic" is supposed to mean. That's the bar. It's also, not coincidentally, the definition of the category: software that investigates, plans, and executes IT work itself, under explicit policy, with a full audit trail. Dex is built to that definition, not as the point of these questions, but as proof the bar is reachable.

What to do with this

Walk into the next demo with the three questions written down, and ask them in order. Make the vendor show, not tell — a vague request handled live, a change that actually lands in a test tenant, an action refused because no policy permitted it. The products that belong to the category will welcome the test, because passing it is the whole point. The ones wearing the label will get vague exactly where it counts.

The future of IT isn't copilots that make your team type faster. It's engineers that do the work and stay inside the lines you drew for them. Three questions are enough to tell which one is sitting across the table.

Frequently asked

What's the difference between agentic IT and an AI copilot?
A copilot assists a human who is still doing the work — it drafts a response, suggests a fix, or summarizes a ticket, and a person executes. Agentic IT does the work itself: it investigates the root cause, plans the sequence of actions, and executes real operations in the backend system under an explicit policy, with a full audit trail. The dividing line is whether a human still has to act after the tool finishes. If yes, it's a copilot.
What questions should I ask a vendor who claims their product is "agentic"?
Three cut through the marketing. First: does it investigate the actual environment, or does it respond to what's typed in the chat? Second: does it execute the change against the real backend system, or does it suggest a fix for a human to apply? Third: is every action it takes bound to an explicit, code-level policy, or does it rely on prompt instructions to stay in bounds? A true agentic IT engineer investigates, executes, and binds every action to a code-level policy.
Is Microsoft Copilot agentic IT?
Microsoft Copilot is a copilot by design and by name — it assists a person who remains in the loop and performs the action. It can draft, summarize, and suggest inside the Microsoft 365 surface. It does not investigate root cause across the tenant, execute IT operations end-to-end under a delegated-permission policy, or close requests before they become tickets. Those are the jobs of an autonomous IT engineer, which is a different category.
Why does a policy engine matter more than a smarter model?
Because autonomy without enforcement is a liability. A more capable model that relies on prompt instructions to stay in bounds can be talked out of those bounds by prompt injection or an ambiguous request. A code-level policy engine enforces every action at the execution layer, where instructions can't reach it: no matching policy means no action. That's what makes autonomy safe enough to deploy against a production tenant — the guarantee lives below the model, not inside it.
Does Dex only handle L1 password resets?
No. Dex autonomously resolves L1 through L3 — routine Tier 1 work like password resets, MFA recovery, and access provisioning, and also the deeper Tier 2 and Tier 3 troubleshooting, configuration, and engineering-adjacent tasks that used to require a senior technician. Only genuine architectural or judgment cases escalate to a human, and they escalate with full context attached.