The 60–90 Day Law-Firm AI Pilot Program: Embed AI Literacy, Governance, and Lawyer-in-the-Loop Controls (Without Killing Momentum)
This is a practical, time-boxed pilot plan for managing partners, practice group leaders, KM/innovation, ethics/risk, and IT/security who need measurable AI wins without informal tool sprawl. You’ll get a phased rollout, three workflow playbooks (intake, extraction, drafting), mandatory lawyer-in-the-loop (LITL) checkpoints, short training modules that change behavior, and a KPI dashboard to decide whether to scale or roll back. (For a deeper definition of the review model, see lawyer-in-the-loop.)
Program-at-a-glance (60–90 days)
- Weeks 0–2 (Design): approved use-case list; tool/model allowlist; draft SOPs.
- Weeks 3–6 (Pilot): run 2–3 use cases using intake/extraction/drafting playbooks; capture LITL approval artifacts; start audit logging.
- Weeks 7–12 (Scale): standardize SOPs; finish training; set KPI baselines + targets.
Governance principle: “No unlogged AI, no unreviewed AI, no unscoped AI.” For policy depth and audit-ready governance, align to a broader framework like The Complete AI Governance Playbook for 2025.
Weeks 0–2 — Set the rules of the road (scope, risk tiers, and pilot selection)
- Step 1: Name sponsors + an “AI Control Group.” Keep it small and empowered: a partner sponsor (accountability), risk/ethics (guardrails), KM (playbooks/SOPs), IT/security (approved tools, access, logging), and 2–4 pilot associates/paralegals who will test real matters and report failures.
- Step 2: Define “safe to pilot.” Start where AI reduces effort without changing legal outcomes: deposition summaries, clause extraction from executed contracts, first-pass internal memos. Out of scope (initially): filings, client advice letters without heightened verification, new factual claims, and highly regulated data.
- Step 3: Risk tiers → controls. Tier 1 (internal) = lighter review; Tier 2 (client deliverables) = strict LITL + provenance; Tier 3 (filings/advice) = enhanced verification + senior sign-off (sometimes “no AI”). See the sketch after this list.
- Step 4: Pick 2–3 pilots max. Choose high-frequency tasks with clear inputs/outputs, measurable quality, and a contained confidentiality profile.
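Encoding the tier logic once, and referencing it from every playbook, keeps it enforceable rather than aspirational. Below is a minimal Python sketch; the class and mapping names (TierControls, TIER_CONTROLS) and the control fields are illustrative assumptions, not firm policy:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TierControls:
    """Controls attached to a risk tier; fields are illustrative, not firm policy."""
    review: str                # required review depth
    provenance_required: bool  # must outputs carry source citations?
    senior_signoff: bool       # partner/senior approval before release?
    ai_permitted: bool         # Tier 3 may default to "no AI" entirely

# Hypothetical encoding of the three tiers described in Step 3.
TIER_CONTROLS = {
    1: TierControls(review="lighter LITL review", provenance_required=False,
                    senior_signoff=False, ai_permitted=True),   # internal work
    2: TierControls(review="strict LITL", provenance_required=True,
                    senior_signoff=False, ai_permitted=True),   # client deliverables
    3: TierControls(review="enhanced verification", provenance_required=True,
                    senior_signoff=True, ai_permitted=False),   # filings/advice: opt in explicitly
}
```

Keeping the mapping in one place means a playbook can simply name its tier instead of restating the rules.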
Scenario: litigation wants AI to draft a motion. Common failure is scope creep + citation hallucinations. Pilot extraction/summarization first, then add a “filing-safe” gate later. For workflow framing, see AI Workflows in Legal Practice: A Practical Transformation Guide; for policy depth, align to The Complete AI Governance Playbook for 2025.
The three playbooks you need (and how to make them usable on day 1)
Your pilot will only scale if lawyers can follow the process under time pressure. The design rule is simple: one page per workflow (intake, extraction, drafting) with embedded checklists — no separate policy binder that nobody opens.
- Purpose + “definition of done”: what the workflow produces and what it must include to be considered complete.
- Approved tools/models + allowed environments: where the model can run (web UI, private tenant, API) and who can access it.
- Data handling rules: what can/can’t be pasted, when to redact, and how to label privileged materials.
- Prompting + retrieval approach: reusable prompt templates, required constraints (“use only provided sources”), and examples.
- Mandatory LITL checkpoints: who reviews, when, and what artifact gets saved (approval note, redline, source bundle).
- Output QA checklist: accuracy, citations/quotes, privilege/confidentiality, tone, and “no new facts” enforcement.
- Logging requirements: matter ID, tool/model, sources used, reviewer/approver, and where logs live.
Copy/paste playbook header (template): Workflow | Tier | Allowed tools | Allowed inputs | Prohibited uses | Steps | LITL checkpoints + artifacts | QA checklist | Logging fields | Owner | Last updated.
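To illustrate the logging fields above, here is a minimal sketch of one audit-log row plus a validator that enforces the “no unlogged AI” principle. The record shape and field names (AIUsageLog, tool_model, and so on) are assumptions for illustration, not a prescribed schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AIUsageLog:
    """One audit-log row per AI use. Field names are illustrative assumptions."""
    matter_id: str
    workflow: str          # intake | extraction | drafting
    tier: int              # risk tier from the playbook header
    tool_model: str        # approved tool plus model version
    sources: list[str]     # document IDs actually provided to the model
    reviewer: str          # who performed the LITL review
    approver: str          # who signed off
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def validate(log: AIUsageLog) -> None:
    """Enforce 'no unlogged AI': refuse records with missing required fields."""
    missing = [k for k, v in vars(log).items() if v in ("", None, [])]
    if missing:
        raise ValueError(f"Incomplete audit log, missing: {missing}")
```

A validator like this can run before any output leaves the pilot, so incomplete rows surface immediately instead of at audit time.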
Workflow-first design beats tool-first buying — see Stop Buying Legal AI Tools. Start Designing Workflows That Save Money.
Playbook #1 — Intake: turn messy requests into AI-ready tasks without leaking confidential info
Goal: standardize intake so you don’t feed AI the wrong facts, the wrong documents, or the wrong level of risk. A good intake step prevents off-scope use, missing approvals, and “quick asks” turning into unreviewed client advice.
Intake form (minimum fields): matter ID/client; jurisdiction; confidentiality tier; document types; allowed sources (executed agreements, transcript set, research memo); intended audience; deadline; and required attachments (authoritative docs only — no screenshots, no hearsay summaries). A code sketch of this record follows the list below.
- “AI allowed?” toggle: pre-filled from the risk tier plus client restrictions (and defaults to No if unknown).
- Mandatory LITL checkpoint: a partner (or delegated reviewer) approves scope, source set, and whether client consent/disclosure is required before any prompts are run. Save the approval artifact.
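A minimal sketch of the intake record, assuming illustrative field names; note the default-deny ai_allowed toggle and the gate that blocks prompting until a scope approval is recorded:

```python
from dataclasses import dataclass

@dataclass
class IntakeRequest:
    """Minimum intake fields from the form above (names are illustrative)."""
    matter_id: str
    client: str
    jurisdiction: str
    confidentiality_tier: int
    document_types: list[str]
    allowed_sources: list[str]   # e.g., executed agreements, transcript set
    intended_audience: str
    deadline: str
    attachments: list[str]       # authoritative docs only
    ai_allowed: bool = False     # defaults to No if unknown, per the toggle rule
    scope_approved_by: str = ""  # partner/delegated reviewer; empty = not yet approved

def may_run_prompts(req: IntakeRequest) -> bool:
    """Gate: no prompts until the AI toggle is on and scope approval is recorded."""
    return req.ai_allowed and bool(req.scope_approved_by)
```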
Scenario: a client emails a patchwork of facts asking for “a quick assessment.” Risk: AI amplifies unverified facts and privilege labels get lost. Do instead: require the intake form + a fact-verification list, and limit AI to drafting an issues list, not advice. See What is Lawyer in the Loop? and align confidentiality controls with The Complete AI Governance Playbook for 2025.
Playbook #2 — Extraction: get reliable structured data (clauses, issues, timelines) with source citations
Goal: produce consistent, auditable extraction outputs that can always be traced back to an authoritative document — so reviewers aren’t “trusting the model,” they’re checking evidence.
Extraction spec (template): define the document set (exact filenames/IDs + executed vs draft), the fields to extract (e.g., termination for convenience, notice period, venue), acceptable confidence thresholds, and a hard requirement that each field includes a quotation + page/section cite. Missing info must be “unknown” — no guessing, no gap-filling. See the sketch after this list.
- Mandatory LITL checkpoints: the associate verifies every extracted field has a supporting quote/cite; for Tier 2 work, a senior reviewer spot-checks a fixed percentage (e.g., 10–20%) and signs off.
- Quality controls: enforce an evidence-first citation trail; run a conflict check so parties/dates/defined terms match the executed version, not earlier drafts.
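Here is one way the evidence-first rule could be expressed in code. The field names, the UNKNOWN sentinel, and the 0.8 confidence floor are illustrative assumptions:

```python
from dataclasses import dataclass

UNKNOWN = "unknown"  # required value when the source does not answer the field

@dataclass
class ExtractedField:
    """One extracted field; names and thresholds are illustrative assumptions."""
    name: str          # e.g., "termination_for_convenience"
    value: str         # extracted answer, or UNKNOWN
    quote: str         # verbatim supporting quotation from the source
    cite: str          # page/section citation, e.g., "MSA-2023-014 §12.2, p. 18"
    doc_id: str        # exact file identifier (executed version only)
    confidence: float  # model-reported confidence, 0.0–1.0

def passes_spec(f: ExtractedField, min_confidence: float = 0.8) -> bool:
    """Evidence-first rule: every non-unknown value needs a quote and a cite."""
    if f.value == UNKNOWN:
        return True    # "unknown" is acceptable; guessing is not
    return bool(f.quote) and bool(f.cite) and f.confidence >= min_confidence
```

Reviewers then check evidence field by field instead of trusting a narrative summary.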
Scenario: AI extracts termination rights from a stack of MSAs, but silently reads a draft. Fix it at the process level: intake requires executed docs, and the extraction spec requires document identifiers + versioning. For audit trail and provenance ideas, see API-First Compliant AI Workflows.
Playbook #3 — Drafting: use AI for first drafts while preserving legal judgment, citations, and client voice
Goal: accelerate first drafts without delegating legal conclusions to a model. Treat AI as a drafting engine constrained by authoritative inputs and enforced review gates.
Drafting SOP (template): limit inputs to authoritative sources (extracted facts with cites, client positions, governing law/research notes). Allow only pilot-safe outputs: internal memo, issue spotter, clause alternatives, email outline. Prohibit: new factual assertions, fabricated citations/quotes, and final filing language in the initial phase. A gate-check sketch follows the list below.
- Pre-draft LITL: reviewer approves the thesis, audience, and the exact source set the model may use.
- Post-draft LITL: reviewer confirms (a) every legal claim is supported, (b) every cite/quote is verified, and (c) confidential information handling matches the matter’s tier.
- Delivery gate: partner sign-off before any Tier 2 client-facing draft is sent.
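A minimal sketch of the post-draft checks and the delivery gate, assuming a simple checklist structure (DraftReview and may_deliver are hypothetical names):

```python
from dataclasses import dataclass

@dataclass
class DraftReview:
    """Post-draft LITL checklist from the SOP above (illustrative structure)."""
    claims_supported: bool  # (a) every legal claim traced to an approved source
    cites_verified: bool    # (b) every citation/quote checked against the original
    tier_handling_ok: bool  # (c) confidentiality handling matches the matter's tier
    partner_signoff: bool   # delivery gate for Tier 2 client-facing drafts

def may_deliver(review: DraftReview, tier: int) -> bool:
    """A draft ships only when all post-draft checks pass and, for Tier 2+,
    a partner has signed off."""
    checks = review.claims_supported and review.cites_verified and review.tier_handling_ok
    if tier >= 2:
        return checks and review.partner_signoff
    return checks
```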
Scenario: AI drafts a demand letter with an aggressive tone and overstates the law — escalating the dispute. Fix it with drafting constraints (tone + permitted positions) and keep the partner delivery gate. For adoption framing and proof points, see AI for Law Firms: Practical Workflows, Ethics, and Efficiency Gains and AI in Legal Firms: A Case Study on Efficiency Gains.
Training modules (weeks 1–6): AI literacy that changes behavior, not just awareness
Use 20–30 minute micro-modules with a short assessment and role-specific drills. The goal is not “AI familiarity,” but repeatable habits: evidence-first work, correct data handling, and non-negotiable review gates.
- Module 1: how LLMs fail (hallucinations, brittleness, false precision) + the evidence-first habit.
- Module 2: confidentiality/privilege + what not to paste, when to redact, and where prompts/outputs may be stored.
- Module 3: prompting for legal work (constraints, approved sources, asking for uncertainty); a template sketch follows this list.
- Module 4: lawyer-in-the-loop in practice (checkpoints, artifacts, sign-offs).
- Module 5: citation/quote verification workflow (fast validation steps and common traps).
- Module 6: incident reporting + near-miss culture (no blame, rapid fixes).
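To make Module 3 concrete, here is a hypothetical prompt template enforcing the constraints it teaches: approved sources only, and explicit uncertainty instead of gap-filling. The template name and placeholder values are illustrative:

```python
# Hypothetical template; firms would adapt wording to their approved tools.
LEGAL_PROMPT_TEMPLATE = """You are assisting with {workflow} on matter {matter_id}.
Use ONLY the sources provided below. Do not add facts, citations, or quotations
that do not appear in these sources. If a source does not answer the question,
reply "unknown" and state what is missing.

SOURCES:
{sources}

TASK:
{task}
"""

prompt = LEGAL_PROMPT_TEMPLATE.format(
    workflow="extraction",
    matter_id="M-1234",
    sources="[document excerpts with IDs and page numbers]",
    task="List each termination-for-convenience clause with a verbatim quote and cite.",
)
```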
Role tracks: partners (governance and client communications), associates (playbooks + verification), staff (intake discipline + logging). Run a red-team drill using intentionally flawed output and the review checklist.
Measure: completion %, quiz pass %, and observed LITL compliance on pilot matters (not self-reported usage).
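Observed compliance can be computed straight from the audit log rather than from surveys. A minimal sketch, assuming each log record carries reviewer and approver keys as in the logging example earlier:

```python
def litl_compliance_rate(records: list[dict]) -> float:
    """Observed LITL compliance: share of logged AI uses that name both a
    reviewer and an approver. Record keys are illustrative assumptions."""
    if not records:
        return 0.0
    compliant = sum(1 for r in records if r.get("reviewer") and r.get("approver"))
    return compliant / len(records)
```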
KPI dashboard for safe scaling (weeks 3–12): measure adoption, quality, and auditability together
Start by capturing a baseline (pre-AI cycle time, rework rate, and error types) so “time saved” isn’t a feeling. Then track a small KPI set weekly — adoption and safety must improve together.
- Adoption/coverage: % targeted teams piloting; # matters using playbooks; % tasks run through approved tools.
- Time/throughput: median time saved per task; cycle time reduction by workflow.
- Quality/safety: % outputs flagged at LITL; % requiring substantive rework; citation error rate; confidentiality near-misses; incident count.
- Governance/auditability: % AI uses with complete logs (matter ID, tool/model, prompt, sources, approver); mean time to produce an audit trail.
- Training/controls: completion + pass rate; LITL bypass rate (target 0); % matters hitting required checkpoints.
- Business outcomes: margin impact; client satisfaction on pilot matters; AI-attributable write-offs.
Instrumentation: log automatically where possible (API logs, document IDs, tool/model), and capture the rest manually (approval notes, review artifacts). Hold a 30-minute weekly review with stop/go thresholds.
Example rollback rule: if drafting rework rises above a set threshold (e.g., +20% vs baseline), reduce scope (Tier 2 → Tier 1) until training and prompts are corrected.
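One reading of that rollback rule as code, assuming rework rates are tracked as fractions and “+20% vs baseline” means a relative increase over the baseline rate:

```python
def should_roll_back(baseline_rework_rate: float,
                     current_rework_rate: float,
                     threshold: float = 0.20) -> bool:
    """Trigger rollback when rework exceeds baseline by the set threshold
    (here +20% relative, matching the example above). Rates are fractions,
    e.g., 0.15 means 15% of drafts needed substantive rework."""
    if baseline_rework_rate <= 0:
        return current_rework_rate > threshold  # no baseline: treat threshold as absolute
    return (current_rework_rate - baseline_rework_rate) / baseline_rework_rate > threshold
```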
Weeks 7–12 — Scale safely: expand scope, standardize controls, and decide what becomes “business as usual”
Scale only if the controls are working. Minimum criteria before expansion: training completion meets target, LITL bypass = 0, provenance/logging coverage meets target, and incident/near-miss rate is stable (or declining).
- Standardize: convert playbooks into KM-owned SOPs; publish approved prompt templates; add client-facing disclosure/consent language where required; formalize tool approval, vendor review, and retention rules.
- Operating rhythm: monthly steering meeting, quarterly policy review, and incident post-mortems that result in specific playbook updates (not generic reminders).
Scenario: a second practice group wants in after hearing “time saved.” Risk: they copy prompts without controls. Do instead: duplicate the playbooks, require the same checkpoints/metrics, and run a short shadow pilot before go-live.
Anchor long-term decisions in The Complete AI Governance Playbook for 2025 and expand methodically using AI Workflows in Legal Practice.
Your 60–90 day checklist:
- Appoint sponsor + control group; pick 2–3 pilots; implement playbooks.
- Enforce LITL artifacts + logs; launch training; stand up KPIs + weekly review.
- Set day-30 and day-90 scale/rollback rules; request a pilot design workshop or playbook template pack.