M&A Due Diligence for AI Products: A Technical and Legal Checklist for Acquirers

AI targets require a different diligence playbook than standard tech M&A. Here are the six risk buckets, red flags that should reprice or kill a deal, and how to structure AI-specific reps and warranties.

Why AI Targets Need a Different Diligence Playbook

When Anthropic agreed to settle claims over pirated books used to train its models, the number attached to the deal was approximately $1.5 billion — roughly $3,000 for each of about 500,000 works swept into its training data. The Bartz v. Anthropic settlement did not arise from a corporate acquisition, but the exposure it illustrates travels directly with any AI company once it changes hands. An acquirer that fails to trace where a target's training data came from is not just underwriting a compliance gap — it is potentially assuming a liability that scales with the size of the dataset and the reach of the model built on it.

⚠️
A single training-data dispute produced a roughly $1.5 billion settlement figure. That liability doesn't disappear at closing — it transfers to whoever owns the model.
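The per-work figure above is simple arithmetic, and a rough sketch makes the scaling point concrete. The snippet below assumes exposure scales linearly with the number of works at issue, which is a simplification for illustration and not a prediction of what any court or settlement would produce. The function names and the 40,000-work corpus are hypothetical.

```python
def per_work_exposure(total_settlement: float, works: int) -> float:
    """Average settlement dollars per work."""
    if works <= 0:
        raise ValueError("works must be positive")
    return total_settlement / works


def scaled_exposure(per_work: float, works_in_dataset: int) -> float:
    """Naive linear extrapolation (an assumption, not a legal estimate)."""
    return per_work * works_in_dataset


# Approximate figures reported above for the Bartz v. Anthropic settlement.
rate = per_work_exposure(1_500_000_000, 500_000)
print(f"per-work rate: ${rate:,.0f}")  # per-work rate: $3,000

# Hypothetical target whose corpus contains 40,000 unlicensed works.
estimate = scaled_exposure(rate, 40_000)
print(f"illustrative exposure: ${estimate:,.0f}")  # illustrative exposure: $120,000,000
```

Even as a back-of-the-envelope number, this kind of estimate gives the deal team a starting point for sizing indemnity caps and escrow against dataset size rather than against purchase price alone.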

Standard technology M&A diligence was not built to catch this. The playbook most buyers still run — IP assignment confirmations, employment and invention-assignment agreements, data privacy policy review, cybersecurity posture checks — assumes that a target's core asset is code, and that code's provenance is settled by employment contracts and clean-room development practices. Skadden's M&A in the AI Era analysis argues buyers now have to ask a different question entirely: what proprietary datasets does the target actually own or hold rights to, and how permissioned and traceable is that data back to its source. Skadden's guidance goes further, urging that training-data rights be treated as fundamental representations in the purchase agreement — the kind that survive longer and carry higher indemnity caps than routine reps, precisely because the downside risk resembles what Anthropic faced rather than a garden-variety IP dispute.

Deal professionals increasingly flag that the practical danger in AI-driven acquisitions is not primarily regulatory fines. Bloomberg Law's analysis frames the real risk as existential to the deal itself: the model — the asset the acquirer is paying for — can become unusable if its data origin, licensing chain, or governance record cannot withstand scrutiny once it operates at enterprise scale. A target's model might work perfectly in a demo and still be commercially unsalvageable six months after close if a licensor, an author class, or a regulator successfully challenges how its training data was sourced. That distinction — between a compliance problem and an asset-destroying problem — is why AI targets require a diligence playbook built around different questions than the one corporate development teams have used for the last two decades.

The Six AI Diligence Buckets

Standard technology diligence checklists were not built for targets whose core asset is a trained model. Acquirers need a framework that treats the model itself — how it was built, what it depends on, and what liabilities travel with it — as a distinct diligence category, separate from the usual IP assignment and open-source audits. The following six buckets cover the areas where AI-specific risk concentrates and where a standard tech diligence checklist will miss material exposure.

Training Data Provenance

The foundational legal question for any AI target, whether training a model on copyrighted works constitutes infringement or fair use, remains unresolved at the appellate level, which means diligence teams are underwriting risk against a moving target. In January 2026, Judge Sidney Stein of the Southern District of New York affirmed an order compelling OpenAI to produce 20 million anonymized ChatGPT logs sought by plaintiffs in the New York Times copyright litigation, one of several ongoing discovery disputes in NYT v. OpenAI; OpenAI has said it will appeal. As of June 2026, the Third Circuit had still not heard its first appellate argument on AI training fair use. Outcomes also vary sharply by jurisdiction. In Getty Images v. Stability AI, the UK High Court found in November 2025 that Stability AI did not commit secondary copyright infringement by making its model weights available for download, though Getty had already abandoned its primary training-phase and output-infringement claims in that forum. The parallel US case remains in jurisdictional discovery over a motion to transfer venue to the Northern District of California. Acquirers should treat training data provenance as unsettled law rather than a resolved compliance question, and price the target accordingly.

Model Ownership and IP Chain of Custody

Whether a target trained a model in-house, fine-tuned an open-weight base model, or licensed a third-party model changes what an acquirer actually owns, and open-weight licenses carry restrictions that survive the transaction. Meta's Llama Community License, for example, is not approved by the Open Source Initiative and imposes a 700-million-monthly-active-user scale threshold, a competitor-use restriction, and a ban on using Llama outputs to train non-Llama-derivative models. Because monthly active users aggregate with the acquirer's own user base post-close, a target well under the threshold on its own can cross it the moment it is absorbed into a larger buyer, which would require a separate license directly from Meta before the deal closes or immediately after. Diligence teams should map every model in the target's stack back to its underlying license terms, not just confirm that a license exists.
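A license inventory with usage-based triggers can be checked mechanically once the combined user base is known. The sketch below is a minimal illustration under stated assumptions: the `ModelLicense` record, the model names, and the MAU figures are hypothetical. It assumes the threshold is crossed when combined users exceed it, and it sums the two user bases without accounting for overlap. The license text itself governs how users are actually counted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelLicense:
    model: str                            # hypothetical label for a model in the stack
    license_name: str
    mau_threshold: Optional[int] = None   # None if no usage-based trigger applies


def licenses_triggered(stack: List[ModelLicense],
                       target_mau: int,
                       acquirer_mau: int) -> List[str]:
    """Return models whose MAU threshold the combined entity would exceed."""
    combined = target_mau + acquirer_mau  # naive sum; ignores overlapping users
    return [m.model for m in stack
            if m.mau_threshold is not None and combined > m.mau_threshold]


stack = [
    ModelLicense("support-assistant", "Llama Community License", 700_000_000),
    ModelLicense("in-house-ranker", "proprietary"),
]

# Small acquirer: the threshold is not reached.
print(licenses_triggered(stack, 2_000_000, 50_000_000))    # []
# Large acquirer: the same target now crosses it.
print(licenses_triggered(stack, 2_000_000, 900_000_000))   # ['support-assistant']
```

Running the same inventory against each prospective buyer's user base shows whether the transaction itself is the triggering event.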

AI Act and Regulatory Risk Classification

For any target with EU exposure, the compliance burden an acquirer inherits depends on where the product falls within the EU AI Act's four risk tiers. Prohibited systems include those using subliminal or manipulative techniques, exploiting vulnerable groups, enabling social scoring, or performing most real-time remote biometric identification by law enforcement. High-risk systems — covering biometrics, critical infrastructure, education, employment and recruitment, benefits eligibility, law enforcement profiling, and migration or judicial decision-support — require risk management processes, data governance, technical documentation, and human oversight. Limited-risk systems like chatbots and deepfake generators require only transparency disclosures, while minimal-risk tools such as spam filters and AI-enabled games carry no mandatory obligations. Acquirers should not assume EU-style classification transfers cleanly to US state law: TRAIGA, Texas's Responsible AI Governance Act signed June 22, 2025 and effective January 1, 2026, takes a narrower, intent-based approach that targets deliberate misuse rather than classifying systems by inherent risk level, meaning a target compliant under the EU framework may still need separate analysis for US operations.
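One way to keep the classification exercise organized is to record each system's tier alongside the obligations that tier carries. The sketch below paraphrases the obligations from the summary above rather than from the text of the Act. The system names are hypothetical, and assigning a tier is a legal judgment for counsel, not something code can decide.

```python
from enum import Enum
from typing import Dict, List


class RiskTier(Enum):
    PROHIBITED = "prohibited"
    HIGH = "high-risk"
    LIMITED = "limited-risk"
    MINIMAL = "minimal-risk"


# Obligations paraphrased from the summary above, not quoted from the Act.
OBLIGATIONS: Dict[RiskTier, List[str]] = {
    RiskTier.PROHIBITED: ["use is prohibited"],
    RiskTier.HIGH: [
        "risk management processes",
        "data governance",
        "technical documentation",
        "human oversight",
    ],
    RiskTier.LIMITED: ["transparency disclosures"],
    RiskTier.MINIMAL: [],
}


def diligence_requests(systems: Dict[str, RiskTier]) -> Dict[str, List[str]]:
    """Map each system to the obligations its assigned tier carries."""
    return {name: OBLIGATIONS[tier] for name, tier in systems.items()}


# Hypothetical systems with tiers already assigned by counsel.
systems = {
    "resume-screener": RiskTier.HIGH,
    "support-chatbot": RiskTier.LIMITED,
    "spam-filter": RiskTier.MINIMAL,
}

for name, duties in diligence_requests(systems).items():
    print(name, "->", duties or "no mandatory obligations")
```

A table like this covers only the EU framework; US state regimes such as TRAIGA would need their own analysis alongside it.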

Bias and Disparate Impact Liability

Algorithmic bias exposure is active litigation risk today, not a theoretical future harm. In Mobley v. Workday, No. 3:23-cv-00770 (N.D. Cal.), Judge Rita Lin granted preliminary collective certification under the ADEA on May 16, 2025; she dismissed the intentional-discrimination claim but allowed the disparate-impact claim to proceed against Workday as a vendor of AI hiring-screening tools. A March 6, 2026 ruling rejected Workday's argument that the ADEA doesn't cover job applicants, and roughly 14,000 individuals had opted into the collective as of mid-2026, with the case still in discovery and no trial date set. The exposure isn't limited to companies that build hiring tools for their own use — it extends to vendors whose tools are deployed by others. The EEOC's technical assistance document had advised employers to evaluate AI-based selection tools for disparate impact and clarified that employers can bear responsibility for algorithmic tools even when a third-party vendor developed or administers them, though this EEOC guidance was non-binding technical assistance and the agency's AI-specific guidance page has since been removed. Diligence should not treat the guidance's removal as a signal that the underlying disparate-impact exposure has gone away — Mobley shows the theory is alive in active federal litigation regardless of the EEOC's current posture.

Third-Party API and Model Dependencies

A target that depends on a foundation model API rather than an owned model carries a distinct risk: providers have unilaterally revoked or restricted access with little notice, even absent any compliance failure by the customer. In June 2024, OpenAI blocked API access across dozens of unsupported countries with minimal advance notice. In June 2025, Anthropic cut off Windsurf's direct access to Claude with under five days' notice amid acquisition-related tensions, despite no compliance failure on Windsurf's part. And in April 2026, Anthropic's Safeguards Team revoked access for an organization's more than 60 accounts without identifying any specific policy that had been violated, leaving a web form as the only avenue of appeal. Each of these is a documented instance of unilateral API revocation. An acquirer buying a target whose product is a thin layer over a third-party model API is buying exposure to that provider's discretion, not just its pricing.

Regulatory Enforcement History

Acquirers should also check whether a target has faced, or is exposed to, algorithmic disgorgement, an FTC remedy that orders destruction of models or algorithms trained on improperly obtained data, not just the data itself. In FTC v. Rite Aid Corp., Docket No. C-4308, a December 2023 consent order marked the first FTC enforcement action against a company for biased or unfair use of AI, in that case facial recognition technology. The order required destruction of all photos and videos collected via the system, plus any data, models, or algorithms derived from them, and required third parties who had received that data to delete it and confirm deletion in writing. This wasn't a one-off: the Rite Aid consent order was at least the fifth time the agency had used algorithmic disgorgement, following its first use against Cambridge Analytica in 2019, where the FTC ordered deletion of "any algorithms or equations that originated, in whole or in part" from improperly obtained data.

⚠️
If a target's model was trained even partly on data obtained through a since-revoked consent, a breached terms of service, or an unlawful scrape, the FTC's algorithmic disgorgement precedent means an acquirer could inherit an obligation to destroy the model itself — not just settle a fine. This risk should be underwritten before signing, not discovered after close.

Red Flags That Should Change the Deal

Not every AI-related finding in diligence is a dealbreaker, and not every finding is a footnote. The distinction usually comes down to whether the problem can be fixed with money and paper — a price adjustment, an indemnity, an escrow holdback — or whether it threatens the usability of the asset itself. Dealmakers report treating training data governance failures and unresolved regulatory exposure as the two categories most likely to move a deal off its original terms, according to Shumaker's AI M&A risk checklist.

Flags That Warrant Repricing or Escrow

These issues are common enough in AI targets that acquirers increasingly price around them rather than abandon deals over them, provided the target is willing to remediate and stand behind specific representations.

  • Undocumented training data sourcing — reliance on shadow libraries, torrents, or bulk scraping instead of licensed or properly consented datasets, per Shumaker's AI M&A risk checklist
  • No documented destruction or retention policy for data that was infringing, sensitive, or collected without a clear legal basis
  • Copyrighted or proprietary content traceable in model outputs or weights, suggesting the model has memorized material it had no right to use
  • Undisclosed indemnification obligations owed to AI vendors that were not surfaced in the data room
  • Unapproved third-party AI tools used by employees outside of any sanctioned vendor list or security review
  • Absent or incomplete bias-audit history for models used in employment, lending, housing, or other high-stakes decisions

Risk professionals generally treat these as quantifiable: the cost of remediation, indemnification, or escrow can be estimated even when the underlying facts are messy.

Flags That Warrant Walking Away

A smaller set of findings changes the calculus entirely, because they threaten the target's core asset rather than its balance sheet. Bloomberg Law's framing is direct on this point: where a model's data origin, licensing, and governance cannot withstand regulatory or litigation scrutiny at enterprise scale, the model itself risks becoming unusable — no indemnity clause repairs that.

  • An active or recent regulatory investigation into the target's AI practices, particularly one resembling the Rite Aid precedent, where the FTC ordered destruction of the models and algorithms themselves, not just a fine
  • High-risk classification under the EU AI Act with no compliance file, no conformity assessment, and no realistic path to closing the gap before the acquirer inherits the obligation
  • No verifiable chain of title for model weights — meaning the acquirer cannot confirm who actually owns what they are buying
  • Training data governance so undocumented that it cannot be reconstructed even with target cooperation

No specific transaction has been publicly reported as repriced, restructured, or terminated because of one of these flags, but the reasoning risk professionals apply is consistent: if the underlying data or IP problem is severe enough to force retraining, deletion, or a shutdown order, the acquirer isn't buying a discounted asset — it's buying a liability with a brand name attached.

⚠️
If diligence cannot establish a clean chain of title for training data and model weights, no representation or warranty in the purchase agreement can fully substitute for that missing paper trail.
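The two lists in this section amount to a triage rule: any asset-threatening flag outranks any number of quantifiable ones. A minimal sketch of that rule follows. The flag identifiers are hypothetical shorthand for the bullets above, and the output is a prompt for discussion with counsel, not a decision.

```python
from enum import Enum
from typing import Iterable, Optional


class Disposition(Enum):
    REPRICE_OR_ESCROW = "reprice or escrow"
    WALK_AWAY = "walk away"


# Hypothetical shorthand keys for the flags listed in this section.
REPRICE_FLAGS = {
    "undocumented_data_sourcing",
    "no_retention_policy",
    "memorized_content_in_outputs",
    "undisclosed_vendor_indemnities",
    "unapproved_ai_tools",
    "missing_bias_audits",
}
WALK_AWAY_FLAGS = {
    "active_regulatory_investigation",
    "high_risk_no_compliance_file",
    "no_chain_of_title_for_weights",
    "unreconstructable_data_governance",
}


def triage(findings: Iterable[str]) -> Optional[Disposition]:
    """Return the most severe disposition implied by the findings, if any."""
    found = set(findings)
    unknown = found - REPRICE_FLAGS - WALK_AWAY_FLAGS
    if unknown:
        raise ValueError(f"unrecognized findings: {sorted(unknown)}")
    if found & WALK_AWAY_FLAGS:
        return Disposition.WALK_AWAY      # asset-threatening flags dominate
    if found & REPRICE_FLAGS:
        return Disposition.REPRICE_OR_ESCROW
    return None                           # nothing flagged


print(triage(["missing_bias_audits", "unapproved_ai_tools"]))
print(triage(["missing_bias_audits", "no_chain_of_title_for_weights"]))
```

The first call yields the reprice-or-escrow disposition and the second yields walk-away, reflecting that a single chain-of-title failure changes the outcome regardless of what else was found.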

Structuring AI Reps and Warranties in the Purchase Agreement

Diligence findings only protect an acquirer if they get translated into contract language. For AI targets, that means moving beyond the standard IP and data privacy representations found in most technology purchase agreements and building a dedicated set of AI-specific reps, extended survival periods for the highest-risk items, and escrow structures sized to the litigation exposure diligence actually uncovered.

AI-Specific Representation Categories

Skadden's guidance on AI-era M&A calls for enhanced deal protections that go beyond generic IP reps, including specific representations on data rights, IP violations, model safety, and regulatory compliance — with an explicit rep that the target has not committed material data protection or IP violations in building or operating its models. Shumaker, Loop & Kendrick's recommended rep categories give this more granular shape: clean title to AI-generated assets (both copyright and trade secret protection), lawful sourcing and documented rights for every training and fine-tuning dataset, privacy compliance backed by valid consents and disclosures, and conformance with the terms of service of any third-party AI vendors or foundation models the target relied on, with no prohibited uses.

Illustrative language along these lines (for discussion purposes only, not drawn from any specific precedent agreement) might read: "The Company has, and has documented, all rights necessary to use each dataset used to train, fine-tune, or evaluate the Company's AI models, and such use has not infringed, misappropriated, or violated any third party's intellectual property or privacy rights." A second illustrative representation: "No AI model developed, licensed, or deployed by the Company has been the subject of any actual or threatened claim, demand, or regulatory inquiry alleging unlawful training data use, output infringement, or noncompliance with applicable AI-specific regulation." These are starting points for negotiation, not boilerplate to copy. The actual scope should track what diligence found.

Survival Periods and Indemnity Caps

Because training data rights are foundational to whether the target's core product can legally exist, Skadden's guidance treats them as fundamental representations rather than ordinary business reps — meaning they should carry survival periods measured in years, not the twelve-to-eighteen-month window typical for general reps, and indemnity caps set meaningfully above the standard fraction of purchase price. This escalation is a direct response to how AI liability actually surfaces: training-data infringement claims often take years to litigate and can mature long after closing, so a rep that expires before the exposure is resolved provides no real protection.
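The timing problem can be shown with simple date arithmetic. The sketch below compares a claim date against two survival windows: 18 months, the upper end of the general-rep range mentioned above, and 60 months, a purely hypothetical figure for a fundamental rep. The closing and claim dates are also hypothetical, and `add_months` clamps the day to 28 to stay valid in every month, so it is not calendar-exact.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months (day clamped to 28 for safety)."""
    index = d.year * 12 + (d.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    return date(year, month_zero_based + 1, min(d.day, 28))


def rep_covers_claim(closing: date, survival_months: int, claim: date) -> bool:
    """True if the claim is made before the rep's survival period ends."""
    return claim <= add_months(closing, survival_months)


closing = date(2026, 1, 15)   # hypothetical closing date
claim = date(2028, 3, 1)      # hypothetical date a training-data claim is made

print(add_months(closing, 18))               # 2027-07-15
print(rep_covers_claim(closing, 18, claim))  # False
print(rep_covers_claim(closing, 60, claim))  # True
```

A claim made a little over two years after closing falls outside the general-rep window in this example and is covered only under the longer, fundamental-rep survival period.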

Escrow and Holdback Structures

Where diligence has already surfaced pending or threatened litigation tied to training data, a fundamental-rep promise alone is not enough; acquirers need dollars set aside. Shumaker's recommended protections pair the AI-specific reps with escrow arrangements scoped to identified AI and IP issues and with warranty insurance specifically underwritten for AI risk, rather than relying on generic representation-and-warranty policies that may exclude AI-related claims altogether. The Bartz v. Anthropic settlement, discussed earlier in this guide, is the clearest illustration of why this matters: although it did not arise from an acquisition, it shows a training-data claim over past conduct maturing into roughly $1.5 billion in liability. An escrow or holdback sized to reflect that kind of tail risk, not just the deal's general indemnity basket, is what actually protects the purchase price.

Actionable Next Steps — A Diligence Checklist

No bar association or standards body has issued an official checklist for AI-specific M&A diligence — the field is moving too fast for that kind of consensus to form. What exists instead is an emerging body of practice among M&A and technology counsel, converging on a shared set of document requests and analytical steps. Hunton Andrews Kurth's diligence framework is one of the clearest articulations of that convergence, and it anchors the checklist below.

The starting point is documentation of how the target's training data was actually obtained. Hunton's framework calls for confirming legal acquisition of all training data, evaluating whether any of it was pulled together through scraping or harvested from user input without the permissions to support that use, and documenting whether each data source is proprietary or public. Acquirers should also pull the target's third-party contribution records, funding agreements, and any contracts that could create shared or retained rights in the data or the resulting models — rights that do not disappear just because the company changes hands.

License review sits alongside data provenance as a first-order task. Every open-source or third-party model incorporated into the target's stack needs its license terms obtained and read in full, with particular attention to disclosure obligations tied to open-source components and indemnification language covering third-party IP claims. This is not a formality — as the Llama license's monthly-active-user threshold shows, the acquisition itself can be the event that changes which license tier a company falls into or whether a "free" license still applies post-close.

The remaining checklist items track the diligence buckets and drafting mechanics covered earlier: classifying the target's AI systems against the EU AI Act's four-tier risk framework, confirming the continuity of foundation-model API access given the industry's history of unilateral revocations, and building the escrow and indemnification structure around whatever gaps the technical and legal review surfaces. Put together as a pre-signing punch list, the work looks like this.

  1. Request and review documentation proving legal acquisition of all training data, including scraping logs, user-consent records, and licensing agreements for any third-party datasets.
  2. Identify every open-source and third-party model or library embedded in the target's AI stack, obtain the current license text for each, and flag any usage-based triggers (such as MAU thresholds) that the transaction itself could activate.
  3. Classify each AI system the target operates or embeds against the EU AI Act's four-tier risk framework and confirm what compliance obligations attach at each tier.
  4. Pull all foundation-model API contracts and terms of service, and confirm in writing whether the vendor has any history of unilateral access changes affecting the target's product.
  5. Request the target's bias-audit and model-evaluation history, including any internal or third-party testing conducted before this diligence process began.
  6. Draft AI-specific representations and warranties covering training-data rights, license compliance, and model performance, matched to indemnification caps and escrow terms sized to the actual risk uncovered.
  7. Confirm that any third-party contributor, funding, or joint-development agreements affecting the AI systems have been fully disclosed and assessed for retained or shared rights.
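The punch list above can be tracked as a simple structure so that open items are visible before signing. This is a minimal sketch, not a prescribed workflow: the `ChecklistItem` type and the shortened task labels are hypothetical condensations of the seven steps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChecklistItem:
    number: int
    task: str          # shortened label for the corresponding step above
    done: bool = False


PUNCH_LIST: List[ChecklistItem] = [
    ChecklistItem(1, "Training data acquisition records"),
    ChecklistItem(2, "Model and library license review"),
    ChecklistItem(3, "EU AI Act risk-tier classification"),
    ChecklistItem(4, "Foundation-model API contracts"),
    ChecklistItem(5, "Bias-audit and evaluation history"),
    ChecklistItem(6, "AI-specific reps, caps, and escrow"),
    ChecklistItem(7, "Contributor and joint-development agreements"),
]


def open_items(items: List[ChecklistItem]) -> List[ChecklistItem]:
    """Return the items that still need work before signing."""
    return [item for item in items if not item.done]


# Mark the first two steps complete, then list what remains.
PUNCH_LIST[0].done = True
PUNCH_LIST[1].done = True
for item in open_items(PUNCH_LIST):
    print(f"{item.number}. {item.task}")
```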

Evaluating an acquisition of an AI-enabled target? Promise Legal helps in-house teams structure AI-specific diligence, reps and warranties, and escrow terms before signing.
