Predictive Coding Defensibility and TAR Protocol

Courts don't reject predictive coding because the technology failed. They reject it because the lawyers couldn't explain what they did. That distinction matters more than most practitioners realize, and we've seen it play out in enough disputes to know: defensibility is a documentation problem first, a technology problem second.

TAR protocol isn't a single procedure; it's a category. Technology-assisted review covers several distinct workflows, and each one carries different defensibility requirements. TAR 1.0 (simple active learning) requires a defined seed set and documented coding decisions from that seed set. TAR 2.0 (continuous active learning) has different initialization requirements but demands equally rigorous recall validation. If you're not clear on which you're running, your protocol memo won't hold up under scrutiny.

The Seed Set Is Where Defensibility Starts

In our experience, the seed set is where most defensibility problems originate. Not from bad coding, but from undocumented coding. The model learns from whatever the reviewer decided, and if you can't reconstruct why those decisions were made, opposing counsel has a foothold.

A defensible seed set requires three things: a documented protocol for how reviewers were instructed to code relevance, a record of who coded which documents, and a quality check on the seed set itself before training begins. That last point is frequently skipped. Experienced practitioners will tell you that a 5-10% error rate in seed coding materially degrades model performance downstream. We've tracked this across deployments on Relativity, DISCO, Everlaw, and Reveal, and the pattern holds regardless of platform.
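
One workable form of that pre-training quality check is a blind second-pass review of a random slice of the seed set. Here's a minimal sketch in Python; the function names and data shapes are ours for illustration, not any platform's API:

```python
import random

def draw_recheck_sample(seed_codes, fraction=0.10, rng_seed=42):
    """Pull a random slice of the seed set for blind second-pass review.
    Logging rng_seed makes the draw reproducible if it is later questioned."""
    rng = random.Random(rng_seed)
    k = max(1, int(len(seed_codes) * fraction))
    return rng.sample(sorted(seed_codes), k=k)

def disagreement_rate(original_calls, recheck_calls):
    """Fraction of rechecked documents where the second reviewer disagreed,
    which serves as the working estimate of the seed set's coding error rate."""
    diffs = sum(1 for doc_id, call in recheck_calls.items()
                if original_calls[doc_id] != call)
    return diffs / len(recheck_calls)
```

If the disagreement rate comes back in that 5-10% range, resolve the conflicts and recheck before the model trains on them.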

The protocol memo for seed set coding should address:

  • How reviewers were instructed to handle ambiguous documents
  • Whether borderline relevant documents were coded responsive or non-responsive
  • How privilege was handled in the training population
  • Who had authority to resolve coding disputes in the seed set

None of this is technically complex. It's procedurally disciplined. The firms that struggle with defensibility challenges are typically the ones that treated the seed set as a quick-start step rather than a formal protocol event.

QC Sampling: The 95% Confidence Standard

Here's the thing about QC sampling: the math isn't optional. Courts and opposing parties have increasingly referenced statistical standards in TAR disputes, and the de facto benchmark in federal litigation is a 95% confidence level with a plus or minus 2% margin of error. That requires a minimum sample of approximately 2,401 documents from the non-responsive set. Pull fewer, and you can't make the confidence claim.
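
The 2,401 figure isn't arbitrary. It falls out of the standard sample-size formula for estimating a proportion, using the worst-case assumption that the true rate is 50%. A minimal sketch; the function is illustrative, not any platform's API:

```python
import math

def minimum_sample_size(confidence=0.95, margin=0.02):
    """n = z^2 * p(1-p) / E^2 with worst-case p = 0.5, the assumption
    behind the familiar 2,401 figure at 95% confidence, +/-2% margin."""
    z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}[confidence]  # two-sided z-scores
    p = 0.5  # maximum variance: the safe choice when the true rate is unknown
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(minimum_sample_size())  # 2401
```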

The sampling protocol needs to be prospectively defined. A common mistake is deciding after the review how many documents to sample and then working backward to call it sufficient. Judges have noticed. In several well-cited TAR disputes, the problem wasn't the review quality, it was that counsel couldn't show the sampling decision was made before they knew the results.

Document the following before sampling begins:

  • The target confidence level and margin of error
  • The calculated minimum sample size for your population
  • How the random sample will be drawn (software method, seed value if applicable)
  • Who will conduct the elusion review and under what coding instructions

Honestly, this is the section most review teams underinvest in. The TAR platform does the math automatically, but the decision to set those parameters has to be made by a human and logged in the protocol documentation before you run the sample.
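
What "set by a human and logged before you run the sample" can look like in practice, as a minimal sketch; the file name, field values, and seed here are illustrative:

```python
import datetime
import json
import random

def log_then_draw(non_responsive_ids, sample_size, rng_seed,
                  log_path="sampling_protocol.json"):
    """Record the sampling parameters first, then draw the elusion sample.
    Writing the log before the draw is the point: it evidences that the
    parameters were fixed prospectively, not fitted to the results."""
    entry = {
        "recorded_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "confidence_level": 0.95,
        "margin_of_error": 0.02,
        "sample_size": sample_size,
        "method": "simple random sample via random.Random",
        "rng_seed": rng_seed,
    }
    with open(log_path, "w") as f:
        json.dump(entry, f, indent=2)
    rng = random.Random(rng_seed)  # seeded so the exact draw can be re-created
    return rng.sample(sorted(non_responsive_ids), k=sample_size)
```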

Recall Rate: What You Report and When

Recall is the metric that matters for production defensibility. It answers the question every producing party has to be able to answer: what percentage of the relevant documents in the collection did we find? An 85-90% recall rate at 30-35% review depth is a benchmark we've seen hold up repeatedly in productions, but the recall rate alone isn't the deliverable. Context is.

Fact: a 90% recall rate with a poorly defined relevance scope can be less defensible than an 87% recall rate with a precisely documented protocol. What you're really reporting is "90% of documents meeting this defined relevance standard were identified." If the relevance standard was vague or shifted during the review, that percentage means less than it appears to.
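
For concreteness, here is a minimal sketch of one common way to produce the recall estimate, from an elusion sample; the counts are invented for illustration:

```python
def estimated_recall(responsive_found, elusion_hits, sample_size, discard_size):
    """Elusion-based recall estimate: project the sample's miss rate across
    the discard pile, then compare found documents to the estimated total."""
    elusion_rate = elusion_hits / sample_size
    estimated_missed = elusion_rate * discard_size
    return responsive_found / (responsive_found + estimated_missed)

# Invented counts: 40,000 responsive documents found, 12 elusion hits in a
# 2,401-document sample drawn from a 500,000-document discard pile.
print(round(estimated_recall(40_000, 12, 2_401, 500_000), 3))  # ~0.941
```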

The recall documentation for production should include:

  • The final recall estimate and the method used to calculate it
  • The confidence interval around that estimate
  • The training cutoff point (when the model stopped improving materially; one way to frame that decision is sketched after this list)
  • Any post-cutoff manual review or supplemental sampling
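
On the cutoff point: "stopped improving materially" is easier to defend when it's defined numerically in advance. One possible framing, as a minimal sketch; the window and threshold are assumptions, not a court-endorsed standard:

```python
def training_plateaued(round_recalls, window=3, min_gain=0.01):
    """One way to frame 'stopped improving materially': the recall estimate
    gained less than min_gain over the last `window` training rounds."""
    if len(round_recalls) <= window:
        return False
    return round_recalls[-1] - round_recalls[-1 - window] < min_gain

# Per-round recall estimates from a training iteration log (invented):
print(training_plateaued([0.62, 0.74, 0.81, 0.85, 0.87, 0.872, 0.874, 0.875]))  # True
```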

One more thing: recall should be documented at the point of cutoff decision, not reverse-engineered after production. We've reviewed productions where recall was calculated post-hoc to support a decision that had already been made, and that sequence is exactly what opposing counsel's experts look for.

Platform Agnosticism and What It Means for Protocol

A defensible TAR protocol needs to be platform-agnostic in its logic, even if the mechanics are platform-specific. The core procedural requirements don't change whether you're running on Relativity's Active Learning, DISCO's AI Review, Everlaw's Prediction, or Reveal's Brainspace. Courts evaluate the process, not the software.

This matters practically because productions increasingly cross platform boundaries. Data might be processed in one system and reviewed in another. In our tracking of multi-platform productions, the defensibility gaps almost always appear at handoff points, where documentation of decisions made in System A wasn't carried forward into System B.

The solution is a protocol memo that describes the procedure in platform-neutral terms, with platform-specific implementation details as an appendix. "We used active learning with a prioritized review queue" is a protocol statement. "This was implemented in Relativity using the Active Learning project with the following queue settings" is the implementation detail. Keeping them distinct makes the memo portable and makes it easier to explain to a judge who isn't familiar with any of the specific tools.
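
One way to keep that separation honest is to structure the memo the same way: neutral statement first, platform appendix attached. A minimal sketch, with hypothetical field values:

```python
from dataclasses import dataclass, field

@dataclass
class TarProtocolMemo:
    """Platform-neutral protocol statement up front; tool-specific
    mechanics live in the appendix. All values below are hypothetical."""
    workflow: str
    relevance_standard: str
    validation: str
    platform_appendix: dict = field(default_factory=dict)

memo = TarProtocolMemo(
    workflow="active learning with a prioritized review queue",
    relevance_standard="as defined in the pre-review protocol memo",
    validation="elusion sample, 95% confidence, +/-2% margin of error",
    platform_appendix={
        "platform": "Relativity Active Learning",
        "implementation": "prioritized review queue, settings per appendix",
    },
)
```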

The Documentation Stack

When defensibility is challenged, the question isn't whether TAR works. It's whether you can reconstruct what you did and why. In our experience, the firms that survive TAR challenges aren't necessarily the ones who ran the cleanest reviews. They're the ones who kept the paper trail.

The minimum documentation stack for a defensible TAR production:

  • Pre-review protocol memo (relevance standard, technology selection, team composition)
  • Seed set coding protocol and QC records
  • Training iteration log (round count, sample reviewed per round, model performance metrics)
  • Cutoff decision memo (who decided to stop training and on what basis)
  • QC sampling protocol and elusion review results
  • Final recall documentation with confidence interval
  • Production log with document counts by category

Seven documents. None of them are technically complicated to produce. The discipline is in generating them contemporaneously, not after a dispute emerges. That's the difference between a defensible production and an expensive argument.
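
For teams that want to mechanize that discipline, here is a minimal sketch of a completeness-and-timing check; the file names are illustrative stand-ins for the seven items above:

```python
import datetime
import os

REQUIRED_DOCS = [  # illustrative names for the seven-item stack
    "pre_review_protocol_memo.pdf",
    "seed_set_coding_protocol.pdf",
    "training_iteration_log.csv",
    "cutoff_decision_memo.pdf",
    "qc_sampling_protocol.pdf",
    "recall_documentation.pdf",
    "production_log.csv",
]

def documentation_gaps(doc_dir, production_date):
    """Flag stack items that are missing, or whose file dates fall after
    production: the post-hoc pattern opposing experts look for."""
    gaps = []
    for name in REQUIRED_DOCS:
        path = os.path.join(doc_dir, name)
        if not os.path.exists(path):
            gaps.append(f"missing: {name}")
        elif datetime.date.fromtimestamp(os.path.getmtime(path)) > production_date:
            gaps.append(f"dated after production: {name}")
    return gaps
```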