Capability

Predictive Coding — TAR That Works With How Attorneys Review

Technology Assisted Review that improves as attorneys review — not a fixed model run once at project start. Designed to support FRCP Rule 26 cooperative discovery workflows.

What Predictive Coding Does — and Doesn't Do

Predictive coding uses machine learning to classify documents based on their relevance to the issues in a case. An attorney reviews a training set of documents and tags them as responsive or non-responsive. The model learns from those decisions and applies them to the full document population — producing a ranked list by predicted relevance.

Predictive coding does not replace attorney judgment. It organizes a large document population so attorney effort is concentrated where it matters most — on the uncertain documents and the privilege calls, not on obvious non-responsive material.

TAR 1.0 vs. TAR 2.0: What the Distinction Means in Practice

TAR 1.0 (simple passive learning) trains a model on a fixed seed set, then applies that model to the remaining population. The model does not update as reviewers continue working. This approach was sufficient for early predictive coding acceptance in courts but introduces risk on complex multi-custodian matters where the seed set may not fully represent the document population.

TAR 2.0 (continuous active learning) updates the model as reviewers work through the document population. Every review decision — responsive, non-responsive, privileged — is incorporated into the model's next round. The model does not require a complete seed set before it begins ranking the full population.

Discovarc uses continuous active learning. Courts that have scrutinized TAR methodology — including in cases citing Da Silva Moore, Rio Tinto, and related precedent — have generally accepted active learning approaches when the protocol documentation adequately describes the methodology.

TAR 1.0

Fixed seed set before review begins
Model trained once, then applied
Seed set quality critical to outcome
Simpler protocol documentation
Less adaptive on large heterogeneous sets

TAR 2.0 (Discovarc)

Model updates with each reviewer decision
Continuous active learning throughout review
Uncertain documents prioritized for review
Adapts to emerging document patterns
More defensible stopping criteria

Seed Set Construction

Even in a continuous active learning system, the initial seed set matters. A well-constructed seed set exposes the model to the breadth of document types, custodians, and time periods in the collection before the active learning loop begins.

What goes into a good seed set

Stratified random sample: draws documents from each custodian in proportion to that custodian's share of the total corpus
Time-period coverage: the seed set should span the full date range of the collection, not just recent material
Document-type diversity: emails, attachments, and standalone documents each behave differently under the model; a seed set dominated by one type underrepresents others
Known-relevant documents: counsel should include documents known to be responsive from prior review or deposition preparation, if available
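The proportional, per-custodian draw described above can be sketched in a few lines of Python. This is a minimal illustration, not Discovarc's implementation: the function name, the (doc_id, custodian) input shape, and the rule of at least one document per custodian are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_seed_sample(documents, seed_size, rng_seed=0):
    """Draw a seed set whose per-custodian counts track each custodian's corpus share.

    `documents` is a list of (doc_id, custodian) pairs (hypothetical input shape).
    """
    rng = random.Random(rng_seed)  # fixed seed so the draw can be reproduced later
    by_custodian = defaultdict(list)
    for doc_id, custodian in documents:
        by_custodian[custodian].append(doc_id)

    total = len(documents)
    sample = []
    for custodian in sorted(by_custodian):
        doc_ids = by_custodian[custodian]
        # Proportional allocation, with at least one document per custodian (assumed rule).
        quota = max(1, round(seed_size * len(doc_ids) / total))
        quota = min(quota, len(doc_ids))
        sample.extend(rng.sample(doc_ids, quota))
    # Rounding means the result can differ slightly from seed_size.
    return sample

# Example: three custodians holding 600, 300, and 100 documents.
docs = ([(f"a{i}", "alice") for i in range(600)]
        + [(f"b{i}", "bob") for i in range(300)]
        + [(f"c{i}", "carol") for i in range(100)])
seed = stratified_seed_sample(docs, seed_size=50)
```

The same idea extends to the other strata named in the list (time period, document type) by grouping on a composite key instead of custodian alone.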

The role of counsel in seed set selection

Seed set selection is a legal decision, not a technical one. Counsel determines which documents are responsive — the model learns from those decisions. Discovarc surfaces a stratified candidate pool and tracks which documents were included in the seed set for protocol documentation purposes.

Figure: Iterative training loop for Technology Assisted Review, showing the cycle of seed set selection, model training, reviewer feedback, and stopping criteria.

Iterative Training and Stopping Criteria

In continuous active learning, each review round produces a new model state. In each round, Discovarc presents the documents with the highest model uncertainty for attorney review, because the documents the model is least confident about are the ones where human judgment adds the most value.
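One common way to express "highest model uncertainty" is to pick the documents whose predicted relevance score sits closest to 0.5. The sketch below assumes scores are probabilities between 0 and 1. The function name and input shape are hypothetical, and the uncertainty measure Discovarc actually uses is not specified here.

```python
def select_most_uncertain(scores, batch_size):
    """Return the doc ids whose predicted relevance is closest to 0.5.

    `scores` maps doc_id -> predicted probability of responsiveness (0.0 to 1.0).
    Ties are broken by doc_id so the selection is deterministic.
    """
    ranked = sorted(scores, key=lambda doc_id: (abs(scores[doc_id] - 0.5), doc_id))
    return ranked[:batch_size]

# Example: d2 and d4 are nearest the decision boundary, so they are reviewed first.
scores = {"d1": 0.95, "d2": 0.52, "d3": 0.10, "d4": 0.45, "d5": 0.70}
batch = select_most_uncertain(scores, batch_size=2)
```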

How the iteration loop works

After each review round, Discovarc retrains on all reviewed documents and re-scores the remaining population. Documents previously ranked in the exception zone, where the model's confidence is low, may move to high confidence as the model gains more context; documents in the high-confidence range may be sampled into QC rather than reviewed individually.
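The retrain and re-score cycle can be illustrated with a deliberately small toy model. Everything in this sketch is an assumption made for illustration: the token log-odds classifier stands in for whatever model a production system uses, and `train`, `score`, `run_rounds`, and the `oracle` callback (a stand-in for the attorney's decision) are hypothetical names.

```python
import math
from collections import Counter

def train(labeled):
    """Fit a toy token log-odds model. `labeled` is a list of (text, is_responsive)."""
    pos, neg = Counter(), Counter()
    for text, is_responsive in labeled:
        (pos if is_responsive else neg).update(set(text.lower().split()))
    n_pos = sum(1 for _, r in labeled if r)
    n_neg = len(labeled) - n_pos
    # Laplace-smoothed log-odds of a token appearing in responsive vs. non-responsive docs.
    return {
        tok: math.log((pos[tok] + 1) / (n_pos + 2)) - math.log((neg[tok] + 1) / (n_neg + 2))
        for tok in set(pos) | set(neg)
    }

def score(model, text):
    """Map summed token log-odds to a 0..1 relevance score with a logistic function."""
    z = sum(model.get(tok, 0.0) for tok in set(text.lower().split()))
    return 1.0 / (1.0 + math.exp(-z))

def run_rounds(corpus, seed_labels, oracle, batch_size, max_rounds):
    """corpus: doc_id -> text; seed_labels: doc_id -> bool; oracle: doc_id -> bool."""
    labels = dict(seed_labels)
    history = []
    for round_no in range(1, max_rounds + 1):
        # Retrain on every reviewed document, then re-score everything not yet reviewed.
        model = train([(corpus[d], r) for d, r in labels.items()])
        remaining = {d: score(model, t) for d, t in corpus.items() if d not in labels}
        if not remaining:
            break
        # Route the least certain documents to the reviewer for this round.
        batch = sorted(remaining, key=lambda d: (abs(remaining[d] - 0.5), d))[:batch_size]
        for d in batch:
            labels[d] = oracle(d)
        history.append((round_no, batch))
    return labels, history

corpus = {
    "d1": "merger pricing agreement", "d2": "lunch menu friday",
    "d3": "pricing agreement draft", "d4": "friday lunch order",
    "d5": "merger timeline", "d6": "parking notice",
}
labels, history = run_rounds(
    corpus, {"d1": True, "d2": False},
    oracle=lambda d: d in {"d1", "d3", "d5"}, batch_size=2, max_rounds=5,
)
```

In this toy run the loop ends only when no unreviewed documents remain. A real protocol would end it with one of the stopping criteria instead.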

Stopping criteria

The question of when to stop reviewing is one of the most contested issues in TAR protocol documentation. Discovarc supports several stopping criteria approaches:

Elusion testing: counsel reviews a random sample drawn from the predicted non-responsive set to measure the elusion rate, the proportion of responsive documents remaining in that set, which in turn supports a recall estimate. If the elusion rate falls below the agreed threshold, review is complete.
Recall threshold: review stops once the estimated proportion of responsive documents identified reaches the agreed level. Discovarc calculates a recall estimate at each stopping point based on the reviewed and sampled populations.
Marginal productivity: the rate at which new responsive documents are found per review round. When this rate falls below a defined level, the protocol can support stopping.
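The arithmetic behind elusion testing and the recall estimate is short enough to show directly. The sketch below computes point estimates only and leaves out confidence intervals, which a real protocol would normally report. The function names and the example figures are illustrative assumptions, not Discovarc output.

```python
def elusion_rate(sample_responsive, sample_size):
    """Share of the sampled predicted-non-responsive documents that were actually responsive."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return sample_responsive / sample_size

def estimated_recall(responsive_found, null_set_size, elusion):
    """Point estimate of recall: found / (found + estimated responsive left behind)."""
    estimated_missed = elusion * null_set_size
    total = responsive_found + estimated_missed
    if total == 0:
        raise ValueError("recall is undefined when no responsive documents are found or estimated")
    return responsive_found / total

# Hypothetical matter: 7,200 responsive documents found so far, 80,000 documents
# predicted non-responsive, and 15 responsive documents in a 1,500-document elusion sample.
e = elusion_rate(15, 1500)                   # 0.01
recall = estimated_recall(7200, 80000, e)    # 7200 / (7200 + 800), approximately 0.90
```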

The choice of stopping criterion should be made in consultation with opposing counsel under the cooperative discovery framework of FRCP Rule 26 and documented in the TAR protocol.
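For the marginal productivity criterion, one possible formulation is to track the share of each round's documents that were tagged responsive and stop once that share stays below an agreed level for several rounds in a row. The two-round window, the 2% level, and the function names below are assumptions chosen for illustration. The actual values belong in the negotiated protocol.

```python
def marginal_productivity(round_results):
    """Per-round yield from a list of (docs_reviewed, responsive_found) pairs."""
    return [found / reviewed for reviewed, found in round_results if reviewed > 0]

def should_stop(round_results, min_yield, consecutive_rounds=2):
    """True when the most recent `consecutive_rounds` rounds all fall below `min_yield`."""
    yields = marginal_productivity(round_results)
    if len(yields) < consecutive_rounds:
        return False
    return all(y < min_yield for y in yields[-consecutive_rounds:])

# Hypothetical review: yield falls from 42% to about 1% over five rounds of 500 documents.
rounds = [(500, 210), (500, 120), (500, 40), (500, 9), (500, 6)]
stop_now = should_stop(rounds, min_yield=0.02)   # last two rounds yielded 1.8% and 1.2%
```

Requiring more than one low-yield round guards against stopping on a single unrepresentative batch.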

Note: Discovarc generates TAR protocol documentation designed to support FRCP Rule 26 cooperative discovery disclosure. This documentation describes the methodology applied — it is not a legal opinion or a guarantee of court acceptance. Protocol documentation should be reviewed by supervising counsel before disclosure.

Protocol Documentation for Court Submission

When a party uses Technology Assisted Review, the opposing party and potentially the court may request disclosure of the methodology. FRCP Rule 26(f) requires the parties to confer and attempt in good faith to agree on a discovery plan, and courts have increasingly expected TAR protocols to be documented and disclosed.

What Discovarc generates

TAR protocol documentation: description of the methodology applied (TAR 2.0 / continuous active learning), seed set construction approach, iteration count, stopping criterion used
Seed set audit log: which documents were included in the seed set and who reviewed them
Review round logs: timestamped record of each review round, model version, and scoring event
Elusion test record: the elusion sample selected, review results, and estimated recall rate
Stopping criterion documentation: the specific threshold applied and the data supporting the stopping decision
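To make the review round log concrete, the sketch below shows one way such a record could be structured and serialized as a line of JSON. The schema is hypothetical: the field names, the `ReviewRoundLog` class, and the helper functions are assumptions for illustration and do not describe Discovarc's actual export format.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)
class ReviewRoundLog:
    """One immutable record per review round (hypothetical schema)."""
    round_number: int
    model_version: str
    scored_at: str            # ISO 8601 timestamp in UTC
    reviewer_ids: tuple
    documents_reviewed: int
    responsive_found: int

def utc_now_iso():
    """Current time as an ISO 8601 string with an explicit UTC offset."""
    return datetime.now(timezone.utc).isoformat()

def to_json_line(record):
    """Serialize a record as one JSON line with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True)

record = ReviewRoundLog(
    round_number=3,
    model_version="model-0003",
    scored_at="2024-01-15T14:30:00+00:00",   # fixed value for the example; utc_now_iso() in practice
    reviewer_ids=("rev-01", "rev-02"),
    documents_reviewed=500,
    responsive_found=40,
)
line = to_json_line(record)
```

Frozen records written as append-only lines are one simple way to keep a log that is easy to audit after the fact.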

This documentation is designed to provide supervising counsel with the factual basis for cooperative disclosure under Rule 26. The legal sufficiency of any disclosure is a matter for counsel to assess.

Talk through your matter type — Request a Walkthrough.

We'll walk through how the continuous active learning workflow maps to your collection size, custodian count, and production timeline.
