Capability
Predictive Coding — TAR That Works With How Attorneys Review
Technology Assisted Review that improves as attorneys review — not a fixed model run once at project start. Designed to support FRCP Rule 26 cooperative discovery workflows.
What Predictive Coding Does — and Doesn't Do
Predictive coding uses machine learning to classify documents based on their relevance to the issues in a case. An attorney reviews a training set of documents and tags them as responsive or non-responsive. The model learns from those decisions and applies them to the full document population — producing a ranked list by predicted relevance.
Predictive coding does not replace attorney judgment. It organizes a large document population so attorney effort is concentrated where it matters most — on the uncertain documents and the privilege calls, not on obvious non-responsive material.
TAR 1.0 vs. TAR 2.0: What the Distinction Means in Practice
TAR 1.0 (simple passive learning) trains a model on a fixed seed set, then applies that model to the remaining population. The model does not update as reviewers continue working. This approach was sufficient for early predictive coding acceptance in courts but introduces risk on complex multi-custodian matters where the seed set may not fully represent the document population.
TAR 2.0 (continuous active learning) updates the model as reviewers work through the document population. Every review decision — responsive, non-responsive, privileged — is incorporated into the model's next round. The model does not require a complete seed set before it begins ranking the full population.
Discovarc uses continuous active learning. Courts that have scrutinized TAR methodology — including in cases citing Da Silva Moore, Rio Tinto, and related precedent — have generally accepted active learning approaches when the protocol documentation adequately describes the methodology.
TAR 1.0
- Fixed seed set before review begins
- Model trained once, then applied
- Seed set quality critical to outcome
- Simpler protocol documentation
- Less adaptive on large heterogeneous sets
TAR 2.0
- Model updates with each reviewer decision
- Continuous active learning throughout review
- Uncertain documents prioritized for review
- Adapts to emerging document patterns
- More defensible stopping criteria
Seed Set Construction
Even in a continuous active learning system, the initial seed set matters. A well-constructed seed set exposes the model to the breadth of document types, custodians, and time periods in the collection before the active learning loop begins.
What goes into a good seed set
- Stratified random sample: draws from each custodian proportionally to their contribution to the total corpus
- Time-period coverage: seed set should span the full date range of documents, not just recent material
- Document-type diversity: emails, attachments, and standalone documents each behave differently under the model; a seed set dominated by one type underrepresents others
- Known-relevant documents: counsel should include documents known to be responsive from prior review or deposition preparation, if available
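The stratified-sample bullet above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Discovarc's implementation: the function name `stratified_seed_candidates` and the `(doc_id, custodian)` input shape are hypothetical, and the "at least one document per custodian" rule is a choice made for the example.

```python
import random
from collections import defaultdict

def stratified_seed_candidates(docs, sample_size, seed=0):
    """Draw a custodian-proportional random sample.

    docs: list of (doc_id, custodian) pairs (hypothetical input shape).
    Returns a list of doc_ids, roughly sample_size long.
    """
    # A fixed seed makes the draw reproducible, which helps when the
    # selection has to be described later in protocol documentation.
    rng = random.Random(seed)

    by_custodian = defaultdict(list)
    for doc_id, custodian in docs:
        by_custodian[custodian].append(doc_id)

    total = len(docs)
    picked = []
    for custodian in sorted(by_custodian):
        ids = by_custodian[custodian]
        # Quota proportional to the custodian's share of the corpus,
        # with a floor of one so small custodians are not dropped.
        quota = max(1, round(sample_size * len(ids) / total))
        picked.extend(rng.sample(ids, min(quota, len(ids))))
    return picked
```

The same grouping step could be keyed on time period or document type instead of custodian. Because of rounding and the one-document floor, the result can differ slightly from `sample_size`.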
The role of counsel in seed set selection
Seed set selection is a legal decision, not a technical one. Counsel determines which documents are responsive — the model learns from those decisions. Discovarc surfaces a stratified candidate pool and tracks which documents were included in the seed set for protocol documentation purposes.
Iterative Training and Stopping Criteria
In continuous active learning, each review round produces a new model state. Each round, Discovarc presents the documents with the highest model uncertainty for attorney review. These are the documents where the model is least confident about the correct classification, and so where human judgment adds the most value.
How the iteration loop works
After each review round, Discovarc retrains on all reviewed documents and re-scores the remaining population. Documents previously ranked in the exception zone (the uncertain range routed to attorneys) may move to high confidence as the model gains more context; documents in the high-confidence range may be sampled into QC rather than reviewed individually.
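The retrain, re-score, and review cycle can be sketched as follows. This is a simplified illustration, not Discovarc's model: the token log-odds scorer is a toy stand-in for a real classifier, and `train`, `score`, `review_rounds`, and the `reviewer` callback (which stands for the attorney's decision) are hypothetical names. The loop stops after a fixed number of rounds for simplicity; a real protocol would apply an agreed stopping criterion instead.

```python
import math
from collections import Counter

def train(labeled_texts):
    """Toy stand-in model: smoothed per-token log-odds of responsiveness."""
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for text, responsive in labeled_texts:
        tokens = set(text.lower().split())
        if responsive:
            pos.update(tokens)
            n_pos += 1
        else:
            neg.update(tokens)
            n_neg += 1
    return {
        t: math.log((pos[t] + 1) / (n_pos + 2)) - math.log((neg[t] + 1) / (n_neg + 2))
        for t in set(pos) | set(neg)
    }

def score(weights, text):
    """Predicted probability of responsiveness, strictly between 0 and 1."""
    z = sum(weights.get(t, 0.0) for t in set(text.lower().split()))
    z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def review_rounds(docs, seed_labels, reviewer, batch_size, max_rounds):
    """docs: {doc_id: text}; seed_labels: {doc_id: bool}; reviewer: doc_id -> bool."""
    labels = dict(seed_labels)
    for _ in range(max_rounds):
        remaining = [d for d in docs if d not in labels]
        if not remaining:
            break
        # Retrain on every decision made so far, then re-score the rest.
        weights = train([(docs[d], r) for d, r in labels.items()])
        scores = {d: score(weights, docs[d]) for d in remaining}
        # Most uncertain first: predicted probability closest to 0.5.
        batch = sorted(remaining, key=lambda d: abs(scores[d] - 0.5))[:batch_size]
        for d in batch:
            labels[d] = reviewer(d)
    return labels
```

Each pass through the loop corresponds to one review round: the attorney decisions from the previous batch are already part of the training data when the next batch is selected.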
Stopping criteria
The question of when to stop reviewing is one of the most contested issues in TAR protocol documentation. Discovarc supports several stopping criteria approaches:
- Elusion testing: counsel reviews a random sample drawn from the predicted non-responsive set to measure the elusion rate, the share of that set that is actually responsive. If the elusion rate falls below the agreed threshold, review is complete.
- Recall threshold: review stops once estimated recall, the proportion of all responsive documents that have been identified, reaches the agreed target. Discovarc calculates a recall estimate at each stopping point based on the reviewed and sampled populations.
- Marginal productivity: the rate at which new responsive documents are found per review round. When this rate falls below a defined level, the protocol can support stopping.
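The arithmetic behind these three measures can be sketched as below. The function names and parameters are hypothetical, and the calculations are simple point estimates shown for illustration; they are not a description of how Discovarc computes its figures, and a real protocol would normally report sampling uncertainty alongside them.

```python
def elusion_rate(sample_responsive, sample_size):
    """Share of the sampled predicted-non-responsive documents found responsive."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return sample_responsive / sample_size

def estimated_recall(found_responsive, elusion, null_set_size):
    """Point estimate: found / (found + responsive documents estimated to remain)."""
    missed = elusion * null_set_size  # projected from the elusion sample
    total = found_responsive + missed
    return found_responsive / total if total else 1.0

def marginal_productivity(new_responsive, batch_size):
    """Responsive documents found per document reviewed in the latest round."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return new_responsive / batch_size
```

As a worked example with made-up numbers: if 9,000 responsive documents have been found, 100,000 documents remain in the predicted non-responsive set, and a 1,500-document elusion sample turns up 15 responsive documents, the elusion rate is 1%, about 1,000 responsive documents are estimated to remain, and estimated recall is about 90%.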
The choice of stopping criterion should be made in consultation with opposing counsel under the cooperative discovery framework of FRCP Rule 26 and documented in the TAR protocol.
Protocol Documentation for Court Submission
When a party uses Technology Assisted Review, the opposing party, and potentially the court, may request disclosure of the methodology. FRCP Rule 26(f) requires the parties to confer on a discovery plan, and courts have increasingly expected TAR protocols to be documented and disclosed.
What Discovarc generates
- TAR protocol documentation: description of the methodology applied (TAR 2.0 / continuous active learning), seed set construction approach, iteration count, stopping criterion used
- Seed set audit log: which documents were included in the seed set and who reviewed them
- Review round logs: timestamped record of each review round, model version, and scoring event
- Elusion test record: the elusion sample selected, review results, and estimated recall rate
- Stopping criterion documentation: the specific threshold applied and the data supporting the stopping decision
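To make the review round log concrete, here is one way such a record could be represented and serialized. The field names and the JSON-lines format are assumptions for illustration only; the source does not specify Discovarc's actual log schema or export format.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class ReviewRoundLog:
    """Hypothetical shape for one review round log entry."""
    round_number: int
    model_version: str
    documents_reviewed: int
    responsive_found: int
    scored_at: str  # ISO 8601 timestamp of the re-scoring event

def make_round_log(round_number, model_version, documents_reviewed,
                   responsive_found, now=None):
    """Build a log entry, stamping it with the current UTC time by default."""
    now = now or datetime.now(timezone.utc)
    return ReviewRoundLog(round_number, model_version, documents_reviewed,
                          responsive_found, now.isoformat())

def to_json_line(entry):
    """Serialize one entry as a single JSON line with stable key order."""
    return json.dumps(asdict(entry), sort_keys=True)
```

Writing one line per round in a stable key order keeps the log append-only and easy to compare across exports.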
This documentation is designed to provide supervising counsel with the factual basis for cooperative disclosure under Rule 26. The legal sufficiency of any disclosure is a matter for counsel to assess.
Talk through your matter type — Request a Walkthrough.
We'll walk through how the continuous active learning workflow maps to your collection size, custodian count, and production timeline.