TAR 1.0 vs. TAR 2.0: What Litigation Support Teams Need to Know in 2025

Naomi Ashford · Founder & CEO, Discovarc · February 11, 2025 · 8 min read

Technology Assisted Review has been part of the e-discovery lexicon since at least 2012, when Judge Andrew Peck's opinion in Da Silva Moore v. Publicis Groupe (S.D.N.Y. 2012) endorsed the methodology for the first time in federal court. In the thirteen years since, the market has divided into two meaningfully different approaches — and the distinction shapes everything from how you structure your protocol to how you defend your review in contested motion practice.

Litigation support directors and e-discovery counsel tend to know they're using "predictive coding" without always knowing which generation of methodology sits under the hood. That gap matters: TAR 1.0 and TAR 2.0 have different accuracy profiles, different protocol demands, and different implications for matters where the custodian universe keeps changing mid-review.

What TAR 1.0 Actually Means

TAR 1.0 is the simple passive learning (SPL) family of approaches. The workflow is: attorneys review a seed set of documents and apply responsive/non-responsive coding; that training corpus is fed to a classifier (typically logistic regression or support vector machine against a bag-of-words feature space); the model scores the remaining documents; attorneys review a quality-control sample; if validation metrics clear a pre-agreed threshold, the review stops.

The "passive" in SPL means the model does not request specific documents for attorney training — it accepts whatever the seed set contains. Early implementations often used random seed selection, which creates a well-known problem: if responsive documents are rare in the overall corpus (say, 5% prevalence), a random seed set will contain very few responsive examples, and the model will learn primarily what non-responsive looks like. You can address this with targeted seeding — front-loading known-relevant hot docs into the seed set — but that strategy introduces its own selection pressure into the training process, which some opposing counsel have challenged as non-representative sampling.
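The low-prevalence problem is easy to put numbers on. The sketch below is illustrative only: the 500-document seed set is a hypothetical size, and the binomial model assumes the seed is a simple random draw from a corpus large enough that sampling without replacement makes little difference.

```python
from math import comb

def expected_responsive(seed_size: int, prevalence: float) -> float:
    """Expected number of responsive documents in a random seed set."""
    return seed_size * prevalence

def prob_at_most(k: int, seed_size: int, prevalence: float) -> float:
    """Binomial probability of drawing k or fewer responsive documents."""
    return sum(
        comb(seed_size, i) * prevalence**i * (1 - prevalence) ** (seed_size - i)
        for i in range(k + 1)
    )

seed_size = 500     # hypothetical seed set size
prevalence = 0.05   # the 5% prevalence used in the example above

expected = expected_responsive(seed_size, prevalence)
p_15_or_fewer = prob_at_most(15, seed_size, prevalence)

print(f"expected responsive examples: about {expected:.0f} of {seed_size}")
print(f"chance of 15 or fewer responsive examples: {p_15_or_fewer:.1%}")
```

At 5% prevalence, roughly 475 of the 500 training examples show the model what non-responsive looks like, which is the imbalance that targeted seeding tries to correct.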

TAR 1.0 is still defensible. In re Biomet (N.D. Ind. 2013) declined to make the producing party redo an early predictive coding review that had been preceded by keyword culling, over the requesting parties' objection and largely on proportionality grounds. The methodology's limitations become more pronounced on large, multi-custodian, multi-issue matters, which are the scenarios where litigation support teams are most likely to feel the shortfall.

What TAR 2.0 Changes

TAR 2.0 is the continuous active learning (CAL) family. The defining characteristic is that the model re-ranks the remaining document population after every attorney review batch and, in the most common implementations, surfaces the highest-ranked documents (those it scores as most likely responsive) for the next review round. Selecting the documents the model is least confident about, known as uncertainty sampling, is a different selection strategy that some tools also offer. Attorneys are not reviewing a fixed seed set and then stepping back; they are continuously training the model throughout the review, while the model is simultaneously scoring documents for production cutoff decisions.

This has two practical consequences. First, CAL makes much more efficient use of attorney review time for rare-responsive document populations: the model prioritizes surfacing likely-relevant documents early rather than waiting for the full corpus to be scored. Second, because training and production review are concurrent, every coding decision feeds back into the model, so ranking quality keeps improving over the course of the review instead of being frozen when an initial training phase ends.
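The review-retrain-re-rank loop can be sketched in a few lines. This is a toy illustration, not any platform's implementation: the corpus, the document ids, the `attorney_codes` stand-in for human coding, and the simple token log-odds scorer are all assumptions made for the example. It selects the top-ranked documents for each batch, which is one common selection strategy; a given platform may select differently.

```python
from collections import Counter
from math import log

# Toy corpus: document id -> text. All ids and texts are made up.
corpus = {
    "d1": "overtime hours unpaid shift",
    "d2": "lunch menu friday cafeteria",
    "d3": "unpaid overtime claim payroll",
    "d4": "parking garage closed friday",
    "d5": "payroll claim dispute",
    "d6": "quarterly sales forecast meeting",
    "d7": "holiday party cafeteria menu",
    "d8": "sales meeting agenda parking",
    "d9": "newsletter holiday schedule update",
}
# Stand-in for attorney judgment: the ids a reviewer would code responsive.
RESPONSIVE = {"d1", "d3", "d5"}

def attorney_codes(doc_id):
    """Hypothetical oracle that plays the role of the reviewing attorney."""
    return doc_id in RESPONSIVE

def train(coded_texts):
    """Smoothed per-token log-odds from (text, is_responsive) pairs."""
    pos, neg = Counter(), Counter()
    for text, responsive in coded_texts:
        (pos if responsive else neg).update(set(text.split()))
    n_pos = sum(1 for _, responsive in coded_texts if responsive)
    n_neg = len(coded_texts) - n_pos
    return {
        token: log((pos[token] + 1) / (n_pos + 2)) - log((neg[token] + 1) / (n_neg + 2))
        for token in set(pos) | set(neg)
    }

def score(weights, text):
    """Sum the learned weights of the distinct tokens in a document."""
    return sum(weights.get(token, 0.0) for token in set(text.split()))

def cal_review(corpus, code_fn, seed_ids, batch_size, max_batches):
    """Retrain on everything coded so far, re-rank the rest, review the top batch."""
    coded = {doc_id: code_fn(doc_id) for doc_id in seed_ids}
    order = list(seed_ids)
    for _ in range(max_batches):
        remaining = [d for d in corpus if d not in coded]
        if not remaining:
            break
        weights = train([(corpus[d], coded[d]) for d in coded])
        ranked = sorted(remaining, key=lambda d: score(weights, corpus[d]), reverse=True)
        for doc_id in ranked[:batch_size]:
            coded[doc_id] = code_fn(doc_id)
            order.append(doc_id)
    return order, coded

order, coded = cal_review(corpus, attorney_codes, ["d1"], batch_size=2, max_batches=2)
found = sum(coded.values())
print(order, f"{found} of {len(RESPONSIVE)} responsive found after {len(order)} documents")
```

On this toy data, the third responsive document shares no vocabulary with the seed. It only rises to the top after the first batch teaches the model the terms "claim" and "payroll", and all three responsive documents are coded within the first five of nine documents reviewed.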

Platforms implementing CAL include Brainspace (acquired by Reveal), which uses concept clustering alongside active learning to surface thematically related document groups, and Relativity's Active Learning module within RelativityOne. The Sedona Conference addressed both methodologies in its Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery, though the technology has advanced considerably since those guidelines were first drafted.

Where the Difference Shows Up in Practice

Consider a scenario: a mid-size manufacturing company faces an employment class action. Discovery involves twelve custodians across four years. Custodians are added to the scope in waves as the litigation develops. The total ESI volume after deduplication is 1.4 million documents.

Under a TAR 1.0 SPL protocol, the addition of new custodians midway through review creates a problem: the original seed set was not trained on their documents, and the model's scoring for the new custodian population may be systematically off-base — particularly if those custodians used different communication channels or terminology. Counsel must decide whether to retrain from scratch, patch the seed set, or treat the new custodian documents outside the TAR protocol.

Under a CAL protocol, the model continues to accept attorney coding feedback throughout. New custodian documents enter the active learning queue and the model iteratively incorporates them into its scoring. The protocol documentation still requires explicit disclosure of how the new custodian population was handled — but the algorithmic mechanism is more naturally accommodating of corpus expansion.

This is not to say TAR 1.0 cannot handle large matters — it can, with careful protocol design. The practical boundary is more about the cost of attorney time to correct for sampling gaps than about any categorical deficiency in the SPL family.

Protocol Documentation Differences

Under either approach, your TAR protocol documentation needs to cover the same baseline: seed set composition (or initial training document composition for CAL), elusion testing methodology, recall target thresholds, and what happens when the opposing party challenges your cutoff decisions. FRCP 26(g) requires the certifying attorney to make a "reasonable inquiry" into the completeness of the production before signing. In TAR contexts, courts have interpreted that as requiring some understanding of the validation methodology even if the attorney did not personally design it.

TAR 2.0 protocols have a more complex documentation burden in one sense: because training is continuous, the protocol must define stopping criteria that are not simply "model validation cleared X% recall." Common CAL stopping criteria include target recall thresholds verified by independent sampling, stabilization metrics (the model stops improving meaningfully with additional training), or privilege-log completion combined with a separate elusion check.
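A recall target verified by independent sampling usually comes down to simple arithmetic on an elusion sample. The sketch below is a point estimate only, under stated assumptions: every figure is hypothetical, `estimate_recall` is an illustrative helper rather than a platform function, and a real protocol would normally also report a confidence interval around the estimate.

```python
def estimate_recall(found_responsive, null_set_size, sample_size, sample_responsive):
    """Point estimate of recall from an elusion sample of the unreviewed (null) set."""
    elusion_rate = sample_responsive / sample_size      # share of the sample that was missed
    estimated_missed = elusion_rate * null_set_size     # projected onto the whole null set
    recall = found_responsive / (found_responsive + estimated_missed)
    return elusion_rate, estimated_missed, recall

# All figures below are hypothetical.
elusion_rate, estimated_missed, recall = estimate_recall(
    found_responsive=60_000,   # responsive documents identified in review
    null_set_size=1_000_000,   # documents below the cutoff, never reviewed
    sample_size=1_500,         # random sample drawn from the null set
    sample_responsive=12,      # responsive documents found in that sample
)

RECALL_TARGET = 0.80  # hypothetical target written into the protocol

print(f"elusion rate: {elusion_rate:.2%}")    # 0.80%
print(f"estimated recall: {recall:.1%}")      # 88.2%
print("target met" if recall >= RECALL_TARGET else "keep reviewing")
```

Recording the sample size, the sample results, and the calculation itself gives the protocol a stopping decision that can be reconstructed later if the cutoff is challenged.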

Rio Tinto v. Vale (S.D.N.Y. 2015) and Hyles v. New York City (S.D.N.Y. 2016) both addressed how far courts will go in directing a party's TAR methodology. In Rio Tinto, Judge Peck approved the parties' stipulated TAR protocol. His opinion in Hyles is instructive: he declined to require the producing party to switch from keyword search to TAR, on the grounds that the producing party bears primary responsibility for choosing its review methodology, provided it still meets its obligation to produce responsive, non-privileged documents.

The practical lesson: courts will scrutinize whether your chosen methodology was implemented consistently with your protocol documentation, not whether you chose TAR 1.0 or TAR 2.0. A well-documented SPL process with rigorous elusion testing is defensible; a poorly documented CAL process with no validation records is not.

Simple Active Learning and the SPL/CAL Spectrum

Some practitioners use a third category, simple active learning (SAL), for approaches where the model helps choose which documents attorneys code during a bounded training phase, but training stops once the classifier is judged adequate rather than continuing through the whole review. In practice the label is also applied to workflows that surface high-scoring documents for early review (to accelerate privilege identification or hot-doc identification) without full continuous training integration. SAL sits between SPL and CAL in complexity and protocol overhead. It is useful when the team needs rapid identification of the highest-value documents in the first review wave (for instance, when a preliminary injunction motion creates a compressed production timeline) but the matter does not justify the full CAL protocol infrastructure.

We're not saying SAL or TAR 1.0 are inadequate for all purposes — on single-custodian matters with clear subject-matter scope and a cooperative opposing party, an SPL protocol with good seed set design may be entirely sufficient and more cost-effective than standing up a full CAL implementation. The point is that these are substantively different methodologies with different performance profiles, and the selection decision should be made deliberately and documented in your ESI protocol.

Choosing the Right Approach for Your Matter

A few factors that should drive the methodology selection conversation with your review team:

Corpus stability: Are the custodian universe and data scope likely to change during review? CAL handles corpus expansion more gracefully.
Responsive document prevalence: If prevalence is low (below roughly 5%), CAL's targeted surfacing of likely-relevant documents substantially reduces the amount of attorney coding needed to reach a reliable classifier.
Matter timeline: Fast-turnaround productions (preliminary injunction support, government investigation responses) may favor targeted SPL with front-loaded seed set design over the setup period that a full CAL workflow requires.
Opposing party cooperation: If opposing counsel has agreed to a Predictive Coding Protocol in the Rule 26(f) conference, that protocol should specify which methodology family you're committing to and what validation metrics will be applied.
Platform capability: Confirm whether your review platform implements true continuous active learning or a batch-retraining SPL variant marketed as CAL. The distinction matters for protocol accuracy.
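One way to make the selection decision deliberate is to record it in a structured form and check it for gaps before the protocol is finalized. The record below is a hypothetical sketch: the class name, field names, and the specific checks are assumptions for illustration, not a standard schema or any platform's format.

```python
from dataclasses import dataclass, field

@dataclass
class TarProtocolRecord:
    """Hypothetical record of the methodology choices a TAR protocol might capture."""
    methodology: str                      # "SPL", "SAL", or "CAL"
    recall_target: float                  # e.g. 0.80, agreed or self-imposed
    validation_method: str                # e.g. "elusion sample of the null set"
    stopping_criteria: list[str] = field(default_factory=list)
    corpus_expansion_plan: str = ""       # how late-added custodians are handled

    def gaps(self) -> list[str]:
        """Return the items still missing from the record."""
        missing = []
        if self.methodology not in {"SPL", "SAL", "CAL"}:
            missing.append("methodology family")
        if not 0 < self.recall_target <= 1:
            missing.append("recall target")
        if not self.validation_method:
            missing.append("validation method")
        if self.methodology == "CAL" and not self.stopping_criteria:
            missing.append("CAL stopping criteria")
        if not self.corpus_expansion_plan:
            missing.append("corpus expansion plan")
        return missing

record = TarProtocolRecord("CAL", 0.80, "elusion sample of the null set")
print(record.gaps())  # ['CAL stopping criteria', 'corpus expansion plan']
```

The value is less in the code than in the habit: each factor in the list above maps to a field that someone has to fill in before review starts.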

Whatever the methodology, the obligation to provide defensible documentation of your review process remains constant. Privilege determinations made during TAR review — including decisions about which documents to route to privilege review rather than the responsive/non-responsive track — remain counsel's decision and cannot be delegated to algorithmic scoring alone. The model surfaces candidates; attorneys make privilege calls.

If you'd like to understand how Discovarc's implementation handles protocol documentation and methodology selection for different matter types, the walkthrough request form is the right starting point.