A 50,000-document collection lands on your desk. Your team quotes the client somewhere between $45,000 and $80,000 for first-pass review. That number is built on a familiar assumption: attorneys have to read most of it. They don't. And in our experience working with litigation-support firms, the ones that figure this out early save substantially more than the ones that treat active learning as a nice-to-have.
Active learning changes the math fundamentally. Not incrementally. The baseline shifts.
What the Baseline Actually Costs
Before you can appreciate what active learning saves, you need an honest accounting of what traditional linear review costs. On a 50,000-document collection, first-pass review runs 600 to 800 attorney hours. At blended contract-attorney rates, that's $45,000 to $80,000 before quality control, before privilege logging, before anything downstream. Per-document cost lands between $0.90 and $1.60. Those are real numbers we see regularly.
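If you want to pressure-test those ranges, the arithmetic is trivially scriptable. Here's a minimal sketch in Python; the hour figures are the ones above, and the $75-$100 blended-rate spread is our assumption, chosen to bracket the quoted dollar range:

```python
def first_pass_cost(n_docs: int, hours: float, rate: float) -> tuple[float, float]:
    """Total first-pass cost and effective per-document cost."""
    cost = hours * rate
    return cost, cost / n_docs

# Low end: 600 hrs at $75/hr; high end: 800 hrs at $100/hr blended
# (the rate spread is our assumption, picked to bracket $45K-$80K).
for hours, rate in [(600, 75), (800, 100)]:
    cost, per_doc = first_pass_cost(50_000, hours, rate)
    print(f"{hours} hrs @ ${rate}/hr -> ${cost:,.0f} (${per_doc:.2f}/doc)")
# 600 hrs @ $75/hr -> $45,000 ($0.90/doc)
# 800 hrs @ $100/hr -> $80,000 ($1.60/doc)
```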
The underlying driver is review depth: in a linear model, you're looking at 85-100% of documents before you've satisfied recall targets. QC compounds the problem. Without technology-assisted review, reclassification rates in our data run 15-25% in post-review audits. That's not just extra hours; it's the kind of error rate that creates sanctions exposure on proportionality challenges.
Fact: the cost problem isn't attorney speed. It's volume. Active learning targets volume.
How Active Learning Compresses Review Depth
Active learning models work by learning from every coding decision a reviewer makes, then re-ranking the remaining document pool in real time. The system isn't randomly sampling. It's prioritizing documents most likely to affect the classification boundary. After a seed set of 300-500 reviewer judgments, the model's predictions become reliable enough to drive review cutoffs.
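For the technically inclined, here's a minimal sketch of that re-ranking loop: uncertainty sampling over TF-IDF features with scikit-learn. It's illustrative only, not Discovarc's actual model, and the function name is ours:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def rerank_pool(seed_texts, seed_labels, pool_texts):
    """Fit on the reviewer judgments so far, then rank the unreviewed pool.

    Returns pool indices ordered so that documents nearest the
    classification boundary (p close to 0.5) come first -- the coding
    decisions that teach the model the most.
    """
    vec = TfidfVectorizer(max_features=50_000, stop_words="english")
    X_seed = vec.fit_transform(seed_texts)
    X_pool = vec.transform(pool_texts)

    model = LogisticRegression(max_iter=1000)
    model.fit(X_seed, seed_labels)

    p_relevant = model.predict_proba(X_pool)[:, 1]
    order = np.argsort(np.abs(p_relevant - 0.5))   # most uncertain first
    return order, p_relevant
```

In production the model refits after each batch of coding decisions, and once the predicted probabilities stabilize, typically past that 300-500 judgment seed set, they are what drives the review cutoff.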
Here's what that looks like in practice. Recall targets of 85-90%, which satisfy most proportionality standards, are achievable at 30-35% review depth on a well-structured collection. On that same 50,000-document set, you're reviewing 15,000 to 17,500 documents instead of 42,500 to 50,000. Attorney hours drop to roughly 35-40% of the linear baseline.
Run the arithmetic: 600 baseline hours become 210-240 hours. At $75/hour blended rate, that's $15,750 to $18,000 against a $45,000-$80,000 baseline. The per-document cost on reviewed documents stays similar, but you're paying it on far fewer documents. Effective per-document cost on the full collection drops from $0.90-$1.60 to roughly $0.32-$0.55.
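The same arithmetic as a short script, using the 600-hour, $75/hour end of the baseline (the $0.55 upper bound on effective per-document cost scales off the $1.60/doc baseline end, not these figures):

```python
n_docs, baseline_hours, rate = 50_000, 600, 75

for frac in (0.35, 0.40):              # active learning hours vs. baseline
    hours = baseline_hours * frac      # 210, 240 attorney hours
    cost = hours * rate                # $15,750, $18,000
    print(f"{hours:.0f} hrs -> ${cost:,.0f}, "
          f"{100 * cost / n_docs:.1f} cents/doc effective on the full collection")
# 210 hrs -> $15,750, 31.5 cents/doc effective on the full collection
# 240 hrs -> $18,000, 36.0 cents/doc effective on the full collection
```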
We've seen this play out across collections ranging from 30,000 to 300,000 documents. The compression ratios are consistent.
QC Reclassification: The Hidden Multiplier
Cost reduction from review-depth compression is the obvious win. Less discussed: what active learning does to QC reclassification rates.
In linear review without a model, QC audits regularly surface 15-25% reclassification. That's not because the attorneys are bad reviewers. It's because fatigue and inconsistency compound over thousands of documents reviewed without systematic feedback loops. By session 40 of a linear review, the reviewer has drifted from the coding standards they applied in session one. Nobody catches it until QC.
Active learning introduces a continuous feedback mechanism. The model's predictions surface near-boundary documents for reviewer attention, which means ambiguous calls get more scrutiny, not less. In our tracking, active learning deployments reduce post-review QC reclassification rates to 5-8% on average. That's a 3x improvement over linear QC outcomes.
Translate that to hours: on a 600-hour linear project with 20% QC reclassification, you're spending roughly 120 additional attorney hours correcting errors. At 8% reclassification on the compressed active learning review, QC overhead on 210 hours runs about 17 hours. The QC savings alone offset a meaningful fraction of the model setup cost.
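Scripted, with linear scaling of correction effort against the reclassification rate, the same simplification the paragraph makes:

```python
def qc_overhead(review_hours: float, reclass_rate: float) -> float:
    """Hours spent re-working miscoded documents, assuming correction
    effort scales linearly with the reclassification rate."""
    return review_hours * reclass_rate

print(qc_overhead(600, 0.20))   # linear review: 120.0 extra hours
print(qc_overhead(210, 0.08))   # active learning: 16.8 extra hours
```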
Platform-Agnostic Deployment
One objection we hear consistently: firms are already in Relativity, DISCO, Everlaw, or Reveal. They can't add another platform. It's a reasonable concern. It's also not a barrier.
Discovarc runs as a platform-agnostic layer. The active learning engine connects to your existing review environment through standard APIs. Reviewers stay in the interface they know. The model operates in the background, re-ranking the coding queue and surfacing priority documents. No parallel workflow, no database migration, no retraining your reviewers on new UI conventions.
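What does that look like in practice? Roughly the shape of the sketch below. Every endpoint path, field name, and environment variable here is a hypothetical placeholder, not Discovarc's or any review platform's actual API; check the real documentation before wiring anything up:

```python
import os
import time

import requests

# Placeholder configuration -- every URL and field name below is hypothetical.
REVIEW_API = os.environ["REVIEW_PLATFORM_URL"]   # your Relativity/DISCO/etc. instance
RANKER_API = os.environ["RANKER_URL"]            # the active learning service
HEADERS = {"Authorization": f"Bearer {os.environ['API_TOKEN']}"}

def sync_once(since: str) -> None:
    # 1. Pull coding decisions made since the last sync.
    coded = requests.get(f"{REVIEW_API}/codings", params={"since": since},
                         headers=HEADERS, timeout=30).json()
    # 2. Send them to the model; get back an updated priority ranking.
    ranked = requests.post(f"{RANKER_API}/rerank", json=coded,
                           headers=HEADERS, timeout=60).json()
    # 3. Reorder the review queue in place -- reviewers never leave their UI.
    requests.put(f"{REVIEW_API}/queue/order", json=ranked,
                 headers=HEADERS, timeout=30)

last_sync = "1970-01-01T00:00:00Z"
while True:
    now = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    sync_once(since=last_sync)
    last_sync = now
    time.sleep(60)   # re-rank roughly once a minute
```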
This matters for cost modeling because the friction cost of switching review platforms is real. If active learning required a platform change, the savings calculation would get murkier. It doesn't. The savings are incremental against your existing infrastructure cost, not net of migration overhead.
Where the Numbers Can Miss
Honest caveat here. Active learning delivers these compression ratios consistently on collections with identifiable relevance patterns. On collections with very low prevalence, below 2% relevant, the seed-set dynamics get harder. The model needs enough positive examples to learn from, and in extremely low-prevalence collections, finding those examples through initial random sampling takes longer than expected.
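The expectation behind that is simple: at prevalence p, each random draw hits a relevant document with probability p, so surfacing k positive seed examples costs about k/p reviewed documents on average. The 100-positive target below is our illustrative assumption:

```python
def expected_seed_draws(k_positives: int, prevalence: float) -> float:
    """Expected random draws to surface k relevant documents
    (negative binomial mean: k / p)."""
    return k_positives / prevalence

print(expected_seed_draws(100, 0.10))    # 10% prevalence: 1000.0 draws
print(expected_seed_draws(100, 0.02))    # 2% prevalence:  5000.0 draws
print(expected_seed_draws(100, 0.005))   # 0.5% prevalence: 20000.0 draws
```

At half-percent prevalence, the seed hunt alone can exceed the entire 15,000-17,500 document budget of the compressed review, which is exactly why these collections underperform the headline ratios.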
Similarly, collections with significant near-duplicate or family relationships need pre-processing to avoid inflating review depth. If 40% of your collection is email threads where the model is counting individual messages rather than families, your depth calculation is off. Processing discipline matters as much as the model itself.
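A quick illustration of that inflation; the 40% thread share comes from the paragraph above, while the four-messages-per-family average is our assumption:

```python
n_docs = 50_000
threaded = int(n_docs * 0.40)      # 20,000 messages living in email threads
families = threaded // 4           # ~5,000 family-level review decisions
standalone = n_docs - threaded     # 30,000 standalone documents

review_units = standalone + families
print(f"{review_units:,} true review units vs {n_docs:,} raw documents: "
      f"{n_docs / review_units:.2f}x depth inflation if left uncorrected")
# 35,000 true review units vs 50,000 raw documents: 1.43x depth inflation
```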
In our experience, these edge cases account for roughly 10-15% of collections where compression ratios underperform. The other 85-90% track closely to the figures above.
Building the Business Case
If you're making the case to clients or your own finance team, the numbers need to be concrete. Here's the framing that works:
| Metric | Linear Review (Baseline) | Active Learning |
|---|---|---|
| First-pass hours (50K docs) | 600-800 hrs | 210-320 hrs |
| Review depth to 85-90% recall | 85-100% | 30-35% |
| QC reclassification rate | 15-25% | 5-8% |
| Effective per-document cost | $0.90-$1.60 | $0.32-$0.55 |
| First-pass cost range | $45,000-$80,000 | $15,750-$24,000 |
Present those five rows. The conversation tends to move quickly after that.
Practical note: the active learning figures in the table assume a blended attorney rate of $75/hour; the baseline range carries the blended-rate spread from the opening section. If your firm's rates are higher, the absolute savings scale accordingly. The ratios hold regardless of rate.
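And if you want to re-derive the dollar rows at your own rate, scale the hour figures; those come straight from the table, while the $110 example rate is arbitrary:

```python
def first_pass_costs(rate: float) -> dict:
    """Scale the table's hour ranges to a given blended rate.
    The compression ratio itself is rate-invariant."""
    return {
        "linear":          tuple(h * rate for h in (600, 800)),
        "active_learning": tuple(h * rate for h in (210, 320)),
    }

print(first_pass_costs(75))
# {'linear': (45000, 60000), 'active_learning': (15750, 24000)}
# (At a flat $75 rate the 800-hour baseline prices at $60K; the table's
# $80K upper bound reflects the higher blended-rate end from the opening.)
print(first_pass_costs(110))
# {'linear': (66000, 88000), 'active_learning': (23100, 35200)}
```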
The cost reduction from active learning isn't theoretical. It's the product of two compounding effects: reviewing far fewer documents to hit recall targets, and catching errors earlier when reclassification is still cheap. Together, they bring first-pass costs down to 35-40% of traditional baselines, consistently, across platform environments your team already uses.
The firms holding back aren't skeptical about the model. They're waiting for someone to make the business case clearly. Now you can.