QC Sampling and Recall Validation in E-Discovery

Most review teams treat QC as a formality. Run a sample, check a few docs, declare the review done. We've seen this approach blow up in productions where opposing counsel later found that 18% of responsive documents were sitting in the "non-responsive" bucket. That's not a QC problem. That's a no-QC problem.

Statistically valid quality control in e-discovery isn't complicated. But it does require math, and it does require discipline. This piece walks through how we approach QC sampling and recall validation at Discovarc, from initial sample sizing through elusion testing and TAR protocol disclosure.

Sample Size Is Not a Guess

Here's the thing: the number of documents you need to sample is determined by statistics, not intuition. For a QC sample at 95% confidence with a ±2% margin of error, you need approximately 2,401 documents drawn from the non-reviewed population. That number is effectively fixed whether your total corpus is 50,000 documents or 5 million: a finite population correction shaves a little off for smaller corpora, but population size becomes almost irrelevant once you're above about 100,000 documents.

What people forget is that sample size scales with the inverse square of the margin of error: halve the margin and the required sample quadruples. At ±2%, you need 2,401. At ±1%, that jumps to 9,604. Most matters can tolerate ±2% at 95% confidence. Cases with extreme financial exposure or regulatory scrutiny sometimes justify ±1.5% (roughly 4,268 documents). We've rarely seen a litigation matter where going tighter than that changed a production outcome.

What matters more is how you draw the sample. A QC sample from the non-reviewed population must be genuinely random. Not stratified by custodian. Not biased toward recent documents. Not pulled from the review platform's "first in queue" logic. Random. If your platform doesn't support true random selection from a defined population, that's a tool limitation worth knowing before you need it.

QC Sample Size Reference (95% Confidence)
Margin of Error | Required Sample Size | Typical Use Case
±2%             | 2,401                | Standard commercial litigation
±1.5%           | 4,268                | Regulatory / class action
±1%             | 9,604                | High-stakes criminal / financial fraud
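
If you'd rather compute these numbers than trust a table, here's a minimal sketch of the standard proportion-sample formula (normal approximation with worst-case p = 0.5, plus an optional finite population correction). The function is ours for illustration, not part of any review platform's API. One caveat: rounding up, the conservative convention, yields 4,269 at ±1.5% rather than the commonly cited 4,268.

```python
import math

def qc_sample_size(margin, z=1.96, p=0.5, population=None):
    """Required sample size to estimate a proportion at the given
    margin of error: n = z^2 * p * (1 - p) / margin^2, using the
    worst-case p = 0.5. Rounds up to stay conservative."""
    n = (z * z) * p * (1 - p) / (margin * margin)
    if population is not None:
        # Finite population correction matters only for small corpora.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

for margin in (0.02, 0.015, 0.01):
    print(f"±{margin:.1%}: {qc_sample_size(margin):,} documents")
# ±2.0%: 2,401   ±1.5%: 4,269   ±1.0%: 9,604
```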

What the QC Sample Actually Measures

A well-designed QC sample does two jobs. First, it reveals your model's recall rate by surfacing any responsive documents the model coded as non-responsive. Second, it measures the human-versus-model disagreement rate, so you can quantify whether the model is behaving as expected given its training data.

In practice, reviewers hand-code the sample documents independently. The resulting disposition disagreements are the signal. If your 2,401-document sample turns up 4 responsive documents the model missed, your point estimate of the elusion rate is 4/2,401, or roughly 0.17%. That rate alone is not recall: multiply it by the size of the non-reviewed population to estimate the total number of missed responsive documents, then compute recall as responsive documents found divided by (found plus estimated missed). That recall estimate is the number you report in TAR protocol disclosure. It's also the number opposing counsel will scrutinize.
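
Here's that arithmetic as a minimal sketch. The 4-in-2,401 figures come from the example above; the 500,000-document non-reviewed population and 60,000 responsive documents found are hypothetical, as is the helper function itself.

```python
def estimate_recall(sample_size, misses_in_sample,
                    non_reviewed_pop, responsive_found):
    """Point-estimate recall from an elusion sample:
    elusion rate = misses / sample size;
    estimated missed = elusion rate * non-reviewed population;
    recall = found / (found + estimated missed)."""
    elusion_rate = misses_in_sample / sample_size
    estimated_missed = elusion_rate * non_reviewed_pop
    return responsive_found / (responsive_found + estimated_missed)

# 4 misses in a 2,401-doc sample from a hypothetical 500,000-doc
# non-reviewed population, with 60,000 responsive documents found:
print(f"{estimate_recall(2401, 4, 500_000, 60_000):.1%}")  # ~98.6%
```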

Recall targets vary by matter type. In our experience, most TAR protocols set a recall floor somewhere between 85% and 90%. Some courts and agreed-upon protocols push to 95%. The Discovarc default target is 85-90% recall, validated through the QC sampling process. If the sample reveals recall below the agreed threshold, the model needs additional training rounds before the review is considered complete.

Honest moment: recall validation through sampling is probabilistic. You're estimating model performance within confidence bounds, not certifying it with certainty. A 95% confidence level means that if you repeated the same sampling procedure many times, about 95 out of 100 of the resulting intervals would capture the true rate. Courts and opposing counsel generally accept this framing when it's stated clearly in protocol disclosures. What they don't accept is vague claims like "the model performed well."
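
If you want to state actual bounds rather than a point estimate, one common choice (ours to suggest here, not mandated by any protocol) is a Wilson score interval on the elusion rate. A minimal sketch, reusing the hypothetical 4-in-2,401 example:

```python
import math

def wilson_upper_bound(k, n, z=1.96):
    """Upper bound of the 95% Wilson score interval for proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center + half

# 4 missed responsive documents in a 2,401-document elusion sample:
print(f"{wilson_upper_bound(4, 2401):.2%}")  # elusion rate at most ~0.43%
```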

Elusion Testing Starting in Round 4

Elusion testing is the specific practice of sampling the model's predicted-non-responsive population to check whether responsive documents are "eluding" classification. It's not the same as general QC sampling, and the timing matters.

We start elusion testing from round 4 of predictive coding. Why round 4? Earlier rounds have too much model instability. The model is still actively incorporating training feedback, and elusion rates from rounds 1-3 reflect that instability more than they reflect actual review quality. Starting too early gives you noisy data that leads to premature calls about model readiness. Not useful.

From round 4 onward, stable elusion rates are one of the primary stabilization signals. Our protocol treats the model as stable when elusion rates across three consecutive rounds remain within 1 percentage point of each other. A single low elusion result is not stabilization. Three consistent low results are.
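
As a concrete illustration, here's a minimal sketch of that stabilization check, assuming elusion rates are tracked per round as fractions; the function name and structure are ours, and the example rates are hypothetical.

```python
def is_stable(elusion_rates, window=3, tolerance=0.01):
    """True when the last `window` rounds' elusion rates all fall
    within `tolerance` (1 percentage point) of one another."""
    if len(elusion_rates) < window:
        return False
    recent = elusion_rates[-window:]
    return max(recent) - min(recent) <= tolerance

# Elusion rates for rounds 4-7 of a hypothetical matter:
rates = [0.031, 0.012, 0.009, 0.011]
print(is_stable(rates))  # True: rounds 5-7 span only 0.3 points
```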

Practical note: If elusion rates drop sharply between rounds 4 and 5, then tick back up in round 6, that's a training issue, not a sampling anomaly. Re-examine your seed set before adding more training examples. Adding more seeds to a contaminated training set makes things worse, not better.

TAR Protocol Disclosure Format

Every number from your QC sampling process feeds into a disclosure document that courts and opposing counsel can audit. Discovarc outputs this documentation in a structured format designed to match standard TAR protocol disclosure expectations. The disclosure includes the following (see the sketch after this list):

  • Total document population size
  • Training rounds completed
  • QC sample size and selection methodology
  • Estimated recall rate with confidence interval
  • Elusion rate per round (from round 4 forward)
  • Model stabilization criteria and date achieved
  • Seed set composition summary
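
Here's a minimal sketch of what assembling that disclosure as structured data might look like. Every field name and value below is hypothetical and illustrative, not Discovarc's actual schema.

```python
import json

# All values hypothetical; field names illustrative, not Discovarc's schema.
disclosure = {
    "total_population": 1_250_000,
    "training_rounds_completed": 7,
    "qc_sample": {
        "size": 2401,
        "selection": "simple random sample of the non-reviewed population",
    },
    "estimated_recall": {
        "point": 0.986,
        "confidence_level": 0.95,
        "margin_of_error": 0.02,
    },
    "elusion_rate_by_round": {
        "round_4": 0.031, "round_5": 0.012, "round_6": 0.009, "round_7": 0.011,
    },
    "stabilization": {
        "criterion": "3 consecutive rounds within 1 percentage point",
        "date_achieved": "2025-03-14",
    },
    "seed_set": {"total": 1500, "responsive": 640, "non_responsive": 860},
}
print(json.dumps(disclosure, indent=2))
```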

This is the output that matters. You can run the most rigorous review process in the world, but if you can't document it in a format that holds up to scrutiny, the rigor is invisible. Our data shows that cases using structured TAR protocol disclosures face fewer discovery disputes than those relying on attorney certification alone. Courts increasingly expect this documentation, and providing it proactively signals competence and reduces the odds of a challenge.

Unreviewed Documents and Predictive Disposition

No matter how large your review team, most matters will include a population of documents that never receives human review. The model handles those. Predictive coding disposition means the model's classification for those documents is treated as final, subject to the QC validation process described above.

This is where the Discovarc workflow diverges from manual TAR approaches. Rather than having attorneys review low-ranked documents to verify non-responsiveness, we use the validated model to handle that population directly. The QC sampling process is the quality gate. If the model passes QC, the disposition stands. If it doesn't pass, we go back to training.

Integration with Relativity, DISCO, Everlaw, and Reveal means the disposition data flows directly into your existing review platform. We're not asking teams to migrate to a new system or export data into something proprietary. The QC results, elusion rates, and protocol disclosures are generated within the Discovarc layer and pushed back to your platform in the format it expects.

Simple as that.

QC sampling and recall validation are the backbone of defensible AI-assisted review. The math is accessible. The process is repeatable. The documentation is automatable. What it requires is a commitment to running the protocol correctly every time, not just when the stakes are obviously high.