Most e-discovery teams that come to us have Relativity Analytics already licensed. They just haven't built a workflow that actually uses it. The module sits there, the ActiveLearning queue waits, and reviewers keep working linear batches at $4 to $7 per document. We've seen firms spend north of $180,000 on a single collection that a properly configured analytics workflow would have cut by 40%. This guide walks through the exact setup we use.
Start With a Real Seed Set, Not a Convenient One
The seed set is where most firms go wrong. They grab 200 documents a senior reviewer already coded, load them in, and wonder why the model misfires in round three. Honestly, that's not a Relativity problem. That's a seed problem.
We configure seed sets at 500 documents, minimum. Specifically: 250 relevant, 250 non-relevant, selected to span the conceptual range of the collection, not just the most obvious hot docs. For a financial fraud matter with a 2 million document corpus, that means you need representatives from board communications, accounting exports, external correspondence, and the boring operational traffic that will make up 60% of your non-relevant population.
The breakdown matters. A uniform 50/50 split at the seed stage avoids the precision collapse you get when the model over-trains on one class. Skew it later, after round two, once you know the actual prevalence.
Practical note: pull your seed set from Relativity's near-duplicate and email thread clusters, not from a keyword search. You want conceptual breadth, not keyword density.
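For illustration, here's a minimal sketch of that selection logic in Python. It assumes you've exported coded candidate documents with their cluster assignments; the tuple format and function name are ours, not a Relativity API. Drawing round-robin across clusters keeps any one conceptual pocket from dominating either class:

```python
import random
from collections import defaultdict

def build_seed_set(coded_docs, per_class=250, rng=None):
    """Select a conceptually diverse 50/50 seed set.

    coded_docs: iterable of (doc_id, cluster_id, is_relevant) tuples,
    e.g. exported from Relativity's near-duplicate / email thread
    clusters. Draws round-robin across clusters within each class so
    no single cluster (keyword pocket) dominates the seed set.
    """
    rng = rng or random.Random(42)
    by_class = defaultdict(lambda: defaultdict(list))
    for doc_id, cluster_id, is_relevant in coded_docs:
        by_class[is_relevant][cluster_id].append(doc_id)

    seed = []
    for label, clusters in by_class.items():
        pools = [rng.sample(docs, len(docs)) for docs in clusters.values()]
        picked = []
        while len(picked) < per_class and any(pools):
            for pool in pools:            # one doc per cluster per pass
                if pool and len(picked) < per_class:
                    picked.append(pool.pop())
        seed.extend((doc_id, label) for doc_id in picked)
    return seed
```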
Configuring Iteration Batches for Manageable QC
Iteration batch size is where firms consistently underestimate the operational load. Relativity's default ActiveLearning documentation suggests 2,000 documents per review batch. In our experience, that's too big for QC to keep pace with in multi-matter environments where reviewers context-switch daily.
We run batches at 1,200 to 1,500 documents. Here's the logic. A senior reviewer doing QC sampling can check 150 to 200 documents per hour with reasonable rigor. At 1,200 documents and a 10% QC sample rate, that's 120 documents, or about 45 minutes of QC time per batch. You can complete two full batch cycles in a single day and still have time to address errors before the next iteration retrains.
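The arithmetic is simple enough to keep as a reusable planning check. A sketch with our defaults (the function name and rates are ours; swap in your own reviewers' measured pace):

```python
def qc_minutes_per_batch(batch_size=1_200, qc_rate=0.10, docs_per_hour=160):
    """Senior-reviewer QC time for one iteration batch, in minutes."""
    qc_docs = batch_size * qc_rate          # 1,200 * 10% = 120 documents
    return qc_docs / docs_per_hour * 60     # 120 docs / 160 per hour = 45 min

print(f"{qc_minutes_per_batch():.0f} minutes per batch")  # -> 45
```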
The iteration cadence itself: we train after every batch, not every two or three. More frequent retraining costs marginal compute time but catches model drift earlier. On a 1.5 million document collection, we typically see stabilization, defined as less than 3% change in rank order between iterations, around round five or six.
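"Change in rank order" needs a concrete metric before you can automate the check. One reasonable choice (ours, not something Relativity reports directly) is mean normalized rank displacement between successive scoring runs:

```python
def rank_shift(prev_scores, curr_scores):
    """Mean normalized rank displacement between two model runs.

    prev_scores / curr_scores: dicts of doc_id -> relevance score from
    consecutive training iterations (same doc_id keys in both). Returns
    a value in [0, 1]; we treat anything under 0.03 (3%) as stabilized.
    """
    def ranks(scores):
        ordered = sorted(scores, key=scores.get, reverse=True)
        return {doc_id: r for r, doc_id in enumerate(ordered)}

    prev_rank, curr_rank = ranks(prev_scores), ranks(curr_scores)
    n = len(prev_rank)
    total = sum(abs(prev_rank[d] - curr_rank[d]) for d in prev_rank)
    return total / (n * n / 2)   # n^2/2 approximates max displacement
```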
One thing that surprises people: iteration batch management isn't just about the model. It's about reviewer workflow. Batches need to complete before the next training run. That means assigning batches to specific reviewers with firm deadlines, not dumping them into a shared queue and hoping. We build daily batch completion targets into our project plans at the outset.
QC Sampling Setup That Won't Let You Down in a Deposition
Defensibility isn't just a legal term. It's a technical specification. Your QC setup needs to produce documented evidence that the review reached a statistically supportable recall target, and that sampling identified errors at a rate low enough to justify your elusion rate claims.
The configuration we use:
- Per-batch QC: 10% stratified random sample from each iteration batch, reviewed before the next training run
- Elusion testing: begin at round four with a 400-document simple random sample drawn from the predicted non-relevant population across the full rank-ordered queue, not just from the model's high-confidence non-relevant set
- Richness estimate: updated after each elusion round using a hypergeometric calculator; stop when the lower bound of the 95% confidence interval on recall reaches 90% (see the sketch after this list)
- Privilege QC: separate 5% sample pulled from relevant batches, reviewed by a different senior reviewer than the one who originally coded
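Here's a sketch of the recall math behind the elusion step. We substitute a Clopper-Pearson binomial bound for the exact hypergeometric computation, a reasonable stand-in when the sample is small relative to the discard pile; the function name and the example figures are illustrative only:

```python
from scipy.stats import beta

def recall_lower_bound(found_relevant, sample_size, sample_hits,
                       discard_pile_size, confidence=0.95):
    """Lower bound on recall from an elusion sample.

    found_relevant:    relevant docs already identified by the review
    sample_size:       elusion sample size (e.g. 400)
    sample_hits:       relevant docs found in the elusion sample
    discard_pile_size: size of the predicted non-relevant population
    """
    alpha = 1.0 - confidence
    if sample_hits == sample_size:
        elusion_upper = 1.0
    else:
        # Clopper-Pearson upper bound on the elusion rate
        elusion_upper = beta.ppf(1 - alpha / 2,
                                 sample_hits + 1,
                                 sample_size - sample_hits)
    eluded_upper = elusion_upper * discard_pile_size
    return found_relevant / (found_relevant + eluded_upper)

# Illustrative: 100,000 found relevant, 0 hits in a 400-doc sample
# of a 900,000-doc discard pile.
print(f"{recall_lower_bound(100_000, 400, 0, 900_000):.3f}")
# -> ~0.924, which clears the 90% target
```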
In our data, reaching 90% recall typically requires reviewing 35% of the collection. That's the number we tell clients to budget for upfront. Some matters hit 90% at 28%. Some go to 42%. But 35% is a defensible planning estimate backed by our own historical data across more than 60 matters.
Document every sampling decision in a protocol memo. Which documents were pulled? Who reviewed them? What error rate was found? This memo, not your Relativity dashboard screenshot, is what opposing counsel will ask for.
Discovarc Export Integration: Getting Results Back Into Relativity
Here's where the workflow closes. Discovarc's analytics layer operates on the same document set that lives in your Relativity workspace. We don't ask you to export documents and re-import them elsewhere. The platform reads directly from your Relativity load file structure and writes results back as a batch list.
The export path: once a Discovarc analysis run completes, coding decisions and confidence scores export as a Relativity-compatible batch list. You import that batch list directly into your workspace, apply it to the appropriate saved search, and the coding populates your custom fields without touching your original reviewed set. There is no separate reconciliation pass beyond the import itself.
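For context on what a Relativity-compatible overlay can look like, here's a hypothetical sketch using the standard Concordance-style delimiters Relativity's import tooling accepts. The field names and three-column layout are illustrative assumptions, not Discovarc's actual output spec; match them to your workspace's fields and delimiter settings:

```python
COL = "\x14"     # Concordance column delimiter (renders as a pilcrow)
QUOTE = "\xFE"   # text qualifier (renders as a thorn)

def write_overlay(rows, path):
    """Write an overlay load file keyed on Control Number.

    rows: iterable of (control_number, coding_value, confidence),
    e.g. ("DOC0000123", "Relevant", 0.942). "AL Coding" and
    "AL Confidence" are hypothetical field names; map them to your
    workspace's actual coding fields on import.
    """
    def field(value):
        return f"{QUOTE}{value}{QUOTE}"

    headers = ("Control Number", "AL Coding", "AL Confidence")
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(COL.join(field(h) for h in headers) + "\n")
        for ctrl, coding, conf in rows:
            fh.write(COL.join(
                (field(ctrl), field(coding), field(f"{conf:.3f}"))) + "\n")
```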
This matters for two reasons. First, chain of custody. Your review record stays clean because all coding actions occur within Relativity's native audit trail. Second, speed. We've seen this integration cut post-analytics reconciliation from 3 to 4 hours of manual work down to under 20 minutes per batch cycle.
The setup requires a one-time configuration of your Discovarc workspace to point at your Relativity file server path and authenticate against your API key. After that, the export runs on a button click inside the Discovarc dashboard. Real talk: if your Relativity instance is hosted by a third-party provider, you'll need to confirm outbound API access is enabled before you start. Most providers have this on by default, but some restrict it on shared infrastructure plans.
Putting the Workflow Together
The full sequence looks like this:
- Configure the seed set at 500 documents, 50/50 split, conceptually diverse
- Run initial training
- Release batch one at 1,200 documents; QC at 10%; retrain; repeat
- Begin elusion testing at round four
- Export Discovarc results back to Relativity as a batch list after each confirmed iteration
- Continue until elusion testing confirms 90% recall
- Document everything in a protocol memo that can survive a discovery dispute
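If you want these defaults in one referenceable place, whether for the protocol memo or to parameterize your own tooling, the whole protocol fits in a dozen lines. A sketch (the names are ours):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReviewProtocol:
    seed_size: int = 500            # 50/50 relevant / non-relevant split
    batch_size: int = 1_200         # up to 1,500 for larger review teams
    qc_sample_rate: float = 0.10    # stratified QC per iteration batch
    priv_qc_rate: float = 0.05      # separate privilege sample
    elusion_start_round: int = 4
    elusion_sample_size: int = 400
    recall_target: float = 0.90
    ci_confidence: float = 0.95

PROTOCOL = ReviewProtocol()         # record these values in the memo
```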
It's not complicated. It's just a discipline most firms haven't built yet. The technology has been in Relativity for years. The workflow around it took longer to figure out.
If you want to walk through how this maps to a specific matter you're planning, our team is available. Request a demo and we'll work through your collection parameters before you finalize your review plan.