Not every custodian in a litigation matter is equally important. We've tracked this pattern across dozens of collections: in a typical 50,000-document set, three to five custodians account for 60 to 75 percent of the responsive documents. The other twelve, fifteen, or twenty custodians? They contribute almost nothing relevant. Yet without custodian-level analysis, your review team treats that collection as a uniform mass, burning attorney hours at equal depth across custodians who'll never produce a single hot document.
That's the core problem custodian analysis solves. And it's more tractable than most firms realize, because the signal appears early.
## What Custodian-Level Data Looks Like After 20 Percent Review
In our experience, meaningful custodian contribution patterns stabilize after you've reviewed roughly the first 20 percent of a collection. By that point, you have enough document-level coding data to calculate per-custodian relevance rates with reasonable confidence. Not certainty, but enough to act on.
Here's what that data typically shows. Suppose you pull responsiveness rates for all custodians after processing the first 10,000 documents from a 50,000-document collection. You'll usually find a distribution that looks something like this:
| Custodian Tier | First-Pass Relevance Rate | Recommended Action |
|---|---|---|
| High-yield (top 3-5) | 18-35% | Dedicated senior reviewers, priority scheduling |
| Mid-yield | 4-12% | Standard first-pass workflow |
| Near-zero yield | <2% | Privilege-only pass, bulk non-responsive |
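The tiering above is simple enough to compute directly from a coding export. Here's a minimal sketch: the record layout and field names (`custodian`, `responsive`) are illustrative, not tied to any particular review platform, and the thresholds mirror the bands in the table.

```python
from collections import defaultdict

# Hypothetical first-pass coding export: one record per reviewed document.
coded_docs = [
    {"custodian": "j.smith", "responsive": True},
    {"custodian": "j.smith", "responsive": False},
    {"custodian": "a.lee", "responsive": False},
    # ... remaining reviewed documents
]

def tier_custodians(docs, high=0.18, near_zero=0.02):
    """Bucket custodians by first-pass relevance rate."""
    counts = defaultdict(lambda: [0, 0])  # custodian -> [responsive, total]
    for d in docs:
        counts[d["custodian"]][1] += 1
        if d["responsive"]:
            counts[d["custodian"]][0] += 1
    tiers = {}
    for cust, (resp, total) in counts.items():
        rate = resp / total
        if rate >= high:
            tiers[cust] = ("high-yield", rate)
        elif rate < near_zero:
            tiers[cust] = ("near-zero", rate)
        else:
            tiers[cust] = ("mid-yield", rate)
    return tiers
```

The threshold values are starting points, not doctrine; matters with unusually dense or sparse collections may warrant shifting the bands.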
That distribution isn't surprising to anyone who's worked in e-discovery for more than a few years. What is surprising is how rarely firms act on it in real time. The 20 percent checkpoint is where custodian analysis earns its keep, but only if someone is actually looking at the numbers and adjusting the workflow accordingly.
## Identifying High-Yield Custodians and Allocating Resources
When a custodian's first-pass relevance rate comes in at 25 percent or higher, that's a signal worth acting on immediately. These are the custodians who were closest to the events at issue. Their files should get dedicated resources: senior reviewers who understand the case theory, not rotating contract staff who've been briefed for two hours. Speed matters less here than accuracy, because these documents will likely end up in production or depositions.
In our tracking, high-yield custodians identified early and given dedicated resources typically see 30 to 40 percent lower reclassification rates during QC than the general review population. The investment in specialized attention pays back in reduced rework.
The identification process itself doesn't require sophisticated analytics. A basic export of document counts and responsiveness decisions by custodian, sorted by relevance rate, gives you what you need. What matters is actually doing it, consistently, rather than waiting until the full collection is coded to discover that three custodians did all the work.
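That spreadsheet math can be sketched in a few lines. The per-custodian tallies below are illustrative stand-ins for whatever your platform's export produces: a responsive count and a total-reviewed count per custodian.

```python
# Hypothetical per-custodian tallies from a review-platform export:
# custodian -> (responsive_count, total_reviewed). Names are illustrative.
tallies = {
    "j.smith": (310, 1200),
    "a.lee": (95, 1800),
    "m.chen": (4, 950),
}

def rank_by_relevance(tallies):
    """Return (custodian, rate, total) tuples sorted by relevance rate, highest first."""
    return sorted(
        ((cust, resp / total, total) for cust, (resp, total) in tallies.items()),
        key=lambda row: row[1],
        reverse=True,
    )

for cust, rate, total in rank_by_relevance(tallies):
    print(f"{cust:10s} {rate:6.1%}  ({total} docs reviewed)")
```

Sorted output like this is the whole deliverable: the top of the list gets senior reviewers, the bottom gets flagged for the near-zero protocol discussed below.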
This is also where Discovarc's custodian dashboards change the workflow. Instead of exporting data to a spreadsheet and doing the math manually, custodian contribution metrics update in real time as reviewers code documents. Project managers can see the distribution shifting as review progresses and intervene before the pattern has fully hardened.
## What to Do With Near-Zero Custodians
This is where significant cost lives. A custodian with a 0.8 percent relevance rate in the first pass will almost certainly not improve substantially as review continues. The responsive documents are there, but finding them isn't worth the cost of a full first-pass review at standard rates.
Near-zero custodians warrant a different protocol. Privilege-only review is the most common approach: a single pass to identify potentially privileged content, with the remainder coded non-responsive in bulk. This requires some documentation to defend later, but it's a defensible methodology when you have the data to support it.
Some firms go further and apply predictive coding scores to near-zero custodians' documents, using low-confidence thresholds as a further filter before privilege review. This approach works, though it adds process complexity. The simpler version is often enough: documented privilege-only review, with bulk non-responsive coding for documents below a score threshold.
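The routing logic for that simpler version amounts to a single split. This is a sketch under assumed names: `pc_score` stands in for whatever your predictive coding tool exposes, and the 0.3 threshold is purely illustrative, not a recommended value.

```python
def route_near_zero_docs(docs, score_threshold=0.3):
    """Split a near-zero custodian's unreviewed documents into two queues.

    Documents at or above the (hypothetical) predictive coding score
    threshold go to privilege review; the rest become candidates for
    documented bulk non-responsive coding.
    """
    privilege_queue, bulk_nonresponsive = [], []
    for doc in docs:
        if doc["pc_score"] >= score_threshold:
            privilege_queue.append(doc)
        else:
            bulk_nonresponsive.append(doc)
    return privilege_queue, bulk_nonresponsive
```

Whatever threshold you choose, record it in the protocol memo along with the rationale; the number matters less than the fact that it was set deliberately and in advance.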
Honestly, the bigger issue is the comfort level of the attorneys signing off on the review. In our tracking, the firms most willing to act aggressively on near-zero custodian data are the ones who've documented their methodology in advance and cleared it with case counsel before review begins, not after.
## The Economics of Custodian-Stratified Review
The cost argument is straightforward. If near-zero custodians account for 30 percent of your document volume but are unlikely to produce any responsive documents beyond what a privilege pass will identify, routing them to a full first-pass review is pure waste. At standard rates for a 50,000-document collection, that waste runs to real money.
Consider the math: 600 to 800 attorney hours is a typical first-pass workload for a collection that size. If you can redirect 30 percent of that workload to a streamlined privilege-only protocol, you're saving 180 to 240 hours. At market rates for contract attorney review, that's a meaningful reduction in review spend without any sacrifice in recall or defensibility, provided you've documented the custodian analysis and the protocol decision.
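The arithmetic above reduces to one multiplication, shown here using the figures from the text:

```python
def first_pass_savings(hours_low, hours_high, redirect_fraction):
    """Hours freed by routing a fraction of first-pass volume to a
    streamlined privilege-only protocol."""
    return hours_low * redirect_fraction, hours_high * redirect_fraction

# A 600-800 hour first pass with 30 percent of volume redirected:
low, high = first_pass_savings(600, 800, 0.30)
print(f"{low:.0f}-{high:.0f} hours saved")  # 180-240 hours saved
```

The fraction is the variable worth scrutinizing: it should come from the measured share of document volume sitting with near-zero custodians, not from an assumed figure.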
The savings aren't theoretical. Litigation support firms that have adopted custodian stratification as a standard step, rather than an ad hoc decision on large matters, report consistent first-pass cost reductions. The variability comes from case type, custodian count, and how early in the collection the analysis happens, not from whether the approach works.
Practical note: custodian analysis doesn't require waiting for every custodian's data to be processed. If you can pull even a 15 percent sample from each custodian's files early in collection, you'll have enough signal to make initial routing decisions well before the full collection comes online.
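A simple random draw per custodian is enough for that early signal. A minimal sketch, assuming each custodian's collected documents are addressable by ID (the structure and 15 percent fraction are illustrative):

```python
import random

def early_sample(doc_ids_by_custodian, fraction=0.15, seed=0):
    """Draw a simple random sample of each custodian's documents for
    early-signal review. Seeding keeps the draw reproducible for the
    audit trail."""
    rng = random.Random(seed)
    samples = {}
    for cust, doc_ids in doc_ids_by_custodian.items():
        k = max(1, round(len(doc_ids) * fraction))
        samples[cust] = rng.sample(doc_ids, k)
    return samples
```

The fixed seed is a deliberate choice: if the sampling methodology is ever questioned, a reproducible draw is far easier to defend than an unrecorded one.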
## Making It Defensible
The defensibility question comes up in every matter where custodian stratification is applied. Opposing counsel, courts, and clients all want to know: how did you decide which custodians got full review and which didn't?
The answer needs to be documented and data-driven. A protocol memo that says "we analyzed per-custodian responsiveness rates after reviewing 20 percent of each custodian's documents, identified three custodians with rates below 2 percent, and applied a privilege-only protocol to those custodians' remaining files" is defensible. A decision made informally, without documentation, is not.
Discovarc generates custodian analysis reports as standard workflow outputs, not add-ons. The data that drives stratification decisions is captured automatically, timestamped, and exportable as part of the matter record. That audit trail is what makes the methodology hold up, whether in a meet-and-confer, a court inquiry, or a client billing review.
Three to five custodians driving the responsive universe isn't a quirk of certain matters. It's a pattern. The firms that build custodian analysis into their standard first-pass workflow, rather than discovering this pattern after the fact, are the ones controlling review costs rather than just reporting them.