eDiscovery Review and Predictive Coding with Statistics

Anne Kershaw and Joe Howie recently wrote a great summary for LTN on predictive coding in eDiscovery. The article gives a brief history of how review and coding practices in discovery have evolved, then gets to the good stuff. They present some pretty compelling numbers from recent studies that show just how inconsistent review efforts can really be. This isn’t a technical deep dive into predictive coding or statistics, but I hope it helps get the word out on what the Sedona Conference has been saying for a few years now.

I find their list of high points most interesting: Transparency, Replicability, Reevaluating production sets, Confidentiality, Shortened time lines. While you could say transparency is aided by the simple fact that predictive coding systems record more data from their users on how and why a document was coded one way or the other, the black box nature of the algorithms used to determine document links is still an issue for me. This probably won’t be changing any time soon unless consumers (attorneys and judges) demand it. Right now the amount of noise present in eDiscovery is so high that it is, perhaps, acceptable to give this a pass for the moment.

My absolute favorite quote from this article? Well, it has to do with why more attorneys aren’t using predictive coding:

Given the claimed advantages for predictive coding, why isn’t everyone using it? The most mentioned reason, cited by 10 respondents, was uncertainty or fear about whether judges will accept predictive coding. (Paradoxically, at a recent U.S. Magistrates’ Conference, a participant jurist asked for advice on how to convince lawyers to use this type of approach.)

This is in line with what I see in the industry every day. Despite eDiscovery education initiatives popping up at every legal conference, many attorneys still don’t seem to get it. Having sat through quite a bit of painful legal education in my time, I’ve seen a recurring issue with how new ideas are presented in the legal setting. Quoting David Alan Grier as Science Dude on tonight’s episode of Bones: “And what do we say about clarity? It’s barbarity that clarity is a rarity.” Just because you’re a good attorney doesn’t mean you’re a good educator. This is true of so many professions…

Sampling, one of my favorite topics, is given a mention at the end of the article. I think many litigation shops fall back into old habits too easily, and forget how much time and money proper sampling can save them. Not just in processing and review, but also in time not wasted in the courtroom. Jason R. Baron and Ralph Losey talk about it all the time.
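To make the savings concrete, here is a minimal Python sketch of the kind of arithmetic involved: estimating how many documents you would need to review in a simple random sample to estimate a proportion (say, the share of responsive documents) within a chosen margin of error. This is not from the article. The function name `sample_size`, its parameters, and the conservative default of a 50% expected proportion are my own assumptions for illustration, and it uses the standard normal approximation with a finite population correction.

```python
import math
from statistics import NormalDist


def sample_size(margin_of_error, confidence=0.95, population=None, expected_proportion=0.5):
    """Approximate sample size for estimating a proportion.

    Hypothetical helper for illustration only. Uses the normal
    approximation; expected_proportion=0.5 is the most conservative
    (largest sample) assumption.
    """
    # Two-sided critical value for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = expected_proportion
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    if population is None:
        return math.ceil(n0)
    # Finite population correction shrinks the sample for small collections.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


if __name__ == "__main__":
    # Example: a one-million-document collection, 95% confidence, +/- 2% margin.
    print(sample_size(0.02, confidence=0.95, population=1_000_000))
```

The point to notice is that the required sample grows very slowly with the size of the collection, so a sample of a few thousand documents can say something useful about a collection of millions. Treat the result as a rough planning number, not a substitute for a statistician's advice on a real matter.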

If you’re interested in learning more about statistical sampling for eDiscovery and how you can save us all a big headache (and maybe some money, too), head on over to my Presentations page and grab a copy of Statistical Validation And Data Analytics In eDiscovery, a talk I gave at IQPC eDiscovery West 2010 in San Francisco earlier this year. Let me know how you use sampling or predictive coding and how we can better educate the community in the comments!