Last week, I wrote about keyword search terms and whether they are, or should be, the preferred means of identifying relevant documents and information in discovery.
Since it does not make a whole lot of sense to present a problem without also presenting a solution — at least that’s what one of my managers suggested to me early in my career — I thought it would be appropriate to review some of the alternatives to the use of search terms.
Whether you like it or not, artificial intelligence is entering our lives in ways that many of us could never have imagined. Digital marketing and promotion, the diagnosis of serious medical conditions, and autonomous vehicles are a few of the more prominent examples.
When the Federal Rules of Civil Procedure were written 80 years ago, did anyone imagine that AI or machine learning would be used in the legal industry to predict the outcome of a case, to compare documents in multibillion-dollar business deals, or to identify relevant documents and information in discovery?
That day has arrived. Whether or not you subscribe to the idea that machine learning is a form of AI, the fact remains that today we can parse the text of millions of documents in a fraction of the time, and at a fraction of the cost, that it took just 20 years ago.
The problem, it seems, is that many view predictive technologies like machine learning as unknown, unproven, imprecise, or incomplete. It’s the “black box” that either frightens or intimidates users. Some also say an inherent bias skews results one way or another.
The truth is that text is just text, words are just words, and characters are just characters. We put characters and words in documents all the time, and we use the same ones over and over. There is a lot of repetition. Yes, some words are more meaningful than others, but regardless, patterns begin to emerge, and meaning can be derived, if not precisely, then at least conceptually.
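To make the idea of "patterns emerging" from repeated words concrete, here is a minimal sketch that simply counts how often each word appears across a handful of documents. The three sample documents and the stop-word list are made up for illustration; real e-discovery platforms do far more than this, but counting repeated terms is one of the basic building blocks.

```python
import re
from collections import Counter

# Hypothetical sample documents, invented for illustration only.
documents = [
    "The merger agreement was signed after the board approved the merger.",
    "Please review the merger agreement before the board meeting.",
    "Lunch order for the team meeting on Friday.",
]

# Hypothetical stop-word list: common words that carry little meaning alone.
STOP_WORDS = {"the", "was", "after", "before", "for", "on", "please"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts(docs: list[str]) -> Counter:
    """Count how often each non-stop word appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(w for w in tokenize(doc) if w not in STOP_WORDS)
    return counts


if __name__ == "__main__":
    # The most frequent meaningful words hint at what the documents are about.
    for word, n in term_counts(documents).most_common(3):
        print(word, n)
```

Even on three short documents, "merger" surfaces as the dominant theme while the lunch order stands apart, which is the kind of signal predictive tools build on at much larger scale.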
Depending on which dictionary you use, English has about 250,000 words. Listed together, they would fill about 500 pages of text or, for the more data-minded readers, take up about 1 MB of data. It should not surprise anyone that we have figured out how to index, analyze, and categorize this relatively small number of words.
In case you did not see it coming, when it comes to using machine learning and predictive technologies to identify relevant documents in discovery, it helps to think of them as something akin to advanced search techniques. We are not talking about robots reading documents, neural networks or deep learning, or theories of artificial general intelligence. We are not yet at the point where machines, least of all in the legal industry, reason and learn on their own. We are basically talking about search terms on steroids.
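As a rough illustration of "search terms on steroids," the sketch below takes documents a reviewer has already marked relevant (a "seed set") and ranks the remaining documents by how closely their vocabulary matches, using TF-IDF weights and cosine similarity. This is a deliberately simple stand-in, not how any particular TAR product works; commercial tools typically train statistical classifiers and refine them as reviewers code more documents. The corpus, the function names, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical corpus; document 0 has been marked relevant by a reviewer.
corpus = [
    "Board approved the merger agreement with Acme.",
    "Draft merger agreement attached for board review.",
    "Holiday party planning and catering menu.",
    "Acme merger pricing discussed at the board meeting.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for each document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (an assumption): rarer terms get larger weights.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(docs, seed_indices):
    """Rank unreviewed documents by their best similarity to any seed document.

    seed_indices must be non-empty (the documents already marked relevant).
    """
    vecs = tfidf_vectors(docs)
    scores = []
    for i, v in enumerate(vecs):
        if i in seed_indices:
            continue
        best = max(cosine(v, vecs[s]) for s in seed_indices)
        scores.append((i, best))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for idx, score in rank_by_similarity(corpus, {0}):
        print(idx, round(score, 3))
```

The two merger-related documents rank ahead of the holiday-party email, which shares no words with the seed and scores zero. Unlike a fixed list of search terms, the ranking adapts to whatever documents the reviewer marks as relevant.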
So the next time your legal technology project manager, e-discovery expert, or data scientist suggests that you use machine learning or technology-assisted review (TAR), or whatever we are calling it that day, don't worry so much about the black box; focus instead on what you are trying to learn from the documents. The available tools can get you there faster, more cheaply, and just as accurately.
Mike Quartararo is the President of the Association of Certified E-Discovery Specialists (ACEDS), a professional member association providing training and certification in e-discovery. He is also the author of the 2016 book Project Management in Electronic Discovery and a consultant providing e-discovery, project management and legal technology advisory and training services to law firms and Fortune 500 corporations across the globe. You can reach him via email at mquartararo@aceds.org. Follow him on Twitter @mikequartararo.