In a recent Harvard Business Review article, Kira Radinsky argues that we need to start thinking of companies that hold large historical data sets, and block others from accessing them, as potential monopolists. Ms. Radinsky, the CTO and Co-founder of SalesPredict, explains how some technologies, like search engines, operate best when they have deep historical data to feed their algorithms. For example, one leading search engine performs 31% better with deep historical data. If that holds more generally, the implication is obvious: without a deep historical data set, your organization’s performance could be at risk. Ms. Radinsky points out that around 70% of organizations “still aren’t doing much with big data.” In other words, if you don’t have a big historical data set now, and if it is hard to build one, you may be out of luck.

Lawyers Without Data (Avocats Sans Données)*

When it comes to the legal industry, we know a different story. While some players are starting to use data and some new entrants (presumably LegalZoom and Rocket Lawyer, for example) are generating big historical data sets, overall we are a data-set-poor industry. I’m ignoring data sets built on billable-hour data, since I consider such data both highly unreliable and of questionable value. I am also ignoring case law, a data set of better quality and higher value. While there is nothing wrong with case law, it is a highly derivative data set with some distinct peculiarities, so I’m not yet sure how much overall value it brings.

The data sets we really want, the ones that would add value, aren’t being built. These data sets would give us rich insights into individual and organizational behavior and into risk management mechanisms. It is one thing to run an ediscovery search and find the incriminating emails in an antitrust lawsuit. It is something else to watch the email flow, catch the emails as they are created, and change behavior so the lawsuit never happens. This may sound futuristic, but nothing says that because we can search after the fact, we can’t search before the meltdown. Now, imagine what we could do if we could aggregate some data across companies and industries.
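To make the idea concrete, here is a minimal sketch of screening an outgoing message at creation time rather than searching it years later in discovery. The risk phrases, labels, and scoring are entirely hypothetical, illustrative stand-ins for whatever patterns a real compliance team would develop from its own data.

```python
import re

# Hypothetical risk patterns a compliance team might flag before an
# email goes out. These phrases and categories are illustrative only.
RISK_PATTERNS = {
    r"\bdelete this email\b": "destruction of evidence",
    r"\bdon'?t put this in writing\b": "concealment",
    r"\bprice[- ]fix": "antitrust",
}

def screen_message(body: str) -> list:
    """Return the risk categories matched in an outgoing message."""
    body = body.lower()
    return [label for pattern, label in RISK_PATTERNS.items()
            if re.search(pattern, body)]

def should_hold(body: str) -> bool:
    # Hold the message for review if any pattern matches at creation
    # time, instead of discovering it after the fact in litigation.
    return bool(screen_message(body))
```

A real system would of course use statistical or machine-learned models over historical message data rather than a handful of regular expressions; the point is only that the same search that works after the fact can run before the meltdown.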

We can take this concept further in other areas. Retail has already moved beyond fraud detection to fraud prevention: patterns of behavior trigger alarms telling loss prevention teams to open an investigation. As those teams learn more about risk, they change behaviors in the organization, making it more difficult for fraud to arise.

As lawyers, we should have a high interest in matching behaviors to outcomes. Rather than drafting a contract “tightly,” why not focus on setting up the commercial arrangement so the likelihood of a breach is remote? I have often seen parties do quite the opposite. In an effort to “protect” themselves, some parties twist the negotiations and drafting so tightly that it is hard to conceive of anyone complying with the contract. Shock and dismay ensue when the parties get into a dispute and find that neither of them was complying with the contract terms.

Ms. Radinsky asks whether there should be a Sherman Antitrust Act for data. I think the answer is no. I believe the Act is sufficient to address situations where a company acts improperly to obtain a monopoly position, whether through data acquisition or some other means. Rather, the question is to what extent we are comfortable with companies building large data sets and then using them to modify behavior, even when they do so legally. That is a public policy question, not a legal question.

There are many arguments for and against allowing the growth of big data sets that can be used to modify behavior without additional policy limits on their use. Those arguments, however, slide into the broader question of our comfort level with data having an ever-growing impact on our lives. Zosha Millman just published a nice article on lawyers living in a machine-learning world, and it touches on some of the big data issues lawyers will need to confront.

Lean Thinking and Big Data

When we approach things from a lean thinking perspective, our goal is to do as much as we can with simple tools and approaches before we contemplate technology. As soon as I jump from person to tool, I increase complexity. Tools must be built, maintained, and replaced, and they generally have limited utility. Tools often beget more tools, creating a complexity build-up.

When we jump to gathering big data sets and using them without having thought through the process, we find surprises. Big data is here to stay (and grow), but we are finding lots of surprises as we use it without having worked through the processes. Lawyers are not deep in the big data trenches yet, so we have an opportunity. As we look toward building our own big data sets, we should first think through the processes surrounding their acquisition and use. Then, perhaps, as we build the sets we can avoid some of the surprises our clients are grappling with today.

* Doctors have Doctors Without Borders (Médecins Sans Frontières) to help the world, so perhaps we need an organization for lawyers recognizing our contribution to the world by “operating” without data.