On January 24, 1848, John Wilson Marshall was building a water-powered sawmill for John Sutter on the American River, close to Coloma, California, in the foothills of the Sierra Nevada. Marshall was a carpenter who had come west from New Jersey. That morning he spotted flakes of gold in the mill’s tailrace. Although Marshall wasn’t looking for gold, he later claimed that he knew immediately upon seeing the flakes what he had found.
At the time Marshall found the flakes, Mexico and the United States were still at war over the California territory. The population of the territory was mostly Native Americans (around 150,000), with some 6,500 Californios (people of Mexican or Spanish descent) and about 700 others (mostly from the United States). A few days after the discovery (and not because of it), the United States and Mexico signed the Treaty of Guadalupe Hidalgo, which ended the Mexican-American War and ceded California to the United States.
Sutter and Marshall tried to keep the discovery confidential, but they did not succeed. Within two months a newspaper was reporting on the discovery. By mid-June, three-quarters of the male population of San Francisco had left to mine for gold, and by August the number of miners in the Sutter’s Mill area had reached 4,000.
The next year was written into U.S. history as the Gold Rush of ’49. The non-native population of California grew from 700 before the discovery to 100,000 at the end of 1849. By September 1850, California had become a state. The Rush peaked in 1852, when approximately $81 million (in 1852 dollars) of gold was pulled from the ground. By 1857, annual production had dropped to around $45 million, where it stayed for many years. The Rush was one of the most important events in reshaping the face of the United States.
The 19th century in the U.S. was known for gold, but the 20th century was marked by hydrocarbons. While some believe the 21st century’s gold, especially later in the century, may be water, the current rush focuses on data.
The Strange Stories of Law
You have heard the statistic: each day more data is generated and stored than the amount of data that existed in all of history prior to the computer age. Large companies that entered the world as retailers, search engines, or social media companies found that the real value of their businesses was in data. In Silicon Valley, it almost became irrelevant what your business could do; the focus was on the data set it could build.
And then we come to the legal industry. We can tell two versions of the legal industry story. The first story goes like this:
Recognizing the threat cybersecurity breaches present to their clients, law firms decide to thwart the attackers using an unusual approach. They accept the futility of keeping hackers out of their systems. Instead of following the norm of keeping information as accessible data, which can be indexed, accessed, and manipulated, law firms keep their information somewhat like teenage boys keep their rooms. As one law firm leader said, “We decided that if our data was a mess and even we, who know it best, have difficulty finding and doing anything with it, hackers would have more trouble and simply give up.”
A red team of associates was tasked with hacking the system to find a group of documents to use as templates when drafting for a client. The blue team’s defensive strategy was simple: “We pretended we were partners and randomly withheld helpful information from the red team.” The red team gave up after many hours and decided to draft from scratch.
The second story goes like this:
While clients and the world around them screamed about data, lawyers continued their quest to be oblivious. Lawyers in firms, corporations, and other service organizations knew that if they hadn’t enjoyed knowledge management, they would enjoy data management even less. Again adopting the “to do nothing is to do something” approach, lawyers have ignored pleas to treat their documents as data gold.
When asked about this strategy, a lawyer responded, “The world around us has been changing for decades, and yet here we sit today, almost unchanged. To respond to this ‘data fad’ by doing something would go against our strongly held belief that all tasks should be done by lawyers and not by other service providers, even computers. Indeed, we are considering asking bar associations to file actions against all computer companies for the unauthorized practice of law.”
Choose whichever version you prefer, but the reality is that lawyers in firms, law departments, and other legal service provider organizations are in the same boat. Legal data is not created and stored as the precious commodity it is.
The Stories Data Could Tell
The work that lawyers do tells stories of risks and responses. What gave rise to the lawsuits? What did the parties do? What steps were taken in the litigation? How long did they take? What were the responses? We can explore similar questions in any area of law, and it is those questions and responses that are embedded in what law firms store on their servers.
Outside of law, the challenge with most data sets is not getting to them. Finding data sets can be easy. The challenge is getting them into shape to use. Data scientists call this step data wrangling or data munging, and by a commonly cited estimate it eats up 80% of their time.
Think about a data set in your firm that you actually keep as a data set: your customer information. Your firm or law department has a system for keeping track of customer (in the case of law departments, law firm) information. If you check the system, you will find out-of-date entries, missing information, duplicate entries, and incorrect entries. Imagine how long it would take if you froze the system today and had someone focus on cleaning up the database. Of course, as soon as they finished and you resumed using the database, it would start going out of date again.
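To make the cleanup problem concrete, here is a minimal Python sketch of one small piece of it: flagging duplicate and incomplete entries in a client list. The `ClientRecord` type, its `name` and `email` fields, and the matching rule (ignore differences in capitalization and spacing) are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientRecord:
    # Hypothetical fields for illustration; real client systems hold far more.
    name: str
    email: str  # may be empty when nobody ever filled the field in


def normalize(record: ClientRecord) -> tuple[str, str]:
    """Comparison key: case-folded name with extra spaces collapsed, plus case-folded email."""
    name = " ".join(record.name.split()).casefold()
    email = record.email.strip().casefold()
    return name, email


def clean(records: list[ClientRecord]) -> tuple[list[ClientRecord], list[ClientRecord]]:
    """Return (records with duplicates dropped, kept records that are missing an email)."""
    seen: set[tuple[str, str]] = set()
    unique: list[ClientRecord] = []
    incomplete: list[ClientRecord] = []
    for rec in records:
        key = normalize(rec)
        if key in seen:
            continue  # same client as a record we already kept
        seen.add(key)
        unique.append(rec)
        if not key[1]:
            incomplete.append(rec)  # flag for follow-up rather than guessing a value
    return unique, incomplete


records = [
    ClientRecord("Acme Corp", "legal@acme.example"),
    ClientRecord("ACME  corp", "Legal@Acme.example "),
    ClientRecord("Globex", ""),
]
unique, incomplete = clean(records)
print(len(unique))                   # 2
print([r.name for r in incomplete])  # ['Globex']
```

This exact-match rule only catches formatting differences. Entries such as “Acme Corp” and “Acme Corporation” would slip through, which is one reason cleanup takes so long in practice.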
Now apply those problems to your real data. All of the documents sitting on your servers form your database. You may have a knowledge management system, and still your data is not ready for use. At best, you have a collection of documents with a few perfunctory fields filled in. Your knowledge management system uses a not-too-sophisticated search process to locate documents responsive to your request. When you find them, you can’t do much with them except copy them and use them as templates. Definitely not state-of-the-art.
When I talk about data, I mean the ability to access specific information from those files, combine it with other data, and produce information that will help solve client problems. For example, what if you could combine data from all the employment lawsuits you have handled with data from government and court data sets? Could you construct a model that gives specific information about each type of employment lawsuit?
You may think of this as fantasy, but it isn’t. Today, startups have breached the barrier and are applying this type of analytics and more as they find and use data sets. One small but growing area is computational linguistics (CL). Put very simply, CL applies statistical tools to text. Through machine learning, computers can use CL tools to understand text far beyond Boolean proximity searches such as “supreme w/5 court.” Tools using CL in law are in the early stages, but they all face the same challenge: getting access to clean data sets.
This is where lawyers enter the picture. By recognizing today that the information built into the data sets is the gold that will help law firms and law departments protect clients, lawyers take the first step. The second is to start transforming what is already in the sets into data, and the third is to store whatever new items are created as data.
If You Make It The Bad Guys Will Come
At this point, a good question to ask is: what about the cybersecurity threat? As they say, there are two types of companies: those that have been hacked and those that don’t think they have been hacked. The experts with whom I have talked agree that law firms are being hacked and will continue to be hacked. The firms simply do not have the sophistication to prevent the hacks. That is not a slam against law firms. It is hard to find any organization that is immune to hacks, and so far no one has named one.
So if the hacks will happen, why should lawyers turn what they have into data? My first scenario above was written in jest, but lawyers do ask whether it isn’t better for hackers to find a messy teenager’s room than a nice, neat library.
The response to hacking isn’t to abandon the quest for data, just as the response to computers isn’t to become a modern-day Luddite. All firms and corporations should take reasonable steps (and today more are going well past reasonable) to protect against hacks. Assume there will be hacks and focus on the data. Just because a hacker can get into a system doesn’t mean the hacker can access, decrypt, and assemble all the data in a way that will help them. You have a security alarm on your house, but you don’t leave all of your valuables lying on the kitchen counter. Thieves still take gold, but we still mine it. Cybersecurity is a challenge, not a bar, to keeping and accessing data.
Data Is Becoming Essential
Data is going to be more than a way to use and manipulate what you create and store. It will become an essential part of the modern law practice. Let’s look at one last example: blockchains. I won’t go into a detailed description of blockchains; I’ll keep it simple. A blockchain is a database that is distributed, not centralized. Each record in the chain may hold data, a program, or both. The records are hardened against tampering through cryptography (each record carries a cryptographic fingerprint, or hash, of the record before it) and through distribution (many parties hold copies). Blockchains reduce and sometimes eliminate the need for intermediaries.
Consider smart contracts. The terms of a smart contract are built into the code embedded in the blockchain: if condition X happens, then Y occurs. No ambiguity, no equity (at least, that is the theory). Once the contract is formed and built into the blockchain, no one can alter it (more precisely, an altered blockchain becomes an instantly visible anomaly that the other parties holding copies reject).
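The tamper evidence described above comes from chaining records by hash. The Python sketch below shows only that hashing half, under simplifying assumptions: the contract entries are hypothetical strings, and there is no network, consensus, or encryption. Each block stores the SHA-256 hash of the block before it, so editing any record breaks either that record’s own hash or the link that follows it.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, data: str, prev_hash: str) -> str:
    """SHA-256 fingerprint of a block's contents, including the previous block's hash."""
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(entries: list[str]) -> list[dict]:
    """Link entries so each block records the hash of the block before it."""
    chain = []
    prev = GENESIS_PREV
    for index, data in enumerate(entries):
        digest = block_hash(index, data, prev)
        chain.append({"index": index, "data": data, "prev": prev, "hash": digest})
        prev = digest
    return chain


def is_valid(chain: list[dict]) -> bool:
    """Recompute every hash; an edited block breaks its own hash or the next link."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev"] != prev:
            return False
        if block_hash(block["index"], block["data"], block["prev"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True


chain = build_chain([
    "Contract formed: buyer pays 100 on delivery",  # hypothetical contract entries
    "Delivery confirmed",
    "Payment of 100 released",
])
print(is_valid(chain))                  # True
chain[1]["data"] = "Delivery disputed"  # tamper with the middle record
print(is_valid(chain))                  # False
```

On its own, hashing does not stop a determined forger from recomputing every hash after an edit. The distribution half does that work: other parties hold their own copies, so a rewritten chain no longer matches theirs and is rejected.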
Lawyers who do not understand blockchain, code, computers, or how the system should work will be at a severe disadvantage. Meanwhile, big banks and other large players are actively looking into using blockchain or similar technologies as part of their systems. Because the contract is in the code, we can treat the contract as data and start combining and manipulating it.
Mine the Data Now
Lawyers have believed for centuries that they need to study the law, but that they can pick up everything else quickly enough to apply the law to it. Litigators are famous for believing they can litigate an employment case in the pharmaceutical industry this week and an antitrust case in the retail industry next week. Large firms have moved beyond this by making everyone specialize (and sub-specialize), but the feeling still exists. So, lawyers wait and watch. When they think something has become so well established that the world can’t possibly go back, lawyers make their move.
While lawyers may believe they can wait until everyone is deep into data and then put their toes in the water, it doesn’t work that way. At the outset, I mentioned companies in the retail, search engine, and social media industries. They have built data sets so large and deep that it is unlikely anyone can catch them. In fact, recognizing that the prize is data and not tools, Silicon Valley has embraced a new trend. These companies are posting many of the most sophisticated tools they have developed on the Internet for anyone to use.
Why would they open source the tools? Because these companies know that the tools are useful and by open sourcing them they may get interesting insights from others who use them. Making the tools available allows the scientists who developed them to showcase their work, an important part of attracting and keeping talent. But these companies also know that without their incredible data sets, others will not be able to use the tools to replicate what these companies do. The tools help, but the data sets are essential.
Law firms and law departments have yet to realize that tools are becoming widely available. The firms and departments will need help from academia, consultants, and others to understand and employ the tools. But the tools will not be the chokepoint. The real value is in the data. Each firm and each department has value in its proprietary data. To realize that value, they must start treating it as gold and not as dirt. Welcome to the 21st century.