It was the 1992 election campaign, and James Carville, candidate Bill Clinton’s campaign strategist, was fighting to keep the troops focused on what mattered. He hung a sign in the campaign’s headquarters in Little Rock, Arkansas, with three messages:

1. Change vs. more of the same

2. The economy, stupid

3. Don’t forget health care.

Nothing stays quiet in politics. The contents of the sign made it out, and now we all know the phrase “[it’s] the economy, stupid.” The phrase has been modified and used in many situations, including: it’s the data, stupid!

The Four Revolutions of Legal Materials

We can segment the history of legal materials using several dimensions. I divide the history into four phases:

  • Parchment to paper
  • Paper to published
  • Published to digitized
  • Digitized to data

At one time, legal materials meant writs penned by lawyers or scriveners. The few things written were put on parchment (sometimes called vellum, an animal skin paper). While it is easy to think we are far past this phase, the House of Commons and House of Lords in the UK recently debated whether official acts should now be recorded on paper instead of vellum (answer: no, vellum will still be used).

The next step was from paper (by now, wood pulp or cotton based) to published. Books of cases, closing binders, treatises, all became the place to go for collections of documents.

In the latter part of the 20th century, we moved from published to digitized. Documents were created and stored on computers, and case books became online research databases. Today, we live mostly in the digitized era. But in law, even though documents are digitized, they aren’t very useful as data.

The Era of Legal Data

Digitization still represents state-of-the-art for law firms and law departments. But the next revolution is digitized data, and that move already has started.

The world of law is the world of unstructured documents. Imagine working on a document with the following sentences:

Grainger accepted payment from Duncan. Duncan delivered the payment to Grainger by handing him a check made payable to “Grainger Consulting, Ltd.” in the amount of $2,150.00, dated April 21, 2014.

The sentences mean nothing to the computer. They could as easily be written this way:

Xxxxxxxx xxxxxxxx xxxxxxx xxxx Xxxxxx. Xxxxxx xxxxxxxxx xxx xxxxxxx xxxx Xxxxxx. Xxxxxx xxxxxxxxx xxx xxxxxxx xx Xxxxxxxx xx xxxxxxx xxx x xxxxx xxxx xxxxxxx xx “Xxxxxxxx Xxxxxxxxxx, Xxx.” Xx xxx xxxxxx xx xx,xxx.xx, xxxxx Xxxxx xx, xxxx.

This is unstructured text. The computer does not have information about the characters or words telling it, for example, that “Grainger” and “Duncan” are named entities or that “payment” is something different from “handing.”

We can easily give the computer more information, and we often do this through something called “tagging.” You already know about tagging. You tag photos with the names of the people in them, you tag blog posts with the subjects covered, and if you are an SEC lawyer you have seen XBRL tagging of financial data in 10-Qs. The tagging you see and do (with the exception of financial data) requires that you manually assign the tags. But much of the tagging for text, as with financial statements, can be done automatically. Instead of a digitized, but unstructured, document, lawyers could have data: a document broken into pieces that can be manipulated with the proper tools.
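To make the idea of automatic tagging concrete, here is a minimal sketch in Python that tags the example sentences above. It is an illustration under stated assumptions, not a production tagger: the tag names (ORG, PERSON, MONEY, DATE) and the hand-written patterns are my own inventions for this one passage, and a real system would rely on a trained named-entity recognizer rather than a list of known names.

```python
import re

# The example sentences from above.
TEXT = (
    "Grainger accepted payment from Duncan. Duncan delivered the payment to "
    "Grainger by handing him a check made payable to “Grainger Consulting, Ltd.” "
    "in the amount of $2,150.00, dated April 21, 2014."
)

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Hypothetical tag set, written by hand for this one passage.
# Order matters: ORG is listed before PERSON so that "Grainger Consulting, Ltd."
# is tagged as an organization rather than as the person "Grainger".
PATTERNS = [
    ("ORG", r"Grainger Consulting, Ltd\."),
    ("PERSON", r"Grainger|Duncan"),
    ("MONEY", r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?"),
    ("DATE", r"(?:" + MONTHS + r") \d{1,2}, \d{4}"),
]

# Combine the patterns into one regex with a named group per tag.
TAGGER = re.compile(
    "|".join(f"(?P<{label}>{pattern})" for label, pattern in PATTERNS)
)


def tag(text):
    """Return (tag, matched text) pairs in the order they appear."""
    return [(m.lastgroup, m.group()) for m in TAGGER.finditer(text)]


for label, value in tag(TEXT):
    print(f"{label:7} {value}")
```

Once the passage is tagged, the computer can answer questions the raw text could not: who the parties are, how much money changed hands, and when.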

Legal Data and The Future

Of course, the key question is not whether lawyers can convert text to data, but what is the value of doing so? If the conversion simply means computer geeks have another thing to play with, then it makes no sense for the world at large to shift.

The value of doing so, I believe, is deep and will accelerate the change from law being a religion of the past practiced by a cloistered tribe, to a flexible tool of the future that can help individuals and organizations at all levels of the economic ladder. That is big value, so the next question will be “what do you have to back up that belief?”

We all know by now that data—as an augmentation to what we can do as humans and not as a replacement—will play a big role in our future. The same is true for lawyers. Let’s go through some examples.

As a transactional lawyer, one question I was often asked was whether what we were proposing to do or what the other side was proposing to do was “market.” This simple question usually leads to a spirited, but worthless, debate between opposing counsel. The proponent of the clause argues it is market, the opponent argues it isn’t market. Clients sit there perplexed: surely this is a question that can be answered objectively? The answer is “of course,” but not as law is currently practiced.

Three years ago, as I was working on a large (over $1 billion) financing agreement, the question came up all the time during negotiations. The firms on either side of the negotiations were (and are) top tier firms recognized as “the” firms to use for financing. Yet, neither firm could answer the market question. The usual response was “we could have our library staff look at recent financing documents to see if there is a pattern.”

The documents they would search were “material agreements” for the companies involved, and so they had been attached to filings with the SEC. That is, they were publicly available. Anyone could download the documents, convert them to data, and search them. In fact, collecting these documents, tagging them, and using them as a corpus would have put any firm in a great position. But, to my knowledge, no firm has gone that far.
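As a rough sketch of how a tagged corpus could answer the “market” question, the Python below computes the share of agreements that contain a given clause. Everything in it is a placeholder: the agreement ids, the clause tags, and the resulting percentages are invented for illustration and say nothing about real financing documents.

```python
# Invented placeholder corpus: each agreement is reduced to the set of
# clause tags found in it. The ids and clause names are hypothetical.
CORPUS = [
    {"id": "agreement-1", "clauses": {"cross_default", "material_adverse_change"}},
    {"id": "agreement-2", "clauses": {"cross_default"}},
    {"id": "agreement-3", "clauses": {"material_adverse_change", "equity_cure"}},
    {"id": "agreement-4", "clauses": {"cross_default", "equity_cure"}},
]


def clause_share(corpus, clause):
    """Fraction of agreements in the corpus that contain the clause."""
    if not corpus:
        return 0.0
    hits = sum(1 for doc in corpus if clause in doc["clauses"])
    return hits / len(corpus)


for clause in ("cross_default", "equity_cure"):
    print(f"{clause}: {clause_share(CORPUS, clause):.0%} of agreements")
```

With a real corpus behind it, a number like this turns “is it market?” from a debate into a lookup.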

As a second example, consider the many briefs filed in lawsuits each day. Judges consistently complain about the quality of brief writing. Their complaints, by the way, are not directed solely at small firms or lawyers who occasionally appear in court. The epidemic of poorly written briefs extends up through the ranks to the largest firms.

If those briefs were turned into data, we could use the data for many purposes. For example, we could perform quality studies on the briefs. We also could compare the briefs to the court decisions (did the brief overlap with the decision, were the cases cited used by the court, did the arguments make their way into the decision, and so on). We could compare briefs across firms and even develop quality measures to tell us which firms and which lawyers have the best-written, most persuasive briefs. Instead of measuring the quality of law firms based on where the lawyers went to school, we could measure quality based on the legal product.
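One of these comparisons, whether the cases cited in a brief were used by the court, can be sketched in a few lines of Python. This is a simplified illustration: the brief and decision snippets are invented, the pattern only recognizes U.S. Reports citations in the form “347 U.S. 483,” and the overlap measure is my own assumption about what a first-pass metric might look like.

```python
import re

# Matches U.S. Reports citations such as "347 U.S. 483".
# Other reporters are deliberately out of scope for this sketch.
US_CITATION = re.compile(r"\b\d{1,3} U\.S\. \d{1,4}\b")

# Invented snippets; only the case citations themselves are real.
BRIEF = (
    "Plaintiff relies on Brown v. Board of Education, 347 U.S. 483 (1954), "
    "and Miranda v. Arizona, 384 U.S. 436 (1966)."
)
DECISION = (
    "As in Brown v. Board of Education, 347 U.S. 483 (1954), the question is "
    "one of law. See also Marbury v. Madison, 5 U.S. 137 (1803)."
)


def citations(text):
    """Return the set of U.S. Reports citations found in the text."""
    return set(US_CITATION.findall(text))


def citation_overlap(brief, decision):
    """Share of the brief's citations that also appear in the decision."""
    cited = citations(brief)
    if not cited:
        return 0.0
    return len(cited & citations(decision)) / len(cited)


print(sorted(citations(BRIEF) & citations(DECISION)))
print(citation_overlap(BRIEF, DECISION))
```

Run across thousands of briefs and decisions, a measure like this is one small ingredient in the quality studies described above.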

The list of ways we can use legal data is long and growing every minute. Legal data can be combined with data from other sources to build predictive models. Data streams from sensors and mobile devices can be combined with legal data to create early warning systems: predictive analytic models that tell us when certain actions may lead to a lawsuit. Turning documents into data also is the first step in converting contracts into smart contracts, connecting law to the world of blockchain technology.

I’m a Lawyer, Not a Computer Scientist

Most lawyers are dizzy at this point. They don’t understand technology in its basic form (Can you describe to me how the internet works? What happens when you hit “send” for an email?) and now I’m asking them to go from those .docx files to computational linguistics and natural language processing. Time to run!

The key is understanding the difference between the lawyer trying to do it all, and the lawyer managing a collaborative team that does it all. None of us can do everything (despite what we think), but we all need to learn to manage teams. Law departments should move from hiring lawyer after lawyer to hiring one or two legal data scientists (who may be lawyers with technology training). By having the legal data scientists automate certain steps (such as document assembly) and combining that automation with data tagging, a law department would take itself instantly into the 21st century. The future of law belongs to teams.

One final note about legal data. What law firms and law departments seem not to realize is that stored on their servers is 21st century gold. Today, Google, Facebook, and Amazon have put themselves in enviable positions. They each control massive data sets that enable them to analyze the world in ways we didn’t believe possible a decade or so ago. It will be difficult for other companies to build comparable data sets. IBM CEO Virginia Rometty puts it nicely:

What steam was to the 18th century, electricity to the 19th and hydrocarbons to the 20th, data will be to the 21st century. That’s why I call data a new natural resource.

In the law, the large legal publishers have data sets that also give them an advantage. Other publishers are looking for data sets that will help them build positions in the publishing industry similar to what Google, Facebook, and Amazon have done in their respective domains. For example, Elsevier recently announced it is purchasing the Social Science Research Network (SSRN). SSRN is a significant publishing platform for social sciences and humanities, and one of its main libraries is devoted to law. Overall, it has about 673,000 papers. Elsevier will be combining SSRN with its technology platform, Mendeley:

SSRN is devoted to providing “tomorrow’s research today” through specialized research networks in the social sciences and humanities. We facilitate the free posting and sharing of research material (e.g., conference papers, preprints, non-peer-reviewed papers) in our subject areas. Social science papers tend to have fewer co-authors, so networking and sharing ideas, hypotheses and drafts during the research process are critical; SSRN helps authors evolve their research and communicate their results worldwide.

Mendeley is a researcher workflow tool that helps researchers organize, discover and share their research. Mendeley is also becoming a collaborative environment for sharing early results of research but is more focused in science, technology and medical fields. Its technology platform, enhanced by Elsevier’s investment, uses metadata from articles and usage on its site to develop a suite of analytic tools that directs researchers towards the best people to collaborate with and what to read.

What does the combination really mean? It gives Elsevier unprecedented access to an enormous database. It isn’t the papers; it is the data. In this case, data represents influence or impact within the scholarly community, which is something very valuable to scholars and institutions. As one blogger put it: “The reason is obvious to anyone who works in the university: impact = higher rankings, higher rankings = more and better students, more donors, more reputation for the institution… all of which translates into the ability to hire more high impact researchers.” The motivation to access data may be different for lawyers, but the need is no less than in academia.

Lawyers also object by saying that the knowledge of how to convert text to data and manipulate it is computer science, not something for humanities majors who became lawyers. Ironically, text tagging grew out of the humanities, where language, history, philosophy, and other professors have been tagging text for decades.

Lawyers love to find excuses for resisting change. In fact, a recent Altman Weil survey shows that over 90% of large firm managing partners know their firms need to change (become more efficient), and yet over 64% of partners resist change (up 20 points from a year ago). So be it. There will be a few firms that can get by ignoring change, while technologists and clients (the real clients) work behind the scenes on software that reduces or even eliminates the need for lawyers (don’t chuckle, the software already exists).

Lawyers are their own worst enemy. The profession is changing slowly and will not disappear overnight or perhaps ever. In the meantime, the demand for lawyers (versus legal work) shrinks, alternatives pop up daily, and the world moves past the era of scriveners with their vellum. If you don’t believe me, just check: it’s in the data.