Rob’s Notes 16: How LLMs Can Help Normal People Read Legal Documents

And 5 tips for how to do it

Dec 14, 2024

You probably get emails like this all the time. “We’re updating our privacy policy” yadda yadda. I’ve worked on privacy and related topics for two very big tech companies, but just like you, I almost never read these privacy policy updates either.

EXCEPT, what I sometimes do now is feed them into a large language model and ask some questions.

It’s still somewhat hard to do this. You have to print the document to a PDF, upload it, and then ask questions about it. Companies you tend to trust will tell you exactly what changed between versions, but if they don’t and you want to compare the new policy to the old one, you may have to go to the Wayback Machine, find the old one, and compare the two. And because these documents keep getting longer (especially with more US state-specific privacy laws now on the books!), they may be too big for the model’s “context window”** and so it might not be able to compare them!
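If you can get both versions as plain text (an assumption; the post works from PDFs), Python’s standard `difflib` module can pull out just the changed lines before you ask an LLM anything. That also shrinks what you need to paste in, which helps with the context-window problem. This is a minimal sketch; `changed_lines` and the sample policy sentences are hypothetical, not Roku’s actual text.

```python
import difflib


def changed_lines(old_text: str, new_text: str) -> list[str]:
    """Return only removed ("- ") and added ("+ ") lines between two policy versions."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff also yields unchanged lines ("  ") and hint lines ("? "); drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical example text, not taken from any real policy.
old = "We collect your viewing history.\nWe share data with partners."
new = (
    "We collect your viewing history.\n"
    "We share data with advertising partners.\n"
    "We receive precise geolocation from partners."
)

for line in changed_lines(old, new):
    print(line)
```

You could then paste only those changed lines into the model and ask what they mean, instead of uploading two full policies.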

But I decided to try it for this Roku privacy update. I’ll skip the prep steps, but in Claude 3.5 Sonnet I started with a general question and then drilled down further; the first question was: “what are the most significant changes between the old privacy policy and the new one?”

One of the things you’ll learn when you first ask a general question like that is that the company has added a lot of language to its policy. But it’s not necessarily that they’re giving themselves new rights; it’s more that they’re clarifying things. In the model’s words: “This matches a broader trend in privacy policies: they tend to become more detailed over time not because companies are necessarily collecting more data, but because they're being more explicit about what they've already been collecting. This transparency is generally positive for users, even if it can sometimes make privacy policies appear more invasive simply because they're being more specific about existing practices.”

Transparency in how companies handle data has been increasing, partly because it may help insulate these companies from potential litigation and partly to prepare for new regulations they see on the horizon.

Most of what changed in this version of the policy appears to be Roku adding more specificity, though the analysis indicates that “The one area where there might be some expansion of practices is in the geolocation tracking, as the new explicit mention of receiving precise geolocation from advertising partners could indicate new data-sharing relationships. However, even this might have been covered under the previous policy's broader language about receiving information from advertising partners.”

Here’s some general advice on feeding legal documents into an LLM:

Don’t only ask superficial questions. Follow up; drill down further to get into the details. Just like with a Google Search, you may need to ask more than one question to focus the LLM’s ‘attention’ on the right things, or you’ll just be left with generalities.
Understand whether the company’s practices are out of step with its peers’. This doesn’t necessarily require uploading more documents, since a lot of this is already in the models’ training data. I followed up and asked about this, and Roku isn’t too far out of step with other companies in this space, where “differences in their privacy policies often reflect their different business models rather than fundamentally different approaches to privacy.” For example, “Roku emphasizes advertising and content partnerships,” whereas “Amazon integrates TV data with their broader ecosystem.”
Make the context personal to you. Tell the LLM which aspects you care about, and have the analysis focus there. For example, I asked about anything specific to Texas: under the relatively new Texas Data Privacy and Security Act, I get new rights similar to those of folks in Colorado, Connecticut, Florida, Montana, Oregon, and Virginia, which let me see what data Roku has collected about me and request that it be deleted.
LLMs often get you to 70% very quickly, but then they may need help. I’ve found that these tools get me the first part of any answer FAST, but the last bits are often elusive and (for now) may still require human insight! LLMs can oversimplify complex legal concepts in ways that miss important nuances, and they don’t necessarily replace getting actual legal advice.
“Red team” your own legal documents. If you’re part of a team creating legal docs, try running them through LLMs before you ship, even for something as simple as checking a policy’s reading level. Roku’s privacy policy, by the way, is “written at approximately a college reading level (Grade 13-16). This aligns with research showing that most privacy policies are written at a college or post-graduate reading level, despite recommendations from privacy advocates that they should be written at a more accessible level (around Grade 8).” Le sigh!
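The post doesn’t say how Claude arrived at that grade estimate. One widely used measure you can compute yourself is the Flesch-Kincaid Grade Level: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below assumes a crude vowel-group syllable counter and a naive sentence splitter, so treat its numbers as rough; dedicated readability tools count syllables more carefully. The function names and sample sentences are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of vowels (treating y as a vowel)."""
    count = len(re.findall(r"[aeiouy]+", word.lower()))
    # Drop one for a trailing silent "e" ("share" -> 1), but never go below 1.
    if word.lower().endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level using naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)


# Hypothetical sample sentences, not quoted from any real policy.
plain = "We sell ads. You can opt out."
legalese = ("Notwithstanding the foregoing, we may disclose aggregated information "
            "to affiliated entities for operational purposes.")
print(round(flesch_kincaid_grade(plain), 1), round(flesch_kincaid_grade(legalese), 1))
```

Long sentences full of multi-syllable words push the score up fast, which is one reason policies land at a college level.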

I feed legal documents into Claude, Google’s Gemini, or ChatGPT (I pay for all three) all the time. I’ve found them very helpful, and they help me focus my precious time and resources when I talk to human lawyers. I anticipate they’ll help all of us navigate an increasingly complex world of overlapping hardware and software products (with their terms of use, clickwrap contracts, and policies), all fighting for our attention and data.

**A context window is like an LLM's short-term memory: it limits how much text the model can process at once, similar to how much you can hold in your mind while reading. Gemini’s is about as long as 1,500 pages of text, whereas Claude’s is about one-sixth of that size.
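Context windows are actually measured in tokens (word fragments), not pages. A quick way to guess whether an old and new policy will fit together is the common rule of thumb that English text runs about 0.75 words per token. That ratio is an assumption, since the real count depends on each model’s tokenizer, and the `context_limit` values below are placeholders, not any vendor’s published figure. Check your model’s documentation for its actual limit.

```python
def estimate_tokens(text: str, tokens_per_word: float = 4 / 3) -> int:
    """Rough token estimate assuming ~0.75 English words per token (rule of thumb)."""
    return round(len(text.split()) * tokens_per_word)


def fits_in_context(documents: list[str], context_limit: int, reserve: int = 4_000) -> bool:
    """Guess whether all documents fit, leaving `reserve` tokens for your question and the answer."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserve <= context_limit


# Hypothetical stand-ins for two ~15,000-word policies.
old_policy = "word " * 15_000
new_policy = "word " * 15_000
print(estimate_tokens(old_policy))
print(fits_in_context([old_policy, new_policy], context_limit=200_000))
print(fits_in_context([old_policy, new_policy], context_limit=32_000))
```

If the estimate comes out too big, comparing only the changed sections (or one section at a time) is the usual workaround.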