I’m Paula Muldoon and I’m a staff software engineer at Zopa Bank in London wrangling LLMs into shape for our customers. I’m also a classically trained violinist and I love growing potatoes to break up difficult soil. This post is not written with AI because I like writing.
In my last post, Context in LLM Systems of Experts, I mentioned that you probably want to store an audit trail of LLM interactions.
However, some of that data may be very sensitive, so you need to handle it in line with regulations as well as your customers’ best interests.
It’s best practice to assume that any data sent to an LLM can be extremely sensitive. Some examples:
- a customer gives their name and address when ordering potatoes (PII, or personally identifiable information)
- a customer enters their credit card number (PCI data) – you probably don’t want them entering this data this way, but users do weird things
- a customer is explaining they can’t pay their potato bill because they tripped and fell over a bucket of Brussels sprouts and aren’t able to work due to a broken wrist (medical data)
So make sure that wherever you’re storing this data has the right safeguards in place – encryption at rest, access control, the usual.
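As a minimal sketch of what encryption at rest might look like, assuming Python and the `cryptography` library’s Fernet (in reality your key would come from a secrets manager, and you’d still need access control layered on top):

```python
# Minimal sketch: encrypt a transcript before it hits your datastore.
from cryptography.fernet import Fernet

# Hypothetical: in production, load the key from your KMS / secrets
# manager rather than generating it inline.
key = Fernet.generate_key()
fernet = Fernet(key)

def store_transcript(transcript: str) -> bytes:
    """Encrypt the transcript; return ciphertext for storage."""
    return fernet.encrypt(transcript.encode("utf-8"))

def load_transcript(ciphertext: bytes) -> str:
    """Decrypt a stored transcript (requires the same key)."""
    return fernet.decrypt(ciphertext).decode("utf-8")
```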
Do not log inputs to and responses from LLMs, as they may contain sensitive data. (Also, LLMs can be wordy, so you might end up paying a lot in log storage costs you don’t really need.)
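If you still want observability, log around the content rather than the content itself. Here’s a sketch – the field names are my own, not any standard:

```python
# Sketch: log metadata about an LLM call (useful for debugging and
# cost tracking) without ever logging the prompt or response text.
import logging

logger = logging.getLogger("llm_audit")

def log_llm_call(request_id: str, model: str, prompt_tokens: int,
                 completion_tokens: int, latency_ms: float) -> None:
    logger.info(
        "llm_call request_id=%s model=%s prompt_tokens=%d "
        "completion_tokens=%d latency_ms=%.1f",
        request_id, model, prompt_tokens, completion_tokens, latency_ms,
    )
```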
It’s very tempting to take transcripts from LLM-based conversations and turn them into evals. If you’re going to do this, make sure you scrub any sensitive data first.
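As a starting point, a rough regex-based scrub might look like the sketch below. The patterns and placeholders are illustrative only – regexes catch the obvious stuff (emails, card-like numbers, UK-style postcodes), but in practice you’d probably want a purpose-built PII detection tool rather than trusting regexes alone:

```python
# Rough sketch: scrub obvious sensitive patterns from a transcript
# before turning it into an eval. Not a guarantee of anonymisation.
import re

PATTERNS = {
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "[CARD]": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
    "[POSTCODE]": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b"),
}

def scrub(text: str) -> str:
    for placeholder, pattern in PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text

print(scrub("Contact jane@example.com, card 4111 1111 1111 1111"))
# -> "Contact [EMAIL], card [CARD]"
```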
And make sure that you have an agreement with whatever model provider you’re using that they won’t use your data (including your customers’ data) for training models. If you want to use this data for fine-tuning models and your T&Cs only say you won’t use the data for training models, you’re in a grey area. I’d make sure your customers know, and, as always, scrub the sensitive data first.
I firmly believe that software engineers have an ethical duty (as well as a legal obligation, at least in the UK) to protect our users’ data. Hopefully you agree, so make sure, as you’re building your potato (and other vegetable) LLM systems, that you’re keeping your users’ best interests at heart.
I think what I’m trying to say is scrub the sensitive data like you’d scrub your potatoes before entering them into the village gardening competition.
Next up: handling partial data in system-of-experts workflows