Rob Vandenberghe
Information Security Officer
ChatGPT and other LLMs have taken the world by storm, and compliance and information security teams have been overwhelmed by the new technology as well. Using ChatGPT has a big impact on privacy regulations such as the GDPR, on compliance, and on information security in general — something that is often neglected in businesses, with all the dire consequences. Recent (mis)use of these tools has caused data breaches at multiple corporations, like Samsung, where crucial trade secrets were leaked to OpenAI’s ChatGPT.
In this blog post you’ll learn to leverage the amazing features of ChatGPT without causing a massive data breach at your company.
With ChatGPT’s default settings you run the most risk, for the following two reasons.
First, OpenAI stores all content (prompts and responses) to improve its models. This means that all data fed to ChatGPT will remain on their servers forever — something you certainly want to avoid when working with sensitive data such as company secrets or personal information (see the storage limitation principle in the GDPR).
A second risk arises for non-US-based companies, since all content is processed and stored in the US. This is especially problematic when processing personal data in light of Schrems II, where it was ruled that US surveillance practices provided insufficient data protection for EU residents, making the transfer of personal data to the US unlawful.
The first issue can simply be addressed by disabling the setting that shares your data with OpenAI for training purposes (at the time of writing, found under Settings → Data Controls in ChatGPT).
Content (and personal data) will still remain on OpenAI’s systems for 30 days to monitor abuse. You will still have a hard time selling this to your Data Protection Officer, but the risks are nonetheless heavily reduced.
The same goes for sharing trade secrets with ChatGPT. It is better to have them there for 30 days than forever, but it's still not a great idea.
A more compliance- and privacy-friendly way of using GPT models is to use OpenAI’s APIs directly. Since March 1, 2023, OpenAI does not use data submitted via the API (prompts and responses) to improve its models.
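As a minimal sketch, this is roughly what a call to OpenAI’s chat completions API looks like; the model name is illustrative, and reading the API key from an environment variable (rather than hard-coding it) keeps the secret out of source control:

```python
import json
import os

# Public chat completions endpoint at the time of writing.
OPENAI_CHAT_URL = "https://api.openai.com/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "gpt-3.5-turbo") -> dict:
    """Assemble the URL, headers and JSON body for a chat completion call.

    The API key is pulled from the OPENAI_API_KEY environment variable so
    it never ends up in code or version control.
    """
    return {
        "url": OPENAI_CHAT_URL,
        "headers": {
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

request = build_chat_request("Summarise our meeting notes.")
# Send with any HTTP client, e.g.:
#   requests.post(request["url"], headers=request["headers"], data=request["body"])
```

The same payload can of course be sent via OpenAI’s official Python SDK; the point is that API traffic, unlike the ChatGPT web interface, is excluded from model training by default.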
In terms of personal data protection OpenAI offers an important “functionality”: you can request OpenAI to amend the normal terms of service with its Data Processing Addendum. As is to be expected with major American tech companies, it is not possible to impose your own Data Processing Agreement on OpenAI; you are stuck with their version. Nonetheless, a Data Processing Agreement is an important requirement under the GDPR for sharing personal data with “processors” such as OpenAI. It even includes the Standard Contractual Clauses, a mechanism that can be used to legitimately send data to the US, although further assessment is required to determine whether the Standard Contractual Clauses are sufficient.
The next, and currently final, step towards a compliance- and privacy-friendly way of using GPT models is the Azure OpenAI Service.
The Azure OpenAI Service provides maximum control over the prompts and responses generated by GPT-3/4 models.
By default, prompts and responses are temporarily stored by the Azure OpenAI Service, in the same region as the resource, for up to 30 days. This data is used for debugging and for investigating abuse or misuse of the service. It is possible to ask Microsoft not to store prompts and responses at all.
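To illustrate the difference with the global OpenAI endpoint, here is a minimal sketch of a call against your own Azure OpenAI resource; the resource name, deployment name and API version below are placeholders — the key point is that the URL is scoped to a resource you created in a region of your choosing:

```python
import json
import os

def build_azure_chat_request(prompt: str,
                             resource: str = "my-resource",   # placeholder: your Azure resource name
                             deployment: str = "my-gpt4",     # placeholder: your model deployment
                             api_version: str = "2023-05-15"  # illustrative API version
                             ) -> dict:
    """Assemble a chat completion call against an Azure OpenAI resource.

    Unlike api.openai.com, the endpoint lives under your own resource
    (and therefore in the Azure region where that resource was created).
    """
    url = (f"https://{resource}.openai.azure.com/openai/deployments/"
           f"{deployment}/chat/completions?api-version={api_version}")
    return {
        "url": url,
        "headers": {
            # Azure OpenAI authenticates with an api-key header,
            # not a Bearer token; key again comes from the environment.
            "api-key": os.environ.get("AZURE_OPENAI_KEY", ""),
            "Content-Type": "application/json",
        },
        "body": json.dumps({"messages": [{"role": "user", "content": prompt}]}),
    }
```

Because the resource (and thus the endpoint) sits in a region you selected, the data residency of prompts and responses is under your control rather than OpenAI’s.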
For GPT-3.0 the region can be the US or Europe; for GPT-3.5 and GPT-4, only the US at this moment.
(Note: in the meantime, GPT-3.5 and GPT-4 have become available in both the US and Europe.)
In terms of data protection you can rely on the default Data Processing Agreement, including the Standard Contractual Clauses, provided by Azure. Because Azure is part of Microsoft, which is headquartered in the US and is thus an international organisation, using the Azure OpenAI Service still qualifies as a transfer of personal data outside of the EU.
(Note: in the meantime, in July 2023, the European Commission adopted its adequacy decision for the EU-US Data Privacy Framework, meaning that personal data can flow freely from the EU to US companies participating in the framework (including Azure/Microsoft), without having to put in place additional data protection measures.)
But technically speaking there is no difference between using the Azure OpenAI Service APIs and using an Azure Virtual Machine. So for companies that have already accepted the risk of using international organisations to process data — basically every company — no new risks are introduced.
Combining the facts that resources can be located in the EU, that a Data Processing Agreement with Standard Contractual Clauses is in place, and that prompts and responses are not stored on Azure systems (if the request is approved by Microsoft) makes Azure OpenAI the easiest way to minimise information security and compliance risks when using GPT models, even with personal data.