Public cloud storage services run under a shared responsibility model for data protection and cybersecurity. The cloud service provider (such as Amazon or Microsoft) is responsible for the “security of the cloud,” meaning the underlying infrastructure, while users are responsible for “security in the cloud”: securing their own storage buckets, such as AWS S3 buckets, and the data inside them.

Personally Identifiable Information (PII) in the Cloud: What We Found

To understand the prevalence of publicly exposed sensitive data, Laminar Labs scanned publicly facing cloud storage buckets and detected personally identifiable information (PII) in 21% of these buckets – roughly one in five. Information uncovered included addresses, email addresses, phone numbers, driver’s license numbers, names, loan details, credit scores, and more.

Our original hypothesis was that this publicly available data consisted of public datasets or public files, things that were meant to be online. What we learned was that the majority of it was actually misplaced: data mistakenly placed into a publicly exposed bucket, where it became unintentionally exposed. In some cases, the S3 bucket itself may have been misconfigured to be public when it should not have been. Both are prime examples of “shadow data.” Shadow data is any sensitive data that is not subject to an organization’s centralized data management framework and is not visible to data protection teams. Examples include snapshots that are no longer relevant, forgotten backups, misplaced data, and log files containing sensitive data that are not properly encrypted or stored.

Here is a summary of some of the sensitive data that we found:

  • A file containing PII of people who used a third-party chatbot service on different websites – including names, phone numbers, emails – and the messages sent to the bot (for example, messages from people seeking unemployment benefits)
  • A file containing loan details – name, loan amount, credit score, interest rates and more
  • A participant report for an athletic competition, including PII (name, address, zip code, email and more) and medical info
  • A VIP invite list including names, email, and address information
  • A file with first names, last names, Ethereum and Bitcoin addresses, and block card email addresses

The Risks of PII data in the Public Cloud: Why You Should Care

Because this data contains such highly sensitive information as loan details, bitcoin addresses and conversations about unemployment benefits, we believe that this data has the potential to put the organizations to whom the information belongs at risk. Organizations cannot properly protect data they do not know is exposed. And in the shared responsibility model, keeping this data secure is the responsibility of the organization that owns the buckets in which the data resides. Fortunately, there are ways to uncover and address this risk. 

How to Mitigate & Protect PII in the Cloud:

PII Data Discovery & Monitoring

The first step in addressing the problem is understanding what publicly exposed sensitive data exists in your environment. In the cloud, this is not as simple as it may seem. S3 buckets that are not public can often contain specific files and objects that are public, leaving security teams unaware of the risk. Conversely, many buckets are supposed to be publicly exposed, for example those hosting websites, and shadow data can be misplaced in these intentionally exposed buckets unnoticed. These misplaced files are often hard to locate among the many legitimate files housed in those buckets.
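To illustrate the point about public objects hiding inside non-public buckets, here is a minimal Python sketch that inspects object ACLs for grants to S3's public groups. It assumes the ACL dictionaries have the shape returned by boto3's get_object_acl call; the function names and the object_acls input are hypothetical, chosen for this example. ACLs are only one route to exposure (bucket policies and Block Public Access settings matter too), so treat this as a partial check rather than a complete audit.

```python
# Group URIs that S3 ACLs use for "everyone" and "any authenticated AWS user".
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_permissions(acl):
    """Return the sorted permissions an ACL grants to public groups."""
    perms = set()
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GROUP_URIS:
            perms.add(grant.get("Permission", "UNKNOWN"))
    return sorted(perms)


def find_public_objects(object_acls):
    """Map each publicly readable or writable object key to its public permissions.

    object_acls is a hypothetical input: a dict of object key -> ACL dict,
    for example collected by calling get_object_acl for each key in a bucket.
    """
    findings = {}
    for key, acl in object_acls.items():
        perms = public_permissions(acl)
        if perms:
            findings[key] = perms
    return findings
```

Running find_public_objects over every key in a bucket that is nominally private would surface the individual objects that are exposed anyway.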

In other words, what is needed is a data-centric view, not an infrastructure-centric one: a way to catalog all data in a cloud environment, determine which files and objects contain sensitive information, and make sure those objects aren’t publicly available, without hindering the availability of other files that are safe.
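As a rough illustration of the data-centric view, the Python sketch below combines two signals per object: whether it is public, and whether its content matches simple PII patterns. The catalog structure, its field names, and the two regular expressions are assumptions made for this example; real PII detection requires much more than a pair of regexes, and this is not a description of how Laminar Labs performed its scan.

```python
import re

# Illustrative patterns only; production PII detection needs far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_phone": re.compile(r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"),
}


def detect_pii(text):
    """Return the sorted names of the PII patterns that match the text."""
    return sorted(name for name, pattern in PII_PATTERNS.items() if pattern.search(text))


def exposed_sensitive_objects(catalog):
    """Return objects that are both public and contain PII matches.

    catalog is a hypothetical input: an iterable of dicts with the fields
    'key', 'is_public', and 'text' (the object's decoded content).
    """
    findings = {}
    for obj in catalog:
        kinds = detect_pii(obj["text"])
        if obj["is_public"] and kinds:
            findings[obj["key"]] = kinds
    return findings
```

Only objects that satisfy both conditions are flagged, so legitimate public files such as website pages stay available while a misplaced contact list in the same bucket is surfaced.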

Third Party Data Access Control 

Another necessary step is making sure that third parties that need access to your data have access only to what they need, since handing your data over to a third party introduces a whole new layer of security threats. We will dive into this topic in a separate post.
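One way to picture "access only to what they need" is a check over a bucket policy document for statements that grant more than intended. The Python sketch below is a simplified illustration under stated assumptions: the policy is the parsed JSON of an S3 bucket policy, allowed_prefix is a hypothetical ARN prefix the third party should be confined to, and Condition, NotAction, and NotResource elements are ignored entirely.

```python
def as_list(value):
    """Policy fields may be a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy, allowed_prefix):
    """Return (sid, reasons) for Allow statements broader than intended.

    policy: a bucket policy as a dict (the parsed JSON document).
    allowed_prefix: hypothetical ARN prefix the third party should be limited
    to, e.g. 'arn:aws:s3:::example-bucket/partner-x/'.
    """
    findings = []
    for stmt in as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        reasons = []
        principal = stmt.get("Principal")
        # A bare "*" or {"AWS": "*"} means anyone can use this statement.
        if principal == "*" or (
            isinstance(principal, dict) and "*" in as_list(principal.get("AWS", []))
        ):
            reasons.append("wildcard principal")
        if any(action in ("*", "s3:*") for action in as_list(stmt.get("Action", []))):
            reasons.append("wildcard action")
        if any(not res.startswith(allowed_prefix) for res in as_list(stmt.get("Resource", []))):
            reasons.append("resource outside allowed prefix")
        if reasons:
            findings.append((stmt.get("Sid", "<no Sid>"), reasons))
    return findings
```

A statement that grants a named partner account s3:GetObject on objects under the allowed prefix passes cleanly, while one that allows s3:* to everyone on the whole bucket is reported with all three reasons.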

Recommendations: 

Many organizations focus on protecting their cloud infrastructure first. This is an important component of a comprehensive cloud security solution set, but given that an organization’s most valuable asset is its data, infrastructure security alone is not enough. True protection of sensitive data requires a dedicated data security posture management solution that can autonomously discover all data, known and shadow, whether it’s a bucket that is unintentionally set to public or the much harder to find sensitive data misplaced in buckets that are intended to be public. 

To keep up with Laminar research, see my colleague’s post on Versioning and watch our blog channel.

Laminar Labs is Laminar’s expert security research team. It discovers, analyzes, and designs defenses for data security risks, specializing in cloud data technologies. With original research and in-depth analysis, the Labs team helps protect Laminar customers’ sensitive data and contributes to the broader information security community by sharing both findings and best practices.
