How To Combat Shadow Data—The Data Security Risk Lurking in the Shadows
January 17, 2022 | 8 min. read
In simple terms, shadow data is your company’s data that is copied, backed up or housed in a data store not governed, under the same security structure, nor kept up-to-date by security or IT.
As an example, think about your main production data store. Of course, this is where you have your content, applications and data accessible to all those who require it, but you also are keenly aware of it, keep it up to date, and have rigid security protocols in place. By contrast, consider the copies that are made of the data in that production database that are not being secured: the copy that exists in a test environment, in an unmanaged backup that started as a lift and shift, or orphaned backup and abandoned databases.
You’ve likely heard the term “shadow IT.” This is the technology, hardware, software, applications or technology projects that are run outside the governance and oversight of your corporate IT.
At one point, shadow IT was scary, a major threat to the security of an organization’s data. However, as the challenge became more known and companies took it seriously, teams figured out how to manage and contain it.
Since then, major advancements in technology – like the mass migration to the cloud – have brought us data democratization, which in itself is a boon to all organizations and consumers. Your data is important, and allowing greater access to this data for those who need it creates more opportunities, more effectiveness.
However, the cloud also allowed data to be spread around to various places you may not even be tracking. Gone are the days of completely self-contained, on-premise systems. With greater access comes greater risk. And now a new threat has arrived. One that, in comparison, dwarfs the risk of shadow IT. It’s the largest threat to your data security: shadow data.
Do you know where your sensitive data lives? And do you have the tools and resources to manage it? Shadow data is a prominent yet frequently overlooked problem, but there are tools and resources to tackle it and secure your most valuable currency – your data.
As more and more companies move to the cloud the landscape of cloud technologies expands and becomes more complex. As more and more developers utilize the flexibility of the cloud to spin up new data storage assets with the click of a button, without consulting security or IT, the data attack surface also increases. Add in data democratization and the lack of a perimeter and shadow data becomes increasingly prevalent, as does the risk of data breach as traditional data security strategies fail to keep up.
There are four major factors that have changed cloud data protection and given way to the spectre of shadow data:
Think about where all of your data might live. And then think about where copies of this data may exist. In a typical example, you likely have the following:
All of these are unique data stores in and of themselves and any of them can be a dangling S3 backup, an unlisted embedded data store, or just become a stale data store. The problem is, they all contain sensitive data: customer information, employee information, financial data, applications, intellectual property, etc. And most likely they’re not visible to your data protection teams. They’ve become invisible, unmanaged, and unsecured.
This is your Shadow Data. Check out our insightful shadow data infographic for a visual representation of the threat.
Shadow data can be your biggest vulnerability. In a lot of cases, this data is not used anymore. Forgotten about or not even visible or accessible to corporate IT teams. On the whole, the people in your organization who should know about these stores of data don’t know about them, leaving it open prey to cybercriminals.
In fact, most data breaches often occur in shadow data environments.
Take for example the very recent SEGA Europe data breach, where the massive gaming company inadvertently left users’ personal information publicly accessible on an Amazon Web Services S3 bucket.
The mishap left wide open for hackers and cybercriminals to dig into many of SEGA Europe’s cloud services, along with API keys to their instances of MailChimp and Steam, which provided full access to these services for anyone who found it.
Fortunately for SEGA, the joint efforts of SEGA’s internal security team, combined with a team of external security researchers, the mishap was discovered and access to sensitive data was contained.
How did this happen? Shadow data. Someone inadvertently stored secure, sensitive files in a publicly accessible AWS S3 bucket and didn’t realize the extent of vulnerability. It is quite easy to misconfigure an Amazon AWS bucket, and that little mistake could have cost the company irreparable damage.
Twitter also experienced something quite similar, where, due to a ‘glitch’ that caused user’s personal information and passwords to be stored in a readable text format on their internal system, rather than disguised by their process known as “hashing”.
The mishap caused embarrassment and scrutiny for Twitter. The major social platform had to publicly urge its more than 330 million users to change their passwords.
For many organizations, a simple breach of one of their shadow data environments could be crippling.
Unmanaged data stores inevitably occur. Shadow data occurs. It’s unintentional, and it’s a normal byproduct of an organization moving at the pace of the cloud. But there are ways to ensure you’re protected and have the proper visibility into every place your data may live.
Cloud-native monitoring solutions built in the cloud, for the cloud now exist to combat shadow data and allow data protection teams to move at the speed of the cloud. These cloud data security solutions must Discover and Classify continuously for complete visibility, Secure and Control to improve risk posture and Detect Leaks, and Remediate without interrupting data flow.
As you evaluate solutions to protect your sensitive cloud data, ensure you have a platform that can scan your entire cloud account and automatically detect all data stores and assets, not just the known ones. Ensure that once data is scanned, the solution can categorize and classify the data, maintaining a cloud datastore framework that allows you to prioritize and manage all of your assets effectively.
Having full data observability lets you understand where your shadow data stores are and who owns them. Doing so leads to a secure environment, faster, smarter decision-making across the enterprise, and the ability to thrive in a fast-moving, cloud-first world.
Get notified when a new piece is out