Is a “data graveyard” the latest challenge haunting your business operations? There’s no question that data is the backbone of every modern organization, driving ambitious projects forward through a foundation of rich insights. But what happens when an organization collects and stores valuable data about customer satisfaction, business operations, or the results of strategy implementation, and then rarely uses it?
Data becomes buried under more data, collecting metaphorical dust and costing enterprises revenue opportunities, efficiency and productivity gains, and more. In fact, Forrester reports that “between 60% and 73% of all data within an enterprise goes unused for analytics.” Privacy regulations impose additional compliance requirements on healthcare and financial services enterprises, leaving over 43 zettabytes of data stored but inaccessible for research and analytics. The result? Data graveyards, high storage and maintenance costs, and missed opportunities to drive business growth with valuable data.
What is a data graveyard?
A “data graveyard” is a popular term for a large repository of unused data, typically the result of collecting information without the capacity or resources to analyze it. Data graveyards differ from data silos: a data silo is usually controlled by one department or business unit and isolated from the rest of the organization, while a data graveyard is defined by the fact that its data goes unused. Data silos can become data graveyards over time. If data sits unused, an organization cannot get the full value from what it spends on collecting, storing, and analyzing it.
Why do data graveyards exist?
Data graveyards are common because organizations collect data for many reasons, from AI and machine learning initiatives to more effective business decisions. Data is often collected for a specific purpose, but then one of three things might happen:
- The data is used for its specific purpose and then deleted
- The data is used for its specific purpose and perpetually stored
- The data is never used for its specific purpose but held for potential future use
The same applies to data collected without an intended application. Companies might collect data that could inform product development, marketing strategies, revenue maximization, and more, but may lack the operational resources, infrastructure, or organizational capacity to prioritize using that data in a given business quarter. In these cases, data is collected on a good-to-have basis and left untouched as newer, more relevant data accumulates.
Organizations might also choose to retain data from legacy systems. As businesses prioritize digital transformation, outdated software or hardware may hold valuable or private information that an enterprise considers necessary for future use. The average merchant spends over 58% of its IT budget maintaining legacy systems, even while developing new strategies to compete with e-commerce giants like Amazon and Alibaba. In healthcare, hospitals that previously used “homegrown” software products may need patches or updates to keep old data accessible, especially as they transition to newer solutions from Cerner, Epic, or MEDITECH.
During these software and system transitions, organizations may struggle to decide which information to store or maintain for future purposes. For many, consumer behavior data or long-stored healthcare records could yield future gains, justifying extended maintenance of legacy systems. However, leaving high-value data to rest in a data graveyard indefinitely creates risks of its own. What are these risks?
What challenges do data graveyards pose?
Any organization maintaining a data graveyard faces both legal compliance and practical considerations. For global enterprises, compliance with the General Data Protection Regulation (GDPR) is required when handling the personal data of people in the European Union. Since the regulation took effect in 2018, businesses have been required to follow these principles when storing and handling personal data:
- The Storage Limitation Principle
Under this principle, personal data may not be kept longer than necessary to achieve the purpose for which it was collected. Even if the collection and use of personal data is entirely lawful, organizations may not retain that data beyond its intended use, so a data graveyard filled with personal data is likely non-compliant with the GDPR.
- The Purpose Limitation Principle
Under this principle, personal data can only be processed for specified, explicit, and legitimate purposes. Organizations must be transparent about how they intend to use data and remain accountable to consumers, which means that data collected for “improved product features” cannot later be used for targeted marketing unless consumers have explicitly consented to both uses. Organizations seeking GDPR compliance should therefore collect data on a need-to-have basis rather than a good-to-have basis.
- The Accuracy Principle
Under this principle, personal data must be accurate and, where necessary, kept up to date, and inaccurate personal data must be erased or rectified without delay. If an organization holds personal data from 15 years ago and hasn’t touched it since, it will likely need to delete or anonymize that data to comply. Inaccurate data can also produce false insights and poorly informed decisions, so cleaning out a data warehouse can lead to better business outcomes.
Data graveyards also carry practical costs. The average cost to store 1TB of file data per year is $3,351, and companies collect and store petabytes of data each year, on-premises or in the cloud. Internal teams might also spend long hours maintaining an unused pool of data, which means an organization’s data graveyard could end up costing more than it ever returns. How, then, can businesses revive data graveyards to drive future business growth?
How can my organization eliminate data graveyards?
- Determine privacy compliance requirements for your organization
The General Data Protection Regulation (GDPR), Health Insurance Portability and Accountability Act (HIPAA), and California Consumer Privacy Act (CCPA) are among the regulations governing the collection, storage, and use of personal information. Depending on your organization’s scope and sector, requirements from these laws could apply to your business operations. Identifying which requirements apply can guide future data-centric decision-making.
- Identify current costs and benefits from long-term data storage
Not all data storage is a bad thing, but storing all data certainly is. If the cost of retaining outdated or simply old information exceeds the potential value of using it, then it’s time to clear the ghosts out of your organization’s closet. Note that the GDPR does not apply to anonymized data, so it is possible to retain valuable components of old information while maintaining legal compliance.
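As a rough illustration of this cost-benefit check, the Python sketch below multiplies each dataset's size by the article's average figure of $3,351 per terabyte per year and compares the result with an estimated annual value of using that data. The `DataAsset` record and the dataset names, sizes, and value estimates are all hypothetical; in practice, the value estimate is the hard part and would come from your own business case.

```python
from dataclasses import dataclass

# Average annual cost to store 1 TB of file data, as cited in this article.
COST_PER_TB_PER_YEAR = 3_351.0


@dataclass
class DataAsset:
    """Hypothetical inventory record for one stored dataset."""
    name: str
    size_tb: float
    estimated_annual_value: float  # your own estimate of what using the data is worth per year


def annual_storage_cost(asset: DataAsset, cost_per_tb: float = COST_PER_TB_PER_YEAR) -> float:
    """Estimated yearly cost of keeping the dataset in storage."""
    return asset.size_tb * cost_per_tb


def review(asset: DataAsset) -> str:
    """Flag datasets whose storage cost exceeds their estimated value."""
    if annual_storage_cost(asset) > asset.estimated_annual_value:
        return "delete or anonymize"
    return "retain"


# Illustrative inventory; names and numbers are made up for this example.
inventory = [
    DataAsset("legacy_crm_archive", size_tb=120, estimated_annual_value=50_000),
    DataAsset("recent_claims_data", size_tb=8, estimated_annual_value=250_000),
]

for asset in inventory:
    print(f"{asset.name}: ${annual_storage_cost(asset):,.0f}/yr -> {review(asset)}")
```

Running it prints `legacy_crm_archive: $402,120/yr -> delete or anonymize` and `recent_claims_data: $26,808/yr -> retain`. The decision rule is deliberately simple; a real review would also weigh compliance obligations, such as whether the data can be anonymized rather than deleted.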
- Develop an internal “Retention Policy”
Even if these regulations don’t apply to your organization, internal data retention policies can reduce costs and improve insights garnered from data over the long term. How long does your team need data to achieve a specified purpose? Will you delete collected data on a specific date, or will the data be anonymized for future uses? Clarifying your business’s data practices can prevent the build-up of unused data, ultimately reducing the size and cost of your data graveyard.
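The two questions above (how long data is needed, and whether it is deleted or anonymized afterward) map naturally onto a small rule table. The Python sketch below is a minimal, hypothetical example: the purposes, retention periods, and the choice to flag records with no documented purpose for manual review are assumptions for illustration, not legal guidance or a prescribed policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Action(Enum):
    RETAIN = "retain"
    DELETE = "delete"
    ANONYMIZE = "anonymize"
    REVIEW = "review"  # assumption: records with no documented purpose go to a person


@dataclass(frozen=True)
class RetentionRule:
    """Hypothetical policy entry: how long data for one purpose is kept, and what happens after."""
    retention: timedelta
    on_expiry: Action  # DELETE or ANONYMIZE once the retention period ends


@dataclass(frozen=True)
class Record:
    record_id: str
    purpose: str
    collected_on: date


# Illustrative policy; real periods depend on your legal and business requirements.
POLICY = {
    "order_fulfillment": RetentionRule(timedelta(days=730), Action.ANONYMIZE),
    "support_tickets": RetentionRule(timedelta(days=365), Action.DELETE),
}


def action_for(record: Record, policy: dict, today: date) -> Action:
    """Decide what the policy says should happen to a record as of `today`."""
    rule = policy.get(record.purpose)
    if rule is None:
        # Data kept "just in case" with no stated purpose is how graveyards grow.
        return Action.REVIEW
    if today >= record.collected_on + rule.retention:
        return rule.on_expiry
    return Action.RETAIN


if __name__ == "__main__":
    today = date(2024, 1, 1)
    records = [
        Record("r1", "order_fulfillment", date(2020, 5, 1)),
        Record("r2", "support_tickets", date(2023, 6, 1)),
        Record("r3", "marketing_leads", date(2023, 9, 1)),
    ]
    for r in records:
        print(r.record_id, action_for(r, POLICY, today).value)
```

Running it prints `r1 anonymize`, `r2 retain`, and `r3 review`. Keeping the policy as data rather than scattered logic makes it easier to audit and to update when requirements change.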
- Collaborate with vendors to maximize data usage
Privacy-enhancing technologies have transformed how organizations can use long-stored data while maintaining legal compliance. Using privacy-enhancing computation (PEC), organizations can work together to derive insights from one-way encrypted and anonymized datasets. What does this mean? If your old and unused data has a use case, PEC can help unlock its intellectual property value without compromising client, patient, or consumer privacy.
What is the TripleBlind Solution?
The TripleBlind Solution is the most complete and scalable approach to PEC techniques on the market. We provide robust and sustainable ways to analyze, pool, and process data, or collaborate on it, while it remains protected in use. The best part? We never see, store, or share your data. The TripleBlind Solution offers the following additional advantages:
- Supports GDPR & HIPAA Compliance through Privacy-By-Design –– TripleBlind never stores or handles any personal data. TripleBlind’s technology permanently and irreversibly de-identifies data through a combination of one-way encryption and distributed computing, which allows the algorithm to generate the same outputs without requiring an Algorithm Provider to process or use any data within the scope of the GDPR’s definition of personal data.
- Exceptional AI/ML modeling and analysis toolset –– TripleBlind enables all data operations to occur on any type of data, without adding speed penalties or requiring additional storage. Train AI models and find enterprise solutions faster and with greater accuracy than ever before.
- Collaborate securely & seamlessly with 3rd-party vendors –– TripleBlind provides robust digital rights management (DRM). Each data operation must be explicitly approved by the appropriate administrator. Once approved, the dataset is one-way encrypted for one-time use. Once the operation is complete and the result is returned to the appropriate party, the one-way encrypted data is rendered useless. Permissions can be set as broadly or specifically as needed, to govern both internal and external use of an organization’s information assets.
Our privacy-enhancing techniques keep data in place, allowing controllers and processors of data to interact in a peer-to-peer software environment. Your dead data doesn’t have to haunt your organization by sitting unused or in a silo. If you’re ready to learn more about how PEC can revive your data graveyard, download a complimentary copy of our whitepaper here.