For companies that handle sensitive data, compliance with privacy regulations is the bare minimum. Companies have both moral and business obligations to ensure that any private data they collect is protected from unauthorized access and use.
At the same time, sensitive data is a massive source of revolutionary insights. Privacy-enhancing strategies are designed to enable the operationalization of sensitive data while still maintaining the privacy of individuals and the protection of sensitive digital assets. Whether hardware- or software-based, these strategies use different approaches to protect data while allowing for more value to be extracted for scientific, social, and commercial benefit. Following are three of today’s top privacy-enhancing strategies: tokenization, synthetic data, and trusted execution environments.
Tokenization

Tokenization replaces a sensitive value, such as a payment card number, with a non-sensitive surrogate token; the mapping between the token and the original value is held in a secure vault. In business applications, tokenization can be used to outsource responsibility for handling sensitive data: companies can store sensitive information in a third-party vault rather than dedicating the resources needed to oversee and protect that data themselves.
While this is an obvious benefit, tokenization doesn’t address many security risks. The most prominent issue with tokenization is being able to trust a third party with access to sensitive data. While business associate agreements can be used to hold a third party liable for misuse, an unethical actor seeing the massive commercial value of a sensitive dataset could consider violating any agreements a comparatively small price to pay.
Furthermore, tokenization adds a layer of complexity to an organization's infrastructure. In financial transactions, for example, a customer's account information must be de-tokenized and re-tokenized each time authentication occurs. In situations involving massive datasets, such as training machine learning models, this added layer of complexity translates to enormous computational cost.
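The round trip described above can be illustrated with a toy sketch. The `TokenVault` class and its method names are hypothetical and purely for illustration; a real deployment would use a third-party tokenization service, not an in-memory dictionary.

```python
import secrets

class TokenVault:
    """Toy token vault: maps random surrogate tokens to the sensitive
    values they stand in for. In practice this mapping lives with a
    third-party tokenization provider, not inside the application."""

    def __init__(self):
        self._vault = {}  # token -> original sensitive value

    def tokenize(self, value: str) -> str:
        # The token is random, with no mathematical relation to the value,
        # so it is useless to anyone without access to the vault.
        token = secrets.token_hex(8)
        self._vault[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only the vault holder can reverse a token.
        return self._vault[token]

vault = TokenVault()
token = vault.tokenize("4111-1111-1111-1111")

# Downstream systems store and pass around only the token; every
# authentication step requires a round trip to the vault to de-tokenize,
# which is the added complexity (and cost) the text describes.
assert vault.detokenize(token) == "4111-1111-1111-1111"
assert token != "4111-1111-1111-1111"
```

At scale, each of those vault round trips is a network call to the provider, which is why the overhead compounds for large datasets.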
Also, tokenization may not address digital rights management and compliance issues, especially when a third-party provider is storing sensitive data in another jurisdiction or country. While this strategy may be popular and effective for financial transactions, it isn’t well-suited to processing datasets and engaging in international data partnerships.
Synthetic Data

Collecting massive amounts of data for analysis can be a regulatory and logistical nightmare. One popular privacy-enhancing framework developed to address these challenges is synthetic data.
Unlike standard data collected from original sources, synthetic data is generated from the statistical properties of real data, and it often serves to augment or replace that real data in mission-critical applications.
Because synthetic data holds the promise of generating new insights and enabling powerful artificial intelligence technologies, it has become a highly regarded tool in industries that deal with sensitive information — finance and healthcare in particular.
Although synthetic data can be very useful, it has major limitations. Synthetic-data systems are not particularly adept at generating outlier data, which means synthetic datasets often fall short of real-world data. Over-dependence on imprecise synthetic data could lead to false insights that are costly in business, and potentially deadly in healthcare.
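A minimal sketch makes both points concrete: here "synthetic" values are sampled from a normal distribution fitted to the mean and standard deviation of a toy real dataset. The data values are invented for illustration, and real generators (e.g. GAN-based) are far more sophisticated, but the limitation is the same in kind: statistics the fit never captured, such as rare outliers, cannot reappear in the output.

```python
import random
import statistics

# Toy "real" data, e.g. transaction amounts (values invented for illustration).
real = [102.5, 98.7, 101.2, 99.9, 100.4, 97.8, 103.1, 100.0]

# Fit only two statistical properties of the real data...
mu = statistics.mean(real)
sigma = statistics.stdev(real)

# ...then generate synthetic records by sampling from a Gaussian
# with those properties.
random.seed(0)  # fixed seed so the sketch is reproducible
synthetic = [random.gauss(mu, sigma) for _ in range(1000)]

# The synthetic sample tracks the bulk statistics of the original...
assert abs(statistics.mean(synthetic) - mu) < 1.0

# ...but any structure the summary statistics never captured (skew,
# rare outliers, correlations with other fields) is simply absent.
```

The same logic explains the bias problem discussed below: whatever the fitted statistics encode, including bias, is faithfully reproduced.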
Synthetic data is an effective strategy in use cases with a narrow focus. In situations involving a wide distribution of outcomes, however, it often proves limited in value. This is particularly problematic because many narrow cases have already been defined or studied, and wider-scope studies are now of greater interest and importance.
The generation of “new data” from the statistical properties of sensitive data can also be fraught with challenges. If the original dataset is significantly biased, that bias will likely be passed into the synthetic dataset. Addressing potential bias requires specialized knowledge of the context around the data, which reduces the practicality of synthetic data as a privacy-enhancing framework.
Furthermore, it has been shown possible to identify real people from information in a synthetic dataset, especially if the system used to generate it is flawed. This isn’t yet a widespread problem, but if synthetic data is more broadly adopted, reverse-engineering private data could become a more attractive option for wrongdoers.
In essence, synthetic data takes an imperfect approach to preserving privacy, resulting in limited actual utility.
Trusted Execution Environments
Trusted execution environments (TEEs) are hardware enclaves whose processing is isolated from the main computer, allowing sensitive information to be stored and computed on in a protected space.
TEEs are designed to protect both the data and the code running inside the environment. In data collaborations, TEEs can enable secure remote communications: encryption keys are stored, managed, and used only within the secure environment, which limits the possibility of eavesdropping.

Unfortunately, there are a number of issues associated with TEEs. Because these systems are mostly proprietary hardware assets, they do not readily support platform interoperability. This type of privacy-enhancing strategy can also be cumbersome, and using it can be like having a private sandbox on Mars: It’s a secure environment, but it’s difficult to get there.
TEEs are also not impervious to attack. A number of studies have revealed how cryptographic keys can be stolen, and side-channel attacks can be used to expose security vulnerabilities.
Because they are hardware-based, TEEs are not easily patched or updated; new hardware is required. Software, on the other hand, can be updated instantly over the internet, enabling security patches, bug fixes, and new functionality to be delivered in real time.
Finally, TEEs require data and algorithms to be physically aggregated on one machine or server. This is often impossible under data laws that keep data locked in place. Using TEEs for cross-border data collaboration could violate GDPR or data-residency requirements, leading to steep fines and reputational damage.
A More Flexible and Practical Privacy-Enhancing Strategy
Many of the most popular privacy-enhancing strategies are effective for certain use cases. However, each one has significant limitations and vulnerabilities. The TripleBlind Solution is an elegant and flexible approach to privacy enhancement that can augment or even replace the top strategies in use today.
Available via a simple API as a software-based solution, our technology improves the practical use of privacy-enhancing technologies and addresses a wide range of use cases. Offering true scalability and faster processing than other options, our technology can unlock the intellectual property value of data while protecting privacy and supporting regulatory compliance.
Please contact us today to learn more about our superior privacy-enhancing solution.