Overcoming Privacy Hurdles: Approaches for Challenges in Data Sharing

Fatima Khan, Okta’s Director, Corporate Counsel, Product and Privacy; Elizabeth Harding, Privacy Attorney at the law firm Polsinelli; and Riddhiman Das, TripleBlind’s Co-founder and CEO, explored current privacy regulations, how those regulations affect companies in the market today, and what solutions are available to overcome regulatory hurdles.

How TripleBlind’s Data Privacy Solution Compares to Differential Privacy

Differential privacy is not a specific process like de-identification, but a property that a process can have. For example, it is possible to prove that a specific algorithm “satisfies” differential privacy. Informally, differential privacy guarantees the following for each individual who contributes data for analysis: the output of a differentially private analysis will be roughly the same, whether or not you contribute your data.
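
For reference, this is the standard formal definition behind the informal guarantee: a randomized mechanism M satisfies ε-differential privacy if, for every pair of datasets D and D′ that differ in a single individual’s record, and for every set of possible outputs S,

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]
```

A smaller ε forces the two output distributions closer together, so any one person’s data has less influence on the result.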

When computing on data, a differentially private process adds random (stochastic) noise that masks the actual values, either to each data element or to the results computed from them. Stochastic refers to a process whose outcome involves randomness and therefore some uncertainty. This noise can significantly degrade accuracy, whereas TripleBlind’s one-way encryption algorithms don’t add any noise to the dataset that would impair results.
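
To make the noise concrete, here is a minimal sketch of the Laplace mechanism, one standard way to satisfy differential privacy, applied to a counting query. The patient data is hypothetical and the code uses only Python’s standard library; it illustrates the general technique, not TripleBlind’s or any vendor’s implementation.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with mean `scale`
    is Laplace-distributed with location 0 and the given scale.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records: list[bool], epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    Adding or removing one person's record changes a count by at most 1
    (sensitivity = 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(records)
    return true_count + laplace_noise(1 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only so the demo is repeatable
    # Hypothetical data: whether each of 1,000 patients has a given condition.
    patients = [i % 10 == 0 for i in range(1000)]  # true count is 100
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: noisy count = {private_count(patients, eps, rng):.1f}")
```

Smaller values of epsilon mean stronger privacy but larger noise, which is the accuracy cost described in the paragraph above.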

Differential privacy is suitable for situations with a higher tolerance for error. Apple’s keyboard suggestions are one example: Apple doesn’t need to know exactly what you’re typing, but it does need to know in general what people are typing in order to offer reasonable suggestions.

Apple itself sets a strict limit on the number of contributions it collects from each user in order to preserve their privacy. The reason is that the random noise used in differential privacy is unbiased (zero-mean), so it tends to average out over many contributions; given enough observations from a single user, it becomes theoretically possible to infer information about that user’s activity. It’s important to note that Apple doesn’t associate any identifiers with information collected using differential privacy.
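
The averaging effect is easy to demonstrate. The sketch below assumes, for illustration only, that a single user reports the same hypothetical value many times, with each report protected independently by zero-mean Laplace noise (Apple’s actual mechanisms differ). Any one report is noisy, but the mean of many reports converges on the true value, which is why contributions per user must be capped.

```python
import random
import statistics


def noisy_report(true_value: float, epsilon: float, rng: random.Random) -> float:
    """One locally privatized report: the true value plus Laplace(0, 1/epsilon) noise."""
    scale = 1 / epsilon
    return true_value + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def average_of_reports(true_value: float, epsilon: float, n_reports: int, seed: int = 7) -> float:
    """Average n independent noisy reports of the same value from one user."""
    rng = random.Random(seed)
    return statistics.fmean(noisy_report(true_value, epsilon, rng) for _ in range(n_reports))


if __name__ == "__main__":
    secret = 3.0  # hypothetical per-user value, e.g. how often a word was typed
    for n in (1, 10, 1_000, 100_000):
        estimate = average_of_reports(secret, epsilon=0.5, n_reports=n)
        print(f"{n:>7} reports -> estimate {estimate:.3f}")
```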

Most of the other approaches to data collaboration we’ve covered, including homomorphic encryption, secure enclaves, and tokenization, only work for tabular/columnar data. These approaches face severe challenges when it comes to producing high-performance, accurate models on complex datasets such as X-ray images. TripleBlind solves this problem: the images remain encrypted, in compliance with HIPAA regulations.

TripleBlind allows data from outside sources to be used within our private infrastructure to compute and develop accurate diagnostics. Our Blind AI Pipeline ensures that the original data cannot be reverse engineered and that processing remains compliant with HIPAA regulations.

If you’re interested in learning more about how you can share data safely and efficiently, please email contact@tripleblind.ai for a free demo. Don’t forget to follow TripleBlind on Twitter and LinkedIn for our latest updates.

This is the final post in our Competitor Blog Series, in which we compared TripleBlind’s technology to other approaches to data collaboration. If you missed the other posts, you can check them out below!

Read other blogs in this series:

Business Agreements
Homomorphic Encryption
Synthetic Data
Blockchain
Tokenization, Masking and Hashing
Federated Learning

Money20/20 Europe, September 21-23, 2021, RAI Amsterdam

TripleBlind Co-founder and CEO Riddhiman Das will participate in a Money20/20 Europe challenge event titled “The Future Is Trustless. What Do We Need Instead?” on Wednesday, September 22, from 9:00 to 10:00 a.m. TripleBlind will also exhibit at booth C56. Learn more about Money20/20 here, or click here to set up a meeting or demo with TripleBlind.

How TripleBlind’s Data Privacy Solution Compares to Tokenization, Masking and Hashing

Tokenization is the process of turning a piece of data, such as an account number, into a random string of characters called a token that has no meaningful value if breached. Tokens serve as a reference to the original data, but cannot be used to guess those values. 
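
A minimal sketch of vault-based tokenization, one common way tokenization is implemented, is shown below. The TokenVault class is hypothetical and uses only Python’s secrets module to generate random tokens; production systems add access control, persistence, and often format-preserving tokens.

```python
import secrets


class TokenVault:
    """Minimal vault-based tokenization (hypothetical, for illustration).

    Each sensitive value is replaced by a random token. The token has no
    mathematical relationship to the value, so it reveals nothing if breached;
    only the vault's lookup table can map it back.
    """

    def __init__(self) -> None:
        self._value_to_token: dict[str, str] = {}
        self._token_to_value: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to one token.
        if value not in self._value_to_token:
            token = secrets.token_hex(8)
            while token in self._token_to_value:  # avoid the (unlikely) collision
                token = secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]


if __name__ == "__main__":
    vault = TokenVault()
    token = vault.tokenize("4111-1111-1111-1111")  # hypothetical account number
    print(token)                    # 16 random hex characters
    print(vault.detokenize(token))  # original value, recoverable only via the vault
```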

Tokenization is gaining popularity, especially in the financial services industry. However, compared to TripleBlind, this approach to data sharing has several limitations.

When you tokenize a data element, you lose the ability to compute on it. Suppose you tokenize Social Security numbers: aggregation and dataset joins become much more difficult, because the same Social Security number can be stored in different formats or data types in different datasets (for example, as the string “123-45-6789” in one and the integer 123456789 in another), resulting in different token values.
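
The join problem can be reproduced with deterministic tokenization, sketched here with an HMAC-SHA256 keyed hash from Python’s standard library. The key and the two datasets are hypothetical. Because the SSN is a formatted string in one dataset and an integer in the other, the same person receives two unrelated tokens and the join finds no match.

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical-tokenization-key"  # assumption: shared by both datasets


def deterministic_token(value) -> str:
    """Map a value to a token with HMAC-SHA256 over its string form."""
    return hmac.new(SECRET_KEY, str(value).encode(), hashlib.sha256).hexdigest()[:16]


# Hypothetical datasets holding the same SSN in different data types.
claims = [{"ssn": "123-45-6789", "claim_amount": 1200}]
billing = [{"ssn": 123456789, "balance": 300}]

claims_tokens = {deterministic_token(row["ssn"]) for row in claims}
billing_tokens = {deterministic_token(row["ssn"]) for row in billing}

# The tokenized join keys do not overlap, so the records cannot be linked.
print(claims_tokens & billing_tokens)  # set()
```

Normalizing values before tokenizing (for example, stripping dashes and converting to a string) avoids this particular mismatch, but only if every data owner applies exactly the same normalization.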

However, with TripleBlind, your end result has higher accuracy with 100% data fidelity because all elements in the data are used for computation. Nothing is hidden, removed, or replaced. The data is used as-is while in complete compliance with the strictest regulations (such as GDPR, CCPA, and HIPAA). 

Suppose you try a different but similar approach: masking or hashing. Masking techniques range from simple to complex. A simple method is to replace the real data with null or constant values. A more sophisticated approach masks the data while retaining characteristics of the original data, preserving its analytical value. Masking often preserves the format of the data, but it carries a risk of reidentification.
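
The simple and format-preserving masking methods described above might look like the following minimal sketch. The function names are hypothetical and the SSN is made up.

```python
from typing import Optional


def mask_null(value: str) -> Optional[str]:
    """Simplest masking: drop the value entirely (the format is lost)."""
    return None


def mask_constant(value: str, placeholder: str = "REDACTED") -> str:
    """Replace every value with the same constant."""
    return placeholder


def mask_keep_last4(ssn: str) -> str:
    """Format-preserving masking: hide digits but keep separators and the last four digits."""
    total_digits = sum(ch.isdigit() for ch in ssn)
    digits_seen = 0
    masked = []
    for ch in ssn:
        if ch.isdigit():
            digits_seen += 1
            masked.append(ch if digits_seen > total_digits - 4 else "X")
        else:
            masked.append(ch)  # keep dashes so the output still looks like an SSN
    return "".join(masked)


if __name__ == "__main__":
    print(mask_null("123-45-6789"))        # None
    print(mask_constant("123-45-6789"))    # REDACTED
    print(mask_keep_last4("123-45-6789"))  # XXX-XX-6789
```

Keeping the last four digits preserves the format and some analytical value, but combined with other fields, such as a ZIP code or birth date, those digits can help reidentify a person.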

A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table.
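
The fixed-size property is easy to see with SHA-256 from Python’s hashlib: inputs of any length produce a 32-byte digest, and part of that digest can be reduced to a bucket index for a hash table. The table size and inputs here are arbitrary, hypothetical choices.

```python
import hashlib

TABLE_SIZE = 8  # hypothetical number of buckets in a small hash table


def digest_and_bucket(text: str, table_size: int = TABLE_SIZE) -> tuple[bytes, int]:
    """Hash `text` with SHA-256 and map the digest to a bucket index."""
    digest = hashlib.sha256(text.encode()).digest()
    # Interpret the first 8 bytes as an integer, reduced modulo the table size.
    bucket = int.from_bytes(digest[:8], "big") % table_size
    return digest, bucket


if __name__ == "__main__":
    for text in ["a", "patient record", "x" * 10_000]:
        digest, bucket = digest_and_bucket(text)
        # Every digest is 32 bytes, no matter how long the input is.
        print(f"input length {len(text):>5} -> digest length {len(digest)} bytes, bucket {bucket}")
```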

Masking or hashing a medical data element such as sex (male or female) isn’t very helpful, because every instance of “male” maps to the same value and every instance of “female” maps to another, so anyone who hashes the two possible inputs can reverse the mapping. Because these techniques offer so little protection, the 18 HIPAA identifiers must instead be removed from datasets entirely before they are used.
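
The weakness of hashing a low-cardinality field takes only a few lines to show. This sketch assumes the column was hashed with plain, unsalted SHA-256, and the data is hypothetical: an attacker hashes every possible value and looks the results up.

```python
import hashlib


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode()).hexdigest()


# Hypothetical "de-identified" column: sex values hashed with plain SHA-256.
hashed_column = [sha256_hex(v) for v in ["male", "female", "female", "male"]]

# An attacker only needs to hash the few possible inputs to build a lookup table.
lookup = {sha256_hex(candidate): candidate for candidate in ["male", "female"]}
recovered = [lookup[h] for h in hashed_column]
print(recovered)  # ['male', 'female', 'female', 'male']
```

A secret salt or key makes the lookup harder for outsiders, but anyone holding the key can reverse the mapping the same way, and equal values still produce equal hashes within the dataset.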

TripleBlind’s innovative solution allows all of those HIPAA identifiers to remain in the dataset with a 0% chance of the data being reidentified at any point. These identifiers can carry information that is important for medical insights, such as biometric identifiers or facial images.

HIPAA Identifiers

1. Name
2. Address
3. Significant Dates
4. Phone Numbers
5. Fax Numbers
6. Email Address
7. Social Security Number
8. Medical Record Number
9. Health Plan Beneficiary Number
10. Account Number
11. Certificate or License Number
12. Vehicle Identifiers
13. Device Identifiers
14. Web URL
15. IP Address
16. Finger or Voice Print
17. Photographic Images
18. Other Characteristics that Could Uniquely Identify an Individual

Because tokenization only works for tabular and columnar data, most organizations end up combining different approaches, such as masking and tokenization, to get the maximum value out of their data. It doesn’t have to be this way: our solution is one-size-fits-all.

To find out how TripleBlind works for your business, schedule a call or reach out for a free demo at contact@tripleblind.ai.

To learn more about how TripleBlind compares to other competitors and methods of data collaboration, follow us on LinkedIn and Twitter to be notified when we post the next installment in our Competitor Blog Series. Check out our previous blogs here!

How TripleBlind’s Data Privacy Solution Compares to Blockchain

Blockchain is a shared, immutable ledger that facilitates recording transactions and tracking assets in a business network. It is most commonly associated with cryptocurrency, where it serves as the record of transactions made in bitcoin or other cryptocurrencies, maintained across several computers linked in a peer-to-peer network.
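
The “immutable” part of the ledger comes from hash chaining: each block stores the hash of the block before it, so altering an earlier record breaks the link stored in the block that follows. The sketch below is a hypothetical, single-machine illustration using only Python’s standard library; it leaves out the peer-to-peer replication, consensus, and digital signatures a real blockchain relies on.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


@dataclass
class Block:
    index: int
    data: str
    prev_hash: str

    def digest(self) -> str:
        """SHA-256 over the block's contents, including the previous block's hash."""
        payload = json.dumps(
            {"index": self.index, "data": self.data, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()


def append(chain: list[Block], data: str) -> None:
    """Add a record that commits to the hash of the current last block."""
    prev_hash = chain[-1].digest() if chain else GENESIS_HASH
    chain.append(Block(len(chain), data, prev_hash))


def is_valid(chain: list[Block]) -> bool:
    """A chain is intact only if every block references its predecessor's current hash."""
    if chain and chain[0].prev_hash != GENESIS_HASH:
        return False
    return all(chain[i].prev_hash == chain[i - 1].digest() for i in range(1, len(chain)))


if __name__ == "__main__":
    ledger: list[Block] = []
    append(ledger, "Hospital A grants Lab B access to dataset 17")  # hypothetical events
    append(ledger, "Lab B runs query Q on dataset 17")
    print(is_valid(ledger))  # True

    ledger[0].data = "Hospital A grants Lab B access to dataset 18"  # rewrite history
    print(is_valid(ledger))  # False: block 1 no longer matches block 0's hash
```

This tamper-evidence is what makes blockchain useful as an audit trail. Note that in this sketch the records themselves are stored in plain text in every copy of the chain.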

Blockchain has its advantages. It’s a great way to keep an audit trail of who might have done what to your data, but it’s not a good solution for data sharing in the long run. With blockchain, the stored data can still be accessed by certain individuals who hold a private key.

With TripleBlind, all parties involved in data sharing always know what is being done to their data. We provide audit trails of all operations, and every party must give cryptographic consent to every operation performed. TripleBlind offers fine-grained control over data and algorithm interactions, managing attribute-level and record-level permissions on the data. This allows accurate cryptographic auditing of every data and algorithm interaction without anyone ever seeing the raw data.

Sharing data through a blockchain means it is, by design, replicated and visible across the network. Blockchains keep multiple tiers upstream and downstream transparent and highly visible: two words you don’t want associated with sensitive data.

Lastly, blockchain is not built for the future of data sharing: for this purpose it is costly, inefficient, and ineffective. Businesses need an approach to data sharing that won’t come undone and leave them scrambling for the next best solution.

TripleBlind is the future of data sharing and complies with the strictest of data privacy laws and regulations. It can be used around the globe, and our operations will automatically comply with local regulations such as GDPR since everything stays one-way encrypted during our process, and no one gets a copy of the raw data. 

To schedule a call or free demo to explore how TripleBlind can work for your business, please reach out to contact@tripleblind.ai. To keep up to date with our latest blogs, follow us on Twitter and LinkedIn!

Read other blogs in this series:

Business Agreements
Homomorphic Encryption
Synthetic Data
Tokenization, Masking and Hashing
Federated Learning
Differential Privacy

How TripleBlind Compares To Federated Learning

Federated Learning is a learning paradigm that allows multiple parties to collectively train a global model on their decentralized data without storing it centrally, and thus without transmitting it outside the owner’s infrastructure.

Google coined the term Federated Learning in 2016, and the company has since been at the forefront of AI training with this method. At a high level, Federated Learning goes through the following steps:

  • A central server chooses an algorithm or statistical model to be trained. The server transmits the model to several data providers, often referred to as clients (consumers, devices, companies, etc.);
  • Each client trains the model on their data locally and shares updates with the server;
  • The server receives model updates from all clients and aggregates them into a single global model. The most common aggregation approach is averaging.
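
The three steps above can be sketched as federated averaging (FedAvg), in which the server weights each client’s model by the number of local examples it trained on. This is a minimal, pure-Python illustration under assumed conditions: a one-dimensional linear model, three hypothetical clients with synthetic data, and no networking, client sampling, or secure aggregation.

```python
import random

Model = tuple[float, float]  # (weight, bias) of a 1-D linear model y = w*x + b


def local_train(model: Model, data: list[tuple[float, float]], lr: float = 0.01, epochs: int = 5) -> Model:
    """Client step: run gradient descent on local data only (mean squared error)."""
    w, b = model
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_averaging(global_model: Model, clients: list[list[tuple[float, float]]], rounds: int = 50) -> Model:
    """Server loop: send the model out, collect updates, average them weighted by client data size."""
    for _ in range(rounds):
        updates = [(local_train(global_model, data), len(data)) for data in clients]
        total = sum(size for _, size in updates)
        w = sum(model[0] * size for model, size in updates) / total
        b = sum(model[1] * size for model, size in updates) / total
        global_model = (w, b)
    return global_model


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical clients whose private data all follow y = 3x + 1 plus a little noise.
    clients = [
        [(x, 3 * x + 1 + rng.gauss(0, 0.1)) for x in [rng.uniform(-2, 2) for _ in range(size)]]
        for size in (20, 50, 100)
    ]
    print(federated_averaging((0.0, 0.0), clients, rounds=100))  # approaches (3.0, 1.0)
```

Only the (weight, bias) pair crosses the client boundary in each round; the raw (x, y) pairs never leave local_train.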

Federated Learning has the potential to benefit both the healthcare and financial markets by producing less biased models trained on large amounts of consumer information. In healthcare, models trained via Federated Learning can help diagnose rare diseases based on other patients’ data. In fintech, Federated Learning allows institutions to detect financial crime and risk factors within their collaboration network.

In Federated Learning, only the model updates learned from local data are sent back to the server; the actual data is never shared. The approach is meant to keep individual consumer data private. However, while Federated Learning allows for more privacy than has previously been possible with AI, it has drawbacks in terms of model privacy and the efficiency of collaboration.

Because Federated Learning requires each client to train the model on its entire dataset locally and to exchange model updates with the server in every round, it carries both a high computational load and a high communication overhead.
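
A rough, back-of-the-envelope calculation shows how the communication overhead adds up. The client count, model size, and number of rounds below are hypothetical, and the estimate ignores compression and protocol overhead; it simply assumes every client downloads the full model and uploads a full-size update in every round.

```python
def bytes_per_round(num_clients: int, num_params: int, bytes_per_param: int = 4) -> int:
    """Traffic for one round: each client downloads the global model and uploads a full update."""
    model_size = num_params * bytes_per_param
    return num_clients * model_size * 2  # download + upload


# Hypothetical deployment: 100 clients, a 10-million-parameter model stored as float32.
per_round = bytes_per_round(num_clients=100, num_params=10_000_000)
print(per_round / 1e9, "GB per round")               # 8.0 GB per round
print(per_round * 500 / 1e12, "TB over 500 rounds")  # 4.0 TB over 500 rounds
```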

When multiple parties collaborate through Federated Learning, the model through which the collaboration takes place is known to everyone involved. This makes it susceptible to several attacks, such as reconstruction attacks on the shared updates, that could lead to data leakage. It also puts the privacy of the model itself at risk.
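
To illustrate why a known model plus shared updates can leak data, consider the simplest case of what is usually called a gradient leakage (or gradient inversion) attack: a single linear neuron trained on one example. The sketch assumes a client sends raw gradients for a single private example; the features and weights are hypothetical. Real deployments batch examples and may add defenses such as secure aggregation, which make this exact recovery harder but do not rule out more elaborate attacks.

```python
def client_update(weights: list[float], bias: float, x: list[float], y: float) -> tuple[list[float], float]:
    """Gradients of the squared error (w·x + b - y)^2 for ONE private training example.

    In this sketch the client shares these raw gradients with the server.
    """
    z = sum(w_i * x_i for w_i, x_i in zip(weights, x)) + bias
    dz = 2 * (z - y)                  # dLoss/dz
    grad_w = [dz * x_i for x_i in x]  # dLoss/dw_i = dz * x_i
    grad_b = dz                       # dLoss/db  = dz
    return grad_w, grad_b


def reconstruct_input(grad_w: list[float], grad_b: float) -> list[float]:
    """Attacker's view: since grad_w = grad_b * x, the input is grad_w / grad_b (if grad_b != 0)."""
    return [g / grad_b for g in grad_w]


if __name__ == "__main__":
    private_x = [72.0, 1.0, 140.0]  # hypothetical patient features
    grad_w, grad_b = client_update(weights=[0.1, -0.2, 0.05], bias=0.3, x=private_x, y=1.0)
    print(reconstruct_input(grad_w, grad_b))  # recovers private_x up to floating-point rounding
```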

TripleBlind’s Blind Learning approach is superior to and more efficient than Federated Learning and offers a more secure and precise way to share data. With TripleBlind’s groundbreaking solution, de-identified data is shared through models in which TripleBlind and all other parties involved are blind to the model and the original data.

Reconstruction attack example: private data shared via TripleBlind’s solution remains private and de-identified in the event of a data breach.

Datasets are shared so that only information relevant to the collaboration is used, and only for its intended purpose. By preventing reconstruction attacks, TripleBlind ensures there is no risk of the data ever being re-identified, even in the event of a data breach.

We are comparing TripleBlind’s technology to other modes of data collaboration as part of our Competitor Blog Series. Stay up to date with TripleBlind on Twitter and LinkedIn to learn more. If you’re interested in learning how collaborating with TripleBlind’s patented solution can unlock your data safely and efficiently while preserving privacy, please email contact@tripleblind.ai for a free demo.

Read other blogs in this series:

Business Agreements
Homomorphic Encryption
Synthetic Data
Blockchain
Tokenization, Masking and Hashing
Differential Privacy