How TripleBlind’s Data Privacy Solution Compares to Tokenization, Masking and Hashing

Tokenization is the process of turning a piece of data, such as an account number, into a random string of characters called a token that has no meaningful value if breached. Tokens serve as a reference to the original data, but cannot be used to guess those values. 

Its use is gaining popularity, especially in the financial services industry. However, there are several limitations to this approach to data sharing compared to TripleBlind. 

When you tokenize a particular data element, you lose the ability to compute on that data element. Say you tokenize a social security number: aggregation and dataset joining tasks become much more difficult, because the same social security number can be stored as different data types in different datasets, resulting in different token values.
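To make the joining problem concrete, here is a minimal sketch of a toy token vault. The `TokenVault` class and the sample number are hypothetical and exist only for illustration; real tokenization products work differently in detail. It shows one identifier, stored three different ways, ending up with three unrelated tokens.

```python
import secrets


class TokenVault:
    """Toy in-memory token vault (hypothetical, for illustration only)."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value):
        # Issue a new random token only the first time this exact value is seen.
        if value not in self._token_by_value:
            token = secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token):
        # Only the vault can map a token back to the original value.
        return self._value_by_token[token]


vault = TokenVault()

# The same sample social security number, stored three ways.
as_formatted_text = "078-05-1120"
as_plain_text = "078051120"
as_integer = 78051120  # the leading zero is lost when stored as a number

# Each representation is a different value to the vault, so each gets its own token.
tokens = {vault.tokenize(v) for v in (as_formatted_text, as_plain_text, as_integer)}

# Only an exact match on value and type yields the same token again.
same_value_tokens = {vault.tokenize(as_formatted_text), vault.tokenize("078-05-1120")}
```

A join on the tokenized column would treat these three records as three different people unless every dataset was normalized to one representation before tokenizing.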

However, with TripleBlind, your end result has higher accuracy with 100% data fidelity because all elements in the data are used for computation. Nothing is hidden, removed, or replaced. The data is used as-is while in complete compliance with the strictest regulations (such as GDPR, CCPA, and HIPAA). 

Let’s say you try a different but similar approach: masking or hashing. Masking has various approaches ranging from simple to complex. A simple method is to replace the real data with null or constant values. A slightly more sophisticated approach is to mask the data so that it retains the format of the original data and preserves its analytical value. Format-preserving masking keeps the data usable, but there are risks of reidentification.
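The two ends of that range can be sketched as follows. The function names `mask_constant` and `mask_preserve_format` are hypothetical, and the seeded substitution is just one simple way to preserve format, not a recommended or production masking scheme.

```python
import random
import string


def mask_constant(value, fill="X"):
    """Simple masking: replace every character with a constant."""
    return fill * len(value)


def mask_preserve_format(value, seed=0):
    """Format-preserving masking: digits stay digits, letters stay letters,
    and punctuation is kept in place."""
    # Seeding with the value makes the mapping repeatable for the same input.
    rng = random.Random(f"{seed}:{value}")
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)
    return "".join(out)


phone = "555-0142"
constant_masked = mask_constant(phone)        # all structure is gone
format_masked = mask_preserve_format(phone)   # still looks like a phone number
```

Because this sketch maps the same input to the same output every time, repeated values remain linkable across records, which is one source of the reidentification risk mentioned above.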

A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table.
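As a small illustration of the fixed-size property, the sketch below uses SHA-256 from Python's standard `hashlib` module. SHA-256 is only one example of a hash function, chosen here as an assumption; the inputs are made up.

```python
import hashlib

# Inputs of very different sizes.
inputs = ["M", "078-05-1120", "a much longer free-text clinical note " * 50]

# Every digest has the same length regardless of the input size.
digests = [hashlib.sha256(text.encode("utf-8")).hexdigest() for text in inputs]
digest_lengths = {len(d) for d in digests}  # SHA-256 hex digests are 64 characters
```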

Masking or hashing a medical data element like male or female isn’t that helpful, because every instance of “male” will mask/hash to the same value, and every instance of “female” will mask/hash to the same value. With so few possible values, the originals are easy to work out. Therefore, you must remove the 18 HIPAA identifiers from the datasets entirely before they are used.
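The weakness with low-cardinality fields can be shown in a few lines. This is a minimal sketch using an unsalted SHA-256 hash as an assumed example; the records are made up and `sha256_hex` is a hypothetical helper.

```python
import hashlib


def sha256_hex(text):
    """Hash a string with unsalted SHA-256 and return the hex digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# A made-up column with only two possible values.
records = ["male", "female", "female", "male", "female"]
hashed_column = [sha256_hex(r) for r in records]

# Anyone who can guess the possible values can hash them and build a lookup table.
candidates = ["male", "female"]
lookup = {sha256_hex(c): c for c in candidates}

# The "protected" column is fully recovered.
recovered = [lookup[h] for h in hashed_column]
```

The hashed column contains only two distinct digests, so hashing hides nothing once the small set of possible inputs is known.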

TripleBlind’s innovative solution allows all those HIPAA identifiers to remain in the dataset with a 0% chance of the data being reidentified at any point. These identifiers include important information for medical insights, such as biometric identifiers or facial images.

 

HIPAA Identifiers

1. Name
2. Address
3. Significant Dates
4. Phone Numbers
5. Fax Numbers
6. Email Address
7. Social Security Number
8. Medical Record Number
9. Health Plan Beneficiary Number
10. Account Number
11. Certificate or License Number
12. Vehicle Identifiers
13. Device Identifiers
14. Web URL
15. IP Address
16. Finger or Voice Print
17. Photographic Images
18. Other Characteristics that Could Uniquely Identify an Individual

 

Tokenization only works for tabular and columnar data, so most organizations end up combining different approaches like masking and tokenization to get the maximum value out of their data. It doesn’t have to be this way. Our solution is a one-size-fits-all approach.

To find out how TripleBlind works for your business, schedule a call or reach out for a free demo at contact@tripleblind.ai.

To learn more about how TripleBlind compares to other competitors and methods of data collaboration, follow us on LinkedIn and Twitter to be notified when we post the next installment in our Competitor Blog Series. Check out our previous blogs here!