Tokenization is the process of turning a piece of data, such as an account number, into a random string of characters called a token that has no meaningful value if breached. Tokens serve as a reference to the original data, but cannot be used to guess those values.
Its use is gaining popularity, especially in the financial services industry. However, there are several limitations to this approach to data sharing compared to TripleBlind.
When you tokenize a particular data element, you lose the ability to compute on that data element. Let’s say you’re tokenizing a Social Security number: aggregation and dataset joining tasks become much more difficult, because the same Social Security number can be stored in different formats or data types in different datasets, resulting in different token values.
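To make the joining problem concrete, here is a minimal sketch in Python. The `ToyTokenVault` class is hypothetical and written only for illustration; it is not how any particular tokenization product works. It assumes a vault that issues one random token per exact input value, which is enough to show why the same Social Security number stored three ways no longer lines up.

```python
import secrets


class ToyTokenVault:
    """Hypothetical in-memory vault: one random token per exact input value."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value):
        # The data type is part of the key, so 123456789 and "123456789" differ.
        key = (type(value).__name__, value)
        if key not in self._token_by_value:
            token = secrets.token_hex(8)  # random, carries no information
            self._token_by_value[key] = token
            self._value_by_token[token] = value
        return self._token_by_value[key]

    def detokenize(self, token):
        return self._value_by_token[token]


vault = ToyTokenVault()

# The same (sample) Social Security number as three datasets might store it.
token_a = vault.tokenize("123-45-6789")  # text with dashes
token_b = vault.tokenize("123456789")    # text without dashes
token_c = vault.tokenize(123456789)      # integer column

# A join on the token column finds no match across the three datasets.
tokens_match = token_a == token_b == token_c
print(tokens_match)
```

In this sketch, joining on the token only works if every dataset normalizes the value to one format and type before tokenizing, which is extra coordination the parties have to agree on up front.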
However, with TripleBlind, your end result has higher accuracy with 100% data fidelity because all elements in the data are used for computation. Nothing is hidden, removed, or replaced. The data is used as-is while in complete compliance with the strictest regulations (such as GDPR, CCPA, and HIPAA).
Let’s say you try a different but similar approach: masking or hashing. Masking ranges from simple to complex. A simple method is to replace the real data with null or constant values. A slightly more sophisticated approach masks the data while retaining enough of its original characteristics to preserve its analytical value. This kind of masking typically preserves the format, but it carries a risk of reidentification.
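The two ends of that range can be sketched in a few lines of Python. Both function names are hypothetical, and keeping the last four digits visible is an assumption chosen for illustration, not a rule from any standard.

```python
def mask_constant(value, placeholder="REDACTED"):
    """Simple masking: every value is replaced by the same constant."""
    return placeholder


def mask_preserving_format(value, visible=4, mask_char="X"):
    """Hide all but the last `visible` digits and keep separators in place."""
    total_digits = sum(ch.isdigit() for ch in value)
    to_hide = max(total_digits - visible, 0)
    masked_chars = []
    for ch in value:
        if ch.isdigit() and to_hide > 0:
            masked_chars.append(mask_char)
            to_hide -= 1
        else:
            masked_chars.append(ch)  # separators and trailing digits survive
    return "".join(masked_chars)


simple = mask_constant("123-45-6789")
partial = mask_preserving_format("123-45-6789")
print(simple)   # REDACTED
print(partial)  # XXX-XX-6789
```

The constant mask removes all analytical value, while the format-preserving mask keeps some of it. The digits left visible are exactly what creates the reidentification risk, because they can be combined with other fields to narrow down an individual.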
A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table.
Masking or hashing a medical data element such as “male” or “female” isn’t that helpful, because every instance of “male” will mask or hash to the same value, and every instance of “female” will mask or hash to the same value. Therefore, you must remove the 18 HIPAA identifiers from the datasets entirely before they are used.
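A short Python sketch shows why hashing a field with only a few possible values protects very little. It assumes an unsalted SHA-256 hash from the standard library's `hashlib`; the `hash_value` helper and the sample records are made up for illustration.

```python
import hashlib


def hash_value(value):
    """Map text of any length to a fixed-size SHA-256 hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


records = ["male", "female", "male", "female", "male"]
hashed = [hash_value(r) for r in records]

# Five records, but only two distinct hashes: equal inputs give equal outputs.
distinct_hashes = set(hashed)
print(len(distinct_hashes))  # 2

# Anyone who can guess the candidate values can rebuild the mapping.
lookup = {hash_value(candidate): candidate for candidate in ("male", "female")}
recovered = [lookup[h] for h in hashed]
print(recovered == records)  # True
```

Because the set of possible inputs is tiny, hashing every candidate and comparing digests recovers the original column in full.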
TripleBlind’s innovative solution allows all those HIPAA identifiers to remain in the dataset with a 0% chance of the data being reidentified at any point. These identifiers include important information for medical insights, such as biometric identifiers or facial images.
| HIPAA identifiers 1-9 | HIPAA identifiers 10-18 |
|---|---|
| 1. Name | 10. Account Number |
| 2. Address | 11. Certificate or License Number |
| 3. Significant Dates | 12. Vehicle Identifiers |
| 4. Phone Numbers | 13. Device Identifiers |
| 5. Fax Numbers | 14. Web URL |
| 6. Email Address | 15. IP Address |
| 7. Social Security Number | 16. Finger or Voice Print |
| 8. Medical Record Number | 17. Photographic Images |
| 9. Health Plan Beneficiary Number | 18. Other Characteristics that Could Uniquely Identify an Individual |
Tokenization only works for tabular and columnar data, so most organizations end up combining different approaches, such as masking and tokenization, to get the maximum value out of their data. It doesn’t have to be this way: our solution is one-size-fits-all.
To find out how TripleBlind works for your business, schedule a call or reach out for a free demo at firstname.lastname@example.org.
To learn more about how TripleBlind compares to competitors and other methods of data collaboration, follow us on LinkedIn and Twitter to be notified when we post the next installment in our Competitor Blog Series.