Curious how TripleBlind works? We recently hosted a TripleBlind Live webinar, with Chad Lagomarsino, David Almeida, and Alex Koszycki, who ran a live demo of TripleBlind — showing how to use privacy-enhancing computation in any of your data operations.
To set the stage for the demo, it’s important to know that TripleBlind has two major groups of users. The first is researchers (or data scientists), who are performing data operations in larger execution environments, with data beyond what they currently own. The second group is machine learning engineers, tasked by their company to produce models to gain better insight into their own data sets.
With this in mind, this demo is broken down into two parts:
- The first part covers the basic user interface, which is more accessible for our less technically advanced users. We show how a data owner, such as a health insurance payer (with claims data on patients with PII that can’t be shown to the public), can use TripleBlind for tasks such as a Blind Join. Blind Join allows for collaboration with other data sets and for viewership of statistical summaries. This allows data owners to present their data without exposing the raw data itself.
- In part two, we go in-depth on how to run a particular model you’ve architected and brought to TripleBlind. This is for users such as analytics firms, who might not have their own data set, but have a model they want to train. This section explains using the SDK and training a linear regression model, as well as some additional features.
Part 1 — TripleBlind for Data Owners
In part one of our webinar, you’ll learn how TripleBlind never sees, shares, or stores your data. You’ll also gain an understanding of how to present your data to other users without exposing raw data. We also explain how to edit data, show what data appears through mock data, and show how to manage access requests.
In this example, the data owner is a health insurance payer with claims data on patients, containing personally identifying information (PII) that can’t be shown to the general public. This type of user might be connected to hospital systems, such as the Mayo Clinic.
Here is a look at a sample data display:
Here, you can see how a user who is part of an example insurance organization accesses our web application, and how both parties work behind the firewall.
In this example, the insurance organization comes to TripleBlind with their own datasets, including claims info on patients. Legally, they cannot share this information due to HIPAA, CCPA, and other privacy regulations. The insurance organization must keep the data on-site, which is what drives them to connect with TripleBlind. We allow for analysis of their datasets without moving, sharing, or exposing private information.
Sharing data without exposing it
With TripleBlind, you’re not sharing the data. You’re only sharing the model output, so data collaboration without exposing sensitive information is possible. TripleBlind does not access raw data at any point. We simply generate a display of the data using mock data, showing you the “shape” of the data.
In other words, you get a visual and statistical representation of the dataset, portraying what that data looks like and mirroring the distribution of the data. This helps provide an idea of what might be in the data set, and what may be worth digging into a bit deeper.
From here, you can take a look at the Exploratory Data Analysis (EDA) report, allowing you to look for anomalies and patterns, as well as generate hypotheses.
In this overview section, the application gives you a quick sense of the overall statistics of this data set. You can also look into more detail, such as opening an expanded histogram, and checking for outliers (for example, a larger portion of younger people might skew the data set).
Here, you can see a correlation matrix for your data, highlighting which variables you may want to include or omit. This UI provides a quick way to get insight into how the data will be useful.
This also allows users to set privacy settings, hiding or displaying any columns your organization has given you the rights to. You can edit mock data, change its display, or change the types of masking.
Protecting data anonymity and IP in TripleBlind
Of course, you may wonder if features like unmasking mean that a user can re-identify the data. However, rest assured we’ve put safeguards in place to avoid this issue.
Safeguard #1: Not enough values
You won’t be able to run operations on a column in TripleBlind if it doesn’t have enough return values. Every operation has a minimum number of return values, in order to protect the anonymity of individuals. In other words, you can’t isolate an individual value, as long as you set your K-grouping to higher than 1.
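To make the idea concrete, here is a minimal sketch of a minimum-record guard on an aggregate. It is not TripleBlind’s implementation: the function `guarded_mean`, the exception `TooFewValuesError`, and the sample costs are all hypothetical names and values chosen for illustration.

```python
# Hypothetical sketch of a k-grouping guard; not TripleBlind's implementation.


class TooFewValuesError(Exception):
    """Raised when an operation would be computed over fewer than k records."""


def guarded_mean(values, k):
    """Return the mean of `values` only if at least k records contribute."""
    if k < 2:
        raise ValueError("k must be at least 2 to avoid isolating one record")
    if len(values) < k:
        raise TooFewValuesError(f"need at least {k} records, got {len(values)}")
    return sum(values) / len(values)


# Made-up claim costs for illustration.
claim_costs = [120.0, 340.0, 95.0, 410.0, 230.0]

# Five records contribute, so the aggregate is allowed when k = 5.
mean_cost = guarded_mean(claim_costs, k=5)

# A filter that narrows the data down to a single patient is rejected.
try:
    guarded_mean(claim_costs[:1], k=5)
    rejected = False
except TooFewValuesError:
    rejected = True
```

The guard refuses to answer rather than returning a partial result, which mirrors the behavior described above: the operation simply does not run.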
Safeguard #2: Sampling randomly
Another factor is that unmasking a field samples randomly from that field. This means you won’t be able to match the values line by line to the individual records they came from.
In the demo, we go over how you can access specific data assets, such as cigarette purchases in pharmacy data, gaining insight like price and brand without revealing the identity of those customers.
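As a rough illustration of why random sampling breaks the link to individual rows, the sketch below draws values from a single column without keeping row order or any identifier. The helper `sample_column` and the purchase records are hypothetical; the fixed `seed` is only there to make the example repeatable and would not be fixed in practice.

```python
import random


def sample_column(rows, column, n, seed=None):
    """Return n values drawn at random from one column, in random order."""
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    # rng.sample picks n distinct positions and returns them in random order,
    # so the result cannot be lined up with the original row order.
    return rng.sample(values, n)


# Made-up pharmacy purchases for illustration.
purchases = [
    {"customer_id": "c1", "brand": "A", "price": 8.50},
    {"customer_id": "c2", "brand": "B", "price": 9.25},
    {"customer_id": "c3", "brand": "A", "price": 7.80},
    {"customer_id": "c4", "brand": "C", "price": 10.10},
]

# Only prices come back; customer_id never leaves the function.
prices = sample_column(purchases, "price", n=3, seed=0)
```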
Safeguard #3: Blind Joins
With a Blind Join — which we cover in more detail in the webinar — you can connect multiple data sets to see where these individuals appear in other sections of your data. If the analytics firm hasn’t given you access to the data you need, you can easily submit an access request.
For every new data set, the data is encrypted at runtime, and access must be approved for each organization that wants it.
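The webinar does not spell out how a Blind Join works internally, so the following is only a simplified stand-in: each side blinds its identifiers with a keyed hash (HMAC-SHA256) and only the blinded values are compared. The function names, the identifiers, and the shared secret are assumptions for illustration. Production private-join protocols give stronger guarantees than this sketch.

```python
import hashlib
import hmac


def blind_key(identifier, shared_secret):
    """Replace an identifier with a keyed hash so the raw value is not exchanged."""
    return hmac.new(shared_secret, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def overlap_count(ids_a, ids_b, shared_secret):
    """Count identifiers present on both sides by comparing blinded keys only."""
    blinded_a = {blind_key(i, shared_secret) for i in ids_a}
    blinded_b = {blind_key(i, shared_secret) for i in ids_b}
    return len(blinded_a & blinded_b)


# Made-up identifiers and a demo-only secret (hypothetical values).
payer_ids = ["p-001", "p-002", "p-003"]
pharmacy_ids = ["p-002", "p-003", "p-004"]
secret = b"demo-only-secret"

shared_patients = overlap_count(payer_ids, pharmacy_ids, secret)
```

Only the size of the overlap is returned here, in keeping with the idea that you learn where records line up without seeing the other party’s raw rows.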
Safeguard #4: SMPC
Requests for operations, such as a SQL query, go through a TripleBlind security protocol called SMPC (Secure Multiparty Compute), so both parties have some information, but neither party has enough information to reconstruct the data being sent over. It uses data from both machines to return an aggregated output.
TripleBlind uses a variety of techniques to enable this SMPC functionality – some are open source, but some are proprietary methods.
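Since parts of TripleBlind’s protocol are proprietary, we can’t show it here. As a general illustration of how two parties can compute an aggregate while neither holds enough to reconstruct the other’s input, here is a sketch of textbook additive secret sharing. The modulus, function names, and totals are assumptions for this example only.

```python
import secrets

# Arithmetic is done modulo a large prime so single shares look random.
MODULUS = 2**61 - 1


def split_into_shares(value, n_parties):
    """Split `value` into n random-looking shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine shares; any incomplete subset reveals nothing about the value."""
    return sum(shares) % MODULUS


# Made-up private totals held by two organizations.
payer_total = 1200
pharmacy_total = 345

payer_shares = split_into_shares(payer_total, 2)
pharmacy_shares = split_into_shares(pharmacy_total, 2)

# Party i holds one share of each input and adds them locally.
partial_sums = [(payer_shares[i] + pharmacy_shares[i]) % MODULUS for i in range(2)]

# Only the combined partial sums reveal the aggregate, never the inputs.
combined_total = reconstruct(partial_sums)
```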
Safeguard #5: Blind Learning
Models also use Blind Learning, so if you’re using a layered model for instance, you’re able to split that model into pieces, and run different parts of the model on the data owner’s and data user’s machines. Data owners will never be able to see the full model, which protects the IP of the analytics firm.
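The sketch below shows the general shape of a split model, not Blind Learning itself: the first layer runs where the data lives, and only its activations are passed to the second piece, which stays with the model owner. All function names, weights, and feature values are hypothetical, and the sketch omits any protection of the intermediate activations.

```python
def owner_side_layer(features, weights, biases):
    """First model piece, run on the data owner's machine (ReLU layer)."""
    activations = []
    for row, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(row, features)) + bias
        activations.append(max(0.0, total))
    return activations


def user_side_layer(activations, weights, bias):
    """Second model piece, run on the data user's machine (linear output)."""
    return sum(w * a for w, a in zip(weights, activations)) + bias


# Raw features never leave the owner's side (made-up values).
patient_features = [2.0, 1.0]
owner_weights = [[1.0, -1.0], [0.5, 0.5]]
owner_biases = [0.0, -2.0]

# The output-layer weights never leave the user's side (made-up values).
user_weights = [3.0, 4.0]
user_bias = 0.5

# Only the intermediate activations cross between the two machines.
hidden = owner_side_layer(patient_features, owner_weights, owner_biases)
prediction = user_side_layer(hidden, user_weights, user_bias)
```

Because each side holds only its own piece, the data owner in this sketch cannot assemble the full model from what it runs locally.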
With TripleBlind, your data is always protected at rest. Data or models will always stay behind the firewall of the organization that owns that data or model. You will have the same security features that your organization uses.
Part 2 — TripleBlind for model owners
Say you come to TripleBlind with a model you’ve built. You want to train it and run inferences on data that you don’t own. With TripleBlind you can run your training and inference steps on that analytics model, without needing to go through the process of putting that data elsewhere, storing it somewhere, or going through the legal process of access requests.
You now have an easy way to train your model.
A data user in this instance might be a contracted health analytics firm, perhaps connected to a pharmacy like Walgreens or CVS.
In this example, the user has been tasked with creating a linear regression model that predicts cost values for patients in a health insurance claims table. They can run a script on the mock data to visualize it.
This side of TripleBlind allows greater customization via the SDK, such as integrating the data into the modeling pipeline. (It’s important to note that TripleBlind uses a combination of raw and mock data, depending on what’s appropriate for a particular use case).
You can import your code as a package or wrapper, to wrap around existing functions in Python or R (you can also import different model formats such as MML or ONNX). With TripleBlind’s scripts, you can pull data tables, retrieve mock data, and pull a sample of that data to save locally on your machine.
From here, you can run your standard Python pipeline on this mock data, including visualizations with Matplotlib, Seaborn, etc. When reviewing the output, you can see which variables are categorical, supporting the development of your regression model and allowing you to see key information that indicates what you might need to encode.
TripleBlind’s auto-encoder also allows you to turn categorical variables into continuous variables for your regression model.
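The demo does not detail how TripleBlind’s encoder works, so as a stand-in here is one-hot encoding, a common way to turn a categorical column into numeric columns a regression can use. The `one_hot` helper and the sample plan types are hypothetical.

```python
def one_hot(values):
    """Map each category to a numeric indicator vector.

    Returns the sorted category names and one row of 0.0/1.0 flags per value.
    """
    categories = sorted(set(values))
    encoded = [[1.0 if value == c else 0.0 for c in categories] for value in values]
    return categories, encoded


# Made-up categorical column from mock claims data.
plan_types = ["gold", "silver", "gold", "bronze"]

categories, encoded = one_hot(plan_types)
```

Each row in `encoded` lines up with `categories`, so the first "gold" record becomes a row with a single 1.0 in the "gold" position.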
TripleBlind gives you the flexibility to incorporate the tool into your pipeline as you need it.
Run operations without exposing the raw data
You can also run operations like SQL queries on the raw data without exposing it: you just get the output. (We go over this in more detail in the demo to show what this process looks like.)
Remember, you can’t use this to obtain individual rows: if the K-grouping threshold is not met, the operation will fail. This protects the anonymity of the specific rows in any query.
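For a SQL flavor of the same rule, here is a sketch using Python’s built-in `sqlite3` module. It runs a grouped query and fails the whole operation if any group has fewer than k rows. The table, column names, `KGroupingError`, and `run_grouped_query` are assumptions for illustration, not TripleBlind’s query interface.

```python
import sqlite3


class KGroupingError(Exception):
    """Raised when a query result would include a group smaller than k."""


def run_grouped_query(conn, k):
    """Return the mean cost per region, failing if any group has fewer than k rows."""
    rows = conn.execute(
        "SELECT region, COUNT(*), AVG(cost) FROM claims "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    if any(count < k for _, count, _ in rows):
        raise KGroupingError(f"a group has fewer than {k} rows")
    # Only aggregates are returned, never individual rows.
    return [(region, mean_cost) for region, _, mean_cost in rows]


# In-memory table with made-up claims.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (region TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?)",
    [("East", 100.0), ("East", 200.0), ("East", 300.0), ("West", 400.0), ("West", 600.0)],
)

# Every group has at least 2 rows, so the query succeeds with k = 2.
summary = run_grouped_query(conn, k=2)

# With k = 3 the two-row West group is too small, so the operation fails.
try:
    run_grouped_query(conn, k=3)
    failed = False
except KGroupingError:
    failed = True
```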
TripleBlind offers a robust setup for customizing access rights, so you can set up an agreement for multiple discrete access requests, rather than needing to request access each time you need training data for a model. These agreements might include limits by time or by overall usage of the data, for instance.
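One way to picture an agreement with limits by time and by usage is a small record that is checked on every request. This is a sketch under assumptions: `AccessAgreement`, its fields, and the `authorize` method are hypothetical and do not reflect how TripleBlind stores or enforces agreements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AccessAgreement:
    """A standing permission limited by an expiry time and a usage count."""

    expires_at: datetime
    max_uses: int
    uses: int = 0

    def authorize(self, now):
        """Approve a request if the agreement is unexpired and has uses left."""
        if now >= self.expires_at or self.uses >= self.max_uses:
            return False
        self.uses += 1
        return True


start = datetime(2024, 1, 1)
agreement = AccessAgreement(expires_at=start + timedelta(days=30), max_uses=2)

first = agreement.authorize(start)                      # within both limits
second = agreement.authorize(start + timedelta(days=1)) # uses the last allowance
third = agreement.authorize(start + timedelta(days=2))  # usage limit reached

# A fresh agreement is still refused once its time limit has passed.
expired = AccessAgreement(expires_at=start, max_uses=5).authorize(start + timedelta(days=1))
```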
Once you’ve set up the data you need, you can train your linear regression model and save a local copy.
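To show what training a linear regression and saving a local copy can look like in plain Python, here is a small sketch that fits a one-variable least-squares line and writes it to a JSON file. It does not use the TripleBlind SDK; the function names, the file name `cost_model.json`, and the age and cost values are made up for illustration.

```python
import json
import os
import tempfile


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Apply a saved model (a dict with slope and intercept) to one input."""
    return model["slope"] * x + model["intercept"]


# Made-up training data: patient age and claim cost.
ages = [20, 30, 40, 50]
costs = [1000.0, 1500.0, 2000.0, 2500.0]

slope, intercept = fit_line(ages, costs)
model = {"slope": slope, "intercept": intercept}

# Save a local copy of the trained model.
model_path = os.path.join(tempfile.mkdtemp(), "cost_model.json")
with open(model_path, "w", encoding="utf-8") as handle:
    json.dump(model, handle)

# Reload the file later and use it without any other dependency.
with open(model_path, encoding="utf-8") as handle:
    reloaded = json.load(handle)

predicted_cost = predict(reloaded, 60)
```

Storing the coefficients in a plain file is what makes the model portable: anything that can read the file can run predictions.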
Moving outside of TripleBlind if needed
After you’ve trained the model, you can take it outside TripleBlind and use it wherever you want (even if you were to decide to stop using TripleBlind).
From that point, you can run test data to calculate the expected output variables. Even if you don’t own the test data or training data, you can do a remote inference, using test data owned by another organization — without revealing any of the raw data of those patients, or moving it anywhere.
This means you can immediately use these inference results, for example in population health studies, or even sell them back to the insurance company.
Watch the full TripleBlind demo webinar
To get a closer look at how these processes work, be sure to watch the full TripleBlind demo webinar.