Privately Qualify, Disqualify, & Prepare Datasets for Analysis
TripleBlind’s Blind Data Tools provide data views that enable you to understand and prepare for the use of protected datasets without revealing any sensitive information. These tools give data users the ability to qualify or disqualify datasets, and prepare to operate on the full dataset in a privacy-preserving way.
Find what you’re looking for.
Exploratory Data Analysis
TripleBlind’s web interface allows data scientists to view valuable metadata about any dataset positioned by a data provider. Each field includes summary statistics like mean, minimum, maximum, count, and more.
TripleBlind’s Data Profile EDA report empowers you to determine if the data is relevant for your use case before having to formally negotiate terms of access with the data provider. With TripleBlind’s exploratory data analysis, data owners can provide descriptive statistics and characteristics without ever risking the privacy or security of the underlying dataset.
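As a rough illustration of the kind of per-field summary described above (count, minimum, maximum, mean), here is a minimal Python sketch. It is not TripleBlind's implementation or API; the function name `profile_field` and the sample `ages` values are hypothetical, and the sketch simply shows that such statistics describe a field without listing its individual records.

```python
import statistics


def profile_field(values):
    """Summarize one numeric field without exposing individual records.

    Hypothetical helper for illustration only; missing values are
    represented as None and are counted separately.
    """
    present = [v for v in values if v is not None]
    if not present:
        return {"count": 0, "missing": len(values)}
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
    }


# Hypothetical field values, invented for this example.
ages = [34, 51, None, 29, 46]
profile = profile_field(ages)
print(profile)
```

Only the aggregate dictionary would be shared with the data user in a setup like this; the list of raw values stays with the data owner.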
Hi! I’m Audrey Warters, Marketing Coordinator at TripleBlind, and today I want to talk about how to work with data the TripleBlind way.
One of the most common reactions we get when talking with customers is, “Okay I get it, your technology keeps all my data safe, but how are data engineers and data scientists supposed to understand datasets they can’t see?”
So if you’ve been wondering how to perform exploratory data analysis, or normalize two datasets with TripleBlind, you’re not alone.
In this video we’ll dive into the TripleBlind tools so we can see exactly how TripleBlind keeps the data usable without sacrificing privacy.
But first, it’s important to understand a few key things about TripleBlind.
To start, we never store or have access to your data. With TripleBlind you host an Access Point on your cloud or on-prem servers. Any data positioned on the Access Point is always yours, and it never leaves.
Next, anytime someone within or external to your organization attempts to run a process against your data, you’ll receive an Access Request and have the opportunity to accept or reject it.
And last, if you accept the Access Request, your data stays put. A one-way transformation is performed on the data, the process is run, and the output (and only the output) is delivered to the user who initiated the Process. Said differently, TripleBlind returns the result of an API call to the requestor, never the actual data used to calculate the result.
If you’d like to learn more about the details of this process, you can download our whitepaper by clicking the link in the description.
With that understanding, I’ll open the TripleBlind Web Interface and show you how it’s possible to work with data even though the raw records are never seen.
From my web browser, I’ll log in to tripleblind.app and once I see the welcome screen, I’m going to click “Dataset Explorer” in the left-side navigation.
Here I can see every Dataset my organization has given me access to, as well as any datasets that third parties have positioned with TripleBlind and made visible to “Public”.
Let’s imagine I want to perform an analysis or model training, and one of my variables to consider is Gender. It’s very common for different datasets from different organizations to format a gender field differently, due to different industry and regulatory standards.
Without more information this problem would be impossible to resolve. However, a user doesn’t need to see the actual detailed data in the fields to untangle the problem. What the user actually needs is “metadata” about that field. In this particular case, the user needs to know what the valid values are for that field.
So how do we understand the valid values in a field in TripleBlind? Glad you asked!
First I’m going to click into this dataset. From the detail screen I can see a few things like the name, description, owning organization, and a timestamp.
Below I can see a table of “Mock Data”.
The Mock Data table displays a 10-record set of representative data for the dataset. It is important to remember that this representative data is just that; it does not contain actual raw values.
Scrolling through the Mock data I can see there is a field labeled “Gender” and it appears the owner has chosen to store this as a string with each record containing the full text, either “male” or “female”.
Now let’s look at another dataset.
It appears that this owner has decided to label their field as “SEX” and while still stored as a string, they’ve elected to use the designation M for “male” and F for “female”.
This is handy to know if I need to write a query that combines this and the previous dataset.
I’m also curious about the summary descriptive statistics of this dataset, which are difficult to determine when I can only see 10 records.
Again, TripleBlind has your back. Simply click over to the Data Profile tab and you’ll find a robust overview of the dataset including metadata like summary statistics, variables, and correlations.
Let’s take a look at one more dataset, to demonstrate the importance of viewing Mock Data and the Data Profile.
Clicking through to the detail screen, I can immediately tell this must be a French company since their field names are clearly “en français”.
Here I can see “le genre” which Google Translate tells me is French for “gender”, and it appears they’ve formatted it using 1s and 0s.
It’s possible that this field is either an integer or a boolean, and I’m not certain which is correct.
So I’ll simply navigate to the FAQ tab and ask the data owner.
“Is ‘le genre’ a bool or int?”
They’ll be notified of my question and can choose to post the reply publicly, or directly to me.
Now that we understand the differences between these datasets, we can address them with TripleBlind’s data preprocessing tools and successfully perform any manner of analysis, algorithm or model training – but that’s for another video.
Hopefully now you see just how easy we’ve made it to get an understanding of datasets positioned with TripleBlind without direct access, and how this approach dramatically reduces the liability associated with handling sensitive data.
For questions or inquiries about TripleBlind go to our website at tripleblind.ai/contact, or email us at firstname.lastname@example.org
A privacy-preserving preview, displayed just so.
The Asset Overview page for a dataset displays a 10-record set of representative data for the dataset. This Mock Data table is a privacy-preserving representation of the underlying dataset, purposefully prepared by the data provider.
Fields containing sensitive data are masked with random characters or replaced with realistic, reader-friendly AI-generated random values.
For data that is not sensitive, the data provider has the ability to selectively unmask fields considered to be safe for display. The 10-record set of the unmasked field contains a random sample of values from the underlying dataset.
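The masking behavior described above can be pictured with a short Python sketch. This is an assumption-laden illustration, not TripleBlind's actual method: the helpers `mask_value` and `build_mock_rows`, the field names, and the sample rows are all hypothetical. It covers only the "masked with random characters" case, and it treats every field as sensitive unless the provider explicitly lists it as unmasked.

```python
import random
import string


def mask_value(value, rng):
    """Replace each digit or letter with a random one of the same kind.

    Punctuation and spacing are kept so the format stays recognizable.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            out.append(rng.choice(string.ascii_letters))
        else:
            out.append(ch)
    return "".join(out)


def build_mock_rows(rows, unmasked_fields, rng, sample_size=10):
    """Draw a random sample of rows and mask every field not marked safe."""
    sample = rng.sample(rows, min(sample_size, len(rows)))
    return [
        {
            field: value if field in unmasked_fields else mask_value(str(value), rng)
            for field, value in row.items()
        }
        for row in sample
    ]


# Hypothetical records, invented for this example.
records = [
    {"ssn": "123-45-6789", "state": "MO"},
    {"ssn": "987-65-4321", "state": "KS"},
    {"ssn": "555-12-3456", "state": "IA"},
]
for row in build_mock_rows(records, unmasked_fields={"state"}, rng=random.Random()):
    print(row)
```

In this sketch the "state" column shows sampled real values, while the "ssn" column keeps its shape (digits and dashes) but none of its content is derived from the original beyond that format.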
Two or more organizations may have the same data on different samples of a population, but combining that data for analysis is rarely a seamless process. Mock Data allows the data scientist to see the fields, formats, data-types, and example values in disparate datasets without needing to see the raw underlying data, and appropriately account for any differences in their data preparation and algorithms.
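To make "account for any differences" concrete, consider the two string encodings shown in the video: one dataset stores "Gender" as "male" or "female", and another stores "SEX" as "M" or "F". The Python sketch below maps both onto one canonical field. It is an illustrative assumption, not a TripleBlind preprocessing API; `normalize_gender`, `harmonize`, and the sample rows are hypothetical. The 1/0 encoding from the third dataset is deliberately left out, because which value means what is unknown until the data owner answers.

```python
# Mapping from the encodings seen in Mock Data to one canonical form.
CANONICAL = {
    "male": "male",
    "m": "male",
    "female": "female",
    "f": "female",
}


def normalize_gender(value):
    """Return the canonical value, or raise if the encoding is unknown."""
    key = str(value).strip().lower()
    if key not in CANONICAL:
        raise ValueError(f"unrecognized gender encoding: {value!r}")
    return CANONICAL[key]


def harmonize(rows, source_field, target_field="gender"):
    """Rename the provider-specific field and normalize its values."""
    result = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != source_field}
        new_row[target_field] = normalize_gender(row[source_field])
        result.append(new_row)
    return result


# Hypothetical rows shaped like the two datasets described in the video.
dataset_a = [{"id": 1, "Gender": "female"}, {"id": 2, "Gender": "male"}]
dataset_b = [{"id": 3, "SEX": "F"}, {"id": 4, "SEX": "M"}]

combined = harmonize(dataset_a, "Gender") + harmonize(dataset_b, "SEX")
print(combined)
```

Raising on an unknown encoding, rather than guessing, keeps an unexpected value from being mapped silently to the wrong category.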
Wrangle. Munge. Build.
Blind Sample takes the representative sample shown in Mock Data to the next level. This operation provides data scientists with a realistic privacy-preserving sample similar to the real underlying data.
Request any number of records from Blind Sample, and TripleBlind will use underlying privacy-preserving methods, such as generative adversarial networks (GANs), to create sample records that look and feel like the real underlying dataset. The representative sample can be downloaded, examined, and used to develop your process before deploying against the real data.
With Blind Sample, data engineers can:
- view and interact with realistic data
- refine their understanding
- prototype their processes
They can do this all before implementing the solution against the real, protected (blind) data.
Book A Demo
TripleBlind keeps both data and algorithms in use private and fully computable. To learn more about Blind Learning, or to see it in action, please book a demo!