How TripleBlind can Help Produce Better Models

The Power of TripleBlind Technology as demonstrated through the Blind Learning Model

It’s an age-old rule of AI: an algorithm is only as good as the data it is trained on. If the training dataset is small or biased in some way, the model will not produce accurate results when it is presented with new scenarios. To ensure a model’s accuracy and effectiveness, it must be trained on a large quantity of accurate and unbiased data. However, this is often difficult to accomplish due to data scarcity. Even if an organization is able to acquire a large amount of data, that data is likely skewed in some way (location, ethnicity, gender, etc.) because it comes only from the organization’s own customers and/or users. Therefore, the organization’s models, trained only on its own data, are not universally applicable. This poses a large problem for any organization that wishes to deploy an algorithm for wide-scale use. There is, however, a very clear solution: find a way to train the model across multiple datasets provided by other organizations.

Here, we present an example of how the TripleBlind technology can do just that: solve the problem of data scarcity while maintaining privacy.

Example with Publicly Available Data

To demonstrate the power of TripleBlind technology, we developed a test case with the following characteristics. We used portions of the MNIST* dataset held in two different organizations to train a convolutional neural network (CNN). 

After separating 29,000 MNIST images into two datasets of 6,000 images and 23,000 images, we handed ownership of one dataset to each of two organizations: Organization A (6,000 images) and Organization B (23,000 images). We first trained a CNN model using only Organization A’s dataset. The resulting model, trained with only Organization A’s 6,000 images, proved to be 91.60% accurate when tested against a separate set of 1,000 MNIST images.

Then, using TripleBlind’s privacy-ensuring technology, we used the same scripts to train a CNN model over the datasets owned by both Organization A and Organization B. Because we used TripleBlind’s platform, we were able to drastically increase the size of the training dataset, resulting in a CNN model trained over 29,000 images. We tested the model in the same way, against the same set of 1,000 MNIST images as before. The result was an accuracy of 96.10%.

By using TripleBlind’s technology, we were able to both increase the size of the training dataset and increase the accuracy of the model. The training dataset grew from 6,000 images to 29,000 images (an increase of roughly 383%). This larger training dataset resulted in a more accurate model: the new model was 96.10% accurate, compared with 91.60% for the model trained on less data.
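The arithmetic behind these figures can be checked with a short sketch. The numbers come straight from the example above; the variable names are illustrative, and the error counts simply restate the reported accuracies against the 1,000-image test set.

```python
# Figures taken from the example above; variable names are illustrative.
images_a = 6_000        # Organization A's training images
images_total = 29_000   # combined training images (A + B)
acc_a = 0.9160          # accuracy of the model trained on A only
acc_combined = 0.9610   # accuracy of the model trained on A + B
test_images = 1_000     # size of the held-out test set

# Relative growth of the training set.
size_increase_pct = (images_total - images_a) / images_a * 100

# Accuracy gain in percentage points.
point_gain = (acc_combined - acc_a) * 100

# Misclassified test images implied by each accuracy.
errors_a = round((1 - acc_a) * test_images)
errors_combined = round((1 - acc_combined) * test_images)

# Relative reduction in misclassified images.
error_reduction_pct = (errors_a - errors_combined) / errors_a * 100

print(f"Training set growth: {size_increase_pct:.0f}%")
print(f"Accuracy gain: {point_gain:.1f} percentage points")
print(f"Misclassified test images: {errors_a} -> {errors_combined}")
print(f"Error reduction: {error_reduction_pct:.0f}%")
```

Seen this way, a gain of 4.5 percentage points means the combined model misclassifies fewer than half as many test images as the model trained on Organization A’s data alone.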

Throughout the entirety of this process, privacy was ensured. Organization A was NOT able to see the data of Organization B. Likewise, Organization B was NOT able to see the data of Organization A. Neither organization could see the algorithm that was created. The training dataset grew nearly fivefold, thus bettering the model’s accuracy, without compromising the privacy of either organization’s data.

One should note that this example was conducted with evenly distributed data: neither dataset was biased toward particular digits. In the real world, Organization A’s 6,000 images would likely be biased in some way. For this example, perhaps 90% of its 6,000 images would show the digits 1-5. Similarly, 90% of Organization B’s images might show the digits 6-9. With these biased datasets, the accuracy difference between the first model and the model trained using TripleBlind’s platform would be much more drastic.
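To make the hypothetical concrete, here is a minimal sketch that simulates only the label distributions of two skewed datasets (no images and no training). The helper names, the fixed seed, and the choice to draw the remaining 10% uniformly from all other digits (including 0) are our assumptions for illustration; they are not part of the experiment described above.

```python
import random
from collections import Counter

def skewed_labels(n, favored, others, favored_share, rng):
    """Draw n digit labels, roughly favored_share of them from `favored`."""
    labels = []
    for _ in range(n):
        pool = favored if rng.random() < favored_share else others
        labels.append(rng.choice(pool))
    return labels

def share(labels, digits):
    """Fraction of labels that fall in the given set of digits."""
    counts = Counter(labels)
    return sum(counts[d] for d in digits) / len(labels)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
all_digits = list(range(10))
low, high = [1, 2, 3, 4, 5], [6, 7, 8, 9]

# Organization A: about 90% of labels from digits 1-5.
org_a = skewed_labels(6_000, low, [d for d in all_digits if d not in low], 0.9, rng)
# Organization B: about 90% of labels from digits 6-9.
org_b = skewed_labels(23_000, high, [d for d in all_digits if d not in high], 0.9, rng)

print(f"A alone: share of digits 6-9 = {share(org_a, high):.1%}")
print(f"B alone: share of digits 1-5 = {share(org_b, low):.1%}")
print(f"A + B:   share of digits 1-5 = {share(org_a + org_b, low):.1%}")
```

A model trained only on Organization A’s labels would see very few examples of the digits 6-9, which is why we would expect the gap between the two models to widen when the datasets are biased.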

*Modified National Institute of Standards and Technology

A Story of Privacy During the Pandemic

On February 27th we started something here at TripleBlind. After a conversation with Ramesh Raskar of MIT about an idea that struck him when Mike Pence’s team arrived at a healthcare conference asking for ideas on how to battle Coronavirus, we began building a contact tracing solution. (Back then, neither we nor anyone outside of certain specialists knew the term “contact tracing.”) This became the open source cellphone app known as Private Kit, which spawned the community behind the COVID Safe Paths app and the Safe Places system.

This system is good. It is going to help stop the spread; it will #flattenthecurve. Even more importantly, I believe this is how we restart the world. I think having 14 days of “all green dots” is going to be the key to leaving our homes, opening our schools, rejoining our colleagues and feeling safe around strangers again.

But most importantly, this is Private.

Why is private important? It is an important sounding word, but is it really italics-worthy important? I mean, really, isn’t fighting the disease that is paralyzing the entire world more important? Can’t we forget Private for now? Let me tell you a story…

Several years ago there was a wildly contagious disease known as MERS (Middle East Respiratory Syndrome). It hit South Korea hard. Hard enough that people were scared and privacy was the last thing they cared about. A bill passed to help authorities perform contact tracing. To make it more efficient they were authorized to collect video, credit card records and everyone’s cell phone location. The days preceding a patient’s diagnosis were almost perfectly reconstructed and published so the community could tell if they’d made contact with these people. It worked and MERS was stopped. So…that’s a good thing, right?

Here is the next chapter of that story. Recently, the case of patient #15 with COVID-19 in a district of South Korea was broadcast: texted to the cellphones of everyone in a certain neighborhood. This text included a link to some facts. A female in her 20s working at Jacks Bar had been diagnosed. The trail of her last few days was published so people could tell if they’d made contact with her. This was for the public good, and a little loss of privacy is worth it, right?

Then the worst tendencies of scared people took over. The trail was examined…scrutinized by an online crowd. She’d gone home sick on March 27th. She went to eat at a restaurant on March 30th. She waited until April 3rd to go to the hospital. She even stayed with someone one night. How irresponsible! What a self-centered youth! And what about that person she’d stayed with? It was “anonymous” data, no names were ever published. But how many 20-somethings work at Jacks Bar? And how many of them live in that specific neighborhood?

People are storytellers. Points of data get connected and stories just naturally emerge. It is human nature.

Here is another version of that same story:¹ A 20-something young woman starts to feel sick in the middle of a terrifying epidemic. She leaves work early. Scared. Afraid she’s sick. Afraid she’ll lose her job. Afraid she won’t be able to pay her bills. After hiding for two days she is too weak to cook, but starving. She goes to get some food, hoping it will hold her over until it passes. She finally feels scared for her life and goes to the hospital to learn she is the victim of the virus. She was lucky and recovered. But this victim will forever be branded as the girl who jeopardized her neighbors. She’ll always be blamed for anyone who got sick, whether she actually came near them or not. The disease is gone, but she may never recover.

I’m not certain if my story is completely true. But it is why Private matters. Surveillance by a trustworthy official is easy to allow. Privacy is hard. But it is possible. And she deserved it. So does my daughter. So does my son. So do you.

This is why I care. This is why I’ve worked 18 hours a day, every single day for nearly two months. This is why TripleBlind exists.

#Covid19 #FlattenTheCurve #PrivacyMatters


¹ Partially inspired by the segment at 7:40 of “The Coronavirus Guilt Trip”


The Privacy “Faustian Bargain”

As many of you know, I recently joined my good friend Riddhiman Das in an effort to build a cryptographically powered privacy system. We’ve been joined by a small team of experts and we are working hard to build an API that will enable bulletproof privacy as a service. Why does the world need “bulletproof privacy as a service”? I’m glad you asked! The short answer is that many of our most ubiquitous online services have developed business models that depend on surveilling us, and then “monetizing” (read: “selling”) the data they accumulate. Data about us – some of which is deeply personal.

The following was prompted by the November 20, 2019 issue of “The Download” (from the people over at MIT Technology Review). Today’s issue alone refers to at least four articles regarding the loss of privacy most of us suffer due to what the first article calls the “Faustian bargain” most users are forced to make. 

The first article, from Amnesty International, is a “scathing indictment of the world’s dominant internet corporations”. The paragraph that caught my attention is: “This ubiquitous surveillance has undermined the very essence of the right to privacy,” the report said, adding that the companies’ “use of algorithmic systems to create and infer detailed profiles on people interferes with our ability to shape our own identities within a private sphere.” The article then quotes Amnesty International as making a recommendation that is logical but unfortunately inadequate: “Amnesty called on governments to legally guarantee people’s right not to be tracked by advertisers or other third parties. It called current regulations – and the companies’ own privacy-shielding measures – inadequate.” Good thought, but regulations won’t do it all. There are too many legislative hurdles (read: “lobbyists”), and in large parts of the world the local legal systems aren’t strong enough to enforce good regulations. At TripleBlind we think the better answer is mathematically enforced cryptography that doesn’t rely on laws; instead, privacy is built into the protocol. We are working to “build in” privacy preservation, and we want to give you the keys to either lock or unlock your data as you see fit.

The second article is about a home camera system and the many ways data (read: “pictures of you, your friends and whoever walks past your house”) from these devices is used. This article makes a couple of points that caught my attention. The first is that the camera company shares data with other entities in ways that are not transparent to its customers (i.e. you and me). The second is that after the camera company shares that data with an undisclosed third party, it can no longer control what happens to or with the data (read: “undisclosed third party can use the pictures for whatever purpose or resale they want”). At TripleBlind we believe both of these positions are incorrect. We are working to build a system that allows you to control the release of the information in your data (note – I did not say “data”, I said “information in your data”) differentially – when, to whom, for what purpose, for what duration and at which price.
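As a rough illustration of what “differential” release could look like, here is a minimal sketch of a policy record covering those five terms. Everything in it is hypothetical: the class, field names and example values are ours for illustration and are not TripleBlind’s API. It only models the terms; it does nothing to enforce them cryptographically, which is the hard part.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class ReleasePolicy:
    """Hypothetical terms under which a data owner releases information."""
    recipient: str          # to whom
    purpose: str            # for what purpose
    not_before: datetime    # when the release starts
    duration: timedelta     # for what duration
    price_usd: float        # at which price

    def permits(self, requester, purpose, at, offered_usd):
        """Return True only if every term of the policy is satisfied."""
        return (
            requester == self.recipient
            and purpose == self.purpose
            and self.not_before <= at < self.not_before + self.duration
            and offered_usd >= self.price_usd
        )

# Example values are made up for illustration.
policy = ReleasePolicy(
    recipient="neighborhood-watch",
    purpose="package-theft-investigation",
    not_before=datetime(2019, 12, 1),
    duration=timedelta(days=7),
    price_usd=5.0,
)

# A request that matches every term.
ok = policy.permits("neighborhood-watch", "package-theft-investigation",
                    datetime(2019, 12, 3), 5.0)
# A different party with a different purpose, even at a higher price.
resale = policy.permits("ad-broker", "marketing", datetime(2019, 12, 3), 50.0)
# The right party and purpose, but after the release window has closed.
expired = policy.permits("neighborhood-watch", "package-theft-investigation",
                         datetime(2019, 12, 9), 5.0)
print(ok, resale, expired)
```

A plain check like this still depends on whoever holds the data choosing to run it honestly, which is exactly the gap that a cryptographic approach is meant to close.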

The third article is what I think the authors meant to be a case of “surveillance for good”. I think most of us would say the goal (helping people with gambling disorders control their behavior) is a good one. That said – think of the privacy implications of this application – especially when the behavior in question is coupled with a “frequent player” card/id. When the casino knows who you are, and that you display behaviors that their marketing department associates with being a “good customer”, how do you think they will react? There is a very good chance the casino already knows individual customers’ gambling behavior, and has tailored its marketing to that behavior. They are probably going to encourage you to visit the casino as often as possible. In the overall ecosystem of individually targeted advertising intended to get customers into the casino – do you really think making some customers take a break of a few seconds will make a difference? At TripleBlind we believe the better answer is to keep that “frequent player” card identity private, and allow the customer to control the dissemination of the data associated with it (and the advertising associated with it).

Consider the fourth article a type of “public service announcement” from the folks at the Mozilla Foundation. It’s their “creepy rating” of various Christmas gift items. We can’t all go live in a cave or under a rock until better privacy tools arrive, but we can be vigilant and at least try to manage the privacy compromises we make every day.

In the meantime, at TripleBlind we are working to deliver tools that will allow you to control your data, differentially release the knowledge in it, and let algorithms interact with it in a way that protects both the data and the algorithm from disclosure. We believe this is going to be good for everyone. Once your privacy is cryptographically enforced, you and the companies with which you do business will find even more interesting (and potentially lucrative) ways to use the ever larger and ever more granular data we produce every day. We might even find a way to change the terms of the privacy “Faustian bargain”.

I continue to believe this privacy thing is a big deal. 

Let’s Eat a Private Cake

After I left Ant Financial/Alibaba, I was filled with gratitude toward Ant Financial, Alibaba, and our global partners for 3 incredible years – they have been an absolute blast. No one is enabling global financial inclusion at the rate Ant Financial is, and I’m grateful to have gotten an opportunity to foster that. I worked in China, Israel, the USA, Canada, Colombia, Mexico, Brazil, India, Indonesia, Singapore, Thailand, Malaysia, the Philippines, South Korea, Hong Kong, Japan, Macau, Germany, Russia, the UK, Finland and several other countries. The work has grounded me and helped me understand how enabling global trust at the scale Ant does helps people self-actualize. I will forever be an Aliren.

As for what’s next for me – I am going to take a stab at building cryptographically powered privacy, without reliance on the legal system. This effort is called TripleBlind. We are building an API that will enable bulletproof privacy as a service, allowing the option to enforce privacy mathematically.

As more and more of our information is stored and transacted with in the digital world as opposed to the analog world, the current approaches we take to such private transactions fall short. The default approach is to slap some mumbo jumbo legalese into a privacy policy with the expectation that no one will ever read it. The evidence would suggest that these approaches don’t work, because they leave organizations free to abuse the trust afforded to them by their end users.

The legal/contractual approach to privacy falls short for several reasons:

  1. It still leaves open huge holes to allow misuse of the data, intentionally or otherwise, both internally and externally. Breaking compliance requires just one incompetent or malicious actor in the entire organization. E.g. the major credit bureau using “admin/admin” as their credentials for their primary database. Or the major credit card issuer keeping all of their credit pull information in an unsecured S3 bucket.
  2. The custodians or owners of the data cannot consent to every operation performed on that data. While they might have the option to do so on paper, there’s no way to enforce it. Enforcement relies on the right organizational processes and structures being in place, and these are fallible, if they even exist. If the privacy policy is in the way of a particular operation, the data custodian can unilaterally change the privacy policy contract on the actual data owner. If you’re lucky, you might get an email at 3am telling you that the contract changed and you somehow already consented to it.
  3. The western world also has a tendency to take rule of law for granted. As we shift to a world where the vast majority of internet users are not from the western world, incumbent approaches that assume contracts can actually be enforced are inherently “broken”.

The core thesis around TripleBlind is that privacy-enforced data & algorithm interactions can unlock the tremendous amount of value that is currently trapped in private data stores and proprietary algorithms. If we move from a world of “don’t be evil” to “can’t be evil”, we can enable entities to freely collaborate around their most sensitive data and algorithms without compromising their privacy, allowing them to work together to create compounded value in a way never before possible.

Around privacy, I believe we can have our cake and eat it too – let’s eat a private cake.