A Platform Approach to AI Privacy

Riddhiman Das

As enterprises seek to embrace AI and GenAI, there is significant interest in Responsible AI and the need to preserve the trustworthiness of AI. Privacy of AI models and data remains an open challenge, particularly as enterprises aggregate large amounts of sensitive data to train or prompt sophisticated AI models. In its AI Risk Management Framework [AI RMF 1.0], NIST establishes privacy and security as two key pillars among the seven principles of trustworthy AI systems.

The potential impact of AI is undisputed; however, there is still significant uncertainty among enterprises as they decide how to deploy AI at scale. A KPMG survey of 200 business leaders (June 2023) found that 80 percent believe generative AI will disrupt their industry and nearly all (93 percent) think it will provide value to their business. At the same time, 77 percent of the businesses polled say that the uncertain and evolving regulatory landscape significantly impacts their investment decisions. Business leaders also anticipate that the most regulatory action will be around data privacy. Protecting personal data and addressing privacy concerns lead their risk management priorities (63 percent).

AI Trust and Privacy Concerns

At a basic level, there are five privacy questions that enterprises struggle with in AI and GenAI projects (listed in the reverse order of the deployment process):

  1. Prompting – How to protect enterprise data when prompting a model? This generalizes to running model inference on raw regulated data (think PHI, PII).
  2. Enterprise Data Leakage – Even if an LLM provider contractually agrees not to store data, how to ensure that no sensitive data or corporate IP crosses the enterprise boundary to a third-party hosted LLM?
  3. Training/Tuning – How to safely train AI models with sensitive regulated data (think PHI, PII) or enterprise data without commingling the data within the model?
  4. Pre-training analytics and processing data pipeline – When data pipelines are set up to analyze and pre-process data before training, how to ensure the privacy of data in use during analytics (querying, joining, reporting)?
  5. Discovery of data – Any real AI project that seeks to create a rich and effective model requires privacy throughout the iterative process of discovering, exploring, normalizing, and aggregating high-value data sets. How to preserve it at every step?
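To make the first two questions concrete, here is a minimal sketch of one naive mitigation: masking obviously sensitive tokens before a prompt leaves the enterprise boundary. This is not the approach of any particular product, and the `PATTERNS` table and `redact` helper are hypothetical names chosen for illustration. Real PII and PHI detection needs far broader coverage than two regular expressions, and pattern-based masking does nothing for corporate IP expressed in free text.

```python
import re

# Illustrative patterns only (assumption): an email address and a US SSN-like number.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(prompt: str) -> str:
    """Replace every pattern match with a placeholder tag such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt


if __name__ == "__main__":
    print(redact("Contact jane.doe@example.com, SSN 123-45-6789."))
```

The gap between a filter like this and a real guarantee is part of why these questions stall projects: masking removes information the model may need, and anything the patterns miss still crosses the boundary.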

Typical AI POCs get slowed down on their journey to production because these questions create confusion among key stakeholders, typically business leaders, privacy officers, and AI developers. See Figure 1.

Figure 1: The AI Privacy Perfect Storm

TripleBlind AI Privacy Platform

Over the past three years, our founding team has pursued a mission to develop the automatic trust layer for all collaborators that want to privately share data and models. As we have worked with enterprises on their journey towards AI, TripleBlind has built the foundation for a new approach to privacy and protection. Today, most of the pieces of the puzzle are in place, and we are excited to announce our AI Privacy Platform. The platform ensures end-to-end privacy for AI models and sensitive data along the full AI lifecycle, as shown in Figure 2.

Figure 2: The TripleBlind AI Privacy Platform

There are three overarching design principles in the platform:

  1. AI Lifecycle Optimization: The AI lifecycle requires a variety of analytics and processing between data and models. No single Privacy Enhancing Technology (PET) is optimal for all processing. The best PET, or combination of PETs, needs to deliver the optimal balance of performance, latency, and privacy protection. The AI Privacy Platform selects and orchestrates the optimal AI Blinding technology for each process, so the user does not have to worry about this choice.
  2. Privacy by design: Most state-of-the-art PETs are still maturing and fall short of the practical requirements of AI processing. Our AI Blinding privacy technologies address this issue and are designed to deliver the best price-performance and privacy protection across the various families of PETs.

Figure 3: Blind Learning accelerates training vs. federated learning
Figure 4: Blind Inferencing provides significant acceleration over state-of-the-art SMPC approaches

For instance, Blind Learning provides the fastest training method, reducing the computational cost by 3X while delivering higher accuracy than federated learning. Blind Learning also provides stronger privacy guarantees than federated learning, reducing the risk of a model inversion attack.

Similarly, Blind Inferencing is the first SMPC protocol that achieves neural network inference in practical time, and it delivers 100% higher performance than state-of-the-art SMPC at equivalent accuracy.
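For readers new to SMPC (secure multi-party computation), the sketch below shows the textbook building block behind that family of techniques: additive secret sharing. It is a generic illustration, not the Blind Inferencing protocol and not TripleBlind SDK code. The modulus and the helper names `share`, `add_shared`, and `reconstruct` are assumptions made for this example.

```python
import secrets

# Assumed modulus for the example; all arithmetic is done modulo this value.
MODULUS = 2**61 - 1


def share(value: int, n_parties: int = 3) -> list[int]:
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def add_shared(a: list[int], b: list[int]) -> list[int]:
    """Each party adds its own two shares locally; no party sees either input."""
    return [(x + y) % MODULUS for x, y in zip(a, b)]


def reconstruct(shares: list[int]) -> int:
    """Recombine all shares to reveal only the final result."""
    return sum(shares) % MODULUS


if __name__ == "__main__":
    total = reconstruct(add_shared(share(20), share(22)))
    print(total)  # 42
```

Addition comes almost for free in this scheme. Multiplications and the non-linear layers of a neural network require extra rounds of communication between parties, which is where the performance cost of SMPC inference generally comes from.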

  3. Usability: The goal of the TripleBlind platform is to embed privacy primitives into AI applications without changing the existing workflow. TripleBlind provides a simple SDK that can be incorporated into popular tools such as Scikit-Learn, PyTorch, TensorFlow, and other analytical tools. A typical scenario is shown in Figure 5. For any new application or model, the process is as easy as incorporating the output of the model into the existing process. The TripleBlind early access program provides resources to pilot such applications for free.

Figure 5: A code snippet illustrating the actual SDK code to train a VGG16 model on two distrusted datasets, using Blind Learning.
Note that the datasets are referenced by their UUIDs (in this scenario, both datasets are owned by other organizations).

Join our mission

Please participate in our early access program if you are interested in designing and rolling out a production AI project that involves sensitive or enterprise data. We are looking forward to sharing use-cases and product solutions that demonstrate how privacy can indeed build trust with AI ecosystems.