Federated learning is a machine learning technique that aims to protect the privacy of the data being used for model training and is sometimes referred to as “collaborative learning.” By design, it does not require data to be transferred to the model owner, making it ideal for industries where handling sensitive data is involved. Before we delve into the intricacies of federated learning, though, let’s back up a bit. Given that federated learning is a technique used for machine learning, let’s start there.
Machine learning is a form of artificial intelligence that uses algorithms that are fine-tuned using training data — or sample data — to make better predictions or decisions. This kind of artificial intelligence (AI) is more common than you might think, and it is currently used for everyday applications like email filtering and speech recognition.
With that in mind, let’s unpack federated learning.
How Federated Learning Works
Originally developed by Google in 2016 to train ML models on data stored on mobile phones, the term “federated learning” is used in multiple ways by different researchers today. TripleBlind has developed several important innovations in this field that improve scalability and practical use.
Federated machine learning lets organizations extract patterns from data held across multiple, decentralized devices through the use of a shared model. That means different datasets can remain in their original locations, with no need to encrypt and ship data to collaborating organizations to make it usable to them, bypassing a potential weak point in security.
In other words, federated learning provides the ability to run an algorithm across multiple devices or clients that hold localized data. It enables training a model without transferring that data to a centralized location: training happens on the edge devices, while a server orchestrates the process. (Fully decentralized federated learning, which removes the server, exists but is still maturing.) In essence, the local nodes or individual devices in a network train on local data, and the output parameters, but not the data, are then exchanged.
The output parameters of a network are its weights and biases. A weight determines how much influence a given input exerts on the output; a bias is a constant added alongside the weighted inputs, similar to the intercept in a linear equation. The type of federated learning performed depends on how these parameters are exchanged.
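To make weights and biases concrete, here is a minimal sketch (an illustrative example, not TripleBlind code) of a single linear "neuron," where each weight scales an input and the bias shifts the result like a line's intercept:

```python
def linear_neuron(inputs, weights, bias):
    """Each weight scales how much its input influences the output;
    the bias is a constant offset, like the intercept in y = m*x + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# Two inputs with different influence, plus a constant offset:
output = linear_neuron(inputs=[2.0, 3.0], weights=[0.5, -1.0], bias=4.0)
print(output)  # 0.5*2.0 + (-1.0)*3.0 + 4.0 = 2.0
```

In federated learning, it is exactly these weight and bias values, not the underlying data, that the nodes send back for aggregation.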
Federated learning is not the same as distributed learning. Distributed learning parallelizes computing power: a large, complex problem is chunked into smaller pieces that run simultaneously on different computers. Federated learning, by contrast, actually results in more computation occurring separately: a model is trained individually at each data storage location on that location's dataset, and the model trainer/data user (a.k.a. the server) aggregates and averages the resulting outputs at the end.
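The aggregate-and-average step described above is commonly implemented as federated averaging (FedAvg): each node trains locally, and the server averages the resulting parameters, weighted by how much data each node holds. A pure-Python sketch under those assumptions (the parameter lists and sample counts are hypothetical):

```python
def federated_average(local_models, sample_counts):
    """Combine locally trained parameter lists into one global model.

    local_models: one parameter list per node (all the same length).
    sample_counts: training examples per node, so nodes with more data
    contribute proportionally more, as in FedAvg.
    """
    total = sum(sample_counts)
    global_model = []
    for i in range(len(local_models[0])):
        # Weighted average of parameter i across all nodes.
        avg = sum(m[i] * n for m, n in zip(local_models, sample_counts)) / total
        global_model.append(avg)
    return global_model

# Three nodes trained separately on private data; only parameters move.
local_models = [[0.2, 1.0], [0.4, 0.8], [0.6, 0.6]]
sample_counts = [100, 100, 200]
print(federated_average(local_models, sample_counts))  # ~[0.45, 0.75]
```

Note that only the short parameter lists cross the network; the raw training records never leave their nodes.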
The majority of FL models require that the data across all owners have the same structure: horizontally partitioned data, where every owner records the same features for different individuals, as opposed to vertically partitioned data, where different owners hold different features for the same individuals. The overall process works somewhat like the human brain accepting diverse inputs and weighing them based on applicable factors to make decisions.
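The two partitioning schemes are easy to see with a small, hypothetical healthcare example (invented field names, for illustration only):

```python
# Horizontal partitioning: two hospitals record the SAME features
# (age, blood pressure) for DIFFERENT patients; the rows are split.
hospital_a = [{"age": 34, "bp": 120}, {"age": 51, "bp": 135}]
hospital_b = [{"age": 47, "bp": 128}]

# Vertical partitioning: a hospital and an insurer hold DIFFERENT
# features for the SAME patients; the columns are split.
clinical = {"patient_1": {"age": 34, "bp": 120}}
claims = {"patient_1": {"annual_claims": 3}}

# Most FL algorithms assume the horizontal case: every node can train
# an identical model architecture because the feature space matches.
print(set(hospital_a[0]) == set(hospital_b[0]))  # True
print(set(clinical["patient_1"]) == set(claims["patient_1"]))  # False
```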
Types of Federated Learning
There are three main types of federated learning platforms, each with its own set of advantages and disadvantages:
- Centralized federated learning uses a central server to carry out the training and coordinate activity between the nodes in the network. The server also selects the nodes and aggregates the information or model updates. A feature of this model is the simplicity afforded by centralization, but a disadvantage is that the server can experience bottlenecks as information comes in simultaneously from different nodes.
- Decentralized federated learning features nodes that can coordinate with other interconnected nodes without the need for a centralized server to update the shared model with their localized data. This method eliminates the centralized server as a potential point of failure. However, the network topology — how the nodes are organized together — can affect the learning process.
- Heterogeneous federated learning (HFL) is increasingly being applied in fields that involve various kinds of data owners, like devices that use the internet of things (IoT), which can have very different capabilities in terms of communication and computational power.
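In the decentralized case, each node averages only with its direct neighbors, which is why the network topology affects how quickly the nodes agree. A toy sketch of gossip averaging on a ring (an assumed illustration with one scalar parameter per node, not a production protocol):

```python
def gossip_round(values, neighbors):
    """One gossip round: each node replaces its value with the mean of
    itself and its direct neighbors. No central server is involved."""
    return [
        (values[i] + sum(values[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
        for i in range(len(values))
    ]

# Four nodes in a ring: 0-1-2-3-0. Each starts with a different local
# parameter; repeated gossip drives them toward a common value.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
values = [0.0, 4.0, 8.0, 4.0]
for _ in range(10):
    values = gossip_round(values, ring)
print(values)  # all four values have nearly converged
```

A denser topology (more neighbors per node) would converge in fewer rounds, which is the topology effect noted above.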
Practical Applications
Imagine that a company in a sector that handles private data wants to use not just its own data, but that of partner organizations to spot trends and opportunities. The partners can develop a machine learning model without actually having to transmit their data to one another. That means concerns about data privacy, security, and access rights are greatly reduced. Such computational collaboration can pay big dividends in a variety of industries, including:
- Healthcare and medical insurance industries, where personally identifiable information must be kept private and protected, but where being able to access datasets from multiple organizations can help in, say, the diagnosis of rare conditions or spotting trends related to patient health.
- Manufacturing, where unexpected equipment breakdowns can bring production to a halt, and where models resulting from federated learning could be used to develop equipment maintenance schedules, potentially saving the industry significant sums of money.
- Financial industry, where federated learning allows banking and financial tech institutions to more accurately detect signs of illicit activity and provides lending institutions the ability to better predict lending risk.
Challenges of Federated Learning
While federated learning offers some distinct opportunities, it’s by no means perfect. Let’s consider some of the key challenges to federated learning:
- High IT investment: Federated learning nodes need to communicate with one another frequently, which demands significant network bandwidth and supporting infrastructure.
- Inconsistent security: Different nodes can have different levels of security, making some nodes in an FL circuit more vulnerable than others to attack, which could impact the resulting model updates.
- Reverse-engineering risk: There is a chance that models could be reverse-engineered to identify data, making it advisable to use additional privacy-enhancing methods.
- Device-specific limitations: Particularly when it comes to devices that use the internet of things, there is a strong possibility that different kinds of devices have different communication and computational capabilities.
Federated learning appears to be here to stay, and promising new FL techniques, such as blockchain-based approaches, may allow for even more secure computing between members of a federation. However, because FL is still in its infancy, more research is needed.
A Better Solution
TripleBlind’s privacy enhancing computation solution mitigates the challenges associated with federated learning for building accurate models from sensitive, private datasets. How does this work exactly? Our Blind Compute technology provides irreversible encryption that allows for completely safe processing of one-way encrypted data, eliminating the need for data to remain encapsulated in their original nodes.
Our technology fully maintains data fidelity, resulting in better computational outcomes, and complies with the world's most stringent privacy regulations. That's just where the advantages begin. TripleBlind offers:
- Fast computational speeds. TripleBlind protocols are the most efficient among the current state-of-the-art methods used for machine learning.
- The ability to run any kind of operation on any type of data. That includes artificial intelligence algorithms on images, voice, text, and video.
- Quantum computing safety. TripleBlind's formal mathematical proof of quantum resistance means that even with access to the best computers in the world, a user's odds of reconstructing original data are no better than taking a stab in the dark.
- The ability to audit digital rights. This ensures that data is only used in authorized ways by the counterparties with whom it’s being shared.
With TripleBlind, organizations in even the most highly regulated industries can safely unlock the business value of data while preserving privacy and enforcing regulatory compliance to facilitate responsible innovation.
Reach out today to book a demo and see first-hand how TripleBlind can help your organization do more with data.
TripleBlind is built on novel, patented breakthroughs in mathematics and cryptography, unlike other approaches built on top of open source technology. The technology keeps both data and algorithms in use private and fully computable.