Rhino FCP

Federated Learning + Edge Computing + Rhino = Federated Computing

By
The Rhino Team
Chris Laws
COO
Dr. Ittai Dayan
Co-founder & CEO
Yuval Baror
Co-founder & CTO
January 30, 2025

Every AI strategy presupposes access to large volumes of high-quality data in order to train, fine-tune, validate, run inference with, deploy, and monitor the ongoing performance of AI/ML models. Finding, accessing, sharing, and processing those data in a way that protects data security, privacy, and sovereignty represents a universal challenge for product, data, and data science teams who need sign-off from privacy, security, and legal teams. Federated Computing, as provided by the Rhino Federated Computing Platform (Rhino FCP), can help. In layman’s terms, Federated Computing is running code on decentralized data, sharing only the results of the code but never moving the data. This approach represents a critical evolution in how cross-organization data collaborations should work - streamlining the arduous process of consortium building and even enabling use cases that would otherwise be impossible due to data privacy or IP confidentiality concerns.

This blog post goes into more detail about what exactly Federated Computing is, including its relationship to related concepts such as Federated Learning, Federated Analytics, Edge Computing, and Distributed Computing. We also highlight why Federated Computing is a better choice for data collaboration than alternatives including data clean rooms, open source Federated Learning frameworks, and centralizing large amounts of sensitive data. We close with a brief introduction to Rhino FCP itself, describing some of the features that make it the best choice to serve as the ‘data collaboration tech stack.’

What is Federated Computing?

Federated Computing (FC) is an emerging technique in which collaborators across organizations perform computations locally, where the data reside, and share only aggregated results back to a central location. (Contrast this with centralizing data from multiple collaborators and then running computations.) These computations can range from the simple (e.g. counts of items across different silos) to slightly more complicated (e.g. transforming the units of those items from pounds to kilograms) to more complicated still (e.g. training an ML model on data across multiple silos) to very complex (e.g. deploying one partner’s software on another partner’s data behind their firewall).
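To make the pattern concrete, here is a minimal sketch of the simple end of that spectrum - counting and unit-converting items across two silos, where only small aggregate results ever leave each silo. The silo contents and the aggregation step are purely illustrative, not Rhino FCP code:

```python
# Minimal sketch of the Federated Computing pattern: each silo runs the
# computation locally and shares only an aggregate result, never raw rows.
# Silo contents and aggregation logic are illustrative, not Rhino FCP APIs.

silo_a = [{"item": "widget", "weight_lb": 2.0}, {"item": "gadget", "weight_lb": 5.5}]
silo_b = [{"item": "widget", "weight_lb": 3.1}]

def local_computation(records):
    """Runs inside a silo's firewall; returns only aggregates, never records."""
    return {
        "count": len(records),
        "total_weight_kg": sum(r["weight_lb"] for r in records) * 0.453592,
    }

# Only these small result dictionaries leave each silo.
results = [local_computation(silo_a), local_computation(silo_b)]

# A central coordinator combines the per-silo aggregates.
global_count = sum(r["count"] for r in results)
global_weight_kg = sum(r["total_weight_kg"] for r in results)
```

The key property is that `local_computation` runs entirely within each custodian’s environment; the coordinator only ever sees the summed counts and weights.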

FC addresses critical challenges regarding data privacy, data security, and data sovereignty. In FC, data custodians maintain complete control over their data, meaning their organization’s existing security controls remain in place. By not sending data to collaborators outside their organization, custodians reduce the risk of misuse of the data by their partners. Collaborators can also introduce additional privacy-enhancing techniques such as differential privacy or k-anonymization into their FC projects for further protection of data privacy.
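As an illustration of one such technique, the sketch below applies the standard Laplace mechanism for differential privacy to a local count before it is shared. The function name and parameters are our own for illustration, not a Rhino FCP API:

```python
import math
import random

def dp_count(records, epsilon=1.0, rng=random):
    """Release a record count under epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1 (sensitivity 1),
    so Laplace noise with scale 1/epsilon suffices (the Laplace mechanism).
    """
    u = rng.random() - 0.5  # uniform in [-0.5, 0.5)
    # Inverse-CDF sample from Laplace(0, 1/epsilon)
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return len(records) + noise

# Each silo perturbs its own count before sharing, so even the exact
# local count never leaves the silo.
noisy_count = dp_count(["record"] * 1000, epsilon=0.5, rng=random.Random(42))
```

A smaller epsilon means more noise and stronger privacy; the consortium can budget epsilon across queries to bound cumulative disclosure.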

Federated Computing is related to but distinct from several other concepts: 

  • Federated Learning (FL): FL is the technique of training an ML model on distributed data - only aggregating model weights to arrive at a global optimum model. FL represents one of the techniques under the umbrella of FC.
  • Federated Statistics (FS) or Federated Biostatistics: FS is the practice of analyzing raw data where it is stored locally across distributed silos, aggregating only the results. This includes descriptive statistics, correlations, regression analysis, etc. - all on data residing across multiple silos. FS is another technique under the FC umbrella.
  • Edge Computing (EC): EC refers to the practice of running applications closer to where data reside. This technique is often used where latency is critical (e.g. self-driving cars), but it also describes how code is run in FC: the consortium runs code “at the edge” by running analytics or training models on data residing in each member’s silo.
  • Distributed Computing (DC): DC is the practice of dividing a problem into many tasks, which are then solved by a distributed network of computers. While FC is technically a form of DC, the term DC typically refers to problems too complex for a single computer to solve, distributed for compute resources rather than to address privacy or security concerns.
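As a toy illustration of the FL technique described above, the following sketch runs Federated Averaging (FedAvg) on a one-dimensional linear model. The silo data are made up, and a real deployment would orchestrate this across sites rather than in a single process:

```python
# Toy sketch of Federated Averaging (FedAvg), the canonical FL algorithm:
# each silo fits the model on its own data, and only the resulting model
# weights (never the data) are sent back and averaged. The 1-D linear
# model and silo data are invented to keep the example self-contained.

def local_train(weight, data, lr=0.1, steps=20):
    """One silo's local gradient descent for y = w * x under squared error."""
    for _ in range(steps):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight

silos = [
    [(1.0, 2.1), (2.0, 4.0)],  # silo A: samples consistent with w near 2
    [(1.0, 1.9), (3.0, 6.2)],  # silo B: different samples, same trend
]

global_weight = 0.0
for _round in range(5):
    # Each silo refines the current global model on its local data...
    local_weights = [local_train(global_weight, d) for d in silos]
    # ...and the coordinator averages only the returned weights.
    global_weight = sum(local_weights) / len(local_weights)
```

After a few rounds the averaged weight converges near 2.0, the trend shared by both silos, even though neither silo’s samples were ever pooled.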

Why is Federated Computing the Future?

The world is moving towards stronger protection of data privacy. The EU has been in the vanguard with the General Data Protection Regulation (GDPR), followed by other jurisdictions implementing many of the same rigorous protections (e.g. the Personal Information Protection Law in China, the California Consumer Privacy Act). Cyber risk is also making data custodians more conservative, as they seek to minimize the severity of any data breach by not taking on the liability of storing unnecessary personal data. These trends have also driven a move towards adopting more privacy-enhancing technologies (PETs), including confidential computing, (homomorphic) encryption, secure key management systems, differential privacy mechanisms, secure enclaves, synthetic data, private set intersection protocols, and secure multi-party computation frameworks. FC platforms can incorporate these PETs to provide a multi-layered approach to data privacy and security.

There are several alternatives to FC, but we believe FC outcompetes them all due to stronger security posture and better functionality. 

  • Centralizing data is the default pattern today, but is becoming outmoded. Organizations are unwilling to centralize complex datasets with partners due to the risk of reidentification. Centralizing data also requires paying large egress, transfer, and duplicative storage fees. Also, as soon as data are shared, they are out of date - requiring frequent refreshes to stay relevant, adding to the expense. The risks associated with centralization also lead to contracting friction, slowing time to value.
  • Digital Clean Rooms offer an environment for collaboration operated by a trusted third party, but rely on contractual agreements to limit how data are used rather than having privacy controls built in - and do not necessarily have strong hardware-based security controls in place. While effective at facilitating collaboration among users of the same platform, collaboration across platforms is impossible. In a world of fragmented platform adoption, this presents a huge stumbling block to potential collaborations.
  • Trusted Execution Environments (TEEs) offer strong hardware-based security measures, but not all workloads can run inside TEEs, they often impose performance limitations, and implementation is complicated by key management and attestation.
  • There are a variety of open source FL frameworks. These are powerful tools for training ML models on distributed data. They do not, however, provide Federated Analytics capabilities nor are they enterprise-hardened for deploying production grade workloads. 

AI leaders will need a data collaboration approach that allows them to work hand-in-glove with their data partners, and Federated Computing is the only technique that enables completely flexible use cases while also providing rigorous protections for data subjects and custodians.

Why Federated Computing with Rhino FCP? 

Rhino Federated Computing has built the world’s leading data collaboration platform, Rhino FCP. We cut our teeth in the world of healthcare and life sciences, where data privacy is absolutely paramount and data custodians are fiercely protective of data subjects. Following Rhino co-founder Dr. Ittai Dayan’s leadership of the landmark EXAM Study on Federated Learning in healthcare, we set out to build an enterprise-ready FL solution, but quickly realized that collaborators needed more than ‘just’ Federated Learning - they needed Federated Computing.

While maintaining a focus on security & privacy, we have turned Rhino FCP into a totally secure, totally extensible collaboration sandbox - allowing collaborators to run code on one another’s data while ensuring that security, privacy, and legal teams can be comfortable. Rhino FCP offers flexible architecture (multi-cloud and on-prem hardware), end-to-end data management workflows (multimodal data, schema definition, harmonization, and visualization), a privacy screen (custom differential privacy budget, custom k-anonymization values), and allows for the secure deployment of custom code & 3rd party applications via persistent data pipelines.

Figure: High-level Rhino FCP architecture diagram

Rhino FCP also features several powerful applications that build on the strong foundation of the basic Rhino FCP. 

  • The Harmonization Copilot: Generative AI-powered workflow that reduces data harmonization expenses by automating the mapping of idiosyncratic data into a target data model, leveraging Large Language Models that scale across clients without requiring any data transfer for model training or inference. 
  • The Federated Computing app: The ML Ops tools needed for federated preprocessing / annotation / transformation of data, or federated training and validating AI models. 
  • Federated Datasets app: Online database and visualization layer for multimodal datasets sitting behind the client’s firewall, allowing for rapid discovery and seamless linking into new analytics and development projects by internal and external viewers.
  • Federated Trusted Research Environment (fTRE): Allows organizations to grant third parties controlled access to data sources, facilitating analysis (including a full suite of biostatistical methods, AI model development, and deployment of commercial software packages on those data) while ensuring data always remain behind the site’s firewall.

Rhino FCP can also seamlessly integrate into enterprise users’ tech stacks, serving as middleware to any number of federated workloads run on distributed data. 

Figure: Illustrative High-Level Architecture of Rhino FCP Integrations

* * *

Rhino partners are applying Federated Computing via Rhino FCP to a variety of important workloads, from drug discovery to fighting fraud. Reach out to us to discuss how Rhino FCP can enable your organization’s AI strategy.

Build your Federated Network