Increasing Transparency in AI Model Performance and Reducing Time to Insights with Distributed Registries for a Prominent Medical Society
A leading Medical Society has partnered with Rhino Health to augment this society’s investments in edge computing, providing value-added solutions to its members, such as improved AI model evaluation and distributed registry building.
Solution 1: AI Model Evaluation
AI model vendors invariably tout high-performing models. Often, these models perform exceptionally well on a narrow training data set. Unfortunately, these training data sets are often too limited - resulting in a model that is not generalizable to a broader population. Clinical leaders want to take advantage of these models but are resource-constrained. They do not want to waste their hospital’s resources implementing a model only to have it not perform on that hospital’s patient population. Our partner Medical Society wished to offer its members a streamlined solution to evaluate AI model performance using those members’ local data, serving as a trusted broker so its members could be confident in their investments in AI models - without taking on the risk of moving sensitive patient data, and simultaneously protecting vendor model Intellectual Property (IP).
Existing Tech Stack
The Medical Society looked to Rhino Health to integrate with a software suite already available to its members. The Medical Society’s existing AI application is an AI toolkit designed for users with minimal programming or machine-learning experience. The application uses cloud-native technology and is hosted on the Medical Society’s servers. It also offers learning resources, algorithm building, testing, and community activities. The on-premises version of the AI application is a local instance that runs behind the institution’s firewall, enabling algorithm testing and development with local data while participating in community activities like Federated Learning.
Medical Society’s AI Application and Rhino Health’s Federated Computing Platform (FCP) Integration
The Medical Society’s Central Hub interfaces with health IT systems (e.g., PACS¹, EHR²) using protocols like DICOM³ and HL7 FHIR⁴. The Medical Society’s AI application, running on the hub, facilitates local or federated AI training and testing. Rhino Health’s FCP is installed alongside the hub, sharing network storage and backend software components for data processing and model training. The primary use case in this stage is AI model evaluation using local datasets from health systems. The AI application’s standard validation metrics will be applied to evaluation records exported to a shared storage folder.
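As a rough illustration of applying standard validation metrics to exported evaluation records, the sketch below computes sensitivity, specificity, and accuracy from a small CSV. The column names (`case_id`, `label`, `score`), the 0.5 threshold, and the inline sample data are assumptions made for this example; the source does not describe the actual export format or which metrics the AI application reports.

```python
import csv
import io


def confusion_counts(rows, threshold=0.5):
    """Count (tp, fp, tn, fn) from rows holding a 0/1 'label' and a 0..1 'score'."""
    tp = fp = tn = fn = 0
    for row in rows:
        label = int(row["label"])
        predicted = 1 if float(row["score"]) >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def validation_metrics(rows, threshold=0.5):
    """Derive common binary-classification metrics from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(rows, threshold)
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "accuracy": (tp + tn) / total if total else None,
    }


# Hypothetical stand-in for a file written to the shared storage folder.
EXPORTED_CSV = """case_id,label,score
a,1,0.9
b,1,0.4
c,0,0.2
d,0,0.7
e,1,0.8
"""

rows = list(csv.DictReader(io.StringIO(EXPORTED_CSV)))
metrics = validation_metrics(rows)
```

In a real deployment the rows would be read from the exported file rather than an inline string, and the metric set would follow whatever the AI application defines as standard.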
Securing IP in AI Model Evaluation with Rhino Health’s FCP
The AI evaluation process using Rhino Health’s FCP is designed to harness AI’s power in evaluating medical data while safeguarding model developers’ IP. Users can either explore datasets that others have made available or make their own data available on the platform. Key steps include de-identifying and annotating the data for privacy and relevance, followed by data validation with the AI application. A critical aspect of this workflow is the robust protection of the model developer’s IP. Because evaluation jobs are conducted using Rhino Health’s REST API⁵, and the results are processed and stored within Rhino Health’s secure infrastructure, the proprietary algorithms and IP of the models remain secure. Developers can therefore use the platform with confidence that their IP is protected against unauthorized access or duplication.
When a health system initiates an evaluation in the AI application, a user is set up and the evaluation job is run through Rhino Health’s REST API. Users identify an appropriate Rhino Health project and dataset for the task. The process involves creating an AI evaluation task using a specific evaluation container and running it to process the data. Once the evaluation is complete, the results are stored in Rhino Health’s internal storage in a secured, privacy-protected manner. The evaluated data is then exported to a shared storage space, where it is accessible for further analysis by a limited set of authorized users. Finally, the evaluation metrics, which summarize the AI models’ performance, are displayed within the AI application and are accessible to a wider group of stakeholders while maintaining the integrity of the model developers’ IP.
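The sequence of steps in this workflow can be sketched with a small in-memory stand-in. Everything here is hypothetical: `InMemoryPlatform`, `EvaluationJob`, the method names, and the project, dataset, and container names are invented for illustration and are not Rhino Health’s actual REST API, which the source does not document.

```python
from dataclasses import dataclass, field


@dataclass
class EvaluationJob:
    project: str
    dataset: str
    container_image: str
    status: str = "created"
    results: list = field(default_factory=list)


class InMemoryPlatform:
    """Hypothetical stand-in for the platform; NOT Rhino Health's real API."""

    def __init__(self):
        self.internal_storage = {}  # results kept inside the platform
        self.shared_storage = {}    # export area for authorized users

    def create_job(self, project, dataset, container_image):
        # Step 1: pick a project and dataset, and name the evaluation container.
        return EvaluationJob(project, dataset, container_image)

    def run_job(self, job, model):
        # Step 2: the model runs where the data lives; only outputs are recorded.
        job.results = [model(record) for record in self._load(job.dataset)]
        job.status = "completed"
        self.internal_storage[(job.project, job.dataset)] = job.results
        return job

    def export_results(self, job, folder):
        # Step 3: copy results to the shared folder for further analysis.
        self.shared_storage[folder] = list(job.results)

    def _load(self, dataset):
        # Placeholder local records standing in for site data.
        return [{"id": 1, "value": 0.2}, {"id": 2, "value": 0.9}]


platform = InMemoryPlatform()
job = platform.create_job("demo-project", "local-dataset", "vendor/eval:1.0")
job = platform.run_job(
    job, model=lambda r: {"id": r["id"], "positive": r["value"] > 0.5}
)
platform.export_results(job, "shared/evaluations")
```

The point of the sketch is the separation of concerns: the caller only names a project, a dataset, and a container, while the model and the raw records stay inside the platform and only results are exported.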
The project is supported by a robust, flexible infrastructure setup:
- Source systems can be located either on-premises or provided by cloud services.
- Connectivity is managed through cross-cloud interconnect and cloud VPN⁶, leading to a shared VPC⁷. This VPC comprises essential components like Firewall Rules, Cloud Identity, the Medical Society’s AI application VM⁸, and Rhino Health’s Client VM.
- The AI application VM and Rhino Health Client VM are interconnected with a shared storage volume, facilitating data import and export.
In the execution flow, the AI application VM is responsible for DICOM and EHR connectivity and initiating tasks through the Rhino Health REST API. The Rhino Health Client VM manages the evaluation job and exports the results to file storage.
This first solution illustrates a comprehensive approach to AI model evaluation - giving potential buyers the ability to test a model on real data - and also demonstrating the power of the Rhino Health Federated Computing Platform to integrate with a variety of existing infrastructure.
Solution 2: Multimodal Distributed Registries
The second solution Rhino Health is helping our partner build addresses the creation of multimodal distributed registries. These registries comprise a database that systematically collects, categorizes, and stores metadata from various sources (modalities) to support clinical trials, quality & safety measure development, and other research applications. “Multimodal” refers to the inclusion of different types of data, which can include:
- Clinical Data: Patient histories, physical examination results, clinical outcomes, and follow-up information.
- Imaging Data: Radiological images from MRI, CT scans, X-rays, and other imaging technologies.
- Laboratory Data: Results from blood tests, biopsies, and other laboratory tests.
- Genomic Data: Genetic information that may influence disease progression or treatment response.
- Patient-Reported Data: Information directly reported by patients, such as symptoms, quality of life assessments, and treatment side effects.
- Device Data: Data from medical devices like pacemakers or continuous glucose monitors.
A multimodal registry facilitates comprehensive research considering various aspects of patient health and disease journeys. By integrating different data types, researchers can gain a holistic view of patient conditions, treatment effects, and outcomes. This integration is crucial for several applications, including:
- Enhancing Clinical Trials: Multimodal registries can improve patient selection, trial monitoring, and outcome assessment, leading to more effective and efficient clinical trials.
- Advancing Personalized Medicine: They support the identification of biomarkers for disease and response to treatment, which is key for personalized medicine approaches.
- Facilitating Longitudinal Studies: Registries allow tracking of patient information over time, providing valuable data for longitudinal studies that can inform treatment and care strategies.
- Improving Quality of Care: Insights from registry data refine clinical guidelines, improve patient management strategies, and inform policymaking.
Traditional registries face challenges with cost-effectiveness and compliance with privacy regulations, which limit their scale and versatility. By adopting a distributed architecture, the Medical Society can extract more powerful insights, exert greater control over data quality, scale the operation, and update software periodically to meet local needs. Using the Rhino Health FCP reduces costs and improves the long-term viability of research efforts.
Illustrative examples include:
- Establishing auditable processes and securing data communication.
- Ensuring data protection through authentication and authorization measures.
- Enabling complex data queries managed by the Medical Society or authorized third parties.
The Medical Society is taking advantage of the Rhino Health FCP’s innovative feature set as part of its registry building. Rhino Health is using Natural Language Processing (NLP), run on member healthcare organizations’ edge nodes, to extract insights from clinical notes. This application is novel because the model runs at the edge, avoiding data transfer to reduce compliance risk and liability. The Medical Society is also building on Rhino Health’s Federated Datasets feature, which streamlines the creation of multimodal datasets using hard-to-de-identify data elements like detailed clinical notes and specific imaging studies. This feature and the resulting registries provide medical researchers with richer, more detailed datasets than are typically available today, enhancing the depth and quality of research. The enriched data registries are kept locally at the discretion of participating sites while still granting the Medical Society access to query data warehouses, thereby generating on-demand distributed registries with governed access.
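The idea of querying data that stays at each site can be sketched as follows. This is a minimal illustration under stated assumptions, not the Federated Datasets feature itself: the site names, record fields, and the `local_query` and `federated_query` helpers are all hypothetical.

```python
from collections import Counter

# Hypothetical per-site records; in practice these stay behind each site's firewall.
SITE_RECORDS = {
    "site_a": [
        {"modality": "CT", "age": 64},
        {"modality": "MRI", "age": 51},
        {"modality": "CT", "age": 47},
    ],
    "site_b": [
        {"modality": "CT", "age": 70},
        {"modality": "XR", "age": 33},
    ],
}


def local_query(records, min_age):
    """Runs at the edge: returns per-modality counts, never row-level data."""
    return Counter(r["modality"] for r in records if r["age"] >= min_age)


def federated_query(sites, min_age):
    """Combines per-site aggregates into one registry-level answer."""
    total = Counter()
    for records in sites.values():
        total += local_query(records, min_age)
    return dict(total)


result = federated_query(SITE_RECORDS, min_age=50)
```

Only the aggregated counts cross site boundaries in this sketch, which is the property that lets a registry be assembled on demand without centralizing patient records.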
Solution 3: Data Harmonization of Federated Multimodal Data
Idiosyncratic health data, often characterized by heterogeneous formats, varying data structures, and diverse terminologies across clinical systems, poses significant impediments to developing and deploying AI-based healthcare solutions. Each health system, and even each hospital within a single health system, stores clinical data in its own formats. The challenge of idiosyncratic health data led to the development of standardized frameworks such as the OMOP CDM⁹ and HL7 FHIR. Despite these standards, projects often stall because of the challenge of aligning data models across sites. Rhino Health has developed a Large Language Model (LLM)-driven solution to automate this process for our partners.
LLMs allow researchers to overcome an often arduous, manual mapping process. We complete two distinct tasks in this automated data harmonization process:
- Syntactic interoperability: Defines the format of the data exchange that takes place between systems. This might come in the form of a relational table with standardized table and column names (e.g., the condition_source_value column in the condition_occurrence table of the OMOP CDM) or a specific JSON structure (e.g., the patient resource as defined by the HL7 FHIR standard).
- Semantic interoperability: Ensures that all data uses a common vocabulary, which enables accurate and reliable machine-to-machine communication. For example, semantic interoperability would entail using the ICD-10 standard to represent a patient’s conditions so that, regardless of the health system, the same code (‘E11.9’) means a diagnosis of type 2 diabetes without complications. Unfortunately, health systems employ a range of medical terminologies, and nonstandard ways of documenting important clinical details have led to significant barriers in this layer of interoperability. It is essential, then, to “normalize” or codify the data being exchanged wherever possible.
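The two mapping tasks above can be sketched on a toy example. The local column names (`pt_id`, `dx_text`), the free-text diagnosis labels, and the `icd10_code` output field are hypothetical and chosen only for illustration; `icd10_code` is not an OMOP CDM column. In the solution described here an LLM proposes the mappings, whereas this sketch hard-codes them so that it runs on its own.

```python
# Hypothetical site-specific export: column names and diagnosis labels vary by site.
local_rows = [
    {"pt_id": "17", "dx_text": "T2DM w/o complications"},
    {"pt_id": "42", "dx_text": "Type 2 diabetes, uncomplicated"},
]

# Syntactic mapping: local column name -> OMOP condition_occurrence column name.
COLUMN_MAP = {"pt_id": "person_id", "dx_text": "condition_source_value"}

# Semantic mapping: normalized local label -> ICD-10 code.
# Hard-coded here; an LLM would propose these entries in the described solution.
TERM_TO_ICD10 = {
    "t2dm w/o complications": "E11.9",
    "type 2 diabetes, uncomplicated": "E11.9",
}


def harmonize(row):
    """Rename columns, then attach a standard code for the diagnosis text."""
    out = {COLUMN_MAP[key]: value for key, value in row.items()}
    label = out["condition_source_value"].strip().lower()
    # None marks an unmapped term that needs human review.
    out["icd10_code"] = TERM_TO_ICD10.get(label)
    return out


harmonized = [harmonize(row) for row in local_rows]
```

After harmonization, both rows carry the same code even though the sites documented the diagnosis differently, which is what allows a single query or model to run across sites.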
We use LLMs to auto-generate syntactic and semantic data mappings, all without data transfer. The Data Harmonization solution fosters a system that allows for seamless, near-real-time model inference and result processing, effectively closing the loop from training to model deployment. Running the LLM at the edge means patient data stay secure and private. The results lead to more personalized and accurate medical research outputs. Furthermore, the solution helps address the challenges of patient cohort identification and data harmonization in Federated Learning, enhancing the overall quality and relevance of medical research.
Conclusion
The strategic partnership between the leading Medical Society and Rhino Health has advanced medical research methodologies and paved the way for future innovations in healthcare. The three solutions built with the Medical Society are prime examples of how Rhino Health is enabling our partners’ AI strategies by providing crucial components of the data collaboration tech stack.
Notes:
(1) PACS: Picture Archiving and Communication System is a medical imaging technology used primarily in healthcare organizations to securely store, digitally transmit, and access medical images.
(2) EHR: Electronic Health Records are digital versions of patients’ paper charts and are real-time, patient-centered records that make information available instantly and securely to authorized users. They contain a patient’s medical history, diagnoses, medications, treatment plans, immunization dates, allergies, radiology images, and laboratory and test results.
(3) DICOM: Digital Imaging and Communications in Medicine is an international standard for storing, retrieving, printing, and transmitting information in medical imaging. It includes a file format definition and a network communications protocol. It enables the integration of medical imaging devices such as scanners, servers, workstations, printers, network hardware, and PACS from multiple manufacturers.
(4) HL7 FHIR: Health Level Seven Fast Healthcare Interoperability Resources is a standard for exchanging healthcare information electronically. It focuses on ease of implementation, and it is based on emerging industry approaches such as RESTful web services, OAuth, and JSON. FHIR enables different healthcare systems to effectively share clinical and administrative data. It’s designed to enable information sharing in a granular, easy-to-access way, making it more functional in the internet-based era of healthcare.
(5) REST API: Representational State Transfer Application Programming Interface is a set of rules and standards used for building and interacting with web services. It allows different software applications to communicate with each other using the standard methods of HTTP requests to access and use data.
(6) VPN: Virtual Private Network refers to a technology that creates a safe and encrypted connection over a less secure network, such as the internet. It is used for securely connecting different parts of a network, ensuring safe data transfer and communication between systems, which is essential in healthcare settings for protecting sensitive medical data.
(7) VPC: Virtual Private Cloud is a concept within cloud computing that provides a private, isolated section of a public cloud to an organization. It allows creating a virtual network in the cloud, where resources such as virtual machines, storage, and network configurations can be securely managed and operated.
(8) VM: Virtual Machine is a software emulation of a physical computer. It runs an operating system and applications just like a physical computer, but it exists as a software-defined entity, residing on a physical host machine.
(9) OMOP CDM: Observational Medical Outcomes Partnership Common Data Model is a standardized data model used in healthcare to facilitate the organization, integration, and analysis of observational healthcare data from different sources. The OMOP CDM harmonizes diverse data sources into a common format and structure, which allows for effective and efficient querying, analysis, and sharing of information. In projects that involve advanced AI and machine learning, the OMOP CDM plays a critical role in ensuring that the data used for training and validating AI models is standardized, high-quality, and representative of real-world clinical scenarios. This standardization is essential for developing robust, accurate, and generalizable AI applications in healthcare.