Transforming Public Health Practice with Rhino Health’s Harmonization Copilot
Executive Summary
Rhino Health’s Harmonization Copilot, a new application of the Rhino Federated Computing Platform (Rhino FCP), addresses the significant challenges public health organizations face when integrating diverse data sources by standardizing data to common interoperability standards like FHIR and OMOP¹. Leveraging Generative AI, the Harmonization Copilot automates data transformation, enhancing data quality and enabling effective collaboration across stakeholders. It offers users automated data cleaning and curation, semantic and syntactic mapping, and custom ontologies. It integrates seamlessly with Rhino FCP’s data management, federated learning, and advanced analytics capabilities, offering a cost-effective, scalable solution that simplifies regulatory compliance and fosters cross-departmental cooperation.
By streamlining data harmonization processes, the Harmonization Copilot significantly improves decision-making and public health outcomes, transforming data management and facilitating timely and effective public health interventions.
1. Introduction
Public health organizations, including government health departments, regulators, agencies, hospitals, and laboratories, use data across many functions at both the local and global levels. These functions encompass surveillance and research, policy development and planning, health protection, health promotion, service delivery (including primary health care and emergency response), regulation and enforcement of health laws, workforce development, and equity and social justice initiatives.
However, these organizations struggle to integrate data from diverse sources such as hospitals, clinics, and laboratories, which use different formats and terminologies. When that integration succeeds, analysis ranging from simple biostatistics and epidemiology to multimodal AI can lead to significant improvements in public health outcomes.
Collaborations with other stakeholders, such as professional societies, foundations, academic medical centers, community hospitals, biobanks, private companies, and patient advocacy groups, further complicate this effort. Each stakeholder maintains patient data in a different local data model, creating barriers to effective collaboration. Rhino Health addresses these challenges through the Rhino Federated Computing Platform and its new application, “Harmonization Copilot.” This application standardizes data to common interoperability standards like FHIR and to common data models and frameworks like OMOP, making collaboration more accessible and more effective.
2. The Current Landscape and Challenges of Data Harmonization in Public Health
Public health organizations manage vast and diverse datasets originating from multiple sources. These include electronic health records (EHRs), patient registries, insurance claims, and various administrative systems. These systems are not designed to communicate with each other, and this lack of interoperability results in data inconsistencies and inefficiencies.² Furthermore, public health initiatives frequently require collaboration across different agencies and departments, each with its own data management practices, which further complicates the data harmonization process.
Challenges:
- Data Heterogeneity and Inconsistencies: Differences in data definitions, standards, and quality across agencies or departments further complicate harmonization efforts. For example, the same piece of information might be recorded differently across various datasets, such as “DOB,” “date_of_birth,” “birthdate,” and “birth_date.” These inconsistencies hinder effective data harmonization.
- Interoperability Issues: Disparate data models and terminologies across public health organizations lead to significant interoperability issues. Different terms for the same data, such as “heart attack” and “myocardial infarction,” complicate data integration and usage. These discrepancies necessitate a robust harmonization process to ensure data can be accurately interpreted and utilized across various systems and departments.
- Incomplete or Missing Data: Public health datasets frequently have incomplete records, missing values, or gaps. These inconsistencies can complicate the harmonization process and lead to inaccurate or biased insights. Ensuring data completeness and accuracy is crucial for reliable data harmonization.
- Legacy Systems and Data Silos: Many public health organizations rely on outdated legacy systems and databases, resulting in data silos and incompatible data formats. Integrating and harmonizing data from these disparate systems is often resource-intensive and technically challenging. Overcoming these silos is essential for creating a unified data environment. Furthermore, overburdened IT teams often struggle to manage these disparate systems, which hinders effective data utilization and collaboration.
- Privacy and Security Concerns: Public health data often includes sensitive information, such as personal or confidential data. This necessitates strict privacy and security measures during the harmonization process. Compliance with data protection regulations, such as HIPAA³ or GDPR⁴, adds another layer of complexity to the harmonization efforts.
- Limited Resources and Expertise: Data harmonization requires significant resources, including skilled personnel, advanced tools, and robust infrastructure. Public health organizations often face budget constraints and lack expertise, delaying critical activities like AI development and model validation. Compliance with data protections, such as HIPAA in the US and GDPR in the EU, also demands significant resources, further straining limited budgets.
- Scalability and Maintenance: As data volumes and sources grow, scaling and maintaining harmonized datasets becomes increasingly complex and resource-intensive. Ensuring the harmonization process remains adaptable and sustainable over time is a significant challenge. The ability to scale and sustain harmonized datasets is crucial for long-term success.
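The first two challenges in this list can be made concrete with a minimal Python sketch. It assumes a hand-written alias table for field names and a small synonym table for clinical terms. The names `FIELD_ALIASES`, `TERM_SYNONYMS`, and `harmonize_record` are hypothetical; they illustrate the problem and are not a description of how the Harmonization Copilot works internally.

```python
# Illustrative only: hand-written lookup tables, not the Harmonization Copilot's method.

# Hypothetical alias table: every local spelling maps to one canonical field name.
FIELD_ALIASES = {
    "dob": "birth_date",
    "date_of_birth": "birth_date",
    "birthdate": "birth_date",
    "birth_date": "birth_date",
    "dx": "diagnosis",
    "diagnosis": "diagnosis",
}

# Hypothetical synonym table: lay and clinical terms map to one preferred term.
TERM_SYNONYMS = {
    "heart attack": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
}


def harmonize_record(record):
    """Return (harmonized, unmapped): canonical fields plus source fields left unmapped."""
    harmonized, unmapped = {}, []
    for field, value in record.items():
        canonical = FIELD_ALIASES.get(field.strip().lower())
        if canonical is None:
            unmapped.append(field)  # flag for human review instead of guessing
            continue
        if canonical == "diagnosis" and isinstance(value, str):
            key = value.strip().lower()
            value = TERM_SYNONYMS.get(key, key)  # unknown terms pass through lowercased
        harmonized[canonical] = value
    return harmonized, unmapped


# Two sites recording the same information in different ways.
site_a = {"DOB": "1980-02-14", "dx": "Heart attack"}
site_b = {"birth_date": "1975-09-30", "diagnosis": "myocardial infarction", "zip": "02139"}

rec_a, unmapped_a = harmonize_record(site_a)
rec_b, unmapped_b = harmonize_record(site_b)
print(rec_a, unmapped_a)
print(rec_b, unmapped_b)
```

Both records end up with the same field names and the same diagnosis term, while the unrecognized `zip` field is reported, not silently dropped. Maintaining such tables by hand across many sources is exactly the manual effort that becomes hard to scale.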
Disparate data models and overburdened IT teams hinder collaboration and effective data utilization, significantly impacting decision-making and the ability to deliver timely and effective public health interventions. These challenges highlight the importance of robust data governance, standardization, and collaboration within and across public health organizations. Effective data harmonization can unlock the full potential of their data assets, enabling more informed decision-making and enhanced collaboration.
3. Introducing the Harmonization Copilot
The Harmonization Copilot is a new application of the Rhino FCP that helps automate the complex data harmonization process. Leveraging Rhino FCP’s capabilities, the Harmonization Copilot uses Generative AI to streamline the standardization of diverse data formats, ensuring compliance with common interoperability standards such as FHIR and common data models and frameworks such as OMOP. This standardization is crucial for improving data interoperability and facilitating collaborative research and analytics across various institutions.
The following diagram shows a visual workflow illustrating the steps from data ingestion to harmonization and reporting, demonstrating the Harmonization Copilot functionality and ease of use:
3.1. Harnessing the Power of Generative AI for Data Harmonization
At the heart of the Harmonization Copilot is Generative AI, which it uses to handle both syntactic and semantic harmonization. Large language models (LLMs) allow the Harmonization Copilot to perform sophisticated data transformations, so that data from different sources can be accurately mapped to common standards. This reduces the manual effort traditionally required in data harmonization projects and significantly accelerates the process.
Generative AI enables the Harmonization Copilot to complete the data harmonization workflow intelligently:
- Semantic Understanding: The AI understands and maps semantic similarities across different data sources, ensuring consistent and accurate data integration.
- Automated Code Generation: The AI minimizes human intervention and errors by generating and executing code for data transformation.
- Continuous Improvement: Through a human-in-the-loop approach, the system continually learns and improves its mappings, enhancing accuracy over time.
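The human-in-the-loop idea in the list above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Harmonization Copilot’s implementation: plain string similarity from the standard library’s `difflib` stands in for the LLM’s semantic matching, and `MappingStore` and `TARGET_FIELDS` are hypothetical names.

```python
import difflib

# Hypothetical target vocabulary; a real project would map to a standard such as OMOP.
TARGET_FIELDS = ["birth_date", "gender", "diagnosis", "admission_date"]


class MappingStore:
    """Keeps reviewer-approved mappings so they are reused instead of re-suggested."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.approved = {}

    def suggest(self, source_field):
        """Return (target, status); status is 'approved', 'suggested', or 'unmapped'."""
        key = source_field.strip().lower()
        if key in self.approved:
            return self.approved[key], "approved"
        # String similarity is a stand-in here for the LLM's semantic matching.
        close = difflib.get_close_matches(key, self.targets, n=1, cutoff=0.6)
        if close:
            return close[0], "suggested"
        return None, "unmapped"

    def review(self, source_field, target):
        """Record a human decision, either confirming or correcting a suggestion."""
        if target not in self.targets:
            raise ValueError(f"unknown target field: {target}")
        self.approved[source_field.strip().lower()] = target


store = MappingStore(TARGET_FIELDS)
first = store.suggest("birthdate")   # near-identical spelling yields a suggestion
before = store.suggest("DOB")        # abbreviation: string similarity finds nothing
store.review("DOB", "birth_date")    # the reviewer supplies the mapping
after = store.suggest("DOB")         # now answered from the approved store
print(first, before, after)
```

The `DOB` case shows why purely syntactic matching is not enough: the abbreviation shares almost no characters with `birth_date`, so it takes either semantic understanding or a reviewer to make the link. Once the reviewer’s decision is stored, it is applied automatically the next time the field appears.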
3.2. Harmonization Copilot: Key Features and Capabilities
The Harmonization Copilot is designed with several key features and capabilities that make it a valuable tool for public health organizations:
- Intuitive Workflow: The application provides an easy-to-use interface facilitating collaboration between data engineers and clinical subject matter experts (SMEs). This collaborative environment ensures that the harmonization process is both efficient and accurate.
- Human-in-the-Loop Refinement: The Harmonization Copilot continuously refines its data mapping by incorporating user feedback. This iterative process enhances the quality and reliability of the harmonized data.
- Edge Processing: The Harmonization Copilot processes data locally within the institution’s environment to ensure data privacy and security. This capability aligns with stringent data protection regulations and reduces the risk of data breaches.
- Flexibility: The Harmonization Copilot supports transforming local data to FHIR resources and the OMOP¹ common data model. Additionally, users can define their bespoke data models and map to custom vocabularies, offering tailored solutions to meet specific needs.
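To give a sense of what transforming local data to a FHIR resource involves, the sketch below builds a minimal FHIR Patient resource from one row of a local table. The local column names (`patient_id`, `first_name`, `last_name`, `sex`, `DOB`), the MM/DD/YYYY date format, and the `to_fhir_patient` helper are all assumptions made for illustration. The output uses only standard Patient elements (`resourceType`, `id`, `name`, `gender`, `birthDate`) and omits everything else a production mapping would need.

```python
import json
from datetime import datetime

# Hypothetical local coding for sex. FHIR's administrative gender allows
# "male", "female", "other", and "unknown".
GENDER_MAP = {"m": "male", "f": "female", "o": "other"}


def to_fhir_patient(row):
    """Build a minimal FHIR Patient resource (as a dict) from a local row."""
    # Assumption: the local system stores dates as MM/DD/YYYY; FHIR expects YYYY-MM-DD.
    birth = datetime.strptime(row["DOB"], "%m/%d/%Y").date().isoformat()
    local_sex = str(row.get("sex", "")).strip().lower()
    return {
        "resourceType": "Patient",
        "id": str(row["patient_id"]),
        "name": [{"family": row["last_name"], "given": [row["first_name"]]}],
        "gender": GENDER_MAP.get(local_sex, "unknown"),
        "birthDate": birth,
    }


# A made-up row, not real patient data.
local_row = {
    "patient_id": 12345,
    "first_name": "Jane",
    "last_name": "Doe",
    "sex": "F",
    "DOB": "02/14/1980",
}

patient = to_fhir_patient(local_row)
print(json.dumps(patient, indent=2))
```

Even this small example involves three distinct kinds of transformation: renaming fields, reformatting a date, and translating a local code into a standard value set.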
The Harmonization Copilot is seamlessly integrated into the Rhino FCP, which brings the following capabilities for enhancing data harmonization:
- Federated Learning (FL): Users can engage in collaborative AI model training without sharing raw data, preserving privacy while leveraging diverse datasets from multiple sources. Training on diverse data inputs improves model robustness and generalizability. For the potential of AI and FL in public health emergencies, see the landmark publication by Rhino Health’s Co-Founder and CEO, Ittai Dayan (Dayan et al., 2021)⁶.
- Data Management: The platform offers robust tools for data ingestion, validation, and versioning. It ensures data integrity by enforcing predefined schemas and rules during data import, providing real-time data validation, and maintaining comprehensive version control to track changes and updates to datasets, ensuring transparency and reproducibility.
- Federated Statistical Methods and Analytics: This application allows public health organizations to perform advanced statistical analyses on harmonized data without centralizing it. It enables collaborative research and analytics while maintaining data privacy and security. Use cases include multi-site clinical trials, public health surveillance, and large-scale epidemiological studies.
- Remote Data Viewers and Annotation: This tool facilitates remote data access and annotation, allowing researchers and clinicians to view and annotate datasets from multiple locations securely. It supports collaborative projects, peer reviews, and training initiatives by providing real-time data interaction and feedback.
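The principle behind federated statistics, as described in the list above, can be illustrated with a short Python sketch: each site reduces its row-level data to a few aggregate numbers, and only those aggregates are combined centrally. The function names and the sample values are hypothetical, and this is not the Rhino FCP API. It shows one simple way to obtain a pooled mean and sample standard deviation without moving individual records.

```python
import math


def local_summary(values):
    """Runs at each site: reduce row-level values to three aggregate numbers."""
    return {"n": len(values), "sum": sum(values), "sum_sq": sum(v * v for v in values)}


def pooled_mean_and_std(summaries):
    """Runs centrally: combine site summaries into an overall mean and sample std."""
    n = sum(s["n"] for s in summaries)
    if n < 2:
        raise ValueError("need at least two observations in total")
    total = sum(s["sum"] for s in summaries)
    total_sq = sum(s["sum_sq"] for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    # Guard against tiny negative values caused by floating-point rounding.
    return mean, math.sqrt(max(variance, 0.0))


# Made-up patient ages held at two sites; the raw lists never leave each site.
site_a_ages = [60, 70, 80]
site_b_ages = [50, 90]

summaries = [local_summary(site_a_ages), local_summary(site_b_ages)]
mean, std = pooled_mean_and_std(summaries)
print(mean, std)
```

The pooled result matches what would be computed on the combined data, yet the central party only ever sees three numbers per site. Two caveats apply to this sketch: the sum-of-squares formula can lose precision for large values, and aggregates from very small groups may still reveal information about individuals, so real deployments typically add further safeguards.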
4. Benefits for Public Health Organizations
The Harmonization Copilot can transform this landscape by standardizing data to common interoperability standards like FHIR and common data models and frameworks like OMOP, facilitating more accessible and effective collaboration. It provides several advantages for public health organizations:
- Cost-Effectiveness: High mapping coverage at competitive pricing reduces the need for additional clinical and data engineering human resources, offering a cost-effective solution for data harmonization.
- Streamlined Data Processing: Harmonization Copilot automates data harmonization, drastically reducing project timelines by streamlining ETL⁵ processes. This enhanced processing speed accelerates data availability and utilization.
- Improved Data Quality: Accurate and consistent data mappings enhance data quality, leading to better insights and decisions.
- Enhanced Collaboration: Harmonization Copilot facilitates cross-departmental and cross-organizational collaboration, enabling more effective data use. It bridges the gap between data scientists, data engineers, and clinical SMEs, fostering a collaborative environment.
- Regulatory Compliance: Harmonization Copilot ensures compliance with data privacy and security regulations through local processing and robust governance, making it easier for public health organizations to meet regulatory requirements.
5. Public Health Applications
By addressing diverse public health needs, the Harmonization Copilot demonstrates its value in facilitating efficient data integration, improving decision-making, and enhancing public health interventions. The successful implementation of the Harmonization Copilot in healthcare settings highlights its potential to transform data harmonization processes across various public health applications, ultimately enabling more effective and timely responses to pressing health challenges.
The Harmonization Copilot will empower public health agencies across several domains:
5.1. Health Protection: Enhancing Outbreak Surveillance and Response
Public health agencies are critical in responding to disease outbreaks by coordinating surveillance, implementing control measures, and providing accurate information to the public and policymakers. Rapidly identifying and confirming cases, tracing contacts, and understanding transmission dynamics require accurate and up-to-date information. However, outbreak investigation faces significant data challenges, such as the timely collection, integration, and analysis of vast amounts of epidemiological data from diverse sources, all while protecting privacy. The Harmonization Copilot helps solve these challenges.
Example scenario: A public health department is tasked with containing an outbreak of a novel pathogen. Data arrives from various hospitals using different terminologies for key variables such as exposures, symptoms, signs, and outcomes. The Harmonization Copilot automatically harmonizes this data, enabling public health officials to quickly identify trends and hotspots. This timely and accurate data integration supports better decision-making, allowing swift public health interventions, such as containment, vaccination, and exposure reduction.
5.2. Health Services Delivery: Streamlining Research and Planning
Public health organizations conduct research to continuously monitor and improve health service delivery across indications and populations. However, inconsistent data formats and incomplete records can hinder research efforts. The Harmonization Copilot addresses these issues by ensuring data quality and consistency.
Example scenario: A government health research agency aims to evaluate the effectiveness of a new neonatal health program across multiple regions. The program data from the various regions is inconsistent and stored in different formats. The Harmonization Copilot standardizes this data, ensuring it is accurate and comparable. Researchers can then analyze the harmonized data to draw meaningful insights about the program’s impact, leading to evidence-based policy recommendations.
5.3. Policy Development: Improving Cross-Departmental and Intersectoral Collaboration
To address the social determinants of health and improve equity, health departments must collaborate with non-health sectors, particularly education and social services. However, this task often encounters challenges with data silos due to legal, administrative, and technical constraints. These impede collaboration and efficient data utilization. The Harmonization Copilot facilitates cross-departmental data integration, enhancing collaboration.
Example scenario: A state health department wants to integrate data from its public health, social services, and emergency management departments to better understand the social determinants of health. Each department uses different data systems and formats. The Harmonization Copilot harmonizes data across these departments, creating a unified dataset. This integrated data enables comprehensive analysis, helping the state to design more effective interventions that address the root causes of health disparities.
5.4. New Analytics and Tools: Supporting AI Development and Model Validation
Developing and validating AI models across disease indications and populations requires high-quality, harmonized data. The Harmonization Copilot ensures that data used in AI projects is standardized and reliable.
Example scenario: A public health agency is developing an AI model to predict the readmission risk of a subpopulation post-stroke. The model requires large volumes of high-quality data from various sources, including electronic health records, laboratory results, and social media feeds. The Harmonization Copilot standardizes this data, ensuring it is suitable for AI model training and validation. The result is a robust and accurate AI model that helps the agency predict hospital flows to manage scarce resources more effectively.
5.5. Regulation: Enhancing Compliance
Public health organizations must ensure adherence to stringent data privacy and security regulations. The Harmonization Copilot helps these organizations meet regulatory requirements by processing data locally and maintaining robust data governance.
Example scenario: A federal health agency is responsible for ensuring that all data collected from various healthcare providers complies with national data privacy standards. The Harmonization Copilot processes data locally at each provider’s site, ensuring that sensitive information is not transferred. This approach maintains data privacy and ensures compliance with data protection regulations like HIPAA³ in the US and GDPR⁴ in the EU.
5.6. Health Promotion: Facilitating Population Health Management
Managing population health requires integrating data from various public health initiatives and healthcare providers. The Harmonization Copilot facilitates the development, monitoring, and evaluation of these programs through interoperable data sources.
Example scenario: A county health department aims to implement a population health management program to reduce the incidence of Type 2 Diabetes through a multimodal program at community centers. Data from various sources, including hospitals, community clinics, and wellness programs, needs to be integrated to understand the incidence and optimize delivery. The Harmonization Copilot standardizes this data, creating a comprehensive dataset that provides insights into population health trends. This information allows the county to design a targeted intervention and monitor its effectiveness over time.
6. Success Stories
The Harmonization Copilot is already in use in hospital settings, demonstrating clear potential benefits for public health organizations (see “Automating Clinical Data Standardization with Rhino Health’s Harmonization Copilot”). Testimonials from stakeholders highlight the positive impact of using the Harmonization Copilot, reinforcing its value and effectiveness.
“Overseeing our partnership with Rhino Health has been transformative. The Harmonization Copilot has changed how we handle clinical data, seamlessly integrating and standardizing vast arrays of information across multiple systems. The Rhino Federated Computing Platform’s Harmonization Copilot not only enhances our operational efficiency but also boosts our capabilities in patient care and clinical research, strengthening our healthcare innovation assets at ARC Innovation at Sheba Medical Center.” —Benny Ben Lulu, Chief Digital Transformation Officer, Sheba Medical Center and Chief Technology Officer at ARC Innovation.
7. Integrating with Rhino FCP
The Harmonization Copilot is seamlessly integrated into the broader Rhino Federated Computing Platform (Rhino FCP), bringing additional capabilities. Rhino FCP enhances data collaboration, analytics, and biostatistics, offering a comprehensive solution for public health organizations.
The benefits of other applications within the Rhino FCP, such as Federated Datasets, Remote Data Viewers and Annotation, Federated Training and Validation, and Federated Statistical Methods and Analytics, further enhance the platform’s value. These applications support advanced data analysis and collaboration, driving more informed decision-making.
8. Conclusion
Harmonization Copilot addresses the significant challenges of data harmonization in public health, transforming public health data management. By standardizing data to common interoperability standards like FHIR and common data models and frameworks like OMOP, Harmonization Copilot facilitates collaboration, improves data quality, ensures regulatory compliance, and offers a cost-effective solution. The critical role of data harmonization in public health cannot be overstated, and Harmonization Copilot is poised to make a substantial impact.
Discover how Harmonization Copilot can transform your data harmonization processes, enhance collaboration, and drive more effective public health interventions. Contact us today to learn more, schedule a demo, and start your journey toward seamless data integration and improved public health outcomes.
References and Notes:
¹ FHIR and OMOP: FHIR (Fast Healthcare Interoperability Resources) is an interoperability standard developed by HL7 (Health Level Seven) for exchanging and integrating healthcare data. OMOP (Observational Medical Outcomes Partnership) is a common data model (CDM) and framework developed by the Observational Health Data Sciences and Informatics (OHDSI) community for standardizing and analyzing observational healthcare data for research purposes. The two standards are complementary, as FHIR can be used to exchange and integrate data into the OMOP CDM, enabling interoperability and facilitating large-scale analytics and research on observational healthcare data.
² Cheng, C., Messerschmidt, L., Bravo, I., et al. (2024). A General Primer for Data Harmonization. Scientific Data. Available at: https://www.nature.com/articles/s41597-024-02956-3.
³ HIPAA: The Health Insurance Portability and Accountability Act (HIPAA) is a US law designed to provide privacy standards to protect patients’ medical records and other health information provided to health plans, doctors, hospitals, and other healthcare providers.
⁴ GDPR: The General Data Protection Regulation (GDPR) is a regulation in EU law on data protection and privacy in the European Union and the European Economic Area. It also addresses the transfer of personal data outside the EU and EEA.
⁵ ETL (Extract, Transform, Load): A process that involves extracting data from various sources, transforming it into standardized formats compatible with common data models like OMOP and FHIR, and loading the harmonized data into a centralized repository. This ensures seamless data integration and enables effective data analysis.
⁶ Dayan, I., Roth, H. R., Zhong, A., et al. (2021). Federated Learning for Predicting Clinical Outcomes in Patients with COVID-19. Nature Medicine. Available at: https://www.nature.com/articles/s41591-021-01506-3.