Information Integration in Medical Databases

Chris Trezzo, Professor Chen Li

Summer Undergraduate Research Fellowship in Information Technology (SURF-IT), 2006

The Clinical Knowledge Gap

With the recent explosion of new knowledge in basic science, a gap has opened between the biological understanding of health and the medical treatments that doctors can offer patients. This gap is often referred to as the “clinical knowledge gap.” One factor contributing to this gap is that demonstrating the effectiveness of a new treatment requires a large and costly clinical study. Such studies are beyond the capacity of individual institutions and therefore require a complex data exchange infrastructure to support them. This expensive overhead discourages the creation of clinical studies and further slows the progress of the medical industry.

Currently, cooperative trial groups and consortium databases exist, but they are separate from medical center digital information systems. This causes significant variability across contributing centers, including in data quality and quantity.

The University of California: A Unique Position

California currently contains about one-tenth of the total U.S. population, and the UC system comprises 5 of the 8 medical schools in California. This represents a very large population that could support clinical studies with strong statistical power. The University of California has the potential to become the largest, most efficient, and most consistent clinical research laboratory in the United States. The key is linking the medical information infrastructure across all 5 schools and their affiliated academic medical centers.

A Pilot Project

Medical databases can be very complex, and an institution’s information infrastructure can span many systems. This makes data integration an enormous task.
Therefore, this pilot project has been created to reduce the problem to a more manageable size. The two major goals of this project are to show that integrating clinical data from multiple medical centers is logistically feasible, and to demonstrate the integrated system’s power and potential. For the pilot project, only two medical centers, UCI and UCLA, will be integrated. The Department of Neurosurgery, specifically Cerebrovascular Neurosurgery, was chosen as the pilot department because it is a manageable size and is already one of the most collaborative departments.

Figure 1 - Integrated Schema. This is a small section of the current integrated schema; the entire schema has over 100 tables. It was designed based on questions that doctors would like to ask about clinical data. Two very simple examples of these types of questions are:

How many patients have a specific diagnosis?
What is the mortality rate for patients with a specific diagnosis after a certain operation was performed?

Mediator-Based System Design

For the pilot project, a mediator-based system (Fig. 2) was chosen as the overall approach to data integration. There will be a “virtual database,” called the mediator, which has a schema integrated from both sources. The user issues queries to the mediator, which in turn sends each query to the corresponding wrapper. The wrapper translates the received query into the schema of its source and then passes the translated query to the actual source. The source responds with the results of the query, the wrapper translates those results back into the terms of the mediator’s schema, and the mediator combines the results from the sources. The integrated result is then displayed to the user.

Figure 2 - Mediator-Based System.

Current Project Status

The pilot project is still in an early phase, and requirements engineering is still taking place.
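Because the design is still being refined, the query flow described under Mediator-Based System Design can only be illustrated, not specified. The following is a minimal Python sketch of that flow for the first example question (how many patients have a specific diagnosis). It is not the project's implementation: the site names, table and column names, the `Wrapper` and `Mediator` classes, the `build_demo_mediator` helper, and all rows are hypothetical, and two in-memory SQLite databases stand in for the real sources.

```python
import sqlite3


def make_source(ddl: str, insert_sql: str, rows: list) -> sqlite3.Connection:
    """Create an in-memory database standing in for one source (toy data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


class Wrapper:
    """Translates a mediator-level request into one source's schema and back."""

    def __init__(self, site: str, conn: sqlite3.Connection, count_sql: str):
        self.site = site
        self.conn = conn
        self.count_sql = count_sql  # source-specific SQL with one '?' parameter

    def count_by_diagnosis(self, diagnosis: str) -> dict:
        # Translate the query: normalize the term, run the source-specific SQL.
        (n,) = self.conn.execute(self.count_sql, (diagnosis.lower(),)).fetchone()
        # Translate the result back into the mediator's terms.
        return {"site": self.site, "diagnosis": diagnosis, "patients": n}


class Mediator:
    """Virtual database: forwards a query to every wrapper and merges results."""

    def __init__(self, wrappers: list):
        self.wrappers = wrappers

    def count_by_diagnosis(self, diagnosis: str) -> dict:
        per_site = [w.count_by_diagnosis(diagnosis) for w in self.wrappers]
        total = sum(r["patients"] for r in per_site)
        return {"diagnosis": diagnosis, "total": total, "per_site": per_site}


def build_demo_mediator() -> Mediator:
    # Hypothetical source A: one row per (patient, diagnosis), lowercase names.
    site_a = make_source(
        "CREATE TABLE patient_dx (patient_id INTEGER, dx_name TEXT)",
        "INSERT INTO patient_dx VALUES (?, ?)",
        [(1, "aneurysm"), (2, "aneurysm"), (3, "avm")],
    )
    # Hypothetical source B: one row per encounter, mixed-case free text.
    site_b = make_source(
        "CREATE TABLE encounters (mrn TEXT, diagnosis_text TEXT)",
        "INSERT INTO encounters VALUES (?, ?)",
        [("m1", "Aneurysm"), ("m1", "Aneurysm"), ("m2", "AVM")],
    )
    return Mediator([
        Wrapper("site_a", site_a,
                "SELECT COUNT(DISTINCT patient_id) FROM patient_dx "
                "WHERE dx_name = ?"),
        Wrapper("site_b", site_b,
                "SELECT COUNT(DISTINCT mrn) FROM encounters "
                "WHERE LOWER(diagnosis_text) = ?"),
    ])


if __name__ == "__main__":
    print(build_demo_mediator().count_by_diagnosis("Aneurysm"))
```

The user-facing call never mentions either source's table names; each wrapper alone knows how its source stores diagnoses (here, one source keeps one row per encounter, so its wrapper counts distinct patients). A real system would also need to reconcile patient identities and diagnosis vocabularies across centers, which this sketch does not attempt.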
Knowing exactly what the doctors want from the integrated system is crucial for making it as effective as possible. An initial schema has been designed for the mediator (Fig. 1), but it is still in its first iteration. The initial momentum for the project has been created, and hopefully a solid groundwork has been laid for future work to build on.

Acknowledgments

Stuart Ross, Calit2 SURF-IT
Said M. Shokair, UROP
Professor Chen Li, Department of ICS
Mark E. Linskey, M.D., Department of Neurological Surgery
Chiedozie Nwagwu, M.D., Department of Neurological Surgery
Audrey Milne, Department of Neurological Surgery

References

Garcia-Molina, H., Ullman, J., Widom, J.; Database Systems: The Complete Book, 2002, Prentice Hall.
Linskey, Mark; University of California Clinical Research Initiative, UC Neurosurgery Directors Meeting 2005.