An Enterprise Architecture For Biosurveillance Systems

This section describes how an architect tasked with the design of an information system for a single organization would approach the design task. Such an organization might be a city health department or a larger organization, such as the CDC. IT professionals understand the principles of enterprise architecture very well. This section is a primer on methods in current use in many biosurveillance projects.

4.1. Architectural Style

The architect would begin by selecting an architectural style, which today would be a layered client-server (LCS) architecture.3 An LCS architecture follows two principles (rules):

(1) some components (the servers) provide functions or data at the request of other components (the clients), and

(2) components must be grouped into layers, with components of one layer accessing the functions of components in the layer below it (i.e., the upper components are clients of the lower-layer components) (Figure 33.1).

RODS, the Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE), BioSense, and virtually all modern enterprise systems use an LCS architecture. Vendors of commercial off-the-shelf software design their products to fit within an LCS architecture. For example, database management systems (DBMSs; e.g., Oracle, Microsoft SQL Server) and geographic information systems (GISs; e.g., ESRI's ArcGIS software) are all designed to function as servers in an LCS architecture.4
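
The two rules can be illustrated with a minimal sketch. The class name, layer numbering, and component labels below are hypothetical and chosen only for illustration; they do not come from RODS, ESSENCE, or PHIN.

```python
class Component:
    """A component that serves the layer above and may call the layer below."""

    def __init__(self, name, layer, lower=None):
        self.name = name
        self.layer = layer  # 0 is the bottom layer
        # Rule 2: a component may only be a client of the layer directly below.
        if lower is not None and lower.layer != layer - 1:
            raise ValueError(f"{name} may only call the layer directly below it")
        self.lower = lower

    def handle(self, request):
        # Rule 1: acting as a server, answer the request of a client.
        if self.lower is None:
            return f"{self.name} answered '{request}'"
        # Acting as a client, delegate to the server in the lower layer.
        return f"{self.name} -> " + self.lower.handle(request)


c = Component("C (data storage)", layer=0)
b = Component("B (analysis)", layer=1, lower=c)  # client of C, server to A
a = Component("A (user interface)", layer=2, lower=b)
result = a.handle("daily counts")
```

Here component B plays both roles, and an attempt to wire component A directly to component C would raise an error because it jumps across a layer.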

4.2. Blueprint

To complete the specification of the system, the architect would specify a data model, a set of components, and a set of diagrams that show how the components are connected.

4.2.1. Data Model

A data model for any information system will specify the data elements, vocabulary, grammar, and semantics of the system. Data elements are the basic units of information in a system. In a biosurveillance system, for example, they follow from the type of data collected (e.g., emergency department registration data, over-the-counter (OTC) medication sales, or microbiology results). Microbiology results will include data elements such as the species and genotype of the organism, antibiotic sensitivities, and specimen source. PHIN specifies the required data elements in a section called Implementation Guides (CDC, 2005b).
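
As a rough illustration of what "data elements" means in practice, the sketch below models the microbiology elements named above as a record type. The field names, the example values, and the `missing_elements` helper are assumptions made for this example; they are not taken from the PHIN Implementation Guides.

```python
from dataclasses import dataclass, field, fields


@dataclass
class MicrobiologyResult:
    """Hypothetical record holding the data elements named in the text."""
    organism_species: str
    organism_genotype: str
    specimen_source: str
    antibiotic_sensitivities: dict = field(default_factory=dict)


def missing_elements(record):
    """Return the names of data elements that were left empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


record = MicrobiologyResult(
    organism_species="Salmonella enterica",  # illustrative value
    organism_genotype="",                    # not reported by the laboratory
    specimen_source="stool",
    antibiotic_sensitivities={"ciprofloxacin": "susceptible"},
)
```

Calling `missing_elements(record)` on this record reports the empty genotype field, which is the kind of check a system might run before accepting a result.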

3 You will also find LCS architectures referenced as two-tiered, three-tiered, or multi-tiered architectures.

4 Although the current PHIN documentation does not explicitly state that PHIN employs a layered client-server architectural style, previous technical specifications of NEDSS used this term, and an information technologist reading the PHIN documentation would recognize PHIN as a layered client-server design.

Figure 33.1 Layered client-server architecture. In a layered client-server architecture, the architect groups components into layers, and components in one layer make requests to the layer below it but do not jump across layers. Note that a component can be both a client and a server (component B is a client of component C and a server to component A).

The system encodes the data elements by using standard vocabularies, grammar, and/or semantics dictated by the type of data that are collected. We described the common vocabularies, grammar, and semantics appropriate for many data types in Chapter 32 of this book. For example, prescribed medications should use RxNorm as the vocabulary, National Council for Prescription Drug Programs (NCPDP) as the grammar, and some form of the PHIN Logical Data Model (LDM) as the semantic model. PHIN specifies the use of LOINC, SNOMED, International Classification of Diseases Ninth Edition (ICD-9), and Current Procedural Terminology (CPT) for vocabularies; HL7 and X12 for grammar; and HL7 RIM and PHIN LDM for semantics.

4.2.2. Set of Components

The architect must specify the set of components necessary to carry out the biosurveillance processes for the planned system. Almost all enterprise systems will include components that provide network, directory, and security functions. We describe these components first in this section. Most systems will also include many components that directly support the biosurveillance process: data collection, data storage, case detection, outbreak detection, outbreak characterization, user interface, and notification. Figure 33.3 is a diagram that shows how we lay out these components in relation to each other.

Network. The network allows multiple computers and the applications that run on them to communicate with each other.

In the past, architects had to choose among a wide range of network technologies. Today, the most pervasive network technology is the Internet Protocol (IP), and a biosurveillance system should run on an IP-based network.

Directory Component. The directory component comprises one or more applications that contain information about the resources (users, computers, applications, and servers) available in the system. For example, PHIN recommends that a biosurveillance system store information about the users of the system in a Lightweight Directory Access Protocol (LDAP)-compatible application, such as Microsoft's Active Directory, Sun's Directory Server, or the open source OpenLDAP. The LDAP application maintains a list of users along with their contact information, passwords, and so on. The system needs this information to ensure secure access and to notify personnel of a possible outbreak. PHIN recommends which data elements the directory should include in the PHINDIR specification (CDC, 2005e) and specifies how public health directories are exchanged, using Directory Services Markup Language (DSML) and ebXML messaging.

Security Component. The purpose of the security component is to perform authentication and authorization. Authentication refers to the process that determines whether the people accessing the system are who they say they are. Authorization refers to the process that determines which data and functions of the systems a user may access.

The architect might specify that an authentication component use one or more of the following:

1. Something the user knows (e.g., their name and password)

2. Something the user has (e.g., a security token, such as an RSA SecurID card, or a digital certificate)

3. Something the user is (e.g., biometrics such as a fingerprint or iris pattern)

A biosurveillance system, particularly one that contains identifiable data (e.g., data about persons with HIV), should use at least two of these methods to ensure that data do not get into the wrong hands. Information technologists refer to the use of more than one authentication method as strong authentication. PHIN recommends the use of strong authentication (CDC, 2005c).
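
The "at least two methods" rule can be sketched as a simple check. The factor labels and the function name are hypothetical; a real system would verify each factor (password, token, biometric) before counting it.

```python
# Hypothetical labels for the three kinds of authentication method:
# something the user knows, has, or is.
FACTOR_TYPES = {"knowledge", "possession", "inherence"}


def is_strong_authentication(verified_factors):
    """Return True if at least two distinct kinds of factor were verified."""
    distinct = set(verified_factors) & FACTOR_TYPES
    return len(distinct) >= 2


password_and_token = is_strong_authentication(["knowledge", "possession"])
two_passwords = is_strong_authentication(["knowledge", "knowledge"])
```

Two passwords count as a single method under this check, so only the first example qualifies as strong authentication.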

For each user, a biosurveillance system maintains a list of accessible functions and data types. To perform authorization, a system compares a user's request with this list and allows or rejects a user's request.
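
A minimal sketch of this comparison follows, assuming a hypothetical in-memory permission list keyed by user name. The user names, functions, and data types are invented for illustration; in practice the list might live in the directory component or the DBMS.

```python
# Hypothetical per-user list of (function, data type) pairs the user may access.
PERMISSIONS = {
    "epidemiologist_1": {("view", "ed_registrations"), ("view", "otc_sales")},
    "lab_liaison_1": {("view", "microbiology_results")},
}


def authorize(user, function, data_type):
    """Allow the request only if it appears on the user's list."""
    return (function, data_type) in PERMISSIONS.get(user, set())


allowed = authorize("epidemiologist_1", "view", "otc_sales")
rejected = authorize("epidemiologist_1", "view", "microbiology_results")
```

Unknown users have an empty list in this sketch, so every request they make is rejected by default.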

Data Collection Component. A biosurveillance system must be able to obtain data from a variety of different systems, share its data with other systems, and make requests for additional data. IT professionals refer to the process of collecting and sharing data across multiple existing systems as data integration. Biosurveillance systems such as RODS include a data integration component that collects data from, and shares data with, multiple outside systems. Commercially available software for data integration includes Sun Microsystems' SeeBeyond, Microsoft's BizTalk, IBM's WebSphere Business Application, and the open source Business Integration Engine.

The process of data collection can be divided into four subprocesses: extraction, transport, transformation, and loading (Figure 33.2). Extraction is the process of pulling data out of another computer system, and it takes place within that system (usually at a computer located in a facility owned by an organization other than the one for which the architect is designing the system). Transport is the transfer of the extracted data to the data recipient. Transformation is the process of manipulating the data (e.g., changing data formats, mapping local terms to standard terms, ignoring certain data elements, and generating new data elements from existing elements) so that they match the data model of the data recipient's systems. Loading is the process of inserting the data into the storage component. PHIN does not specify how to extract data from a provider system but does specify the use of the PHIN Messaging System standard (a secure method of transferring data from one computer to another) to transport the data and XML for data transformation.
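
The four subprocesses can be sketched as four functions chained together. Everything here is a simplified assumption: the field names, the code mapping, and the use of a JSON round trip as a stand-in for secure transport are illustrative only and are not the PHIN Messaging System.

```python
import json

# Hypothetical mapping from local chief-complaint codes to standard terms.
LOCAL_TO_STANDARD = {"SOB": "respiratory", "N/V": "gastrointestinal"}


def extract(provider_rows):
    """Pull records out of the provider's system (here, an in-memory list)."""
    return [dict(row) for row in provider_rows]


def transport(records):
    """Stand-in for secure transfer: serialize, 'send', and deserialize."""
    return json.loads(json.dumps(records))


def transform(records):
    """Map local terms, ignore the patient name, and derive an age group."""
    return [
        {
            "zip": rec["zip"],
            "syndrome": LOCAL_TO_STANDARD.get(rec["complaint"], "unknown"),
            "age_group": "under_18" if rec["age"] < 18 else "18_and_over",
        }
        for rec in records
    ]


def load(records, store):
    """Insert the transformed records into the storage component."""
    store.extend(records)


provider_rows = [
    {"patient_name": "Example Patient", "age": 7, "complaint": "SOB", "zip": "15213"},
    {"patient_name": "Another Patient", "age": 42, "complaint": "N/V", "zip": "15217"},
]
storage = []
load(transform(transport(extract(provider_rows))), storage)
```

Keeping each subprocess in its own function mirrors the boundaries in the text: extraction runs at the data provider, while transformation and loading run at the recipient.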

We note that the data collection component is a major component in a biosurveillance system that requires proper configuration and maintenance. We discuss support of a data collection component in the next chapter.

Data Storage Component. The purposes of the data storage component in a biosurveillance system are to provide local storage and efficient retrieval mechanisms for biosurveillance data. An architect must consider two major types of data that need to be stored in a biosurveillance system—transactional data and cached data.

Transactional data are raw data collected from data providers, such as users, hospitals, laboratories, water supply systems, and retailers, as well as the results from analyses of the data. Examples of transactional data include manually entered case information (e.g., name, contact information, partners), an electronic patient registration from an emergency department (e.g., date/time of the visit, age, sex, chief complaint, home zip code and work zip code of the patient), and outbreak alerts (e.g., date/time of the alert, data types analyzed, severity of the alert).

Figure 33.2 Data collection comprises four subprocesses: extraction, transport, transformation, and loading. Boundary lines indicate where each of these processes take place.

Cached data are either cached query results or Online Analytic Processing (OLAP) data cubes. Cached query results are the results from previous requests for data from a client to the storage component. These cached query results improve efficiency when users make multiple requests for the same data because the storage system does not have to search and assemble the results repeatedly. OLAP data cubes are a transformation of the transactional data to support efficient analysis. An example of data found in an OLAP cube is the number of patients who live in a particular zip code under the age of 18 with respiratory complaints each day from January 1, 2004, to January 1, 2005. Although a biosurveillance system can obtain such information without an OLAP data cube by retrieving all of the transactional records that meet the criteria and counting the number of records obtained, such a process takes an inordinate amount of time (1000 times slower).
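
The difference between scanning transactional records and reading a precomputed count can be sketched in a few lines. This is a toy illustration, not an OLAP engine: the `Counter` stands in for a data cube, `lru_cache` stands in for cached query results, and the records are invented.

```python
from collections import Counter
from functools import lru_cache

# Hypothetical transactional records: (date, zip code, age group, syndrome).
TRANSACTIONS = (
    ("2004-01-01", "15213", "under_18", "respiratory"),
    ("2004-01-01", "15213", "under_18", "respiratory"),
    ("2004-01-01", "15213", "18_and_over", "respiratory"),
    ("2004-01-02", "15213", "under_18", "respiratory"),
)


def count_by_scan(date, zip_code, age_group, syndrome):
    """Count matching records by reading every transactional record."""
    key = (date, zip_code, age_group, syndrome)
    return sum(1 for row in TRANSACTIONS if row == key)


# "Cube": counts aggregated once, ahead of time, for every combination.
CUBE = Counter(TRANSACTIONS)


def count_from_cube(date, zip_code, age_group, syndrome):
    """Answer the same question with a single lookup."""
    return CUBE[(date, zip_code, age_group, syndrome)]


@lru_cache(maxsize=128)
def cached_query(date, zip_code):
    """Cached query result: repeated identical requests skip the search."""
    return tuple(r for r in TRANSACTIONS if r[0] == date and r[1] == zip_code)
```

Both counting functions return the same answer; the cube pays the aggregation cost once, when the data are loaded, rather than on every request.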

An architect should use a DBMS designed for an enterprise information system, such as Oracle or Microsoft SQL Server, for the storage component. The DBMS should support transactional data, cached query results, and OLAP data cubes. We discuss the configuration and maintenance issues of a DBMS in the next chapter.

Detection and Characterization Components. The detection and characterization components are the brains of a biosurveillance system. They use natural language processing tools to extract features from free-text data, diagnostic expert systems to detect possible cases of disease, and spatial and/or temporal analytic methods to detect and characterize possible outbreaks of disease. The detection and characterization components store the results of their analyses (cases, outbreaks, and outbreak characteristics) in the storage component. Chapters 3, 13, and 17 provide detailed discussions of case detection, outbreak detection, and natural language processing. The detection and characterization components may use a general statistical analytic tool, such as SAS, or specialized analytic software, such as the algorithms discussed in Part III of this book.

User Interface Component. An architect will include interfaces for the users of a biosurveillance system to facilitate review of the data. The predominant mechanism for interacting with users today is a computer display, keyboard, and mouse. Such a user interface may be Web-based or desktop-based. In a desktop-based system, a program that runs on the user's desktop computer connects to a server that accesses the data or provides the functions of the system. In a Web-based system, a Web browser running on the user's desktop computer connects to a Web server that accesses the data or provides the functions of the system.

An architect chooses a Web-based or desktop-based user interface based on several factors that include the needs of the users and the ability of the organization to maintain desktop applications. Desktop-based user interfaces are faster, more interactive, and integrate well with other desktop applications but require installation onto a user's desktop computer.

Web-based user interfaces are universally accessible from any Web browser but lack the interactivity and integration that a desktop application provides. PHIN specifies Web-based user interfaces for some parts of the system. For example, the section of PHIN that discusses manual entry of data into an early event detection system says that, "Web browser-based data systems should be developed using commercial application server technology as part of a multitiered web development system using open-platform web servers" (CDC, 2005d).

Epidemiologists commonly view surveillance data as a set of tables (e.g., line listings), graphs (e.g., epidemic curves), or maps. In a biosurveillance system, the user interface component displays data as tables and graphs generated by a graphing tool or statistics package, such as SAS. The user interface component generates maps from a GIS, such as ESRI's ArcGIS software. PHIN recommends that the system display data as line listings, graphs, and maps (CDC, 2005a).

Notification Component. The last component in a biosurveillance system is the notification component. This component sends alerts to users by e-mail, automated voice call, page, or fax and tracks the responses to these alerts. The alerts contain information, or links to information, about possible outbreaks or cases of disease.

PHIN recommends the use of the Common Alerting Protocol (CAP) (Jones and Botterell, 2005) for this purpose. CAP is an open, nonproprietary digital format for emergency alerts and notifications.
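
To give a sense of what a CAP message looks like, the sketch below assembles a small alert with Python's standard XML library. The element names follow the published CAP schema, but the choice of the CAP 1.1 namespace, the identifier, sender, restriction text, and event wording are all assumptions for illustration, and the output has not been validated against the schema.

```python
import xml.etree.ElementTree as ET

# Assumption: CAP version 1.1; later versions use a different namespace.
CAP_NS = "urn:oasis:names:tc:emergency:cap:1.1"


def build_cap_alert(identifier, sender, sent, event, description):
    """Build a minimal CAP-style alert and return it as an XML string."""
    alert = ET.Element("alert", {"xmlns": CAP_NS})
    header = [
        ("identifier", identifier),
        ("sender", sender),
        ("sent", sent),
        ("status", "Test"),        # a test message, not a real alert
        ("msgType", "Alert"),
        ("scope", "Restricted"),
        ("restriction", "Authorized public health personnel only"),
    ]
    for tag, text in header:
        ET.SubElement(alert, tag).text = text

    info = ET.SubElement(alert, "info")
    body = [
        ("category", "Health"),
        ("event", event),
        ("urgency", "Expected"),
        ("severity", "Moderate"),
        ("certainty", "Possible"),
        ("description", description),
    ]
    for tag, text in body:
        ET.SubElement(info, tag).text = text
    return ET.tostring(alert, encoding="unicode")


cap_xml = build_cap_alert(
    identifier="example-alert-0001",            # hypothetical identifier
    sender="biosurveillance@example.org",       # hypothetical sender
    sent="2005-06-01T09:30:00-05:00",
    event="Possible respiratory outbreak",
    description="Unusual increase in respiratory complaints in one zip code.",
)
```

A notification component could attach such a message to an e-mail or post it to a partner system; tracking responses would require additional logic not shown here.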

4.2.3. Layout of the Components

We show the layout of the components in Figure 33.3.
