Steve DeFrancesco, Kevin Hutchison, and Michael M. Wagner RODS Laboratory, Center for Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
An organization, when designing, developing, or acquiring a biosurveillance system, faces many decisions that range from the design of the physical facility that will house the system, to what type of software to use for the system, to methods for data collection and storage. Understanding of the full range of technical options and their strengths and limitations is essential for making informed decisions. The purpose of this chapter is to explore a set of key decisions not previously discussed in the chapters on standards and architecture. In particular, we will discuss the physical facility, software, functionality, methods of data transmission, and the use of data utilities. An organization should make most if not all of these decisions before acquisition or implementation. The immediate and long-term process of building and operating a biosurveillance system will be less costly and disruptive to the organization if the issues we discuss in this chapter are considered in advance of acquisition or implementation.
2. HOSTING FACILITY
Installing and maintaining a biosurveillance system is a large undertaking that will consume significant resources both initially and in the future. Because there are companies that can operate biosurveillance systems off site, an organization needs to weigh the costs and benefits of operating a system in-house or outsourcing its operation (or joining with similar organizations to share the cost and effort). The decision has significant ramifications for the organization. It will influence how the organization's staff spend their time: installing and maintaining a biosurveillance system, or using it as a tool in their daily activities. In some respects, this decision resembles the decision whether to develop a phone system in-house or use external services, except that the widespread availability of excellent telephone companies makes the latter decision quite straightforward.
For some organizations, this decision may be a very quick one: if an appropriate facility does not exist, or if resources (including funding, hardware, and seasoned maintenance personnel) are not sufficient to develop and maintain one, the organization should not attempt to install and maintain a biosurveillance system. Building an adequate facility can be quite expensive.
Each situation is different, but a rough estimate is a few hundred thousand dollars to a few million dollars in initial capital investment alone. In addition, there will be recurring costs owing to yearly maintenance contracts for equipment, the cost of utilities, and salaries for support personnel. In general, local hosting is expensive and time-consuming.
An alternative to the local installation, and perhaps a better use of funds, is to outsource the entire system to an application service provider (ASP). Using an ASP to install and maintain a system at its location can eliminate or reduce many of the costs associated with building, operating, and maintaining a hosting facility. The ideal situation is when the provider of the biosurveillance software (commercial vendor, government organization, university) also offers its product as an ASP service. Using a provider who offers an ASP service will not only be more cost-effective than a local installation but will also provide the comfort of knowing that the biosurveillance software provider is managing all aspects of the actual biosurveillance service for the organization.
Whether building a local system or using an ASP, it is important to review the considerations outlined in the following section to ensure the hosting facility is suitable for your biosurveillance system.
To achieve the goal of maintaining a highly available system (one that is available 24 hours a day, seven days a week), whether it be for Web-based disease reporting, hospital infection control, electronic laboratory reporting, or other purposes, you must consider the physical and environmental characteristics of the facility housing the system. In particular, you must consider location, power, cooling, security, cabling/connectivity, monitoring/management, and service/maintenance contracts. There are entire books devoted to many of these factors. In this section, we provide a brief overview of each in order to give you a base understanding. We suggest that you skim these sections at this time and note them for future reference when you are ready to start your design process.
When assessing the suitability of a potential hosting facility, whether it be for the primary location or a backup location, take into consideration how prone the site is to a natural disaster, be it an earthquake, hurricane, tornado, volcano, landslide, or flooding. Evaluate the surrounding area; talk to local utility companies and to neighboring landowners to determine if the site is susceptible to other types of interruptions. Local landowners can provide insights into problems that may not be available from formal sources. Examples of this type of information might include knowing that the local power grid is prone to outages during bad weather or that new construction is about to begin in the surrounding area, which may translate to backhoe season. With any proposed nearby construction, a concern to be aware of is the digging up of water lines and communication cables. Also, ask if the local storm sewer system ever overflows during heavy rains. Good site selection and proper planning, design, and maintenance can help avoid major outages. Evaluate the site completely, both internally and externally.
Stable power is the most important resource for any system. Take it away and your system will come to a grinding halt (and may not restart). When reviewing a facility's electrical system, pay particular attention to the size, number, and location of feeds servicing the building; the internal division of power to different circuit panels; grounding; and, of course, the size and type of uninterruptible power supply (UPS) and generator(s) that are in place in the event that the building loses power.
The location of the electrical service coming into the building is critical. Ideally, the electrical service to a building will be redundant, with two service lines entering the facility from opposite sides of the facility. This configuration reduces the chance of the facility losing power owing to any construction or external mishaps.
A central in-line UPS system is the key to preventing downtime caused by short power outages, surges, or brownouts. The UPS will have the building's main power feed(s) connected to its input, will condition the power, and then will supply the conditioned power to the attached systems within the facility. The UPS should be able to maintain this supply for an average of 10 to 20 minutes (Liebert Corporation, 2003).
There should be a diesel or gasoline generator to provide sustained electrical backup to the UPS system. There should be one or more generator units, depending on the level of reliability required and financial constraints. The generator unit(s) serve as the main source of power for the facility in the event that the main power feed(s) go down. During a power failure, the UPS system will automatically take the load of the attached systems until the generators can start and reach certain operating parameters. The generator will then automatically assume the load from the UPS and continue to supply power to the systems until the main power feed(s) are re-established. The period that the generator can supply power to the systems depends on the power draw from the systems and the amount of fuel on site. Typically, an on-site fuel tank will hold 1000 gallons or more and support the operation of the generator(s) at maximum capacity for eight hours or more. If the facility is located in an area that is prone to harsh weather, such as heavy snowfall or flooding, it is highly recommended to have two different fuel suppliers in order to minimize the possibility of running out of fuel because one supplier cannot reach the facility.
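The relationship between fuel on site and generator runtime can be sketched as a short calculation. The following is a minimal illustration only: the burn rate is merely implied by the typical figures above (1000 gallons lasting eight hours at maximum capacity, both of which the text gives as lower bounds), and the function names and the delivery lead time are hypothetical. Real planning should use the generator manufacturer's fuel consumption data.

```python
def generator_runtime_hours(fuel_gallons: float, burn_rate_gph: float) -> float:
    """Hours the generator can run on the given fuel at a constant burn rate."""
    if burn_rate_gph <= 0:
        raise ValueError("burn rate must be positive")
    return fuel_gallons / burn_rate_gph


def refuel_margin_hours(fuel_gallons: float, burn_rate_gph: float,
                        delivery_lead_hours: float) -> float:
    """Spare hours left if a fuel delivery is ordered the moment power fails.

    A negative result means the tank runs dry before the delivery arrives.
    """
    return generator_runtime_hours(fuel_gallons, burn_rate_gph) - delivery_lead_hours


# Burn rate implied by the typical figures in the text (an assumption,
# not a manufacturer specification): 1000 gallons over 8 hours.
IMPLIED_BURN_RATE_GPH = 1000 / 8  # 125 gallons per hour at maximum capacity

if __name__ == "__main__":
    print(generator_runtime_hours(1000, IMPLIED_BURN_RATE_GPH))
    # Hypothetical 6-hour delivery lead time.
    print(refuel_margin_hours(1000, IMPLIED_BURN_RATE_GPH, 6))
```

A small or negative margin is one way to quantify why a second fuel supplier, or a larger tank, may be worth the cost in areas where deliveries can be delayed.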
Computers produce heat. To maintain normal system performance and to allow people to service the machines, the average operating temperature in a hosting facility should be between 64°F and 70°F.
Today's servers produce between 1600 and 3400 British thermal units (BTU) per hour, and a typical cabinet enclosure can house up to 42 of these units (a typical room air conditioner provides 5000 BTU per hour of cooling). The facility's cooling system needs to be sized appropriately to the load, and the backup power system needs to be able to sustain the cooling systems in the event of a building power loss. If the cooling systems go off-line for even a short period, the ambient air temperature of the server room could quickly increase to the point of causing machine failure.
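To make the scale of the cooling load concrete, the figures above can be worked through for a single fully populated cabinet. This is a rough sketch that considers only server heat and ignores other sources (lighting, people, network equipment); the function names are illustrative. The conversion of 12,000 BTU per hour to one ton of refrigeration is the standard definition and does not come from this chapter.

```python
import math

SERVERS_PER_CABINET = 42        # from the text: up to 42 servers per cabinet
ROOM_AC_BTU_PER_HOUR = 5000     # from the text: typical room air conditioner
BTU_PER_HOUR_PER_TON = 12000    # standard definition of one ton of refrigeration


def cabinet_heat_load(btu_per_server_hour: float,
                      servers: int = SERVERS_PER_CABINET) -> float:
    """Total heat output of one cabinet in BTU per hour."""
    return btu_per_server_hour * servers


def room_ac_equivalents(load_btu_per_hour: float) -> int:
    """Number of typical room air conditioners needed to match the load."""
    return math.ceil(load_btu_per_hour / ROOM_AC_BTU_PER_HOUR)


def tons_of_cooling(load_btu_per_hour: float) -> float:
    """Cooling load expressed in tons of refrigeration."""
    return load_btu_per_hour / BTU_PER_HOUR_PER_TON


low = cabinet_heat_load(1600)    # 67,200 BTU per hour
high = cabinet_heat_load(3400)   # 142,800 BTU per hour

if __name__ == "__main__":
    for load in (low, high):
        print(load, room_ac_equivalents(load), round(tons_of_cooling(load), 1))
```

Under these assumptions, one full cabinet produces as much heat as 14 to 29 room air conditioners can remove, which is why both the cooling plant and its backup power must be sized to the installed load.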
The most secure facility has most, if not all, of the following security measures: fencing, gates, guard stations, cameras, and motion detection systems to reduce intruder access. Inside the facility, additional provisions exist, including mantraps in lobbies and common areas, card and/or biometric readers, more cameras, and a fire alerting and suppression system.
The fire suppression system in a facility will typically include a very early smoke detection and annunciation system that notifies staff of a problem before smoke or fire is seen. In the event of a fire, a preaction sprinkler system activates and controls the areas and amount of water release. FM200 gaseous suppression systems work in conjunction with a preaction sprinkler system. The facility should utilize zoned systems, so that a fire extinguished in one area will not affect equipment and personnel in other areas.
A structured cabling system is critical to ensure reliable data transmission within a facility as well as provide a reliable path for data access outside the facility. There are many different structured cabling system designs and types, but all should follow the Electronic Industries Alliance/Telecommunications Industry Association (EIA/TIA) transmission and installation standards (TIA, 2005).
Once the facility's physical and environmental characteristics are deemed adequate, examine the network connectivity. Analyze the entire network, starting from the point where the system connects to the local area network (LAN) all the way to the outbound Internet connections. The goal is to identify any single points of failure (i.e., any single device or connection that, when shut off, will disable part or all of the biosurveillance system), for example, one network connection from the system to the LAN, one network switch that all incoming and outgoing traffic must traverse, or only having one connection out to the Internet (or two Internet connections taking the same path out of the building). While you assess the network topology, take note of any physical vulnerabilities that may be present, such as unmarked network cabling being run through open spaces where it can be confused with other types of cabling, network devices not being securely located, or diverse Internet connections traversing the same physical path outside of the building.
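The search for single points of failure can be sketched as a small graph exercise: model devices as nodes and connections as edges, then remove each device in turn and check whether the system can still reach the Internet. The topology and device names below are hypothetical, and the brute-force approach is just one simple way to do this, suitable for the small networks found in a single facility.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph, start, goal, removed=None):
    """Breadth-first search from start to goal, skipping the removed node."""
    if removed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbor in graph.get(node, ()):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def single_points_of_failure(graph, start, goal):
    """Devices whose loss alone disconnects start from goal."""
    return sorted(
        node for node in graph
        if node not in (start, goal)
        and not reachable(graph, start, goal, removed=node)
    )


# Hypothetical topology: two routers and two ISPs, but only one switch.
EDGES = [
    ("server", "switch_a"),
    ("switch_a", "router_1"), ("switch_a", "router_2"),
    ("router_1", "isp_1"), ("router_2", "isp_2"),
    ("isp_1", "internet"), ("isp_2", "internet"),
]

if __name__ == "__main__":
    print(single_points_of_failure(build_graph(EDGES), "server", "internet"))
```

This sketch tests devices only. To test connections as well, each cable can be modeled as its own node between the two devices it joins. A logical diagram also cannot reveal two diverse links that share one conduit out of the building, so a physical walk-through is still needed.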
A hosting facility should have management tools that the operators of the facility use to monitor, measure, and manage the performance and availability of systems and applications. This service needs to have a tie-in to all of the key components of the facility, including, but not limited to, generators, fuel tanks, cooling units, humidity and water sensors, fire protection, security and network services, and your system. The monitoring system will have a user interface for operators that provides a clear and comprehensive view into all of these key systems so operators can monitor performance, spot incipient problems, and predict system growth needs (e.g., the need for more disk capacity).
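As one illustration of predicting growth needs, a monitoring tool might project when a disk will fill from recent usage readings. The sketch below assumes evenly spaced daily readings and linear growth, which real workloads may not follow; the function name and sample values are hypothetical.

```python
def days_until_full(daily_used_gb, capacity_gb):
    """Estimate days until the disk is full from daily usage readings.

    daily_used_gb: used space in GB, one reading per day, oldest first.
    Returns None when usage is flat or shrinking (no projected fill date).
    """
    if len(daily_used_gb) < 2:
        raise ValueError("need at least two readings")
    # Average growth per day between the first and last readings.
    daily_growth = (daily_used_gb[-1] - daily_used_gb[0]) / (len(daily_used_gb) - 1)
    if daily_growth <= 0:
        return None
    return (capacity_gb - daily_used_gb[-1]) / daily_growth


if __name__ == "__main__":
    # Hypothetical readings: growing 10 GB per day on a 1000 GB volume.
    print(days_until_full([500, 510, 520, 530], 1000))
```

An operator could compare the projected number of days against the lead time for purchasing and installing new storage to decide when to raise an alert.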
A server facility has many electrical and mechanical systems, including diesel generators, air conditioners, computers, and disk and tape systems. Having appropriate maintenance contracts in place for regular service of all facility equipment is necessary to maintain high availability. The importance of maintenance contracts is easy to overlook, but they should receive the same attention as other factors to reduce down time and prolong the life of the equipment. Each system or piece of equipment may have a different service period, so work with the equipment manufacturers to establish proper service intervals.
Figure 35.1 summarizes factors to consider when evaluating a hosting facility.
Once the server facility is selected or designed, it is time to decide what means you will use to prevent interruption of service or loss of data. Designers use the concept of mean time to repair (MTTR) when weighing outage-related decisions.
MTTR is the average amount of time required to restore a system to a normal working condition from a failure. The designers should establish the acceptable MTTR for the system (i.e., the acceptable "downtime").
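As a worked example of the definition, MTTR can be computed from a log of past outages. The sketch below also relates MTTR to availability using the standard steady-state formula, availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures. MTBF and the sample figures are assumptions introduced for illustration and are not discussed in this chapter.

```python
def mean_time_to_repair(repair_hours):
    """Average time, in hours, taken to restore service after each failure."""
    if not repair_hours:
        raise ValueError("need at least one recorded outage")
    return sum(repair_hours) / len(repair_hours)


def availability(mtbf_hours, mttr_hours):
    """Steady-state fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical outage log: three failures that took 2, 4, and 6 hours to fix.
    mttr = mean_time_to_repair([2, 4, 6])
    print(mttr)
    # Hypothetical MTBF of 996 hours between failures.
    print(round(availability(996, mttr), 3))
```

With these hypothetical figures, a four-hour MTTR yields roughly 99.6% availability. Note that an MTTR of zero gives an availability of 1 regardless of how often failures occur.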
It is possible to develop or acquire a recovery strategy that has zero MTTR, which we refer to as "Wall Street" reliability, because financial transactions are protected in this manner. To accomplish this, you must have a second mirrored site (commonly referred to as a disaster recovery site). The secondary site takes over all functionality of the primary site in the event the primary site goes offline. The site is called mirrored because it receives a mirror copy of data received by the primary site. This recovery strategy may come with a large price tag, however, because it requires a duplicate set of systems and network connections to data sources. The cost can be reduced by using a commercial disaster-recovery service because it already exists and multiple customers share its cost.
3. Backup systems respond automatically without human intervention
4. Power system to support equipment with dual power supplies from different sources
5. Air-conditioning should be available in sufficient quantities to allow all systems and network equipment to operate within their published environmental conditions
6. Alarm and fire protection systems isolated and zoned to not affect the services outside of
7. Preventative maintenance contracts in place for all major systems avoiding unnecessary
8. Emergency response procedures in place, including documented change management, equipment manuals, network architecture diagrams, system schematics, labeling conventions, and personnel/vendor problem escalation charts
Figure 35.1 Data center quick checklist. Summary checklist to determine a facility's readiness to be classified as server grade and support a highly available system.
If for some reason, such as cost, you decide not to use a disaster-recovery service, then the next level of reliability comes from operating a mirrored system at your primary facility, which will be as reliable and data-loss proof as using a disaster-recovery service except in the event of loss of the entire facility (assuming that your facility satisfies all of the power, cooling, network, and other criteria described above).
Still less expensive (and riskier) is to rely solely on periodic (typically daily) backup of data to tape or disk. The risk is that if the primary data storage system fails completely, you will lose all data for the period between the last backup and the failure. Perhaps more importantly, some or all of the biosurveillance system functions will be unavailable to users until the system is replaced and data reloaded.
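The data-loss exposure of a periodic backup strategy is simple to quantify: the data lost is whatever arrived between the last backup and the failure, and in the worst case that window equals the full backup interval. The sketch below uses hypothetical timestamps and function names.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Span of time whose data is lost when storage fails after a backup."""
    if failure < last_backup:
        raise ValueError("failure cannot precede the last backup")
    return failure - last_backup


def worst_case_loss(backup_interval: timedelta) -> timedelta:
    """Worst case: the failure occurs just before the next backup would run."""
    return backup_interval


if __name__ == "__main__":
    # Hypothetical: nightly backup at 02:00, storage failure at 23:30 the same day.
    lost = data_loss_window(datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 1, 23, 30))
    print(lost)
    print(worst_case_loss(timedelta(days=1)))
```

Shortening the backup interval reduces this window but does nothing for the second risk described above, the time users are without the system while hardware is replaced and data reloaded.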