Internet as Sentinel II: The Global Public Health Intelligence Network

The Global Public Health Intelligence Network (GPHIN) is a new type of biosurveillance system that continuously monitors global media sources, such as newswires and websites, for articles about disease outbreaks and other events of international public health concern (World Health Organization, 2005; Mawudeku and Blench, 2005; Heymann and Rodier, 2001). GPHIN disseminates this information to subscribers such as the WHO, governments, and non-governmental organizations. Health Canada, in collaboration with the WHO, developed the initial prototype of GPHIN in 1997. In 2004, the annual subscription fee for GPHIN ranged from $30,000 Canadian (for a university) to $250,000 (for a country).

The input data for GPHIN comprise news service items, newspaper articles, and reports from approximately 10,000 primary sources. Two news aggregators, Factiva and Al Bawaba, account for many of the sources (see note 3). Al Bawaba is the Arabic language content provider to GPHIN.

3 Factiva® aggregates news from approximately 9,000 primary sources in 152 countries and 22 languages.

ProMED-mail is a GPHIN data source. GPHIN also uses a web spider to search selected websites. GPHIN selects articles for dissemination to its subscribers using a two-step process (Figure 26.3). First, a proprietary information retrieval algorithm computes a relevance score for each article for each of eight topics of interest: animal diseases, human diseases, plant diseases, biologics, natural disasters, chemical incidents, radioactive incidents, and unsafe products. If an article has a sufficiently high relevance score (X in Figure 26.3), GPHIN disseminates it to subscribers immediately. If an article scores below a certain threshold (Y in Figure 26.3), it is automatically "trashed." The system places articles that score between the two thresholds in a queue to await human analysis (the second step). The net result of this computer filtering followed by human review is approximately 100 to 150 articles disseminated per day.

Figure 26.3 The GPHIN Biosurveillance System. An information retrieval algorithm computes relevance scores for a large number of documents as they arrive from news aggregators, ProMED-mail, GPHIN's web spider, and other sources. If a document's relevance score is greater than X (high relevance), it is disseminated immediately. If the score is between Y (lower relevance) and X, it is queued for human review. If the document scores lower than Y, it is assigned to "Trash" and is still reviewed, though with less effort and immediacy. (Components not shown: machine translation, human editing of machine translations, and archiving.)
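The two-threshold routing described above can be illustrated with a short sketch. It is not GPHIN's implementation: the scoring algorithm is proprietary, and the 0-to-1 score scale, the numeric values chosen for X and Y, and the names `Route` and `triage` are all assumptions made for the example.

```python
from enum import Enum


class Route(Enum):
    DISSEMINATE = "disseminate immediately"
    HUMAN_REVIEW = "queue for human review"
    TRASH = "trash"


# Hypothetical threshold values; the source names them only X and Y.
X_HIGH = 0.85
Y_LOW = 0.30


def triage(score: float, x: float = X_HIGH, y: float = Y_LOW) -> Route:
    """Route one article by its relevance score using two thresholds."""
    if y > x:
        raise ValueError("Y must not exceed X")
    if score > x:
        # Above X: sent to subscribers without waiting for an analyst.
        return Route.DISSEMINATE
    if score < y:
        # Below Y: set aside as low relevance.
        return Route.TRASH
    # Between Y and X (inclusive): an analyst decides.
    return Route.HUMAN_REVIEW
```

Because the source says only that scores "between" Y and X go to analysts, the sketch treats scores exactly equal to either threshold as needing human review; that boundary choice is also an assumption.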

Subscribers then decide whether to take action based on the information. WHO, for example, contacts about four countries per week to request verification of information found in GPHIN articles.

GPHIN is multilingual. It analyzes articles in eight languages: Arabic, Chinese (Simplified), Chinese (Traditional), English, Farsi, French, Russian, and Spanish. The automatic filtering of articles occurs in the original language. Machine translation programs translate English articles into the other seven languages and non-English articles into English. Human analysts edit the machine translations prior to distribution.
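The translation rule in this paragraph (English into the other seven languages, everything else into English) amounts to a small routing function. The following is a minimal sketch under that reading; the function name `translation_targets` is hypothetical, and nothing here reflects GPHIN's actual software.

```python
from typing import List

# The eight languages named in the text.
LANGUAGES = (
    "Arabic",
    "Chinese (Simplified)",
    "Chinese (Traditional)",
    "English",
    "Farsi",
    "French",
    "Russian",
    "Spanish",
)


def translation_targets(source_language: str) -> List[str]:
    """Return the languages an article should be machine-translated into."""
    if source_language not in LANGUAGES:
        raise ValueError(f"unsupported language: {source_language}")
    if source_language == "English":
        # English articles go out in the other seven languages.
        return [lang for lang in LANGUAGES if lang != "English"]
    # Non-English articles are translated into English only.
    return ["English"]
```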

Although Health Canada describes GPHIN as a near real-time system, the reference point is the time at which an outbreak or other event is described in one of the monitored sources (i.e., after it has been detected by some other means and written about). Nevertheless, GPHIN often provides earlier warning of events of interest to the international community than other methods do. Of the 578 outbreaks verified by WHO between July 1998 and August 2001, 56% were initially picked up by GPHIN (Heymann and Rodier, 2001).

The U.S. Department of Agriculture's (USDA) Animal and Plant Health Inspection Service, Veterinary Services, Center for Emerging Issues (CEI) also filters and analyzes information gleaned from the Internet for indications of the emergence or spread of animal diseases in the United States and abroad. CEI uses an electronic scanning methodology that analyzes large amounts of Internet-sourced text in a relatively short time. The system evaluates the text against predefined queries to extract articles of possible interest, thus quickly identifying information that requires further review. CEI staff construct queries to find records containing information about specific animal diseases, disease situations for which a specific disease is not identified, outbreaks of a novel disease, and changes in nations' veterinary and regulatory activities that may indicate a disease outbreak. For example, one query finds records that may indicate an animal disease outbreak of unknown or undiagnosed cause. In this case, the query includes specific animal names in association with terms such as dead, dying, sick, ill, outbreak, and so on. CEI periodically reviews and updates the search queries.
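The "unknown cause" query can be pictured as a co-occurrence test: a record is tagged when it mentions both an animal and one of the sign terms. The sketch below is illustrative only. The sign terms come from the text, but the animal list, the whole-word matching, and the choice to look for co-occurrence anywhere in the same record are assumptions; CEI's actual query syntax is not described in the source.

```python
import re

# Hypothetical animal names; the source says only "specific animal names".
ANIMAL_TERMS = {"cattle", "cow", "cows", "pig", "pigs", "sheep", "poultry", "chickens"}

# Sign terms listed in the text.
SIGN_TERMS = {"dead", "dying", "sick", "ill", "outbreak"}

_WORD = re.compile(r"[a-z]+")


def is_tagged(record: str) -> bool:
    """Tag a record that mentions both an animal term and a sign term."""
    words = set(_WORD.findall(record.lower()))
    return bool(words & ANIMAL_TERMS) and bool(words & SIGN_TERMS)
```

Whole-word matching keeps "illness" from matching "ill", but a bag-of-words test like this still over-tags (for example, "ill-prepared" splits into "ill" and "prepared"), which is consistent with the text's point that tagged records go to analysts for manual review.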

Internet data sources such as listservs, industry websites, and articles from worldwide electronic news sites are loaded into the system daily. CEI analysts review the articles tagged by the queries to determine whether the information is of value. Information deemed useful is then stored in the system's file manager.

Typically, CEI's querying and filtering tool analyzes 900 to 1,000 records per day, of which about 190 to 200 are tagged and then manually reviewed. Of the reviewed records, on average, about two or three per day are deemed of value and are stored. CEI then tracks these disease situations or potential disease situations.
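These daily counts imply a steep filtering funnel. The short calculation below makes the proportions explicit. It uses only the figures quoted above and assumes, for the sake of bounding the rates, that the low and high ends of each range can be combined freely; the helper name `rate_range` is hypothetical.

```python
def rate_range(num_low, num_high, den_low, den_high):
    """Return the (smallest, largest) ratio num/den over the given ranges."""
    return num_low / den_high, num_high / den_low


# Daily counts quoted in the text.
tagged_share = rate_range(190, 200, 900, 1000)  # tagged / analyzed
stored_share = rate_range(2, 3, 190, 200)       # stored / tagged

for label, (low, high) in (("tagged", tagged_share), ("stored", stored_share)):
    print(f"{label}: {low:.1%} to {high:.1%}")
# tagged: 19.0% to 22.2%
# stored: 1.0% to 1.6%
```

In other words, roughly one record in five is tagged by the queries, and only about 1 to 1.6 percent of the tagged records survive manual review.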

Both GPHIN and the CEI system use computers not only for data collection, archiving, and selective dissemination of new information, but also for finding documents and filtering them for relevance. The latter uses distinguish them from ProMED-mail, in which list moderators do all the filtering and there is no automatic searching of the web.
