Many organizations operate websites designed for patients or for healthcare providers. For example, the National Library of Medicine at the National Institutes of Health operates MedlinePlus (http://www.medlineplus.gov), a nationally recognized website that contains health information for patients and caregivers on virtually every medical topic of interest. WebMD, Medscape, Embase, the Cochrane Library of Databases, and Medline/PubMed provide a wide range of information from very general information on diseases to peer-reviewed, evidence-based practice guidelines. The WHO, CDC, and state and local health departments maintain websites targeted to the public, medical professionals, and public health professionals.

These websites maintain web access logs that contain a variety of information about users and the information that users accessed. For each access to the site, the log contains a record of:

• IP address—The IP address of the computer making the HTTP request.

• RFC—The RFC 1413 identity field, which is used to identify the person requesting information from the website. (This field is rarely available, and only from websites that register users.)

• Auth—A field included to list the authenticated user, if required for website access.

• Timestamp—Date and time of request.

• Action—The action requested (e.g., request for a specific article or for a search).

• Status—Whether the user experienced a success, redirect, failure, or server error when they tried to access the web page.

• Transfer volume—How many bytes were transferred to the requester.

• Referring URL—The URL of any website that the requester was using just before coming to the present site.
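
As a rough illustration of how the fields listed above might be pulled out of a single log record, the following Python sketch parses one line. It assumes the widely used Apache "combined" log layout; the text does not say which log format any particular site uses, and the pattern, the function name parse_log_line, and the sample line are all invented for this example.

```python
import re
from datetime import datetime

# Assumption: Apache "combined" log layout. The chapter does not state which
# format a given site uses, so this pattern is illustrative only.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<rfc>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<action>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)")?'
)

def parse_log_line(line):
    """Return a dict of the fields listed above, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert text fields to typed values where that is unambiguous.
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    record["status"] = int(record["status"])
    # A "-" in the bytes field means no body was transferred.
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return record

# Invented sample line (192.0.2.0/24 is a reserved documentation range).
sample = (
    '192.0.2.10 - - [15/Jan/2001:10:32:07 -0600] '
    '"GET /article/12345 HTTP/1.0" 200 5120 "http://www.example.com/search"'
)
record = parse_log_line(sample)
```

A "-" in the RFC or Auth position indicates that the field was not available for that request, which, as noted above, is the usual case for the RFC field.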

Johnson et al. (2004) used correlation analysis to study the relationship between the volume of web page accesses to a health-related website for articles on the topic of influenza and the number of influenza-like illness (ILI) cases reported by the U.S. Influenza Sentinel Physicians Surveillance Network. They obtained 12 web access logs (one for each month of the 2001 calendar year) from Healthlink, a consumer health information website developed and maintained by the Office of Clinical Informatics at the Medical College of Wisconsin. The web access logs contain the identification number of documents that persons retrieved and any free-text search terms that they may have entered into the search engine that brought them to the documents. The study was limited to an analysis of the documents retrieved, not the search terms.

Although Healthlink receives queries from many countries, the study was limited to requests from the United States because influenza activity varies by location and the influenza season is inverted in the Southern Hemisphere. The investigators removed requests made by users outside the United States using GeoIP Country, an open source Perl module developed by MaxMind, which analyzes the IP address of each record and assigns (when possible) the country of the user. We discuss "geolocation" (determining a requester's location from an IP address) in detail shortly.
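
The filtering step can be sketched in a few lines of Python. This is not the GeoIP Country module used in the study: the lookup table COUNTRY_BLOCKS below is hypothetical, uses reserved documentation address ranges, and assigns countries to them arbitrarily. The sketch also assumes that records whose country cannot be determined are dropped, which the study description does not specify.

```python
import ipaddress

# Hypothetical lookup table mapping address blocks to country codes. A real
# system would query a maintained geolocation database; these are reserved
# documentation ranges, and the country assignments are invented.
COUNTRY_BLOCKS = [
    (ipaddress.ip_network("192.0.2.0/24"), "US"),
    (ipaddress.ip_network("198.51.100.0/24"), "AU"),
]

def country_of(ip_string):
    """Return the country code for an IP address, or None if it cannot be assigned."""
    try:
        address = ipaddress.ip_address(ip_string)
    except ValueError:
        return None  # malformed address
    for network, country in COUNTRY_BLOCKS:
        if address in network:
            return country
    return None  # address not covered by the table

def keep_us_requests(records):
    """Keep only records whose IP address resolves to the United States.

    Assumption: records with an unknown country are discarded.
    """
    return [r for r in records if country_of(r["ip"]) == "US"]

records = [{"ip": "192.0.2.10"}, {"ip": "198.51.100.7"}, {"ip": "203.0.113.5"}]
us_only = keep_us_requests(records)
```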

Figure 26.4 shows the weekly accesses of 17 documents about influenza diagnosis/treatment and vaccination versus the ILI reference standard. Both time series are normalized into units of standard deviations from the mean. The correlation analysis was limited to those periods for which ILI reference data were available; namely, portions of the 2000-2001 and 2001-2002 influenza seasons (weeks 1-20 and 40-52 of the year 2001, respectively).
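
Normalizing a time series into units of standard deviations from the mean can be sketched as follows. The weekly counts are invented for illustration, and the choice of the population standard deviation is an assumption; the study description does not say which estimator was used.

```python
from statistics import mean, pstdev

def standardize(series):
    """Express each value as its distance from the series mean, in standard deviations."""
    mu = mean(series)
    sigma = pstdev(series)  # assumption: population SD; the source does not specify
    if sigma == 0:
        raise ValueError("cannot standardize a series with zero variance")
    return [(x - mu) / sigma for x in series]

# Invented weekly access counts, for illustration only.
weekly_accesses = [120, 150, 210, 260, 190, 140]
z_scores = standardize(weekly_accesses)
```

After this transformation, two series measured in different units (document accesses and ILI reports) share a common scale and can be plotted on the same axes.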

The correlation between the article-access time series and the ILI reference standard was 0.78 for weeks 1-20 of 2001 and 0.76 for weeks 40-52. The time lag at which the correlation was maximal was zero.
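
One way to find the lag at which two weekly series are most strongly correlated is to compute the Pearson correlation at each candidate shift and take the maximum. The sketch below illustrates that idea; it is not the authors' code, the function names are hypothetical, and the data are invented so that the two series move together exactly.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def lagged_correlations(accesses, ili, max_lag):
    """Correlate accesses[t] with ili[t + lag] for each lag in -max_lag..max_lag.

    A positive lag means web accesses precede the ILI reports by that many weeks.
    """
    n = len(accesses)
    results = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = accesses[: n - lag], ili[lag:]
        else:
            x, y = accesses[-lag:], ili[: n + lag]
        results[lag] = pearson(x, y)
    return results

# Invented weekly series: accesses are a linear function of ILI counts,
# so the correlation should peak at lag zero.
ili = [1, 2, 4, 7, 5, 3, 2, 1]
accesses = [2 * v + 10 for v in ili]

results = lagged_correlations(accesses, ili, max_lag=2)
best_lag = max(results, key=results.get)
```

With real data, the series at larger lags overlap in fewer weeks, so correlations at the extremes of the lag range rest on less evidence than the one at lag zero.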

Although these correlations are interesting, we note that sick individuals have a wide choice of health-related websites, including those located in other countries. Therefore, a monitoring system based on analysis of website access would face a monumental organizational task in achieving high coverage of all sick individuals from a particular region who accessed websites.

We also note that, at present, the potential of monitoring websites designed for clinicians is not high. Besides the large number of such sites, physicians and nurse practitioners tend to consult each other, textbooks, and reference manuals rather than websites when they have patient-specific questions. A recent survey of 13 faculty members and 25 residents found that "the group sought immediate answers to 66% of questions [that arose in routine practice]," but that they "most commonly use another person or a pocket reference" when doing so (Ramos et al., 2003). In the future, clinicians (or patients) may consult new types of websites designed to provide diagnostic assistance (similar to the BOSSS system for cattle diseases described in Chapter 13), and such websites may be few in number, or linked.

