Nonreactive Research Using the Internet

The rapid development of computer-mediated communication in global network structures has led to an increased interest in research on "online behavior." At the same time, the Internet is emerging as a powerful research tool that can be efficiently used for the collection of real-world data (see Reips, chap. 6, this volume). Besides the interesting opportunities for Web-based surveys as well as experimental studies (e.g., Reips & Bosnjak, 2001), the technological properties of the Internet environment provide benefits for Types 4 and 5 nonreactive measurement in particular.

Analyzing written material on the Internet. One kind of nonreactive Internet data is the written material that people produce on the Internet, for instance, on their personal home pages (e.g., Schütz & Machilek, 2003), in virtual discussion groups, or on email lists. These texts are usually written for a potentially unrestricted public, and the researcher has the opportunity to save and analyze such data. (For an overview of the methods of text analysis, see Mehl, chap. 11, this volume.) Interestingly, communication on the Internet is often at least two-sided, making it possible to analyze the interaction between individuals and also higher social aggregates. Bordia (1996), for example, described the use of online discussion group archives in rumor transmission research. Here it is possible to take a process-oriented perspective on occasions of naturally occurring rumor transmission, which can be found on the Internet comparatively easily. Bordia analyzed relevant episodes by quantitative content analysis, using statement categories such as "interrogatory statements" or "prudent statements," the latter referring to an individual's tentativeness or hesitancy in discussing a rumor. The quantity of these statements could be compared not only across the entire discourse but also over time. From the latter perspective, Bordia (1996) found that as the analyzed discussions progressed, the frequency of prudent statements related to the rumor decreased. The author highlighted the finding that although this phenomenon has been mentioned in the literature before, this was the first time it could be shown in a natural context.

Another illustrative example of research on written material located on the Internet can be found in Stone and Pennebaker (2002), who analyzed collective trauma coping in Internet chat room conversations following the death of Princess Diana. The authors were able to detect significant changes in language and content over a period of 4 weeks. Whereas during the first days after Diana's death, personal and emotional responses were common, after 1 week, expressions of compassion changed into hostile comments, and the dominance of collective language during the first period changed into more individual language, indicating the disappearance of collectively shared grief.

Here, it is valid to question whether communication on the Internet can legitimately be called "natural." However, even though several differences between computer-mediated communication and face-to-face interaction have been identified (e.g., Kiesler, Siegel, & McGuire, 1984), widespread accessibility and increasing competence in using the technological environment have made online communication an integral part of behavior in industrialized countries. Hence, the specific nature of Internet behavior does not make it less natural.
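The comparison of statement categories over time, as in the rumor transmission example, can be sketched in a few lines. This is only a minimal illustration, not Bordia's actual procedure: the coded statements, the period labels, and the function name `category_share_by_period` are hypothetical, and the coding itself is assumed to have been done by human raters beforehand.

```python
from collections import Counter, defaultdict

# Hypothetical hand-coded statements: (period of the discussion, category).
coded_statements = [
    (1, "prudent"), (1, "prudent"), (1, "interrogatory"), (1, "other"),
    (2, "prudent"), (2, "interrogatory"), (2, "other"), (2, "other"),
    (3, "other"), (3, "other"), (3, "interrogatory"), (3, "other"),
]


def category_share_by_period(statements, category):
    """Return {period: share of that period's statements coded as `category`}."""
    counts = defaultdict(Counter)
    for period, label in statements:
        counts[period][label] += 1
    return {
        period: counts[period][category] / sum(counts[period].values())
        for period in sorted(counts)
    }


shares = category_share_by_period(coded_statements, "prudent")
for period, share in shares.items():
    print(f"period {period}: share of prudent statements = {share:.2f}")
```

Working with shares rather than raw counts keeps periods comparable when the number of postings per period differs.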

Log file analysis. Analyzing the texts available on the Internet represents the classical method of archival data analysis, although the interactional, dynamic, and mostly well-documented structure of Internet content increases the potency of such analyses. Yet there is also another, possibly even more important, way in which online research might enhance the capacities of nonreactive measurement: Internet behavior is continuously and automatically recorded without the explicit awareness of its users. Interestingly, these records can be assigned to the behavior of single individuals or at least to single machines. A highly nonreactive technique not yet frequently used in psychological research is the analysis of the log files generated on Internet server machines and optionally on client computers as well. This type of log file analysis would make the hidden protocols accessible for other broad and complex research activities. A simple form of log file analysis has been applied in advertising contexts, for example, to indicate the attractiveness of certain Web pages and the success of particular advertising links by following users' navigation through the net (e.g., Wiedmann & Buxel, 2001). In a similar fashion, log file analysis can also be used in descriptive research. Berker (2002), for example, reported a nonreactive study that examined the Internet usage of people with an account at the Internet server of a large German university. Analyzing the proxy log files for a 2-week period in 1998 revealed an interesting preference order of the Web pages viewed by the users. Twenty-four percent of the hits were to pornographic sites, followed by multipurpose sites such as the (probably preset) pages of Internet providers (22%) and by Web sites offering technical support and search engines (9% each). Using log file information such as time and duration of access led to additional results concerning content-specific user habits.
A more controlled and theory-driven Type 4 approach was suggested by Kulikowich and Young (2001), who recommended assessing the problem-solving behavior of individuals by using log file data from an online learning tool. Here, the different problem-solving activities of the participants should be represented by examining their retrieval of particular Web pages. Analyzing the sequential order and duration of access might allow conclusions about both the individuals' learning behavior and the appropriateness of specific learning environments.
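To make the idea of analyzing the sequential order and duration of access more concrete, the following minimal sketch reconstructs, for each accessing machine, the order of retrieved pages and the time until the next request. It assumes log lines in the widely used Common Log Format; the studies cited here do not specify their log format, and the sample lines, host names, and function names are invented for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: Common Log Format; real server or proxy logs may differ.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return (host, timestamp, path) or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return match.group("host"), timestamp, match.group("path")


def page_sequences(lines):
    """Per host, list (path, seconds until that host's next request)."""
    hits_by_host = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            host, timestamp, path = parsed
            hits_by_host[host].append((timestamp, path))
    sequences = {}
    for host, hits in hits_by_host.items():
        hits.sort()
        sequence = []
        for index, (timestamp, path) in enumerate(hits):
            if index + 1 < len(hits):
                dwell = (hits[index + 1][0] - timestamp).total_seconds()
            else:
                dwell = None  # no later request, so the duration is unknown
            sequence.append((path, dwell))
        sequences[host] = sequence
    return sequences


# Hypothetical log lines from an online learning tool.
sample_log = [
    'client-a - - [12/Oct/1998:10:00:00 +0200] "GET /task1.html HTTP/1.0" 200 1043',
    'client-a - - [12/Oct/1998:10:00:40 +0200] "GET /hint1.html HTTP/1.0" 200 512',
    'client-b - - [12/Oct/1998:10:01:00 +0200] "GET /task1.html HTTP/1.0" 200 1043',
    'client-a - - [12/Oct/1998:10:02:10 +0200] "GET /task2.html HTTP/1.0" 200 2210',
]
sequences = page_sequences(sample_log)
```

The duration of the last page in each sequence cannot be derived from the log, because no later request marks its end. Estimates of viewing time obtained this way are therefore always approximations.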

As in conventional accretion measures, the traces of individual behavior can be followed throughout the Internet or in specific online environments. Further development of this method might also include analyzing navigation behavior more directly and should not be restricted to behavior that is relevant only in the virtual environment. In the context of environmental planning, for instance, it might be a useful strategy to present 3D versions of various architectural alternatives and then analyze how long the respective models are visited, which alternatives are entered, and which perspectives are selected for further viewing. This may become a new type of "social design" (Sommer, 1983). For a thorough and controversial discussion of the merits and boundaries of using virtual environments for psychological research, we recommend the debate by Blascovich, Loomis, Beall, Swinth, Hoyt, and Bailenson (2002) in Psychological Inquiry.

For conducting log file analysis, a variety of software tools have been developed. Interested readers may retrieve one of various free software offers from the Internet (e.g., Analog, 2003; Richter, Naumann, & Noller, 2003). For a compilation of log file analysis tools, see Janetzko (2003). Two major problems in conducting log file analysis are that accessing an Internet page is not always recorded in the same way at the same place and that not every access to a particular page is actually documented by a log file. Reducing the first problem might be aided by a standardization of the content and location of log file protocols for research purposes (Type 4 measurement). The latter problem comes up when pages are retrieved from cache memories without accessing a server machine or when proxy servers are involved that do not always inform the original server machine about access to one of its pages. Conversely, if records are analyzed that are located on the user machine, this method might suffer from the nonacceptance of so-called cookies, a setting chosen by many users. To deal with these problems, particularly in the commercial sector, efforts have been made to standardize the feedback of proxy servers (Werner, 2002). Furthermore, it should be kept in mind that the sample in a log file analysis may be self-selected, depending on the activation of proxies, cookies, and cache use on the user's computer. Ideally, possible confounding with variables relevant to the research subject should be ruled out in a prestudy comparing, for instance, those users who have activated proxies in the preferences menu of their Web browser with those who have not. Concerning patterns of Internet use, results by Berker (2002) indicate only small differences between the two user groups.

Another problem of nonreactive Internet research might be the identification of single-person behavior. Although log files usually identify single accessing computers, it is clear neither whether multiple individuals use this particular computer at one time nor whether the pages are accessed automatically without the intent or even the awareness of the user. To reduce these interpretational weaknesses, it is sometimes advisable to set a duration of inactivity that, if exceeded, marks the beginning of another session by a different user (e.g., exceeding the average time of inactivity by 1.5 standard deviations as suggested by Catledge & Pitkow, 1995). Furthermore, depending on the research question, one should exclude from analysis those Web addresses that are usually contacted automatically (e.g., home pages of browser software; Berker, 2002). As individual behavior can be identified with satisfactory reliability without the knowledge or agreement of subjects, anonymity of analysis is an important requirement. Berker (2002), for example, eliminated all information from the log files that could have facilitated identification of individual computers prior to his analysis.
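The inactivity rule mentioned above can be illustrated with a short sketch. It assumes that the request times of one machine are available as seconds in ascending order and sets the threshold at the mean gap between requests plus 1.5 standard deviations. Whether the mean and standard deviation should be computed per machine or pooled across all machines is left open here, and the function names and sample times are hypothetical.

```python
from statistics import mean, stdev


def inactivity_threshold(gaps, k=1.5):
    """Mean gap plus k sample standard deviations (needs at least two gaps)."""
    return mean(gaps) + k * stdev(gaps)


def split_sessions(timestamps, threshold):
    """Start a new session whenever the gap to the previous request exceeds threshold."""
    if not timestamps:
        return []
    sessions = [[timestamps[0]]]
    for previous, current in zip(timestamps, timestamps[1:]):
        if current - previous > threshold:
            sessions.append([current])
        else:
            sessions[-1].append(current)
    return sessions


# Hypothetical request times (in seconds) recorded for one machine.
request_times = [0, 30, 70, 100, 2000, 2020, 2060]
gaps = [later - earlier for earlier, later in zip(request_times, request_times[1:])]
threshold = inactivity_threshold(gaps)
sessions = split_sessions(request_times, threshold)
print(f"threshold: {threshold:.0f} s, sessions found: {len(sessions)}")
```

Because a single long pause inflates both the mean and the standard deviation, a threshold estimated from very few requests is unstable. Pooling gaps across many machines is one way to obtain a more robust value.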

This brief overview of different pitfalls of nonreactive online research points to the important roles that the technological properties of the medium and the recording of information play in the use and interpretation of behavioral traces. Thus, researchers have to be well acquainted with the technological details of data generation to prevent ethically inappropriate behavior or interpretational errors.
