Representing Observational Data

With many measurement approaches, the question "How should one represent one's data?" does not arise. The standard rectangular data matrix suffices. Rows represent sampling units (participants, dyads, etc.), columns represent variables, and columns are filled in with the relatively few scores generated by the measurement approach. That is all the standard statistical packages need or expect, and even a preliminary step like scoring the items of a self-esteem scale, for example, is relatively straightforward. Such data matrices (e.g., the Data Editor window in SPSS) are useful for observational studies as well, but usually the columns are filled with scores that result from data reduction, not initial data collection.

More so than with many other measurement approaches (physiological recording is one important exception), observational methods produce diverse and voluminous data, so how data are represented (literally, re-presented) for the inevitable subsequent computer processing becomes an important consideration. We are convinced that if data are structured well initially, they may not analyze themselves exactly, but their analysis may well be facilitated. To this end, Bakeman and Quera have defined standard conventions for formatting sequential data (SDIS, or Sequential Data Interchange Standard; Bakeman & Quera, 1995). Such data files can then be analyzed with GSEQ (Generalized Sequential Querier), a program for sequential observational data that has considerable capability and flexibility. In particular, GSEQ effects the kinds of data reduction we have mentioned earlier and demonstrate subsequently.

Taking into account different possible coding units and different approaches, Bakeman and Quera (1995) defined five data types. The first three are used when onset and offset times are not recorded, whereas the last two assume such recording of time:

1. Event sequences consist of a single stream of coded events without time information; a code from a single ME&E set is assigned to each event.

2. Multievent sequences consist of a single stream of cross-classified events (i.e., codes from different ME&E sets are assigned to each event).

3. Interval sequences consist of a stream of timed intervals, each of which may contain one or more codes.

4. State sequences consist of a single stream of coded states (the onset time of each is recorded) or several such streams, each representing an ME&E set.

5. Timed-event sequences consist of a record of onsets and offsets of events that may, or may not, be organized into ME&E sets.

Conventions for expressing data as one or the other of these five types are designed to be easy to use and easy to read. To illustrate, segments from an event sequential, a state sequential, and a timed event sequential data file are given in Figure 10.1. Our intent is that these five types reflect what investigators do and how they think about their data, but there are other possibilities. For example, coding with the assistance of various computerized systems typically produces files of codes along with their associated onset times. In such cases, we have found it easy to write programs that reformat such data into SDIS format (e.g., Bakeman & Quera, 2000).
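The kind of reformatting program mentioned above can be sketched in a few lines of Python. This is only an illustration, not the authors' program: the function names `format_mss` and `to_state_sequence` are hypothetical, the input is assumed to be a list of (code, onset in seconds) pairs, and the output layout simply imitates the state-sequence example in Figure 10.1 rather than implementing the full SDIS syntax.

```python
def format_mss(seconds):
    """Render whole seconds as m:ss, as in the state-sequence example."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


def to_state_sequence(label, records, end_seconds):
    """Build one state-sequence line from (code, onset_seconds) records.

    Hypothetical helper: the layout copies the Figure 10.1 example only
    and is not a full implementation of SDIS syntax.
    """
    # Sort by onset time so the codes appear in temporal order.
    ordered = sorted(records, key=lambda record: record[1])
    parts = [f"{code},{format_mss(onset)}" for code, onset in ordered]
    # The session ends with a bare time followed by a forward slash.
    return f"<{label}> " + " ".join(parts) + f" ,{format_mss(end_seconds)}/"


# Illustrative records: codes with onset times in seconds.
records = [("un", 0), ("lo", 32), ("un", 48), ("tog", 62)]
line = to_state_sequence("Alex", records, end_seconds=180)
print(line)
```

Because only onset times are stored, each state is assumed to last until the next onset, which is why a single end time suffices to close the session.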

FIGURE 10.1. Examples of event, state, and timed-event sequential data formatted per SDIS conventions. The data type and, in these examples, a set or sets of ME&E codes in parentheses are declared before the semicolon. Codes for the event and state sequence are un = unoccupied, lo = onlooking, tog = together, par = parallel, and grp = group. Codes for the timed sequence are MRV = mother rhythmic vocalization, MOV = mother other vocalization, IRV = infant rhythmic vocalization, IOV = infant other vocalization. For the event sequence, the observation began at 10:30 and ended at 10:33; such information is needed only if rates are computed. For the state sequence, the observation began at 0:00 and ended at 3:00; in this case units were seconds, and the onset time for each code was given. For the timed event sequence, units were integer seconds; it began at second 0 and ended at second 60. The end of the observation is indicated with a forward slash. For the timed sequence, an ampersand separates mother and infant streams.

<Jenny> ,10:30 un lo un tog lo tog par tog par grp lo 10:33/

<Alex> un,0:00 lo,0:32 un,0:48 tog,1:02 lo,1:08 tog,1:22 par,1:41 tog,1:53 par,2:05 grp,2:31 lo,2:41 ,3:00/

<Dyad AK> ,0 MRV,8-12 MRV,32-38 MOV,53-57 ... & IOV,18-21 IRV,33-35 IOV,43-46 ... ,60 /

In a number of ways, the five SDIS data types are quite similar. In fact, once observational data have been represented according to SDIS conventions, producing what we call SDS files, these files are compiled by the GSEQ program, which produces an MDS (modified SDS) file. Whereas SDS files are easy to read, MDS files are formatted to facilitate analysis. Moreover, no matter the initial data type, the format for MDS files is common. Logically, one can think of an MDS file as a matrix. Each row represents a different code, and each column represents a coding unit (event, interval, or time unit). Each cell then records the presence or absence of that code within that unit. If we think of this matrix as a scroll, clearly quite lengthy scrolls can result. Especially when a unit of time such as a second serves as the coding unit, we can imagine a scroll unfurling into the far future, and we can imagine this matrix of binary numbers as being quite sparse (in practice, however, actual computer files are compressed).
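The logical code-by-unit matrix described here can be illustrated with a short Python sketch. It is not GSEQ's actual file format or algorithm; the function name `to_code_by_time_matrix` is hypothetical, only the events visible in the timed-event example of Figure 10.1 are used (the elided ones are omitted), and offsets are assumed to be exclusive.

```python
def to_code_by_time_matrix(events, codes, n_units):
    """Return {code: [0 or 1 per time unit]} from (code, onset, offset) events.

    Assumption for this sketch: offsets are exclusive, so an event coded
    8-12 occupies seconds 8, 9, 10, and 11.
    """
    matrix = {code: [0] * n_units for code in codes}
    for code, onset, offset in events:
        # Mark every time unit the event covers, clipped to the session.
        for t in range(onset, min(offset, n_units)):
            matrix[code][t] = 1
    return matrix


# Only the events visible in the Figure 10.1 example; elided ones are omitted.
events = [("MRV", 8, 12), ("MRV", 32, 38), ("MOV", 53, 57),
          ("IOV", 18, 21), ("IRV", 33, 35), ("IOV", 43, 46)]
codes = ["MRV", "MOV", "IRV", "IOV"]
matrix = to_code_by_time_matrix(events, codes, n_units=60)

# One row per code, one column per second: the "scroll" described above.
for code in codes:
    print(code, "".join(str(cell) for cell in matrix[code]))
```

Even in this 60-second example most cells are zero, which shows why such a matrix is sparse and why stored files benefit from compression.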

This common underlying format for sequential observational data is both extremely simple and powerfully general. A wealth of new codes can be created from those initially collected, which is perhaps the greatest advantage of representing sequential data this way. For example, the standard logical commands And, Or, and Not (see Figure 10.2) are especially useful for interval, state, and timed sequences, where co-occurrences are often of concern. A single superordinate code can be formed from several subordinate codes using the Or command; for example, a single positive behavior code could be defined, which would be coded as occurring anytime any of a number of different positive codes had been coded. Also, a single code that occurs only when other codes co-occur can be formed using the And command; for example, a new code might characterize those times when an infant was gazing at the mother while the mother was concurrently vocalizing (or gazing) to her infant. Then co-occurrences of this new joint code with other codes could be examined.
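On a binary code-by-unit representation, the And, Or, and Not operations reduce to element-wise logic. The following Python sketch shows the idea only; it is not GSEQ command syntax, and the function names and the three 10-second streams are hypothetical.

```python
def logical_and(a, b):
    """New code present only in units where both codes are present."""
    return [x & y for x, y in zip(a, b)]


def logical_or(a, b):
    """New code present in units where either code is present."""
    return [x | y for x, y in zip(a, b)]


def logical_not(a):
    """New code present in units where the original code is absent."""
    return [1 - x for x in a]


# Hypothetical 10-second streams (1 = code present in that second).
infant_gaze = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
mother_vocal = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
mother_gaze = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]

# Superordinate code: mother vocalizing or gazing.
mother_active = logical_or(mother_vocal, mother_gaze)
# Joint code: infant gazing while the mother is vocalizing or gazing.
joint = logical_and(infant_gaze, mother_active)
# Complement: seconds in which the joint code does not occur.
not_joint = logical_not(joint)

print(mother_active)
print(joint)
print(not_joint)
```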

The Window command is an additional, powerful data modification available in the GSEQ program (again, see Figure 10.2). With it, new codes can be formed that are tied to onsets or offsets of existing codes. For example, if mother and infant rhythmic vocalization were coded (MRV and IRV), new codes could be defined for just the second that the infant (or mother) begins rhythmic vocalization (i.e., the onset second), and another new code could be defined for the onset second of IRV and the four seconds thereafter, thus defining a 5-second window.
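The two window examples just described, the onset second alone and the onset second plus the four seconds thereafter, can be sketched in Python as follows. This illustrates the logic only and is not GSEQ's Window syntax; the function names and the 16-second IRV stream are hypothetical.

```python
def onset_units(stream):
    """Mark each unit where the code begins (present now, absent just before)."""
    return [
        1 if value == 1 and (t == 0 or stream[t - 1] == 0) else 0
        for t, value in enumerate(stream)
    ]


def window_after_onset(stream, extra_units):
    """Mark each onset unit plus the given number of units that follow it."""
    window = [0] * len(stream)
    for t, is_onset in enumerate(onset_units(stream)):
        if is_onset:
            # Clip the window at the end of the observation.
            for u in range(t, min(t + extra_units + 1, len(stream))):
                window[u] = 1
    return window


# Hypothetical 16-second IRV stream with two bouts of rhythmic vocalization.
irv = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0]

irv_onset = onset_units(irv)                 # onset second only
irv_window = window_after_onset(irv, 4)      # onset second + 4 seconds

print(irv_onset)
print(irv_window)
```

Note that the 5-second window can extend past the end of the vocalization itself, since it is tied to the onset rather than to the duration of the original code.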

FIGURE 10.2. Examples of And, Or, and Window commands, used primarily to modify multiple-stream state and timed sequential data. Double-headed arrows represent time units (here a second) during which the initial code or new code occurs. A left parenthesis before a code represents the onset second.


