Figure 2.1. Hypothetical correlations between plasma levels of β-carotene and scores on a test of the recognition component of memory ability for 442 elderly Swiss studied by Perrig and his colleagues (1997). The horizontal axis shows the deviation from the average plasma level of β-carotene. A shows what the relationship would look like if these variables were uncorrelated (my simulation actually produced a very low correlation coefficient of −0.002), B illustrates a relationship with the correlation coefficient of 0.22 that was in fact obtained by Perrig's group, and C shows a hypothetical relationship with a correlation coefficient close to the maximum value of 1.0. The scales of both axes are in standard deviation units; for example, 1 on the horizontal axis represents 1 standard deviation above the average plasma level of β-carotene in the population, and −1 represents 1 standard deviation below the average (see note 2).

that are influenced by many different factors. If a single response variable such as recognition ability is plotted against a single explanatory variable such as β-carotene, it would be surprising to obtain a relationship as tight as that in Figure 2.1C because of the many other factors that may also influence recognition ability. Much of the scatter in Figure 2.1B is probably due to variation in these other factors: age, sex, education of the subjects, and so on.

In fact, we can go beyond simply visualizing the data and ask the following question: what is the chance of obtaining a correlation coefficient as large as 0.22 if we measure two variables that are truly independent of each other in a group of 442 individuals? This probability can be estimated by randomly generating two sets of unrelated data with 442 items in each set, pairing the items in the two sets arbitrarily and calculating the correlation coefficient between them, repeating this process many times, and counting the number of times that the correlation coefficient is greater than 0.22 in these randomization trials. When I did this numerical experiment, I got no correlation coefficients greater than 0.22 in 1,000 trials, implying that the probability of getting a correlation coefficient as large as 0.22 by chance is less than 1 in 1,000. This very low probability enabled Perrig's group to claim a "statistically significant" association between plasma level of β-carotene and the recognition component of memory ability.
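The numerical experiment just described can be sketched in a few lines of Python. This is only an illustration, not the program used for the text: it assumes the unrelated data are drawn from a normal distribution (the text does not say which distribution was used), and the function name and seed are arbitrary choices.

```python
import numpy as np

def chance_of_large_correlation(n=442, threshold=0.22, trials=1000, seed=0):
    """Estimate how often two unrelated variables measured on n
    individuals show a correlation coefficient above `threshold`."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        # Two independent sets of n values (assumed normal), paired arbitrarily.
        x = rng.standard_normal(n)
        y = rng.standard_normal(n)
        r = np.corrcoef(x, y)[0, 1]
        if r > threshold:
            hits += 1
    return hits / trials

estimated_probability = chance_of_large_correlation()
print(f"Fraction of trials with r > 0.22: {estimated_probability}")
```

With 442 individuals, correlation coefficients from unrelated data cluster tightly around zero, so a value above 0.22 should be extremely rare in a run of 1,000 trials.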

Recall that Perrig and his colleagues measured nine physiological variables and five aspects of memory performance. How does the correlation coefficient of 0.22 for recognition ability with plasma level of β-carotene in 1993 compare to the other 44 correlation coefficients among these variables? I purposely picked the relationship between recognition ability and β-carotene to graph in Figure 2.1B because it had the highest correlation. The next highest value was 0.16 for semantic memory ability with plasma level of vitamin C in 1993, and there were 11 correlation coefficients in all that Perrig's group judged to be statistically significant because probabilities of getting these values by chance if the variables were not related were less than 5%.³ There were no significant correlations between the plasma level of vitamin E and the scores on any of the memory tests. Five of the significant correlations involved β-carotene: for levels measured in 1993 with free-recall ability, recognition ability, and semantic memory and for levels measured in 1971 with the latter two aspects of memory. Since the memory tests were only done in 1993, it's noteworthy that 1971 levels of β-carotene were correlated with these aspects of memory measured 22 years later. There were three significant correlations for vitamin C: for levels measured in 1993 with free-recall ability and semantic memory and levels measured in 1971 with semantic memory. The other three significant correlations involved cholesterol with one component of memory ability and ferritin with two components.

These results seem consistent with the authors' conclusion that vitamin C and β-carotene enhance memory performance, but there is one further complication that we need to consider before accepting this interpretation. To set the stage, imagine flipping a coin 10 times. What are the chances that the coin would come up heads every time? This probability can be readily calculated as 0.5^10 ≈ 0.1%. Now imagine that 100 people each flip a coin 10 times. What are the chances that at least one of these people gets 10 heads? This probability turns out to be about 10%.⁴ In other words, unlikely events may occur, given enough opportunities. With this in mind, let's reconsider the full set of 45 correlation coefficients between physiological measurements (including blood levels of antioxidants) and components of memory ability reported by Perrig's group. Suppose each of the nine physiological variables was independent of each of the five memory variables. In this hypothetical situation with no real relationships between antioxidant levels and memory ability, what would be the chances of getting correlation coefficients as high as those actually observed? I wrote a small computer program to simulate this thought experiment and found that there was about a 12% probability that the maximum correlation coefficient in a set of 45 would be greater than 0.14 (the three largest correlation coefficients for the actual data of Perrig's group were 0.22, 0.16, and 0.14). In 80% of 1,000 trials with my program, at least one correlation coefficient in a set of 45 was greater than 0.10 (this was the level judged to be statistically significant by Perrig and his colleagues). The results of Perrig's group are still meaningful because they got 11 correlation coefficients greater than 0.10 out of the 45 that they tested, whereas I never got more than six and typically only one or two in my simulation with random data. However, the simulation shows why we should be cautious in interpreting large sets of correlation coefficients, especially when relationships are likely to be influenced by several unmeasured variables (look again at the scatter in the relationship between the plasma level of β-carotene and recognition ability shown in Figure 2.1B). In fact, there is a statistical tool marvelously named the "sequential Bonferroni technique" that was designed to deal with the problem raised in this paragraph (Rice 1989). Applying this technique to the 45 correlation coefficients presented by Perrig's group shows that we can only be confident that two of the correlations represent real relationships between plasma levels of antioxidants and components of memory: the relationship between β-carotene measured in 1993 and recognition memory and the relationship between vitamin C measured in 1993 and semantic memory.⁵
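Both the coin-flipping arithmetic and the thought experiment with 45 correlation coefficients are easy to reproduce approximately. The sketch below is a reconstruction under stated assumptions rather than the original program: it draws all 14 variables independently from a normal distribution, and it counts a coefficient as "greater than" a cutoff when its absolute value exceeds the cutoff, which is an assumption about how the original simulation treated negative correlations.

```python
import numpy as np

# Coin flips: chance that one person gets 10 heads in 10 flips,
# and chance that at least one of 100 people does.
p_ten_heads = 0.5 ** 10
p_at_least_one = 1 - (1 - p_ten_heads) ** 100
print(f"One person: {p_ten_heads:.4f}; at least one of 100: {p_at_least_one:.3f}")

def simulate_correlation_sets(n_subjects=442, n_phys=9, n_mem=5,
                              trials=1000, seed=1):
    """For each trial, generate unrelated data and return the largest
    absolute correlation among the 9 x 5 = 45 physiological-by-memory
    pairs, plus the number of pairs whose absolute correlation exceeds 0.10."""
    rng = np.random.default_rng(seed)
    max_abs_r = np.empty(trials)
    n_above_010 = np.empty(trials, dtype=int)
    for t in range(trials):
        data = rng.standard_normal((n_subjects, n_phys + n_mem))
        r = np.corrcoef(data, rowvar=False)    # 14 x 14 correlation matrix
        cross = np.abs(r[:n_phys, n_phys:])    # the 45 pairs of interest
        max_abs_r[t] = cross.max()
        n_above_010[t] = (cross > 0.10).sum()
    return max_abs_r, n_above_010

max_abs_r, n_above_010 = simulate_correlation_sets()
print("Fraction of trials with maximum above 0.14:", (max_abs_r > 0.14).mean())
print("Fraction of trials with at least one above 0.10:", (n_above_010 >= 1).mean())
print("Largest count above 0.10 in any trial:", n_above_010.max())
```

Under these assumptions the fractions should land in the neighborhood of the figures quoted above, although the exact values depend on the random seed and the number of trials.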

I've expounded at length on correlation analysis only to reach a somewhat dissatisfying conclusion: although we shouldn't embrace the conclusions of Perrig's group wholeheartedly, neither can we categorically dismiss the hypothesis that antioxidants enhance memory performance. My ulterior motive for this extended discussion was to give you some insight about the workings of this widely used statistical technique. But there is a more fundamental problem with the data provided by Perrig and his colleagues that is rooted in the fact that they used a prospective design rather than a controlled experiment. Recall that the Swiss subjects in this study varied in their blood levels of antioxidants because of some unknown combination of factors not determined by the researchers: they may have had genetic differences that influenced their ability to absorb or retain particular vitamins, they undoubtedly had different diets, and so on. How might these other factors influence the results? The analyses discussed so far don't provide a way to answer this question, so Perrig's group used another statistical method called regression, which allowed them to consider multiple explanatory variables simultaneously. They did three regression analyses, using free-recall ability, recognition ability, and semantic memory because each of these components of memory was correlated with one or more antioxidant measures in the initial correlation analysis. I'll illustrate these regression analyses with semantic memory because these results were clearest.

Perrig's group essentially wanted to know if the correlations between scores on the vocabulary test of semantic memory and blood plasma levels of antioxidants could be attributed to other variables that might also be correlated with levels of these antioxidants. Therefore they used educational level, age, and sex as additional explanatory variables together with plasma levels of vitamin C and β-carotene in their regression. If, for example, more educated subjects tended to have higher plasma levels of vitamin C and better semantic memory than less educated ones, the association between vitamin C and semantic memory might simply be an artifact of the relationship between education and these two variables. Not surprisingly, they did discover that subjects with more education did better on the vocabulary test. Younger subjects also performed better, but there was no difference between males and females. However, the regression analysis made it possible to statistically control for the effects of these variables; that is, to ask what the relationship between vitamin C or β-carotene and semantic memory would be if age and level of education were fixed.
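To see what "statistically controlling" for a variable means, it may help to look at a made-up data set in which education alone drives both vitamin C level and memory score. Everything in the following sketch is hypothetical: the effect sizes (0.7), the normal distributions, and the variable names are invented for illustration and have nothing to do with the actual Swiss data or with the software Perrig's group used.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 442  # same sample size as the Swiss study, for scale only

# Hypothetical world: education raises both vitamin C level and memory
# score, but vitamin C has NO direct effect on memory.
education = rng.standard_normal(n)
vitamin_c = 0.7 * education + rng.standard_normal(n)
memory = 0.7 * education + rng.standard_normal(n)

# Simple correlation ignores education and finds an association.
raw_r = np.corrcoef(vitamin_c, memory)[0, 1]

# Regression of memory on vitamin C alone (intercept + one slope).
X_simple = np.column_stack([np.ones(n), vitamin_c])
simple_coef, *_ = np.linalg.lstsq(X_simple, memory, rcond=None)

# Regression with education added as a second explanatory variable.
X_full = np.column_stack([np.ones(n), vitamin_c, education])
full_coef, *_ = np.linalg.lstsq(X_full, memory, rcond=None)

unadjusted_slope = simple_coef[1]
adjusted_slope = full_coef[1]
print(f"Correlation of vitamin C with memory: {raw_r:.2f}")
print(f"Vitamin C slope ignoring education:   {unadjusted_slope:.2f}")
print(f"Vitamin C slope holding education fixed: {adjusted_slope:.2f}")
```

In this invented example the vitamin C slope should shrink toward zero once education is included, because the apparent association was entirely an artifact. In the real study the opposite happened: the relationships survived the adjustment, as described next.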

Perrig's group found that these relationships were still significant even when controlling for education, age, and sex. Under these conditions, the probability that the correlation between plasma level of vitamin C and semantic memory ability was due solely to chance was 3.4%, compared to less than 0.1% in the initial correlation analysis, which didn't account for education, age, and sex. For β-carotene and semantic memory, the probability that the correlation was due to chance was 3.5% when education, age, and sex were controlled. Perrig's group got similar results for β-carotene and recognition ability but no significant relationships between vitamin C and recognition ability or between any of the antioxidants and free-recall ability.

Regression is more powerful than simple correlation analysis for revealing patterns in data collected in prospective studies because we can consider several variables simultaneously in regression, but regression analysis doesn't overcome the fundamental limitation of these kinds of nonexperimental studies. This limitation is that there may be unmeasured factors that account for variation in a response variable (e.g., an aspect of memory ability), as well as variation in a factor that we do measure (e.g., blood level of vitamin C or another antioxidant), producing an artifactual relationship between the factor we are interested in and the response variable.

In the case of the elderly Swiss population studied by Perrig and his colleagues, a causal relationship between antioxidants and memory ability seems more plausible because of the known physiological effects of antioxidants on cells, but it's certainly possible that some other factor, unmeasured by the researchers, was the real reason for differences in memory performance among the subjects. For example, individuals may have had higher levels of vitamin C and β-carotene in their blood because they consistently ate more fruits and vegetables, but some other component of this diet caused differences in performance on memory tests. Or perhaps differences in socioeconomic status or lifestyle caused both differences in dietary intake of antioxidants and differences in memory ability, creating a spurious relationship between plasma levels of antioxidants and performance on memory tests. The possibilities can be multiplied almost endlessly. In considering only education, age, and sex as possible confounding variables, Perrig's group wasn't very thorough in investigating alternative explanations for differences in memory ability among their elderly subjects. Contrast this study with that by the National Cancer Institute of mortality associated with smoking, in which at least 23 additional explanatory variables besides smoking itself were considered. Although the results of Perrig's group are interesting, they fall far short of being conclusive evidence that antioxidants are beneficial for cognitive performance.

Mohsen Meydani (2001) of Tufts University reviewed studies of the effects of antioxidants on cognitive abilities of older people in an article published in Nutrition Reviews. In addition to the study by Perrig's group of elderly people with normal cognitive function, Meydani discussed several studies of individuals with Alzheimer's disease and vascular dementia.⁶ Two of these studies used a different design that is even more problematic than the prospective design of the study by Perrig's group, although it is quite common in medical research. This is a case-control design in which subjects who already have a particular condition, such as a disease, are compared to a set of control subjects without the condition. In this situation we are trying to deduce the cause of the disease by identifying factors that differ between cases and controls. This may involve simply measuring physiological or other characteristics of the two groups at the time of the study or asking individuals about their past habits (e.g., smoking) or potential exposure to environmental toxins. For example, A. J. Sinclair and four colleagues (1998) studied plasma levels of antioxidants in 25 patients with Alzheimer's disease, 17 patients with vascular dementia, and 41 control subjects without evidence of either disease. The control subjects were similar in age to those with Alzheimer's disease and vascular dementia, but surprisingly the groups were not matched for sex: 36% of the diseased individuals were females, but 59% of the control individuals were females. Sinclair's group found that average plasma levels of vitamin C were about the same in patients with Alzheimer's disease as in controls but were 22% less in those with vascular dementia. Conversely, vitamin E levels were 14% less in patients with Alzheimer's disease than in controls. Sinclair's group concluded, "Subjects with dementia attributed to Alzheimer's disease or to vascular disease have a degree of disturbance in antioxidant balance which may predispose to increased oxidative stress. This may be a potential therapeutic area for antioxidant supplementation" (1998:840).

This study dramatically illustrates some of the fundamental weaknesses of the case-control design compared to prospective designs and experimental studies. Even if the relationships between plasma levels of antioxidants and dementia reported by Sinclair's group are biologically meaningful, there is no way to tell from their data if low vitamin C contributed to vascular dementia, and low vitamin E to Alzheimer's disease, or if these low levels were consequences of the diseases. The cases studied were not a random sample of individuals with Alzheimer's disease and vascular dementia but were patients in a particular medical facility in England, and there is no way of knowing whether or not these individuals are representative of any particular population of people with dementia. In case-control studies, there are ample opportunities for bias in the selection of controls. For example, in the research by Sinclair and his colleagues, the control group had more females than did the group with dementia, suggesting that individuals in the control group were chosen mainly for convenience rather than to match cases and controls as closely as possible. Even without conscious bias, it's extremely difficult to select an appropriate control group in a case-control study because there is a host of potential confounding factors that should be matched between the cases and controls. Such matching is essentially impossible with the small sample size of the study by Sinclair and his colleagues. More important, just as with nonexperimental prospective studies, there is always the possibility that unmeasured or even unimagined variables account for differences in disease between cases and controls.

In his 2001 review, Meydani concluded that antioxidants protect against deterioration of cognitive ability with age and that taking high doses of supplements such as vitamin E might be beneficial to forestall this symptom of aging. I've discussed two of the studies that led to this conclusion, primarily to illustrate the limitations of nonexperimental studies in answering questions about human health. The other research described by Meydani includes experimental studies with animals and even an experimental study of vitamin supplementation for Alzheimer's patients, but it is no more convincing than the two studies that I considered in detail. For example, the authors of the experimental study of Alzheimer's patients had to do major statistical contortions to find a relationship between treatment with large doses of vitamin E and rate of deterioration of the patients (Sano et al. 1997).

Therefore, Meydani's conclusion is questionable despite the plausibility of the proposed mechanism by which antioxidants might help preserve brain function. Nevertheless, my point is not so much to debunk the widely believed notion that antioxidants protect against aging (I myself take 400 units of vitamin E daily, just in case) but rather to set the stage for discussion of a contrasting research strategy that is often considered the gold standard of medical research: randomized, double-blind, experimental trials. For examples of this approach, I'll use tests of the hypothesis that large doses of vitamin C minimize the severity of common colds. After describing two examples, I'll return to the general issue of experimental versus nonexperimental approaches in medicine and cast a more positive light on the kinds of nonexperimental studies that we've considered so far.


The hypothesis that vitamin C is beneficial in preventing or treating colds has been tested in dozens of experiments dating back to at least 1942. This hypothesis is a natural candidate for experimental testing because effects of supplemental vitamin C should be manifested fairly rapidly in reduced incidence or severity of colds if the hypothesis is true. By contrast, cognitive impairment with age may be a long-term consequence of multiple factors, including use of antioxidant vitamins, that act over decades of life. It's generally not feasible to design rigorous experimental studies on these time scales, so other kinds of evidence have to be used to test hypotheses about long-term effects.

Linus Pauling (1970) discussed a handful of experimental studies that were done before 1970 in his book Vitamin C and the Common Cold (see also Pauling 1971). However, the reports of these studies have various problems that make them poor examples for our consideration. The most important problem is that the original reports don't include key information about experimental design, statistical analyses, or numerical results, so they can't be thoroughly evaluated. Therefore, I'll start with an influential study by Thomas Karlowski and five colleagues at the National Institutes of Health (NIH) and reported in the Journal of the American Medical Association (JAMA) in 1975 (Chalmers 1975; Karlowski et al. 1975). This is a good illustration of the experimental approach, not because it was a flawless study, but for exactly the opposite reason: it had a cardinal flaw, as the authors themselves recognized once the study was underway. In fact, Thomas Chalmers, who was director of the Clinical Center of NIH in the 1970s and one of the authors of the JAMA article, reported that he was "more proud of it than almost any other that I have published" (Chalmers 1996:1085), partly because the NIH researchers identified the flaw and were able to account for it in their interpretations of the results.

The general hypothesis that vitamin C helps fight colds actually comprises two specific hypotheses: that taking high doses of vitamin C on a regular basis helps prevent colds and that taking high doses of vitamin C at the first signs of a cold reduces its length and severity. In other words, vitamin C may have a prophylactic effect, by preventing colds, and/or a therapeutic effect, in treating colds. Both of these hypotheses were tested by the NIH group.

Karlowski and his colleagues recruited 311 volunteers for their study from among 2,500 NIH employees. About 600 of these employees indicated willingness to participate, but about half were excluded for various reasons such as health problems that might be exacerbated by taking large supplemental doses of vitamin C, pregnancy, or unwillingness to refrain from taking vitamin supplements outside of the study. The researchers began the study in late summer to capitalize on the fact that frequency of colds in the Washington, D.C., area increases in fall and winter. They planned to continue it for 1 full year, but participants gradually dropped out, so it was ended after 9 months, when the total number of participants fell below 200. A small but significant aspect of the research design was that this stopping rule was decided beforehand. If this had not been the case, the researchers could have been accused of stopping the study when the results were most favorable to their preferred hypothesis.

Standard procedure in studies like these is to compare a group of subjects who receive a treatment with a control group of subjects who do not receive the treatment. In this case, treatment refers to a regimen of vitamin C capsules provided to the subjects with instructions about when to take them, but the term "treatment" has very general application in experimental studies to represent any experimental manipulation. A key part of designing an experimental study is to decide on appropriate controls. For example, it's conceivable that the simple act of taking a pill daily might affect the occurrence or severity of colds, regardless of the contents of the pill. In other words, there might be psychological benefits of the treatment unrelated to the physiological effects of vitamin C itself. But the researchers were really interested in whether vitamin C specifically could protect against colds. Therefore, they established a control group that received placebo capsules instead of capsules containing vitamin C. More specifically, the vitamin C capsules contained 500 milligrams (mg) of vitamin C and 180 mg of lactose (milk sugar), whereas the placebo capsules contained 645 mg of lactose. The authors of the NIH study candidly state that the choice of lactose as the placebo was a hasty decision, necessitated by the desire to start the study within a few months of the time it was conceived. This turned out to be a fateful decision, as well as a hasty one.

Since Karlowski and his colleagues wanted to test both the prophylactic and therapeutic effects of vitamin C on the common cold, they divided their subjects into four groups. Members of each group were given two sets of capsules: maintenance capsules and supplemental capsules. They were instructed to take six of the former daily and six of the latter when they caught colds. For the first group, both types of capsules were placebos. For the second group, the maintenance capsules were placebos but the supplemental capsules contained vitamin C. The third group was the opposite of the second: vitamin C in maintenance capsules but placebos as supplemental capsules. The fourth group had vitamin C in both maintenance and supplemental capsules. For subjects receiving vitamin C in either maintenance or supplemental capsules, the dose was 3 grams (g) per day (6 capsules × 500 mg). For subjects who received vitamin C in both maintenance and supplemental capsules, the dose was 3 g/day when they did not have colds and 6 g/day when they did.
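The four-group layout and the resulting daily doses can be tabulated with a few lines of Python. This is just bookkeeping for the design described above; the function and variable names are made up for the illustration.

```python
CAPSULES_PER_SET = 6            # six maintenance capsules daily, six supplemental during colds
VITAMIN_C_MG_PER_CAPSULE = 500  # placebo capsules contain no vitamin C

def daily_vitamin_c_grams(maintenance_is_vitamin_c, supplemental_is_vitamin_c, has_cold):
    """Daily vitamin C intake (in grams) for a subject in the given group."""
    dose_mg = 0
    if maintenance_is_vitamin_c:
        dose_mg += CAPSULES_PER_SET * VITAMIN_C_MG_PER_CAPSULE
    # Supplemental capsules are only taken while the subject has a cold.
    if supplemental_is_vitamin_c and has_cold:
        dose_mg += CAPSULES_PER_SET * VITAMIN_C_MG_PER_CAPSULE
    return dose_mg / 1000

groups = {
    "1: placebo / placebo": (False, False),
    "2: placebo / vitamin C": (False, True),
    "3: vitamin C / placebo": (True, False),
    "4: vitamin C / vitamin C": (True, True),
}

for name, (maintenance, supplemental) in groups.items():
    well = daily_vitamin_c_grams(maintenance, supplemental, has_cold=False)
    ill = daily_vitamin_c_grams(maintenance, supplemental, has_cold=True)
    print(f"Group {name}: {well} g/day when well, {ill} g/day during a cold")
```

Laid out this way, it is clear that groups 2 and 3 receive the same 3 g/day during a cold but differ in what they take the rest of the time, which is what lets the design separate prophylactic from therapeutic effects.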

The most important feature of an experiment such as this is that subjects were randomly assigned to each of the four groups. This design is called a randomized trial, in contrast to the prospective and case-control designs described earlier. Randomization helps overcome the fundamental dilemma of nonexperimental designs: that other variables besides the one of interest may account for the results of a study. For example, suppose the NIH researchers had asked subjects to volunteer specifically to be in a group of their own choosing: the double-placebo group if they were skeptical of the hypothesis, the double-vitamin C group if they believed the hypothesis, or one of the intermediate groups if they weren't sure. The results of such an experiment would be impossible to interpret because any number of other factors might differ between these self-selected groups and be associated with different tendencies to choose the vitamin C treatment, as well as different susceptibilities to colds, thus compromising any interpretation of a beneficial effect of vitamin C. For instance, younger employees of NIH might be more skeptical of vitamin C than older ones, so more likely to select the placebo treatment, but younger employees might also be more susceptible to colds because frequency of colds typically decreases as people get older. So this "experiment" would show that those who took vitamin C got fewer, milder colds than those who took the placebo, a bogus conclusion.

The problem illustrated by this hypothetical example is essentially the same as that in the Swiss study of antioxidants and memory ability (Perrig et al. 1997). In that case, subjects differed in plasma levels of antioxidants because of individual differences in many potential factors: genetics, diet, age, sex, and lifestyle. We couldn't say with any assurance that differences in memory ability were due to differences in plasma levels of antioxidants or differences in one or more of these other confounding variables. In a randomized trial, the process of randomly assigning subjects to treatment and control groups helps alleviate this uncertainty because average values of potentially confounding variables are likely to be similar in treatment and control groups, especially with moderate to large sample sizes. This is a simple consequence of the randomization process. Imagine picking two softball teams from a pool of 100 men and 100 women. If we randomly select people for the two teams, they will probably have similar numbers of men and women. If, instead, we were picking a treatment and control group for an experiment, the same principle would apply.
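The softball example can be checked with a quick simulation. The sketch below assumes the pool is split into two teams of 100 each (the text doesn't specify team sizes) and simply counts how many women land on the first team in each of 1,000 random splits; the names and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)

# Pool of 200 people: 0 = man, 1 = woman.
pool = np.array([0] * 100 + [1] * 100)

def women_on_first_team(rng, pool, team_size=100):
    """Randomly split the pool and count the women on the first team."""
    shuffled = rng.permutation(pool)
    return int(shuffled[:team_size].sum())

counts = np.array([women_on_first_team(rng, pool) for _ in range(1000)])

print("Average number of women on the first team:", counts.mean())
print("Smallest and largest counts seen:", counts.min(), counts.max())
print("Fraction of splits with 40 to 60 women:",
      ((counts >= 40) & (counts <= 60)).mean())
```

Nearly every random split should put between 40 and 60 women on each team, and the same balancing tendency applies to any other characteristic of the subjects, whether or not anyone thought to measure it.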

I need to mention two other aspects of the design of this study that are common in medical and nutritional experiments before describing the results. First, this was ostensibly a double-blind study. This means that the subjects were not supposed to know whether they got vitamin C or placebo capsules for either their maintenance or supplemental supply, and the researchers were not supposed to know which treatment group subjects belonged to when they treated the subjects or recorded their symptoms. The purpose of the double-blind approach is to reduce the possibility that preconceptions of either subjects or researchers could bias results. If subjects knew they were taking vitamin C and believed in its efficacy, they might tend to downplay the symptoms of any colds they got. The same goes for researchers who are recording results. Second, the study relied partly on the subjects' own assessments of their health. They reported the number of colds they had and how long they were, whereas the researchers determined the severity of 20 different cold symptoms experienced by the subjects when they visited the clinic to get supplemental capsules containing either vitamin C or a placebo.

The average number of colds during the 9 months of the study was quite similar in the subjects taking a maintenance dose of 3 g of vitamin C and those taking a maintenance dose of placebo: 1.27 versus 1.36. This difference of 0.09 more colds per person in the placebo group was not statistically significant because the probability was greater than 50% that it could have been due to chance alone. One way to appreciate this is to notice that subjects with the same maintenance treatment but different supplemental treatments also differed by at least 0.09 in frequency of colds. This difference between supplemental treatments was 0.11 for those with the placebo maintenance treatment (the left pair of bars in Figure 2.2A) and 0.15 for those with the vitamin C maintenance treatment (the right pair of bars in Figure 2.2A). Yet supplemental capsules should have had no effect on the frequency of colds because they weren't given to the subjects until after they came down with a cold, and they were only given for 5 days so they should not have affected the likelihood of getting another cold weeks or months later. Therefore, these differences in cold frequencies associated with the supplemental treatment must have been due to chance, suggesting that the difference of similar magnitude between the maintenance treatments was probably also due to chance.

Figure 2.2. Frequency of colds (A) and average length of colds (B) for four groups of employees of the National Institutes of Health studied in an experiment by Karlowski and his colleagues (1975). Differences in the type of supplemental capsule taken (filled versus open bars) should not affect the frequency of colds but might affect their duration if vitamin C has a therapeutic effect. Differences in the type of maintenance capsule taken (the left and right pairs of bars) might affect both frequency and duration of colds.

The story about potential therapeutic benefits of vitamin C is much more interesting. Karlowski's group (1975) found that colds lasted an average of 7.14 days for subjects in the double-placebo group (i.e., placebo as both their daily maintenance treatment and their supplemental treatment when they got colds), 6.59 days for subjects taking vitamin C either daily or as a supplement but not both, and 5.92 days for subjects taking vitamin C daily when they were well and as a supplement when they were ill (Figure 2.2B). These results suggest a small but significant benefit of vitamin C in reducing the length of colds, by about 0.5 days for an intake of 3 grams per day and by a little more than 1 day for an intake of 6 grams per day. In addition, several symptoms were less severe in subjects taking vitamin C than in those taking the placebo.

Perhaps you've already guessed the problem with these results. Because the NIH researchers organized the study hastily, they used a placebo that tastes sweet (lactose), whereas vitamin C tastes sour. The pills were provided in capsules, so this wouldn't be a problem if they were swallowed whole, but... over the nine months of the study, some participants evidently couldn't resist the temptation to bite into their capsules to try to determine which group they belonged to. In other words, the double-blind was compromised. The researchers learned of this problem early in the study (some participants simply told them they had tasted the capsules and identified them). Therefore the participants were given a questionnaire at the end of the study that asked them to guess whether they had been taking vitamin C or the placebo. About 54% had guessed their daily maintenance treatment; 77% of these guesses were correct. About 40% had guessed their supplemental treatment; 60% of these guesses were correct.

Figure 2.3. Average length of colds in subjects in the NIH study who did not guess their treatment group (A) and subjects who did guess their treatment group, whether successfully or not (B). In each panel, bars are grouped by the maintenance capsules taken daily (placebo or vitamin C) and distinguished by the supplemental capsules taken during colds (placebo or vitamin C). The daily dose of vitamin C during colds was 0 for the double-placebo group, 3 grams for the groups that received placebo in maintenance capsules and vitamin C in supplemental capsules or vice versa, and 6 grams for the double-vitamin C group. Data from Karlowski et al. (1975).

When the NIH researchers examined the results separately for subjects who did not guess either their maintenance or supplemental treatment and subjects who guessed one or both of these treatments, a striking pattern emerged (Figure 2.3). For the nonguessers, there was no difference in the duration of colds among those who got a double-placebo treatment, a single dose of 3 g of vitamin C during their colds from either maintenance or supplemental capsules, or a double dose of 6 g of vitamin C during their colds (Figure 2.3A). For the guessers, there was a clear reduction in the length of colds for those receiving 3 g of vitamin C and a further reduction for those receiving 6 g (Figure 2.3B). For this analysis, the guessers, called "unblinded subjects" in Figure 2.3B, include those who guessed one or both of their treatments correctly and those who were wrong. This illustrates an apparent placebo effect: if a patient thinks he or she is getting a treatment, there is a psychological benefit comparable to any direct physiological benefit that the treatment may have. In this case, the results suggest that individuals who thought they were receiving a placebo had longer colds than those who thought they were receiving vitamin C, regardless of which treatment they were actually getting.

This study illustrates one of the major pitfalls of experimental research: the difficulty of setting up and maintaining a suitable control group. In this case, comparison of the vitamin C treatments with the placebo controls was complicated because some subjects guessed which group they were in. Karlowski and his colleagues concluded that a large daily dose of vitamin C did not prevent colds (Figure 2.2A) and that taking vitamin C during a cold did not shorten the cold (Figure 2.3A) or reduce the severity of cold symptoms (data not shown). These conclusions were based on analyzing a subset of their data, namely those for subjects who were truly in the dark about which of the treatment or control groups they belonged to. The researchers also found intriguing evidence for a placebo effect (Figure 2.3B), a problem that still bedevils medical research (Hrobjartsson and Gotzsche 2001).

Despite its flaws, this 1975 study was well received by the medical establishment, probably because the authors were affiliated with the National Institutes of Health and the study was published in one of the premier medical journals in the United States. Nevertheless, the conclusions have been criticized by advocates of the beneficial effects of vitamin C. The most recent and most detailed critique was written by Harri Hemila (1996), a scientist with the Department of Public Health at the University of Helsinki in Finland. Hemila has been an avid booster of the vitamin C hypothesis in a series of articles published in the 1990s. All of these articles involve reviews of previous studies, in some cases with reanalyses of the original data. To illustrate Hemila's approach, the title of his 1996 article in the Journal of Clinical Epidemiology in which he criticized the NIH research is "Vitamin C, the Placebo Effect, and the Common Cold: A Case Study of How Preconceptions Influence the Analysis of Results." In this article, Hemila misrepresents some of the results of Karlowski's group and misinterprets other results, but he does raise one interesting issue. He suggests that the shorter and less severe colds of subjects who guessed correctly that they were getting vitamin C might be due to the fact that vitamin C really did reduce the duration and severity of colds, which in turn enabled the subjects to guess their treatment correctly. In other words, the placebo effect isn't the cause of an artifactual relationship between vitamin C intake and milder colds but rather a consequence of a real relationship between vitamin C and milder colds. This idea illustrates the complexities of disentangling cause-effect relationships in medicine, which will be explored further in Chapter 6. In theory, Hemila's hypothesis could be tested by comparing characteristics of colds in subjects who were getting vitamin C and guessed correctly that they were getting vitamin C and in subjects who were not getting vitamin C but guessed that they were. The NIH researchers considered making this comparison in their original article, but didn't believe the sample sizes in these subgroups were sufficient.

The second author of the NIH study was Thomas Chalmers, who was director of the Clinical Center of NIH when the study was conducted. In introducing this study, I quoted Chalmers's expression of pride in the work. This quotation came from his one-page rebuttal to Hemila's six-page critique. Chalmers concludes his rebuttal as follows: "In summary, I resent the time that I have had to devote to this author's biased defense of his late mentor's [Linus Pauling] infatuation with ascorbic acid. It may be that a properly done, unbiased, and updated meta-analysis [see note 7] of the RCTs [randomized controlled trials] should be carried out, but I think it would be a waste of time" (1996:1085). It's rare to see such candid expression of emotion in the technical scientific literature, although strong feelings held by proponents of different hypotheses can sometimes be glimpsed at scientific meetings.

There have been numerous experimental tests of the effects of vitamin C on the common cold since the early 1970s. One consistent result is that taking large daily doses of vitamin C in an effort to prevent colds is futile. Even Hemila agrees with this conclusion, although he thinks it may be possible that vitamin C has a prophylactic benefit for people who are physically stressed or suffer from borderline malnutrition. Experimental studies have provided more support for the hypothesis that taking high doses of vitamin C at the beginning of a cold reduces its duration and severity, although even in this case the results vary a lot from study to study. In particular, Robert Douglas and his colleagues (2001) recently reported a suspicious pattern of greater apparent beneficial effects of therapeutic doses of vitamin C in more poorly designed studies. However, the full set of experimental studies of vitamin C and the common cold has not yet been thoroughly and systematically reviewed, so we can't reach a definitive conclusion. Nevertheless, I'd like to briefly describe one of the most recent experimental studies to contrast some of its methods with those of the NIH study in the early 1970s. Then I'll make some concluding general points about the role of experiments on human volunteers in medical research.

Carmen Audera and three colleagues (2001) at the Australian National University (ANU) in Canberra studied the therapeutic effects of vitamin C on the length and severity of colds in the staff and students of ANU in 1998-1999. They solicited volunteers much as the NIH researchers did and used similar criteria for selecting subjects. Since Audera's group was interested specifically in testing the therapeutic effects of vitamin C, subjects were instructed to take medication only at the onset of a cold. Specifically, when they experienced two of several typical cold symptoms for at least 4 hours, or "four hours of certainty that a cold is coming on" (2001:360), they started taking the pills they had been given and did so for the first 3 days of the cold.

In this study, the placebo tablets contained 0.01 grams of vitamin C, and the dose was three tablets, or 0.03 g/day for 3 days. There were three additional treatment groups. One received 1 g of vitamin C per day for 3 days; one received 3 g of vitamin C per day; and one received BioC, containing 3 g of vitamin C per day plus four other substances thought to alleviate cold symptoms (e.g., rose-hip extract). The advantage of using a small dose of vitamin C in the placebo was that the taste of these tablets was apparently indistinguishable from that of the tablets for the three treatment groups that received much higher doses, yet the small dose was far below a level that advocates of vitamin C believe would be necessary to treat colds. Instead, this small placebo dose was comparable to the minimum daily requirement of vitamin C to prevent scurvy. In fact, only 17% of the subjects guessed their treatment group, and the majority of these guesses were incorrect. Contrast this figure with the much higher percentage of subjects who guessed their treatment group in the NIH study, in which lactose was used as the placebo.

As in the NIH study and in any true experiment, the subjects were randomly assigned to the four treatment groups. One problem of the study was that a fairly large number of participants dropped out before reporting any colds. The authors don't say whether dropouts were more likely to be students or staff at ANU. However, the numbers of colds for which data were collected were comparable in the four treatment groups, suggesting that subjects in each of the four groups were equally likely to drop out of the study. A second and more serious problem was that the subjects were responsible for initiating their own treatment and for recording the severity of their cold symptoms. How might this bias the results? If subjects in different treatment groups were similarly accurate or inaccurate in recording their symptoms, reliance on the subjects themselves to record the data shouldn't produce systematic differences between the treatment groups. For example, if there was a tendency to exaggerate symptoms, this should increase the average severity score to the same extent in all four groups, so the differences among groups should be the same as if the symptoms were not exaggerated. One of the purposes of random assignment of subjects to treatment groups in this study was to minimize the likelihood that some groups had more hypochondriacs than others, that is, to ensure that the average tendency to exaggerate symptoms was similar in all four groups. However, the study might have been compromised if subjects didn't initiate self-treatment soon enough, even if there were no differences in this lag time among groups. In fact, the average time between the beginning of cold symptoms and taking the first dose of medication was 13.4 hours, in contrast to the 4 hours specified in the instructions. This average time didn't differ among groups, but a vitamin C advocate could argue that a high dose needs to be taken at the very beginning of a cold to be effective, so to have a fair test of the vitamin C hypothesis, the first dose should be taken much sooner than 13 hours after the beginning of cold symptoms.
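The argument about exaggerated symptoms can be checked with a small simulation. The Python sketch below is only an illustration under invented assumptions: the 400 subjects, their severity scores, and their individual tendencies to over-report are all made up, and none of these numbers come from the ANU study. It deals subjects at random into four groups and then compares group averages with and without exaggeration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same output each run

GROUPS = ["placebo (0.03 g)", "1 g", "3 g", "3 g + additives"]
N_SUBJECTS = 400  # hypothetical, not the sample size of the actual study

# Hypothetical subjects: a true severity score plus a personal tendency
# to over-report symptoms. Both distributions are invented.
subjects = [
    {"true_severity": random.gauss(20, 5), "exaggeration": random.gauss(3, 2)}
    for _ in range(N_SUBJECTS)
]

# Random assignment: shuffle, then deal subjects into the four groups in turn.
random.shuffle(subjects)
assigned = {g: subjects[i::len(GROUPS)] for i, g in enumerate(GROUPS)}

def mean_by_group(score):
    """Average of score(subject) within each treatment group."""
    return {
        g: statistics.mean(score(s) for s in members)
        for g, members in assigned.items()
    }

true_means = mean_by_group(lambda s: s["true_severity"])
# Case 1: every subject exaggerates by the same amount.
uniform_means = mean_by_group(lambda s: s["true_severity"] + 3.0)
# Case 2: exaggeration differs from person to person.
bias_means = mean_by_group(lambda s: s["exaggeration"])

reference = GROUPS[0]
for g in GROUPS[1:]:
    true_diff = true_means[g] - true_means[reference]
    uniform_diff = uniform_means[g] - uniform_means[reference]
    print(f"{g} vs placebo: true difference {true_diff:+.2f}, "
          f"with uniform exaggeration {uniform_diff:+.2f}")

for g in GROUPS:
    print(f"{g}: average tendency to exaggerate {bias_means[g]:.2f}")
```

A uniform exaggeration shifts every group average by the same amount, so the differences among groups are unchanged. When the tendency varies from person to person, random assignment tends to leave each group with a similar average tendency, although chance differences remain and shrink only as the groups get larger.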

Based on data for 184 colds in 149 subjects, Audera's group found no therapeutic benefits of vitamin C. The average length of colds was actually shortest for subjects in the placebo group, and the cumulative index of severity at 28 days after the cold started was second lowest in the placebo group (Figure 2.4). However, there were no statistically significant differences among groups in these summary measures or in any more specific measures such as duration of nasal symptoms, throat symptoms, or systemic symptoms (e.g., fever, headache, and achiness). As suggested in the previous paragraph, this study was not foolproof, but it seems to provide fairly persuasive evidence against the hypothesis that large doses of vitamin C can be used successfully to treat the common cold. More important, comparing this study with the NIH test of the same hypothesis illustrates many of the subtle problems that can arise in conducting nutritional or medical experiments with human volunteers.

Figure 2.4. Average duration of cold symptoms (A) and average severity score (B) for subjects in four treatment groups in a study at Australian National University (Audera et al. 2001). The treatments were given during the first 3 days of colds. The dose of 0.03 grams per day represents the placebo; the dose of 3 grams plus additives represents "Bio C," which contained bioflavonoids, rutin, hesperidin, rose hip extract, and acerola in addition to 3 grams of vitamin C. Subjects rated the severity of cough, nasal, throat, and systemic symptoms on a scale of 1 to 3 each day; B shows the cumulative total of these scores after 28 days. This time period incorporates the full lengths of the longest colds. The vertical bars show 95% confidence intervals, a standard index of variability among individuals (see Chapter 8). For each group, the probability is 95% that the true average lies within the 95% confidence interval. The large overlaps in these confidence intervals imply that the differences among treatments are not significant.
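To make the 95% confidence intervals in Figure 2.4 more concrete, here is a rough Python sketch of how such an interval can be computed for a group average and how two intervals can be checked for overlap. The cold durations are invented for illustration, and the normal approximation used here is my assumption; it is not necessarily the method Audera's group used, and with samples this small a t-based interval would be somewhat wider.

```python
import math
import statistics

def ci95(values):
    """Normal-approximation 95% confidence interval for the mean of values."""
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    return mean - z * standard_error, mean + z * standard_error

def intervals_overlap(a, b):
    """True if intervals a and b, given as (low, high), share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical cold durations in days for two groups (not data from the study).
placebo = [9, 12, 14, 10, 17, 8, 13, 11, 15, 12]
high_dose = [10, 13, 16, 9, 18, 11, 14, 12, 15, 13]

placebo_ci = ci95(placebo)
high_dose_ci = ci95(high_dose)

print(f"placebo:   mean {statistics.mean(placebo):.1f}, "
      f"95% CI {placebo_ci[0]:.1f} to {placebo_ci[1]:.1f}")
print(f"high dose: mean {statistics.mean(high_dose):.1f}, "
      f"95% CI {high_dose_ci[0]:.1f} to {high_dose_ci[1]:.1f}")
print("intervals overlap:", intervals_overlap(placebo_ci, high_dose_ci))
```

In this made-up example the two averages differ by a day, but each interval comfortably contains the other group's average, which is the kind of large overlap the figure caption describes.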

While I was completing the first draft of this chapter, two articles about other treatments for the common cold appeared in my local newspaper on successive days. The first was a front-page story that announced "New Drug Could Be Common Cold Cure." The drug is called pleconaril and was tested in a randomized, controlled trial much like the experimental tests of vitamin C. The pharmaceutical company that developed the drug applied for approval from the U.S. Food and Drug Administration (FDA) to sell it, but in March 2002 an Advisory Committee to the FDA recommended that further studies be done before approval was granted (see note 8). Enough pleconaril to treat one cold would sell for about $40, much more than the cost of vitamin C, if vitamin C were effective. The news report made no mention of all of the previous research on vitamin C but simply stated, "Scientists have developed the first [emphasis mine] medicine proven to reduce the length and severity of the common cold" (Reno Gazette-Journal, 18 December 2001). Pleconaril apparently works by directly attacking rhinovirus, the most common cause of the common cold. The second article touted the benefits of unfiltered beer for treating cold symptoms. This sounds to me like the best approach of all.
