Misconceptions and Misteachings in Quantitative Research Methods

Posted April 12, 2009

About a month ago, I posted an entry describing the Critical Thinking Education Group (CTEG). This group of educators and others is in the process of building and maintaining a library of resources for all levels and types of critical thinking education. I wrote a quick-reference page (well, a few pages) for this library called Common Misteachings. The following is an expanded version.

Scientific Misteachings

  • To be valid, a measure must be reliable. Reliability and validity are independent of one another. A reliable measure is one that is consistent across time, subjects, and testing conditions. A valid measure is one that measures what it is intended to measure. A useful measure must have both.

    For example, a single exam score is a valid measure of academic success, but it is not a very reliable one. An online research methods text written for the social sciences provides a good summary that distinguishes validity from reliability by relating both concepts to the usefulness of a measure.

  • Valid = good or true. Validity is not sufficient for truth or value. It refers only to the structure of an argument. Validity is necessary to draw conclusions about truth, but a perfectly valid argument can still be 100% untrue if its premises are false.
  • The manner in which instructions are given, the gender of the researcher, the room, the time of day, etc. are possible confounds. Nothing that is held constant can confound. Only those things that co-vary with treatment conditions can confound. Any variable that co-varies with treatment is a possible confound.

    There are many variables which, when held constant, may interfere with our ability to generalize findings to settings, manipulations, populations, and measures which were not adequately studied. These are problems of external validity, not confounds.

  • Individual differences among subjects are potential confounds. Confounding variables are those that compromise a comparison (internal validity issues). Unless these differences are used to assign subjects to groups, they cannot confound.
  • An individual participant’s mood or past experience is a possible confound. Again, confounding variables are those that co-vary with treatment conditions, so characteristics and behaviors of individuals cannot confound. Nor are these problems of external validity, since a scientific study from which conclusions (rather than hypotheses) are drawn involves an adequate sample. Nothing a single subject in an adequate sample does or is can derail a well-designed study.
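The co-varies-with-treatment criterion can be made concrete with a small hypothetical simulation: an unmeasured variable (call it motivation) inflates the estimated treatment effect only when it differs systematically between conditions; held constant, it cannot confound:

```python
# Hypothetical simulation: a variable confounds only when it co-varies
# with treatment condition. All effect sizes are invented.
import random

random.seed(2)
TRUE_EFFECT = 2.0
n = 5000

def outcome(treated, motivation):
    # Outcome depends on treatment, motivation, and random noise.
    return TRUE_EFFECT * treated + 3.0 * motivation + random.gauss(0, 1)

# Confounded: motivation co-varies with condition (treated subjects
# average 1.0, controls average 0.0).
treat = [outcome(1, random.gauss(1.0, 0.5)) for _ in range(n)]
ctrl = [outcome(0, random.gauss(0.0, 0.5)) for _ in range(n)]
biased = sum(treat) / n - sum(ctrl) / n  # inflated: picks up motivation too

# Held constant: every subject has the same motivation, so it cannot confound.
treat_c = [outcome(1, 0.5) for _ in range(n)]
ctrl_c = [outcome(0, 0.5) for _ in range(n)]
clean = sum(treat_c) / n - sum(ctrl_c) / n  # close to the true effect of 2.0

print(f"estimated effect with confound:  {biased:.2f}")
print(f"estimated effect, held constant: {clean:.2f}")
```

The confounded comparison attributes the motivation difference to the treatment; holding motivation constant may limit generalizability (an external validity issue), but it does not bias the comparison.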
Control Groups
  • Experiments always require “control groups”. Experiments require comparisons. Control, in the strict sense of the word, is not always possible. For example, we cannot test the effects of presentation media on memory using a control group. Some values that “presentation media” can have are: visual, audio, and touch. You cannot have a condition that presents the material without a mode of presentation.

    Two additional examples of hypotheses that cannot be adequately tested using the traditional “experimental vs. control” comparison are the effects of psychotherapy on depression and the effects of acupuncture on pain. Although it is possible to compare treatment to no treatment, it is not possible to ensure differences in depression or pain are not due to placebo effects unless the patient is blind to the treatment condition. How can one be unaware that one has received psychotherapy or acupuncture? Psychotherapy can only be compared to other forms of therapy. Traditional acupuncture treatment is usually compared to a condition in which acupuncture needles are inserted in randomly-selected locations.
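A hypothetical sketch of why the sham comparison matters: if relief comes partly from expecting treatment (placebo) and partly from the needling points themselves, a no-treatment comparison lumps the two together, while a sham comparison isolates the specific effect. The effect sizes below are invented for illustration:

```python
# Hypothetical simulation of a placebo-controlled design. The split
# between placebo relief and specific relief is invented.
import random
import statistics

random.seed(3)
PLACEBO = 3.0   # relief from expecting treatment (arbitrary units)
SPECIFIC = 1.0  # relief attributable to the needling points themselves

def relief(expects_treatment, real_points):
    r = random.gauss(0, 1)
    if expects_treatment:
        r += PLACEBO
    if real_points:
        r += SPECIFIC
    return r

n = 4000
real = [relief(True, True) for _ in range(n)]    # traditional acupuncture
sham = [relief(True, False) for _ in range(n)]   # needles at random locations
none = [relief(False, False) for _ in range(n)]  # no treatment at all

print(f"real vs. no treatment: {statistics.fmean(real) - statistics.fmean(none):.2f}")
print(f"real vs. sham:         {statistics.fmean(real) - statistics.fmean(sham):.2f}")
```

Against no treatment the apparent benefit is the placebo and specific effects combined; against sham needling, only the specific effect remains.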

Random Assignment
  • Random assignment reduces or eliminates individual differences in unmeasured variables (noise). Random assignment is simply a method for assigning subjects to groups. It cannot change those subjects. It does not make people taller (or shorter), more (or less) intelligent, or a different gender. Random assignment reduces possible confounds by eliminating experimenter bias in the assignment process (the experimenter uses no criteria other than chance to assign subjects to treatment conditions).
  • Random assignment controls for individual differences in unmeasured variables. To control for something, you must either measure it or hold it constant. Ignoring it, which is what random assignment does, is not controlling for it. There are many methods to control for individual differences, all of which require measuring the variable for which one wishes to control.
  • Random assignment distributes individual differences in unmeasured variables (noise) evenly among treatment groups. This misconception comes from the human tendency to think of randomness as meaning “spread out” evenly. Random assignment distributes noise randomly.
  • Random assignment equalizes groups. Random, again, does not mean even. It is entirely possible to assign randomly and end up with a treatment group with only women and control with only men. It is highly improbable, but entirely possible.
  • Random assignment eliminates confounding variables. There are many possible sources of confounding variables. Random assignment only eliminates bias in assignment, which itself is not a guarantee that the groups will not differ on an influential variable by chance (see “equalizes groups”). It also cannot address the confounds that may exist in the method.
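The “equalizes groups” point is easy to check by simulation. Randomly splitting 10 women and 10 men into two groups of 10, a badly lopsided split is improbable but entirely possible — the exact probability that the treatment group receives at least 8 of the 10 women works out to about 1.2%:

```python
# Simulation: random assignment does not guarantee balanced groups.
import random

random.seed(4)
people = ["W"] * 10 + ["M"] * 10  # 10 women, 10 men
trials = 100_000
lopsided = 0  # count splits where treatment gets >= 8 of the 10 women
for _ in range(trials):
    random.shuffle(people)
    treatment = people[:10]  # first half of the shuffled list
    if treatment.count("W") >= 8:
        lopsided += 1

print(f"P(treatment group gets >= 8 of 10 women) ~ {lopsided / trials:.4f}")
```

Rare, but over many studies some randomly assigned groups will differ substantially on an influential variable purely by chance.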
Statistical Analysis and Interpretation
  • The purpose of statistical analysis is to “find significance” in data. The phrase “find significance” is meaningless. It is a warped version of “statistically significant differences,” and students usually have no idea what either actually means; to get through their statistics coursework, they simply memorized formulas and steps, treating interpretation as nothing more than choosing “significance” or “no significance”.
  • Significant differences means the differences are large or the findings are significant. “Children in the Phonics class had significantly higher scores than children in the Traditional class” does not mean that Phonics is significantly better than Traditional teaching. In a nutshell, it means that the difference between the samples’ mean scores is likely to be a result of teaching method, not chance.
  • Results were not “significant” because the groups were not the same size. Small differences in sample size do not affect the outcome of most statistical tests. However, large differences in sample size are a problem because sample size affects variance. Researchers must also consider the reasons for unequal samples in experiments, since the cause of the inequality may itself introduce confounding variables.

    For example, if a medical treatment results in significantly greater reduction in symptoms than placebo, but the sample sizes differ because half of the participants in the experimental condition dropped out (or died) prior to completing the trial, the findings can only be generalized to those who are able to tolerate the treatment.

  • Interactions are the combined effect of two independent variables on a dependent variable. Interactions are when the effect one independent variable has on the dependent measure differs among levels of another independent variable. This is not a combined or additive effect; interactions take on many forms, including the absence of simple effects in some conditions.
  • If you run more subjects, you will probably “find significance”. Although sample size affects power, power is not the probability one will “find significance”. It is the probability of detecting an effect that exists. If you start with an adequate sample, you cannot improve your chances by adding subjects. In addition, the “significant” results you obtain may be of no practical significance; they may have no predictive value.
  • Causal modeling allows the researcher to draw causal conclusions without experimental design. This is a method of analysis, not a design feature. It compares obtained data to a model of causal relationships and determines if the data are consistent with the model. This approach is more powerful than simple correlation analysis in eliminating alternative hypotheses, but no amount of statistical analysis can compensate for a lack of random assignment and/or control of possible confounding variables.
  • Measures of effect size tell us the magnitude of the relationship and/or the difference we can expect to find among treatments. Some measures of effect size in some situations may be described in this way, but most cannot. Experimental controls, laboratory settings, and strong manipulations make effect sizes somewhat meaningless. One must always consider method and sample size when evaluating findings; measures of effect size do not change this fact.
    Suppose, for example, a researcher hypothesizes that profanity in a written story distracts readers, reducing the probability that they will process details. To test this hypothesis, the experimental condition involves reading a story in which every sentence contains at least two words considered profane. The mean difference in comprehension score between the experimental and control conditions could be considered a measure of the magnitude of the observed effect, but this is somewhat meaningless. In addition to other limitations, the experimental manipulation was extreme (in order to maximize the power of the hypothesis test) and the participants anticipated a test of the material. It does not tell us how much profanity in stories affects reading comprehension in general.
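Tying several of these points together, here is a simulation (all numbers invented) in which two “teaching methods” truly differ by only a twentieth of a standard deviation. With a huge sample the test easily reaches statistical significance, yet the standardized effect size remains trivial:

```python
# Hypothetical demonstration: statistical significance without practical
# significance. The true difference between "methods" is 0.5 points on a
# scale with a standard deviation of 10 (a standardized effect of 0.05).
import math
import random
import statistics

random.seed(5)
n = 100_000
a = [random.gauss(50.0, 10) for _ in range(n)]  # "Traditional" scores
b = [random.gauss(50.5, 10) for _ in range(n)]  # "Phonics" scores

diff = statistics.fmean(b) - statistics.fmean(a)
se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
z = diff / se                          # large-sample test statistic
d = diff / statistics.pstdev(a + b)    # standardized effect size

print(f"mean difference: {diff:.2f}   z = {z:.1f}   d = {d:.3f}")
```

With 100,000 scores per group, z lands far beyond any conventional cutoff, so the difference is “significant” — yet d stays around 0.05, a difference of no practical consequence. “Significant” tells us the difference is unlikely to be chance, not that it is large or important.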


thoughtcounts Z on April 13th, 2009 at 07:55:
This is great. I think you should consider submitting this to this month’s Carnival of the Elitist Bastards — more info here. (Motto: “It’s time we stop letting our culture celebrate willful ignorance and start promoting genius instead.”)

Administrator on April 13th, 2009 at 09:22:
Thank You for the kudos! I submitted it to Skeptic’s Circle and the last time I submitted to Elitist Bastards I inadvertently sent an entry that was also in a Skeptic’s Circle. But, since I posted in hopes to draw users to CTEG, maybe they won’t mind another double? Or maybe I should post another one… hmmmm…

