Can I believe my own data?

I want to discuss something I have often wondered about when collecting field data. How can I be sure that my data collection is not biased by my own expectations?

Let me explain this with an example. Imagine I’m interested in comparing the abundance of a bird species at different altitudes. Based on a particular ecological theory, I expect this species to be more abundant at lower altitudes. To test my hypothesis, I walk transects at low and high altitudes and count all the individuals of that species that I see. However, detectability is unlikely to be perfect: the further an individual is from the transect, the more likely I am to miss it. Therefore, I also visually estimate the distance of each individual from the transect. Using these distance measurements, I can examine how the number of individuals seen drops off with distance from the transect line. I can then use this information to calculate how many individuals I missed and adjust my abundance estimates accordingly. (I have described this in very simplistic terms; the actual process is a bit more complicated.)
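
Here is a rough sketch of the correction described above. It is not the full method. It assumes a half-normal detection function, g(x) = exp(-x²/2σ²). It also assumes that every bird on the line itself is seen (g(0) = 1), and it ignores truncation and flock size. Under those assumptions, the maximum-likelihood estimate of σ is the root mean square of the observed distances. The effective strip half-width is σ√(π/2), and density is n / (2 × transect length × effective half-width). The helper names (`simulate_detections`, `half_normal_density`) and all the numbers are hypothetical, chosen only for illustration.

```python
import math
import random

def simulate_detections(density, length, half_width, sigma, rng):
    """Scatter birds uniformly within half_width of a transect (hypothetical setup)
    and detect each with half-normal probability g(x) = exp(-x**2 / (2 * sigma**2))."""
    n_birds = round(density * 2 * half_width * length)
    detected = []
    for _ in range(n_birds):
        x = rng.uniform(0, half_width)  # perpendicular distance from the line
        if rng.random() < math.exp(-x * x / (2 * sigma * sigma)):
            detected.append(x)
    return detected

def half_normal_density(distances, length):
    """Density from perpendicular distances, ASSUMING a half-normal detection
    function with g(0) = 1 (birds on the line are never missed) and no truncation."""
    n = len(distances)
    sigma_hat = math.sqrt(sum(x * x for x in distances) / n)  # maximum-likelihood sigma
    esw = sigma_hat * math.sqrt(math.pi / 2)  # effective strip half-width
    return n / (2 * length * esw)

rng = random.Random(1)
TRUE_DENSITY = 0.001  # birds per square metre (10 per hectare), made up
LENGTH, HALF_WIDTH, SIGMA = 5000.0, 150.0, 25.0  # metres, made up

seen = simulate_detections(TRUE_DENSITY, LENGTH, HALF_WIDTH, SIGMA, rng)
naive = len(seen) / (2 * HALF_WIDTH * LENGTH)  # pretends every bird in the strip was seen
corrected = half_normal_density(seen, LENGTH)
print(f"seen {len(seen)} birds; naive {naive * 1e4:.2f}/ha; corrected {corrected * 1e4:.2f}/ha")
```

The naive count badly underestimates density because most distant birds are missed. The corrected estimate should land close to the true 10 birds per hectare.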

You will notice that the accuracy of the abundance estimates depends on two things. The first is my ability to correctly identify the species of interest (i.e. to tell it apart from similar-looking species). The second is my ability to accurately estimate the distances of individuals from the transect (underestimating distances will inflate abundances, and vice versa). Because both species identification and distance estimation are done visually, my estimates will carry some measurement error. We generally assume that this error is equally likely in our different treatments (low and high altitudes in our example) and therefore will not bias our results in any direction.

But is that assumption really valid? I went into the study expecting to find more birds at lower altitudes. Could my desire to find this result bias my data collection without my being aware of it? For example, am I more likely to classify an individual of uncertain identity as the species of interest at lower altitudes? Am I more likely to underestimate the distances of individuals at lower altitudes? I think this will be an issue whenever there is strong motivation to obtain results in a particular direction. It is therefore likely to be particularly problematic in research areas such as conservation science, where researchers are even more strongly wedded to their hypotheses (e.g. forests are better than plantations; protected seas are better than trawled seas).
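
To see how a one-sided distance bias could play out, here is a small simulation. It is again a sketch with made-up numbers and the same assumptions: half-normal detection, g(0) = 1, no truncation. Bird density is identical at both altitudes. However, a hypothetical observer records distances 20% too short at low altitude only. The estimated σ, and hence the effective strip width, scales with the recorded distances. So the low-altitude estimate is inflated by exactly 1/0.8 = 1.25 compared with what the same detections would give with correct distances. Misidentification works through the count instead. If the misidentified birds have a similar spread of distances, the estimate rises roughly in proportion to the extra count.

```python
import math
import random

def half_normal_density(distances, length):
    """Density estimate ASSUMING half-normal detection, g(0) = 1, no truncation."""
    n = len(distances)
    sigma_hat = math.sqrt(sum(x * x for x in distances) / n)  # maximum-likelihood sigma
    esw = sigma_hat * math.sqrt(math.pi / 2)  # effective strip half-width
    return n / (2 * length * esw)

def simulate_detections(density, length, half_width, sigma, rng):
    """Hypothetical birds placed uniformly beside the line, detected with g(x)."""
    n_birds = round(density * 2 * half_width * length)
    detected = []
    for _ in range(n_birds):
        x = rng.uniform(0, half_width)
        if rng.random() < math.exp(-x * x / (2 * sigma * sigma)):
            detected.append(x)
    return detected

rng = random.Random(42)
TRUE_DENSITY = 0.001  # the SAME at both altitudes (made up)
LENGTH, HALF_WIDTH, SIGMA = 5000.0, 150.0, 25.0  # metres, made up

low = simulate_detections(TRUE_DENSITY, LENGTH, HALF_WIDTH, SIGMA, rng)
high = simulate_detections(TRUE_DENSITY, LENGTH, HALF_WIDTH, SIGMA, rng)

# The observer unknowingly records distances 20% too short at low altitude only.
low_recorded = [0.8 * x for x in low]

d_low = half_normal_density(low_recorded, LENGTH)
d_high = half_normal_density(high, LENGTH)
print(f"low: {d_low * 1e4:.2f} birds/ha; high: {d_high * 1e4:.2f} birds/ha")
```

The low-altitude site now looks more crowded, even though the birds themselves were simulated with identical density.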

In lab-based experimental research, this bias is avoided by doing ‘blind’ experiments, in which the person recording observations does not know which group is the control and which is the experimental group. (In experiments involving human subjects there is an additional complication: the subjects also need to be kept unaware of the treatment groups, so ‘double-blind’ experiments are carried out. Fortunately we don’t have that problem with animal or plant subjects… or do we?)
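
One simple way to keep the person scoring or analysing lab data ‘blind’ is to have a colleague replace the treatment names with random codes. The colleague then holds onto the key until the analysis is finished. Below is a rough sketch of that idea. The helper names (`blind_labels`, `unblind`), the code names and the toy records are all hypothetical.

```python
import random
from collections import defaultdict

def blind_labels(records, treatment_key, rng):
    """Replace each treatment name with an opaque code. Returns the coded records
    and the key, which a colleague keeps until the analysis is done."""
    treatments = sorted({r[treatment_key] for r in records})
    codes = [f"group_{i}" for i in range(len(treatments))]  # hypothetical code names
    rng.shuffle(codes)  # random assignment, so the code order reveals nothing
    key = dict(zip(treatments, codes))
    coded = [{**r, treatment_key: key[r[treatment_key]]} for r in records]
    return coded, key

def unblind(values_by_code, key):
    """Map results computed on coded groups back to the real treatment names."""
    inverse = {code: name for name, code in key.items()}
    return {inverse[code]: value for code, value in values_by_code.items()}

records = [  # toy data, made up
    {"site": "A", "treatment": "control", "count": 12},
    {"site": "B", "treatment": "experimental", "count": 7},
    {"site": "C", "treatment": "control", "count": 9},
]

coded, key = blind_labels(records, "treatment", random.Random(7))

# The analyst only ever sees the coded labels.
counts_by_code = defaultdict(list)
for r in coded:
    counts_by_code[r["treatment"]].append(r["count"])
means_by_code = {code: sum(c) / len(c) for code, c in counts_by_code.items()}

means = unblind(means_by_code, key)  # done only after the analysis is fixed
print(means)
```

This hides the labels only in the data sheet. It does nothing about cues that an observer picks up while actually standing at a site.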

Unfortunately, in field research there is no way to hide the identities of ‘control’ and ‘experimental’ groups. One can’t blindfold researchers, airdrop them into a site and hope that they don’t figure out where they are! More generally, this problem is likely to arise in any situation where cues give away the identities of the different treatments, so it is not unique to observational field research. What is the way out, then? One possibility is to get someone who is unaware of the hypothesis being tested to collect the data. But is this feasible? Is it even ethical to hide details of a project from the people working on it? Another option is to reduce subjectivity in data collection as far as possible; in other words, to collect data the way a robot would! But is this easier said than done? Finally, might it help to have multiple alternative hypotheses instead of just one? What do you think?

This problem is not restricted to the data-gathering stage of research. It can creep into our choice of study sites, our data entry and analysis, our interpretation of results, and what we choose to write up as papers (more on this in a future post, maybe).



  1. spulla

    Hari, I think this is possible, but I also think ways can be devised to reduce the bias during data collection and/or analysis (or while reporting results as suggested in the paper below).

    “False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant”

    “…flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis.”

    • spulla

      Very interesting. Now the question is, how does one apply all this to the question you posed (transect example)? Perhaps others have worked on this problem (if not, you could be the first to do so ;)).

  2. Pingback: Of Yetis and Sasquatches « Ecology Students' Society
  3. Manvi

    Having alternative hypotheses could really help reduce biases in data collection, since our attachment to any one hypothesis would be weaker! Since alternative hypotheses can be hard to come up with, one could instead look at how the response variable (here, abundance) changes over a range of values of the explanatory variable (here, altitude): from very low and low to medium, high and very high altitudes. In a sense, this tells us the shape of the curve describing how abundance changes with altitude, which I feel is harder to predict in advance, and so it reduces bias. But one might question whether it is ecologically interesting to study abundances at very low or very high altitudes at all.

    • Hari

      Good point, Manvi. It is difficult to choose between testing an idea we are really curious about but that might lead to bias (e.g. abundances are higher at lower altitudes), and testing one that is less likely to lead to bias but that we are not as interested in (e.g. how does abundance change with altitude?).
