Can I believe my own data?

I want to discuss something I have often wondered about when collecting field data. How can I be sure that my data collection is not biased by my own expectations?

Let me explain this with an example. Imagine I’m interested in comparing the abundance of a bird species at different altitudes. Based on a particular ecological theory, I expect this species to be more abundant at lower altitudes. To test my hypothesis, I walk transects at low and high altitudes and count all the individuals of that species that I see. However, detectability is unlikely to be perfect; the further away an individual is from a transect, the more likely I am to miss it. Therefore, I also visually estimate the distance of each individual from the transect. Using the distance measurements, I can examine how the number of individuals seen drops off away from the transect line. I can then use this information to calculate how many individuals I missed and adjust my estimates of abundance accordingly. (I’ve described this in very simple terms; the actual process is a bit more complicated (pdf).)
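To make the adjustment a little more concrete, here is a minimal sketch in Python. It is not the full method from the linked document. It assumes a half-normal detection function (the chance of seeing a bird falls off with distance like a bell curve), no truncation of distances, and certain detection on the line itself. The function name `halfnormal_density` and the example numbers are made up for illustration.

```python
import math

def halfnormal_density(distances, transect_length):
    """Rough density estimate (individuals per unit area) for a line transect.

    Assumes a half-normal detection function g(x) = exp(-x^2 / (2 * sigma^2)),
    untruncated perpendicular distances, and perfect detection at x = 0.
    """
    n = len(distances)
    if n == 0:
        return 0.0
    # Maximum-likelihood estimate of sigma^2 for half-normal distances
    sigma2 = sum(d * d for d in distances) / n
    # Effective strip half-width: the area under g(x) from 0 to infinity
    esw = math.sqrt(math.pi * sigma2 / 2)
    # Birds seen, divided by the area effectively searched (both sides of the line)
    return n / (2 * transect_length * esw)

# Hypothetical example: 6 birds seen along a 1000 m transect, distances in metres
example = halfnormal_density([5, 12, 20, 8, 30, 15], 1000.0)
print(f"{example:.6f} birds per square metre")
```

The key idea is the effective strip half-width: instead of assuming I saw everything within some fixed distance, the observed distances themselves tell me how wide a strip I effectively searched.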

You will notice that the accuracy of the abundance estimates depends on two factors: my ability to accurately identify the species of interest (i.e. to tell it apart from other similar-looking species) and my ability to accurately estimate the distances of individuals from the transects (underestimates of distance will inflate abundances and vice versa). Given that both species identification and distance estimation are done visually, there will be some measurement error associated with my estimates. What we generally assume is that this error is equally likely in our different treatments (low and high altitudes in our example) and therefore will not bias our results in any direction. But is that assumption really valid? I went into the study expecting to find more birds at lower altitudes. Is it possible that my desire to find this result biases my data collection without my being aware of it? For example, am I more likely to classify an individual of uncertain identity as belonging to the species of interest at lower altitudes? Am I more likely to underestimate distances of individuals at lower altitudes? I think what I describe will be an issue whenever there is strong motivation to obtain results in a particular direction. Given this, it is likely to be particularly problematic in research areas such as conservation science, where researchers are even more strongly wedded to their hypotheses (e.g. forests better than plantations; protected seas better than trawled seas).
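The claim that underestimated distances inflate abundance can be checked with a small simulation. This is only a sketch under stated assumptions: a simple half-normal estimator stands in for the real analysis, the "true" distances are simulated rather than real data, and the 20% underestimate is a made-up figure. The names `density`, `true_d` and `biased_d` are hypothetical.

```python
import math
import random

def density(distances, transect_length):
    # Simple half-normal line-transect estimator (an assumption for this sketch)
    n = len(distances)
    sigma2 = sum(d * d for d in distances) / n   # estimated detection scale
    esw = math.sqrt(math.pi * sigma2 / 2)        # effective strip half-width
    return n / (2 * transect_length * esw)

random.seed(1)
# Simulated "true" perpendicular distances (metres) of 200 detected birds
true_d = [abs(random.gauss(0, 30)) for _ in range(200)]
# The same birds, but every distance recorded 20% too short
biased_d = [0.8 * d for d in true_d]

inflation = density(biased_d, 1000.0) / density(true_d, 1000.0)
print(f"Density is inflated by a factor of {inflation:.2f}")
```

With this estimator, shrinking every distance by a factor of 0.8 shrinks the effective strip width by the same factor, so the density estimate rises by 1/0.8 = 1.25. If such an error crept in only at low altitudes, it would create a 25% difference between treatments even when the true abundance was identical.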

In lab-based experimental research, this bias is avoided by doing ‘blind’ experiments, i.e. experiments where the person recording observations is unaware of which group is the control and which is the experimental group. (Experiments involving human subjects have an additional complication: the subjects also need to be kept unaware of the treatment groups, so ‘double-blind’ experiments are carried out. Fortunately we don’t have that problem with animal or plant subjects... or do we?)

Unfortunately, in field research there is no way to hide the identities of ‘control’ and ‘experimental’ groups. One can’t blindfold researchers, airdrop them into a site and hope that they don’t figure out where they are! More generally, this problem is likely to arise in any situation where there are cues that give away the identities of the different treatments, so it is not unique to field observational research. What is the way out then? One possibility is to get someone who is unaware of the hypothesis being tested to collect the data. But is this feasible? Is it even ethical to hide details of a project from the people working on it? Another way out is to try to reduce subjectivity in data collection as far as possible. In other words, to collect data like a robot would! But isn’t this easier said than done? Finally, might it help to have multiple alternative hypotheses, instead of just one? What do you think?

This problem is not restricted to the data-gathering stage of research. It can creep into our choice of study sites, into data entry and analysis, into our interpretation of results and into what we choose to write up as papers (more on this in a future post, maybe).