Giant Datasets From GPS Collars Can Lead To 'Data Dredging' For Habitat Inferences
Data from GPS collars on animals are used to determine crucial habitats and migration routes. Because of technological advances, scientists can now collect animal locations in three dimensions as often as 32 times per second. But this abundance of data comes with a downside.
“It's really easy to find statistically significant patterns in GPS data,” said Dr. John Fieberg, a statistician at the University of Minnesota who recently lectured at Utah State University.
Fieberg wants scientists who study where animals live to develop a hypothesis based on the animal’s life history and use GPS data to test the hypothesis, rather than collecting GPS data and looking for patterns within it.
“P-values and confidence intervals are all based on having an a priori model with very well-specified tests that you're going to conduct, which is very different from what a lot of people do, which is collect location data, collect as much environmental data as they have, and then try and look for associations between the two,” Fieberg said. “So, there is a danger to just collecting a lot of data and trying to look for associations.”
Fieberg said that this phenomenon of collecting a great deal of data and looking for patterns, known as data dredging in scientific circles, can lead to poor management of species by policymakers.
“If you think there's an association with a particular type of environment, and it's spurious and you spend a lot of money trying to manipulate the environment in a certain way that's not beneficial, then you've wasted a lot of resources,” Fieberg said. “So, it's important not to just chase noise, but to try and have more of a mechanistic understanding that'll be predictive in new situations.”
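The risk of chasing noise can be illustrated with a small simulation. What follows is a minimal sketch in Python, not an analysis from Fieberg's work. It assumes a made-up "habitat use" score and 200 made-up environmental variables that are all pure random noise, so any association found is spurious by construction. The function names, sample sizes and the 0.05 threshold are illustrative assumptions, and the p-value uses the standard Fisher z approximation rather than an exact test.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r = 0, using the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def count_spurious(n_locations=500, n_covariates=200, alpha=0.05, seed=0):
    """Count 'significant' associations when every variable is noise."""
    rng = random.Random(seed)
    # Hypothetical habitat-use score: random numbers, no real signal.
    use = [rng.gauss(0, 1) for _ in range(n_locations)]
    hits = 0
    for _ in range(n_covariates):
        # Hypothetical environmental variable, also unrelated noise.
        covariate = [rng.gauss(0, 1) for _ in range(n_locations)]
        r = pearson_r(use, covariate)
        if approx_p_value(r, n_locations) < alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious()
    print(f"{hits} of 200 noise variables look 'significant' at p < 0.05")
```

Because each test has about a 5 percent chance of a false positive, screening 200 unrelated variables is expected to flag around 10 of them, even though none has any real connection to where the simulated animal goes. Testing a single hypothesis chosen in advance avoids this multiplication of chances.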