
n2doc

(47,953 posts)
Fri Dec 20, 2013, 03:55 PM

The Vast Majority of Raw Data From Old Scientific Studies May Now Be Missing

One of the foundations of the scientific method is the reproducibility of results. In a lab anywhere around the world, a researcher should be able to study the same subject as another scientist and reproduce the same data, or analyze the same data and notice the same patterns.

This is why the findings of a study published today in Current Biology are so concerning. When a group of researchers tried to email the authors of 516 biological studies published between 1991 and 2011 and ask for the raw data, they were dismayed to find that more than 90 percent of the oldest data (from papers written more than 20 years ago) were inaccessible. In total, even including papers published as recently as 2011, they were only able to track down the data for 23 percent.

“Everybody kind of knows that if you ask a researcher for data from old studies, they’ll hem and haw, because they don’t know where it is,” says Timothy Vines, a zoologist at the University of British Columbia, who led the effort. “But there really hadn’t ever been systematic estimates of how quickly the data held by authors actually disappears.”

To make their estimate, his group chose a type of data that’s been relatively consistent over time—anatomical measurements of plants and animals—and dug up between 25 and 40 papers for each odd year during the period that used this sort of data, to see if they could hunt down the raw numbers.

A surprising number of their inquiries were halted at the very first step: for 25 percent of the studies, active email addresses couldn't be found, with defunct addresses listed on the paper itself and web searches not turning up any current ones. For another 38 percent of studies, their queries led to no response. Another 7 percent of the data sets were lost or inaccessible.



Read more: http://blogs.smithsonianmag.com/science/2013/12/the-vast-majority-of-raw-data-from-old-scientific-studies-may-now-be-missing/

12 replies
The Vast Majority of Raw Data From Old Scientific Studies May Now Be Missing (Original Post) n2doc Dec 2013 OP
A lot of the cooked data is missing, too. nt bananas Dec 2013 #1
No one pays you to archive data so it often doesn't get archived. Johonny Dec 2013 #2
It's part of the research. Igel Dec 2013 #7
Most grants run out before you even finish the paper Johonny Dec 2013 #9
it doesn't help that software/storage tech changes. Evoman Dec 2013 #3
Hard drives have become relatively very cheap and large. Thor_MN Dec 2013 #4
so, not on the google=lost? mopinko Dec 2013 #5
Often enough. Igel Dec 2013 #8
My company has the opposite problem. TxDemChem Dec 2013 #6
I'll bet this problem will become worse in the future. Kablooie Dec 2013 #10
...besides the issues of storage and expense, it can be a legal liability to store data... Sancho Dec 2013 #11
I'm not worried... NeoGreen Dec 2013 #12

Igel

(35,300 posts)
7. It's part of the research.
Sat Dec 21, 2013, 01:39 PM

A recurrent leitmotif in "Bob the Builder" is that a job isn't finished until you've cleaned up.

Research isn't finished until you've submitted the final proofsheets and archived the data.


I've often thought that a "Data Notices" section would be appropriate in most journals.
"Igel, Yezh Erizo. 2014. Research done on blah-blah using yada-yada methodology. No useful results."

Then if it's useful to somebody or becomes an issue, somebody else can see how a failed experiment went. It probably wouldn't get a lot of buy-in, because nobody wants to report botched experiments or make data available that undercuts their most cherished scientific beliefs.

Johonny

(20,841 posts)
9. Most grants run out before you even finish the paper
Sat Dec 21, 2013, 02:38 PM

While you are writing, editing, and answering comments, you are often digging for your next grant, in the middle of the next project, etc. My graduate school adviser had 8-inch, 5.25-inch, and 3.5-inch disks sitting on shelves. He had even older stuff on film. There is no grant money to archive all that and to keep doing so, let alone to update software so it can even read old formats. While journals do have some archiving ability, most data and individual data sets are lost to time. If you don't trust experimental results, often the best thing to do is to redo the experiment yourself. Frankly, that isn't a bad thing, as not everything peer reviewed and in the literature is correct. Redoing an experiment from 1920 can be very educational. If an experiment is right, you should be able to reproduce the results yourself.

Most of the job is money. If you pay people to do it, they will do it; if you don't, they likely won't, or won't even have the resources to try. Researchers are often massively underpaid for their educational level and extremely busy, and grants rarely look beyond the last date in the grant proposal.

Evoman

(8,040 posts)
3. it doesn't help that software/storage tech changes.
Fri Dec 20, 2013, 07:31 PM

Whether it's raw data stored on old media (old computers, cassettes, big floppy disks, etc.) or data that is meaningless because the software used to process it is unusable, a lot of old data is as often inaccessible as it is lost. Sometimes there is just too much data to print it out or keep it in your lab notebook. Not to mention the sheer volume you have to find a place to store (and then forget about). Add the sometimes transitory nature of science work (postdocs, lab techs, and researchers move labs every couple of years), and of course you are gonna lose old data.

Edit: sorry about my crappy writing. I just had chemo and I'm finding it hard to concentrate.

Thor_MN

(11,843 posts)
4. Hard drives have become relatively very cheap and large.
Fri Dec 20, 2013, 07:39 PM

Data that could easily be kept by today's standards would have been purged in the past to make room for new studies. This is not surprising in the least.

Igel

(35,300 posts)
8. Often enough.
Sat Dec 21, 2013, 01:56 PM

Grad student does research, grad student drops out of the field or gets married. If s/he took the data, buh-bye.

I was once after some old data and found out that the guy had died a few years before. He'd done research at several schools and retired years before that, so where was his data? Heck, I had enough trouble just locating the email of somebody at any of the places he'd worked who remembered him.

Even worse, at one school I was at there was a shelf of stuff: recordings, files, etc. I asked what it was and everybody just stared at me. The bee wouldn't leave my bonnet, so I kept asking. It had been there for years, so long that the only person who knew what it was was a faculty member who'd been there 20 years or more. It turned out to be the field records and recordings of a professor who'd been studying some Native American languages in the '60s and '70s. His work resulted in some papers that were important at the time. However, the language was all but gone by the time attention returned to it, and here were hundreds of hours of recorded speech by native speakers living in a small community where the language was actively used, all painstakingly transcribed. Everybody realized it was a really rich data trove. But nobody there worked on Native American languages anymore, and the department had moved strongly toward theory and hypothesis-based experiments. You can't experiment on recordings. Nobody could sort out what to do with them, and years later they still sat there.

TxDemChem

(1,918 posts)
6. My company has the opposite problem.
Sat Dec 21, 2013, 12:54 PM

We've got raw data from the 1970s written on index cards and stored in boxes labelled only with the year the study or assay was performed. My boss absolutely refuses to scan any of it so it could be saved on three servers as electronic records in case anything ever happens to the physical records. And if anyone is looking for specific data from before we had the HP 1000 (which is now also obsolete), they'd be hard pressed to find it, even though it is likely in the building. A few years ago, I attempted to catalog all the boxes of data that were just sitting around and got yelled at. I gave up. Another group of chemists in the building has electronic copies of ALL of their data, from the 1950s to today. They would have been great for this study.

Kablooie

(18,632 posts)
10. I'll bet this problem will become worse in the future.
Mon Dec 23, 2013, 07:20 AM

Digital formats and media become obsolete or corrupted, while good old sheets of paper from an earlier era remain readable.

The government should finance permanent archival of machines and software that can read every kind of digital data so in the future there will be a means of retrieving ancient data from obsolete media.

Sancho

(9,069 posts)
11. ...besides the issues of storage and expense, it can be a legal liability to store data...
Mon Dec 23, 2013, 08:11 AM

...at least with human-subjects data, where keeping the data secure is an IRB obligation. Once the research is published, there's no reason to take the risk unless the data is being archived for a specific reason.

I've kept data for 10-15 years for longitudinal studies, but at some point you can't keep boxes of personal records at home. Even disks (remember those?) and thumb drives pile up, while the rules sometimes require them to be secured or locked up.

Believe it or not, I've stored decks of cards in the past. Who has time to convert data to a new type of storage every few years? More to the point, who monitors or secures data that is supposed to be private or sensitive? A hundred years ago no one cared, but new rules and laws might make the researcher liable for a leak of information.

I can easily believe that most raw data is lost once the effort becomes dated.

NeoGreen

(4,031 posts)
12. I'm not worried...
Tue Dec 24, 2013, 01:38 PM

... I'm sure the NSA has a complete archive of everything back to the Manhattan Project.
