Years of genome research riddled with errors due to Microsoft Excel default setting

While Microsoft Excel spreadsheets are commonly used to record research and create relevant databases, a default setting in the software that automatically changes inputs to dates and other numbers may render many years of research erroneous, according to a study in Genome Biology.

Researchers downloaded 10 years' worth of genomic research files from 18 journals to examine the database inputs. They converted the Microsoft Excel files to tab-separated files to identify gene symbol errors, screening more than 35,000 Excel files that included nearly 7,500 gene lists attached to 3,600 published papers.

They found gene name errors in 19.6 percent of published articles whose Excel files contained gene lists.

For example, the gene symbol for Septin 2 may be entered into the database as SEPT2, which Microsoft Excel automatically converts to the date 2-Sep, according to the study. 
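The kind of screening described above can be approximated with a short script. The following Python sketch is an illustration only, not the study's actual method: it assumes the gene list has already been exported as tab-separated text, and the function name, the sample rows and the date pattern are all hypothetical. It flags entries that look like Excel-formatted dates, such as 2-Sep.

```python
import re

# Month abbreviations as Excel renders them in dates like "2-Sep".
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Assumed pattern: day-month ("2-Sep") or month-day ("Sep-2").
# Other date formats would need additional patterns.
DATE_LIKE = re.compile(
    rf"^(?:\d{{1,2}}-(?:{MONTHS})|(?:{MONTHS})-\d{{1,2}})$", re.IGNORECASE
)


def find_date_like_entries(tsv_text, column=0):
    """Return (row_number, value) pairs whose value in `column` looks like a date."""
    hits = []
    for row_number, line in enumerate(tsv_text.splitlines(), start=1):
        fields = line.split("\t")
        if column < len(fields):
            value = fields[column].strip()
            if DATE_LIKE.match(value):
                hits.append((row_number, value))
    return hits


# Hypothetical sample data: a header row plus four gene entries,
# two of which have been turned into dates.
sample = "gene\tscore\nTP53\t1.2\n2-Sep\t0.8\nBRCA1\t2.1\n1-Mar\t0.4\n"
print(find_date_like_entries(sample))
```

A check like this only reports suspicious cells. Recovering the original symbol still requires going back to the source data, since several gene symbols can map to the same date.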

What's more, they found the number of gene name errors has grown faster than the number of published papers per year.

The researchers write that there is no way to deactivate the automatic conversion of cell entries into dates in Microsoft Excel, but they noted that Google Sheets did not convert gene names to dates or numbers when typed or pasted in.

"We show that inadvertent gene name conversion errors persist in scientific literature, but these should be easy to avoid if researchers, reviewers, editorial staff and database curators remain vigilant," researchers conclude. 


Copyright © 2024 Becker's Healthcare. All Rights Reserved.

 
