Gene Name Auto-Correct in Microsoft Excel Leads to Errors in One in Five Genetics Research Articles

Researchers have identified a small problem that can have big consequences for gene research: Microsoft Excel, when used with default settings, automatically converts some gene names to dates and floating-point numbers in spreadsheets. (For example, the gene symbol “SEPT2” becomes the date “2-Sep,” and the accession number “2310009E13” becomes the number “2.31E+13.”)
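To make the failure mode concrete, here is a minimal Python sketch of how a gene list could be checked for cells that look like Excel conversions. It is an illustration only, not the screening method the authors used; the function name `find_suspect_cells`, the two patterns, and the sample column are all assumptions made for this example.

```python
import re

# Month abbreviations that appear in date-like cells (assumed English locale).
_MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Date-like cells, in either day-month ("2-Sep") or month-day ("Sep-2") order.
_DATE_LIKE = re.compile(
    rf"^(?:\d{{1,2}}-(?:{_MONTHS})|(?:{_MONTHS})-\d{{1,2}})$",
    re.IGNORECASE,
)

# Scientific-notation cells such as "2.31E+13".
_FLOAT_LIKE = re.compile(r"^\d(?:\.\d+)?E[+-]?\d+$", re.IGNORECASE)


def find_suspect_cells(cells):
    """Return the cells that look like converted gene identifiers.

    Hypothetical helper: it only pattern-matches text and cannot
    recover the original gene name.
    """
    suspects = []
    for cell in cells:
        text = str(cell).strip()
        if _DATE_LIKE.match(text) or _FLOAT_LIKE.match(text):
            suspects.append(text)
    return suspects


# Made-up sample column mixing intact symbols with converted ones.
gene_column = ["TP53", "2-Sep", "BRCA1", "2.31E+13", "1-Mar", "SEPT2"]
print(find_suspect_cells(gene_column))
```

A check like this can only flag suspicious cells. Once a symbol has been turned into a date, the original name generally has to be restored from the upstream data.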

The authors looked at 3,597 articles published in 18 genome-focused scientific journals between 2005 and 2015 that contained a total of 7,467 supplementary gene lists produced in Excel.

They found that 19.6 percent of the articles (704 of 3,597) contained conversion errors, spread across 987 supplementary Excel lists.

Though workarounds to this problem exist, “inadvertent gene name conversion errors persist in the scientific literature,” the authors wrote. “These should be easy to avoid if researchers, reviewers, editorial staff, and database curators remain vigilant.”

Source: Ziemann M, Eren Y, El-Osta A. Gene name errors are widespread in the scientific literature. Genome Biol. 2016;17:177.