
Cell identity and the seven deadly sins of data analysis

16 August 2019
Dr Jarny Choi and Professor Christine Wells review the different approaches to cell classification and prediction analysis, and highlight common pitfalls.

In 2007, scientists discovered that mature skin cells could be reprogrammed into induced pluripotent stem cells (iPSCs), which can subsequently be differentiated into any cell type present in the body.

This discovery changed how we understand the human body: stem cells can now be created from any person, including a patient, and observed in a dish to study the underlying causes of disease. However, these “dish-grown”, or in vitro, stem cells and their products are not identical to their counterparts taken from a living organism, known as in vivo cells.

Many studies have compared in vitro stem cells with their in vivo counterparts to find key transcriptional differences between the cells and to understand the molecular drivers of certain cell types. Moreover, the volume of gene expression data for both types of stem cells in public repositories is growing.

In a recent paper, Transcriptional profiling of stem cells – moving from descriptive to predictive paradigms, published in Stem Cell Reports, Dr Jarny Choi and Professor Christine Wells from the University of Melbourne reviewed the different approaches to mining and combining this data into cell classification and cell prediction tools, and highlighted common pitfalls.


Figure: Seven Deadly Sins of Data Analysis

The highlighted pitfalls include using datasets with missing values or false sample assignments; poor experimental design, which makes it difficult to say whether conclusions reflect biology rather than technical issues; and applying incorrect analyses, particularly when transforming the data.

Their paper also touches on what the future holds for this field as new technologies emerge. As the data generated by the stem cell community become more sophisticated, and are increasingly captured at higher cellular or molecular resolution, Choi and Wells expect that the approaches for benchmarking and analysing in vitro derived cell types will also improve. Lastly, through improved analysis and benchmarking tools, researchers may be able to predict the cell types derived from stem cells rather than simply describe them. This will help ensure that dish-grown stem cells and their products are fully characterised and predictable before they are integrated into patients.

Dr Jarny Choi is an Early Career Researcher with Stem Cells Australia; Professor Christine Wells is a Chief Investigator.