Collecting and visualising stem cell data: Stemformatics

05 December 2013
Stemformatics - helping scientists analyse their data

The Stemformatics team is using QCloud, the Research Data Storage Infrastructure (RDSI) node operated by Queensland Cyber Infrastructure Foundation (QCIF), to make available a growing collection of high quality gene expression datasets for stem cell research, and to allow researchers to quickly and easily visualise genes that interest them in these datasets.
Stem cell research
Professor Martin Pera from The University of Melbourne is Program Leader of Stem Cells Australia, one of the partners in Stemformatics. He says that stem cell research "is ultimately aimed at the development of therapies for a range of conditions which are currently intractable---conditions characterised by irreparable cell loss or damage and ranging from diseases like Parkinson's Disease, to myocardial infarction or heart attack, to diabetes, and many, many other serious and debilitating illnesses."(1) 

Stem cells are found in many places in our bodies, including our skin, hair follicles, bone marrow and blood, and the brain and spinal cord, to name a few. Stem cells generate new tissues in growing bodies, and in adults, stem cells repair and regenerate damaged and ageing tissues. Biologists are interested in whether the regenerative potential of stem cells can be harnessed to grow new cells for treatments that replace diseased or damaged tissue in the body.
One way to address this question is to look at the genetic program used by stem cells, or by cells that populate the different tissues of our body. Although every human cell, including stem cells derived in a laboratory dish, has the same complement of genetic material available, it is the specific subsets of genes that are activated and silenced that are of interest to stem cell researchers. A large body of publicly available data, generated by hundreds of stem cell laboratories around the world, points to genes that must be available to actively growing stem cells, or that must be co-ordinated as cells differentiate to become precursors to tissues such as muscle, brain, pancreas and cartilage. Once these gene subsets have been identified, researchers can develop ways to harness this information using drug or cell therapies.
Stemformatics fills the gap
One of the biggest bottlenecks for stem cell researchers is the analysis of these datasets, which generally requires the collaboration of specialist bioinformaticians or biostatisticians. Even then, navigating the scale of the data available is a daunting task, making cell model data inaccessible for many biologists. "It would be the equivalent of having a legal document created without being able to read it, and you have to see a lawyer every time you want to find out what it means," says Rowland Mosbergen, Lead Developer of Stemformatics at The University of Queensland.

Associate Professor Christine Wells, of the Australian Institute for Bioengineering & Nanotechnology at The University of Queensland, had the idea to create Stemformatics after receiving many requests from other researchers for help in analysing their data. She saw the need for a bridge between biologists and bioinformaticians, so that stem cell researchers could quickly navigate, visualise, and ask simple, common-sense questions about this data, and then use the answers as a basis for informed collaborations with the biostatistics and bioinformatics community.
Stemformatics provides this collaborative bridge between the stem cell and bioinformatics communities. The Stemformatics core team of A/Prof Christine Wells, Rowland Mosbergen and Othmar Korn collects and curates public and private cell datasets and provides easy-to-use visualisation and analysis tools through the website. The service is already gaining international popularity, with collaborators in Canada, The Netherlands, South Korea, and Japan.

Stemformatics and RDSI
The collections and tools are hosted on QCloud. Justin Clark, QCIF eResearch Analyst at The University of Queensland, has helped the team with the process of getting Stemformatics into production on the new infrastructure. "We owe Justin a fair bit," Rowland says. "We have a good relationship there. It makes it easy."

Othmar Korn, Bioinformatics Software Engineer, says the proximity of the RDSI storage to the processing capacity is critical to incorporating the next generation of genomic data. "We're moving now from the Gigabyte to Terabyte scale in our processing," he says, "and having QCloud and RDSI means we can have our data processing pipeline coupled closely to the data storage. That makes our processing so much more streamlined. Moving forward that's really important because already on the horizon we see the new next-generation of high throughput sequencing data, which is going to be another order of magnitude greater. So the data's getting bigger, and we need our infrastructure to cope with that."
Stemformatics is available at 

Stemformatics is part of the Stem Cells Australia research initiative. This story first appeared on the RDSI website on 4 December 2013 and is reproduced with their permission.