5 Feb. 2018

News: CRAG’s research personnel train in Molecular Data Analysis in Python using the online education portal DataCamp

With today's high-throughput tools, a PhD student can expect to generate large volumes of biological data over the course of a thesis. The challenge is then to integrate all this biological information in a meaningful way: how can we understand the relationship between the genotype of a plant or animal and its phenotype? The answer lies in computer-based approaches to manipulating, visualizing and analyzing biological data, that is, computational biology.

To give CRAG research personnel the basic tools to answer molecular biology questions from large datasets on their own, the Molecular Data Analysis Area within CRAG’s Bioinformatics Core Unit has set up an internal training program in systems biology and bioinformatics. The program includes courses such as Network Analysis and Modeling in Cytoscape, Biostatistics and Introductory R Programming for Molecular Data Analyses, and Data Visualization, Storytelling and Scientific Principles of Design. It has recently been expanded with a new course on Molecular Data Analysis in Python. Python is a general-purpose programming language that is increasingly popular not only for molecular biology analysis but also for data science in general.

For this new course, the trainer, Martí Bernardo, head of CRAG’s Molecular Data Analysis Area, used for the first time the online education portal DataCamp, which offers courses on data science, statistics and machine learning. The 12 PhD students and postdocs who took the course at CRAG this November attended a theory session followed by two very dynamic hands-on sessions on the DataCamp portal. As Martí Bernardo explains, “DataCamp allows CRAG’s Molecular Data Analysis Area to tailor the training it offers to the analysis needs and biological questions of CRAG’s researchers.”

Miguel Simon, a PhD student in Manuel Rodríguez-Concepción's group, was one of the trainees in this new course on Molecular Data Analysis in Python. Miguel, who holds a degree in Biotechnology and is currently studying the genetic factors involved in tomato fruit ripening, explains that “for a biologist, working with this kind of computational tool is difficult, because we do not have a strong background in computer science. This Python course has helped me get started with these techniques and, importantly, lose my fear of their complexity!”

The course started at the most basic level, which was very useful for trainees with no previous knowledge. Step by step, new levels of complexity were added, and by the end of the course the trainees were able to handle datasets of considerable size. “Finally, we learned to model complex situations in which we had to take a large number of different probabilistic factors into account. At the end of the course you realize that you have just discovered a world that can greatly help our research and that will become increasingly essential in every laboratory,” explains Miguel Simon.

In summary, the course taught participants powerful ways to store, manipulate and visualize their data, as well as tools to start their own analyses. With the addition of this latest course, CRAG’s internal training program now covers fundamental skills such as scientific writing and grant writing, as well as technical skills such as biostatistics, visualization, programming and mathematical modelling.