WP3 – Tools, Data and Pipelines Hub for Bioinformatics Analysis

WP3 Tools, Data and Pipelines Hub for Bioinformatics Analysis. WP Leader: Cedric Notredame (CRG). Partners: CRG, INRAE, LUKE, EMBL, FMV-ULisboa
Task 3.1 Survey and Development. Task Leader: Andreia Amaral (FMV-ULisboa). Partners: FMV-ULisboa, LUKE
Task 3.2 Integration and standardization of available data analysis pipelines. Task Leader: Cedric Notredame (CRG). Partners: CRG, FMV-ULisboa, INRAE
Task 3.3 Development of ontologies. Task Leader: Daniel Zerbino (EMBL). Partners: EMBL
Task 3.4 Data integration. Task Leader: Daniel Zerbino (EMBL). Partners: EMBL
Task 3.5 Comparative analyses. Task Leader: Cedric Notredame (CRG). Partners: CRG, FMV-ULisboa, UEDIN

 

WP3 objectives

The main objective of WP3 is to support the other WPs in ensuring that BovReg work complies with the FAIR principles for scientific reproducibility. This objective will be met at two levels: bioinformatic methods and experimental data. BovReg will establish a hub giving the consortium access to validated and standardized methods, consolidated experimental datasets, and a proper ontological description of both data and methods that allows their interoperability. Populating this hub will be the main task of the WP and will involve integrating data and normalizing computational methods within the consortium. The software to be integrated in the hub is typically at TRL 4; the integration aims to make it ready for a TRL 5 evaluation (i.e. validation in a relevant environment). WP3 will also quantify the evolutionary traces associated with genomic features, thus contributing to the prioritization carried out when selecting SNPs and genomic features for genomic purposes.

Methods and pipelines will be collected across WP1, WP2, WP4 and WP5 through a joint workshop combined with the project kick-off meeting, and later through monthly conference calls with representatives of these WPs. This process will allow the rapid identification of any methodological gap that requires extra analysis pipelines to be integrated from third parties or developed. An important objective will be to determine the best way to deposit and package methods in adequate repositories, along with appropriate reference datasets, to allow live quality control and redeployment. These decisions will be taken together with FAANG and ELIXIR and will follow the Global Alliance for Genomics and Health (GA4GH, https://www.ga4gh.org/) guidelines as much as possible. Data will be processed in a similar way and integrated in all relevant public repositories by EMBL. The deployment and development of a suitable ontology will be key to ensuring that methods and data can interoperate properly. In the long term, WP3 will ensure the sustainability of the analysis framework by identifying the best combination of public resources while allowing self-maintenance. Rather than setting up a central repository maintained by the consortium, we will develop a set of procedures that define how data and methods should be handled and moved around public repositories and cloud computational resources. The front end of this effort will be WP10.

 

Di Tommaso, P. et al. “Nextflow enables reproducible computational workflows”, Nature Biotechnology volume 35, pages 316–319 (2017)


 

Yue, F. et al. “A comparative encyclopedia of DNA elements in the mouse genome”, Nature volume 515, pages 355–364 (2014)