IBsquare Toolbox for Oligogenic Analysis

The IBsquare Toolbox for Oligogenic Analysis comprises three tools that are meant to assist researchers and doctors alike in the identification of genetic diseases. These three tools are closely intertwined: the data contained in the DIDA database were used to train the VarCoPP predictor, which in turn is part of the ORVAL pipeline.

DIDA: DIgenic Diseases DAtabase is a novel database that provides for the first time detailed information on genes and associated genetic variants involved in digenic diseases, the simplest form of oligogenic inheritance.

VarCoPP: Variant Combination Pathogenicity Predictor is a machine-learning method that predicts the potential pathogenicity of any bi-locus variant combination (i.e. a combination of two to four variant alleles between two genes). It has been trained on digenic disease data present in the Digenic Diseases Database (DIDA) and variant data derived from control individuals of the 1000 Genomes Project (1KGP). VarCoPP consists of an ensemble of 500 individual Random Forest predictors that predict whether a variant combination is disease-causing (i.e. candidate or probably pathogenic) or neutral (i.e. probably neutral).
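The idea of combining many individual predictors into one call can be sketched as follows. This is a minimal illustration, not VarCoPP's actual code: the function `ensemble_vote`, the simple majority threshold of 0.5, and the toy predictors standing in for the 500 trained Random Forests are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical predictor type: maps a feature vector to
# 1 (disease-causing) or 0 (neutral).
Predictor = Callable[[Sequence[float]], int]


def ensemble_vote(predictors: List[Predictor],
                  features: Sequence[float]) -> Tuple[str, float]:
    """Return the majority label and the fraction of predictors
    that voted disease-causing."""
    votes = [predict(features) for predict in predictors]
    support = sum(votes) / len(votes)
    # Assumption: a simple majority decides the final label.
    label = "disease-causing" if support > 0.5 else "neutral"
    return label, support


# Toy stand-ins for 500 trained models: each one thresholds the
# first feature at a slightly different cut-off.
toy_predictors = [lambda x, t=i / 500: int(x[0] > t) for i in range(500)]

label, support = ensemble_vote(toy_predictors, [0.8])
print(label, support)
```

Reporting the fraction of agreeing predictors alongside the label gives a rough indication of how strongly the ensemble supports the call.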

ORVAL: Oligogenic Resource for Variant AnaLysis is a platform for the prediction and exploration of candidate disease-causing oligogenic variant combinations.

TCRex

The TCRex webtool allows functional interpretation of full human T-cell repertoire data derived from next generation sequencing.

TCRex is the first tool of its kind and is able to link T-cell receptor sequences to a rapidly expanding list of 49 different important immunogenic epitopes, consisting of 44 viral and 5 cancer epitopes. Additional epitopes can be added by users for their own use. The tool is able to calculate enrichment statistics and baseline prediction rates to evaluate full repertoires. It is unique among TCR-epitope prediction tools in that it allows processing of full human repertoires. It has also brought together the largest database on TCR-epitope data to train the underlying machine learning models through manual curation of various online resources and scientific literature.
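One common way to turn a baseline prediction rate into an enrichment statistic is a one-sided binomial test. The sketch below illustrates that idea only; it is not taken from TCRex, and the function name `binomial_enrichment_p` and the example numbers are assumptions.

```python
from math import exp, lgamma, log


def binomial_enrichment_p(n_tcrs: int, n_hits: int,
                          baseline_rate: float) -> float:
    """P(X >= n_hits) for X ~ Binomial(n_tcrs, baseline_rate).

    baseline_rate must lie strictly between 0 and 1. Each term is
    computed in log space so that large repertoires do not overflow.
    """
    total = 0.0
    for k in range(n_hits, n_tcrs + 1):
        log_pmf = (lgamma(n_tcrs + 1) - lgamma(k + 1)
                   - lgamma(n_tcrs - k + 1)
                   + k * log(baseline_rate)
                   + (n_tcrs - k) * log(1.0 - baseline_rate))
        total += exp(log_pmf)
    return min(total, 1.0)


# Hypothetical repertoire: 1000 TCRs, 12 predicted to bind an epitope,
# with an assumed baseline prediction rate of 0.5 %.
p_value = binomial_enrichment_p(1000, 12, 0.005)
print(p_value)
```

A small p-value here would suggest that the repertoire contains more epitope-specific TCRs than the baseline rate alone would explain.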

TCRex was developed at the University of Antwerp and was released for public use in 2018. The manuscript describing the webtool is available as a preprint on bioRxiv. TCRex can also be followed on Twitter: @TCRexTool.

MutaFrame

MutaFrame enables you to explore the likely effect of amino acid variants (mutations) on human proteins. It provides predictions of the ‘deleteriousness’ of mutations in human proteins, with interpretation of the underlying machine learning decisions, access to other resources (ExAC, dbSNP), and a connection to protein structure information from the Protein Data Bank (PDB). MutaFrame aims to visualise these data to make them understandable for non-expert users, and serves as a knowledge base by providing information for all possible mutations in all human proteins.

MutaFrame is developed at the Interuniversity Institute of Bioinformatics in Brussels (IB)². The manuscript can be accessed here.

DIDA

DIDA (DIgenic diseases DAtabase) is a database that provides detailed information on genes and associated genetic variants involved in digenic diseases, the simplest form of oligogenic inheritance. The database was developed at the Interuniversity Institute of Bioinformatics in Brussels (IB)² and currently includes 258 digenic combinations involved in 54 different digenic diseases. These combinations are composed of 448 distinct variants, which are distributed over 169 distinct genes. The web interface provides browsing, exploration and search functionalities, as well as documentation and help pages, general database statistics and references to the original publications from which the data have been collected. The possibility to submit novel digenic data to DIDA is also provided.

DIDA was published in the Nucleic Acids Research Database issue 2016 and was selected as a NAR 2016 Breakthrough paper. The manuscript can be accessed here, and an in-depth analysis of the different types of digenic diseases in DIDA, i.e. true digenic and composite, can be found here.

DIDA is part of the IBsquare Toolbox for Oligogenic Analysis, which is one of ELIXIR Belgium’s Node Services.

VariantDB

VariantDB is a web-based interactive annotation and filtering platform that automatically annotates variants with allele frequencies, functional impact, pathogenicity predictions and pathway information. VariantDB allows filtering by all annotations, under dominant, recessive or de novo inheritance models. VariantDB is therefore a user-friendly and powerful tool to help in the interpretation of NGS data.
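Filtering under an inheritance model can be illustrated on a parent-child trio. The sketch below is a simplified assumption, not VariantDB's implementation: genotypes are encoded as alternate-allele counts (0, 1 or 2), and the rule definitions and the names `matches_model` and `filter_variants` are hypothetical.

```python
def matches_model(child: int, mother: int, father: int, model: str) -> bool:
    """Check one variant against a simplified inheritance model.

    Genotypes are alternate-allele counts: 0, 1 or 2.
    """
    if model == "dominant":
        # Simplification: one alternate allele in the child is enough.
        return child >= 1
    if model == "recessive":
        # Child homozygous, both parents heterozygous carriers.
        return child == 2 and mother == 1 and father == 1
    if model == "de_novo":
        # Present in the child, absent from both parents.
        return child >= 1 and mother == 0 and father == 0
    raise ValueError(f"unknown inheritance model: {model}")


def filter_variants(variants, model):
    """Return the ids of variants compatible with the model."""
    return [v["id"] for v in variants
            if matches_model(v["child"], v["mother"], v["father"], model)]


# Hypothetical trio genotypes for three variants.
variants = [
    {"id": "var1", "child": 1, "mother": 0, "father": 0},
    {"id": "var2", "child": 2, "mother": 1, "father": 1},
    {"id": "var3", "child": 1, "mother": 1, "father": 0},
]

for model in ("dominant", "recessive", "de_novo"):
    print(model, filter_variants(variants, model))
```

In practice such a genotype filter would be combined with the other annotations mentioned above, for example an allele-frequency cut-off.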

VariantDB was developed at the University of Antwerp and published in Genome Medicine (manuscript).

MS DataConnect

The MS DataConnect consortium (UHasselt) connects partners involved in Multiple Sclerosis (MS) care, rehabilitation, and research, with partners involved in IT development, database management, data sharing procedures, statistics, machine learning and prediction modelling.

MS DataConnect focuses on developing (1) data collection procedures and tools to create data that are FAIR, (2) IT solutions to allow (temporary) pooling and linking of FAIR data sets, (3) statistical methods to define minimal requirements for data sets, and (4) new analytical methods for optimal mining of connected and pooled FAIR data sets.

The MS DataConnect consortium is the project coordinator of the international project Multiple Sclerosis Data Alliance (MSDA), which brings together registry holders, patients, medical societies, academia, industry, the European Medicines Agency (EMA), and Health Technology Assessment (HTA) bodies. One of the first goals of the MSDA is the development of the MSDA cohort explorer. This tool will enable searching aggregated data across different MS registries and cohorts after the data are mapped to a harmonized data template. The subject and variable selection tools in the MSDA cohort explorer will allow end-users to identify MS data cohorts suitable for their (research) question, and will facilitate the initiation of (new) collaborations with these MS cohorts.
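Mapping registry-specific records to a harmonized template and then reporting only aggregated counts could look roughly like the sketch below. The registry names, field names, and the functions `harmonize` and `aggregate_counts` are all hypothetical; the actual MSDA template and tooling are not described here.

```python
from collections import Counter

# Hypothetical mappings from each registry's local field names
# to the field names of a shared, harmonized template.
FIELD_MAPS = {
    "registry_a": {"sex": "sex", "ms_type": "ms_course"},
    "registry_b": {"gender": "sex", "course": "ms_course"},
}


def harmonize(registry: str, record: dict) -> dict:
    """Rename a record's fields to the harmonized template,
    dropping fields that have no mapping."""
    mapping = FIELD_MAPS[registry]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def aggregate_counts(records_by_registry: dict, variable: str) -> dict:
    """Count the values of one harmonized variable per registry.
    Only counts are returned, never individual records."""
    return {
        registry: Counter(harmonize(registry, r).get(variable)
                          for r in records)
        for registry, records in records_by_registry.items()
    }


# Toy data for two registries with different local field names.
records_by_registry = {
    "registry_a": [{"sex": "F", "ms_type": "RRMS"},
                   {"sex": "M", "ms_type": "PPMS"}],
    "registry_b": [{"gender": "F", "course": "RRMS"},
                   {"gender": "F", "course": "SPMS"}],
}

sex_counts = aggregate_counts(records_by_registry, "sex")
print(sex_counts)
```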

Follow MS DataConnect on Twitter: @MSDataConnect1.

WiNGS

WiNGS is being developed to succeed NGS-Logistics, the genomics data sharing platform established at KULeuven. The NGS-Logistics platform is currently deployed at 6 of the 8 genetic centres in Belgium and lets users identify clinically relevant mutations at other genetic centres. Its scope is similar to, but broader than, that of the more recent Beacon project of the Global Alliance for Genomics and Health (GA4GH) and ELIXIR Europe, which provides a framework for public web services for variant discovery against distributed genomic data collections. The ELIXIR Belgium Human Data consortium, led by KULeuven, has been participating in the ELIXIR Beacon project and maintains a Belgian Beacon node.
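The principle of variant discovery against distributed collections can be sketched with a toy in-memory example: each collection answers only whether it has seen a given variant. This is not the GA4GH Beacon API or the NGS-Logistics interface; the `Variant` type, the `beacon_query` function, and the centre names are assumptions made for illustration.

```python
from typing import Dict, NamedTuple, Set


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def beacon_query(collections: Dict[str, Set[Variant]],
                 variant: Variant) -> Dict[str, bool]:
    """Ask every collection whether it has observed the variant.

    Only a yes/no answer per collection is returned, so no
    individual-level data leaves a centre.
    """
    return {name: variant in variants
            for name, variants in collections.items()}


# Hypothetical collections held by two centres.
collections = {
    "centre_1": {Variant("1", 12345, "A", "G"),
                 Variant("7", 55555, "C", "T")},
    "centre_2": {Variant("7", 55555, "C", "T")},
}

answers = beacon_query(collections, Variant("1", 12345, "A", "G"))
print(answers)
```

A real deployment would send the query to each centre over the network and apply access control before answering.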

The WiNGS (Widely integrated NGS) platform brings significant scalability improvements to tackle the complexity of analysing Whole-Genome Sequencing (WGS) data. Because of the sensitivity of patient genomes and GDPR requirements, enhanced access control and privacy protection will be developed for this integrated platform. The first manuscript of the platform can be downloaded here.