About FusionGDB2 (Fusion gene annotation update aided by deep learning)
Fusion genes, which carry structural variant breakpoints within their gene bodies, are a prominent structural variant resource for studying genomic breakage together with its expression and potential pathogenic impact. A knowledgebase that systematically annotates the functions of fusion genes is critical for understanding the context of genomic breakage and for developing therapeutic strategies. FusionGDB is a unique functional annotation database of human fusion genes and is widely used in studies with diverse aims. In this study, we report FusionGDB 2.0, which substantially updates the contents with (i) up-to-date human fusion genes with breakpoint locations from the gene structure browser, (ii) a fusion gene breakage tendency score from the FusionAI deep learning model, based on the 20 kb genomic sequence around the breakpoint (BP) area, (iii) an investigation of the overlap between fusion breakpoints and 44 human genomic features across five categories of cellular roles (i.e., integration sites of 6 viruses, 13 types of repeats, 5 types of structural variants, 15 different chromatin state regions, and 5 gene expression regulatory regions), (iv) transcribed chimeric sequences, followed by (v) open reading frame (ORF) analysis with coding potential based on a deep learning approach using Ribo-seq read features, and (vi) a rigorous investigation of the retention of protein features of individual fusion partner genes at the protein level with FGviewer. Among ~126K fusion genes, about 16K kept their ORFs in-frame. These in-frame fusion genes are potentially transcribed into 83K fusion transcripts and translated into 43K fusion proteins. These fusion protein sequences are used to predict the 3D structures of fusion proteins in FusionPDB and to predict fusion breakpoint-specific neoantigens derived from fusion genes in FusionNeoAntigen.
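The overlap analysis in item (iii) can be pictured with a minimal sketch. It assumes genomic features are available as simple intervals with 1-based inclusive coordinates. The `Feature` tuple, the `overlapping_categories` function, and the toy coordinates are hypothetical and do not come from FusionGDB 2.0, whose actual pipeline is not described here.

```python
from collections import namedtuple

# Hypothetical record for one annotated genomic interval (1-based, inclusive).
Feature = namedtuple("Feature", ["chrom", "start", "end", "category"])

def overlapping_categories(chrom, position, features):
    """Return the sorted feature categories whose interval contains the breakpoint."""
    return sorted({
        f.category
        for f in features
        if f.chrom == chrom and f.start <= position <= f.end
    })

# Toy data only; these coordinates are invented for illustration.
toy_features = [
    Feature("chr1", 100, 200, "repeat"),
    Feature("chr1", 150, 300, "chromatin_state"),
    Feature("chr2", 100, 200, "virus_integration_site"),
]
hits = overlapping_categories("chr1", 180, toy_features)
```

A linear scan like this is enough for a handful of features. For genome-scale annotation, an interval tree or a sorted-interval search would be the usual choice.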
FusionGDB 2.0 provides eight categories of annotations: Fusion Gene Summary, Fusion Gene ORF analysis, Fusion Gene Genomic Features, Fusion Protein Features, Fusion Gene Sequence, Fusion Gene PPI analysis, Related Drugs, and Related Diseases.
* The background image of the banner at the top shows the distribution of the median feature importance scores for every 20 nucleotides across the 20 kb region of all known TCGA fusion genes.
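As a rough illustration of the summary described in the note above, the following sketch bins per-nucleotide importance scores into 20-nucleotide windows and takes the median of each window. The function name, the input layout (one equal-length list of scores per fusion gene), and the choice to pool all genes' scores within a window before taking the median are assumptions made for illustration; the page does not specify how the scores were aggregated.

```python
from statistics import median

def binned_median_scores(score_tracks, bin_size=20):
    """Median importance score per window, pooled across all tracks.

    score_tracks: list of equal-length lists, one per fusion gene, each
    holding one (hypothetical) importance score per nucleotide.
    """
    if not score_tracks:
        return []
    length = len(score_tracks[0])
    if any(len(track) != length for track in score_tracks):
        raise ValueError("all score tracks must have the same length")
    medians = []
    for start in range(0, length, bin_size):
        # Pool the scores of every gene that fall inside this window.
        pooled = [s for track in score_tracks for s in track[start:start + bin_size]]
        medians.append(median(pooled))
    return medians

# Toy example: 2 genes, 40 positions each, giving 2 windows of 20 nt.
toy_tracks = [[0.1] * 20 + [0.5] * 20, [0.3] * 20 + [0.7] * 20]
toy_medians = binned_median_scores(toy_tracks)
```

With a window of 20 nucleotides, a 20,000-position track yields 1,000 median values.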