AllergenAI: a deep learning model predicting allergenicity based on protein sequence
AllergenAI overview
Training and validation data

Training data
- one-hot encoded protein matrix (allergens, positive)
- protein information index in the one-hot matrix (allergens, positive)
- one-hot encoded protein matrix (non-allergens, negative)
- protein information index in the one-hot matrix (non-allergens, negative)

Cupin proteins in SDAP 2.0
- one-hot encoded protein matrix (cupin proteins in SDAP)
- protein information index in the one-hot matrix (cupin proteins in SDAP)

Non-allergen proteins in the cupin Pfam family
- one-hot encoded protein matrix (non-allergenic cupin)
- protein information index in the one-hot matrix (non-allergenic cupin)

Modeling with SDAP allergens (with protein sequence and protein 3D structure information)
- one-hot encoded protein matrix (allergens, positive)
- protein information index in the one-hot matrix (allergens, positive)
- one-hot encoded protein matrix (non-allergens, negative)
- protein information index in the one-hot matrix (non-allergens, negative)

Modeling with SDAP allergens (with protein sequence, but without protein 3D structure information)
- one-hot encoded protein matrix (allergens, positive)
- protein information index in the one-hot matrix (allergens, positive)
- one-hot encoded protein matrix (non-allergens, negative)
- protein information index in the one-hot matrix (non-allergens, negative)
Available models and code for AllergenAI
- AllergenAI model with full training data
- a model with SDAP allergen protein sequence information
- a model with SDAP allergen protein sequence and 3D structure information
- Pre-process: make the one-hot encoded protein matrix
    Command: python AllergenAI_preprocess.py input.fasta
    Example command: python AllergenAI_preprocess.py Cupin.fasta
    Example FASTA file: Cupin.fasta
- Predict the allergenicity of your protein by running the AllergenAI model
    Command: python Run_AllergenAI.py input.txt
    Example command: python Run_AllergenAI.py Cupin.txt
    Example concatenated one-hot encoded matrix of your proteins (made in the pre-processing step): Cupin.txt
    Output: probability of non-allergen, probability of allergen

Software and algorithms to train and run AllergenAI
- python3
- packages: tensorflow, keras 2.11, numpy and pandas

# install python
conda update conda
conda create -n allergenai python=3
conda activate allergenai
# install requirements
conda install numpy
conda install pandas
conda install tensorflow
pip install --upgrade tensorflow
conda install keras=2.11
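To illustrate what the pre-processing step produces, here is a minimal sketch of one-hot encoding FASTA sequences into a single concatenated matrix with a per-protein index. It is not the code in AllergenAI_preprocess.py. The 20-letter alphabet, its column order, the all-zero handling of unknown residues, and the function names (`parse_fasta`, `one_hot`, `encode_records`) are all assumptions made for this example. The sketch uses plain Python lists so that it runs without extra packages.

```python
# Assumed alphabet: the 20 standard amino acids. The alphabet and column
# order actually used by AllergenAI are not stated in the source.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def one_hot(sequence):
    """Encode a sequence as one 20-column row per residue.

    Residues outside the assumed alphabet (e.g. X) become all-zero rows.
    """
    rows = []
    for aa in sequence:
        row = [0] * len(AMINO_ACIDS)
        col = AA_INDEX.get(aa)
        if col is not None:
            row[col] = 1
        rows.append(row)
    return rows


def encode_records(records):
    """Concatenate per-protein matrices and record each protein's row range."""
    matrix, index, start = [], [], 0
    for header, seq in records:
        matrix.extend(one_hot(seq))
        # (header, first row, one past last row) in the concatenated matrix
        index.append((header, start, start + len(seq)))
        start += len(seq)
    return matrix, index


# Hypothetical two-protein input, for illustration only.
example = ">protA hypothetical\nACDX\n>protB hypothetical\nWY\n"
matrix, index = encode_records(parse_fasta(example))
print(len(matrix), "rows x", len(AMINO_ACIDS), "columns")
for header, first, last in index:
    print(header, "-> rows", first, "to", last - 1)
```

The index plays the role of the "protein information index" listed with each data set: it records which rows of the concatenated matrix belong to which protein. The actual file layout used by AllergenAI may differ.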
About us

Department of Bioinformatics and Systems Medicine
McWilliams School of Biomedical Informatics
The University of Texas Health Science Center at Houston
7000 Fannin Street, Houston, TX 77030

Sealy Center for Structural Biology and Molecular Biophysics
University of Texas Medical Branch
301 University Blvd, Galveston, TX 77555
Related citations
- Ivanciuc O, Schein H C, Braun W. SDAP: database and computational tools for allergenic proteins. Nucleic Acids Res. 2003 Jan 1;31(1):359-62. doi: 10.1093/nar/gkg010.
- Negi S S, Schein H C, Braun W. The updated Structural Database of Allergenic Proteins (SDAP 2.0) provides 3D models for allergens and incorporated bioinformatics tools. J Allergy Clin Immunol Glob. 2023 Aug 11;2(4):100162. doi: 10.1016/j.jacig.2023.100162.