MiTPeptideDB: a proteogenomic resource for the discovery of novel peptides

Authors: Elizabeth Guruceaga Alba Garin Muga Victor Segura

Date: 01.01.2020

Bioinformatics


Abstract

Motivation: The principal lines of research in MS/MS based Proteomics have been directed toward the molecular characterization of the proteins including their biological functions and their implications in human diseases. Recent advances in this field have also allowed the first attempts to apply these techniques to the clinical practice. Nowadays, themain progress in Computational Proteomics is based on the integration of genomic, transcriptomic and proteomic experimental data, what is known as
Proteogenomics. This methodology is being especially useful for the discovery of new clinical biomarkers, small open reading frames and microproteins, although their validation is still challenging. Results: We detected novel peptides following a proteogenomic workflow based on the MiTranscriptome human assembly and shotgun experiments. The annotation approach generated three custom databases with the corresponding peptides of known and novel transcripts of both protein coding genes and non-coding genes. In addition, we used a peptide detectability filter to improve the computational performance of the proteomic searches, the statistical analysis and the robustness of the results. These innovative additional filters are specially relevant when noisy next generation sequencing experiments are used to generate the databases. This resource, MiTPeptideDB, was validated using 43 cell lines for which RNA-Seq experiments and shotgun experiments were available.

BIB_text

@Article {
title = {MiTPeptideDB: a proteogenomic resource for the discovery of novel peptides},
journal = {Bioinformatics},
pages = {205-211},
volume = {36},
keywds = {
proteogenomics, novel peptides, proteomics, genomics, mitranscriptome, resource
}
abstract = {

Motivation: The principal lines of research in MS/MS based Proteomics have been directed toward the molecular characterization of the proteins including their biological functions and their implications in human diseases. Recent advances in this field have also allowed the first attempts to apply these techniques to the clinical practice. Nowadays, themain progress in Computational Proteomics is based on the integration of genomic, transcriptomic and proteomic experimental data, what is known as
Proteogenomics. This methodology is being especially useful for the discovery of new clinical biomarkers, small open reading frames and microproteins, although their validation is still challenging. Results: We detected novel peptides following a proteogenomic workflow based on the MiTranscriptome human assembly and shotgun experiments. The annotation approach generated three custom databases with the corresponding peptides of known and novel transcripts of both protein coding genes and non-coding genes. In addition, we used a peptide detectability filter to improve the computational performance of the proteomic searches, the statistical analysis and the robustness of the results. These innovative additional filters are specially relevant when noisy next generation sequencing experiments are used to generate the databases. This resource, MiTPeptideDB, was validated using 43 cell lines for which RNA-Seq experiments and shotgun experiments were available.


}
doi = {10.1093/bioinformatics/btz530},
date = {2020-01-01},
}
Vicomtech

Parque Científico y Tecnológico de Gipuzkoa,
Paseo Mikeletegi 57,
20009 Donostia / San Sebastián (Spain)

+(34) 943 309 230

close overlay