Genome sequencing projects have revolutionized biology, producing vast catalogs of predicted genes across model and non-model organisms. However, these predictions often require validation, as computational gene models alone cannot always capture alternative splicing, novel open reading frames, or previously unannotated coding regions. This is where proteogenomics plays a transformative role. By integrating mass spectrometry (MS)-based proteomics with genomic and transcriptomic data, proteogenomics provides direct evidence of protein expression, refining and improving genome annotation.
The significance of proteogenomics extends beyond improving genome annotation:
Refinement of gene models:
Provides peptide-level validation of predicted coding regions, reducing false positives in genome annotations.
Discovery of novel proteins:
Detects uncharacterized proteoforms and alternative transcripts, expanding the known protein repertoire.
Cross-omics validation:
Bridges transcriptomics with functional protein evidence, ensuring predicted transcripts are biologically relevant.
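The idea behind peptide-level validation and novel protein discovery can be illustrated with a minimal Python sketch. It assumes a simplified workflow: MS-identified peptides are first searched against the annotated protein sequences, and any peptide not found there is searched against a six-frame translation of the genome. The function names (`six_frame_translate`, `classify_peptides`) and the three labels are hypothetical, chosen for illustration only. Real pipelines also handle enzymatic digestion rules, isoleucine/leucine ambiguity, splice junctions, and false discovery rate control, none of which is modeled here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translate(dna):
    """Return the six conceptual translations (three per strand)."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:]) for strand in (dna, reverse)
            for offset in range(3)]


def classify_peptides(peptides, annotated_proteins, genome):
    """Label each peptide by the kind of evidence it provides.

    'annotated': exact substring of a predicted protein (supports the gene model)
    'novel':     found only in the six-frame translation (candidate unannotated
                 coding region)
    'unmatched': found in neither search space
    """
    frames = six_frame_translate(genome)
    labels = {}
    for pep in peptides:
        if any(pep in prot for prot in annotated_proteins):
            labels[pep] = "annotated"
        elif any(pep in frame for frame in frames):
            labels[pep] = "novel"
        else:
            labels[pep] = "unmatched"
    return labels


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    genome = "CCATGTCTCCTGAACCTACTATTGATGAAAAATAA"
    annotated = ["MKTAYIAK"]
    print(classify_peptides(["TAYIAK", "PEPTIDEK"], annotated, genome))
```

In this toy setup, a peptide matching an annotated protein supports the existing gene model, while a peptide that maps only to the six-frame translation flags a candidate coding region that the annotation missed.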
In humans, proteogenomics enhances the understanding of cancer, rare diseases, and microbial pathogenesis by uncovering novel therapeutic targets and biomarkers.