Annotating gene sets by mining large literature collections with protein networks

Sheng Wang, Jianzhu Ma, Michael Ku Yu, Fan Zheng, Edward W. Huang, Jiawei Han, Jian Peng, Trey Ideker

Research output: Contribution to journal > Conference article

Abstract

Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.
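The core idea in the abstract, ranking literature phrases by their network distance to a gene set inside one heterogeneous network, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the gene names, phrases, and edges are toy data, and the distance used (mean breadth-first hop count) is a simple stand-in, since the abstract does not state which distance measure the system uses. The multiscale and ontology-building steps are not covered.

```python
from collections import deque

# Toy data: all gene names, phrases, and edges are hypothetical.
PROTEIN_INTERACTIONS = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
GENE_PHRASE_LINKS = [
    ("G1", "dna repair"),
    ("G2", "dna repair"),
    ("G4", "cell adhesion"),
]


def build_network(protein_edges, gene_phrase_edges):
    """Merge both edge types into one undirected adjacency map."""
    adjacency = {}
    for a, b in list(protein_edges) + list(gene_phrase_edges):
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def hop_distances(adjacency, source):
    """Breadth-first search: edge count from source to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def rank_phrases(adjacency, phrases, gene_set):
    """Sort phrases by mean hop distance to the gene set, closest first."""
    scored = []
    for phrase in phrases:
        d = hop_distances(adjacency, phrase)
        # Unreachable genes count as infinitely far away.
        mean = sum(d.get(g, float("inf")) for g in gene_set) / len(gene_set)
        scored.append((mean, phrase))
    return [(phrase, mean) for mean, phrase in sorted(scored)]


network = build_network(PROTEIN_INTERACTIONS, GENE_PHRASE_LINKS)
phrases = sorted({p for _, p in GENE_PHRASE_LINKS})
ranking = rank_phrases(network, phrases, gene_set=["G1", "G2", "G3"])
print(ranking)
```

In this toy network the phrase "dna repair" ranks ahead of "cell adhesion" for the query set G1, G2, G3. G3 has no direct literature link, yet it still contributes to the score through its protein interaction with G2, which is the benefit of combining the two edge types in a single network.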

Original language: English (US)
Pages (from-to): 602-613
Number of pages: 12
Journal: Pacific Symposium on Biocomputing
Volume: 0
Issue number: 212669
DOI: 10.1142/9789813235533_0055
State: Published - Jan 1 2018
Event: 23rd Pacific Symposium on Biocomputing, PSB 2018 - Kohala Coast, United States
Duration: Jan 3 2018 to Jan 7 2018

Fingerprint

Genes
Proteins
Biological Ontologies
Natural Language Processing
Literature
Molecular Sequence Annotation
Data Mining
Natural language processing systems
Transcriptome
Heterogeneous networks
Genome
Ontology
Neoplasms

Keywords

  • Functional annotations
  • Gene interactions
  • Knowledge network
  • Text mining

ASJC Scopus subject areas

  • Medicine(all)

Cite this

Annotating gene sets by mining large literature collections with protein networks. / Wang, Sheng; Ma, Jianzhu; Yu, Michael Ku; Zheng, Fan; Huang, Edward W.; Han, Jiawei; Peng, Jian; Ideker, Trey.

In: Pacific Symposium on Biocomputing, Vol. 0, No. 212669, 01.01.2018, p. 602-613.

Wang, Sheng ; Ma, Jianzhu ; Yu, Michael Ku ; Zheng, Fan ; Huang, Edward W. ; Han, Jiawei ; Peng, Jian ; Ideker, Trey. / Annotating gene sets by mining large literature collections with protein networks. In: Pacific Symposium on Biocomputing. 2018 ; Vol. 0, No. 212669. pp. 602-613.
@article{59752c5731ea4b5bb15a112297580af2,
title = "Annotating gene sets by mining large literature collections with protein networks",
abstract = "Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.",
keywords = "Functional annotations, Gene interactions, Knowledge network, Text mining",
author = "Sheng Wang and Jianzhu Ma and Yu, {Michael Ku} and Fan Zheng and Huang, {Edward W.} and Jiawei Han and Jian Peng and Trey Ideker",
year = "2018",
month = "1",
day = "1",
doi = "10.1142/9789813235533_0055",
language = "English (US)",
volume = "0",
pages = "602--613",
journal = "Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing",
issn = "2335-6936",
number = "212669",

}

TY - JOUR

T1 - Annotating gene sets by mining large literature collections with protein networks

AU - Wang, Sheng

AU - Ma, Jianzhu

AU - Yu, Michael Ku

AU - Zheng, Fan

AU - Huang, Edward W.

AU - Han, Jiawei

AU - Peng, Jian

AU - Ideker, Trey

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.

KW - Functional annotations

KW - Gene interactions

KW - Knowledge network

KW - Text mining

UR - http://www.scopus.com/inward/record.url?scp=85048492188&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85048492188&partnerID=8YFLogxK

U2 - 10.1142/9789813235533_0055

DO - 10.1142/9789813235533_0055

M3 - Conference article

C2 - 29218918

AN - SCOPUS:85048492188

VL - 0

SP - 602

EP - 613

JO - Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

JF - Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

SN - 2335-6936

IS - 212669

ER -