Assembly and annotation of a draft genome sequence for Glycine latifolia, a perennial wild relative of soybean

Qiong Liu, Sungyul Chang, Glen L. Hartman, Leslie L. Domier

Research output: Contribution to journal › Article

Abstract

Glycine latifolia (Benth.) Newell & Hymowitz (2n = 40), one of the 27 wild perennial relatives of soybean, possesses genetic diversity and agronomically favorable traits that are lacking in soybean. Here, we report the 939-Mb draft genome assembly of G. latifolia (PI 559298), generated exclusively from linked reads sequenced from a single Chromium library. We organized the scaffolds into 20 chromosome-scale pseudomolecules using two genetic maps and the Glycine max (L.) Merr. genome sequence. High copy numbers of putative 91-bp centromere-specific tandem repeats were observed in consecutive blocks within predicted pericentromeric regions on several pseudomolecules. No 92-bp putative centromeric repeats, which are abundant in G. max, were detected in G. latifolia or Glycine tomentella. Annotation of the assembled genome and subsequent filtering yielded a high-confidence gene set of 54 475 protein-coding loci. In a comparative analysis with five legume species, genes related to defense responses were significantly overrepresented in Glycine-specific orthologous gene families. A total of 304 putative nucleotide-binding site (NBS)-leucine-rich repeat (LRR) genes were identified in this genome assembly. In contrast to other legume species, we observed a scarcity of TIR-NBS-LRR genes in G. latifolia. The G. latifolia genome was also predicted to contain genes encoding 367 LRR receptor-like kinases, a family of proteins involved in basal defense responses and responses to abiotic stress. The genome sequence and annotation of G. latifolia provide a valuable source of alternative alleles and novel genes to facilitate soybean improvement. This study also highlights the efficacy and cost-effectiveness of Chromium linked reads for de novo assembly of diploid plant genomes.

Original language: English (US)
Pages (from-to): 71-85
Number of pages: 15
Journal: Plant Journal
Volume: 95
Issue number: 1
DOI: 10.1111/tpj.13931
State: Published - Jul 2018

Keywords

  • 10X Genomics
  • Glycine latifolia
  • disease resistance
  • genome sequence
  • soybean
  • wild perennial relative

ASJC Scopus subject areas

  • Genetics
  • Plant Science
  • Cell Biology

Cite this

Assembly and annotation of a draft genome sequence for Glycine latifolia, a perennial wild relative of soybean. / Liu, Qiong; Chang, Sungyul; Hartman, Glen L.; Domier, Leslie L.

In: Plant Journal, Vol. 95, No. 1, 07.2018, p. 71-85.

@article{285254331f45468a928622d5623417e6,
title = "Assembly and annotation of a draft genome sequence for Glycine latifolia, a perennial wild relative of soybean",
keywords = "10X Genomics, Glycine latifolia, disease resistance, genome sequence, soybean, wild perennial relative",
author = "Qiong Liu and Sungyul Chang and Hartman, {Glen L.} and Domier, {Leslie L.}",
year = "2018",
month = "7",
doi = "10.1111/tpj.13931",
language = "English (US)",
volume = "95",
pages = "71--85",
journal = "Plant Journal",
issn = "0960-7412",
publisher = "Wiley-Blackwell",
number = "1",

}

TY  - JOUR
T1  - Assembly and annotation of a draft genome sequence for Glycine latifolia, a perennial wild relative of soybean
AU  - Liu, Qiong
AU  - Chang, Sungyul
AU  - Hartman, Glen L.
AU  - Domier, Leslie L.
PY  - 2018/7
Y1  - 2018/7
KW  - 10X Genomics
KW  - Glycine latifolia
KW  - disease resistance
KW  - genome sequence
KW  - soybean
KW  - wild perennial relative
UR  - http://www.scopus.com/inward/record.url?scp=85047537570&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85047537570&partnerID=8YFLogxK
U2  - 10.1111/tpj.13931
DO  - 10.1111/tpj.13931
M3  - Article
C2  - 29671916
AN  - SCOPUS:85047537570
VL  - 95
SP  - 71
EP  - 85
JO  - Plant Journal
JF  - Plant Journal
SN  - 0960-7412
IS  - 1
ER  - 