A method and system for augmenting a corpus with documents on concepts that the corpus does not sufficiently cover is provided. The augmentation system generates a corpus concept graph from the documents of the corpus. A corpus concept graph represents the concepts of the documents as nodes and relationships between concepts as links between nodes. To generate the graph, the augmentation system identifies the concepts that are related within each document of the corpus and adds nodes and links to the corpus concept graph for those related concepts. The augmentation system then analyzes the corpus concept graph to determine whether the concepts of the corpus documents are sufficiently related. If a pair of concepts is not sufficiently related, the augmentation system attempts to identify documents not already in the corpus that relate to that pair of concepts.
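The abstract does not say how concepts are extracted, how relatedness is measured, or what threshold counts as "sufficient". The Python sketch below fills those gaps with its own assumptions, so treat it as one possible illustration rather than the patented method. It assumes that each document is already reduced to a set of concepts, that two concepts are related when they co-occur in the same document, that a link's weight is the number of documents in which the pair co-occurs, and that a linked pair is insufficiently related when its weight falls below a hypothetical `min_support` threshold. The function names and the `candidates` pool of outside documents are also hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_concept_graph(corpus):
    """Build a corpus concept graph.

    corpus: dict mapping doc_id -> set of concepts in that document
    (concept extraction is assumed to happen elsewhere).
    Returns (nodes, links): nodes is the set of all concepts, and links
    maps each sorted concept pair to the number of documents in which
    both concepts appear.
    """
    nodes = set()
    links = Counter()
    for concepts in corpus.values():
        nodes.update(concepts)
        # Assumption: concepts that co-occur in one document are "related".
        for pair in combinations(sorted(concepts), 2):
            links[pair] += 1
    return nodes, links


def insufficient_pairs(links, min_support):
    """Return linked concept pairs supported by fewer than min_support documents.

    min_support is a hypothetical sufficiency threshold. Pairs that never
    co-occur have no link and are not considered here.
    """
    return sorted(pair for pair, count in links.items() if count < min_support)


def find_augmenting_documents(pairs, candidates, corpus):
    """For each weak pair, list outside documents that mention both concepts.

    candidates: dict mapping doc_id -> set of concepts for documents that
    might be added (a hypothetical external pool).
    """
    selected = {}
    for a, b in pairs:
        for doc_id, concepts in candidates.items():
            # Skip documents already in the corpus.
            if doc_id not in corpus and a in concepts and b in concepts:
                selected.setdefault((a, b), []).append(doc_id)
    return selected


if __name__ == "__main__":
    corpus = {
        "d1": {"graph", "corpus", "concept"},
        "d2": {"graph", "concept"},
        "d3": {"corpus", "search"},
    }
    candidates = {
        "web1": {"corpus", "search", "ranking"},
        "web2": {"graph", "corpus"},
        "d1": {"graph", "corpus", "concept"},  # already in the corpus
    }
    nodes, links = build_concept_graph(corpus)
    weak = insufficient_pairs(links, min_support=2)
    print(weak)
    # [('concept', 'corpus'), ('corpus', 'graph'), ('corpus', 'search')]
    print(find_augmenting_documents(weak, candidates, corpus))
    # {('corpus', 'graph'): ['web2'], ('corpus', 'search'): ['web1']}
```

In this toy run only the pair `("concept", "graph")` co-occurs in two documents, so the other three links are flagged as weak, and the sketch proposes `web2` and `web1` as documents that could shore up two of them. Storing pairs in sorted order keeps each undirected link under a single key.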
Original language: English (US)
U.S. patent number: 7555472
State: Published - Jun 30 2009