A two-dimensional Topic-Aspect Model for discovering multi-faceted topics

Michael Paul, Roxana Girju

Research output: Contribution to conference (Paper)

Abstract

This paper presents the Topic-Aspect Model (TAM), a Bayesian mixture model that jointly discovers topics and aspects. We broadly define an aspect of a document as a characteristic that spans the document, such as an underlying theme or perspective. Unlike previous models, which cluster words by either topic or aspect, our model can generate token assignments in both of these dimensions, rather than assuming words come from only one of two orthogonal models. We present two applications of the model. First, we model a corpus of computational linguistics abstracts, and find that the scientific topics identified in the data tend to include both a computational aspect and a linguistic aspect. For example, the computational aspect of GRAMMAR emphasizes parsing, whereas the linguistic aspect focuses on formal languages. Second, we show that the model can capture different viewpoints on a variety of topics in a corpus of editorials about the Israeli-Palestinian conflict. We show both qualitative and quantitative improvements in TAM over two other state-of-the-art topic models.
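The idea of assigning each token along two dimensions can be illustrated with a toy sampler. This is only a minimal sketch under simplifying assumptions, not the authors' model or inference procedure: the abstract does not specify the generative details, and the full TAM may involve additional variables. The function name `sample_document`, the second topic, and all probabilities and vocabularies below are hypothetical; only the GRAMMAR example (parsing versus formal languages) comes from the abstract.

```python
import random

def sample_document(n_tokens, topic_probs, aspect_probs, word_dists, rng):
    """Draw tokens that each carry BOTH a topic and an aspect assignment.

    Simplified illustration only: every word is drawn from a distribution
    indexed by the (topic, aspect) pair. All inputs are assumed toy values.
    """
    topics = list(range(len(topic_probs)))
    aspects = list(range(len(aspect_probs)))
    tokens = []
    for _ in range(n_tokens):
        z = rng.choices(topics, weights=topic_probs)[0]    # topic assignment
        y = rng.choices(aspects, weights=aspect_probs)[0]  # aspect assignment
        vocab, weights = zip(*word_dists[(z, y)].items())
        w = rng.choices(vocab, weights=weights)[0]         # word given (z, y)
        tokens.append((w, z, y))
    return tokens

# Hypothetical word distributions keyed by (topic, aspect).
# Topic 0 ~ GRAMMAR, topic 1 ~ an invented second topic.
# Aspect 0 ~ computational, aspect 1 ~ linguistic.
WORD_DISTS = {
    (0, 0): {"parsing": 0.7, "algorithm": 0.3},
    (0, 1): {"formal": 0.6, "languages": 0.4},
    (1, 0): {"decoder": 0.5, "alignment": 0.5},
    (1, 1): {"meaning": 0.5, "equivalence": 0.5},
}

doc = sample_document(8, [0.8, 0.2], [0.5, 0.5], WORD_DISTS, random.Random(0))
for word, topic, aspect in doc:
    print(f"{word:12s} topic={topic} aspect={aspect}")
```

Each sampled token is a triple of word, topic, and aspect, which is the sense in which an assignment is two-dimensional here; a one-dimensional topic model would keep only the topic index.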

Original language: English (US)
State: Published - Jul 2010
Event: 24th AAAI Conference on Artificial Intelligence and the 22nd Innovative Applications of Artificial Intelligence Conference, AAAI-10 / IAAI-10 - Atlanta, GA, United States
Duration: Jul 11 2010 - Jul 15 2010

Fingerprint

Linguistics
Computational linguistics
Formal languages

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

Cite this

Paul, M., & Girju, R. (2010). A two-dimensional Topic-Aspect Model for discovering multi-faceted topics. Paper presented at 24th AAAI Conference on Artificial Intelligence and the 22nd Innovative Applications of Artificial Intelligence Conference, AAAI-10 / IAAI-10, Atlanta, GA, United States.
