SOFSAT: Towards a Setlike Operator Based Framework for Semantic Analysis of Text

Shubhra Kanti Karmaker Santu, Chase Geigle, Duncan Ferguson, William Cope, Mary Kalantzis, Duane Searsmith, Chengxiang Zhai

Research output: Contribution to journal (Article)

Abstract

As data reported by humans about our world, text data play a very important role in all data mining applications, yet how to develop a general text analysis system to support all text mining applications is a difficult challenge. In this position paper, we introduce SOFSAT, a new framework that can support set-like operators for semantic analysis of natural text data with variable text representations. It includes three basic set-like operators (TextIntersect, TextUnion, and TextDifference) that are analogous to the corresponding set operators intersection, union, and difference, respectively, which can be applied to any representation of text data, and different representations can be combined via transformation functions that map text to and from any representation. Just as the set operators can be flexibly combined iteratively to construct arbitrary subsets or supersets based on some given sets, we show that the corresponding text analysis operators can also be combined flexibly to support a wide range of analysis tasks that may require different workflows, thus enabling an application developer to "program" a text mining application by using SOFSAT as an application programming language for text analysis. We discuss instantiations and implementation strategies of the framework with some specific examples, present ideas about how the framework can be implemented by exploiting/extending existing techniques, and provide a roadmap for future research in this new direction.
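To make the set analogy concrete, here is a minimal sketch that assumes a bag-of-words representation and uses Python's `collections.Counter` multiset operations as stand-ins for the three operators. The names `to_bag`, `from_bag`, `text_intersect`, `text_union`, and `text_difference` are hypothetical and chosen for illustration; this is not the paper's implementation, which targets semantic rather than purely lexical overlap.

```python
import re
from collections import Counter


def to_bag(text: str) -> Counter:
    """Hypothetical transformation function: raw text -> bag of lowercase words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def from_bag(bag: Counter) -> str:
    """Hypothetical inverse transformation: bag of words -> sorted word string."""
    return " ".join(sorted(bag.elements()))


def text_intersect(a: Counter, b: Counter) -> Counter:
    """Words shared by both texts (minimum of the two counts)."""
    return a & b


def text_union(a: Counter, b: Counter) -> Counter:
    """Words appearing in either text (maximum of the two counts)."""
    return a | b


def text_difference(a: Counter, b: Counter) -> Counter:
    """Words in a that are not covered by b (counts subtracted, floored at zero)."""
    return a - b


doc_a = to_bag("The camera has great battery life and fast autofocus")
doc_b = to_bag("The camera has poor battery life but a great screen")

# Operators compose like set operators: (A union B) minus (A intersect B)
# yields the words that appear in exactly one of the two texts.
only_in_one = text_difference(text_union(doc_a, doc_b),
                              text_intersect(doc_a, doc_b))

print(from_bag(text_difference(doc_a, doc_b)))
print(from_bag(only_in_one))
```

Because every operator takes and returns the same representation, the results can be chained freely, which is the property the abstract relies on when it describes "programming" an analysis workflow. A different representation (for example, topic distributions or embeddings) would need its own operator definitions and transformation functions.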
Original language: English (US)
Pages (from-to): 21–30
Journal: SIGKDD Explorations Newsletter
Volume: 20
Issue number: 2
State: Published - Dec 2018

Keywords

  • Semantic Analysis
  • Semantic Operator for Text
  • Text Mining
  • Intelligent Text Analysis
