As data reported by humans about our world, text data play a very important role in all data mining applications, yet how to develop a general text analysis system to support all text mining applications is a difficult challenge. In this position paper, we introduce SOFSAT, a new framework that can support set-like operators for semantic analysis of natural text data with variable text representations. It includes three basic set-like operators (TextIntersect, TextUnion, and TextDifference) that are analogous to the corresponding set operators intersection, union, and difference, respectively, which can be applied to any representation of text data, and different representations can be combined via transformation functions that map text to and from any representation. Just as the set operators can be flexibly combined iteratively to construct arbitrary subsets or supersets based on some given sets, we show that the corresponding text analysis operators can also be combined flexibly to support a wide range of analysis tasks that may require different workflows, thus enabling an application developer to "program" a text mining application by using SOFSAT as an application programming language for text analysis. We discuss instantiations and implementation strategies of the framework with some specific examples, present ideas about how the framework can be implemented by exploiting/extending existing techniques, and provide a roadmap for future research in this new direction.
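To make the set analogy concrete, the following is a minimal sketch of how the three operators might compose, assuming a plain bag-of-words representation. That representation, the function names (`to_bow`, `from_bow`, `text_intersect`, `text_union`, `text_difference`), and the sample sentences are illustrative assumptions rather than anything specified in the paper. The operators here work on word counts only, so they are a lexical stand-in for the semantic operators the framework proposes.

```python
import re
from collections import Counter


def to_bow(text: str) -> Counter:
    """Hypothetical transformation function: raw text -> bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def from_bow(bow: Counter) -> str:
    """Hypothetical inverse transformation: bag of words -> sorted, space-joined text."""
    return " ".join(sorted(bow.elements()))


def text_intersect(a: Counter, b: Counter) -> Counter:
    """Words common to both texts (minimum of the two counts)."""
    return a & b


def text_union(a: Counter, b: Counter) -> Counter:
    """Words found in either text (maximum of the two counts)."""
    return a | b


def text_difference(a: Counter, b: Counter) -> Counter:
    """Words in `a` left over after subtracting the counts in `b`."""
    return a - b


# Made-up sample texts, used only to exercise the operators.
review_a = to_bow("The battery life is great but the screen is dim")
review_b = to_bow("Great battery life and a bright screen")
review_c = to_bow("The screen is dim")

# Operators compose like set operators: (A intersect B) minus C.
shared = text_intersect(review_a, review_b)
shared_not_c = text_difference(shared, review_c)

print(from_bow(shared))        # battery great life screen
print(from_bow(shared_not_c))  # battery great life
```

Because every operator takes and returns the same representation, the result of one call can be passed directly to the next, which is the property that lets operators be chained into longer workflows. A semantic instantiation would swap the word-count representation for a richer one while keeping the same three-operator interface.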
Original language: English (US)
Pages (from-to): 21–30
Journal: SIGKDD Explorations Newsletter
Issue number: 2
State: Published - Dec 2018


  • Semantic Analysis
  • Semantic Operator for Text
  • Text Mining
  • Intelligent Text Analysis
