TY - JOUR
T1 - SOFSAT: Towards a Setlike Operator Based Framework for Semantic Analysis of Text
AU - Karmaker Santu, Shubhra Kanti
AU - Geigle, Chase
AU - Ferguson, Duncan
AU - Cope, William
AU - Kalantzis, Mary
AU - Searsmith, Duane
AU - Zhai, Chengxiang
PY - 2018/12
Y1 - 2018/12
N2 - As data reported by humans about our world, text data play a very important role in all data mining applications, yet how to develop a general text analysis system to sup- port all text mining applications is a difficult challenge. In this position paper, we introduce SOFSAT, a new frame- work that can support set-like operators for semantic analy- sis of natural text data with variable text representations. It includes three basic set-like operators|TextIntersect, Tex- tUnion, and TextDi erence|that are analogous to the cor- responding set operators intersection, union, and di erence, respectively, which can be applied to any representation of text data, and di erent representations can be combined via transformation functions that map text to and from any rep- resentation. Just as the set operators can be exibly com- bined iteratively to construct arbitrary subsets or supersets based on some given sets, we show that the correspond- ing text analysis operators can also be combined exibly to support a wide range of analysis tasks that may require di erent work ows, thus enabling an application developer to \program" a text mining application by using SOFSAT as an application programming language for text analysis. We discuss instantiations and implementation strategies of the framework with some speci c examples, present ideas about how the framework can be implemented by exploit- ing/extending existing techniques, and provide a roadmap for future research in this new direction.
AB - As data reported by humans about our world, text data play a very important role in all data mining applications, yet how to develop a general text analysis system to sup- port all text mining applications is a difficult challenge. In this position paper, we introduce SOFSAT, a new frame- work that can support set-like operators for semantic analy- sis of natural text data with variable text representations. It includes three basic set-like operators|TextIntersect, Tex- tUnion, and TextDi erence|that are analogous to the cor- responding set operators intersection, union, and di erence, respectively, which can be applied to any representation of text data, and di erent representations can be combined via transformation functions that map text to and from any rep- resentation. Just as the set operators can be exibly com- bined iteratively to construct arbitrary subsets or supersets based on some given sets, we show that the correspond- ing text analysis operators can also be combined exibly to support a wide range of analysis tasks that may require di erent work ows, thus enabling an application developer to \program" a text mining application by using SOFSAT as an application programming language for text analysis. We discuss instantiations and implementation strategies of the framework with some speci c examples, present ideas about how the framework can be implemented by exploit- ing/extending existing techniques, and provide a roadmap for future research in this new direction.
KW - Semantic Analysis
KW - Semantic Operator for Text
KW - Text Mining
KW - Intelligent Text Analysis
U2 - 10.1145/3299986.3299990
DO - 10.1145/3299986.3299990
M3 - Article
SN - 1931-0145
VL - 20
SP - 21
EP - 30
JO - SIGKDD Explorations Newsletter
JF - SIGKDD Explorations Newsletter
IS - 2
ER -