Abstract
Technologies for handling massive structured or semi-structured data have been researched extensively in database communities. However, real-world data are largely in the form of unstructured text, which poses a great challenge to their management and analysis as well as to their integration with semi-structured databases. Recent developments in deep learning methods and large pre-trained language models (PLMs) have revolutionized text mining and processing and shed new light on structuring massive text data and building a framework for integrated (i.e., structured and unstructured) data management and analysis. In this tutorial, we will focus on recently developed text mining approaches empowered by PLMs that can work without relying on heavy human annotation. We will present an organized picture of how a set of weakly supervised methods explore the power of PLMs to structure text data, with the following outline: (1) an introduction to pre-trained language models that serve as new tools for our tasks, (2) mining topic structures: unsupervised and seed-guided methods for topic discovery from massive text corpora, (3) mining document structures: weakly supervised methods for text classification, (4) mining entity structures: distantly supervised and weakly supervised methods for phrase mining, named entity recognition, taxonomy construction, and structured knowledge graph construction, and (5) towards an integrated information processing paradigm.
Original language | English (US) |
---|---|
Pages (from-to) | 851-854 |
Number of pages | 4 |
Journal | Advances in Database Technology - EDBT |
Volume | 26 |
Issue number | 3 |
State | Published - Mar 20 2023 |
Event | 26th International Conference on Extending Database Technology, EDBT 2023 - Ioannina, Greece; Duration: Mar 28 2023 → Mar 31 2023 |
ASJC Scopus subject areas
- Information Systems
- Software
- Computer Science Applications