Information Extraction from Social Media: A Hands-On Tutorial on Tasks, Data, and Open Source Tools

Shubhanshu Mishra, Rezvaneh Rezapour, Jana Diesner

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Information extraction (IE) is a common sub-area of natural language processing that focuses on identifying structured data from unstructured data. The Information Retrieval (IR) community relies on accurate, high-performance IE to retrieve high-quality results from massive datasets. One example of IE is identifying named entities in a text, e.g., “Barack Obama served as the president of the USA”. Here, Barack Obama and USA are named entities of type PERSON and LOCATION, respectively. Another example is identifying the sentiment expressed in a text, e.g., “This movie was awesome”. Here, the sentiment expressed is positive. A third example is identifying various linguistic aspects of a text, e.g., part-of-speech tags, noun phrases, and dependency parses, which can serve as features for additional IE tasks. This tutorial introduces participants to a) the usage of Python-based, open-source tools that support IE from social media data (mainly Twitter), and b) best practices for ensuring the reproducibility of research. Participants will learn and practice various semantic and syntactic IE techniques that are commonly used for analyzing tweets. Additionally, participants will be familiarized with the landscape of publicly available tweet data and with methods for collecting and preparing these data for analysis. Finally, participants will be trained to use a suite of open-source tools (SAIL for active learning, TwitterNER for named entity recognition, and SocialMediaIE for multi-task learning), which utilize advanced machine learning techniques (e.g., deep learning, active learning with human-in-the-loop, multi-lingual, and multi-task learning) to perform IE on their own or existing datasets. Participants will also learn how social context can be integrated into IE systems to improve them.
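The following sketch makes the first two examples concrete by showing what "structured data" might look like for them. It produces entity mentions with a type and character offsets, plus a sentiment label. It is a minimal toy that assumes a hand-written gazetteer and a tiny sentiment lexicon. The names `GAZETTEER`, `POSITIVE`, `NEGATIVE`, `extract_entities`, and `lexicon_sentiment` are hypothetical. This is not how TwitterNER or SocialMediaIE work; those tools use trained machine learning models.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A named-entity mention as structured data: surface text, type, character offsets."""
    text: str
    label: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


# Hypothetical toy gazetteer mapping known names to entity types.
GAZETTEER = {"Barack Obama": "PERSON", "USA": "LOCATION"}

# Hypothetical toy sentiment lexicons.
POSITIVE = {"awesome", "great", "good", "love"}
NEGATIVE = {"awful", "bad", "terrible", "hate"}


def extract_entities(text: str, gazetteer: dict[str, str] = GAZETTEER) -> list[Entity]:
    """Find whole-word occurrences of gazetteer names, sorted by start offset."""
    entities = []
    for name, label in gazetteer.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            entities.append(Entity(m.group(), label, m.start(), m.end()))
    return sorted(entities, key=lambda e: e.start)


def lexicon_sentiment(text: str) -> str:
    """Score = (# positive lexicon words) - (# negative lexicon words); map sign to a label."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


sentence = "Barack Obama served as the president of the USA"
for e in extract_entities(sentence):
    print(e.label, e.text, e.start, e.end)
# prints: PERSON Barack Obama 0 12
#         LOCATION USA 44 47
print(lexicon_sentiment("This movie was awesome"))  # prints: positive
```

Dictionary lookup and word counting break down on unseen names, misspellings, and negation (e.g., "not awesome"). These are common in tweets, which is one motivation for the learned models covered in the tutorial.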
The tools introduced in the tutorial will focus on the three main stages of IE, namely, collection of data (including annotation), data processing and analytics, and visualization of the extracted information. More details can be found at: https://socialmediaie.github.io/tutorials/.
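Active learning with a human in the loop can reduce annotation effort during data collection. It does this by asking annotators to label only the examples a model is least certain about. The sketch below shows generic uncertainty sampling under that assumption, ranking unlabeled items by the entropy of a classifier's predicted class distribution. The function names, the `predict_proba` callable, and the toy probabilities are hypothetical. This is not SAIL's actual implementation or API.

```python
import math
from typing import Callable, Hashable, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(
    unlabeled: Sequence[Hashable],
    predict_proba: Callable[[Hashable], Sequence[float]],
    k: int,
) -> list:
    """Return the k unlabeled items whose predictions have the highest entropy."""
    ranked = sorted(unlabeled, key=lambda item: entropy(predict_proba(item)), reverse=True)
    return ranked[:k]


# Hypothetical model outputs [P(positive), P(negative)] for three unlabeled tweets.
toy_probs = {
    "tweet A": [0.9, 0.1],
    "tweet B": [0.5, 0.5],
    "tweet C": [0.6, 0.4],
}
print(select_for_annotation(list(toy_probs), toy_probs.__getitem__, k=2))
# prints: ['tweet B', 'tweet C']
```

In a full loop, the selected tweets would be annotated by a human and added to the training set. The model would then be retrained before the next selection round.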

Original language: English (US)
Title of host publication: Advances in Information Retrieval - 44th European Conference on IR Research, ECIR 2022, Proceedings
Editors: Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, Vinay Setty
Publisher: Springer
Pages: 589-596
Number of pages: 8
ISBN (Print): 9783030997380
State: Published - 2022
Event: 44th European Conference on Information Retrieval, ECIR 2022 - Stavanger, Norway
Duration: Apr 10 2022 to Apr 14 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13186 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 44th European Conference on Information Retrieval, ECIR 2022
Country/Territory: Norway
City: Stavanger
Period: 4/10/22 to 4/14/22

Keywords

  • Information extraction
  • Machine learning bias
  • Multi-task learning
  • Natural language processing
  • Social media data
  • Twitter

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
