Sacha Lévy, Reihaneh Rabbany

Web Search and Data Mining

Abstract

How to accelerate the search for relevant topical keywords within a tweet corpus? Computational social scientists conducting topical studies employ large, self-collected or crowdsourced social media datasets such as tweet corpora. Comprehensive sets of relevant keywords are often necessary to sample or analyze these data sources. However, naively skimming through thousands of keywords can quickly become a daunting task. In this study, we present a web-based application to simplify the search for relevant topical hashtags in a tweet corpus. DisKeyword allows users to grasp high-level trends in their dataset, while iteratively labeling keywords recommended based on their links to prior labeled hashtags. We open-source our code under the MIT license.