SWEET: Weakly Supervised Person Name Extraction for Fighting Human Trafficking
Javin Liu, Hao Yu, Vidya Sujaya, Pratheeksha Nair, Kellin Pelrine, Reihaneh Rabbany
Conference on Empirical Methods in Natural Language Processing
Abstract
In this work, we propose SWEET (Supervise Weakly for Entity Extraction to fight Trafficking), a weak supervision pipeline for extracting person names from noisy escort advertisements. Our method combines the simplicity of rule matching (through antirules, i.e., negated rules) with the generalizability of large language models fine-tuned on benchmark, domain-specific, and synthetic datasets, treating the outputs of both as weak labels. One of the major challenges in this domain is the limited availability of labeled data. SWEET addresses this by obtaining multiple weak labels through labeling functions and aggregating them effectively. SWEET outperforms the previous supervised SOTA method for this task by 9% F1 score on domain data and generalizes better to common benchmark datasets. Furthermore, we release HTGEN, a synthetically generated dataset of escort advertisements (built using ChatGPT), to facilitate further research within the community.
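The idea of combining antirules and model predictions as weak labels can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the word list, the function names, the stand-in for a fine-tuned model, and the simple vote-counting aggregator are hypothetical and are not the labeling functions or the aggregation method used in the paper.

```python
# Illustrative sketch only: hypothetical labeling functions and a simple
# vote-counting aggregator, not the pipeline described in the paper.

ABSTAIN = -1   # the labeling function has no opinion on this token
NOT_NAME = 0   # the token is voted to be a non-name
NAME = 1       # the token is voted to be a person name

# Hypothetical list of frequent non-name words (assumption).
COMMON_WORDS = {"hi", "im", "call", "me", "now", "new", "in", "town"}


def antirule_common_word(token):
    """Antirule: vote NOT_NAME for known non-name words, else abstain."""
    return NOT_NAME if token.lower() in COMMON_WORDS else ABSTAIN


def antirule_has_digit(token):
    """Antirule: vote NOT_NAME for tokens containing a digit, else abstain."""
    return NOT_NAME if any(ch.isdigit() for ch in token) else ABSTAIN


def stub_model_vote(token):
    """Stand-in for a fine-tuned model's prediction (assumption):
    votes NAME for capitalized alphabetic tokens, else abstains."""
    return NAME if token[:1].isupper() and token.isalpha() else ABSTAIN


LABELING_FUNCTIONS = [antirule_common_word, antirule_has_digit, stub_model_vote]


def aggregate(votes):
    """Combine weak votes: NAME only with a strict majority of the
    non-abstaining votes, ABSTAIN when every function abstained."""
    name_votes = sum(1 for v in votes if v == NAME)
    not_name_votes = sum(1 for v in votes if v == NOT_NAME)
    if name_votes == 0 and not_name_votes == 0:
        return ABSTAIN
    return NAME if name_votes > not_name_votes else NOT_NAME


def weak_label(tokens):
    """Apply every labeling function to each token and aggregate the votes."""
    return [aggregate([lf(tok) for lf in LABELING_FUNCTIONS]) for tok in tokens]


if __name__ == "__main__":
    tokens = ["Hi", "im", "Bella", "call", "5551234"]
    print(list(zip(tokens, weak_label(tokens))))
```

In this sketch the antirules can only vote against a token being a name or abstain, so "Hi" is rejected despite being capitalized, while "Bella" is kept because no antirule fires. A learned aggregator could replace the vote count without changing the labeling functions.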