Abstract
Toxic comments in social media reduce people's willingness to engage online, limiting how much they benefit from its positive aspects. When users disengage, firms lose advertising revenue and market share. A safe and inclusive online environment is therefore essential from both a business and a social responsibility perspective, and sustaining it requires near-immediate detection of toxic behavior, which is only feasible with automated techniques. Existing toxicity detection techniques predominantly focus on performance metrics such as accuracy and F-score, often overlooking critical aspects such as throughput, computational cost, and the impact of false positives and false negatives on user engagement. Furthermore, these algorithms are typically evaluated in simple experimental setups that do not reflect the complexity of large-scale social media environments, making their real-time effectiveness and cost implications difficult to predict.
To address these limitations, this thesis presents a comprehensive, multi-method approach for optimizing real-time toxicity detection in social media. This includes the development of a Profit-driven Simulation (PDS) framework for evaluating the real-time performance of various deep learning classifiers in complex social media environments. The PDS framework incorporates measures of effectiveness, computational efficiency, and user engagement, revealing that the choice of classifier should be tailored to the toxicity level of the environment to maximize profitability. Through rigorous experimentation, we demonstrate that high-throughput classifiers are most profitable in both low- and high-toxicity contexts, whereas classifiers with moderate accuracy and acceptable throughput excel in medium-toxicity scenarios.
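To make the PDS evaluation concrete, the sketch below simulates the profit of a single classifier in an environment with a given toxicity rate. The per-comment revenue and penalty values, the class-independent accuracy model, and the compute-cost term are illustrative assumptions, not the thesis's calibrated profit function.

```python
import random

def simulate_profit(toxicity_rate, accuracy, throughput,
                    n_comments=100_000,
                    revenue_per_comment=0.01,    # engagement value of a correct decision
                    cost_per_miss=0.05,          # toxic comment shown (false negative)
                    cost_per_false_flag=0.02,    # benign comment removed (false positive)
                    compute_cost_per_sec=0.10):  # cost of running the classifier
    """Estimate profit for one classifier in one simulated environment."""
    profit = 0.0
    for _ in range(n_comments):
        is_toxic = random.random() < toxicity_rate
        correct = random.random() < accuracy  # simplification: class-independent accuracy
        if correct:
            profit += revenue_per_comment      # correct call keeps users engaged
        elif is_toxic:
            profit -= cost_per_miss            # missed toxicity drives users away
        else:
            profit -= cost_per_false_flag      # wrongly removed benign content
    # Slower classifiers pay more compute time to process the same stream.
    profit -= (n_comments / throughput) * compute_cost_per_sec
    return profit

# A fast-but-coarse classifier versus a slow-but-accurate one across toxicity levels.
for name, acc, tput in [("fast", 0.85, 5000.0), ("accurate", 0.95, 25.0)]:
    for tox in (0.05, 0.25, 0.50):
        print(f"{name:8s} toxicity={tox:.2f} profit={simulate_profit(tox, acc, tput):10.2f}")
```

Even this toy version makes the central point visible: which classifier maximizes profit depends on the toxicity level of the environment, not on accuracy alone, and as misclassification costs grow a slower, more accurate classifier can become the profitable choice.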
In parallel, the thesis tackles the challenge of imbalanced datasets in toxicity detection by introducing a novel approach to augmenting toxic language data. Leveraging Reinforcement Learning from Human Feedback (RLHF) and Proximal Policy Optimization (PPO), this methodology fine-tunes Large Language Models (LLMs) to generate balanced and diverse toxic datasets through sentence-level text augmentation. The approach addresses the scarcity of toxic samples by generating high-quality, semantically consistent paraphrases. The resulting datasets significantly improve the performance of toxicity classifiers, enhancing their robustness in identifying minority-class instances and ensuring a more equitable detection process.
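The sketch below illustrates a single PPO paraphrase-augmentation step of this kind, written against the older Hugging Face trl PPOTrainer interface (the API has changed across trl versions). The base model, the off-the-shelf toxicity classifier standing in for human feedback, the label handling, and the reward shaping are illustrative assumptions, not the thesis's exact configuration.

```python
import torch
from transformers import AutoTokenizer, pipeline
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

config = PPOConfig(model_name="gpt2", learning_rate=1.41e-5,
                   batch_size=1, mini_batch_size=1)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

# An off-the-shelf toxicity classifier stands in for human feedback.
toxicity = pipeline("text-classification", model="unitary/toxic-bert")

seed = "You are a pathetic excuse for a person."
prompt = f"Paraphrase, keeping the meaning:\n{seed}\nParaphrase:"
query = tokenizer(prompt, return_tensors="pt").input_ids.squeeze()

response = ppo_trainer.generate(query, return_prompt=False,
                                max_new_tokens=40,
                                pad_token_id=tokenizer.eos_token_id)
text = tokenizer.decode(response.squeeze(), skip_special_tokens=True)

# Reward paraphrases that remain recognizably toxic, so the minority
# (toxic) class gains diverse yet label-consistent samples.
top = toxicity(text)[0]  # label scheme assumed to include a "toxic" label
reward = torch.tensor(top["score"] if "toxic" in top["label"] else 0.0)
ppo_trainer.step([query], [response.squeeze()], [reward])
```

In a full pipeline, a semantic-similarity term would typically be added to the reward so that generated paraphrases preserve meaning as well as toxicity.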
Moreover, this work proposes a Proximal Policy Optimization-based Cascaded Inference System (PPO-CIS) for managing toxic content. This framework dynamically assigns classifiers based on their performance and computational cost, employing high-throughput classifiers for initial filtering and more accurate classifiers for final decision-making. The PPO-CIS adapts to varying data volumes and classifier performance, ensuring efficient and accurate content moderation. By integrating policy-based deep reinforcement learning techniques with sophisticated reward functions, the PPO-CIS achieves an effective balance between accuracy and processing speed, reducing the workload on human moderators and improving user experience.
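A minimal sketch of the cascade's routing logic follows, with toy stand-ins for the classifiers. In the actual system, the routing policy is learned with PPO against a reward that trades off accuracy against processing cost; here the confidence thresholds are fixed for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    """One classifier in the cascade: returns P(toxic) at a known cost."""
    predict_proba: Callable[[str], float]
    cost_ms: float

def cascaded_predict(comment, fast, accurate, low=0.1, high=0.9):
    """The fast stage decides confident cases; ambiguous ones escalate."""
    p = fast.predict_proba(comment)
    spent = fast.cost_ms
    if p <= low:
        return "benign", spent
    if p >= high:
        return "toxic", spent
    p = accurate.predict_proba(comment)  # escalate to the accurate stage
    spent += accurate.cost_ms
    return ("toxic" if p >= 0.5 else "benign"), spent

# Toy stand-ins for real models: a cheap keyword filter and a 'slow' scorer.
fast = Stage(lambda c: 0.95 if "idiot" in c else 0.30, cost_ms=1.0)
accurate = Stage(lambda c: 0.80 if any(w in c for w in ("idiot", "hate")) else 0.05,
                 cost_ms=25.0)

for c in ["you idiot", "have a nice day", "I hate you so much"]:
    print(c, "->", cascaded_predict(c, fast, accurate))
```

Escalating only ambiguous comments keeps the average per-comment cost close to the fast stage's while approaching the accurate stage's decision quality on hard cases; PPO's role in the thesis is to learn where those escalation boundaries should sit as data volumes and classifier performance shift.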
Extensive experiments and evaluations on multiple datasets, including Kaggle-Jigsaw and ToxiGen, highlight the significant improvements achieved by the proposed frameworks and methodologies. These evaluations encompass various classifiers, levels of toxicity, and reinforcement learning policies, demonstrating notable gains in processing speed, detection accuracy, and overall user satisfaction.
This thesis contributes to the development of more scalable, cost-effective, and adaptive toxicity detection systems in social media, enhancing the safety and inclusiveness of online environments while reducing the burden on human moderators.