Hate Speech Detection Research in South Asian Languages: A Survey of Tasks, Datasets and Methods

This survey covers the definitions, tasks, datasets, and computational approaches used in hate speech research on South Asian languages, and highlights challenges and opportunities for further work.

Authors

Deepawali Sharma, Department of Computer Science, Banaras Hindu University, Varanasi, India; School of Computer Science Engineering and Technology, Bennett University, Noida, India

Tanusree Nath, Computer Science, Banaras Hindu University, Varanasi, India

Vedika Gupta, Associate Professor, Jindal Global Business School, O.P. Jindal Global University, Sonipat, Haryana, India

Vivek Kumar Singh, Computer Science, Banaras Hindu University, Varanasi, India; Computer Science, University of Delhi, New Delhi, India

Summary

Over the years, social media has emerged as a powerful platform for communicating and sharing views, thoughts, and opinions. At the same time, however, some individuals abuse it to spread hate against other individuals, communities, religions, and so on. Such content can seriously harm mental health, online well-being, and social order. Automated methods are therefore needed to detect such content within the large volume of social media posts. Several efforts have recently been made to develop computational approaches toward this end; however, most of them target English-language content. Only recently have studies started to focus on low-resource languages, including those of South Asia.

This article presents a detailed and comprehensive survey of hate speech-related research in South Asian languages. It first discusses the various definitions of, and terms related to, hate speech across different social media platforms. It then surveys in detail the different tasks in hate speech research, the available datasets, and the popular computational approaches used for South Asian languages. Finally, it presents and discusses the major patterns identified and their practical implications, along with the challenges and opportunities for further research in the area.

Published in: ACM Transactions on Asian and Low-Resource Language Information Processing