Classification of Hate Speech in TikTok Social Media Comments Using Naive Bayes Algorithm and TF-IDF Weighting

Putri Febi Utami; Dwi Krisbiantoro; Irfan Santiko; Andi Dwi Riyanto

doi:10.35671/jmtt.v4i3.102

Authors

Putri Febi Utami, Universitas Amikom Purwokerto
Dwi Krisbiantoro, Universitas Amikom Purwokerto
Irfan Santiko, Universitas Amikom Purwokerto
Andi Dwi Riyanto, Universitas Amikom Purwokerto

Keywords:

Hate speech, TikTok, Multinomial Naive Bayes, TF-IDF, Text classification

Abstract

This study addresses the classification of hate speech in Indonesian-language TikTok comments. As a social media platform with intense user interaction, TikTok generates a large volume of comments that mix formal and informal language. This linguistic variation complicates content moderation, particularly the automatic identification of hate speech. The dataset is secondary data that combines public datasets with scraped TikTok comments, for an initial total of 5,698 comments. To improve data quality, the comments were pre-processed through text cleaning, tokenization, normalization, stop-word removal, and stemming, which left 4,542 comments suitable for modeling. A Multinomial Naive Bayes model with TF-IDF weighting reached 93% accuracy before parameter optimization and 95% after hyperparameter tuning, with the smoothing parameter alpha set to 0.5. The confusion matrix shows a relatively low misclassification rate, although the class distribution in the dataset remains imbalanced. These findings indicate that Multinomial Naive Bayes is effective at recognizing the linguistic patterns of hate speech in Indonesian TikTok comments, including informally written text.
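
The pipeline summarized in the abstract (TF-IDF weighting followed by Multinomial Naive Bayes with additive smoothing) can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation. The four toy comments, their labels, and the names `tokenize`, `fit_idf`, `tfidf`, `MultinomialNB`, and `classify` are hypothetical. The smoothed IDF formula is one common variant and is assumed here. Normalization, stop-word removal, and stemming are omitted for brevity. Only the value alpha = 0.5 comes from the abstract.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and keep alphabetic tokens (stand-in for the full cleaning stage)."""
    return re.findall(r"[a-z]+", text.lower())

def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1 (assumed variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    """TF-IDF weights for one tokenized document; out-of-vocabulary terms are dropped."""
    tf = Counter(term for term in doc if term in idf)
    return {term: freq * idf[term] for term, freq in tf.items()}

class MultinomialNB:
    """Multinomial Naive Bayes over TF-IDF weights with additive smoothing alpha."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels, vocab):
        self.classes = sorted(set(labels))
        totals = {c: defaultdict(float) for c in self.classes}
        for vec, label in zip(vectors, labels):
            for term, weight in vec.items():
                totals[label][term] += weight
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            self.log_prior[c] = math.log(labels.count(c) / len(labels))
            denom = sum(totals[c].values()) + self.alpha * len(vocab)
            self.log_likelihood[c] = {
                term: math.log((totals[c][term] + self.alpha) / denom) for term in vocab
            }
        return self

    def predict(self, vec):
        def score(c):
            return self.log_prior[c] + sum(
                weight * self.log_likelihood[c][term] for term, weight in vec.items()
            )
        return max(self.classes, key=score)

# Hypothetical toy data; the study itself uses 4,542 Indonesian TikTok comments.
train_texts = [
    "you are stupid and ugly",
    "i hate you idiot",
    "nice video love it",
    "great dance love this",
]
train_labels = ["hate", "hate", "neutral", "neutral"]

train_docs = [tokenize(text) for text in train_texts]
idf = fit_idf(train_docs)
train_vectors = [tfidf(doc, idf) for doc in train_docs]

# alpha = 0.5 is the tuned smoothing value reported in the abstract.
model = MultinomialNB(alpha=0.5).fit(train_vectors, train_labels, list(idf))

def classify(text):
    """Tokenize, weight, and classify a single raw comment."""
    return model.predict(tfidf(tokenize(text), idf))

print(classify("you stupid idiot"))
print(classify("love this video"))
```

A smaller alpha trusts the observed term weights more and smooths unseen terms less, which is why it is a natural parameter to tune. The accuracy figures reported in the abstract depend on the authors' data and preprocessing and are not reproduced by this toy example.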
