Pasaribu, Fauzan Fuadi and Asri, Yessy and Kuswardani, Dwina (2025) Implementasi NER pada Ulasan Pengguna Aplikasi PLN Mobile Menggunakan FastText dan BiLSTM-CRF [Implementation of NER on PLN Mobile App User Reviews Using FastText and BiLSTM-CRF]. Diploma thesis, ITPLN.
Abstract
The scarcity of domain-specific corpora remains a major challenge for Named Entity Recognition (NER) in Indonesian. This study develops an NER system for PLN Mobile user reviews using FastText subword embeddings and a BiLSTM-CRF architecture. The domain corpus, covering electricity complaints and services, consists of 202,532 cleaned reviews obtained from approximately 437 thousand Google Play Store reviews (2021-2024) through text cleaning, normalization, and spelling correction. The model was compared with a similar model trained on the general-domain Indo4B corpus. The domain-corpus model reached its minimum validation loss of 0.0171 at epoch 4, with a micro-F1 score of 0.9861 and a macro-F1 score of 0.6508. In contrast, the general-corpus model did not reach its minimum validation loss of 0.0260 until epoch 16, with a micro-F1 score of 0.9635 and a macro-F1 score of 0.4369. Confusion matrix analysis showed that the domain-corpus model recognized 33 entity labels, including major entities such as B-44 Payment Feature (2,594 correct predictions) and B-4D Complaint Feature (1,688 correct predictions), as well as minor entities such as B-3B Promo Stimulus (117 correct predictions) and B-3F Website Error (9 correct predictions). Meanwhile, the Indo4B-based model recognized only 19 labels well and showed a significant decline on minor entities such as B-3B Promo Stimulus (118 instances) and B-3F Website Error (9 instances), most of which it failed to predict correctly.
These findings confirm that domain-specific embeddings combined with the adaptive BiLSTM-CRF sequential model are more effective for entity extraction from customer reviews in the electricity sector, particularly because they recognize a greater number of labels and maintain performance on minor entities that models trained on general-domain corpora often overlook.
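The gap between the reported micro-F1 (0.9861) and macro-F1 (0.6508) is typical of label sets dominated by a few frequent classes. The sketch below illustrates the effect on made-up token labels. It is not the thesis's evaluation code: the function name `f1_scores`, the toy data, and the choice to count the `O` label in both averages are all assumptions made for illustration.

```python
from collections import Counter

def f1_scores(gold, pred):
    """Return (micro_f1, macro_f1, per_label_f1) for two equal-length label lists."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it did not belong
            fn[g] += 1  # missed the true label g

    def f1(t, f_pos, f_neg):
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    per_label = {lab: f1(tp[lab], fp[lab], fn[lab]) for lab in labels}
    # Micro: pool the counts over all labels, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: unweighted mean of per-label F1, so rare labels count equally.
    macro = sum(per_label.values()) / len(per_label)
    return micro, macro, per_label

# Toy data (hypothetical): one dominant label, one frequent entity, one rare entity.
gold = ["O"] * 95 + ["B-44"] * 3 + ["B-3F"] * 2
pred = ["O"] * 95 + ["B-44"] * 3 + ["O"] * 2  # the rare entity is always missed

micro, macro, per_label = f1_scores(gold, pred)
print(f"micro-F1 = {micro:.4f}, macro-F1 = {macro:.4f}")
print(per_label)
```

In this toy case the model is right on 98 of 100 tokens, so micro-F1 stays high, while the completely missed rare label pulls macro-F1 down sharply. The same mechanism would explain why recognizing more minor labels raises macro-F1 far more than micro-F1.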
| Item Type: | Thesis (Diploma) |
|---|---|
| Uncontrolled Keywords: | NER, FastText, BiLSTM-CRF, PLN Mobile, domain-specific corpus |
| Subjects: | Skripsi Bidang Keilmuan > Teknik Informatika |
| Divisions: | Fakultas Telematika Energi > S1 Teknik Informatika |
| Depositing User: | Sudarman |
| Date Deposited: | 13 Oct 2025 03:50 |
| Last Modified: | 13 Oct 2025 03:50 |
| URI: | https://repository.itpln.ac.id/id/eprint/2108 |
