Marpaung, Ricky Yordan and Asri, Yessy and Kuswardani, Dwina (2025) IMPLEMENTASI NAMED ENTITY RECOGNATION PADA ULASAN PENGGUNA APLIKASI PLN MOBILE MENGGUNAKAN INDOBERT. Diploma thesis, ITPLN.
Abstract
Advances in information technology have driven progress in Natural Language Processing (NLP), including in Indonesia. A key component of NLP is the corpus, which serves as the basis for training and evaluating language models. However, general corpora such as Indo4B do not fully cover specialised terms in the electricity domain, particularly in reviews of the PLN Mobile app, which contain abundant informal language, abbreviations, and spelling mistakes. This study builds a corpus of electricity-related complaints and services from 522,000 PLN Mobile reviews (2020–2024) and compares its effectiveness with the general Indo4B corpus on the Named Entity Recognition (NER) task using the IndoBERT model. The research process covers data collection via web scraping, text preprocessing (normalisation, tokenisation, spelling correction), keyword-based BIO labelling, and IndoBERT fine-tuning. Evaluation uses BIO-scheme metrics (precision, recall, F1-score) computed with the SeqEval library. Both models perform well: the domain-specific corpus achieves a weighted average F1-score of 0.9921, with an advantage in the core categories of the electricity domain, while the Indo4B-based model achieves 0.9930, with an advantage in linguistically diverse categories. This difference indicates that the domain-specific corpus is advantageous in electricity-related contexts, while the general corpus generalises better.
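The keyword-based BIO labelling and entity-level evaluation summarised above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's code: the entity types (`PRODUCT`, `SERVICE`, `OUTAGE`), the keyword lexicon, and the slang map are hypothetical placeholders, since the abstract does not list the actual categories or keywords. The scoring approximates SeqEval's default entity-level matching, where a predicted entity counts only if both its type and its span match a gold entity exactly, and the weighted average weights each type's F1 by its number of gold entities.

```python
import re
from collections import Counter

# HYPOTHETICAL keyword lexicon: the abstract does not list the thesis's entity
# categories or keywords, so these types and phrases are placeholders.
LEXICON: dict[tuple[str, ...], str] = {
    ("token", "listrik"): "PRODUCT",
    ("pasang", "baru"): "SERVICE",
    ("mati", "lampu"): "OUTAGE",
    ("pemadaman",): "OUTAGE",
}

# HYPOTHETICAL slang/abbreviation map standing in for the normalisation step.
SLANG = {"gk": "tidak", "ga": "tidak", "tdk": "tidak", "blm": "belum"}


def preprocess(text: str) -> list[str]:
    """Lowercase, keep alphanumeric runs as tokens, and expand known slang."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [SLANG.get(tok, tok) for tok in tokens]


def bio_label(tokens: list[str]) -> list[str]:
    """Tag tokens with BIO labels by greedy longest match against LEXICON."""
    tags = ["O"] * len(tokens)
    max_len = max(len(phrase) for phrase in LEXICON)
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first so multi-word keywords win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = LEXICON.get(tuple(tokens[i:i + n]))
            if label:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                break
        else:  # no keyword starts at position i
            i += 1
    return tags


def entities(tags: list[str]) -> set[tuple[str, int, int]]:
    """Extract (type, start, end) spans; a stray I- tag opens a new span."""
    spans: set[tuple[str, int, int]] = set()
    kind, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes the last span
        prefix, _, typ = tag.partition("-")
        # Close the open span on O, on B-, or on an I- of a different type.
        if kind is not None and (prefix != "I" or typ != kind):
            spans.add((kind, start, i - 1))
            kind = None
        # Open a span on B-, or on an I- that does not continue the current one.
        if prefix in ("B", "I") and kind is None:
            kind, start = typ, i
    return spans


def weighted_f1(y_true: list[list[str]], y_pred: list[list[str]]) -> float:
    """Entity-level F1 per type, averaged with weights = number of gold entities."""
    tp, n_pred, n_true = Counter(), Counter(), Counter()
    for gold_tags, pred_tags in zip(y_true, y_pred):
        gold, pred = entities(gold_tags), entities(pred_tags)
        n_true.update(kind for kind, _, _ in gold)
        n_pred.update(kind for kind, _, _ in pred)
        tp.update(kind for kind, _, _ in gold & pred)  # exact type + span match
    total = sum(n_true.values())
    score = 0.0
    for kind, support in n_true.items():
        precision = tp[kind] / n_pred[kind] if n_pred[kind] else 0.0
        recall = tp[kind] / support
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * support / total
    return score


if __name__ == "__main__":
    tokens = preprocess("Gk bisa beli token listrik, pemadaman terus!")
    gold = bio_label(tokens)
    print(gold)  # ['O', 'O', 'O', 'B-PRODUCT', 'I-PRODUCT', 'B-OUTAGE', 'O']
    pred = ["O", "O", "O", "B-PRODUCT", "O", "B-OUTAGE", "O"]  # truncated PRODUCT span
    print(weighted_f1([gold], [pred]))  # 0.5
```

In the usage example, the prediction recovers the one-token `OUTAGE` entity but truncates the two-token `PRODUCT` span, so that entity earns no credit and the weighted F1 is 0.5. Spelling correction and IndoBERT fine-tuning are omitted here; in a real pipeline, word-level BIO tags must also be aligned to IndoBERT's subword tokens before training.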
| Field | Value |
|---|---|
| Item Type: | Thesis (Diploma) |
| Uncontrolled Keywords: | NLP, corpus, PLN Mobile, IndoBERT, Named Entity Recognition |
| Subjects: | Undergraduate Theses by Field of Study > Informatics Engineering |
| Divisions: | Faculty of Energy Telematics > Bachelor's Program in Informatics Engineering |
| Depositing User: | Sudarman |
| Date Deposited: | 13 Oct 2025 02:39 |
| Last Modified: | 13 Oct 2025 02:39 |
| URI: | https://repository.itpln.ac.id/id/eprint/2089 |
