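Sopian, Achmad Mujaddid and Putra, Rakhmadi Irfansyah and Suliyanti, Widya N. (2025) PENERAPAN ALGORITMA PROXIMAL POLICY OPTIMIZATION PADA AGEN DALAM GAME VIZDOOM UNTUK REINFORCEMENT LEARNING BERBASIS DATA VISUAL [Application of the Proximal Policy Optimization Algorithm to an Agent in the ViZDoom Game for Reinforcement Learning Based on Visual Data]. Diploma thesis, ITPLN.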
Abstract
Developing artificial intelligence agents for dynamic 3D game environments such as ViZDoom is difficult because the environments are complex and rewards are sparse. Conventional reinforcement learning methods are often unstable in environments with rich visual information and continuous actions. This study applies the Proximal Policy Optimization (PPO) algorithm with a Convolutional Neural Network (CNN) architecture to train an agent in the ViZDoom "deadly corridor" scenario and analyzes its performance improvement through reward metrics. A custom environment was built with Gymnasium and integrates reward shaping based on the change in distance to the goal (the vest) and on damage penalties. Training ran for 610,000 steps with tuned hyperparameters (learning rate = 0.00001, n_steps = 8192). Evaluation showed the agent's average total reward rising from -6.91 before training to 979.85 after training, with the agent surviving up to 85.24 steps per episode. TensorBoard analysis indicated stable training: policy loss fell by 57% and explained variance stayed between 0.47 and 0.56. The study shows that PPO with a CNN architecture is effective at improving agent performance in the complex ViZDoom environment, despite challenges such as high reward variance. These findings provide a basis for developing adaptive autonomous agents in visual-based 3D simulations.
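The reward shaping described in the abstract (a bonus for reducing the distance to the vest and a penalty for damage taken) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `shaped_reward`, the `ShapingConfig` class, and both coefficient values are hypothetical, since the abstract does not report the actual formula or weights used in the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapingConfig:
    # Hypothetical coefficients; the abstract does not report the real values.
    distance_scale: float = 1.0   # reward per unit of distance gained toward the vest
    damage_penalty: float = 0.5   # penalty per point of health lost


def shaped_reward(base_reward, prev_distance, distance,
                  prev_health, health, cfg=ShapingConfig()):
    """Combine the scenario reward with a progress bonus and a damage penalty."""
    # Positive when the agent moved closer to the vest during this step.
    progress = prev_distance - distance
    # Only health losses are penalized; health gains are ignored.
    damage_taken = max(0.0, prev_health - health)
    return (base_reward
            + cfg.distance_scale * progress
            - cfg.damage_penalty * damage_taken)


# Example step: the agent moves 10 units closer to the vest and loses 4 health.
print(shaped_reward(0.0, 300.0, 290.0, 100.0, 96.0))  # 8.0
```

In a Gymnasium environment, a function like this would typically be called inside `step()`, with the previous distance and health stored on the environment between steps. Because it rewards the change in distance rather than the distance itself, an agent that stands still gets no progress bonus.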
| Item Type: | Thesis (Diploma) |
|---|---|
| Uncontrolled Keywords: | Proximal Policy Optimization, ViZDoom, Reinforcement Learning, Visual Data, CNN Policy |
| Subjects: | Undergraduate Theses by Field of Study > Informatics Engineering |
| Divisions: | Faculty of Energy Telematics > Bachelor's Program (S1) in Informatics Engineering |
| Depositing User: | Sudarman |
| Date Deposited: | 09 Oct 2025 06:04 |
| Last Modified: | 09 Oct 2025 06:04 |
| URI: | https://repository.itpln.ac.id/id/eprint/1989 |
