Towards the Use of 3D Object Recognition as an Input to a Procedural Audio System in Virtual Environments
---------------------------------------------------------------
Research Questions & Objectives
1. To what extent can neural network approaches to 3D object detection be used to drive a procedural audio system in a virtual environment?
Objective: Investigate the feasibility and effectiveness of using neural network-based object detection algorithms, such as YOLOv8, to dynamically assign sonic characteristics to objects in virtual environments using Pure Data as a procedural audio system.
2. What are the subjective perceptions of players regarding the presence, realism, and plausibility of procedurally synthesized sounds compared to recorded sounds in virtual environments?
Objective: Conduct a subjective evaluation study to gather player feedback and perceptions regarding the presence, realism, and plausibility of sound effects synthesized using procedural audio systems compared to recorded sounds.
Abstract:
Despite advancements in visual fidelity for virtual environments, such as procedurally generated worlds, runtime audio synthesis remains an area of ongoing research.
This thesis explores the use of procedural audio (PA) in virtual environments, a technique often overlooked despite its potential to enhance immersive experiences, particularly in video games. While neural network-based audio synthesis shows promise, its computational cost makes it unsuitable for complex virtual environments, and sound designers remain skeptical of its use.
This thesis proposes an approach that integrates neural network-based 3D object detection with PA synthesis to generate object-specific sounds in real time. By employing an object detection network (YOLO) to classify 3D game objects and apply corresponding audio characteristics, this research addresses the need for accessible tools to prototype and enhance auditory realism and presence in virtual environments, regardless of sound design expertise.
Audio generated through PA techniques is often considered less "realistic" and "plausible" than recorded audio. A listening study conducted as part of this research evaluates the realism and plausibility of PA compared to pre-recorded samples. The study validates the proposed system and measures the realism and plausibility of PA-synthesized audio in the context of virtual environments, an area previously unexplored in the field. Presence was also measured as an indicator of immersion. While participants perceived pre-recorded sounds as more realistic, they rated PA as more plausible. Presence, however, remained consistent across both audio types.
This study contributes to advancing PA techniques, emphasizing their potential to elevate presence within virtual environments. By providing accessible tools for sound design and enabling object-based audio synthesis, this research paves the way for enhancing immersion in virtual experiences.
Keywords: Procedural Audio, Game Audio, Game Development, Signal Processing, Sound Synthesis, Convolutional Neural Networks, Object Detection, Pure Data
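
Illustrative sketch: The pipeline outlined in the abstract (classify a game object, then apply corresponding audio characteristics in Pure Data) could look roughly like the Python sketch below. It is not the thesis implementation. The class labels, the parameter names (resonance_hz, decay_s), the confidence threshold, the "object" message selector, and port 3000 are all assumptions made for illustration. The only external convention relied on is Pure Data's FUDI text format (whitespace-separated atoms terminated by a semicolon), which a [netreceive] object can parse. The detector itself is left out; its output is represented as a plain (label, confidence) pair.

```python
import socket
from typing import Dict

# Hypothetical lookup table: detected class label -> synthesis parameters.
# Labels and values are illustrative only, not taken from the thesis.
SOUND_PROFILES: Dict[str, Dict[str, float]] = {
    "bottle": {"resonance_hz": 1800.0, "decay_s": 0.4},
    "chair": {"resonance_hz": 220.0, "decay_s": 0.15},
}

# Fallback used for unknown labels or low-confidence detections (assumed).
DEFAULT_PROFILE: Dict[str, float] = {"resonance_hz": 440.0, "decay_s": 0.25}


def select_profile(label: str, confidence: float,
                   threshold: float = 0.5) -> Dict[str, float]:
    """Pick synthesis parameters for a detection, falling back to a default."""
    if confidence < threshold:
        return DEFAULT_PROFILE
    return SOUND_PROFILES.get(label, DEFAULT_PROFILE)


def to_fudi(object_id: int, label: str, profile: Dict[str, float]) -> str:
    """Format one FUDI message: atoms separated by spaces, ended by ';'.

    Spaces inside the label are replaced so it stays a single atom, and
    parameters are emitted in alphabetical key order so the receiving
    patch can unpack them by position.
    """
    atoms = ["object", str(object_id), label.replace(" ", "_")]
    atoms += [f"{profile[key]:g}" for key in sorted(profile)]
    return " ".join(atoms) + ";\n"


def send_to_pd(message: str, host: str = "127.0.0.1", port: int = 3000) -> None:
    """Send a FUDI message over UDP (assumes a [netreceive -u 3000] in the patch)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("ascii"), (host, port))
```

A caller would run select_profile on each detection and pass the result through to_fudi and send_to_pd. Keeping the label-to-parameter mapping in a table separate from the transport means profiles can be adjusted without touching either the detector or the patch.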