Wizard-of-Oz (WoZ) experiments have long been a cornerstone of human-robot interaction (HRI) research, enabling the study of advanced interaction concepts before the underlying autonomy is technically feasible. In these setups, a hidden human operator—the “wizard”—controls the robot’s speech, gestures, and navigation, while participants believe they are engaging with an autonomous system. This method has been instrumental in probing how people respond to robots capable of nuanced communication, from natural language to expressive nonverbal cues.

Despite its value, the HRI community has lacked a general-purpose, publicly available WoZ tool. Most research groups have built bespoke systems tailored to specific experiments, leading to duplicated effort and high barriers for non-technical researchers. The WoZ4U project addresses this gap with an open-source, configurable interface designed for SoftBank’s Pepper robot, though its architecture is adaptable to other platforms.
Pepper is a humanoid social robot equipped with cameras, microphones, a mobile base, expressive LED eyes, and a chest-mounted tablet. Traditionally, controlling Pepper for WoZ studies required either the NAOqi Python API—powerful but code-intensive—or Choregraphe, a drag-and-drop programming environment ill-suited for rapid, adaptive responses. WoZ4U sidesteps these limitations by offering a browser-based graphical user interface (GUI) backed by a Python Flask server. Because the GUI runs in any modern browser and the server communicates with the robot over its network API to execute commands and stream sensor data, the setup is platform-independent and runs on any modern operating system.
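The core pattern is straightforward: the browser sends a command, and the server translates it into a robot API call. The sketch below illustrates that dispatch step in isolation; the class and function names are illustrative (not WoZ4U's actual code), and the robot side is stubbed so the example runs without hardware—in the real system the stub's role is played by NAOqi proxies such as ALTextToSpeech.

```python
class RobotStub:
    """Stands in for a NAOqi proxy; records calls instead of moving Pepper."""
    def __init__(self):
        self.log = []

    def say(self, text):
        self.log.append(("say", text))

    def run_behavior(self, name):
        self.log.append(("behavior", name))


def handle_command(robot, command):
    """Dispatch a command dict (as posted by the GUI) to the matching robot call."""
    kind = command.get("type")
    if kind == "speech":
        robot.say(command["text"])
    elif kind == "gesture":
        robot.run_behavior(command["name"])
    else:
        raise ValueError("unknown command type: %r" % kind)


robot = RobotStub()
handle_command(robot, {"type": "speech", "text": "Hello!"})
handle_command(robot, {"type": "gesture", "name": "animations/Stand/Gestures/Hey_1"})
```

In the actual tool, each Flask route body amounts to one such dispatch, which is what keeps the server thin and the robot-specific logic swappable.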
At the heart of WoZ4U is a YAML configuration file that defines experiment-specific parameters: speech phrases, gestures, tablet content, LED colors, and keyboard shortcuts. By editing this file, researchers can tailor the interface without altering source code, enabling repeatable setups and easy sharing between teams. Complex multimodal actions—such as synchronizing a spoken line with a gesture—can be bound to a single button, reducing the wizard’s cognitive load.
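A configuration file in this style might look like the following. The key names here are hypothetical—WoZ4U's actual YAML schema may differ—but the shape conveys the idea: each entry pairs content with a label and an optional keyboard shortcut.

```yaml
# Hypothetical sketch of a WoZ4U-style configuration; actual key names
# in the tool's schema may differ.
robot_ip: 192.168.1.42
speech:
  - title: "Greeting"
    text: "Hello, nice to meet you!"
    shortcut: "1"
gestures:
  - title: "Wave"
    behavior: "animations/Stand/Gestures/Hey_1"
    shortcut: "2"
```

Sharing a study setup between teams then reduces to sharing this one file.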
The GUI is divided into eight functional areas:
- Connection and monitoring: selection of robot IP addresses, access to camera and microphone feeds, and tracking of tablet touch events.
- Motion controls: keyboard combinations mapped to head, base, and hip movements, with an emergency stop on the Escape key.
- Autonomy settings: management of Pepper's built-in interactive behaviors, with alerts for conflicting configurations.
- Tablet controls: display of predefined images, videos, or web pages, with options for default fallbacks.
- Speech and audio controls: adjustment of voice parameters and triggering of predefined utterances or local audio files.
- Eye controls: LED colors and a limited set of animations.
- Gesture controls: execution of Pepper's library of expressive motions.
All areas are configurable via the YAML file.
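The motion-control area's logic can be sketched as a small key-to-velocity mapping with the emergency stop taking priority. The bindings and function names below are illustrative, not WoZ4U's actual ones; the comment notes where a real implementation would call NAOqi.

```python
# Illustrative key bindings: each maps to (x, y, theta) base velocities.
KEY_BINDINGS = {
    "w": (0.3, 0.0, 0.0),   # forward
    "s": (-0.3, 0.0, 0.0),  # backward
    "a": (0.0, 0.0, 0.5),   # rotate left
    "d": (0.0, 0.0, -0.5),  # rotate right
}


def on_key(key, move, stop):
    """Translate a key press into a motion call; Escape always stops the robot."""
    if key == "Escape":
        stop()               # emergency stop overrides everything
        return "stopped"
    if key in KEY_BINDINGS:
        move(*KEY_BINDINGS[key])  # e.g. NAOqi ALMotion.move(x, y, theta)
        return "moving"
    return "ignored"


calls = []
state = on_key("w", lambda *v: calls.append(v), lambda: calls.append("STOP"))
```

Checking the emergency stop first, before any binding lookup, is the design choice that matters: the wizard must be able to halt the robot regardless of what else is mapped.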
The system’s distributed design means the server and GUI can run separately from the robot, conserving onboard resources and allowing multi-screen layouts for different data streams. Standard web technologies (HTML, JavaScript, CSS) make customization straightforward. Researchers can, for example, open camera feeds in one tab, audio controls in another, and arrange them across displays for optimal workflow.
Initial trials involved four users—two with technical backgrounds and two without—controlling Pepper in varied scenarios, including a 25-minute board game interaction and a two-hour human-robot team exercise. Questionnaire feedback indicated that while installation could be more user-friendly, the interface was quickly understood, control was intuitive, and overall usability was rated highly. Users preferred WoZ4U over alternative tools for its flexibility and ease of configuration.
The architecture’s generality suggests potential beyond Pepper. Any robot with a compatible API, including those running the Robot Operating System (ROS), could adopt the same server–browser model. The configuration-driven approach aligns with research on end-user development, empowering non-programmers to design and run complex HRI studies.
WoZ4U also supports post-experiment analysis by recording visual and auditory data from the robot’s perspective. This capability is crucial for reviewing interactions, coding behaviors, and correlating wizard actions with participant responses. By lowering the technical threshold, WoZ4U opens WoZ experimentation to a broader segment of the HRI community, fostering more reproducible and comparable studies.
Available under the MIT license, WoZ4U’s source code and documentation provide a foundation for adaptation to other platforms or integration of additional sensing and perception displays. As the authors note, “The overall usability was rated high (Q8), and WoZ4U was mostly preferred over alternative tools (Q5).” With its combination of flexibility, accessibility, and multimodal control, WoZ4U represents a significant step toward standardizing and streamlining WoZ research in human-robot interaction.
