Robots Learn to Co‑Author Stories with Humans

Storytelling has long been a cornerstone of human culture, but the rise of social robots is opening new possibilities for interactive, co‑created narratives. Recent research has introduced a collaborative story generation framework in which a human and an AI agent—potentially embodied in a robot—take turns adding lines to an unfolding tale. Unlike traditional scripted robot performances, this approach generates content on demand, blending human creativity with machine‑generated suggestions.

The system is built around a large‑scale neural language model, specifically the 774‑million‑parameter GPT‑2‑large, fine‑tuned on a curated dataset of writing prompts and short stories. A ranking module sits atop the generator, sampling multiple candidate continuations and selecting the most suitable one. This sample‑and‑rank strategy lets designers tune the trade‑off between narrative quality and computational latency, which is critical for maintaining fluid human‑robot interaction.
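The sample‑and‑rank control flow can be sketched as follows. This is a minimal illustration, not the paper's code: `sample_continuation` and `rank_score` are hypothetical stand‑ins for the GPT‑2 generator and ranker (here a canned candidate pool and a length heuristic), chosen only to show how sampling and selection fit together.

```python
import random

# Hypothetical stand-in for the GPT-2 generator: draws one candidate
# continuation given the story so far (here, from a canned pool).
def sample_continuation(story: str) -> str:
    pool = [
        "The door creaked open on its own.",
        "She laughed, not knowing what else to do.",
        "A cold wind swept through the empty hall.",
    ]
    return random.choice(pool)

# Hypothetical stand-in for the GPT-2-based ranker: scores a candidate
# in the context of the story. A real ranker would use model likelihoods.
def rank_score(story: str, candidate: str) -> float:
    return float(len(candidate))  # placeholder heuristic

def sample_and_rank(story: str, n_candidates: int) -> str:
    """Sample n candidates and return the highest-scoring one."""
    candidates = [sample_continuation(story) for _ in range(n_candidates)]
    return max(candidates, key=lambda c: rank_score(story, c))
```

Raising `n_candidates` improves the odds of a strong continuation at the cost of more generation time, which is exactly the quality‑latency knob the researchers evaluate.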

The game flow begins with the AI delivering a “story starter” from a curated set. The human responds with a continuation, after which the AI reads the entire story so far and contributes its own line. This alternation continues until the human decides to end the story. The design minimizes restrictions on the human’s input, though practical factors such as speech recognition accuracy or text input conventions can influence output quality.
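The turn‑taking loop above can be expressed compactly. This is a sketch under stated assumptions: `get_human_line` and `ai_continue` are hypothetical callables (speech recognition or text input on one side, the generator on the other), and returning `None` stands in for the human choosing to end the story.

```python
import random

def play_story(starters, get_human_line, ai_continue, max_turns=10):
    """Alternate human and AI lines until the human ends the story.

    starters: curated list of opening lines.
    get_human_line(story_text) -> str or None (None ends the game).
    ai_continue(story_text) -> str, the AI's next line.
    """
    story = [random.choice(starters)]  # AI delivers the story starter
    for _ in range(max_turns):
        human = get_human_line("\n".join(story))
        if human is None:  # human decides to stop
            break
        story.append(human)
        # AI reads the entire story so far before contributing
        story.append(ai_continue("\n".join(story)))
    return story
```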

To train and evaluate the system, researchers collected 2,200 collaborative stories via crowdsourcing. These stories alternated between freeform human input and choice‑based selections from ten AI‑generated continuations, one of which was an incoherent distractor to ensure attention. The generator was trained on a custom‑collected r/WritingPrompts dataset, filtered and preprocessed for GPT‑2, supplemented with BookCorpus to reduce overfitting.
To train and evaluate the system, researchers collected 2,200 collaborative stories via crowdsourcing. These stories alternated between freeform human input and choice‑based selections from ten AI‑generated continuations, one of which was an incoherent distractor serving as an attention check. The generator was trained on a custom‑collected r/WritingPrompts dataset, filtered and preprocessed for GPT‑2, supplemented with BookCorpus to reduce overfitting.

The ranker, also GPT‑2‑based, scores each candidate continuation in context and selects the highest‑scoring option. This architecture avoids the repetitive or degenerate text often produced by beam search in open‑ended tasks. Quality control heuristics filter out low‑content or malformed outputs before ranking.
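The kind of pre‑ranking filter described above might look like this. The specific heuristics here (minimum word count, requiring alphabetic content, rejecting heavy character repetition) are illustrative assumptions, not the paper's exact rules.

```python
import re

def passes_quality_checks(text: str, min_words: int = 3) -> bool:
    """Reject low-content or malformed candidate continuations.

    Illustrative heuristics only; the paper's exact filters may differ.
    """
    words = text.split()
    if len(words) < min_words:          # too short to advance the story
        return False
    if not re.search(r"[A-Za-z]", text):  # no alphabetic content at all
        return False
    # Reject degenerate repetition (one character dominating the line)
    if max(text.count(ch) for ch in set(text)) > 0.5 * len(text):
        return False
    return True

def filter_candidates(candidates):
    """Keep only candidates that pass all quality checks."""
    return [c for c in candidates if passes_quality_checks(c)]
```

Only the survivors of this filter are passed to the ranker, so the model never has to choose between a good line and a malformed one.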

Evaluation involved both objective and subjective measures. “Story continuation acceptability” compared the ranker’s top choice to human‑selected continuations, showing the tuned‑and‑ranked system outperforming untuned and unranked baselines. Varying the number of candidates revealed a clear quality‑latency trade‑off: more candidates improved acceptability but increased response time. The optimal point balanced narrative quality with the responsiveness needed for live interaction.

Subjective assessments adapted the ACUTE‑Eval dialogue evaluation framework to storytelling, asking participants to compare pairs of stories on engagingness, interestingness, humanness, and overall preference. In both self‑chat (AI talking to itself) and human‑AI sessions, the tuned‑and‑ranked model was consistently preferred over baselines.

An elicitation survey explored adaptation to Haru, a tabletop social robot with expressive eyes, a mouth display, and multiple degrees of freedom for emotive gestures. University participants favored collaborative storytelling over other modes, preferred short stories, and leaned toward positive moods and genres such as fantasy, mystery, and science fiction. They valued an emotive robot over a disembodied voice and wanted vocal delivery to match story content. While most felt emoting based on story events would enhance the experience, few rated Haru’s current expressivity as fully effective, highlighting an area for refinement.

The research also piloted sentiment steering, enabling players to influence story mood. Inspired by GeDi’s token‑level control, the team implemented a sentence‑level method combining the ranker’s score with a GPT‑2‑based sentiment classifier trained on TweetEval data. This approach successfully biased stories toward positive or negative sentiment without severely degrading coherence, and proved less prone to topic drift than token‑level steering. Human evaluations indicated that low‑weight positive steering offered the best balance of mood control and narrative quality.
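One simple way to realize this sentence‑level steering is to blend the ranker's score with the sentiment classifier's probability for the target mood, weighted by a steering coefficient. The additive combination and the function names below are assumptions for illustration; the paper describes the idea, not this exact formula.

```python
def steered_score(rank_score: float, target_sentiment_prob: float,
                  weight: float) -> float:
    """Blend ranker score with sentiment preference.

    weight = 0 disables steering; larger weights bias selection
    more strongly toward the target mood (illustrative formula).
    """
    return rank_score + weight * target_sentiment_prob

def pick_steered(candidates, rank_fn, sentiment_fn, weight=0.5):
    """Select the candidate maximizing the combined score."""
    return max(
        candidates,
        key=lambda c: steered_score(rank_fn(c), sentiment_fn(c), weight),
    )
```

Because selection still starts from the ranker's score, a low weight nudges the mood without discarding the most coherent continuations, matching the finding that low‑weight positive steering balanced mood control and narrative quality best.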

The work underscores both the promise and challenges of AI‑assisted storytelling in robotics. The system can generate contextually appropriate, creative contributions, but lacks deep world modeling, which can lead to abrupt shifts or inconsistencies. Explicit narrative structures and improved emotive delivery in robots like Haru could further enhance the experience. For engineers and roboticists, the study offers a blueprint for integrating large‑scale language models into interactive systems, balancing computational constraints with the nuanced demands of human‑facing creative tasks.
