NPC Conversations in the Metaverse 🚀

How in the humanverse do we build conversational characters for the metaverse?

You walk into a bar. 🍺 On your left, you see a lady on her last cigarette ranting about how her ex-husband needs to quit whining about how much the alimony is.

On your right, a man with a lewd neckerchief downs a bottle of Jack whilst idly playing with his six-shooter, unaware that the safety catch is off. 😱

And front-and-centre, the barman pours the fourth round of Moonshine for a group of lawyers put out of business by a recent court ruling that declared their profession immoral. 😧

You've walked into a virtual Western saloon, where all the characters have their own unique stories, and you, the grand hero-slash-villain-slash-whatever-you-want-to-be, will become an integral part of their journey.

🤠 Project Howdy: Charisma's creatively-named experience where you're dropped into a Western saloon, able to talk to the characters and immersed in their WestWorl- ahem, Western world.

At Charisma, we're experimenting with what it would feel like being able to move around in a 3D space and hold conversations with its inhabitants at will. Imagine walking into a world populated with intelligent characters, and being part of a fluid experience where characters, tension, and drama naturally ebb and flow.

🔥 The metaverse needs dramatic, charismatic characters and stories. 🔥

And whilst 1-to-1 interactions are great, we need to think bigger, where multiple players might talk to multiple characters within a single conversation. The metaverse will be a social hub, after all!

We think that immersive experiences will be the backbone of social entertainment in the metaverse. And they should be able to play out anywhere, because that's what the metaverse is: think cross-reality experiences involving VR, AR, robots, websites, mobile apps, games, museum exhibits, theme park installations, and so much more.

We're therefore excited to share our vision of Charisma being the best-in-class way to write and power social NPCs in the metaverse and beyond. 🙌

For drama, you need a dramatic platform

Many of the tools and platforms that exist today to build interactive conversational experiences are designed for Messenger-style interactions.

Tools like Dialogflow support disembodied conversations: conversations that don't operate in the world of the living but in a dull digital ether, where there is no concept of space or drama.

These platforms aren't concerned about the thriving, breathing world around the player: any environmental context is forgotten.

If the player throws a bar stool at a character, a disembodied character wouldn't react. And a chatbot with pre-scripted "small talk" would happily repeat "I'm sorry, I don't understand. How can I help you today?".

Drama, and more importantly, fun, are not often on the agenda for chatbot platforms! They're primarily designed for call centres to get users the help they need as quickly as possible, not to engage players in complex, weaving narratives.

We feel that the metaverse deserves a better platform to power its characters.

Adding a third dimension

Why are conversations in the 3D realm of the metaverse so different to chatbot conversations?

Exhibit A: a WhatsApp chat. The participants are explicit: they're the people that someone has invited to the conversation through the press of a button. The experience is asynchronous: there's no need for anyone to react immediately in order to progress the conversation. And the distinction between conversations is clear: there's UI separating your chat with Mum from your conversation with your partner (thank goodness).

But now imagine Exhibit B: a 3D environment. If you say something, that's broadcast around you and any number of people may or may not hear what you've said: the participants are implicit. If they've heard it, there's no guarantee that they'll react to it. But if they do react, they should do so within a small window for the context to still be relevant: the experience is synchronous. And multiple people could react and join the same conversation: the distinction between separate conversations is unclear.
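To make Exhibit B a little more concrete, here's a minimal sketch of implicit participants and a synchronous reaction window. Everything in it is an assumption for illustration: the names (`Character`, `Utterance`, `listeners`, `can_still_react`), the 2D positions, the hearing radius and the three-second window are all made up, and none of it is Charisma's actual API.

```python
import math
from dataclasses import dataclass

# Both values are assumptions for illustration, not tuned numbers.
HEARING_RADIUS = 8.0   # distance within which speech can be heard
REACTION_WINDOW = 3.0  # seconds during which a reply is still relevant


@dataclass
class Character:
    name: str
    x: float
    y: float


@dataclass
class Utterance:
    text: str
    x: float
    y: float
    spoken_at: float  # seconds on a shared clock


def listeners(utterance, characters, radius=HEARING_RADIUS):
    """Implicit participants: whoever is close enough to hear."""
    return [
        c for c in characters
        if math.hypot(c.x - utterance.x, c.y - utterance.y) <= radius
    ]


def can_still_react(utterance, now, window=REACTION_WINDOW):
    """Synchronous experience: a reply only makes sense inside the window."""
    return 0.0 <= now - utterance.spoken_at <= window


saloon = [
    Character("Barman", 0, 2),
    Character("Gunslinger", 3, 0),
    Character("Sheriff", 30, 30),  # far across town, out of earshot
]
hello = Utterance("Howdy!", 0, 0, spoken_at=10.0)
heard_by = [c.name for c in listeners(hello, saloon)]
```

Nobody was invited to this conversation: the Barman and the Gunslinger are in it simply because they were standing close enough, and either of them has only a short while to answer before the moment has passed.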

These fundamental differences mean that a dialogue engine powering a 3D world cannot be a traditional chatbot engine.

The engine itself must have a richer and deeper understanding of the space and its drama.

In a 3D environment, a player could walk up to any character or group of characters and intend to strike up a conversation. They could just as easily turn around and walk off after saying hello! In order to create a convincing immersive experience, there must be a framework for how to deal with this coherently. That's a unique challenge, and there are many questions that arise as a result:

  • What are the background characters doing? Should drama related to these characters advance if the player is not nearby to witness it?
  • What happens to characters when they go out of view or become unloaded? Do their simulations partially continue running (such as updating their emotions over time)?
  • What happens if a character decides to autonomously leave a conversation?
  • What happens if a character that's part of the core narrative isn't nearby?
  • Is a character always available to talk with?
  • How does a player join or leave a conversation? Should a conversation only be leave-able through a button press? If there's free movement, is it enough to "walk away"/"turn around"?
  • How does the experience know who the player is directing their conversation to?

And that's even before we start thinking about multiplayer...!

At Charisma we've been exploring these challenges deeply. We're devising principles for conversational entertainment, striking the balance between a totally scripted and a totally open-ended world. Control the player too much, and the experience feels constrictive, but open it up, and it's a technical minefield that's hard to make feel cohesive at all!

Also in the Charisma cookbook are best practices for player controls (including the camera, which can be surprisingly crucial for context!), user interfaces, character positioning and level design. These are all non-obvious but critical aspects which make a big difference to the quality of a 3D conversational experience.

🤠 Project Howdy: Involving multiple characters in a conversation is a challenge, but Charisma's tech can make it happen!

Technologically, we've adapted Charisma to support totally non-linear stories (what we call Game Stories) where story beats can be triggered dynamically, perhaps in response to a gameplay action or something specific the player said. This event-driven model is perfect for authors to tell more interesting, reactive stories with their beloved characters.
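As a rough illustration of what "event-driven" means here, the sketch below registers story beats against named events and fires them when the game reports that event. It is not how Game Stories are implemented: `StoryEvents`, its `on` and `trigger` methods, the event names and the dialogue lines are all hypothetical.

```python
from collections import defaultdict


class StoryEvents:
    """Hypothetical registry mapping game events to story beats."""

    def __init__(self):
        self._beats = defaultdict(list)
        self.fired = set()  # names of beats that have already played

    def on(self, event, beat_name, line, once=True):
        """Register a beat to play when `event` happens."""
        self._beats[event].append((beat_name, line, once))

    def trigger(self, event):
        """Return the lines of every beat that should play for `event`."""
        lines = []
        for beat_name, line, once in self._beats[event]:
            if once and beat_name in self.fired:
                continue  # one-shot beat that has already played
            self.fired.add(beat_name)
            lines.append(line)
        return lines


story = StoryEvents()
# A gameplay action and something the player said, both invented examples.
story.on("stool_thrown", "barman_angry", "Barman: Hey! Stools are for sitting!")
story.on("player_mentions_alimony", "lady_rant", "Lady: Don't get me started.")

first = story.trigger("stool_thrown")
second = story.trigger("stool_thrown")
```

The first thrown stool plays the Barman's beat and the second plays nothing, because the beat was registered as one-shot. The order in which beats play is decided by what the player does, not by a fixed script.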

But it's not just the text and speech that make a conversation. There's plenty more than that...

Charisma as a toolbox for multi-modal conversations

As humans, we are inherently multi-modal creatures, and use multiple modalities (forms of media) in communication. For example, visual, auditory and social cues all work together with the semantic content of text or speech to make our communication richer.

So why not extend this to avatars in the metaverse too?

That involves building technology that is able to enrich text or speech input with other cues from the player and the world:

  • Player gaze and attention
  • Player emotion, sentiment and tone
  • Interruption detection
  • Backchannel detection
  • Turn-taking management
  • Gesture detection
  • Player gameplay actions

This data can then be combined with the context of the interaction, the knowledge base of the characters, and the scripted story, to help the characters respond naturally.
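Here's a deliberately tiny sketch of how a few of those cues might sit alongside the player's words when a character decides how to react. It assumes the cues have already been detected somewhere upstream. The `PlayerTurn` fields, the sentiment scale, the rule order and the `respond` function are all invented for illustration, and a real behaviour engine would also draw on the character's knowledge and the scripted story.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayerTurn:
    """One player turn: the words plus hypothetical non-verbal cues."""
    text: str
    gaze_target: Optional[str] = None  # who the player is looking at
    sentiment: float = 0.0             # assumed scale: -1.0 hostile to 1.0 friendly
    interrupted: bool = False          # player spoke over the character
    gesture: Optional[str] = None      # e.g. "wave"
    action: Optional[str] = None       # gameplay action, e.g. "throw_stool"


def respond(character: str, turn: PlayerTurn) -> Optional[str]:
    """Pick a reaction for `character`, or None if they weren't addressed."""
    if turn.gaze_target != character:
        return None  # the player is talking to someone else
    if turn.action == "throw_stool":
        return f"{character} ducks and glares at you."
    if turn.interrupted:
        return f"{character}: Let me finish, partner."
    if turn.sentiment < -0.5:
        return f"{character}: Watch your tone."
    if turn.gesture == "wave":
        return f"{character} tips their hat."
    return f"{character}: Howdy."


turn = PlayerTurn("Howdy!", gaze_target="Barman", gesture="wave")
barman_reply = respond("Barman", turn)
lady_reply = respond("Lady", turn)
```

The same words get a different reaction depending on gaze, tone, gesture and gameplay: the Barman tips his hat to a friendly wave, while the Lady, who wasn't being looked at, stays out of it.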

We're developing technology in these areas to further advance Charisma as not just a dialogue engine, but a comprehensive behaviour engine for characters and virtual beings alike.

Our grand vision is to pull together best-in-class technology in a wide variety of domains to power characters in the metaverse and beyond. That includes lip-sync, text-to-speech, natural language processing tech and generative systems like GPT-3. By working together, we hope to delight, excite and engage players for years to come.

Find out more!

The metaverse is all about shared experience, and we'd love to share our experiences with you.

If this sounds interesting to you and you have a project idea that would benefit from Charisma, we'd love to hear from you!

Or if you're as fascinated by these subjects as we are, we're hiring! Even if there's not a role for you right now, we're always excited to talk to people on our wavelength about this fast-moving space. Check out our open roles here!