Hello!

As the title suggests, I am looking for a way to connect XTTS v2 to Perchance so that it reads the responses out loud. I have been running a solo DnD-style adventure and would like the characters and narrator to each have their own voice to make it more immersive.

SillyTavern has this exact feature, but it seems mostly geared towards ERP(?), which I’m not interested in. Does anyone know how to set up XTTS v2 to read out loud in Perchance, or whether it’s even possible? I would assume JavaScript could do it, but I have no clue how to use JavaScript.

Off topic: The miniature in the picture was sculpted by me and is one of the characters in my adventure. It’s free for anyone who has access to a 3D printer.

  • wthit56@lemmy.world
    22 days ago

    At a glance, XTTS is software that runs on your own computer, not a service you can call from any web page. So it would have to run on the Perchance server… which it does not.

  • Eric9082@lemmy.world
    17 days ago

    This is a feature that I’d love to see as well. My understanding (not much at present, but I’m learning) is that one could set up a local server with Node.js and, through the user code block in the AI chat, send the AI response text (along with the speaker’s name) to that server. Because that process runs locally on your machine, it could potentially invoke a local instance of XTTS and speak the text.
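
    A minimal sketch of the local-server half of that idea, assuming Node.js is installed. The `/speak` path, the `{ speaker, text }` payload shape, and the `speak` callback are all hypothetical names made up for this example; the actual XTTS call is left as a stub because it depends on how XTTS is installed locally. Nothing here has been tested against Perchance itself.

    ```javascript
    const http = require("node:http");

    // Validate the JSON payload sent from the chat page.
    // The { speaker, text } shape is an assumption for this sketch.
    function parseSpeechRequest(body) {
      const data = JSON.parse(body);
      if (typeof data.speaker !== "string" || typeof data.text !== "string") {
        throw new Error("expected { speaker, text } as strings");
      }
      return { speaker: data.speaker.trim(), text: data.text.trim() };
    }

    // Build (but do not start) a server that hands each request to `speak`.
    // `speak` is a hypothetical callback where a local XTTS call would go.
    function createTtsServer(speak) {
      return http.createServer((req, res) => {
        // CORS headers so a browser page is allowed to call this local server.
        res.setHeader("Access-Control-Allow-Origin", "*");
        res.setHeader("Access-Control-Allow-Headers", "Content-Type");
        if (req.method === "OPTIONS") {
          res.writeHead(204);
          res.end();
          return;
        }
        if (req.method !== "POST" || req.url !== "/speak") {
          res.writeHead(404);
          res.end();
          return;
        }
        let body = "";
        req.on("data", (chunk) => { body += chunk; });
        req.on("end", async () => {
          try {
            await speak(parseSpeechRequest(body));
            res.writeHead(204); // accepted, nothing to send back
          } catch (err) {
            res.writeHead(400); // bad JSON, wrong shape, or speak() failed
          }
          res.end();
        });
      });
    }
    ```

    To try it, something like `createTtsServer(({ speaker, text }) => console.log(speaker, text)).listen(5005, "127.0.0.1")` would start it on an arbitrary port. The chat side would then send `fetch("http://127.0.0.1:5005/speak", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ speaker, text }) })`. Whether the browser and the Perchance page permit that request to a local address is an open question here.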

    This is conjecture on my part; I’ve been making some progress integrating per-message JS in the chat. Right now, for TTS, I’d like a way to separate narrator dialog, action text, and speaker text so that the TTS doesn’t simply say the entire message in one voice. For example:

    "The sheriff walked slowly into the room. ‘Everyone freeze! I’m looking for Bad Bart’"

    I’d like the AI to somehow separate this text so that the narrator voice speaks the narrator part of the message and the character, with a different voice, speaks the character’s line. That would mean invoking the TTS engine twice for one message.
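
    If the AI can be prompted to wrap spoken lines in double quotes, the split could also be done in plain JavaScript instead of by the AI. The sketch below assumes exactly that convention (`"..."` or `“...”`); single quotes are deliberately not handled, because a closing ’ cannot be told apart from the apostrophe in a word like “I’m”. The function name and the `voice` labels are made up for illustration.

    ```javascript
    // Split a message into narrator and character segments.
    // Assumption: spoken lines are wrapped in double quotes ("..." or “...”).
    function splitSegments(message) {
      const segments = [];
      const quoted = /["“]([^"”]*)["”]/g;
      let last = 0; // index just past the previous quoted span
      for (const match of message.matchAll(quoted)) {
        // Anything between quoted spans is treated as narration.
        const before = message.slice(last, match.index).trim();
        if (before) segments.push({ voice: "narrator", text: before });
        const spoken = match[1].trim();
        if (spoken) segments.push({ voice: "character", text: spoken });
        last = match.index + match[0].length;
      }
      // Trailing text after the last quote is narration too.
      const rest = message.slice(last).trim();
      if (rest) segments.push({ voice: "narrator", text: rest });
      return segments;
    }
    ```

    Each segment could then be sent to the TTS engine separately, one call per segment, with the voice chosen from the `voice` field. A message with an unmatched quote falls through as a single narrator segment.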

    It will certainly take me months to get to anything workable, but luckily the technology will improve over time as well, perhaps making it easier.