Keen to put the latest generative AI tools to the test, I created this animated avatar capable of delivering medication counselling in a realistic and professional-sounding manner.
Disclaimer: This video is purely for demo purposes as part of a personal hobby project and is not intended to provide any form of medical advice. This is not a real product, and views expressed are my own.
Click the thumbnail below (or visit this link) to watch the demo video and see Macy in action, where she briefly talks about two commonly prescribed medications.
The outcome is pretty impressive and will only get better over time, given the speed at which generative AI is improving. Creating this demo cost me nothing and took only 25 minutes.
Here are the tools I used:
We need a face to represent our avatar, and we can use image generation tools like Midjourney to do just that.
Midjourney is an AI service, developed by the independent research lab Midjourney, Inc., that creates images from textual descriptions. It offered a free trial when I made this demo.
Setup:
newbies-24
Type /imagine followed by your description prompt. For example, the prompt I used was "high-quality upper body professional photo of a female Chinese pharmacist in a white lab coat with a pharmacy background". Press Enter after typing it in, and give Midjourney some time to generate the images.
I also tried other tools like DALL-E and Stable Diffusion but their results were not realistic enough (e.g., misaligned eyes and facial features).
NOTE: Midjourney has paused its free trial program as of April 2023. To generate realistic facial pictures, you can use either of the following:
We need a counselling script that can give relevant advice on a set of medications. To do that, we can use ChatGPT.
ChatGPT is a chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3.5 family of large language models.
Setup:
I had to regenerate the response several times because some of the output was overly theoretical and academic, rather than in the plain-language format expected when explaining medications to patients.
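One way to cut down on regenerations is to spell out the audience and tone in the prompt itself. The sketch below is only an illustration of that idea, not the prompt I actually used: the function name `build_counselling_prompt`, the wording, the word limit, and the example drug names are all assumptions.

```python
def build_counselling_prompt(medications, max_words=150):
    """Build a ChatGPT prompt that asks for a plain-language counselling script.

    The wording and the word limit are illustrative assumptions,
    not the prompt used for the demo.
    """
    if not medications:
        raise ValueError("at least one medication is required")
    drug_list = ", ".join(medications)
    return (
        "Act as a friendly community pharmacist speaking to a patient. "
        f"Write a spoken counselling script covering: {drug_list}. "
        "Use simple everyday language, avoid medical jargon, and for each "
        "medication explain what it is for, how to take it, and common side effects. "
        f"Keep each medication to at most {max_words} words."
    )


if __name__ == "__main__":
    # Placeholder drug names; swap in the medications you want covered.
    print(build_counselling_prompt(["Amlodipine", "Metformin"]))
```

Paste the printed text into ChatGPT and adjust the constraints (length, tone, reading level) if the output still sounds too academic.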
Next, we want to convert the ChatGPT script text into a natural-sounding audio clip. We can do so with free tools like Prime Voice AI (by ElevenLabs).
Prime Voice AI is a realistic and versatile AI speech software that brings the most compelling, rich and lifelike voices to creators and publishers seeking the ultimate tools for storytelling.
Setup:
I chose the premade/Domi voice, as I found it to be the most lively and natural. Settings such as stability and clarity can also be adjusted.
I shortened the script slightly by removing the section on the drug Amlodipine because I did not want the demo to be too long.
There is a credit limit for the free account, so use your credits wisely for the audio you want to generate.
Download and save the .mp3 file (titled 'synthesized_audio.mp3') to your local machine.
Bonus Tip: ElevenLabs also offers voice cloning (under the Voice Lab feature): https://beta.elevenlabs.io/voice-lab. If you have a recording of more than one minute of a particular voice, you can clone that voice and have the script read out in it.
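I did everything above through the web interface, but the same steps (trimming the script, watching the credit usage, and requesting the audio) could be scripted. The following is a rough sketch rather than an official client. It assumes the ElevenLabs REST endpoint `POST /v1/text-to-speech/{voice_id}` with an `xi-api-key` header and a `voice_settings` object, that the "clarity" slider corresponds to `similarity_boost`, and that credits are counted per character. Please check all of these against the current ElevenLabs documentation. The helper names, the voice ID, and the API key are placeholders.

```python
import json
import urllib.request

# Assumed endpoint; verify against the ElevenLabs API documentation.
API_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def drop_sections(script, keyword):
    """Remove every blank-line-separated paragraph that mentions `keyword`."""
    paragraphs = [p.strip() for p in script.split("\n\n") if p.strip()]
    kept = [p for p in paragraphs if keyword.lower() not in p.lower()]
    return "\n\n".join(kept)


def build_tts_request(text, voice_id, api_key, stability=0.5, similarity_boost=0.75):
    """Build (but do not send) the text-to-speech HTTP request."""
    body = json.dumps({
        "text": text,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,  # assumed to match "clarity"
        },
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL.format(voice_id=voice_id),
        data=body,
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def synthesize(request, out_path="synthesized_audio.mp3"):
    """Send the request and save the returned audio bytes to `out_path`."""
    with urllib.request.urlopen(request) as resp, open(out_path, "wb") as f:
        f.write(resp.read())


if __name__ == "__main__":
    script = "Amlodipine lowers blood pressure.\n\nTake your other tablet with food."
    trimmed = drop_sections(script, "Amlodipine")
    print(f"{len(trimmed)} characters will be sent")  # rough proxy for credit usage
    req = build_tts_request(trimmed, "YOUR_VOICE_ID", "YOUR_API_KEY")
    # synthesize(req)  # uncomment once a real voice ID and API key are set
```

Keeping request building separate from sending makes it easy to check the character count before any credits are spent.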
Lastly, it is time to piece the pharmacist image and counselling audio together into a photorealistic video. To do so, we can use tools like D-ID.
D-ID’s creative AI technology takes images of faces and turns them into high-quality, photorealistic videos. At the click of a button, it can combine images with audio or text to give them expression and speech.
Setup:
Upload the audio file in the Upload your own voice section on the right.
Click the Generate Video button at the top right and wait for your masterpiece to be ready for download!