Check out the demo video to see AutoTalker in action!
Input Prompt
"Explain python and their applications in 30 second"
Input Image

Output Video
In the rapidly evolving landscape of the 21st century, a comprehensive education is paramount for preparing students with the skills necessary to thrive in modern society. The Apprentice Project (TAP) is dedicated to cultivating these essential 21st-century skills among underserved children enrolled in government or low-income private schools.
TAP operates under the umbrella of the MentorMe Foundation, a Section 8 registered company, and is proudly supported by esteemed institutions such as Harvard University, IIM Bangalore, and the Nudge Foundation. As official partners with the Governments of Maharashtra and Delhi, TAP has a significant impact, reaching over 31,000 children through its innovative chatbot.
A staggering number of middle and high school students—over 100 million—from low-income communities across India lack critical 21st-century skills, including Social & Emotional Learning (SEL) and Financial Literacy. The traditional exam-centric public education system exacerbates this issue, leading to the alarming statistic that 1 in 2 children graduating from the Indian education system is deemed unemployable due to the absence of these crucial skills.
TAP aligns its mission with several of the UN Sustainable Development Goals (SDGs).
The Apprentice Project (TAP), operating under the MentorMe Foundation, empowers underserved students through TAP Buddy—an artificial intelligence-powered WhatsApp chatbot. TAP Buddy offers video-based electives, guiding students through independent projects using personalized (ML-learned) and AI bot-based nudges and content. Self-learning project videos foster skills such as creativity, confidence, self-awareness, communication, and problem-solving, breaking mental barriers and instilling a growth mindset.
As the usage of TAP's chatbot continues to grow, the project faces challenges and seeks innovative solutions:
Course Creation: Leveraging AI to generate content across various electives such as coding and visual arts, overcoming the manual time constraints that limit bulk video creation.
Personalized Learning: Employing AI to create personalized coding tutorials or art project guides tailored to individual learning styles and skill levels. Advanced ML/OpenAI analysis adapts content based on a learner's progress, ensuring a customized learning experience.
Content Creation: Utilizing AI to generate code snippets, templates, or design ideas for art projects, guiding students at their skill levels and suggesting exploration options.
Artistic Exploration: Recommending techniques and styles based on a child's skill level, broadening artistic horizons by comparing their work to famous artists or art movements.
Creative Coding: Using AI to brainstorm ideas and provide inspiration for innovative and artistic coding projects.
My approach to addressing the challenges faced by TAP involves leveraging cutting-edge technologies, including natural language processing (NLP), artificial intelligence (AI), and machine learning (ML), to develop AutoTalker—a component of TAP aimed at enhancing the educational experience for students.
AutoTalker utilizes advanced AI models and libraries, such as Suno Bark TTS for text-to-speech conversion, Google's generative AI Python SDK (Gemini Pro) for text generation, and SadTalker for lip-syncing audio with facial movements in videos. By integrating these technologies, AutoTalker enables the creation of engaging and informative video content from text prompts and images.
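To make that flow concrete, here is a minimal, illustrative sketch of the text-to-speech-to-video pipeline. The file names, prompt, and the SadTalker command-line invocation are assumptions for illustration; the exact wiring in main.py may differ.

```python
# Illustrative pipeline sketch (not the exact main.py wiring).
import subprocess

import google.generativeai as genai
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

# 1) Gemini Pro turns the prompt into a short script.
genai.configure(api_key="YOUR_GEMINI_PRO_KEY")
script = genai.GenerativeModel("gemini-pro").generate_content(
    "Explain Python and its applications in 30 seconds"
).text

# 2) Bark converts the script to speech and saves it as a .wav file.
preload_models()
write_wav("speech.wav", SAMPLE_RATE, generate_audio(script))

# 3) SadTalker lip-syncs the audio onto the input face image
#    (paths and flags here are assumptions based on SadTalker's inference.py).
subprocess.run([
    "python", "SadTalker/inference.py",
    "--driven_audio", "speech.wav",
    "--source_image", "face.png",
    "--result_dir", "results",
], check=True)
```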
Furthermore, the project incorporates features like personalized learning, content creation assistance, and language support to cater to diverse learning needs and preferences. By harnessing the power of AI, AutoTalker empowers educators and students alike to access high-quality educational content tailored to their individual requirements, thereby fostering the development of essential 21st-century skills.
Through this innovative solution, TAP aims to revolutionize the education landscape, bridging the gap in access to quality learning resources and empowering students from underserved communities to realize their full potential in the digital age.
The project focuses on leveraging technology to create new courses, personalize existing ones, and enhance the assessment process, ultimately contributing to the development of 21st-century skills in students. AutoTalker, a component of TAP, showcases the capabilities of AI in generating lip-synced videos from text prompts and images, enhancing the overall educational experience for students.
It utilizes several libraries, including Google's generative AI Python SDK (Gemini Pro) for text generation, Suno Bark for text-to-speech, SadTalker for talking-head video generation, Whisper for transcription, Pedalboard for audio enhancement, MoviePy for subtitle rendering, and Gradio for the web UI.
These features collectively contribute to the generation of lip-synced videos from input text prompts and images, with support for various languages and subtitles in English.
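As an illustration of the English subtitle support (and of why ImageMagick is required for MoviePy in the prerequisites below), captions can be burned onto the generated video roughly as follows. The timings, styling, and file names are placeholders, and the snippet assumes the MoviePy 1.x API; the project's subtitles.py may do this differently.

```python
# Hedged example: overlay English captions on the generated video (MoviePy 1.x API).
from moviepy.editor import CompositeVideoClip, TextClip, VideoFileClip

video = VideoFileClip("results/output.mp4")  # assumed output path

# (start, end, text) tuples, e.g. taken from Whisper transcription segments.
segments = [
    (0.0, 2.5, "Python is a versatile programming language."),
    (2.5, 5.0, "It is used in web development, data science, and AI."),
]

captions = [
    TextClip(text, fontsize=36, color="white", bg_color="black")  # TextClip needs ImageMagick
    .set_start(start)
    .set_end(end)
    .set_position(("center", "bottom"))
    for start, end, text in segments
]

CompositeVideoClip([video, *captions]).write_videofile("results/output_subtitled.mp4")
```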
Python 3.10.6
API key from Google AI.
ffmpeg installed.
PyTorch installed. Ensure your system supports CUDA.
ImageMagick installed. This is required for MoviePy.
SadTalker installed.
Note: Ensure your GPU has a minimum of 4 GB VRAM with support for CUDA.
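A quick optional check with PyTorch's standard CUDA queries can confirm this GPU requirement before you start:

```python
# Optional sanity check: confirm a CUDA-capable GPU with enough VRAM is visible.
import torch

assert torch.cuda.is_available(), "No CUDA-capable GPU detected"
vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"GPU: {torch.cuda.get_device_name(0)}, VRAM: {vram_gb:.1f} GB")  # expect >= 4 GB
```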
Install Python 3.10.6:
Install ffmpeg:
Install ImageMagick:
Clone the AutoTalker repository:
```bash
git clone https://github.com/Pmking27/AutoTalker
cd AutoTalker
```

Download SadTalker with Models and Weights:

```bash
python download_models.py
```

Run the above command and wait until it shows "Downloads completed." This will download SadTalker along with the required models and weights.

Create a virtual environment:

```bash
python -m venv venv
```

Activate the virtual environment:

```bash
# Linux/macOS
source venv/bin/activate

# Windows
.\venv\Scripts\activate
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Install PyTorch with CUDA:

```bash
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118
```

Now you have successfully set up the environment for the project, ensuring your GPU meets the specified requirements.
The project has the following structure:
```
.
├── checkpoints # Model checkpoints (SadTalker)
│ ├── _MACOSX
│ ├── mapping_00109-model.pth.tar
│ ├── mapping_00229-model.pth.tar
│ ├── SadTalker_V0.0.2_256.safetensors
│ └── SadTalker_V0.0.2_512.safetensors
├── gfpgan_weights # Weights for GFPGAN enhancer
│ ├── _MACOSX
│ ├── alignment_WFLW_4HG.pth
│ ├── detection_Resnet50_Final.pth
│ ├── GFPGANv1.4.pth
│ └── parsing_parsenet.pth
├── SadTalker # Folder containing SadTalker code
│ ├── app_sadtalker.py
│ ├── cog.yaml
│ ├── inference.py
│ ├── launcher.py
│ ├── LICENSE
│ ├── predict.py
│ ├── quick_demo.ipynb
│ ├── README.md
│ ├── req.txt
│ ├── requirements.txt
│ ├── requirements3d.txt
│ ├── webui.bat
│ └── webui.sh
├── venv # Virtual environment folder
├── download_models.py # Models download script
├── main.py # Main Python script
├── requirements.txt # All required dependencies list txt file
├── subtitles.py # Audio enhancing and subtitle creation script
└── tts.py # Text-to-speech script that creates .wav files
```
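For orientation, the role of subtitles.py can be sketched roughly as follows, using Pedalboard for audio enhancement and Whisper for transcription (both acknowledged at the end of this document). The effect chain, model size, and file names are illustrative assumptions, not the script's exact values.

```python
# Rough sketch of subtitles.py's job: enhance the TTS audio, then transcribe it.
import whisper
from pedalboard import Compressor, Gain, NoiseGate, Pedalboard
from pedalboard.io import AudioFile

# Clean up the raw Bark output with a simple effect chain (assumed settings).
board = Pedalboard([NoiseGate(), Compressor(), Gain(gain_db=3)])
with AudioFile("speech.wav") as f:
    audio = f.read(f.frames)
    samplerate = f.samplerate
with AudioFile("speech_enhanced.wav", "w", samplerate, audio.shape[0]) as f:
    f.write(board(audio, samplerate))

# Transcribe the enhanced audio into timestamped English segments for subtitles.
segments = whisper.load_model("base").transcribe("speech_enhanced.wav")["segments"]
for seg in segments:
    print(f"{seg['start']:.2f} --> {seg['end']:.2f}: {seg['text'].strip()}")
```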
Activate Virtual Environment:
Configure GEMINI PRO API Key:
Open the main.py file and locate genai.configure(api_key="add your key here"). Replace "add your key here" with your actual GEMINI PRO API key.

Run Main Script and Gradio Web UI:

Do not remove the Gradio interface launch (the iface.launch() part) from the script. Run AutoTalker and launch Gradio:

```bash
python main.py
```

Access Gradio Web UI:

Open the local URL printed in the terminal (Gradio serves at http://127.0.0.1:7860 by default) in your browser.
Explore the Interface:
Submit and Wait:
Review Output:
Explore Subtitles (If Enabled):
Repeat and Experiment:
Close Gradio UI:
By following these combined steps, you can seamlessly run AutoTalker, interact with the Gradio web UI, and experience the generated lip-synced videos.
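For readers curious how the pieces fit together behind that UI, here is a hedged sketch of a main.py-style Gradio interface. The input fields, labels, and the generate() signature are assumptions; the repository's actual interface may differ.

```python
# Sketch of a Gradio interface in the spirit of main.py (assumed inputs/outputs).
import gradio as gr

def generate(prompt, face_image, add_subtitles):
    # 1) Gemini Pro writes the script, 2) Bark speaks it,
    # 3) SadTalker lip-syncs it onto face_image (see the pipeline sketch above).
    ...
    return "results/output.mp4"  # path to the finished video

iface = gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label="Prompt"),
        gr.Image(type="filepath", label="Face image"),
        gr.Checkbox(label="Add English subtitles"),
    ],
    outputs=gr.Video(label="Lip-synced video"),
)

iface.launch()  # serves the web UI, by default at http://127.0.0.1:7860
```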
We appreciate your interest in contributing to our project! To ensure a smooth and collaborative experience, please follow these guidelines:
Fork the Repository:
Clone the Repository:
```bash
git clone https://github.com/YourUsername/AutoTalker.git
```

Create a Branch:

```bash
git checkout -b feature/your-feature-name
```

Make Changes:

Commit Changes:

```bash
git commit -m "Add your commit message here"
```

Push Changes:

```bash
git push origin feature/your-feature-name
```

Create Pull Request:
Review and Collaborate:
Squash Commits (if needed):
Merge:
Areas Needing Help: Human-Like TTS Implementation
If you're interested in making a significant impact, consider contributing to the implementation of Human-Like Text-to-Speech (TTS) for a diverse set of languages, including Indian regional languages. Focus on enhancing TTS capabilities for both male and female voices.
Given the diverse linguistic landscape in India, contributions to support Indian regional languages in TTS are highly valued; these may include, but are not limited to, the many regional languages spoken across the country.
Your efforts in implementing TTS for these languages will significantly contribute to making educational content accessible to a broader audience, particularly in regions with diverse linguistic backgrounds.
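As one possible starting point (a sketch, not the project's current approach), Bark already ships multilingual speaker presets that could seed regional-language support before custom voices are trained; the Hindi preset name below is an assumed example from Bark's published speaker library.

```python
# Hedged example: generate Hindi speech with an assumed Bark speaker preset.
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()
hindi_audio = generate_audio(
    "पाइथन एक बहुमुखी प्रोग्रामिंग भाषा है।",   # "Python is a versatile programming language."
    history_prompt="v2/hi_speaker_1",          # assumed preset name; check Bark's speaker library
)
write_wav("speech_hi.wav", SAMPLE_RATE, hindi_audio)
```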
Thank you for considering these important contributions to the Human-Like TTS implementation! Your work will play a vital role in making educational content inclusive and accessible to learners from various linguistic backgrounds.
This project is licensed under the MIT License.
This project acknowledges the following open-source projects and their contributors:
Google AI Python SDK: The Google AI Python SDK enables developers to use Google's state-of-the-art generative AI models (like Gemini and PaLM) to build AI-powered features and applications.
SadTalker: [CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation. A project by OpenTalker.
Pedalboard: A Python library for working with audio, developed by Spotify.
Whisper: Robust Speech Recognition via Large-Scale Weak Supervision, an open-source project by OpenAI.
Transformers by Hugging Face: State-of-the-art machine learning for PyTorch, TensorFlow, and JAX.
Accelerate by Hugging Face: A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision.
Optimum by Hugging Face: Accelerate training and inference of Transformers and Diffusers with easy-to-use hardware optimization tools.
Bark by Suno AI: Text-prompted generative audio model.
PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration.
These projects have significantly contributed to the development and functionality of AutoTalker, and we extend our gratitude to their respective developers and maintainers.