The Principal Dev โ€“ Masterclass for Tech Leads

The Principal Dev โ€“ Masterclass for Tech LeadsJuly 17-18

Join
VideoLingo Logo

Connect the World, Frame by Frame

Huanshere%2FVideoLingo | Trendshift

English๏ฝœ็ฎ€ไฝ“ไธญๆ–‡๏ฝœ็น้ซ”ไธญๆ–‡๏ฝœๆ—ฅๆœฌ่ชž๏ฝœEspaรฑol๏ฝœะ ัƒััะบะธะน๏ฝœFranรงais

๐ŸŒŸ Overview (Try VL Now!)

VideoLingo is an all-in-one video translation, localization, and dubbing tool aimed at generating Netflix-quality subtitles. It eliminates stiff machine translations and multi-line subtitles while adding high-quality dubbing, enabling global knowledge sharing across language barriers.

Key features:

Difference from similar projects: Single-line subtitles only, superior translation quality, seamless dubbing experience

๐ŸŽฅ Demo

Dual Subtitles


https://github.com/user-attachments/assets/a5c3d8d1-2b29-4ba9-b0d0-25896829d951

Cosy2 Voice Clone


https://github.com/user-attachments/assets/e065fe4c-3694-477f-b4d6-316917df7c0a

GPT-SoVITS with my voice


https://github.com/user-attachments/assets/47d965b2-b4ab-4a0b-9d08-b49a7bf3508c

Language Support

Input Language Support(more to come):

๐Ÿ‡บ๐Ÿ‡ธ English ๐Ÿคฉ | ๐Ÿ‡ท๐Ÿ‡บ Russian ๐Ÿ˜Š | ๐Ÿ‡ซ๐Ÿ‡ท French ๐Ÿคฉ | ๐Ÿ‡ฉ๐Ÿ‡ช German ๐Ÿคฉ | ๐Ÿ‡ฎ๐Ÿ‡น Italian ๐Ÿคฉ | ๐Ÿ‡ช๐Ÿ‡ธ Spanish ๐Ÿคฉ | ๐Ÿ‡ฏ๐Ÿ‡ต Japanese ๐Ÿ˜ | ๐Ÿ‡จ๐Ÿ‡ณ Chinese* ๐Ÿ˜Š

*Chinese uses a separate punctuation-enhanced whisper model, for now...

Translation supports all languages, while dubbing language depends on the chosen TTS method.

Installation

You don't have to read the whole docs, here is an online AI agent to help you.

Note: For Windows users with NVIDIA GPU, follow these steps before installation:

  1. Install CUDA Toolkit 12.6
  2. Install CUDNN 9.3.0
  3. Add C:\Program Files\NVIDIA\CUDNN\v9.3\bin\12.6 to your system PATH
  4. Restart your computer

Note: FFmpeg is required. Please install it via package managers:

  1. Clone the repository
git clone https://github.com/Huanshere/VideoLingo.git
cd VideoLingo
  1. Install dependencies(requires python=3.10)
conda create -n videolingo python=3.10.0 -y
conda activate videolingo
python install.py
  1. Start the application
streamlit run st.py

Docker

Alternatively, you can use Docker (requires CUDA 12.4 and NVIDIA Driver version >550), see Docker docs:

docker build -t videolingo .
docker run -d -p 8501:8501 --gpus all videolingo

APIs

VideoLingo supports OpenAI-Like API format and various TTS interfaces:

Note: VideoLingo works with 302.ai - one API key for all services (LLM, WhisperX, TTS). Or run locally with Ollama and Edge-TTS for free, no API needed!

For detailed installation, API configuration, and batch mode instructions, please refer to the documentation: English | ไธญๆ–‡

Current Limitations

  1. WhisperX transcription performance may be affected by video background noise, as it uses wav2vac model for alignment. For videos with loud background music, please enable Voice Separation Enhancement. Additionally, subtitles ending with numbers or special characters may be truncated early due to wav2vac's inability to map numeric characters (e.g., "1") to their spoken form ("one").

  2. Using weaker models can lead to errors during intermediate processes due to strict JSON format requirements for responses. If this error occurs, please delete the output folder and retry with a different LLM, otherwise repeated execution will read the previous erroneous response causing the same error.

  3. The dubbing feature may not be 100% perfect due to differences in speech rates and intonation between languages, as well as the impact of the translation step. However, this project has implemented extensive engineering processing for speech rates to ensure the best possible dubbing results.

  4. Multilingual video transcription recognition will only retain the main language. This is because whisperX uses a specialized model for a single language when forcibly aligning word-level subtitles, and will delete unrecognized languages.

  5. Cannot dub multiple characters separately, as whisperX's speaker distinction capability is not sufficiently reliable.

๐Ÿ“„ License

This project is licensed under the Apache 2.0 License. Special thanks to the following open source projects for their contributions:

whisperX, yt-dlp, json_repair, BELLE

๐Ÿ“ฌ Contact Me

โญ Star History

Star History Chart


If you find VideoLingo helpful, please give me a โญ๏ธ!

Join libs.tech

...and unlock some superpowers

GitHub

We won't share your data with anyone else.