The Simple Way to Generate Subtitles

A good, though not perfect, way to get subtitles for movies and anime.

Introducing Whisper

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English. We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing.

Whisper is an open-source model from OpenAI (the description above is OpenAI's own), and whisper.cpp is a high-performance inference implementation of it. whisper.cpp supports Apple Silicon and Core ML and reduces memory usage, which makes it a good fit for Mac users and low-end devices such as a NAS.

whisper.cpp is a powerful tool, and generating subtitles is just one narrow use case. You can use it for much more.

Here is an example of setting up whisper.cpp on a Mac.

Installation

Clone the repository:

git clone https://github.com/ggerganov/whisper.cpp.git

Install Python dependencies needed for the creation of the Core ML model:

pip install ane_transformers
pip install openai-whisper
pip install coremltools

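Generate a Core ML model. I recommend the medium model. The usage step also expects the ggml weights at models/ggml-medium.bin; if you do not have them yet, the repository's ./models/download-ggml-model.sh medium script fetches them.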

./models/generate-coreml-model.sh medium

Build whisper.cpp with Core ML support:

make clean
WHISPER_COREML=1 make -j
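
Before moving on, it can help to confirm that the model files are where whisper.cpp will look for them. The following is a minimal sketch, not part of whisper.cpp: the helper name missing_model_files is hypothetical, and the file names assume the medium model and the repository's usual naming (ggml-medium.bin for the weights and ggml-medium-encoder.mlmodelc for the Core ML encoder).

```python
from pathlib import Path


def missing_model_files(models_dir, model="medium"):
    """Return the expected model files that are not present in models_dir.

    Hypothetical helper: the file names are assumptions based on the
    naming whisper.cpp uses for the ggml weights and the Core ML encoder.
    """
    models_dir = Path(models_dir)
    expected = [
        models_dir / f"ggml-{model}.bin",               # ggml weights passed to -m
        models_dir / f"ggml-{model}-encoder.mlmodelc",  # Core ML encoder (a directory)
    ]
    return [path for path in expected if not path.exists()]
```

Calling missing_model_files("models") from the whisper.cpp directory should return an empty list once both steps above have finished.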

Usage

Convert your video to a WAV file (skip this step if you already have a 16 kHz WAV file). Note that whisper.cpp only accepts WAV input, so any other format has to be converted first. I recommend ffmpeg.

ffmpeg -i '/your/filePath/file.mp4'  -ar 16k output.wav
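
If you want to verify the converted file before transcribing, Python's standard wave module can read the header. This is a small sketch under the assumption that whisper.cpp wants 16-bit PCM at 16 kHz; the function names wav_info and is_whisper_ready are made up for illustration.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, bits_per_sample) of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth() * 8


def is_whisper_ready(path, expected_rate=16000):
    """True if the file is 16-bit PCM at the expected sample rate (assumed 16 kHz)."""
    rate, _channels, bits = wav_info(path)
    return rate == expected_rate and bits == 16
```

For example, is_whisper_ready("output.wav") should be true for a file produced by the ffmpeg command above.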

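Run whisper.cpp. The -l flag takes the language spoken in the audio (for example, ja for Japanese), and -osrt writes the subtitles to output.wav.srt.

./main -l 'your_audio_language' -m models/ggml-medium.bin -osrt -f output.wav -t 8 -nf -mc 5 -bs 1 -bo 10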
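
The command packs a lot of flags into one line. As a rough sketch, the same invocation can be built from Python, which leaves room to comment on each flag. The function names build_whisper_command and transcribe are hypothetical, the flag descriptions paraphrase the help text of ./main, and the values are simply the ones used in this post.

```python
import subprocess


def build_whisper_command(wav_path, language, model="models/ggml-medium.bin", threads=8):
    """Build the argument list for the whisper.cpp ./main binary."""
    return [
        "./main",
        "-l", language,      # language spoken in the audio
        "-m", model,         # path to the ggml model
        "-osrt",             # write the result to <input>.srt
        "-f", wav_path,      # input WAV file
        "-t", str(threads),  # number of threads
        "-nf",               # no temperature fallback while decoding
        "-mc", "5",          # max number of text context tokens to keep
        "-bs", "1",          # beam size
        "-bo", "10",         # number of best candidates to keep
    ]


def transcribe(wav_path, language):
    """Run whisper.cpp and return the path of the generated SRT file."""
    subprocess.run(build_whisper_command(wav_path, language), check=True)
    return wav_path + ".srt"
```

Passing the arguments as a list avoids shell quoting problems when the file path contains spaces.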

Speed depends on your device. On my M1 Max MacBook Pro, generating subtitles for a one-hour video takes around 10 minutes.

Translate your subtitles

To translate the subtitles, I wrote a simple Python script that uses the DeepL API. You will need to sign up for a DeepL account and get an API key; a free plan is available.

You also need to install the deepl Python package.

pip install deepl

Here is the script:

import sys
import threading
from pathlib import Path

import deepl


class Translator:
    def __init__(self, auth_key, max_threads_num=80):
        self.translator = deepl.Translator(auth_key)
        self.max_threads_num = max_threads_num  # upper bound on concurrent requests
        self.progress = 0
        self.total = 0
        self.lock = threading.Lock()

    def trans_to_chinese(self, line, index, temp):
        # Translate one subtitle line. If the request fails, keep the original
        # line so the output file never contains "None".
        try:
            # Translate to Chinese; change target_lang for another language.
            result = self.translator.translate_text(line.strip(), target_lang="ZH")
            translated = result.text + '\n'
        except deepl.DeepLException:
            translated = line
        with self.lock:
            temp[index] = translated
            self.progress += 1
            self.update_progress()

    def update_progress(self):
        percent = format(self.progress / self.total * 100, '.1f')
        print('\r' + percent + '%', end='')

    def translate_srt(self, path):
        src = Path(path)
        # movie.srt -> movie.zh.srt, in the same directory
        out_path = src.with_name(src.stem + '.zh.srt')

        with open(src, 'r', encoding='utf-8') as my_file:
            lines = my_file.readlines()
        self.total = len(lines)
        temp = [None] * len(lines)
        threads = []

        for index, line in enumerate(lines):
            # Blank lines, cue numbers and timestamps are copied unchanged.
            # Caveat: subtitle text that starts with a digit is also skipped.
            if not line.strip() or line[0].isdigit():
                temp[index] = line
                with self.lock:
                    self.progress += 1
                    self.update_progress()
            else:
                t = threading.Thread(target=self.trans_to_chinese, args=(line, index, temp))
                t.start()
                threads.append(t)
                # Wait for the current batch before starting more requests.
                if len(threads) >= self.max_threads_num:
                    for thread in threads:
                        thread.join()
                    threads.clear()

        for thread in threads:
            thread.join()

        with open(out_path, 'w', encoding='utf-8') as file:
            file.writelines(temp)
        print()  # end the progress line
        print('Translate finished:', out_path)

    def get_path(self, url):
        if url.endswith('.srt'):
            print('Translating...')
            self.progress = 0
            self.translate_srt(url)
        else:
            print('File format error!')


if __name__ == '__main__':
    auth_key = "your_api_key_here"
    translator = Translator(auth_key)
    translator.get_path(sys.argv[1])
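
The script decides what to translate by checking whether a line starts with a digit, and it sends one request per line. A possible alternative, sketched here and not tested against the real API in this post, is to parse the SRT file into cues and translate the text lines in batches. The names parse_srt, translate_cues and format_srt are hypothetical, and translate_batch is any function you supply that maps a list of strings to a list of translated strings.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def parse_srt(text):
    """Split SRT text into cues: dicts with 'index', 'time' and 'lines'."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        rows = block.splitlines()
        # A valid cue has a number, a timing line, then the subtitle text.
        if len(rows) >= 2 and TIMESTAMP.match(rows[1].strip()):
            cues.append({"index": rows[0].strip(),
                         "time": rows[1].strip(),
                         "lines": rows[2:]})
    return cues


def translate_cues(cues, translate_batch, batch_size=50):
    """Translate all text lines, sending at most batch_size lines per call."""
    texts = [line for cue in cues for line in cue["lines"]]
    translated = []
    for start in range(0, len(texts), batch_size):
        translated.extend(translate_batch(texts[start:start + batch_size]))
    remaining = iter(translated)
    # Hand the translated lines back to their cues in the original order.
    return [{**cue, "lines": [next(remaining) for _ in cue["lines"]]}
            for cue in cues]


def format_srt(cues):
    """Serialize cues back to SRT text."""
    blocks = ["\n".join([cue["index"], cue["time"], *cue["lines"]])
              for cue in cues]
    return "\n\n".join(blocks) + "\n"


# Possible DeepL wiring (assumes the deepl package and a valid API key):
#   translator = deepl.Translator("your_api_key_here")
#   def translate_batch(texts):
#       return [r.text for r in translator.translate_text(texts, target_lang="ZH")]
```

Because only the text lines of each cue are sent, subtitle lines that happen to start with a digit are translated too, and far fewer requests are needed than with one thread per line.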

One more thing

To make the whole process easier, I wrote a shell script that runs every step.

Create a translate.py file containing the script above in the whisper.cpp root directory.
Create a generate-srt.sh file containing the following script in the same directory.

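#!/bin/bash
# Usage: ./generate-srt.sh <video file> [spoken language, default: ja]
set -e  # stop at the first failing step

addr="$1"
lang="${2:-ja}"

# Extract 16 kHz audio, transcribe it, then translate the resulting subtitles.
ffmpeg -i "$addr" -ar 16k output.wav
./main -l "$lang" -m models/ggml-medium.bin -osrt -f output.wav -t 8 -nf -mc 5 -bs 1 -bo 10
rm output.wav
python3 translate.py "output.wav.srt"

# Place the translated subtitles next to the video and clean up.
mv "output.wav.zh.srt" "$addr.srt"
rm "output.wav.srt"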

Give the script permission to execute.

chmod +x generate-srt.sh

Run the script.

./generate-srt.sh /your/file/path/file.mp4 ja # ja is the language spoken in the video (the default if omitted); change it to match your video.

Enjoy your subtitles!

Published: August 22, 2023
