Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English. We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing.
Whisper is an open-source model from OpenAI, and Whisper.cpp is a high-performance inference implementation of it. Whisper.cpp supports Apple Silicon and Core ML and reduces memory usage, which is good for Mac users and for low-end devices such as a NAS.
Whisper.cpp is a powerful tool, and generating subtitles is just one narrow use case. You can use it for much more.
Here is an example of setting up Whisper.cpp on a Mac.
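git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
# Python packages needed to generate the Core ML model
pip install ane_transformers
pip install openai-whisper
pip install coremltools
# Download the ggml model that ./main loads below (skip if models/ggml-medium.bin already exists)
bash ./models/download-ggml-model.sh medium
./models/generate-coreml-model.sh medium
make clean
WHISPER_COREML=1 make -j
# Convert the input video to a 16 kHz WAV file
ffmpeg -i '/your/filePath/file.mp4' -ar 16k output.wav
# -l is the language spoken in the audio; -osrt writes output.wav.srt
./main -l 'your_audio_language' -m models/ggml-medium.bin -osrt -f output.wav -t 8 -nf -mc 5 -bs 1 -bo 10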
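The flags in the last command are easy to mistype, so here is a minimal Python sketch that assembles the same two command lines as argument lists. The helper names `build_ffmpeg_cmd` and `build_whisper_cmd` are hypothetical, and the flag notes in the comments are my reading of the whisper.cpp help text, so check `./main --help` for your build.

```python
import shlex


def build_ffmpeg_cmd(src, wav="output.wav"):
    """Command that converts any input file to a 16 kHz WAV file."""
    return ["ffmpeg", "-i", src, "-ar", "16k", wav]


def build_whisper_cmd(wav, language, model="models/ggml-medium.bin", threads=8):
    """Command that transcribes `wav` and writes `<wav>.srt` next to it."""
    return [
        "./main",
        "-l", language,      # language spoken in the audio
        "-m", model,         # ggml model file
        "-osrt",             # write the result as an SRT file
        "-f", wav,
        "-t", str(threads),  # number of threads
        "-nf",               # no temperature fallback
        "-mc", "5",          # max context tokens
        "-bs", "1",          # beam size
        "-bo", "10",         # best-of candidates
    ]


if __name__ == "__main__":
    # Print the commands in a copy-pasteable, shell-quoted form.
    for cmd in (build_ffmpeg_cmd("/your/filePath/file.mp4"),
                build_whisper_cmd("output.wav", "ja")):
        print(shlex.join(cmd))
```

Keeping the arguments in a list means a path with spaces is passed through safely if you later hand the list to `subprocess.run`.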
Performance depends on your device: on my M1 Max MacBook Pro, generating subtitles for a 1-hour video takes around 10 minutes.
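To estimate how long your own files might take, the figure above can be turned into a rough speed factor. This is only a back-of-the-envelope sketch: it assumes processing time scales linearly with audio length, which is an approximation, and the function names are made up for illustration.

```python
def realtime_factor(audio_minutes, processing_minutes):
    """How many minutes of audio are processed per minute of wall-clock time."""
    return audio_minutes / processing_minutes


def estimate_minutes(audio_minutes, factor):
    """Rough processing time for a file, assuming linear scaling."""
    return audio_minutes / factor


factor = realtime_factor(60, 10)     # the 1-hour video in about 10 minutes
print(factor)                        # 6.0
print(estimate_minutes(90, factor))  # 15.0
```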
To translate the subtitles, I wrote a simple Python script that uses the DeepL API. You need to sign up for a DeepL account and request an API key, which is free.
You also need to install the deepl Python package.
pip install deepl
Here is the script:
import os
import sys
import threading

import deepl


class Translator:
    def __init__(self, auth_key, max_threads_num=80):
        self.translator = deepl.Translator(auth_key)
        self.progress = 0
        self.total = 0
        self.max_threads_num = max_threads_num
        self.lock = threading.Lock()

    def trans_to_chinese(self, data, index, temp):
        # Translate to Chinese; change target_lang for another language.
        result = self.translator.translate_text(data.strip(), target_lang="ZH")
        with self.lock:
            # Restore the line ending that was stripped before translating.
            temp[index] = result.text + '\n'
            self.progress += 1
            self.update_progress()

    def update_progress(self):
        percent = format(self.progress / self.total * 100, '.1f')
        print('\r' + percent + '%', end='')

    def translate_srt(self, path):
        # e.g. output.wav.srt -> output.wav.zh.srt
        root, ext = os.path.splitext(path)
        file_name = root + '.zh' + ext
        threads = []
        with open(path, 'r', encoding='utf-8') as my_file:
            lines = my_file.readlines()
        self.total = len(lines)
        # Start from a copy so a failed request leaves the original line in place.
        temp = list(lines)
        for index, line in enumerate(lines):
            # Blank lines, cue numbers, and timestamps are copied unchanged.
            if not line.strip() or line[0].isdigit():
                with self.lock:
                    self.progress += 1
                    self.update_progress()
            else:
                t = threading.Thread(target=self.trans_to_chinese, args=(line, index, temp))
                t.start()
                threads.append(t)
                # Wait for the current batch before starting more threads.
                if len(threads) >= self.max_threads_num:
                    for thread in threads:
                        thread.join()
                    threads.clear()
        for thread in threads:
            thread.join()
        print('\n' + path)
        with open(file_name, 'w', encoding='utf-8') as file:
            file.write(''.join(temp))
        print('Translate finished!')

    def get_path(self, url):
        if url.endswith('.srt'):
            print('Translating...')
            self.progress = 0
            self.translate_srt(url)
        else:
            print('File format error!')


if __name__ == '__main__':
    auth_key = "your_api_key_here"
    translator = Translator(auth_key)
    translator.get_path(sys.argv[1])
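The script above decides what to translate by checking whether a line starts with a digit, so a subtitle such as "2 cats" would be left untranslated. A slightly stricter check is sketched below, together with a bounded thread pool in place of hand-managed thread batches. This is an alternative sketch, not the script I actually run: `is_structural` and `translate_lines` are hypothetical names, and `translate` is any function that maps a string to its translation.

```python
from concurrent.futures import ThreadPoolExecutor


def is_structural(line):
    """True for lines copied verbatim: blank lines, cue numbers, timestamps."""
    stripped = line.strip()
    return stripped == "" or stripped.isdigit() or "-->" in stripped


def translate_lines(lines, translate, max_workers=8):
    """Translate only the text lines, keeping order and line endings."""
    out = list(lines)
    todo = [i for i, line in enumerate(lines) if not is_structural(line)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `todo`.
        results = pool.map(lambda i: translate(lines[i].strip()), todo)
        for i, text in zip(todo, results):
            out[i] = text + "\n"
    return out


if __name__ == "__main__":
    sample = [
        "1\n", "00:00:00,000 --> 00:00:02,000\n", "hello\n", "\n",
        "2\n", "00:00:02,000 --> 00:00:04,000\n", "2 cats\n", "\n",
    ]
    # str.upper stands in for a real translation call.
    print("".join(translate_lines(sample, str.upper)), end="")
```

With DeepL, `translate` could be something like `lambda s: translator.translate_text(s, target_lang="ZH").text`. One limitation remains: a subtitle line that consists only of digits is still treated as a cue number and copied as-is.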
To make the whole process easy to run, I wrote a shell script that does everything.
Create a translate.py file with the script above in the root path.
Create a generate-srt.sh file with the following script in the root path too.
#!/bin/bash
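# Usage: ./generate-srt.sh <video file> [language spoken in the audio]
addr="$1"
lang="ja"
if [ $# -gt 1 ]; then
    lang="$2"
fi
ffmpeg -i "$addr" -ar 16k output.wav
./main -l "$lang" -m models/ggml-medium.bin -osrt -f output.wav -t 8 -nf -mc 5 -bs 1 -bo 10
rm output.wav
# translate.py writes output.wav.zh.srt next to output.wav.srt
python3 translate.py "output.wav.srt"
# Put the translated subtitles next to the video, then clean up
cp "output.wav.zh.srt" "$addr.srt"
rm "output.wav.zh.srt"
rm "output.wav.srt"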
chmod +x generate-srt.sh
./generate-srt.sh /your/file/path/file.mp4 ja # ja is the language spoken in the video (passed to -l); change it to match your audio.
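If you want to change the script, it helps to see which files it passes through. The sketch below traces the file names for a given video path; `pipeline_files` is a hypothetical helper written only for illustration, and it assumes the default `output.wav` name used in the script.

```python
import os


def pipeline_files(video_path, wav="output.wav"):
    """Names of the files generate-srt.sh creates for one video."""
    srt = wav + ".srt"                 # written by ./main -osrt
    root, ext = os.path.splitext(srt)
    translated = root + ".zh" + ext    # written by translate.py
    final = video_path + ".srt"        # copied next to the video
    return {"wav": wav, "srt": srt, "translated": translated, "final": final}


if __name__ == "__main__":
    for name, path in pipeline_files("/your/file/path/file.mp4").items():
        print(name, path)
```

Only the last file survives; the script deletes the three intermediate files once the copy is done.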
Enjoy your subtitles!