Machine-translating subtitles with the Baidu Translate API

Created
Mar 20, 2022 05:14 PM
Tags
python
Baidu Translate API documentation 👉🏻 Baidu Translate Open Platform (百度翻译开放平台)
  • An srt subtitle file generally looks like this:
1
00:00:22,105 --> 00:00:23,148
♫ (Hayden Magnum - Flicker) Love was my religion, ♫

2
00:00:23,148 --> 00:00:24,357
♫ (Hayden Magnum - Flicker) but i lost my faith ♫

3
00:00:24,357 --> 00:00:26,735
♫ My line is going straight, but wait ♫
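  • Translation logic:
    • Skip blank lines, index lines (digits only), and timestamp lines (containing "-->"); write them to the new file unchanged.
    • The remaining lines are the ones that need translating: send them to the API, then write both the original text and the translation to the new file.
    • A single line of dialogue may be split across several lines; these are merged into one string before translating.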
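  • API limits:
    • Standard tier: QPS = 1, no character limit. Premium tier: QPS = 10, but only 2 million characters per month are free. I use the standard tier, so I add a delay after each call.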
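The delay needed for the QPS = 1 limit could also be enforced by a small throttle that sleeps only for the time still remaining since the previous call, instead of a fixed pause. The following is a minimal sketch, not part of the original script: `Throttle` is a hypothetical helper, and the injectable `clock` and `sleep` parameters are assumptions added so that it can be exercised without real waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (hypothetical helper)."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between two calls
        self._clock = clock               # injectable for testing (assumption)
        self._sleep = sleep               # injectable for testing (assumption)
        self._last = None                 # time of the previous call, None at start

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` right before each request would keep the script at or below one request per `min_interval` seconds; a value slightly above 1.0 leaves some margin for the standard tier.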
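import hashlib
import re
import time

import requests


def translate(from_, to, text):
    """Translate text with the Baidu Translate API; return '' on failure."""
    base_url = 'https://fanyi-api.baidu.com/api/trans/vip/translate'
    payload = {
        'q': text,
        'from': from_,
        'to': to,
        'appid': 'xxx',  # placeholder: your APP ID
        'salt': 'xxx',   # placeholder: any random string
    }
    # sign = MD5(appid + q + salt + secret key); the last 'xxx' is the secret key
    raw = payload['appid'] + payload['q'] + payload['salt'] + 'xxx'
    sign = hashlib.md5(raw.encode()).hexdigest()
    try:
        # Send the fields as a form body rather than in the query string
        r = requests.post(base_url, data=dict(**payload, sign=sign))
        # Check the HTTP status before trying to parse the body
        if r.status_code != 200:
            print(f'api return error: {r.status_code}')
            return ''
        trans_result = r.json().get('trans_result', [{}])
        return trans_result[0].get('dst', '')
    except Exception as e:
        print(f'translate error: {e}')
        return ''
    finally:
        time.sleep(1)  # standard tier allows QPS = 1, so pause after every call


def flush(lines, out, lineno):
    """Merge the buffered lines, then write the original and its translation."""
    if not lines:
        return
    line_ = ' '.join(lines)
    print(f'{lineno} start')
    out.write(f'{line_}\n')
    out.write(f'{translate(from_="spa", to="en", text=line_)}\n')
    print(f'{lineno} done')
    lines.clear()


# utf-8-sig drops a leading BOM if the file has one
with open('es.srt', 'r', encoding='utf-8-sig') as f, \
        open('en.srt', 'w', encoding='utf-8') as fc:
    lastlines = []
    for i, line in enumerate(f):
        # Drop parenthesised notes; non-greedy so text between two notes survives
        line = re.sub(r'\(.*?\)', '', line)
        line = line.strip().lstrip('-').strip()
        if not line or line.isdigit() or '-->' in line:
            # Blank, index, or timestamp line: flush pending text, then copy it
            flush(lastlines, fc, i)
            fc.write(f'{line}\n')
        else:
            lastlines.append(line)
    # The file may not end with a blank line, so flush whatever is left
    flush(lastlines, fc, 'EOF')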
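The signing step inside `translate` can be isolated so that a fresh salt is generated for each request instead of a fixed string. The sketch below mirrors the `appid + q + salt + secret key` concatenation used in the script above. `build_params` is a hypothetical helper name, and the salt range is an arbitrary choice, not something the script or the API prescribes.

```python
import hashlib
import random


def build_params(appid, secret, text, from_, to, salt=None):
    """Return the form fields for one translate request, including the sign.

    Hypothetical helper: the field names follow the payload in the script
    above; the default salt range is an arbitrary assumption.
    """
    if salt is None:
        salt = str(random.randint(32768, 65536))
    # sign = MD5(appid + q + salt + secret key), as a lowercase hex digest
    raw = appid + text + salt + secret
    sign = hashlib.md5(raw.encode('utf-8')).hexdigest()
    return {
        'q': text,
        'from': from_,
        'to': to,
        'appid': appid,
        'salt': salt,
        'sign': sign,
    }
```

The returned dict could be passed straight to `requests.post(base_url, data=...)`, which keeps the credentials out of the request-building code path that handles subtitle text.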