Generate subtitles #1658
base: main
Conversation
Generate subtitles synchronized with the audio and return them:
- TTS_infer_pack/TTS.py generates subtitle information corresponding to the generated audio
- The api_v2.py /tts endpoint can return both the generated audio (converted to base64) and the subtitles in a single JSON response
- Subtitle generation is controlled by a parameter and is off by default, so other modules are unaffected
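A minimal client-side sketch of the endpoint described above. The request and response field names ("with_srt_format", "audio", "subtitles") are assumptions based on this PR's diff and description, not confirmed API names; host and port are illustrative.

```python
import base64

import requests

resp = requests.post(
    "http://127.0.0.1:9880/tts",  # illustrative host/port
    json={
        "text": "测试生成语音",
        "text_lang": "zh",
        "ref_audio_path": "ref.wav",
        "prompt_lang": "zh",
        "with_srt_format": "srt",  # assumed name of the subtitle toggle seen in the diff
    },
)
result = resp.json()

# Per the description, the audio comes back base64-encoded inside the JSON body.
with open("out.wav", "wb") as f:
    f.write(base64.b64decode(result["audio"]))  # assumed key

print(result["subtitles"])  # assumed key holding the subtitle/timestamp data
```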
check_res = check_params(req)
if check_res is not None:
    return check_res

if streaming_mode or return_fragment:
    req["return_fragment"] = True

if streaming_mode:
    with_srt_format = ""  # streaming does not support srt subtitles
When streaming mode cannot produce subtitles, it would be best to log a message to remind the user.
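A small sketch of what the suggested log line could look like, assuming the variables from the excerpt above; the helper name and logger setup are illustrative, not part of the PR.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_srt_format(streaming_mode: bool, with_srt_format: str) -> str:
    """Return the effective subtitle format, warning when streaming disables it."""
    if streaming_mode and with_srt_format:
        # Streaming responses cannot carry subtitles, so tell the caller
        # instead of silently dropping the request.
        logger.warning(
            "streaming_mode does not support subtitles; ignoring with_srt_format=%r",
            with_srt_format,
        )
        return ""
    return with_srt_format
```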
texts = sum(texts, [])

# Compute the start and end time of each speech segment in order and pair
# them one-to-one with the text, for subtitle generation
For the post-processing that computes audio timings and restores ordering, it would be better not to run the computation when subtitles are not requested, i.e. use separate logic to control whether it is needed.
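A sketch of what the suggested gating could look like, assuming each audio fragment is a 1-D sample array and the fragments are already in text order; the function and field names are illustrative, not the PR's actual code.

```python
def build_segments(texts, audio_fragments, sample_rate, need_subtitles):
    """Pair each text piece with its audio time span, but only when subtitles are requested."""
    if not need_subtitles:
        return None  # skip the timing computation entirely

    segments = []
    t = 0.0
    for text, fragment in zip(texts, audio_fragments):
        duration = len(fragment) / sample_rate  # samples -> seconds
        segments.append({"text": text, "start": t, "end": t + duration})
        t += duration
    return segments
```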
Has this been merged?
Could character-level timestamps be returned directly? A single segment contains too many characters, so it still can't be shown on screen.
I also need character-level timestamps. How can this be implemented?
Could custom pauses be added on top of this? For example, with the input "测试[2秒]生成语音", pause for two seconds after "测试" and then continue with "生成语音".
A question: when I generate an .ass subtitle file from the timestamps you return and then merge the subtitles into the video with ffmpeg, why does the subtitle timing not line up with the video?
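One way to sanity-check such a mismatch is to write the returned segments to an .srt file and mux it without re-encoding. This sketch assumes the returned start/end times are absolute seconds from the beginning of the generated audio (if they are per-fragment offsets instead, that alone would explain the drift); names and file paths are illustrative.

```python
def to_srt(segments):
    """segments: list of {"text", "start", "end"} dicts with times in seconds."""
    def ts(seconds):
        ms = int(round(seconds * 1000))
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1_000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = []
    for i, seg in enumerate(segments, 1):
        blocks.append(f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text']}\n")
    return "\n".join(blocks)


with open("subs.srt", "w", encoding="utf-8") as f:
    f.write(to_srt([{"text": "测试", "start": 0.0, "end": 1.2}]))
```

Muxing with `ffmpeg -i video.mp4 -i subs.srt -c copy -c:s mov_text out.mp4` keeps the original timing of both streams, which makes it easier to see whether the offset comes from the subtitle data itself or from a later burn-in step.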
Generate subtitles synchronized with the audio:
Also: the ref_audio_path parameter can accept a string of the form base64:xxxxxx as base64-encoded audio, removing the need to upload an audio file separately.
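A short sketch of how a client could use that, assuming the base64: prefix described above; the file name is illustrative.

```python
import base64

with open("ref.wav", "rb") as f:
    ref_audio_path = "base64:" + base64.b64encode(f.read()).decode("ascii")

# ref_audio_path can now be sent in the /tts request body in place of a
# server-side file path, so no separate audio upload is needed.
```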