XTTS can only generate text with a maximum of 400 tokens. #140
there is a bug in how sentences are split for now. the next git update will solve your issue. |
could you provide the part of the text that generates the error? I want to be sure my patch is fixing it. |
How would I know that? |
Just send a full log of the terminal as a txt file dude... |
tts.log |
well, do you really think the A.I. speaker is going to say the example code and a technical table without issue? |
Bleh. Hit me, too. Seems to hit even non-technical books. It just happened to have a really crappily formatted number as a gag...

Processing 57.16%: : 23464/41046
Sentence: Why? Why…well, the rules said so;
During handling of the above exception, another exception occurred:
Traceback (most recent call last):

=====================================================================

It was so deep down it would be bad…to change it. But why? Why—did it look like something had been changed? Focus. What did you multiply Erin’s achievements by? Oh, it was simple if you looked. Though it was so long and precise; no wonder it was always skipped. You had to round it because it was like counting…multiply Erin’s deeds by…

3.1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 8214808651 3282306647 0938446095 5058223172 5359408128 4811174502 8410270193 8521105559 6446229489 5493038196 4428810975 6659334461 2847564823 3786783165 2712019091 4564856692 3460348610 4543266482 1339360726 0249141273 7245870066 0631558817 4881520920 9628292540 9171536436 7892590360 0113305305 4882046652 1384146951 9415116094 3305727036 5759591953 0921861173 8193261179 3105118548 0744623799 6274956735 1885752724 8912279381 8301194912 9833673362 4406566430 8602139494 6395224737 1907021798 6094370277 0539217176 2931767523 8467481846 7669405132 0005681271 4526356082 7785771342 7577896091 7363717872 1468440901 2249534301 4654958537 1050792279 6892589235 4201995611 2129021960 8640344181 5981362977 4771309960 5187072113 4999999837 2978049951 0597317328 1609631859 5024459455 3469083026 4252230825 3344685035 2619311881 7101000313 7838752886 5875332083 8142061717 7669147303 5982534904 2875546873 1159562863 8823537875 9375195778 1857780532 1712268066 1300192787 6611195909…

It was counting infinity each time. But just a symbol. What was it? Oh yes. Multiply by π. What the—

<Red|Who wrote that?|Red> |
For what it is worth, I am using an RTX 4090 with 24GB of VRAM. Would it be possible to modify the value so I can go bigger than 400 tokens, or is it a function of how the voice stuff is generated? |
And I did just end up editing the ebook to truncate the number, since it didn't add anything to the story... but I wanted to provide the context of my issue. I wish there was a way to resume the conversion... |
it's a TTS limitation. no TTS model today can accept more than a certain number of tokens without losing accuracy. maybe in 5 years we will be able to feed it an entire book at once... but that's not the case today. |
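For a rough sense of scale, the pi passage above is quoted to roughly a thousand decimal places, spaced into 10-digit groups. A back-of-the-envelope estimate (the tokens-per-digit assumption is illustrative, not measured against the actual XTTS tokenizer) shows why that one sentence cannot fit a 400-token window:

```python
# Rough estimate: why the pi passage overflows a single 400-token call.
# Assumption (not measured): a BPE tokenizer spends between one token per
# 10-digit group and one token per digit on a number like this.
digits = 1000                 # pi is quoted to ~1000 decimal places above
groups = digits // 10         # the text spaces it into 10-digit groups
best_case = groups            # one token per group
worst_case = digits           # one token per digit
print(best_case, worst_case)  # 100 to 1000 tokens for the digits alone,
                              # before counting the surrounding prose
```

Even the best case leaves little headroom once the rest of the sentence is counted, and the realistic case overshoots the cap several times over.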
An interesting consideration would be to optionally scan the file before committing to the conversion: check the tokens against the selected settings and validate that everything is compliant. |
I think text splitting might be the only factor I can think of for my above suggestion. |
it's not as easy as you think to split text when you have 1124 languages to manage.... |
I also cannot convert my book because of this splitting error. |
I think I understand. However, you have a current process for doing that, correct? My thought is that you do a dry run of the current text split without the actual audio conversion, just to see where the text splits land, then check the tokens for them to make sure they are compliant. I'm not sure that it would require a rewrite of the text-splitting system. Just a thought, as it might save days of time and energy for users... especially those without GPUs or with slower GPUs. I guess this also depends on whether you are using GenAI to actually come up with the text splits vs. an algorithm done purely on CPU. I suppose I could look at the code myself and see, but I don't know that I have the time to come up with a solution and submit a pull request. |
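A minimal sketch of that dry-run idea: run the same splitter the converter would use, tokenize each piece, and report anything over the limit before any GPU time is spent. The names `split_into_sentences` and `tokenizer.encode(sentence, lang)` below are placeholders, not the project's actual API:

```python
MAX_TOKENS = 400  # the XTTS limit from the error message

def preflight(text, tokenizer, split_into_sentences, lang="en"):
    """Dry run: report every would-be sentence that exceeds the token
    limit, without doing any audio synthesis."""
    offenders = []
    for i, sentence in enumerate(split_into_sentences(text)):
        n = len(tokenizer.encode(sentence, lang))  # assumed signature
        if n >= MAX_TOKENS:
            offenders.append((i, n, sentence[:80]))
    return offenders
```

On the pi passage above, a check like this would flag the digit block in seconds, instead of crashing hours into a conversion.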
text splitting should be fixed in the next update. |
@majormer could you please provide your original text so I can run a last test of the fix for this issue? |
It was posted above, but this is the original text that caused the crash:

=================================

It was so deep down it would be bad…to change it. But why? Why—did it look like something had been changed? Focus. What did you multiply Erin’s achievements by? Oh, it was simple if you looked. Though it was so long and precise; no wonder it was always skipped. You had to round it because it was like counting…multiply Erin’s deeds by…

3.1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 8214808651 3282306647 0938446095 5058223172 5359408128 4811174502 8410270193 8521105559 6446229489 5493038196 4428810975 6659334461 2847564823 3786783165 2712019091 4564856692 3460348610 4543266482 1339360726 0249141273 7245870066 0631558817 4881520920 9628292540 9171536436 7892590360 0113305305 4882046652 1384146951 9415116094 3305727036 5759591953 0921861173 8193261179 3105118548 0744623799 6274956735 1885752724 8912279381 8301194912 9833673362 4406566430 8602139494 6395224737 1907021798 6094370277 0539217176 2931767523 8467481846 7669405132 0005681271 4526356082 7785771342 7577896091 7363717872 1468440901 2249534301 4654958537 1050792279 6892589235 4201995611 2129021960 8640344181 5981362977 4771309960 5187072113 4999999837 2978049951 0597317328 1609631859 5024459455 3469083026 4252230825 3344685035 2619311881 7101000313 7838752886 5875332083 8142061717 7669147303 5982534904 2875546873 1159562863 8823537875 9375195778 1857780532 1712268066 1300192787 6611195909…

It was counting infinity each time. But just a symbol. What was it? Oh yes. Multiply by π. What the—

<Red|Who wrote that?|Red>

=================================== |
btw, were special mathematical chars like "π" ok or not? |
Pi worked fine. It was the long string of numbers. |
I mean you don't know if the voice pronounced it well, as it crashed... |
Incorrect. I changed the text of the number and re-ran the entire book. It completed successfully, including the pi character. |
ok so do you confirm the voice pronounces special chars like Pi well? |
Well, I just went back to verify, and there is a gap. I only spot-checked the generated file. This is the original text: And it actually SAID: Three fourteen fifteen (fade-out)

Although in finding this sample, I found other gaps in the speech as well. When I processed the e-book, I unchecked the "Enable text splitting" option, so I'm not sure if that led to gaps or not.

I'm now going through the sentences that are still in the temp directory to see if it rendered, but it might take a bit, as there are 41038 unlabeled wav files to pick at, and I haven't read the book to know the general location of a given sentence. I'll follow up on that soon, to see if it processed the voice at the sentence level. |
I guess I'm saying that it didn't crash, but it also didn't even bother with the entire shortened number. |
ok. that aside, your issue is already solved in the next update. |
Yeah. That is what I'm trying to figure out. In the chapter audio file, the entire sentence is skipped. I'm looking for the actual sentence files which I still have in my temp directory and trying to figure out if it was pronounced there, or if that was just skipped entirely in the processing. Unfortunately, I am trying to figure out which of thousands of sentence files it might be in, and I am working on that now. |
create a simple text with a sentence with "π" in it, and tell me if "π" is said or not... I need tests from others, not only my own. |
It does not say the pi character. I used the sentence "Wideacre Hall faces due π and the π shines all day on the yellow stone until it is warm and powdery to the touch". It skipped the first π entirely, and the second one almost sounds like it was pronounced the way you would say a long U sound. I would attach the .wav file here, but it won't let me. |
There. I attached the generated .wav file with the sentence above. |
I have not pulled a newer version, however, so if I need to do that to validate testing, or run it against a different branch, please let me know. I will keep it handy. |
ok so it confirms that mathematical signs are not spoken.... we must find a way to fix this, but covering 1100+ languages is not possible. the main languages will have a fix. |
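One plausible shape for such a fix is a per-language substitution pass that spells symbols out before synthesis. A sketch with invented entries (the `MATH_WORDS` table is illustrative; a real one would need far more symbols and languages):

```python
# Hypothetical per-language table mapping math symbols to spoken words.
MATH_WORDS = {
    "en": {"π": "pi", "+": "plus", "=": "equals"},
    "fr": {"π": "pi", "+": "plus", "=": "égale"},
}

def spell_out_math(text, lang="en"):
    """Replace math symbols with words so the TTS model does not skip
    or mangle them."""
    table = MATH_WORDS.get(lang, MATH_WORDS["en"])
    for symbol, word in table.items():
        text = text.replace(symbol, f" {word} ")
    return " ".join(text.split())  # collapse doubled spaces

print(spell_out_math("Wideacre Hall faces due π", "en"))
# -> 'Wideacre Hall faces due pi'
```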
Processing 9.85%: : 798/8093
Traceback (most recent call last):
File "G:\ebook2audiobook-main\lib\functions.py", line 625, in convert_sentence_to_audio
output = params['tts'].inference(
^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\ebook2audiobook-main\python_env\Lib\site-packages\torch\utils_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "G:\ebook2audiobook-main\python_env\Lib\site-packages\TTS\tts\models\xtts.py", line 528, in inference
text_tokens.shape[-1] < self.args.gpt_max_text_tokens
AssertionError: ❗ XTTS can only generate text with a maximum of 400 tokens.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "G:\ebook2audiobook-main\lib\functions.py", line 584, in convert_chapters_to_audio
if convert_sentence_to_audio(params, session):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\ebook2audiobook-main\lib\functions.py", line 657, in convert_sentence_to_audio
raise DependencyError(e)
lib.functions.DependencyError: ❗ XTTS can only generate text with a maximum of 400 tokens.
Caught DependencyError: ❗ XTTS can only generate text with a maximum of 400 tokens.
convert_ebook() Exception: ❗ XTTS can only generate text with a maximum of 400 tokens.
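Given the assertion shown in the traceback (`text_tokens.shape[-1] < self.args.gpt_max_text_tokens` in `xtts.py`), one way to guard the failing `convert_sentence_to_audio` call would be to hard-split any oversized sentence by token budget instead of letting the assert fire. A sketch only; the `tokenizer.encode(sentence, lang)` signature is assumed, not taken from the project:

```python
MAX_TOKENS = 400

def chunk_by_tokens(sentence, tokenizer, lang="en", budget=MAX_TOKENS - 10):
    """Greedily pack whitespace-separated words into chunks that stay
    under the token budget, so each chunk can be synthesized on its own.
    The pi passage splits cleanly because its digits come in space-
    separated groups; a truly unbroken digit run would still need
    character-level splitting."""
    chunks, current = [], []
    for word in sentence.split():
        candidate = " ".join(current + [word])
        if current and len(tokenizer.encode(candidate, lang)) >= budget:
            chunks.append(" ".join(current))  # flush the full chunk
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks
```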