XTTS can only generate text with a maximum of 400 tokens. #140

Code4SAFrankie · 2024-12-28T06:26:21Z

Processing 9.85%: : 798/8093 Traceback (most recent call last):
File "G:\ebook2audiobook-main\lib\functions.py", line 625, in convert_sentence_to_audio
output = params['tts'].inference(
^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\ebook2audiobook-main\python_env\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "G:\ebook2audiobook-main\python_env\Lib\site-packages\TTS\tts\models\xtts.py", line 528, in inference
text_tokens.shape[-1] < self.args.gpt_max_text_tokens
AssertionError: ❗ XTTS can only generate text with a maximum of 400 tokens.
Caught DependencyError: ❗ XTTS can only generate text with a maximum of 400 tokens.
Processing 9.85%: : 798/8093
Traceback (most recent call last):
File "G:\ebook2audiobook-main\lib\functions.py", line 625, in convert_sentence_to_audio
output = params['tts'].inference(
^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\ebook2audiobook-main\python_env\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "G:\ebook2audiobook-main\python_env\Lib\site-packages\TTS\tts\models\xtts.py", line 528, in inference
text_tokens.shape[-1] < self.args.gpt_max_text_tokens
AssertionError: ❗ XTTS can only generate text with a maximum of 400 tokens.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "G:\ebook2audiobook-main\lib\functions.py", line 584, in convert_chapters_to_audio
if convert_sentence_to_audio(params, session):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\ebook2audiobook-main\lib\functions.py", line 657, in convert_sentence_to_audio
raise DependencyError(e)
lib.functions.DependencyError: ❗ XTTS can only generate text with a maximum of 400 tokens.
Caught DependencyError: ❗ XTTS can only generate text with a maximum of 400 tokens.
convert_ebook() Exception: ❗ XTTS can only generate text with a maximum of 400 tokens.

ROBERT-MCDOWELL · 2024-12-28T13:00:20Z

There is currently a bug in how sentences are split. The next git update will solve your issue.
Which language are you using?

Code4SAFrankie · 2024-12-30T08:02:42Z

English

ROBERT-MCDOWELL · 2024-12-30T10:57:29Z

Could you tell me which part of the text triggers the error? I want to be sure my patch fixes it.

Code4SAFrankie · 2024-12-30T20:08:50Z

How would I know that?

DrewThomasson · 2024-12-30T20:10:27Z

Just send a full log of the terminal as a txt file dude...

Code4SAFrankie · 2024-12-31T13:57:30Z

tts.log
This is the log

Code4SAFrankie · 2024-12-31T13:59:00Z

This is an image of the page where it crashed.

ROBERT-MCDOWELL · 2024-12-31T14:07:51Z

Well, do you really think the A.I. speaker is going to read example code and a technical table without issue?
The day an A.I. voices an entire math book without glitches has not come yet, I tell you.

majormer · 2024-12-31T15:19:15Z

Bleh. Hit me, too. Seems to hit even non-technical books. It just happened to have a really crappily formatted number as a gag...

Processing 57.16%: : 23464/41046 Sentence: Why? Why…well, the rules said so;
Like Kevin;
Like Tom,
Processing 57.17%: : 23465/41046 Sentence: the darling [Clown];
Like…
Multiply them? By what? It was just…why did the rules look different here? As if they had been written differently? Just a little word;
Processing 57.17%: : 23466/41046 Sentence: What…what did you multiply them by?
Every time you tried to figure it out,
Processing 57.17%: : 23467/41046 Sentence: it slipped away;
Which was why self-analysis never caught it;
S-strange;
Was this wrong? Why was this rule here?
It was so deep down it would be bad…to change it;
But why? Why—
Processing 57.17%: : 23468/41046 Sentence: did it look like something had been changed?
Focus;
What did you multiply Erin’s achievements by? Oh,
Processing 57.17%: : 23469/41046 Sentence: it was simple if you looked;
Though it was so long and precise; no wonder it was always skipped;
You had to round it because it was like counting…multiply Erin’s deeds by…
3;
Processing 57.17%: : 23469/41046 Traceback (most recent call last):
File "W:\Personal\Repos\ebook2audiobook\lib\functions.py", line 625, in convert_sentence_to_audio
output = params['tts'].inference(
^^^^^^^^^^^^^^^^^^^^^^^^
File "W:\Personal\Repos\ebook2audiobook\python_env\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "W:\Personal\Repos\ebook2audiobook\python_env\Lib\site-packages\TTS\tts\models\xtts.py", line 528, in inference
text_tokens.shape[-1] < self.args.gpt_max_text_tokens
AssertionError: ❗ XTTS can only generate text with a maximum of 400 tokens.
Caught DependencyError: ❗ XTTS can only generate text with a maximum of 400 tokens.
Processing 57.17%: : 23469/41046
Traceback (most recent call last):
File "W:\Personal\Repos\ebook2audiobook\lib\functions.py", line 625, in convert_sentence_to_audio
output = params['tts'].inference(
^^^^^^^^^^^^^^^^^^^^^^^^
File "W:\Personal\Repos\ebook2audiobook\python_env\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "W:\Personal\Repos\ebook2audiobook\python_env\Lib\site-packages\TTS\tts\models\xtts.py", line 528, in inference
text_tokens.shape[-1] < self.args.gpt_max_text_tokens
AssertionError: ❗ XTTS can only generate text with a maximum of 400 tokens.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "W:\Personal\Repos\ebook2audiobook\lib\functions.py", line 584, in convert_chapters_to_audio
if convert_sentence_to_audio(params, session):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "W:\Personal\Repos\ebook2audiobook\lib\functions.py", line 657, in convert_sentence_to_audio
raise DependencyError(e)
lib.functions.DependencyError: ❗ XTTS can only generate text with a maximum of 400 tokens.
Caught DependencyError: ❗ XTTS can only generate text with a maximum of 400 tokens.
convert_ebook() Exception: ❗ XTTS can only generate text with a maximum of 400 tokens.

=====================================================================

It was so deep down it would be bad…to change it. But why? Why—did it look like something had been changed?

Focus. What did you multiply Erin’s achievements by? Oh, it was simple if you looked. Though it was so long and precise; no wonder it was always skipped. You had to round it because it was like counting…multiply Erin’s deeds by…

3.1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 8214808651 3282306647 0938446095 5058223172 5359408128 4811174502 8410270193 8521105559 6446229489 5493038196 4428810975 6659334461 2847564823 3786783165 2712019091 4564856692 3460348610 4543266482 1339360726 0249141273 7245870066 0631558817 4881520920 9628292540 9171536436 7892590360 0113305305 4882046652 1384146951 9415116094 3305727036 5759591953 0921861173 8193261179 3105118548 0744623799 6274956735 1885752724 8912279381 8301194912 9833673362 4406566430 8602139494 6395224737 1907021798 6094370277 0539217176 2931767523 8467481846 7669405132 0005681271 4526356082 7785771342 7577896091 7363717872 1468440901 2249534301 4654958537 1050792279 6892589235 4201995611 2129021960 8640344181 5981362977 4771309960 5187072113 4999999837 2978049951 0597317328 1609631859 5024459455 3469083026 4252230825 3344685035 2619311881 7101000313 7838752886 5875332083 8142061717 7669147303 5982534904 2875546873 1159562863 8823537875 9375195778 1857780532 1712268066 1300192787 6611195909…

It was counting infinity each time. But just a symbol. What was it? Oh yes. Multiply by π. What the—

<Red|Who wrote that?|Red>

majormer · 2024-12-31T15:21:38Z

For what it is worth, I am using an RTX 4090 with 24GB of VRAM. Would it be possible to modify the value so I can do bigger than 400 tokens, or is it a function of how the voice stuff is generated?

majormer · 2024-12-31T15:39:34Z

And I did just end up editing the ebook to modify that line to truncate the number, since it didn't add anything to the story... but wanted to provide the context of my issue. I wish there was a resume...

ROBERT-MCDOWELL · 2024-12-31T15:51:06Z

It's a TTS limitation. Today's A.I. models cannot, or are not meant to, take more than a certain number of tokens at once, for accuracy. Maybe in 5 years we will be able to feed one the entire book in a single pass... but that is not the case today.
This should be fixed in the next update. I don't know when, as I need to fix other things.
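As an illustration of the kind of fallback a splitter could apply, here is a minimal sketch that breaks an over-long sentence on whitespace so that each piece stays under the limit. `split_to_budget` and the `count_tokens` callable are hypothetical names, not part of ebook2audiobook or the TTS library. A real version would count tokens with the XTTS tokenizer, and splitting on whitespace is an assumption that only suits languages written with spaces.

```python
from typing import Callable, List


def split_to_budget(
    text: str,
    count_tokens: Callable[[str], int],
    limit: int = 400,  # the limit named in the AssertionError
) -> List[str]:
    """Greedily pack whitespace-separated words into chunks whose
    token count stays strictly below `limit`."""
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = current + [word]
        # Flush the current chunk as soon as adding one more word
        # would reach the limit (the check in xtts.py is a strict `<`).
        if current and count_tokens(" ".join(candidate)) >= limit:
            chunks.append(" ".join(current))
            current = [word]
        else:
            current = candidate
    if current:
        chunks.append(" ".join(current))
    return chunks
```

A single word that is itself over the limit is still emitted as one oversized chunk, so this sketch would handle the space-separated digit groups quoted above but not one unbroken run of digits.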

majormer · 2024-12-31T17:01:18Z

An interesting consideration is to add the ability to optionally scan the file before committing to the conversion. Check tokens per the settings selected and validate that everything is compliant.

majormer · 2024-12-31T17:02:42Z

I think text splitting might be the only factor I can think of for my above suggestion.

ROBERT-MCDOWELL · 2025-01-01T01:29:21Z

It's not as easy as you think to split text when you have 1124 languages to manage....

Digital-Yeti · 2025-01-02T17:14:39Z

I also cannot convert my book because of this splitting error.

53;
 Other ways in which machinery affects the production of raw material will be mentioned in Volume 03;
*
*As it turned out, Volume 03 of Capital, when published, 
                               Sentence: contained nothing on this subject, 
Processing 58.34%: : 2769/4745 Sentence: although Chapters 40–44 (on the second form of differential rent) did deal with the related topic of the impact of extra amounts of capital directly invested in land;

54; 
                               Sentence: Export of cotton from India to Great Britain: 34,540,143 lb;
 in 1846; 204,141,168 lb;/4745 
 in 1860; 445,947,600 lb;
 in 1865;
 Export of wool from India to Great Britain: 4,570, 
                               Traceback (most recent call last):
  File "/home/user/app/lib/functions.py", line 583, in convert_sentence_to_audio
    output = params['tts'].inference(
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 528, in inference
    text_tokens.shape[-1] < self.args.gpt_max_text_tokens
AssertionError:  ❗ XTTS can only generate text with a maximum of 400 tokens.
Caught DependencyError:  ❗ XTTS can only generate text with a maximum of 400 tokens.

majormer · 2025-01-02T17:36:40Z

"it's not as easy as you think to split text when you have 1124 languages to manage...."

I think I understand. However, you have a current process for doing that, correct? My thought is that you do a dry-run of the current text split without the actual audio conversion just to see where the text-splits land, then check the tokens for them to make sure they are compliant. I'm not sure that it would require a rewrite of the text-splitting system. Just a thought, as it might save days of time and energy for users... especially those without GPUs or slower GPUs.

I guess this also depends on if you are using GenAI to actually come up with the text splits vs. an algorithm done purely on CPU. I suppose I could look at the code myself and see, but don't know that I have the time to come up with a solution and submit a pull request.
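The dry-run idea could look roughly like the sketch below: run the existing splitter, then check every chunk against the limit before any audio is generated. `find_oversized_chunks` and `rough_token_count` are hypothetical names. The word-count function is only a stand-in for the real tokenizer, and the 400 default is taken from the error message rather than from the model configuration.

```python
from typing import Callable, Iterable, List, Tuple

MAX_TEXT_TOKENS = 400  # value quoted in the AssertionError (assumption)


def rough_token_count(text: str) -> int:
    """Hypothetical stand-in for the real tokenizer:
    one token per whitespace-separated word."""
    return len(text.split())


def find_oversized_chunks(
    chunks: Iterable[str],
    count_tokens: Callable[[str], int] = rough_token_count,
    limit: int = MAX_TEXT_TOKENS,
) -> List[Tuple[int, int, str]]:
    """Return (index, token_count, preview) for each chunk that
    would fail a strict `token_count < limit` check."""
    problems: List[Tuple[int, int, str]] = []
    for index, chunk in enumerate(chunks):
        n = count_tokens(chunk)
        if n >= limit:
            problems.append((index, n, chunk[:60]))
    return problems


if __name__ == "__main__":
    sample = ["A short sentence.", " ".join(["1234567890"] * 450)]
    for index, n, preview in find_oversized_chunks(sample):
        print(f"chunk {index}: {n} tokens, starts with {preview!r}")
```

Because this pass only counts tokens, it would run on CPU in a fraction of the conversion time and could report every offending passage at once instead of stopping at the first crash.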

ROBERT-MCDOWELL · 2025-01-02T18:59:56Z

Text splitting should be fixed in the next update.
"just to see where the text-splits land, then check the tokens for them to make sure they are compliant."
Who will make sure? A script? The user? As I said, each language has its own exceptions, punctuation, etc. Cutting every language the way English is cut would make a total mess of how the A.I. reacts.

ROBERT-MCDOWELL · 2025-01-07T11:58:21Z

@majormer could you please provide your original text so I can run my last test of the fix for this issue?

majormer · 2025-01-08T04:20:30Z

"@majormer could you please provide your original text to make my last test for the fix of this issue?"

It was posted above, but this is the original text that caused the crash:

=================================

It was so deep down it would be bad…to change it. But why? Why—did it look like something had been changed?

Focus. What did you multiply Erin’s achievements by? Oh, it was simple if you looked. Though it was so long and precise; no wonder it was always skipped. You had to round it because it was like counting…multiply Erin’s deeds by…

3.1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 8214808651 3282306647 0938446095 5058223172 5359408128 4811174502 8410270193 8521105559 6446229489 5493038196 4428810975 6659334461 2847564823 3786783165 2712019091 4564856692 3460348610 4543266482 1339360726 0249141273 7245870066 0631558817 4881520920 9628292540 9171536436 7892590360 0113305305 4882046652 1384146951 9415116094 3305727036 5759591953 0921861173 8193261179 3105118548 0744623799 6274956735 1885752724 8912279381 8301194912 9833673362 4406566430 8602139494 6395224737 1907021798 6094370277 0539217176 2931767523 8467481846 7669405132 0005681271 4526356082 7785771342 7577896091 7363717872 1468440901 2249534301 4654958537 1050792279 6892589235 4201995611 2129021960 8640344181 5981362977 4771309960 5187072113 4999999837 2978049951 0597317328 1609631859 5024459455 3469083026 4252230825 3344685035 2619311881 7101000313 7838752886 5875332083 8142061717 7669147303 5982534904 2875546873 1159562863 8823537875 9375195778 1857780532 1712268066 1300192787 6611195909…

It was counting infinity each time. But just a symbol. What was it? Oh yes. Multiply by π. What the—

<Red|Who wrote that?|Red>

===================================

ROBERT-MCDOWELL · 2025-01-08T08:13:59Z

By the way, were special mathematical characters like "π" OK or not?

majormer · 2025-01-08T14:21:02Z

Pi worked fine. It was the long string of numbers.

ROBERT-MCDOWELL · 2025-01-08T14:34:41Z

I mean, you can't know whether the voice pronounced it well, since it crashed......

majormer · 2025-01-08T15:03:59Z

Incorrect. I changed the text of the number and re-ran the entire book. It completed successfully, including the pi character.

ROBERT-MCDOWELL · 2025-01-08T15:17:16Z

OK, so do you confirm that the voice pronounces special characters like pi well?

majormer · 2025-01-08T15:37:52Z

Well, I just went back to verify, and there is a gap. I only spot-checked the generated file.

This is the original text:

And it actually SAID:

Three fourteen fifteen (fade-out)
GAP
getting mad about the unfairness...

Although in finding this sample, I found other gaps in the speech as well. When I processed the e-book, I unchecked the "Enable text splitting" option, so I'm not sure if that led to gaps or not. I'm now going through the sentences that are still in the temp directory to see if it rendered, but it might take a bit as there are 41038 unlabeled wav files to pick through, and I haven't read the book to know the general location of a given sentence. I'll follow up on that soon, to see if it processed the voice at the sentence level.

majormer · 2025-01-08T15:40:01Z

I guess I'm saying that it didn't crash, but it also didn't bother with even the entire shortened number.

ROBERT-MCDOWELL · 2025-01-08T15:48:47Z

OK, setting aside your issue, which is already solved in the next update:
I'm talking about special mathematical characters like "π". Does the voice pronounce it or not?

majormer · 2025-01-08T15:51:26Z

Yeah. That is what I'm trying to figure out. In the chapter audio file, the entire sentence is skipped. I'm looking for the actual sentence files which I still have in my temp directory and trying to figure out if it was pronounced there, or if that was just skipped entirely in the processing. Unfortunately, I am trying to figure out which of thousands of sentence files it might be in, and I am working on that now.

ROBERT-MCDOWELL · 2025-01-08T15:59:37Z

Create a simple text with a sentence containing "π" and tell me whether "π" is spoken or not... I need tests from people other than me.

majormer · 2025-01-08T16:02:31Z

Well, I did find the .wav file, or rather not:

There should be a wav file with "π" in it, but it is missing. All the text between the equal signs (markdown didn't like the bolding I tried to do) was skipped between those two files:

3.1415=========926535…

It was counting infinity each time. But just a symbol. What was it? Oh yes. Multiply by π. What the—

<Red|Who wrote that?|Red>

——

If you were—========etting mad about the unfairness of the world, of iniquitous sides,

I'll work on getting a file created with pi in it and follow up. Gimme a few.

majormer · 2025-01-08T16:25:10Z

It does not say the pi character. I used the sentence "Wideacre Hall faces due π and the π shines all day on the yellow stone until it is warm and powdery to the touch"

It skipped the first π entirely, and the second one almost sounds like it was pronounced like you would say the long U sound.

I would attach the .wav file here, but it won't let me.

majormer · 2025-01-08T16:26:49Z

3.zip

majormer · 2025-01-08T16:27:14Z

There. I attached the generated .wav file with the sentence above.

majormer · 2025-01-08T16:28:27Z

I have not pulled a newer version, however, so if I need to do that to validate testing, or run it against a different branch, please let me know. I will keep it handy.

ROBERT-MCDOWELL · 2025-01-08T16:53:12Z

OK, so this confirms that mathematical signs are not spoken.... We must find a way around this issue, but for 1100+ languages it's not possible. The main languages will get a fix.
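One possible shape for such a fix is a per-language table that spells symbols out before the text reaches the TTS model. The sketch below is an assumption for illustration only: `SPOKEN_SYMBOLS` and `spell_out_symbols` are hypothetical names, the table covers English and a handful of symbols, and it is not how the project implements its fix.

```python
import re
from typing import Dict

# Hypothetical lookup table: language code -> symbol -> spoken form.
SPOKEN_SYMBOLS: Dict[str, Dict[str, str]] = {
    "en": {
        "π": "pi",
        "±": "plus or minus",
        "√": "square root of",
        "∞": "infinity",
    },
}


def spell_out_symbols(
    text: str,
    lang: str = "en",
    table: Dict[str, Dict[str, str]] = SPOKEN_SYMBOLS,
) -> str:
    """Replace known symbols with words for `lang`; symbols of
    languages missing from the table are left untouched."""
    for symbol, spoken in table.get(lang, {}).items():
        # Pad with spaces so "2π" becomes "2 pi" rather than "2pi".
        text = text.replace(symbol, f" {spoken} ")
    text = " ".join(text.split())  # collapse the extra whitespace
    # Remove the space the padding leaves before punctuation.
    return re.sub(r"\s+([.,;:!?])", r"\1", text)
```

With this table, the test sentence fragment "Multiply by π." would be passed to the model as "Multiply by pi.", while a language with no entry keeps the symbol as written.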

ROBERT-MCDOWELL added the duplicate label Dec 28, 2024

ROBERT-MCDOWELL mentioned this issue Dec 29, 2024

AssertionError: XTTS can only generate text with a maximum of 400 tokens. #151

Closed

DrewThomasson added the fixed in next update (pending) label Jan 2, 2025

ROBERT-MCDOWELL removed the duplicate label Jan 10, 2025
