[Umbrella Issue] Multilingual Support #170
Title: Article Outline Disordered After Modifying
Description: To enable Storm to generate articles in Chinese, we modified the prompt in the
Steps to Reproduce:
Expected Behavior:
Actual Behavior:
Additional Information:
Request:
|
@zhoucheng89 would you please share the full system log directory as a zip file? |
@Yucheng-Jiang please check this zip |
Thank you! I will look into this during the weekend. |
I just checked out the issue. Given that the Steps to Reproduce include "Modify the prompt in WriteLeadSection to request Chinese output", I think I need to run more experiments on my side to see how to better support non-English languages, rather than directly debugging the log you provided. I will follow up in this thread once I have made progress. |
@shaoyijia OK, thanks. Waiting for your good news. |
Hi @zhoucheng89, sorry for the late response. I finally got time to look into this issue today. Here are the changes I made to make STORM support Chinese. (Note: This is not a well-polished version and is for demonstration purposes only. I recommend running more experiments on your side.) In general, to make STORM support non-English languages, you need to confirm or modify the following things:
Hope my answer provides you with pointers to modify STORM for your use case. If you are willing to polish my example and provide a STORM (Chinese) version, we are happy to link it in our README.md - I believe this contribution would help many people. |
Hi @shaoyijia, I am very pleased to see your message. I saw the code you submitted in the dev-chinese branch, and I will build on the example you submitted to improve the Chinese version of STORM. I am also reading the STORM source code and have debugged it locally several times. However, the articles I generate always come out at 3400 words, which is not enough for a long academic paper. I first tried returning more snippets in the RM stage and, before AnswerQuestion, concatenating all the snippets into the Info required by the prompt. Before generating the outline, I also retrieved more similar snippets with paraphrase-multilingual-MiniLM-L12-v2. Neither approach pushed the article past 3400 words, and the final article was still thin on content. In addition, the chapter order of the article was inconsistent with the chapter order in the outline. Which steps can I modify to increase the length of the final output and keep the order of the article's chapters consistent with the outline? |
Hi @zhoucheng89 , thanks for testing it out and looking into the codebase.
Are you running with
This seems to be problematic. The embedding model is only used when STORM extends the outline into the final article. I don't think you should call the embedding model before generating the outline.
Is the order of the first-level headings (those marked with "#") correct? As elaborated here, STORM only forces the first-level headings to follow the planned outline and does not impose this requirement on second/third/...-level headings, to allow flexibility in article generation. You can change its behavior by modifying However, if the order of the first-level headings does not follow the planned outline, this is not expected, and we can help look into it if you provide more details on what command you ran and what results you got. |
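To check the first-level heading order before reporting it, a minimal sketch like the one below may help. It assumes the outline and the article are available as plain text with headings marked by "#", as described above. The function names are hypothetical helpers for illustration, not part of STORM.

```python
import re


def first_level_headings(text):
    """Return the first-level headings ('# Title') in order of appearance.

    Lines starting with '##' or deeper are ignored, because only the
    first-level order is expected to follow the planned outline.
    """
    headings = []
    for line in text.splitlines():
        # '#' followed by whitespace matches '# Title' but not '## Title'.
        match = re.match(r"^#\s+(.+?)\s*$", line)
        if match:
            headings.append(match.group(1))
    return headings


def first_level_order_matches(outline_text, article_text):
    """True if both texts list the same first-level headings in the same order."""
    return first_level_headings(outline_text) == first_level_headings(article_text)


# Hypothetical example data: sub-headings differ, first-level order is the same.
outline = "# 引言\n## 背景\n# 方法\n# 结论"
article = "# 引言\n正文\n# 方法\n## 细节\n正文\n# 结论\n正文"
print(first_level_order_matches(outline, article))
```

If this check fails on a real run, the two heading lists it compares are a useful detail to include alongside the command and logs.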
@shaoyijia Can you add me as a contributor? I have discovered a bug but am unable to submit code. The result of my last STORM run was very hollow because my code was not up to date. Now that I have updated to the latest code, I am planning to run it again. I am glad that you have introduced the Co-STORM mechanism, which will help users intervene in a timely manner before paper generation. I believe the biggest difference between dialogue mode and collaboration mode is the addition of a moderator and users. The moderator can guide users to intervene, which promptly corrects confusing answers generated during the roundtable discussion. This mechanism is great, and I will read the code of the collaboration mechanism soon and try to support it in Chinese. |
Thank you! For contribution, please follow the steps below:
|
@shaoyijia I want to modify SerperRM for the Chinese version, which requires changing the search API. I forked the repo and modified the code on the dev-chinese branch of my fork. Should I submit a pull request directly from the dev-chinese branch of my fork to the dev-chinese branch of STORM? |
Hi @zhoucheng89 , you can also fork this repo and develop the Chinese STORM in your repo so that you can have full control. If everything is functioning, we are happy to link to your repo on our README so that people can find Chinese STORM if needed. Does this sound good to you? |
@shaoyijia Are you suggesting that after I fork, I can directly modify the code in my dev-chinese branch without the need to initiate a pull request? And will you continue to maintain the dev-chinese branch of the original Storm repository? |
@zhoucheng89 Yes. Once you feel it's stable, you can ping me to take a look, so I can consider mentioning it in our README. Currently, I don't have bandwidth myself to develop |
At present, we do not have immediate plans to introduce multilingual support for our deployed web research preview. However, we are strong advocates for making AI technology more inclusive and accessible to speakers of various languages.
As part of this commitment, we want to support the community in adapting STORM for use in other languages. This issue serves as a space for discussion, sharing experiences, and collaboration on efforts to expand STORM's multilingual capabilities.
We encourage anyone interested in this effort to share their attempts, insights, or challenges here. For example, we have already seen initial success in adding support for Arabic (#169). Team members will be actively involved in the discussion to provide support.