Migrate assistant feature integrations from text processing to task processing #114

Open
julien-nc opened this issue Aug 19, 2024 · 0 comments
Labels
documentation Improvements or additions to documentation

julien-nc commented Aug 19, 2024

There have been a few changes in the assistant frontend implementation and in the server AI-related APIs. Here are some pointers if you want to adjust your app or client.

Everything described here appeared in Nextcloud 30.

Text processing, transcription, image generation, translation and all other AI features are now handled by a single API: the task processing API.
The concepts of task types and providers still exist. Task types now also define the "shape" of their input and output; each shape is a list of typed fields.

The assistant now only submits "task processing" tasks.

Use/open the Assistant in the frontend of your app

  • The openAssistantForm frontend function is now exposed as window.OCA.Assistant.openAssistantForm. The OCA.TPAssistant namespace is still available for backward compatibility.
  • The identifier parameter of openAssistantForm is deprecated (but still supported) and replaced by customId.
  • The input parameter of openAssistantForm is deprecated (but still works for core:text2text* task types). It is replaced by inputs, an object containing the value of each input field. If you only support core:text2text* task types, setting inputs.input is enough (more on that below, in "Task types support in clients").

Use the Task processing OCP API in the backend of your app

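The scheduling logic is the same as before. With the task processing manager (OCP\TaskProcessing\IManager), it is possible to schedule a task or to run one synchronously (see the Summarize and Transcribe examples below). The manager is documented here: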

https://docs.nextcloud.com/server/latest/developer_manual/digging_deeper/task_processing.html

Task processing OCS API

  • You can get the list of available task types with /ocs/v2.php/taskprocessing/tasktypes
  • You can get a user's task list by task type with /ocs/v2.php/taskprocessing/tasks?taskType=TASK_TYPE_ID&customId=CUSTOM_ID (both GET parameters are optional).
  • You can get a user's task list by scheduling app with /ocs/v2.php/taskprocessing/tasks/app/APP_ID?customId=CUSTOM_ID (the GET parameter is optional).

The list of available endpoints can be found in https://github.com/nextcloud/server/blob/master/core/Controller/TaskProcessingApiController.php or can be browsed with the ocs_api_viewer Nextcloud app (core -> task_processing_api).
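For scripts and simple clients, a small helper can build these list URLs. The following is a minimal shell sketch, assuming the example host https://nc.org used in this issue (overridable through NC_BASE_URL, a hypothetical variable) and parameter values that need no URL-encoding. The function names tasks_url and app_tasks_url are illustrative, not part of any Nextcloud tool.

```shell
#!/bin/sh
# Sketch: build task list URLs of the task processing OCS API.
# NC_BASE_URL is a hypothetical variable; it defaults to the example host.
# Values are inserted as-is (no URL-encoding is done).

# Usage: tasks_url [TASK_TYPE_ID] [CUSTOM_ID]  (both optional, pass '' to skip one)
tasks_url() {
  base="${NC_BASE_URL:-https://nc.org}/ocs/v2.php/taskprocessing/tasks"
  query=""
  if [ -n "$1" ]; then
    query="taskType=$1"
  fi
  if [ -n "$2" ]; then
    # Only add a separator if taskType is already present
    if [ -n "$query" ]; then
      query="$query&"
    fi
    query="${query}customId=$2"
  fi
  if [ -n "$query" ]; then
    printf '%s?%s\n' "$base" "$query"
  else
    printf '%s\n' "$base"
  fi
}

# Usage: app_tasks_url APP_ID [CUSTOM_ID]
app_tasks_url() {
  base="${NC_BASE_URL:-https://nc.org}/ocs/v2.php/taskprocessing/tasks/app/$1"
  if [ -n "$2" ]; then
    printf '%s?customId=%s\n' "$base" "$2"
  else
    printf '%s\n' "$base"
  fi
}
```

For example, assuming hypothetical NC_USER and NC_APP_PASSWORD variables holding a user name and an app password: curl -u "$NC_USER:$NC_APP_PASSWORD" -H "OCS-APIRequest: true" -H "Accept: application/json" "$(tasks_url core:text2text:summary document-123)". The Accept header asks the OCS API for JSON instead of its default XML.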

Task representation

The task objects returned by the OCS API differ slightly from the text processing ones.

The input and output attributes are now objects containing the value of each field.
The status is now a string: https://github.com/nextcloud/server/blob/master/lib/public/TaskProcessing/Task.php#L366-L370
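Clients that poll a task usually only need to know whether its status is final. A minimal shell sketch, assuming the six status strings STATUS_UNKNOWN, STATUS_SCHEDULED, STATUS_RUNNING, STATUS_SUCCESSFUL, STATUS_FAILED and STATUS_CANCELLED (double-check them against the Task.php lines linked above). task_is_finished and task_is_pending are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: classify task processing status strings.
# Assumption: these are the strings produced by the server for Nextcloud 30;
# verify them against lib/public/TaskProcessing/Task.php.

# Succeed (exit status 0) if the task will not change status anymore
task_is_finished() {
  case "$1" in
    STATUS_SUCCESSFUL|STATUS_FAILED|STATUS_CANCELLED) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeed if the task is still waiting or being processed.
# STATUS_UNKNOWN is treated as neither pending nor finished.
task_is_pending() {
  case "$1" in
    STATUS_SCHEDULED|STATUS_RUNNING) return 0 ;;
    *) return 1 ;;
  esac
}
```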

Task types support in clients

Because different task types expect different input fields and produce different output fields, existing text processing integrations cannot directly support every task processing task type.

For an easy migration, a client can support a static list of task processing task types, starting with those equivalent to the former text processing ones:

  • core:text2text (previously called FreePrompt)
  • core:text2text:headline (previously called Headline)
  • core:text2text:summary (previously called Summary)
  • core:text2text:topics (previously called Topics)

And the new ones:

  • core:text2text:formalization
  • core:text2text:reformulation
  • core:text2text:simplification

All these task types share the same shapes: the input has a single text field named "input" and the output has a single text field named "output".

More details: https://docs.nextcloud.com/server/latest/developer_manual/digging_deeper/task_processing.html#tasks-types
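A client migrating from text processing can encode the static list above as a lookup. A minimal shell sketch with hypothetical function names, assuming the former task types are identified by the short names used above (FreePrompt, Headline, Summary, Topics):

```shell
#!/bin/sh
# Sketch: helpers for a client that only supports the simple text2text task types.

# Print the task processing ID equivalent to a former text processing task type.
# Fails (exit status 1, no output) when there is no direct equivalent.
textprocessing_to_taskprocessing() {
  case "$1" in
    FreePrompt) printf '%s\n' 'core:text2text' ;;
    Headline) printf '%s\n' 'core:text2text:headline' ;;
    Summary) printf '%s\n' 'core:text2text:summary' ;;
    Topics) printf '%s\n' 'core:text2text:topics' ;;
    *) return 1 ;;
  esac
}

# Succeed if the task type takes a single "input" text field and
# produces a single "output" text field (the static list above).
is_simple_text2text_type() {
  case "$1" in
    core:text2text|core:text2text:headline|core:text2text:summary|core:text2text:topics)
      return 0 ;;
    core:text2text:formalization|core:text2text:reformulation|core:text2text:simplification)
      return 0 ;;
    *) return 1 ;;
  esac
}
```

core:text2text:translate is deliberately left out of the second helper because it has extra input fields, as described in the Translate section below.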

The full list of task types defined in the server is available at https://github.com/nextcloud/server/tree/master/lib/public/TaskProcessing/TaskTypes. Supporting more task types can be discussed later (for example, by dynamically rendering the input/output form as the Assistant NC app does).

Summarize

curl https://nc.org/ocs/v2.php/taskprocessing/schedule -X POST \
     -H "ocs-apirequest: true" \
     -H "Content-Type: application/json" \
     -d '{"input":{"input":"the text to summarize"},"type":"core:text2text:summary","appId":"mail"}'
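
Or from the backend side:

use OCP\TaskProcessing\Task;
use OCP\TaskProcessing\TaskTypes\TextToTextSummary;

// $this->taskProcessingManager is an injected \OCP\TaskProcessing\IManager
$task = new Task(TextToTextSummary::ID, ['input' => 'the text to summarize'], 'mail', $this->userId);
$this->taskProcessingManager->scheduleTask($task);
// The task is queued; keep its ID to fetch the result later
$taskId = $task->getId();

or

// runTask() returns the task once it has been processed
$task = new Task(TextToTextSummary::ID, ['input' => 'the text to summarize'], 'mail', $this->userId);
$resultTask = $this->taskProcessingManager->runTask($task);
$summary = $resultTask->getOutput()['output'];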
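The JSON body of the curl example above can be generated for any of the simple core:text2text* task types. A minimal shell sketch with hypothetical function names (json_escape, text2text_payload); its escaping only handles backslashes and double quotes, so arbitrary multi-line text would need a proper JSON tool such as jq.

```shell
#!/bin/sh
# Sketch: build the JSON body for /ocs/v2.php/taskprocessing/schedule
# for the simple core:text2text* task types (single "input" text field).

# Escape backslashes and double quotes for a JSON string.
# Limitation: newlines and other control characters are NOT escaped.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Usage: text2text_payload TASK_TYPE_ID TEXT APP_ID [CUSTOM_ID]
text2text_payload() {
  custom=""
  if [ -n "$4" ]; then
    custom=",\"customId\":\"$(json_escape "$4")\""
  fi
  printf '{"input":{"input":"%s"},"type":"%s","appId":"%s"%s}\n' \
    "$(json_escape "$2")" "$1" "$3" "$custom"
}
```

For example, the curl request above can use -d "$(text2text_payload core:text2text:summary 'the text to summarize' mail)" as its body; authentication options are omitted here, as in the original example.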

Translate

Translations can now be done via the task processing API, with the core:text2text:translate task type.
If this task type appears in the list of available task types, at least one provider for it is installed.

You can get the list of supported origin languages from taskTypeObject.inputShapeEnumValues.origin_language, where taskTypeObject is the core:text2text:translate entry of the available task types list.
The target languages are in taskTypeObject.inputShapeEnumValues.target_language.
Both are lists of objects like:

{ "name": "English (US)", "value": "en" }

Example request to submit a translation task:

curl https://nc.org/ocs/v2.php/taskprocessing/schedule -X POST \
     -H "ocs-apirequest: true" \
     -H "Content-Type: application/json" \
     -d '{"input":{"origin_language":"en","input":"hello","target_language":"de"},"type":"core:text2text:translate","appId":"text","customId":"document-123"}'
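The same body can be built from variables. A minimal shell sketch with a hypothetical translate_payload function; the language codes are expected to be "value" entries from inputShapeEnumValues, and, as with any hand-written JSON, the escaping only handles backslashes and double quotes.

```shell
#!/bin/sh
# Sketch: build the JSON body of a core:text2text:translate task.
# ORIGIN_LANG and TARGET_LANG should be "value" entries from inputShapeEnumValues.

# Minimal JSON string escaping (backslash and double quote only;
# control characters such as newlines are not handled)
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Usage: translate_payload ORIGIN_LANG TARGET_LANG TEXT APP_ID CUSTOM_ID
translate_payload() {
  printf '{"input":{"origin_language":"%s","input":"%s","target_language":"%s"},"type":"core:text2text:translate","appId":"%s","customId":"%s"}\n' \
    "$1" "$(json_escape "$3")" "$2" "$4" "$(json_escape "$5")"
}
```

Calling translate_payload en de hello text document-123 produces the same body as the curl example above.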

Transcribe

The input must be the file ID of the audio input file.

Example request to transcribe an audio file:

curl https://nc.org/ocs/v2.php/taskprocessing/schedule -X POST \
     -H "ocs-apirequest: true" \
     -H "Content-Type: application/json" \
     -d '{"input":{"input":1450},"type":"core:audio2text","appId":"spreed"}'
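Since the input is a file ID rather than text, it is sent as a JSON number. A minimal shell sketch with a hypothetical audio2text_payload function that rejects non-numeric IDs before building the body:

```shell
#!/bin/sh
# Sketch: build the JSON body of a core:audio2text task.
# The input is the numeric ID of the audio file, sent unquoted (JSON number).

# Usage: audio2text_payload FILE_ID APP_ID
audio2text_payload() {
  case "$1" in
    ''|*[!0-9]*)
      # Empty or containing a non-digit: not a valid file ID
      echo "file ID must be a non-negative integer: '$1'" >&2
      return 1 ;;
  esac
  printf '{"input":{"input":%s},"type":"core:audio2text","appId":"%s"}\n' "$1" "$2"
}
```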

Or from the backend side:

use OCP\TaskProcessing\Task;
use OCP\TaskProcessing\TaskTypes\AudioToText;

// 1450 is the ID of the audio file to transcribe
$task = new Task(AudioToText::ID, ['input' => 1450], 'spreed', $this->userId);
$this->taskProcessingManager->scheduleTask($task);
$taskId = $task->getId();

or

$task = new Task(AudioToText::ID, ['input' => 1450], 'spreed', $this->userId);
$resultTask = $this->taskProcessingManager->runTask($task);
$transcription = $resultTask->getOutput()['output'];