LLM Index Document
A system task that indexes the provided document into a vector database so that it can be efficiently searched, retrieved, and processed later.
Definitions
```json
{
  "name": "llm_index_document_task",
  "taskReferenceName": "llm_index_document_task_ref",
  "inputParameters": {
    "vectorDB": "pineconedb",
    "namespace": "myNewModel",
    "index": "test",
    "embeddingModelProvider": "azure_openai",
    "embeddingModel": "text-davinci-003",
    "url": "${workflow.input.url}",
    "mediaType": "application/xhtml+xml",
    "chunkSize": 500,
    "chunkOverlap": 100
  },
  "type": "LLM_INDEX_DOCUMENT"
}
```
Input Parameters
| Attribute | Description |
| --- | --- |
| vectorDB | Choose the required vector database. Note: If you haven't configured the vector database on your Orkes console, navigate to the Integrations tab and configure your required provider. Refer to this doc on how to integrate vector databases with the Orkes console. |
| namespace | Choose from the available namespaces configured within the chosen vector database. Namespaces are separate, isolated environments within the database used to manage and organize vector data effectively. Note: The namespace field applies only to the Pinecone integration and is not applicable to the Weaviate integration. |
| index | Choose the index in your vector database where the indexed text or data is to be stored. Note: For the Weaviate integration, this field refers to the class name, while for the Pinecone integration, it denotes the index name itself. |
| embeddingModelProvider | Choose the required LLM provider for embedding. If you haven't configured your AI/LLM provider on your Orkes console, navigate to the Integrations tab and configure your required provider. Refer to this doc on how to integrate LLM providers with the Orkes console. |
| embeddingModel | Choose from the available models for the chosen LLM provider. |
| url | Provide the URL of the file to be indexed. |
| mediaType | Select the media type of the file to be indexed, e.g., application/xhtml+xml as used in the example definition above. |
| chunkSize | Specifies how long each segment of the input text should be when it is divided for processing by the LLM. For example, if your article contains 2000 words and you specify a chunk size of 500, the document is divided into four chunks for processing. |
| chunkOverlap | Specifies the number of overlapping words between adjacent chunks. For example, if the chunk overlap is specified as 100, the first 100 words of each chunk overlap with the last 100 words of the previous chunk. |
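The interaction between chunkSize and chunkOverlap can be illustrated with a minimal Python sketch. This is not the actual Orkes implementation — word-based splitting and the function name are assumptions for illustration only:

```python
def chunk_words(words, chunk_size, chunk_overlap):
    """Split a list of words into chunks of up to chunk_size words,
    where adjacent chunks share chunk_overlap words."""
    # Each chunk starts (chunk_size - chunk_overlap) words after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the remaining words are already covered
    return chunks

doc = [f"word{i}" for i in range(2000)]     # a 2000-word document
print(len(chunk_words(doc, 500, 0)))        # 4 chunks, matching the example above
```

With a chunk overlap of 100, each chunk after the first repeats the last 100 words of its predecessor, so the same 2000-word document yields five chunks instead of four.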
Examples
UI
- Add the task type LLM Index Document.
- Choose the vector database and the LLM provider for embedding the document.
- Provide the URL of the document to be indexed, along with the other input parameters.

JSON Example
```json
{
  "name": "llm_index_document_task",
  "taskReferenceName": "llm_index_document_task_ref",
  "inputParameters": {
    "vectorDB": "pineconedb",
    "namespace": "myNewModel",
    "index": "test",
    "embeddingModelProvider": "azure_openai",
    "embeddingModel": "text-davinci-003",
    "url": "${workflow.input.url}",
    "mediaType": "application/xhtml+xml",
    "chunkSize": 500,
    "chunkOverlap": 100
  },
  "type": "LLM_INDEX_DOCUMENT"
}
```
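Because the task's url parameter is wired to the expression ${workflow.input.url}, the document location is supplied as input when the workflow is started. A minimal workflow input payload might look like the following (the URL shown is a hypothetical placeholder):

```json
{
  "url": "https://example.com/document.xhtml"
}
```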