# LLM Index Document

A system task that indexes a provided document into a vector database for efficient search, retrieval, and processing at a later stage.
## Definitions

```json
{
  "name": "llm_index_document_task",
  "taskReferenceName": "llm_index_document_task_ref",
  "inputParameters": {
    "vectorDB": "pineconedb",
    "namespace": "myNewModel",
    "index": "test",
    "embeddingModelProvider": "azure_openai",
    "embeddingModel": "text-davinci-003",
    "url": "${workflow.input.url}",
    "mediaType": "application/xhtml+xml",
    "chunkSize": 500,
    "chunkOverlap": 100
  },
  "type": "LLM_INDEX_DOCUMENT"
}
```
## Input Parameters

| Parameter | Description |
| --- | --- |
| vectorDB | Choose the required vector database. Note: If you haven't configured a vector database in your Orkes console, navigate to the Integrations tab and configure your required provider. Refer to the documentation on how to integrate vector databases with the Orkes console. |
| namespace | Choose from the available namespaces configured within the chosen vector database. Namespaces are separate, isolated environments within the database for managing and organizing vector data effectively. Note: The namespace field has different names and applicability depending on the integration. |
| index | Choose the index in your vector database where the indexed text or data will be stored. Note: For the Weaviate integration, this field refers to the class name; for other integrations, it denotes the index name. |
| embeddingModelProvider | Choose the required LLM provider for embedding. If you haven't configured your AI/LLM provider in your Orkes console, navigate to the Integrations tab and configure your required provider. Refer to the documentation on how to integrate LLM providers with the Orkes console. |
| embeddingModel | Choose from the available embedding models provided by the selected LLM provider. |
| url | Provide the URL of the file to be indexed. |
| mediaType | Select the media type of the file to be indexed (for example, `application/xhtml+xml`, as used in the sample definition). |
| chunkSize | Specifies how long each input text segment should be when the document is divided for processing by the LLM. For example, if an article contains 2,000 words and the chunk size is configured as 500, the document is divided into four chunks for processing. |
| chunkOverlap | Specifies the overlap between adjacent chunks. For example, if the chunk overlap is specified as 100, the first 100 words of each chunk overlap with the last 100 words of the previous chunk. |
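To make the interaction between `chunkSize` and `chunkOverlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. This is an illustrative model of the behavior the parameters describe, not the task's actual implementation; note that a non-zero overlap can increase the chunk count beyond the simple `words / chunkSize` estimate.

```python
def chunk_words(words, chunk_size=500, chunk_overlap=100):
    """Split a word list into chunks of up to chunk_size words, where each
    chunk repeats the last chunk_overlap words of the previous chunk."""
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start : start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the document
    return chunks

words = [f"w{i}" for i in range(2000)]  # stand-in for a 2,000-word article

print(len(chunk_words(words, 500, 0)))    # → 4 (no overlap, as in the example above)
print(len(chunk_words(words, 500, 100)))  # → 5 (overlap shifts the window by only 400)
```

With `chunk_overlap=100`, the first 100 words of each chunk equal the last 100 words of the previous one, matching the `chunkOverlap` description.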
## Examples

To configure the task in the UI:

1. Add a task of type LLM Index Document.
2. Choose the vector database and the LLM provider for embedding the document.
3. Provide the URL of the document to be indexed, along with the other input parameters.
The same configuration expressed as a JSON task definition:

```json
{
  "name": "llm_index_document_task",
  "taskReferenceName": "llm_index_document_task_ref",
  "inputParameters": {
    "vectorDB": "pineconedb",
    "namespace": "myNewModel",
    "index": "test",
    "embeddingModelProvider": "azure_openai",
    "embeddingModel": "text-davinci-003",
    "url": "${workflow.input.url}",
    "mediaType": "application/xhtml+xml",
    "chunkSize": 500,
    "chunkOverlap": 100
  },
  "type": "LLM_INDEX_DOCUMENT"
}
```
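The task reads its document URL from `${workflow.input.url}`, which resolves from the input supplied when the workflow is started. Conductor's start-workflow endpoint (`POST /api/workflow/{name}`) takes that input as its JSON body. The sketch below builds such a request; the server address, workflow name, and document URL are assumptions for illustration, not values from this document.

```python
import json

CONDUCTOR_SERVER = "https://your-conductor-server"  # assumption: your server address
WORKFLOW_NAME = "index_document_workflow"           # assumption: a workflow containing this task

def build_start_request(doc_url):
    """Build the HTTP request for Conductor's start-workflow endpoint.
    The JSON body becomes the workflow input, so the "url" key here is
    what ${workflow.input.url} resolves to inside the task."""
    return {
        "method": "POST",
        "url": f"{CONDUCTOR_SERVER}/api/workflow/{WORKFLOW_NAME}",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"url": doc_url}),
    }

req = build_start_request("https://example.com/document.xhtml")
print(req["url"])
```

Sending this request with any HTTP client starts the workflow, and the LLM Index Document task then fetches, chunks, embeds, and indexes the document at the given URL.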