
LLM Index Document

A system task that indexes the provided document into a vector database so that it can be efficiently searched, retrieved, and processed later.

Definitions

{
  "name": "llm_index_document_task",
  "taskReferenceName": "llm_index_document_task_ref",
  "inputParameters": {
    "vectorDB": "pineconedb",
    "namespace": "myNewModel",
    "index": "test",
    "embeddingModelProvider": "azure_openai",
    "embeddingModel": "text-davinci-003",
    "url": "${workflow.input.url}",
    "mediaType": "application/xhtml+xml",
    "chunkSize": 500,
    "chunkOverlap": 100
  },
  "type": "LLM_INDEX_DOCUMENT"
}
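
For context, the sketch below shows how this task definition might sit inside a complete workflow, with the document URL supplied as a workflow input and passed through as ${workflow.input.url}. The workflow name, description, and version are placeholder values used for illustration; they are not part of the official example.

{
  "name": "document_indexing_workflow",
  "description": "Indexes a document supplied at start time into the configured vector database.",
  "version": 1,
  "schemaVersion": 2,
  "inputParameters": ["url"],
  "tasks": [
    {
      "name": "llm_index_document_task",
      "taskReferenceName": "llm_index_document_task_ref",
      "inputParameters": {
        "vectorDB": "pineconedb",
        "namespace": "myNewModel",
        "index": "test",
        "embeddingModelProvider": "azure_openai",
        "embeddingModel": "text-davinci-003",
        "url": "${workflow.input.url}",
        "mediaType": "application/xhtml+xml",
        "chunkSize": 500,
        "chunkOverlap": 100
      },
      "type": "LLM_INDEX_DOCUMENT"
    }
  ]
}

Starting such a workflow with a url value in its input would feed that URL into the task's url parameter.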

Input Parameters

vectorDB
Choose the required vector database.

Note: If you haven't configured the vector database on your Orkes console, navigate to the Integrations tab and configure your required provider. Refer to the documentation on integrating vector databases with the Orkes console.

namespace
Choose from the available namespaces configured within the chosen vector database. Namespaces are separate, isolated environments within the database used to manage and organize vector data effectively.

Note: The namespace field applies only to the Pinecone integration and is not applicable to the Weaviate integration.

index
Choose the index in your vector database where the indexed text or data is to be stored.

Note: For the Weaviate integration, this field refers to the class name, while for the Pinecone integration, it denotes the index name itself.

embeddingModelProvider
Choose the required LLM provider for embedding.

If you haven't configured your AI/LLM provider on your Orkes console, navigate to the Integrations tab and configure your required provider. Refer to the documentation on integrating LLM providers with the Orkes console.

embeddingModel
Choose from the available language models for the chosen LLM provider.

url
Provide the URL of the file to be indexed.

mediaType
Select the media type of the file to be indexed. Currently supported media types include:
  • application/pdf
  • text/html
  • text/plain
  • json

chunkSize
Specifies how long each segment of the input text should be when it is divided for processing by the LLM. For example, if your article contains 2000 words and you specify a chunk size of 500, the document is divided into four chunks for processing.

chunkOverlap
Specifies the amount of overlap between adjacent chunks. For example, if the chunk overlap is set to 100, the first 100 words of each chunk overlap with the last 100 words of the previous chunk.

Examples



  1. Add task type LLM Index Document.
  2. Choose the vector database and the LLM provider for embedding the document.
  3. Provide the document URL to be indexed and other input parameters.

LLM Index Document Task
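
As a further illustration, and as an assumption derived from the parameter notes above rather than an official sample, a configuration against a Weaviate integration would omit the namespace field and use the class name in the index field. The integration name weaviatedb and class name Document below are placeholders.

{
  "name": "llm_index_document_task",
  "taskReferenceName": "llm_index_document_task_ref",
  "inputParameters": {
    "vectorDB": "weaviatedb",
    "index": "Document",
    "embeddingModelProvider": "azure_openai",
    "embeddingModel": "text-davinci-003",
    "url": "${workflow.input.url}",
    "mediaType": "text/plain",
    "chunkSize": 500,
    "chunkOverlap": 100
  },
  "type": "LLM_INDEX_DOCUMENT"
}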