Skip to main content

LLM Search Index

The LLM Search Index task is used to search a vector database or repository of vector embeddings of already processed and indexed documents to get the closest match. This task is typically used in scenarios where you need to retrieve or manipulate data stored in a database using a natural language query.

The LLM Search Index task takes a query, which can be a question, statement, or request made in natural language. This query is processed to generate a vector representation, which is then used to search the vector database. The task returns a list of documents with vectors similar to the query vector, providing the closest matches based on the degree of similarity.

Prerequisites

Task parameters

Configure these parameters for the LLM Search Index task.

ParameterDescriptionRequired/ Optional
inputParameters.vectorDBThe vector database to retrieve the data.

Note: If you haven’t configured the vector database on your Orkes Conductor cluster, navigate to the Integrations tab and configure your required provider.
Required.
inputParameters.indexThe index in your vector database where the text or data will be stored.

The terminology of the index field varies depending on the integration:
  • For Weaviate, the index field indicates the collection name.
  • For other integrations, it denotes the index name.
Required.
inputParameters.namespaceNamespaces are separate isolated environments within the database to manage and organize vector data effectively. Choose from the available namespace configured within the chosen vector database.

The usage and terminology of the namespace field vary depending on the integration:
  • For Pinecone, the namespace field is applicable.
  • For Weaviate, the namespace field is not applicable.
  • For MongoDB, the namespace field is referred to as “Collection” in MongoDB.
  • For Postgres, the namespace field is referred to as “Table” in Postgres.
Required.
inputParameters.embeddingModelProviderThe LLM provider for the embeddings.

Note: If you haven’t configured your AI/LLM provider on your Orkes console, navigate to the Integrations tab and configure your required provider.
Required.
inputParameters.embeddingModelThe embedding model provided by the selected LLM provider.Required.
inputParameters.queryThe search query. A query typically refers to a question, statement, or request made in natural language that is used to search, retrieve, or manipulate data stored in a database.Required.
inputParameters.maxResultsThe maximum number of results to return. Provide a non-zero integer between 1 and 10000.Required.
inputParameters.dimensionsThe size of the vector, which is the number of elements in the vector.Optional.

Caching parameters

You can cache the task outputs using the following parameters. Refer to Caching Task Outputs for a full guide.

ParameterDescriptionRequired/ Optional
cacheConfig.ttlInSecondThe time to live in seconds, which is the duration for the output to be cached.Required if using cacheConfig.
cacheConfig.keyThe cache key is a unique identifier for the cached output and must be constructed exclusively from the task’s input parameters.
It can be a string concatenation that contains the task’s input keys, such as ${uri}-${method} or re_${uri}_${method}.
Required if using cacheConfig.

Schema parameters

You can enforce input/output validation for the task using the following parameters. Refer to Schema Validation for a full guide.

ParameterDescriptionRequired/ Optional
taskDefinition.enforceSchemaWhether to enforce schema validation for task inputs/outputs. Set to true to enable validation.Optional.
taskDefinition.inputSchemaThe name and type of the input schema to be associated with the task.Required if enforceSchema is set to true.
taskDefinition.outputSchemaThe name and type of the output schema to be associated with the task.Required if enforceSchema is set to true.

Other generic parameters

Here are other parameters for configuring the task behavior.

ParameterDescriptionRequired/ Optional
optionalWhether the task is optional. The default is false.

If set to true, the workflow continues to the next task even if this task is in progress or fails.
Optional.

Task configuration

This is the task configuration for an LLM Search Index task.

{
"name": "llm_search_index_task",
"taskReferenceName": "llm_search_index_task_ref",
"inputParameters": {
"vectorDB": "pineconedb",
"namespace": "myNewModel",
"index": "test",
"llmProvider": "azure_openai",
"embeddingModel": "text-davinci-003",
"query": "What is an LLM model?",
"dimensions": "${workflow.input.dimensions}"
},
"type": "LLM_SEARCH_INDEX"
}

Task output

The LLM Search Index task will return the following parameters.

ParameterDescription
resultA JSON array containing the results of the query.
scoreRepresents a value quantifying the degree of likeness between a specific item and a query vector, facilitating ranking and ordering of results. Higher scores denote stronger relevance to the query vector.
metadataAn object containing additional metadata related to the retrieved document.
docIdThe unique identifier of the queried document.
parentDocIdAn identifier that denotes a parent document in hierarchical or relational data structures.
textThe actual content retrieved.

Adding an LLM Search Index task in UI

To add an LLM Search Index task:

  1. In your workflow, select the (+) icon and add an LLM Search Index task.
  2. Choose the Vector database, Index, Namespace, Embedding model provider, and Embedding model.
  3. In Query, enter the text to be queried.

LLM Search Index Task - UI