Using Vector Databases
This guide provides an overview of vector databases and how Orkes Conductor makes it easy to use them for system AI tasks in workflows.
If you are already familiar with vector databases, skip the overview and proceed to the configuration steps.
Overview: Vector databases
A vector database is a type of database specifically designed to store and query vectors, that is, multi-dimensional data. A vector is an ordered set of numerical values, often representing a point or a piece of data in a multi-dimensional space.
Embeddings
An embedding is a representation of input data converted into an array of numbers known as a vector. The vector acts as a position on a multi-dimensional map, so its similarity to other embeddings can be measured by how close they are to each other. Typically, a language model generates embeddings from an input, and the embeddings are then stored in a vector database.
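To make the idea of similarity between embeddings concrete, here is a minimal sketch using cosine similarity, one common similarity measure. The three-dimensional vectors and their names are invented for illustration; real embedding models produce far more dimensions, and a given vector database may use a different metric.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, not produced by any real model.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # close to 1: the vectors point the same way
print(cosine_similarity(cat, invoice))  # close to 0: the vectors are nearly orthogonal
```

A score near 1 means the two inputs are likely related, which is the property a vector database relies on when it retrieves the nearest embeddings for a query.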
Namespaces
A namespace is a logical grouping or category of embeddings within a vector database. Namespaces segregate different types of data or embeddings, which helps organize and manage diverse embeddings for more efficient storage and querying.
Indexes
Indexes are hierarchical structures built on embeddings in a vector database to optimize retrieval and query performance. They play a role similar to tables in a relational database. Each namespace can contain multiple indexes. Indexes help you quickly locate and retrieve embeddings based on their similarity to a given query vector. Vendors use different terminology: Pinecone refers to them as "indexes," while Weaviate calls them "classes."
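The relationship between namespaces, indexes, and embeddings described above can be sketched with a toy in-memory store. This is an illustration only: the `ToyVectorStore` class, its method names, and the sample data are hypothetical, and the linear scan stands in for the optimized index structures a real vector database would use.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical layout: namespace -> index -> {doc_id: embedding}."""

    def __init__(self):
        self._data = defaultdict(lambda: defaultdict(dict))

    def upsert(self, namespace, index, doc_id, embedding):
        """Store or overwrite one embedding under the given namespace and index."""
        self._data[namespace][index][doc_id] = list(embedding)

    def query(self, namespace, index, query_vector, top_k=1):
        """Return up to top_k (doc_id, score) pairs, most similar first."""
        candidates = self._data.get(namespace, {}).get(index, {})
        scored = [
            (doc_id, cosine_similarity(query_vector, embedding))
            for doc_id, embedding in candidates.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = ToyVectorStore()
store.upsert("support", "faq", "doc-1", [0.9, 0.1, 0.0])
store.upsert("support", "faq", "doc-2", [0.0, 0.1, 0.9])
store.upsert("billing", "invoices", "doc-3", [0.9, 0.1, 0.0])

# Only embeddings in the "support" namespace and "faq" index are searched.
results = store.query("support", "faq", [0.8, 0.2, 0.1], top_k=1)
print(results)
```

Scoping every query to one namespace and one index is what keeps unrelated embeddings, such as `doc-3` here, out of the results.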
Configuring vector databases
- To store data in a vector database, an AI model must first generate an embedding from that data. You must therefore also integrate an AI model provider of your choice.
Here is an overview of using vector databases in Orkes Conductor:
- Choose a vector database provider.
- Integrate your chosen vector database with your Orkes Conductor cluster.
- Set access limits on the vector database to govern which applications or groups can use it.
- Use a vector database in your workflow by adding an AI task and configuring it for the chosen vector database.
Step 1: Choose a vector database provider
The following vector database providers are available for integration with Orkes Conductor:
Review the provider’s official documentation to determine which database suits your use case.
Step 2: Integrate a vector database provider
Before using a vector database in a workflow, you must integrate it with your Orkes Conductor cluster.
To integrate a vector database provider:
- Go to Integrations from the left navigation menu on your Conductor cluster.
- Select + New integration from the top-right of your window.
- In the Vector Databases section, select + Add to integrate your preferred database provider.
- Enter the required parameters for the chosen provider.
The integration configuration differs with each provider. For detailed steps on integrating with each provider, refer to Vector Database Integrations.
- (Optional) Toggle the Active button off if you don’t want to activate the integration instantly.
- Select Save.
Step 3: Set access limits for integrations
As a best practice, use Orkes’ RBAC feature to govern which user groups or applications can access the database providers.
To provide access to an application or group:
- Go to Access Control > Applications or Groups from the left navigation menu on your Orkes Conductor cluster.
- Create a new group/application or select an existing one.
- In the Permissions section, select +Add Permission.
- In the Integration tab, select the required vector databases and toggle the necessary permissions.
- Select Add Permissions.
The group or application can now access the vector databases according to the configured permissions.
Step 4: Use a vector database in workflows
Vector databases can be used in workflows with the following system AI tasks:
To use a vector database in workflows:
- Go to Definitions > Workflow from the left navigation menu on your Conductor cluster.
- Select + Define workflow.
- In the visual workflow editor, select Start and add the relevant AI task based on your use case.
- Select the configured vector database and indexes.
- Configure the remaining task parameters.
Refer to the AI Task Reference for more details on configuring the task parameters.
- Select Save > Confirm.
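A workflow built in the visual editor is stored as a JSON definition. As a rough sketch of what a workflow with one vector database task might look like, the snippet below assembles such a definition as a Python dictionary and prints it as JSON. The task type `LLM_INDEX_TEXT` and the input parameter keys are assumptions to verify against the AI Task Reference, and every name and value (workflow, integration, namespace, index, model provider, model) is a hypothetical placeholder.

```python
import json

# Assumed task type and parameter keys; confirm them in the AI Task Reference.
# All values are hypothetical placeholders for your own integration names.
index_text_task = {
    "name": "index_text",
    "taskReferenceName": "index_text_ref",
    "type": "LLM_INDEX_TEXT",
    "inputParameters": {
        "vectorDB": "my_vector_db",                 # integration name from Step 2
        "namespace": "support",
        "index": "faq",
        "embeddingModelProvider": "my_ai_provider",  # AI model integration
        "embeddingModel": "my-embedding-model",
        "text": "${workflow.input.text}",
        "docId": "${workflow.input.docId}",
    },
}

workflow_definition = {
    "name": "index_support_article",
    "description": "Generates an embedding for the input text and stores it in a vector database",
    "version": 1,
    "schemaVersion": 2,
    "inputParameters": ["text", "docId"],
    "tasks": [index_text_task],
}

print(json.dumps(workflow_definition, indent=2))
```

The `${workflow.input...}` expressions pass the workflow's runtime inputs into the task, so the same definition can index different documents on each run.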