ENGINEERING

RAG Explained | Using Retrieval-Augmented Generation to Build Semantic Search

Yong Sheng Tan
Solutions Architect
Last updated: June 13, 2024
7 min read


Large language models (LLMs) have captured the public imagination in the years since OpenAI first launched ChatGPT in late 2022. After the initial fascination, businesses followed suit, looking for use cases where they could deploy LLMs.

With more and more LLMs released as open source and deployable as on-premise private models, it became possible for organizations to train, fine-tune, or supplement models with private data. RAG (retrieval-augmented generation) is one such technique for customizing an LLM, serving as a viable approach for businesses to use LLMs without the high costs and specialized skills involved in building a custom model from scratch.

What is retrieval-augmented generation?

RAG (retrieval-augmented generation) is a technique that improves the accuracy of LLM (large language model) outputs with pre-fetched data from external sources. With RAG, the model references a database separate from its training data in real time before generating a response.

RAG extends the general capabilities of LLMs into a specific domain without the need to train a custom model from scratch. This approach enables general-purpose LLMs to provide more useful, relevant, and accurate answers in highly specialized or private contexts, such as an organization’s internal knowledge base. For most use cases, RAG provides a similar result to training custom models but at a fraction of the required cost and resources.

How does retrieval-augmented generation work?

RAG involves using general-purpose LLMs as-is without special training or fine-tuning to serve answers based on domain-specific knowledge. This is achieved using a two-part process.

First, the data is chunked and transformed into embeddings, which are vector representations of the data. These embeddings are then indexed into a vector database with the help of an AI model known as an embedding model.

Once the data is populated in the index, natural language queries can be performed on the index using the same embedding model to yield relevant chunks of information. These chunks then get passed to the LLM as context, along with guardrails and prompts on how to respond given the context.

Diagram of how retrieval-augmented generation works. Part 1 is indexing, which involves accessing the data from a source, transforming the data into embeddings, and storing the embeddings in a vector database index. Part 2 is searching, which involves searching the index to yield context and calling the LLM with a prompt containing the context.

RAG (retrieval-augmented generation) is a two-part AI technique that involves indexing data into a vector database and searching the database to retrieve relevant information.

Why use retrieval-augmented generation?

RAG offers several strategic advantages when implementing generative AI capabilities:

Minimize inaccuracies

Using a RAG-based LLM can help reduce hallucinations (plausible yet completely false information) or inaccuracies in the model’s answers. By providing access to additional information, RAG enables relevant context to be added to the LLM prompt, thus leveraging the power of in-context learning (ICL) to improve the reliability of the model’s answers.

Access to the latest information

With access to a continuously updated external database, the LLM can provide the latest information in news, social media, research, and other sources. RAG ensures that the LLM responses are up-to-date, relevant, and credible, even if the model’s training data does not contain the latest information.

Cost-effective, scalable, and flexible

RAG requires far less time, specialized skill, tooling, and infrastructure to obtain a production-ready LLM. Furthermore, by changing the data source or updating the database, the LLM’s behavior can be efficiently modified without any retraining, making RAG an ideal approach at scale.

Since RAG makes use of general-purpose LLMs, the model is decoupled from the domain, enabling developers to switch up the model at will. Compared to a custom pre-trained model, RAG provides instant, low-cost upgrades from one LLM to another.

Highly inspectable architecture

RAG offers a highly inspectable architecture, so developers can examine the user input, the retrieved context, and the LLM response to identify any discrepancies. With this ease of visibility, RAG-powered LLMs can also be instructed to provide sources in their responses, establishing more credibility and transparency with users.

How to use retrieval-augmented generation?

RAG can be used for various knowledge-intensive tasks:

  • Question-answering systems
  • Knowledge base search engines
  • Document retrieval for research
  • Recommendation systems
  • Chatbots with real-time data

Diagram of 5 RAG use cases: question-answering systems, knowledge base search engine, document retrieval for research, recommendation systems, chatbots with real-time data.

RAG (retrieval-augmented generation) is useful for many knowledge-retrieval processes.

Building a retrieval-augmented generation system

While the barriers to entry into RAG are much lower, it still requires an understanding of LLM concepts, as well as trained developers and engineers who can build data pipelines and integrate the query toolchain into the required services for consumption.

Using workflow orchestration to build RAG-based applications lowers the bar to anyone who can string together API calls into a business process. The two-part process described above can be built as two workflows to create a RAG-based application. Let's build a financial news analysis platform as an example.

Orchestrating RAG using Orkes Conductor

Orkes Conductor streamlines the process of building LLM-powered applications by orchestrating the interaction between distributed components so that you don’t have to write the plumbing or infrastructure code for it. In this case, a RAG system requires orchestration between four key components:

  1. Aggregating data from a data source;
  2. Indexing and retrieving the data in a vector database;
  3. Using an embedding model; and
  4. Integrating and calling the LLM to respond to a search query.

Let's build out the workflows to orchestrate the interaction between these components.

Part 1: Indexing the data

The first part of creating a RAG system is to load, clean, and index the data. This process can be accomplished with a Conductor workflow. Let’s build a data-indexer workflow.

Step 1: Get the data

Choose a data source for your RAG system. The data can come from anywhere — a document repository, database, or API — and Conductor offers a variety of tasks that can pull data from any source.

For our financial news analysis platform, the FMP Articles API will serve as the data source. To call the API, get the API access keys and create an HTTP task in your Conductor workflow. Configure the endpoint method, URL, and other settings, and the task will retrieve data through the API whenever the workflow is executed.
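As an illustrative sketch, the HTTP task definition for this step might look like the following. The task names, endpoint URL, and query parameters here are assumptions for demonstration, not the exact FMP API contract:

```json
{
  "name": "get_fmp_articles",
  "taskReferenceName": "get_fmp_articles_ref",
  "type": "HTTP",
  "inputParameters": {
    "http_request": {
      "uri": "https://financialmodelingprep.com/api/v3/fmp/articles?page=0&size=20&apikey=${workflow.input.apiKey}",
      "method": "GET",
      "contentType": "application/json"
    }
  }
}
```

Passing the API key in as a workflow input (rather than hardcoding it) keeps the credential out of the workflow definition itself.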

Screenshot of the HTTP task in Orkes Conductor.

Use the HTTP task in Orkes Conductor to call an API.

Step 2: Transform the data

Before the data gets indexed to the vector database, the API payload should be transformed, cleaned, and chunked so that the embedding model can ingest it.

Developers can write Conductor workers to transform the data into chunks. Conductor workers can be written in any language, so developers are free to use well-known libraries such as NumPy and pandas for advanced data transformation and cleaning.

In our example, we will use a JSON JQ Transform Task as a simple demonstration of how to transform the data. We only need the article title and content from the FMP Articles API for our financial news analysis platform. Each article must be stored as a separate chunk in the required payload format for indexing.

Diagram of the original data payload versus the transformed data.

API payload vs the transformed data. Only the relevant data are retained.
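A minimal sketch of the JSON JQ Transform task for this step, assuming the upstream HTTP task is named `get_fmp_articles_ref` and the articles arrive as an array of objects with `title` and `content` fields (the field names and paths are assumptions based on the description above):

```json
{
  "name": "transform_articles",
  "taskReferenceName": "transform_articles_ref",
  "type": "JSON_JQ_TRANSFORM",
  "inputParameters": {
    "articles": "${get_fmp_articles_ref.output.response.body.content}",
    "queryExpression": ".articles | map({ title: .title, text: .content })"
  }
}
```

The jq expression keeps only the two fields we need and discards the rest of the payload.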

Step 3: Index the data into a vector database

The cleaned data is now ready to be indexed into a vector database, such as Pinecone, Weaviate, MongoDB, and more. Use the LLM Index Text Task in your Conductor workflow to add one data chunk into the vector space. A dynamic fork can be used to execute multiple LLM Index Text Tasks in parallel so that multiple chunks can be added at once.

Screenshot of the LLM Index Text task in Orkes Conductor.

Use the LLM Index Text task in Orkes Conductor to store data into a vector database.

The LLM Index Text Task is one of the many LLM tasks provided in Orkes Conductor to simplify building LLM-powered applications.
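A hedged sketch of what an LLM Index Text task definition could look like. The vector database name, namespace, index, embedding model, and upstream task reference shown here are placeholder assumptions:

```json
{
  "name": "index_article_chunk",
  "taskReferenceName": "index_article_chunk_ref",
  "type": "LLM_INDEX_TEXT",
  "inputParameters": {
    "vectorDB": "pinecone_finance",
    "namespace": "news",
    "index": "articles",
    "embeddingModelProvider": "openai",
    "embeddingModel": "text-embedding-ada-002",
    "text": "${transform_articles_ref.output.result.text}",
    "docId": "${transform_articles_ref.output.result.title}"
  }
}
```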

Repeat

To build out the vector database, iterate through the three steps — extract, transform, load — until the desired dataset size is reached. The iterative loop can be built using a Do While operator task in Conductor.
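As a rough sketch, the Do While task could wrap the three steps like this. The task stubs are abbreviated (their full definitions are omitted), the loop condition follows Conductor's JavaScript-style `loopCondition` convention, and the `pageCount` iteration bound is an assumed workflow input:

```json
{
  "name": "indexing_loop",
  "taskReferenceName": "indexing_loop_ref",
  "type": "DO_WHILE",
  "loopCondition": "if ($.indexing_loop_ref['iteration'] < $.pageCount) { true; } else { false; }",
  "inputParameters": { "pageCount": "${workflow.input.pageCount}" },
  "loopOver": [
    { "taskReferenceName": "get_fmp_articles_ref", "type": "HTTP" },
    { "taskReferenceName": "transform_articles_ref", "type": "JSON_JQ_TRANSFORM" },
    { "taskReferenceName": "index_article_chunk_ref", "type": "LLM_INDEX_TEXT" }
  ]
}
```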

Here is the full data-indexer workflow.

Diagram of the data-indexer workflow.

data-indexer workflow in Orkes Conductor.

Part 2: Retrieving data for semantic search

Once the vector database is ready, it can be deployed for production — in this case, for financial news analysis. This is where data is retrieved from the vector database to serve as context for the LLM, so that it can formulate a more accurate response. For this second part, let’s build a semantic-search workflow that can be used in an application.

Step 1: Retrieve relevant data from vector database

In a new workflow, add the LLM Search Index Task — one of the many LLM tasks provided in Orkes Conductor to simplify building LLM-powered applications. This task takes in a user query and returns the relevant context chunks that match the most closely.

Diagram of the input, containing the user query versus the output, containing the relevant data from the vector database.

The LLM Search Index Task takes in a user query and returns the relevant context chunks from the vector database.
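A hedged sketch of the LLM Search Index task definition, reusing the same (assumed) vector database, namespace, index, and embedding model names from the indexing workflow:

```json
{
  "name": "search_articles",
  "taskReferenceName": "search_articles_ref",
  "type": "LLM_SEARCH_INDEX",
  "inputParameters": {
    "vectorDB": "pinecone_finance",
    "namespace": "news",
    "index": "articles",
    "embeddingModelProvider": "openai",
    "embeddingModel": "text-embedding-ada-002",
    "query": "${workflow.input.question}"
  }
}
```

Note that the search must use the same embedding model that indexed the data; otherwise the query and document vectors live in incompatible vector spaces and nearest-neighbor matching breaks down.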

Step 2: Formulate an answer

With the retrieved context, call an LLM of your choice to generate the response to the user query. Use the LLM Text Complete Task in Orkes Conductor to accomplish this step. The LLM will ingest the user query along with the context.

Screenshot of the LLM Text Complete task in Orkes Conductor.

Use the LLM Text Complete task in Orkes Conductor to prompt a selected LLM for a response.

Guardrails can be set up in Orkes Conductor to optimize and constrain the LLM response, such as by adjusting parameters like temperature, topP, or maxTokens.
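A hedged sketch of the LLM Text Complete task definition. The provider, model, prompt name, and upstream task reference are illustrative assumptions; `${context}` and `${question}` are the prompt template variables shown below:

```json
{
  "name": "generate_answer",
  "taskReferenceName": "generate_answer_ref",
  "type": "LLM_TEXT_COMPLETE",
  "inputParameters": {
    "llmProvider": "openai",
    "model": "gpt-4o",
    "promptName": "rag_answer_prompt",
    "promptVariables": {
      "context": "${search_articles_ref.output.result}",
      "question": "${workflow.input.question}"
    },
    "temperature": 0.1,
    "topP": 0.9,
    "maxTokens": 500
  }
}
```

A low temperature is a sensible default for RAG, since the goal is a factual answer grounded in the retrieved context rather than creative variation.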

Use Orkes Conductor’s AI Prompt studio to create a prompt template for the LLM to follow in the LLM Text Complete Task.

Example prompt template

```text
Answer the question based on the context provided.

Context: "${context}"

Question: "${question}"

Provide just the answer without repeating the question or mentioning the context.
```

Here is the full semantic-search workflow.

Diagram of the semantic-search workflow.

semantic-search workflow in Orkes Conductor.

Use the workflow in your application

Once the semantic-search workflow is ready, you can use it in your application project to build a semantic search engine or chatbot. To build your application, leverage the Conductor SDKs, available in popular languages like Python, Java, and Golang, and call our APIs to trigger the workflow in your application.
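Conductor's start-workflow REST endpoint (`POST /api/workflow/{name}`) accepts the workflow input as the request body, so triggering the semantic-search workflow from an application can be as simple as posting a body like the following (the workflow name and input field match the example assumptions above):

```json
{
  "question": "How did semiconductor stocks react to the latest earnings reports?"
}
```

The call returns a workflow execution ID, which the application can use to poll for the completed output.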

The RAG-based financial news analysis platform looks like this:

Screenshot of the example RAG-based financial news analysis platform.

The RAG-based financial news analysis platform in action, with a Conductor workflow powering it.

Whenever a user enters a query, a semantic-search workflow runs in the background to provide the answer.

If the vector database needs to be updated on the backend, the data-indexer workflow can be triggered, or even scheduled at regular intervals for automatic updates.

While the financial news analysis platform is a simple variant of a RAG system, developers can use Orkes Conductor to quickly develop and debug their own RAG systems of varying complexities.

Summary

Building semantic search using RAG can be much more achievable than most people think. By applying orchestration with a platform like Orkes Conductor, the development and operational effort need not involve complicated tooling, infrastructure, skill sets, or other resources. This translates to a highly efficient go-to-market process that can be iterated rapidly over time to optimize the results and value derived from such AI capabilities in any modern business.

—

Conductor is an enterprise-grade orchestration platform for process automation, API and microservices orchestration, agentic workflows, and more. Check out the full set of features, try it yourself using our Developer Edition sandbox, or get a demo of Orkes Cloud, a fully managed and hosted Conductor service.