Chunk Text
- v5.2.38 and later
The Chunk Text task is used to divide text into smaller segments (chunks) based on the document type. This task is useful for processing large text inputs in parts, such as preparing content for semantic search, text embedding, or summarization.
During execution, the task determines the chunking logic based on the specified document type and splits the text into segments of the defined size. Each chunk is returned as an array element and can be processed by subsequent tasks in the workflow.
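As a rough illustration of size-based splitting, the following minimal Python sketch cuts text into consecutive chunks of at most a given number of characters. It is not the task's actual implementation: the real chunking logic depends on the document type and is not described here, and the function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1024) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size characters.

    Simplified sketch: it ignores document type and cuts purely by length.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


sample = "a" * 2500
chunks = chunk_text(sample, chunk_size=1024)
print([len(c) for c in chunks])  # [1024, 1024, 452]
```

Each element of the returned list corresponds to one chunk, mirroring the array that the task returns.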
Task parameters
Configure these parameters for the Chunk Text task.
| Parameter | Description | Required/ Optional |
|---|---|---|
| inputParameters.text | The input text to be divided into chunks. | Required. |
| inputParameters.chunkSize | The maximum number of characters per chunk. Enter a value between 100 and 10,000. The default and recommended value is 1,024. | Required. |
| inputParameters.mediaType | The document type or content format of the input text. Supported values include auto. | Required. |
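The parameter constraints above can be checked before a task definition is submitted. The sketch below builds the task definition as a Python dictionary and rejects a chunk size outside the documented range of 100 to 10,000. The helper `build_chunk_text_task` and its constants are hypothetical and are not part of any SDK.

```python
import json

# Limits taken from the parameter table; the names are hypothetical.
MIN_CHUNK_SIZE = 100
MAX_CHUNK_SIZE = 10_000
DEFAULT_CHUNK_SIZE = 1024


def build_chunk_text_task(text: str, media_type: str,
                          chunk_size: int = DEFAULT_CHUNK_SIZE,
                          ref_name: str = "chunkText") -> dict:
    """Return a Chunk Text task definition after validating chunk_size."""
    if not MIN_CHUNK_SIZE <= chunk_size <= MAX_CHUNK_SIZE:
        raise ValueError(
            f"chunkSize must be between {MIN_CHUNK_SIZE} and {MAX_CHUNK_SIZE}"
        )
    return {
        "name": "chunk_text_task",
        "taskReferenceName": ref_name,
        "inputParameters": {
            "text": text,
            "chunkSize": chunk_size,
            "mediaType": media_type,
        },
        "type": "CHUNK_TEXT",
    }


task = build_chunk_text_task("<YOUR-TEXT-HERE>", "auto")
print(json.dumps(task, indent=2))
```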
The following are generic configuration parameters that can be applied to the task and are not specific to the Chunk Text task.
Caching parameters
You can cache the task outputs using the following parameters. Refer to Caching Task Outputs for a full guide.
| Parameter | Description | Required/ Optional |
|---|---|---|
| cacheConfig.ttlInSecond | The time to live in seconds, which is the duration for the output to be cached. | Required if using cacheConfig. |
| cacheConfig.key | The cache key is a unique identifier for the cached output and must be constructed exclusively from the task’s input parameters. It can be a string concatenation that contains the task’s input keys, such as ${uri}-${method} or re_${uri}_${method}. | Required if using cacheConfig. |
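To show how a cache key can be assembled from input keys, here is a small sketch that joins the key names into a string such as ${uri}-${method} and pairs it with a time to live. The helper `build_cache_config` is hypothetical, and the sketch assumes the resulting object is set as the task's cacheConfig field.

```python
def build_cache_config(input_keys: list[str], ttl_in_second: int) -> dict:
    """Build a cacheConfig object whose key references only task input keys."""
    if ttl_in_second <= 0:
        raise ValueError("ttl_in_second must be positive")
    # Produces a string concatenation such as "${uri}-${method}".
    key = "-".join("${" + name + "}" for name in input_keys)
    return {"key": key, "ttlInSecond": ttl_in_second}


cache_config = build_cache_config(["uri", "method"], ttl_in_second=60)
print(cache_config)  # {'key': '${uri}-${method}', 'ttlInSecond': 60}
```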
Other generic parameters
Here are other parameters for configuring the task behavior.
| Parameter | Description | Required/ Optional |
|---|---|---|
| optional | Whether the task is optional. If set to true, any task failure is ignored, and the workflow continues with the task status updated to COMPLETED_WITH_ERRORS. However, the task must reach a terminal state. If the task remains incomplete, the workflow waits until it reaches a terminal state before proceeding. | Optional. |
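As a sketch of where this flag sits, the snippet below adds optional to a Chunk Text task definition held as a Python dictionary. It assumes optional is a top-level field of the task definition, as the unprefixed parameter name suggests. The helper `as_optional` is hypothetical.

```python
import json

chunk_task = {
    "name": "chunk_text_task",
    "taskReferenceName": "chunkText",
    "inputParameters": {
        "text": "<YOUR-TEXT-HERE>",
        "chunkSize": 1024,
        "mediaType": "auto",
    },
    "type": "CHUNK_TEXT",
}


def as_optional(task: dict) -> dict:
    """Return a copy of the task definition with optional set to true."""
    return {**task, "optional": True}


optional_task = as_optional(chunk_task)
print(json.dumps(optional_task, indent=2))
```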
Task configuration
This is the task configuration for a Chunk Text task.
```json
{
  "name": "chunk_text_task",
  "taskReferenceName": "chunkText",
  "inputParameters": {
    "text": "<YOUR-TEXT-HERE>",
    "chunkSize": 1024,
    "mediaType": "auto"
  },
  "type": "CHUNK_TEXT"
}
```
Task output
The Chunk Text task returns the following output parameter.
| Parameter | Description |
|---|---|
| text | An array of chunked text segments. Each element in the array represents one chunk of the original text. |
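A subsequent task can consume the chunk array by referencing this output. The sketch below assumes Conductor's usual `${taskReferenceName.output.key}` expression syntax and the reference name chunkText from the sample configuration. The downstream task `embed_chunks_task` and the helper `output_ref` are hypothetical.

```python
import json


def output_ref(task_reference_name: str, key: str = "text") -> str:
    """Build an expression that points at a task's output parameter."""
    return "${" + task_reference_name + ".output." + key + "}"


# Hypothetical downstream task that receives the array of chunks as input.
downstream_task = {
    "name": "embed_chunks_task",
    "taskReferenceName": "embedChunks",
    "inputParameters": {"chunks": output_ref("chunkText")},
    "type": "SIMPLE",
}

print(json.dumps(downstream_task, indent=2))
```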