Error Boundary

Error Boundary is the workflow error handling system. It resolves errors across multiple levels, evaluates rules in priority order, and supports recovery actions such as retry and rollback.

Overview

The vNext workflow system implements a hierarchical error handling mechanism that allows you to define error policies at three levels:

  1. Task Level - Most specific, applied to individual task executions
  2. State Level - Applied when no task-level boundary handles the error
  3. Global Level - Workflow-wide fallback when no lower-level boundary matches

Important: Error boundaries always respond to errors raised during task execution. Wherever a boundary is defined (global, state, or task level), its actions are triggered by task execution errors.

Schema alignment: The workflow JSON Schema for errorBoundary (including abort actions that reference a transition) matches the backend validators when you use vnext-schema with Ajv2019 validation, as described in Schema Management.


Error Resolution Hierarchy

When an error occurs during task execution, the system evaluates error boundaries in the following order:

Task-level errorBoundary (most specific)
↓ (if no match, automatically check next level)
State-level errorBoundary
↓ (if no match, automatically check next level)
Global-level errorBoundary
↓ (if still no match)
Default system behavior (throw exception)

Within each level, rules are evaluated by priority (lower number = higher priority).
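The lookup order above can be sketched in a few lines of Python. This is an illustrative model, not the engine's implementation: the function names, the dictionary shape of `error`, and the `by_code` matcher are assumptions made for the example.

```python
from typing import Callable, Optional

Rule = dict  # one entry of an "onError" array


def resolve_rule(
    error: dict,
    task_rules: list[Rule],
    state_rules: list[Rule],
    global_rules: list[Rule],
    matches: Callable[[Rule, dict], bool],
) -> Optional[Rule]:
    """Return the first matching rule, checking task, then state, then global level."""
    for level_rules in (task_rules, state_rules, global_rules):
        # Within a level, a lower priority number is evaluated first (default 100).
        for rule in sorted(level_rules, key=lambda r: r.get("priority", 100)):
            if matches(rule, error):
                return rule
    return None  # no boundary matched: the default behavior (throw) applies


def by_code(rule: Rule, error: dict) -> bool:
    """Hypothetical matcher for this demo: compares error codes only."""
    codes = rule.get("errorCodes", ["*"])
    return not codes or "*" in codes or error["code"] in codes


task_level = [{"action": 1, "errorCodes": ["Task:503"], "priority": 1}]
state_level = [{"action": 0, "errorCodes": ["*"], "transition": "failed"}]

# Task:503 is handled at task level; Task:404 falls through to the state level.
print(resolve_rule({"code": "Task:503"}, task_level, state_level, [], by_code))
print(resolve_rule({"code": "Task:404"}, task_level, state_level, [], by_code))
```

A lower level is consulted only when nothing at the more specific level matches, so a broad state-level wildcard never overrides a task-level rule.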


ErrorBoundary Structure

{
  "errorBoundary": {
    "onError": [
      {
        "action": 0,
        "errorTypes": ["ValidationException"],
        "errorCodes": ["Task:400007"],
        "transition": "error-state",
        "priority": 10,
        "retryPolicy": {
          "maxRetries": 3,
          "initialDelay": "PT5S",
          "backoffType": 1,
          "backoffMultiplier": 2.0,
          "maxDelay": "PT1M",
          "useJitter": true
        },
        "logOnly": false
      }
    ]
  }
}

Properties

| Property | Type | Description |
|----------|------|-------------|
| onError | array | Error handling rules evaluated in priority order |

Note: The onTimeout property exists in the schema but is not yet implemented.


Error Handler Rule

Each rule in the onError array defines how to handle specific errors.

Properties

| Property | Type | Required | Default | Description |
|----------|------|----------|---------|-------------|
| action | int | Yes | - | Action to take (see Error Actions) |
| errorTypes | string[] | No | ["*"] | Exception type names to match |
| errorCodes | string[] | No | ["*"] | Error codes to match |
| transition | string | No | - | Transition key to trigger |
| priority | int | No | 100 | Rule priority (lower = higher priority) |
| retryPolicy | object | No | - | Retry configuration |
| logOnly | boolean | No | false | Only log, don't affect flow |

Error Matching

  • errorTypes: Exception class names (e.g., ValidationException, TimeoutException)
  • errorCodes: Error codes in format Category:Code or just Code (e.g., Task:400007, 500)
  • Empty array or ["*"] matches all errors
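As a rough illustration of these matching rules, the sketch below checks one rule against one error. The helper names are hypothetical, and combining errorTypes and errorCodes with a logical AND is an assumption; this page does not state how the two lists interact.

```python
def list_matches(patterns, value):
    """An empty list or a "*" entry matches anything; otherwise require an exact hit."""
    return not patterns or "*" in patterns or value in patterns


def code_matches(patterns, category, code):
    """Accept both the 'Category:Code' form and the bare 'Code' form."""
    if not patterns or "*" in patterns:
        return True
    return f"{category}:{code}" in patterns or str(code) in patterns


def rule_matches(rule, error_type, category, code):
    # Assumption: a rule must match on BOTH errorTypes and errorCodes.
    # An omitted property falls back to ["*"], as in the properties table.
    type_ok = list_matches(rule.get("errorTypes", ["*"]), error_type)
    code_ok = code_matches(rule.get("errorCodes", ["*"]), category, code)
    return type_ok and code_ok


rule = {"errorTypes": ["ValidationException"], "errorCodes": ["Task:400007"]}
print(rule_matches(rule, "ValidationException", "Task", "400007"))
print(rule_matches(rule, "TimeoutException", "Task", "400007"))
```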

Retry and rule matching: Retry is resolved through rule-based matching (error-aware retry). The error is first matched to the appropriate rule, and only then is that rule's retry policy applied. Infrastructure-level errors are excluded from the boundary, so retry behavior stays consistent with the matched rule.

Priority System

  • Lower values are evaluated first
  • Default priority: 100
  • Wildcard rules should use priority 999 so they are evaluated last
  • Recommended ranges:
    • Critical handlers: 1-10
    • Specific handlers: 10-50
    • General handlers: 50-100
    • Fallback handlers: 100-999
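The ordering can be pictured with a small sort. The `name` field below is a hypothetical label added for readability, and keeping declaration order for rules with equal priority is an assumption about tie-breaking, not documented behavior.

```python
DEFAULT_PRIORITY = 100  # applied when a rule omits "priority"


def evaluation_order(rules):
    # sorted() is stable, so rules with equal priority keep their declaration
    # order here (an assumption about how the engine breaks ties).
    return sorted(rules, key=lambda rule: rule.get("priority", DEFAULT_PRIORITY))


rules = [
    {"name": "fallback", "priority": 999},
    {"name": "no-priority"},
    {"name": "critical", "priority": 1},
    {"name": "general", "priority": 100},
]
print([rule["name"] for rule in evaluation_order(rules)])
# ['critical', 'no-priority', 'general', 'fallback']
```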

Error Actions

| Code | Action | Description |
|------|--------|-------------|
| 0 | Abort | Abort execution, optionally trigger error transition |
| 1 | Retry | Retry the task with configured retry policy |
| 2 | Rollback | Rollback to compensation state |
| 3 | Ignore | Ignore error and continue to next task |
| 4 | Notify | Send notification and optionally transition |
| 5 | Log | Log only, does not affect flow |
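When generating or reviewing workflow JSON from tooling, named constants are easier to read than bare integers. The enum below is a hypothetical helper that mirrors the codes in the table; it is not part of the vNext API.

```python
from enum import IntEnum


class ErrorAction(IntEnum):
    """Numeric action codes as listed in the Error Actions table."""
    ABORT = 0
    RETRY = 1
    ROLLBACK = 2
    IGNORE = 3
    NOTIFY = 4
    LOG = 5


def action_name(rule):
    # Raises ValueError for a code outside 0-5, which catches typos early.
    return ErrorAction(rule["action"]).name


print(action_name({"action": 1, "errorCodes": ["Task:503"]}))
```

Because `IntEnum` members are integers, `ErrorAction.RETRY` serializes to `1` when the rule is written out with the standard `json` module.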

Retry Policy

Configure retry behavior for the Retry action.

{
  "retryPolicy": {
    "maxRetries": 3,
    "initialDelay": "PT5S",
    "backoffType": 1,
    "backoffMultiplier": 2.0,
    "maxDelay": "PT1M",
    "useJitter": true
  }
}

Properties

| Property | Type | Default | Description |
|----------|------|---------|-------------|
| maxRetries | int | 3 | Maximum retry attempts |
| initialDelay | string | - | Initial delay (ISO 8601 duration) |
| backoffType | int | 1 | 0: Fixed, 1: Exponential |
| backoffMultiplier | number | 2.0 | Multiplier for exponential backoff |
| maxDelay | string | - | Maximum delay between retries (ISO 8601 duration) |
| useJitter | boolean | true | Add random jitter to prevent thundering herd |

Backoff Types

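| Code | Type | Description |
|------|------|-------------|
| 0 | Fixed | Same delay between each retry |
| 1 | Exponential | Delay is multiplied by backoffMultiplier after each retry (doubles with the default of 2.0) |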
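The difference between the two backoff types is easiest to see as a delay schedule. The sketch below ignores jitter and works in seconds. It assumes the first retry waits exactly initialDelay and that maxDelay caps each individual delay; the function and parameter names are illustrative only.

```python
def base_delays(max_retries, initial_delay_s, backoff_type=1,
                multiplier=2.0, max_delay_s=None):
    """Return the un-jittered delay in seconds before each retry attempt."""
    delays = []
    for attempt in range(max_retries):
        if backoff_type == 0:
            delay = float(initial_delay_s)                      # Fixed
        else:
            delay = initial_delay_s * float(multiplier) ** attempt  # Exponential
        if max_delay_s is not None:
            delay = min(delay, float(max_delay_s))              # cap at maxDelay
        delays.append(delay)
    return delays


print(base_delays(3, 5, backoff_type=0))
# [5.0, 5.0, 5.0]
print(base_delays(5, 5, backoff_type=1, multiplier=2.0, max_delay_s=60))
# [5.0, 10.0, 20.0, 40.0, 60.0]
```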

Duration Format (ISO 8601)

  • PT5S - 5 seconds
  • PT30S - 30 seconds
  • PT1M - 1 minute
  • PT5M - 5 minutes
  • PT1H - 1 hour
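For tooling that needs to validate these values, a minimal parser might look like the following. It deliberately covers only the time-based subset shown above (hours, minutes, whole seconds) rather than the full ISO 8601 duration grammar, and `parse_duration` is a hypothetical helper name.

```python
import re
from datetime import timedelta

# Matches PT[nH][nM][nS] only; date parts and fractional seconds are not handled.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_duration(text):
    """Convert a duration such as 'PT5S' or 'PT1H30M' to a timedelta."""
    match = _DURATION.match(text)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(part or 0) for part in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


for value in ("PT5S", "PT30S", "PT1M", "PT5M", "PT1H"):
    print(value, parse_duration(value).total_seconds())
```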

Examples

1. Global Level ErrorBoundary

Define at workflow attributes level for workflow-wide error handling:

{
  "key": "payment-workflow",
  "domain": "banking",
  "version": "1.0.0",
  "attributes": {
    "type": "F",
    "errorBoundary": {
      "onError": [
        {
          "action": 0,
          "errorCodes": ["*"],
          "transition": "error-state",
          "priority": 999
        }
      ]
    },
    "states": [...]
  }
}

2. State Level ErrorBoundary

Define at state level for state-specific error handling:

{
  "key": "processing",
  "stateType": 2,
  "versionStrategy": "Minor",
  "labels": [...],
  "errorBoundary": {
    "onError": [
      {
        "action": 1,
        "errorTypes": ["TransientException"],
        "priority": 10,
        "retryPolicy": {
          "maxRetries": 5,
          "initialDelay": "PT10S",
          "backoffType": 1,
          "backoffMultiplier": 2.0,
          "maxDelay": "PT5M",
          "useJitter": true
        }
      },
      {
        "action": 0,
        "errorCodes": ["*"],
        "transition": "failed",
        "priority": 100
      }
    ]
  },
  "transitions": [...]
}

3. Task Level ErrorBoundary

Define at task execution level for task-specific error handling:

{
  "onExecutionTasks": [
    {
      "order": 1,
      "task": {
        "key": "call-external-api",
        "domain": "core",
        "version": "1.0.0",
        "flow": "sys-tasks"
      },
      "mapping": {
        "key": "api-call-mapping",
        "domain": "core",
        "flow": "sys-mappings",
        "version": "1.0.0"
      },
      "errorBoundary": {
        "onError": [
          {
            "action": 1,
            "errorCodes": ["Task:503", "Task:504"],
            "priority": 1,
            "retryPolicy": {
              "maxRetries": 3,
              "initialDelay": "PT5S",
              "backoffType": 1,
              "backoffMultiplier": 2.0
            }
          },
          {
            "action": 3,
            "errorCodes": ["Task:404"],
            "priority": 2,
            "logOnly": true
          }
        ]
      }
    }
  ]
}

4. Multiple Rules with Priority

{
  "errorBoundary": {
    "onError": [
      {
        "_comment": "Handle validation errors - abort immediately",
        "action": 0,
        "errorTypes": ["ValidationException"],
        "transition": "validation-failed",
        "priority": 1
      },
      {
        "_comment": "Retry transient failures",
        "action": 1,
        "errorCodes": ["Task:503", "Task:504", "Task:429"],
        "priority": 10,
        "retryPolicy": {
          "maxRetries": 5,
          "initialDelay": "PT5S",
          "backoffType": 1
        }
      },
      {
        "_comment": "Log and ignore non-critical errors",
        "action": 5,
        "errorCodes": ["Task:204"],
        "priority": 20,
        "logOnly": true
      },
      {
        "_comment": "Fallback - abort with error transition",
        "action": 0,
        "errorCodes": ["*"],
        "transition": "error-state",
        "priority": 999
      }
    ]
  }
}

5. Exponential Backoff with Jitter

{
  "errorBoundary": {
    "onError": [
      {
        "action": 1,
        "errorTypes": ["*"],
        "priority": 100,
        "retryPolicy": {
          "maxRetries": 5,
          "initialDelay": "PT1S",
          "backoffType": 1,
          "backoffMultiplier": 2.0,
          "maxDelay": "PT30S",
          "useJitter": true
        }
      }
    ]
  }
}

Retry delays with jitter (approximate):

  1. ~1s (+ random 0-500ms)
  2. ~2s (+ random 0-1000ms)
  3. ~4s (+ random 0-2000ms)
  4. ~8s (+ random 0-4000ms)
  5. ~16s (+ random 0-8000ms, capped at 30s)
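The figures in this list can be reproduced with a short calculation. The sketch assumes jitter is a uniform random amount of up to 50% of the base delay, which is what the ranges above suggest, and that the maxDelay cap applies to the total. Neither detail is confirmed here, and `jittered_delay` is a hypothetical name.

```python
import random


def jittered_delay(attempt, initial_s=1.0, multiplier=2.0, max_s=30.0, rng=random):
    """Delay in seconds before retry number `attempt` (1-based), with jitter."""
    base = initial_s * multiplier ** (attempt - 1)
    # Assumption: jitter is uniform in [0, base / 2].
    jitter = rng.uniform(0, base / 2)
    # Assumption: the cap is applied after jitter is added.
    return min(base + jitter, max_s)


for attempt in range(1, 6):
    base = 1.0 * 2.0 ** (attempt - 1)
    upper = min(base * 1.5, 30.0)
    print(f"retry {attempt}: between {base:.1f}s and {upper:.1f}s, "
          f"sampled {jittered_delay(attempt):.2f}s")
```

With these settings the cap is never reached, because the largest possible delay is 24 seconds on the fifth retry.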

Best Practices

1. Use Priority Wisely

{
  "onError": [
    { "action": 0, "errorTypes": ["ValidationException"], "priority": 1 },
    { "action": 1, "errorCodes": ["Task:503"], "priority": 10 },
    { "action": 0, "errorCodes": ["*"], "priority": 999 }
  ]
}

2. Always Have a Fallback

Include a wildcard rule with a high priority number, so that it is evaluated last, as a fallback:

{
  "action": 0,
  "errorCodes": ["*"],
  "transition": "error-state",
  "priority": 999
}

3. Use Appropriate Retry Policies

  • Transient errors (503, 504, 429): Retry with exponential backoff
  • Validation errors: Abort immediately, no retry
  • Business errors: Route to error handling state

4. Leverage Hierarchy

  • Task level: Specific retry policies for external API calls
  • State level: Common error handling for state operations
  • Global level: Fallback and notification for unhandled errors

5. Use logOnly for Debugging

{
  "action": 5,
  "errorCodes": ["*"],
  "priority": 1,
  "logOnly": true
}