Overview
The Agent Triage Protocol defines standard error responses that enable consistent error handling across implementations. This page describes the error response format, standard error codes, and best practices for handling errors in ATP implementations.
Error responses use a structured format that provides both machine-readable codes and human-readable messages. An error object looks like this:
{
  "code": "NOTIFICATION_EXPIRED",
  "message": "The notification deadline has passed and no longer accepts responses",
  "details": {
    "notification_id": "550e8400-e29b-41d4-a716-446655440000",
    "expired_at": "2025-05-25T11:00:00Z"
  },
  "request_id": "req_abc123def456"
}
| Field | Type | Description |
|---|---|---|
| code | string | Standardized error identifier for programmatic handling |
| message | string | Human-readable error description |
| details | object | Additional context specific to the error type |
| request_id | string | Unique identifier for request tracing |
The request_id field is particularly important for troubleshooting, as it allows correlation of error reports across system boundaries and log files.
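As a minimal sketch, a Python client might parse this object into a small typed structure before acting on it. `ATPError` and `parse_error` are illustrative names, not part of the protocol, and treating `details` and `request_id` as optional is an assumption, since this page does not state which fields are required.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ATPError:
    """In-memory form of the error object (hypothetical helper type)."""
    code: str
    message: str
    details: Dict[str, Any] = field(default_factory=dict)
    request_id: Optional[str] = None


def parse_error(body: str) -> ATPError:
    """Parse a JSON error body into an ATPError.

    Assumption: `details` and `request_id` may be absent, so they
    fall back to an empty dict and None respectively.
    """
    raw = json.loads(body)
    return ATPError(
        code=raw["code"],
        message=raw["message"],
        details=raw.get("details") or {},
        request_id=raw.get("request_id"),
    )


example_body = """
{
  "code": "NOTIFICATION_EXPIRED",
  "message": "The notification deadline has passed and no longer accepts responses",
  "details": {"expired_at": "2025-05-25T11:00:00Z"},
  "request_id": "req_abc123def456"
}
"""

err = parse_error(example_body)
# Keep the request_id next to the code so the report can be correlated later.
print(f"[{err.request_id}] {err.code}: {err.message}")
```

Carrying `request_id` alongside the code makes it easy to include in every log line or support ticket.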
HTTP Status Codes
The protocol uses standard HTTP status codes to indicate the class of error:
| Status Code | Description | When Used |
|---|---|---|
| 400 Bad Request | Invalid request format or parameters | Malformed JSON, missing required fields |
| 401 Unauthorized | Missing or invalid authentication | Invalid or expired API key/token |
| 403 Forbidden | Valid auth but insufficient permissions | Attempting to access another service’s notifications |
| 404 Not Found | Resource doesn’t exist | Notification ID not found |
| 409 Conflict | Request conflicts with current state | Responding to already-answered notification |
| 422 Unprocessable Entity | Request validation failed | Response data doesn’t match expected format |
| 429 Too Many Requests | Rate limit exceeded | Too many requests in time period |
| 500 Internal Server Error | Server-side failure | Unexpected errors in ATP server |
| 503 Service Unavailable | Temporary service issues | Server maintenance or overload |
Error Codes
The protocol defines specific error codes that provide more detail than HTTP status alone. These codes allow client applications to implement specific handling logic for different error conditions.
Authentication Errors
| Code | Description |
|---|---|
| AUTH_INVALID_TOKEN | The provided token is malformed or invalid |
| AUTH_EXPIRED_TOKEN | The authentication token has expired |
| AUTH_INSUFFICIENT_PERMISSIONS | Token lacks required permissions |
Notification Errors
| Code | Description |
|---|---|
| NOTIFICATION_NOT_FOUND | Notification doesn’t exist or is no longer accessible |
| NOTIFICATION_EXPIRED | Notification deadline has passed |
| NOTIFICATION_ALREADY_RESPONDED | Notification has already been answered |
| NOTIFICATION_INVALIDATED | Service marked notification as invalid |
Validation Errors
| Code | Description |
|---|---|
| INVALID_ACTION_ID | The specified action_id doesn’t exist for this notification |
| INVALID_RESPONSE_DATA | Response data doesn’t match expected format |
| CONSTRAINT_VIOLATION | Response violates defined constraints |
| MISSING_REQUIRED_FIELD | Required field is missing from request |
Service Errors
| Code | Description |
|---|---|
| SERVICE_NOT_REGISTERED | Service hasn’t been registered with ATP |
| SERVICE_SUSPENDED | Service has been temporarily suspended |
| CALLBACK_FAILED | Failed to deliver response to service callback |
Rate Limiting
| Code | Description |
|---|---|
| RATE_LIMIT_EXCEEDED | Too many requests from this client/service |
| QUOTA_EXCEEDED | Monthly/daily quota has been exceeded |
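One way to act on these codes is to resolve the category first and branch on that, so that new codes in a known category need no new handling logic. The sketch below is illustrative only: the category labels and the `category_of` function are hypothetical names, and the table contains just the codes listed on this page.

```python
from typing import Dict, FrozenSet

# Codes copied from the tables above; the category keys are illustrative labels.
ERROR_CATEGORIES: Dict[str, FrozenSet[str]] = {
    "authentication": frozenset({
        "AUTH_INVALID_TOKEN",
        "AUTH_EXPIRED_TOKEN",
        "AUTH_INSUFFICIENT_PERMISSIONS",
    }),
    "notification": frozenset({
        "NOTIFICATION_NOT_FOUND",
        "NOTIFICATION_EXPIRED",
        "NOTIFICATION_ALREADY_RESPONDED",
        "NOTIFICATION_INVALIDATED",
    }),
    "validation": frozenset({
        "INVALID_ACTION_ID",
        "INVALID_RESPONSE_DATA",
        "CONSTRAINT_VIOLATION",
        "MISSING_REQUIRED_FIELD",
    }),
    "service": frozenset({
        "SERVICE_NOT_REGISTERED",
        "SERVICE_SUSPENDED",
        "CALLBACK_FAILED",
    }),
    "rate_limiting": frozenset({
        "RATE_LIMIT_EXCEEDED",
        "QUOTA_EXCEEDED",
    }),
}


def category_of(code: str) -> str:
    """Return the category label for an error code, or "unknown"."""
    for category, codes in ERROR_CATEGORIES.items():
        if code in codes:
            return category
    # Unrecognized codes are reported rather than raising, so that a
    # client keeps working if a server returns a code it does not know.
    return "unknown"
```

Returning `"unknown"` instead of raising is a design choice for forward compatibility; a stricter client could raise instead.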
Client Error Handling
Robust client implementations must incorporate comprehensive error handling strategies to ensure reliable operation in production environments. The protocol distinguishes between transient failures that warrant retry attempts and permanent errors that require user intervention or alternative action.
Transient vs. Permanent Errors
Transient errors are temporary issues that may resolve with time or retries:
- All 5xx series errors
- Rate limiting (429) responses
- Network connectivity issues
- Webhook delivery failures
Permanent errors indicate fundamental problems that won’t be resolved by retrying:
- Authentication errors (except token expiration, which can be resolved by refreshing the token and retrying)
- Resource not found errors
- Validation errors
- Business logic errors (e.g., notification already responded)
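The split above can be sketched as a small classification function. This is a hedged illustration, not protocol-defined behavior: `Recovery` and `classify` are hypothetical names, a status of `None` is used here to stand for "no HTTP response was received", and webhook delivery failures are left out because this page does not say how they surface to a client.

```python
from enum import Enum
from typing import Optional


class Recovery(Enum):
    """What a client might do next (illustrative labels)."""
    RETRY = "retry"                # transient: retry with backoff
    REFRESH_AUTH = "refresh_auth"  # expired token: refresh, then retry
    GIVE_UP = "give_up"            # permanent: needs user or caller action


def classify(status: Optional[int], code: Optional[str] = None) -> Recovery:
    """Classify a failed request as transient or permanent.

    status: HTTP status code, or None when no response arrived
            (an assumption used here to model network failures).
    code:   ATP error code from the response body, if any.
    """
    if status is None:
        return Recovery.RETRY  # network connectivity issue
    if status == 429 or 500 <= status <= 599:
        return Recovery.RETRY  # rate limiting or server-side failure
    if code == "AUTH_EXPIRED_TOKEN":
        return Recovery.REFRESH_AUTH  # the one recoverable auth error
    # Remaining auth, not-found, validation, and business logic errors
    return Recovery.GIVE_UP
```

For example, `classify(409, "NOTIFICATION_ALREADY_RESPONDED")` yields `Recovery.GIVE_UP`, while `classify(503)` yields `Recovery.RETRY`.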
Retry Strategies
For transient failures, clients should retry with exponential backoff:
- Start with an initial retry delay of one second
- Double the delay with each subsequent attempt
- Add a small random jitter to prevent thundering herd problems
- Cap the maximum delay at 60 seconds
- Limit the total number of retry attempts (3 to 5 is typical)
function calculateRetryDelay(attempt) {
  // attempt is zero-based: 1000ms for the first retry, doubling each time, capped at 60s
  const baseDelay = Math.min(1000 * Math.pow(2, attempt), 60000);
  // Add jitter (±10% of base delay)
  const jitter = baseDelay * 0.1 * (Math.random() * 2 - 1);
  // Re-apply the cap so jitter cannot push the delay past 60s
  return Math.min(baseDelay + jitter, 60000);
}
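The function above computes a single delay. The sketch below shows one way a retry loop might use such a delay, written in Python under stated assumptions: `TransientError` is a hypothetical exception the caller raises for retryable failures, `call_with_retries` and `retry_delay` are illustrative names, and the delay is clamped to the cap after jitter is applied.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay in seconds for a zero-based attempt: 1s, 2s, 4s, ... up to cap."""
    delay = min(base * (2 ** attempt), cap)
    jitter = delay * 0.1 * random.uniform(-1.0, 1.0)  # +/-10% jitter
    return min(delay + jitter, cap)  # clamp so jitter cannot exceed the cap


def call_with_retries(
    operation: Callable[[], T],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on TransientError up to max_retries times.

    The sleep function is injectable so tests can avoid real waiting.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except TransientError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last failure
            sleep(retry_delay(attempt))
            attempt += 1
```

Any exception other than `TransientError` propagates immediately, which matches the rule that permanent errors should not be retried.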
User Feedback
Client applications should provide appropriate feedback to users based on error types:
- For transient errors, show a temporary “retrying” message
- For permanent errors, show a clear explanation of the issue
- For validation errors, highlight the specific fields with problems
- For expired or invalidated notifications, remove them from the UI
- For authentication issues, prompt for re-authentication
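These rules can be expressed as a single lookup that the UI layer consults. The following is a rough sketch: the returned action names and `feedback_for` are hypothetical, and treating `AUTH_INSUFFICIENT_PERMISSIONS` as an explanation rather than a re-authentication prompt is an assumption, on the reasoning that signing in again would not grant new permissions.

```python
# Codes grouped by the kind of feedback they call for (codes from this page).
REMOVE_FROM_UI = frozenset({"NOTIFICATION_EXPIRED", "NOTIFICATION_INVALIDATED"})
VALIDATION = frozenset({
    "INVALID_ACTION_ID",
    "INVALID_RESPONSE_DATA",
    "CONSTRAINT_VIOLATION",
    "MISSING_REQUIRED_FIELD",
})
REAUTHENTICATE = frozenset({"AUTH_INVALID_TOKEN", "AUTH_EXPIRED_TOKEN"})


def feedback_for(status: int, code: str) -> str:
    """Pick a UI action name (illustrative labels) for an error response."""
    if code in REMOVE_FROM_UI:
        return "remove_notification"
    if code in VALIDATION:
        return "highlight_fields"
    if code in REAUTHENTICATE:
        return "prompt_reauthentication"
    if status == 429 or status >= 500:
        return "show_retrying"  # transient: a retry is in progress
    return "show_explanation"   # any other permanent error
```

Specific codes are checked before the status class so that a precise code always wins over a generic status.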
Service Callback Errors
When the ATP server delivers responses to service webhook endpoints, services may encounter processing errors that prevent successful handling of user decisions. Services should communicate these errors using a consistent format that enables appropriate ATP server behavior.
{
  "code": "RESOURCE_LOCKED",
  "message": "Cannot apply changes because resource is currently locked by another operation",
  "user_message": "The system is currently processing another change. Please try again in a few moments.",
  "retriable": true
}
| Field | Type | Description |
|---|---|---|
| code | string | Service-specific error identifier |
| message | string | Technical error description for logging |
| user_message | string | Human-readable message for potential user display |
| retriable | boolean | Indicates whether retry attempts may succeed |
The retriable field is particularly important as it tells the ATP server whether it should attempt to deliver the response again later. If set to false, the ATP server will not retry and may notify the user that their response could not be processed.
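On the ATP server side, the redelivery decision might look like the sketch below. Only the handling of an explicit `retriable` flag comes from this page; the fallback for a missing or malformed flag (retry on 5xx, give up otherwise) is an assumption, and `should_redeliver` is a hypothetical name.

```python
from typing import Any, Mapping, Optional


def should_redeliver(status: int, body: Optional[Mapping[str, Any]]) -> bool:
    """Decide whether to retry delivering a response to a service webhook.

    status: HTTP status returned by the service's webhook endpoint.
    body:   parsed JSON error body, or None if it could not be parsed.
    """
    if 200 <= status < 300:
        return False  # delivered successfully, nothing to retry

    if body is not None:
        retriable = body.get("retriable")
        if isinstance(retriable, bool):
            return retriable  # the service's explicit answer wins

    # Assumption: with no usable flag, fall back to the status class.
    return status >= 500
```

With the `RESOURCE_LOCKED` example above, `should_redeliver(409, {"code": "RESOURCE_LOCKED", "retriable": True})` returns `True` even though 409 is a client-error status.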
Logging and Monitoring
Robust ATP implementations should include comprehensive logging and monitoring for error conditions:
- Log all errors with their request IDs
- Include contextual information in logs (user ID, service ID, notification ID)
- Monitor error rates by type and service
- Set up alerts for unusual error patterns
- Implement distributed tracing for complex deployments
When logging errors, be careful not to include sensitive information:
- Never log authentication tokens
- Redact personal information from error logs
- Sanitize potentially sensitive fields in error details
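A minimal sketch of redacting error details before they reach a log is shown below. The key names in `SENSITIVE_KEYS` are hypothetical examples rather than fields defined by the protocol, and a real deployment would adjust the list to its own payloads.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("atp.client")

# Hypothetical key names; extend to match the fields your payloads carry.
SENSITIVE_KEYS = frozenset({"authorization", "token", "api_key", "password", "email"})
REDACTED = "[REDACTED]"


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive dict entries masked, recursively."""
    if isinstance(value, dict):
        return {
            key: REDACTED if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def log_atp_error(error: Dict[str, Any]) -> None:
    """Log an error object with its request ID and sanitized details."""
    logger.error(
        "ATP error %s (request_id=%s) details=%s",
        error.get("code"),
        error.get("request_id"),
        redact(error.get("details", {})),
    )
```

Because `redact` builds new containers instead of mutating its input, the original error object stays intact for any handling that happens after logging.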
Error Handling Examples
Client-Side Error Handling (TypeScript)
// Assumes the application provides userToken, refreshUserToken, removeNotification,
// showUserMessage, showValidationError, logError, and the Notification type, plus
// calculateRetryDelay from the Retry Strategies section.
const delay = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function submitResponse(
  notification: Notification,
  actionId: string,
  responseData: unknown
): Promise<void> {
  const MAX_RETRIES = 3;
  let attempt = 0;

  while (attempt <= MAX_RETRIES) {
    try {
      const response = await fetch('/api/v1/client/respond', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer ${userToken}`
        },
        body: JSON.stringify({
          notification_id: notification.id,
          action_id: actionId,
          response_data: responseData
        })
      });

      if (response.ok) {
        return; // Success!
      }

      const errorData = await response.json();

      // Handle specific error cases
      switch (errorData.code) {
        case 'NOTIFICATION_EXPIRED':
        case 'NOTIFICATION_INVALIDATED':
        case 'NOTIFICATION_ALREADY_RESPONDED':
          // Terminal state, remove from UI
          removeNotification(notification.id);
          showUserMessage(errorData.message);
          return;

        case 'INVALID_RESPONSE_DATA':
        case 'CONSTRAINT_VIOLATION':
          // Validation error, show specific feedback
          showValidationError(errorData.details);
          return;

        case 'AUTH_EXPIRED_TOKEN':
          // Refresh the token and retry. The attempt still counts against
          // the retry limit so that a failing refresh cannot loop forever.
          await refreshUserToken();
          attempt++;
          continue;

        case 'RATE_LIMIT_EXCEEDED': {
          // Prefer the server's Retry-After delay (in seconds); fall back to
          // backoff when the header is missing or not a plain number
          const retryAfter = Number(response.headers.get('Retry-After'));
          const delayMs = retryAfter > 0 ? retryAfter * 1000 : calculateRetryDelay(attempt);
          await delay(delayMs);
          attempt++;
          continue;
        }
      }

      // Server errors (5xx) are retryable
      if (response.status >= 500) {
        await delay(calculateRetryDelay(attempt));
        attempt++;
        continue;
      }

      // Other errors are considered permanent
      showUserMessage(`Error: ${errorData.message}`);
      return;
    } catch (error) {
      // fetch rejects with a TypeError on network failure, which is retryable
      if (error instanceof TypeError) {
        await delay(calculateRetryDelay(attempt));
        attempt++;
        continue;
      }

      // Other exceptions are unexpected and should be logged
      logError('Unexpected error during response submission', error);
      showUserMessage('An unexpected error occurred. Please try again later.');
      return;
    }
  }

  // If we've exhausted retries
  showUserMessage('Unable to submit your response after several attempts. Please try again later.');
}
Service-Side Webhook Error Handling (Python)
import logging

from flask import Flask, jsonify, request

# Assumes the service provides webhook_secret, verify_signature,
# get_notification_context, process_deployment_approval, and the
# ResourceLockedException, TemporaryFailure, and PermanentFailure classes.
app = Flask(__name__)
logger = logging.getLogger(__name__)


@app.route('/atp/webhook', methods=['POST'])
def handle_atp_webhook():
    # Verify webhook signature
    signature = request.headers.get('X-ATP-Signature')
    if not verify_signature(request.data, signature, webhook_secret):
        return jsonify({
            'code': 'INVALID_SIGNATURE',
            'message': 'Invalid webhook signature',
            'retriable': False
        }), 401

    # Reject malformed payloads instead of failing with an unhandled exception
    data = request.get_json(silent=True)
    if not isinstance(data, dict) or 'notification_id' not in data or 'action_id' not in data:
        return jsonify({
            'code': 'MISSING_REQUIRED_FIELD',
            'message': 'notification_id and action_id are required',
            'retriable': False
        }), 400

    notification_id = data['notification_id']
    action_id = data['action_id']
    response_data = data.get('response_data')
    request_id = request.headers.get('X-Request-ID')

    try:
        # Retrieve context for this notification
        context = get_notification_context(notification_id)
        if not context:
            return jsonify({
                'code': 'UNKNOWN_NOTIFICATION',
                'message': 'No context found for this notification',
                'retriable': False
            }), 404

        # Process the response based on action type
        if action_id == 'approve_deployment':
            try:
                result = process_deployment_approval(context, response_data)
                return jsonify({'status': 'success', 'result': result})
            except ResourceLockedException:
                return jsonify({
                    'code': 'RESOURCE_LOCKED',
                    'message': 'Deployment resource is currently locked',
                    'user_message': 'Another deployment is in progress. Please try again later.',
                    'retriable': True
                }), 409

        # Other action handlers...

        # No handler matched: return an error rather than falling through with no response
        return jsonify({
            'code': 'UNKNOWN_ACTION',
            'message': f'Unsupported action: {action_id}',
            'retriable': False
        }), 422

    except TemporaryFailure as e:
        # Log the error with request ID for tracing
        logger.error("Temporary failure processing webhook: %s", e,
                     extra={'request_id': request_id})
        # Return 503 to trigger retry with backoff
        return jsonify({
            'code': 'TEMPORARY_FAILURE',
            'message': str(e),
            'retriable': True
        }), 503

    except PermanentFailure as e:
        # Log the permanent error
        logger.error("Permanent failure processing webhook: %s", e,
                     extra={'request_id': request_id})
        # Return 422 to indicate the request was valid but couldn't be processed
        return jsonify({
            'code': e.code,
            'message': str(e),
            'user_message': e.user_message,
            'retriable': False
        }), 422

    except Exception:
        # Unexpected errors should be logged with full details
        logger.exception("Unexpected error processing webhook",
                         extra={'request_id': request_id})
        # Return 500 with minimal details to avoid leaking implementation details
        return jsonify({
            'code': 'INTERNAL_ERROR',
            'message': 'An unexpected error occurred',
            'retriable': True
        }), 500
By following these error handling patterns, ATP implementations can provide robust, user-friendly experiences even when things go wrong.