← All Modules

workflow

Durable workflow engine + Lua client. The engine runs in assay serve; any assay Lua app becomes a worker via require("assay.workflow"). Workflow code runs deterministically and replays from a persisted event log, so worker crashes don't lose work and side effects don't duplicate.

Four pieces, one binary:

┌──────────────────────────────────────────────────────────────────────┐
│ assay serve              the engine (REST + SSE + dashboard)         │
│ assay <subcommand>       CLI (workflow / schedule / namespace /      │
│                                worker / queue / completion)          │
│ require("assay.workflow") Lua client: handlers + management surface  │
│ REST API + OpenAPI spec  any-language workers via openapi-generator  │
└──────────────────────────────────────────────────────────────────────┘

The engine and clients communicate over HTTP — any language with an HTTP client can implement a worker or management script, not just Lua.

Engine — assay serve

Start the workflow server.

assay serve                                           # SQLite, port 8080, no auth (dev)
assay serve --port 8085                               # different port
assay serve --backend sqlite:///var/lib/assay/w.db    # explicit SQLite path
assay serve --backend postgres://u:p@h:5432/assay     # Postgres (multi-instance)
DATABASE_URL=postgres://... assay serve               # backend from env (keeps creds out of argv)

Auth modes:

FlagBehaviour
--no-auth (default)Open access. Use only behind a trusted gateway.
--auth-api-keyClients send Authorization: Bearer <key>. Manage keys with --generate-api-key / --list-api-keys. Keys are SHA256-hashed at rest.
--auth-issuer <url> --auth-audience <aud>JWT/OIDC. Fetches and caches the issuer's JWKS to validate signatures. Works with Auth0, Okta, Dex, Keycloak, Cloudflare Access, any OIDC issuer.
--auth-api-key + --auth-issuer …Combined. Tokens that parse as a JWS header take the JWT path; everything else takes the API-key path. Same server accepts both token types on Authorization: Bearer.

Combined mode dispatch (when both --auth-issuer and --auth-api-key are set):

Combined mode lets the same server serve short-lived OIDC user tokens (from a browser session) alongside long-lived machine API keys (from a CI job) without the caller picking a mode up front.

Multi-instance deployment. SQLite is single-instance only (engine takes an engine_lock row at startup). For Kubernetes / Docker Swarm, use Postgres: the cron scheduler picks a leader via pg_advisory_lock so only one instance fires; workflow + activity task claiming uses FOR UPDATE SKIP LOCKED so multiple instances don't race.

Optional S3 archival (cargo feature s3-archival, default-off). When compiled in and ASSAY_ARCHIVE_S3_BUCKET is set at runtime, a background task periodically finds workflows in terminal states older than ASSAY_ARCHIVE_RETENTION_DAYS (default 30), uploads {record, events} to s3://bucket/prefix/<namespace>/<workflow_id>.json, and purges dependent rows. The workflow stub stays with archived_at + archive_uri set so GET /workflows/{id} still resolves. Credentials resolve via the AWS SDK default chain (env / shared config / IRSA).

Env varDefaultMeaning
ASSAY_ARCHIVE_S3_BUCKET(unset)Enables archival when set
ASSAY_ARCHIVE_S3_PREFIXassay/S3 key prefix
ASSAY_ARCHIVE_RETENTION_DAYS30Min age before archiving
ASSAY_ARCHIVE_POLL_SECS3600How often the archiver runs
ASSAY_ARCHIVE_BATCH_SIZE50Max workflows archived / tick
ASSAY_WF_DISPATCH_TIMEOUT_SECS30Worker silent-timeout for dispatch lease (see "crash safety")

Dashboard whitelabel (v0.11.10+)

The embedded /workflow dashboard can be rebranded per-deployment via six optional env vars. Every knob defaults to assay's built-in identity so an unset env keeps the standalone experience unchanged. Intended for platform teams who front assay inside their own admin UI and want the dashboard to read as part of that product.

Env varDefaultMeaning
ASSAY_WHITELABEL_NAMEAssaySidebar header text
ASSAY_WHITELABEL_LOGO_URL(unset, no image)Image URL rendered before the brand text
ASSAY_WHITELABEL_PAGE_TITLEAssay Workflow DashboardBrowser tab title
ASSAY_WHITELABEL_PARENT_URL(unset, link hidden)If set, adds a back-link in the sidebar footer to the parent app
ASSAY_WHITELABEL_PARENT_NAMEBackLabel for the back-link
ASSAY_WHITELABEL_API_DOCS_URL/api/v1/engine/workflow/docsOverride or hide the API Docs sidebar link
ASSAY_WHITELABEL_CSS_URL(unset, no extra sheet)Extra stylesheet loaded after assay's own CSS
ASSAY_WHITELABEL_SUBTITLE(unset, no subtitle)Small muted line rendered under the brand name
ASSAY_WHITELABEL_MARKFirst char of NAME (uppercased)Glyph in the always-visible brand badge square
ASSAY_WHITELABEL_FAVICON_URLBuilt-in SVGBrowser-tab icon URL (v0.11.12+)
ASSAY_WHITELABEL_DEFAULT_NAMESPACEmainNamespace the dashboard opens on (v0.11.12+)

ASSAY_WHITELABEL_API_DOCS_URL="" (empty string) hides the link entirely. Any other value redirects the link to that URL. Setting the variable explicitly to empty is distinct from leaving it unset — unset keeps the default /api/v1/engine/workflow/docs link, empty hides it.

Theming via CSS custom properties. Every colour, radius, and shadow on the dashboard is a CSS variable on :root. An extra stylesheet loaded after assay's own CSS can re-declare any of them without forking. The full token list:

--bg          --surface      --surface-hover  --border
--text        --text-muted
--accent      --accent-hover
--green       --red          --yellow         --orange
--shadow      --code-bg

Minimal example — re-skin the dashboard to match your host app's primary colour:

/* served at /static/my-theme.css by your host app */
:root {
  --accent:       #009999;
  --accent-hover: #007a7a;
  --bg:           hsl(0 0% 98%);
  --surface:      hsl(0 0% 100%);
  --text:         hsl(222 84% 5%);
  --border:       hsl(214 32% 91%);
}

Then point assay at it:

- name: ASSAY_WHITELABEL_CSS_URL
  value: "/static/my-theme.css"

Operators can also override any specific selector in the same file — source-order specificity ensures the extra sheet wins. The URL is loaded as a plain <link rel="stylesheet"> at the end of <head>; an asset-version query-string is appended automatically so redeploys invalidate browser caches.

Hosting the logo. If assay is mounted on the same origin as the embedding app (e.g. behind a reverse proxy at /workflow/*), a path-absolute URL like /static/my-logo.svg loads from the host app with no CORS plumbing. For cross-origin setups, use a full https://… URL — <img> loads don't require CORS headers.

Example (Kubernetes Deployment):

env:
  - name: ASSAY_WHITELABEL_NAME
    value: "Acme Workflows"
  - name: ASSAY_WHITELABEL_LOGO_URL
    value: "/static/acme-logo.svg"
  - name: ASSAY_WHITELABEL_PAGE_TITLE
    value: "Acme Workflows"
  - name: ASSAY_WHITELABEL_PARENT_URL
    value: "/"
  - name: ASSAY_WHITELABEL_PARENT_NAME
    value: "Acme Console"
  - name: ASSAY_WHITELABEL_API_DOCS_URL
    value: "" # hide; docs are provided by the parent console

The engine serves:

PathPurpose
GET /api/v1/engine/workflow/healthLiveness probe
GET /api/v1/engine/workflow/version{ version, build_profile } — CLI + UI use
GET /api/v1/engine/workflow/openapi.jsonFull OpenAPI 3 spec (all ~30 endpoints)
GET /api/v1/engine/workflow/docsInteractive API docs (Scalar)
GET /workflow/Built-in dashboard (see "Dashboard" below)
GET /api/v1/engine/workflow/events/streamSSE event stream

Full endpoint list in the OpenAPI spec — workflow lifecycle, state queries, events, children, continue-as-new, signals, schedules (CRUD + patch/pause/resume), namespaces, workers, queues, worker task polling and dispatch.

CLI

Talk to a running engine from a shell. Lua stdlib (below) is the preferred path for automation; CLI is for operators at a terminal and one-shot shell scripts.

Global options — flag / env / config file / default precedence:

FlagEnv varConfig keyDefault
--engine-url URLASSAY_ENGINE_URLengine_urlhttp://127.0.0.1:8080
--api-key KEYASSAY_API_KEYapi_key(none)
(via config only)ASSAY_API_KEY_FILEapi_key_file(none; read + trim file contents)
--namespace NSASSAY_NAMESPACEnamespacemain
--output FORMATASSAY_OUTPUToutputtable on TTY, json when piped
--config PATHASSAY_CONFIG_FILE(n/a)see discovery order

Config file — YAML, auto-discovered at (first match wins):

  1. --config PATH (explicit)
  2. $ASSAY_CONFIG_FILE
  3. $XDG_CONFIG_HOME/assay/config.yaml
  4. ~/.config/assay/config.yaml
  5. /etc/assay/config.yaml
engine_url: https://assay.example.com
api_key_file: /run/secrets/assay-api-key # preferred — keeps the secret out of env / argv
namespace: main
output: table

api_key_file wins over api_key. Missing file is not an error — callers fall through to flag / env / default precedence.

JSON input indirection. --input, --search-attrs, and signal payloads all accept:

'{"key":"value"}'       # literal
@/path/to/file.json     # file contents
-                       # read stdin

Subcommand surface:

assay workflow
  start     --type T [--id ID] [--input JSON] [--queue Q] [--search-attrs JSON]
  list      [--status S] [--type T] [--search-attrs JSON] [--limit N]
  describe  <id>
  state     <id> [<query-name>]                 # register_query reader
  events    <id> [--follow]                     # log, or poll-stream until terminal
  children  <id>
  signal    <id> <name> [payload]
  cancel    <id>
  terminate <id> [--reason R]
  continue-as-new <id> [--input JSON]           # client-side (distinct from ctx:)
  wait      <id> [--timeout SECS] [--target STATUS]   # scripting-friendly blocking

assay schedule
  list  |  describe <name>
  create <name> --type T --cron EXPR [--timezone TZ] [--input JSON] [--queue Q]
  patch  <name> [--cron EXPR] [--timezone TZ] [--input JSON] [--queue Q] [--overlap POLICY]
  pause  <name>  |  resume <name>  |  delete <name>

assay namespace  create | list | describe | delete
assay worker     list
assay queue      stats
assay completion  <bash|zsh|fish|powershell|elvish>

Exit codes: 0 success · 1 HTTP / unreachable / not-found · 2 workflow wait timeout · 64 usage error (bad JSON).

Shell completion:

assay completion bash > /etc/bash_completion.d/assay
assay completion zsh  > "${fpath[1]}/_assay"
assay completion fish > ~/.config/fish/completions/assay.fish

Lua client — require("assay.workflow")

Two roles in one module: worker (register handlers and block polling for tasks) and management (inspect / mutate the engine from anywhere, same as the CLI).

Worker role

Management role (new in v0.11.3 — parity with REST)

Workflows:

FunctionREST
workflow.start(opts)POST /workflows
workflow.list(opts?)GET /workflows?...
workflow.describe(id)GET /workflows/{id}
workflow.get_events(id)GET /workflows/{id}/events
workflow.get_state(id, name?)GET /workflows/{id}/state[/{name}]
workflow.list_children(id)GET /workflows/{id}/children
workflow.signal(id, name, payload)POST /workflows/{id}/signal/{name}
workflow.cancel(id)POST /workflows/{id}/cancel
workflow.terminate(id, reason?)POST /workflows/{id}/terminate
workflow.continue_as_new(id, input?)POST /workflows/{id}/continue-as-new

workflow.list(opts) accepts { namespace?, status?, type?, search_attrs?, limit?, offset? }. search_attrs is a table; the CLI URL-encodes it as the search_attrs= query param.

Sub-tables (one per REST resource):

Every function returns the parsed JSON response on success, nil on a 404 for describe/get_state, or raises error() with an HTTP status message otherwise — consistent with the existing workflow.start / signal / describe / cancel behaviour.

Workflow handler context (ctx)

Inside workflow.define(name, function(ctx, input) ... end):

MethodBehaviour
ctx:execute_activity(name, input, opts?)Schedule an activity, block until complete, return result. Raises on final failure. opts: retry + timeout knobs (see below).
ctx:execute_parallel(activities)v0.11.3. Schedule N activities concurrently, return results in input order. Raises if any fail. Handler resumes only when all have terminal events.
ctx:sleep(seconds)Durable timer. Survives worker bouncing; another worker resumes when due.
ctx:wait_for_signal(name, opts?) → payloadBlock until a matching signal arrives. Payload is the signal's JSON value (or nil if signaled with no payload). Multiple waits consume in order. v0.11.9: opts.timeout = seconds bounds the wait; returns nil if the timer fires before any matching signal.
ctx:start_child_workflow(workflow_type, opts)Start a child, block until it completes. opts.workflow_id is required and must be deterministic (same id every replay).
ctx:side_effect(name, fn)Run a non-deterministic op exactly once. Value is cached in history; all replays return the cached value.
ctx:register_query(name, fn)v0.11.3. Expose live workflow state to external callers via GET /workflows/{id}/state[/{name}]. Handler runs on every replay; result is persisted as a snapshot.
ctx:upsert_search_attributes(patch)v0.11.3. Merge a table into the workflow's search_attributes so callers can filter on it via workflow.list({ search_attrs = ... }).
ctx:continue_as_new(input)v0.11.3. Close this run and start a fresh one with empty history (same type / namespace / queue). Standard pattern for unbounded-loop workflows.
ctx:cancel(reason?)v0.11.11. Terminate this workflow with engine status CANCELLED. Use when the handler itself decides to stop early (human rejected, preconditions failed). Distinct from an externally-requested cancel; same terminal state.

opts on execute_activity / execute_parallel: { task_queue?, max_attempts?, initial_interval_secs?, backoff_coefficient?, start_to_close_secs?, heartbeat_timeout_secs? }.

Inside workflow.activity(name, function(ctx, input) ... end):

Crash safety

Workflow code is deterministic by replay. Each ctx: call gets a per-execution sequence number and the engine persists every completed command as an event:

ActivityScheduled / Completed / Failed
TimerScheduled / Fired
SignalReceived                            WorkflowStarted / Completed / Failed / Cancelled
SideEffectRecorded                        WorkflowAwaitingSignal / CancelRequested
ChildWorkflowStarted / Completed / Failed

When a worker picks up a workflow task it receives the full event history. ctx: calls short-circuit to cached values for everything already in history, so the workflow always reaches the same state and only the next unfulfilled step actually runs.

Failure modeRecovery
Activity worker dies mid-executionlast_heartbeat ages out (per-activity heartbeat_timeout_secs); engine re-queues per retry policy.
Workflow worker dies mid-replaydispatch_last_heartbeat ages out (ASSAY_WF_DISPATCH_TIMEOUT_SECS, default 30s); any worker on the queue picks it up and replays.
Engine diesAll state in the DB. On restart, in-flight tasks become claimable again as heartbeats age out.

ctx:side_effect is the escape hatch for any operation that would produce different values across replays (current time, random IDs, external HTTP). The result is recorded once on first execution and returned from cache thereafter, even after a worker crash.

Schedules (cron)

Declarative recurring workflow starts. Scheduler runs on the leader node under Postgres; fires once across the cluster.

assay schedule create nightly \
  --type Report \
  --cron "0 0 2 * * *" \
  --timezone Europe/Berlin \
  --input '{"lookback_hours":24}'

assay schedule patch   nightly --cron "0 0 3 * * *"   # in-place update (v0.11.3)
assay schedule pause   nightly                        # scheduler skips paused (v0.11.3)
assay schedule resume  nightly                        # recomputes next_run_at from now
assay schedule delete  nightly

Cron uses the 6- or 7-field form (with seconds). "0 * * * * *" = every minute on the zero second.

Timezone (v0.11.3). IANA name via --timezone. Default is UTC. The scheduler evaluates the cron in that zone; next_run_at is persisted as a UTC epoch.

Search attributes

Indexed application-level metadata for filtering workflows. Set at start, updated at runtime via ctx:upsert_search_attributes, filtered on list:

-- set at start
workflow.start({
  workflow_type = "Ingest",
  workflow_id   = "ing-42",
  search_attributes = { env = "prod", tenant = "acme" },
})

-- update inside a running workflow
ctx:upsert_search_attributes({ progress = 0.5, stage = "deploy" })

-- filter list results (URL-encoded JSON server-side)
workflow.list({ search_attrs = { env = "prod" } })

Postgres backs search with a JSONB column + ->> operator; SQLite uses json_extract. Filters AND-join; unchanged keys are preserved across upserts.

Dashboard

/workflow/ (or just / — redirects). Real-time monitoring + tier-1 operator controls.

ViewReadMutate
WorkflowsList + filter (status, type, search_attrs)+ Start workflow form; per-row Signal / Cancel / Terminate
DetailMetadata, event timeline, children, live stateSignal / Cancel / Terminate / Continue-as-new; live ctx:register_query snapshot
SchedulesList with timezone + paused stateCreate (with timezone) / Edit (PATCH) / Pause / Resume / Delete
WorkersIdentity, queue, last heartbeat
QueuesPending + running per queue
SettingsEngine version, build profile, namespaces, API docsNamespace create / delete

Status-bar footer always shows the engine version (fetched from /api/v1/engine/workflow/version). Live list updates via SSE. Cache-busted asset URLs per startup.

Steps tab convention (v0.12.0+)

Any workflow that exposes a pipeline_state query handler returning a steps[] array gets an automatic Steps tab in the dashboard's detail view. The tab is added at the front and default-selected when present, hidden otherwise. No new stdlib API — the convention is just a shape the dashboard recognises.

The query name stays pipeline_state (for back-compat and because "pipeline state" reads naturally as "the current state of the run's sequence of operations") but the tab label and schema field both use "steps" — the neutral term that works for CI/CD, ETL, approval flows, and long-running background jobs alike.

ctx:register_query("pipeline_state", function()
  return {
    status = "running",      -- optional: overall pipeline status
    current_step = 2,        -- optional: 1-indexed
    steps = {                -- REQUIRED: array of steps
      { name = "Approval",    status = "done",    started_at = "...", completed_at = "...",
        actions = { "approve", "reject" } },                        -- optional, see below
      { name = "Tag & Retag", status = "running", started_at = "..." },
      { name = "GitOps",      status = "waiting" },
      { name = "ArgoCD Sync", status = "waiting" },
      { name = "Health Check", status = "waiting" },
    },
    log = {                  -- optional: append-only diagnostic log
      { time = "14:32:01", msg = "Approved by alice", step = 1 },
      { time = "14:32:02", msg = "Creating tag …",    step = 2 },
    },
  }
end)

Canonical step statuses. The dashboard maps these to glyphs and colours; an unknown status falls back to waiting for the visual but the literal text is shown under the circle. Keep your workflows on this list so cross-app glanceability holds.

StatusGlyphMeaning
waitingStep has not started yet
runningStep is in progress
doneStep completed successfully
failedStep terminated with an error
cancelledStep actively got cancelled (e.g. an approval was rejected)
skipped·Step never ran because the workflow ended before reaching it

The distinction between cancelled and skipped matters: a workflow that gets rejected at an approval gate has one cancelled step (the approval itself) and N skipped steps after it. Rendering them identically would mis-suggest that the engine actively cancelled five things; in reality four of them never ran. Authors should use skipped for the never-ran case.

Naming. The schema field is steps, not stages. CI/CD operators will still call them stages colloquially — and that's fine — but the field name stays neutral so non-CI/CD consumers (ETL pipelines, approval flows, long-running background jobs) can adopt the convention without the framing feeling forced.

Live tail. While the tab is open and the workflow is RUNNING, the dashboard polls GET /workflows/{id}/state/pipeline_state every 1s and diff-applies changes to the existing DOM: circles + connectors update in place, log entries append at the bottom. A scroll-lock toggle freezes auto-scroll for operators reading mid-log. Polling stops on tab switch, panel close, or when the run reaches a terminal status.

Step actions. Each step may include an actions = { "approve", "reject", … } array. The dashboard renders one button per action under the step's circle. A click POSTs a step_action signal:

POST /api/v1/engine/workflow/workflows/{id}/signal/step_action
Content-Type: application/json

{ "payload": { "step": "Approval", "action": "approve", "user": "alice" } }

The workflow handler reads the signal and dispatches:

local s = ctx:wait_for_signal("step_action")
if s.step == "Approval" and s.action == "approve" then
  -- domain-specific behaviour lives here
end

The engine routes the signal; the workflow decides what each action means. Pure plumbing on the engine side.

Step log filter. Clicking a step circle filters the log below to entries where step matches that index. Click again to clear. Workflows opt in by setting the step field on each log entry; omitting it means the entry is always visible regardless of which step is selected.

Architectural boundary — engine vs consumer

The Steps tab convention is the first concrete test of a rule that holds across the assay codebase: the engine and its built-in dashboard stay domain-agnostic, and consumers (applications using the engine) own everything domain-specific.

In assay (generic, every consumer benefits)In the consumer (domain logic)
Steps tab + circles + connectorsWorkflow definitions that write the pipeline_state shape
Live snapshot polling + diff-applyActivities that perform the actual side effects (deploys, retags, approvals)
step_action signal routingWorkflow handlers that interpret the actions
Whitelabel / favicon / CSS-overrideUI brand details specific to the deployment
Namespace / queue / worker viewsRBAC, audit, business rules around who may signal what

A new feature lands in assay if and only if every consumer would benefit from it. Anything that encodes domain meaning (what "approve" means, which environments exist, how a tag is computed) belongs in the consumer. This split keeps assay reusable and consumers focused; it also lets a single assay release improve every consumer's dashboard without coupled rollouts.

Concepts

ConceptMeaning
ActivityA unit of concrete work with at-least-once semantics. Result is persisted before the workflow proceeds. Configurable retry + timeout.
WorkflowDeterministic orchestration of activities, sleeps, signals, child workflows. Full event history persisted; crashed worker → replay.
Task queueNamed queue workers subscribe to. Workflows are routed to a queue; only workers on that queue claim them.
NamespaceLogical tenant. Workflows / schedules / workers are namespace-scoped. Default main.
SignalAsync message to a running workflow; consumed via ctx:wait_for_signal.
ScheduleCron expression that starts a workflow recurringly. Leader-elected under Postgres so only one instance fires.
Child workflowWorkflow started by another workflow. Cancellation propagates parent → child recursively.
Side effectNon-deterministic op captured in history on first call so replays see the same value.
Query handlerctx:register_query surface exposing live workflow state via /state[/{name}]. Snapshot written on every replay.
Search attributesIndexed metadata (JSON object) for filtering workflows; updatable at runtime.
Archival stubTerminal workflow moved to S3 by the optional archiver; row stays in Postgres with archive_uri pointing at the bundle.

Example — approval-gated deploy with live state

local workflow = require("assay.workflow")
workflow.connect("http://assay.example.com", { token = env.get("ASSAY_TOKEN") })

workflow.define("ApproveAndDeploy", function(ctx, input)
  local state = { stage = "build", progress = 0 }
  ctx:register_query("pipeline_state", function() return state end)

  local artifact = ctx:execute_activity("build", { ref = input.git_sha })
  state.stage = "awaiting_approval"; state.progress = 0.33

  local approval = ctx:wait_for_signal("approve")
  state.stage = "deploying"; state.progress = 0.66
  ctx:upsert_search_attributes({ approver = approval.by })

  local result = ctx:execute_activity("deploy", {
    image = artifact.image, env = input.target_env, approver = approval.by,
  })
  state.stage = "done"; state.progress = 1.0
  return result
end)

workflow.activity("build",  function(ctx, input) --[[ ... ]] end)
workflow.activity("deploy", function(ctx, input) --[[ ... ]] end)

workflow.listen({ queue = "deploys" })  -- blocks

Drive it from the shell:

assay workflow start --type ApproveAndDeploy --id deploy-1234 \
  --input '{"git_sha":"abc123","target_env":"staging"}'

assay workflow state deploy-1234 pipeline_state   # "awaiting_approval"

assay workflow signal deploy-1234 approve '{"by":"alice"}'

assay workflow wait deploy-1234 --timeout 300   # exit 0 on COMPLETED, 1 on failure, 2 on timeout

Notes