Basic workflows

Defining workflows

Workbench provides basic classes (Workflow, Stage and Job) for chaining processing units together and for testing or debugging simple workflows:

from workbench.workflows import Workflow, Stage, Job

# BasicMetadataExtraction and StandardAnalysis are job classes that must
# already be imported or defined; their import path is not shown here.
workflow = Workflow("Workflow Example")
workflow.add_stage(Stage("Extract"))
workflow.stages["Extract"].add_job(BasicMetadataExtraction("Extract metadata"))
workflow.add_stage(Stage("Analyze"))
workflow.stages["Analyze"].add_job(StandardAnalysis("Execute IM standard"))
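
The calls above suggest that stages and jobs are stored in containers keyed by name. The sketch below illustrates that idea with hypothetical stand-in classes; it is not Workbench's implementation, and the real classes may store or validate things differently.

```python
# Hypothetical stand-ins, NOT the real workbench classes: they only mirror
# the add_stage()/add_job() calls and name-keyed lookups used in this guide.


class Job:
    def __init__(self, name):
        self.name = name


class Stage:
    def __init__(self, name):
        self.name = name
        self.jobs = {}  # job name -> Job, kept in insertion order

    def add_job(self, job):
        self.jobs[job.name] = job


class Workflow:
    def __init__(self, name):
        self.name = name
        self.stages = {}  # stage name -> Stage, kept in insertion order

    def add_stage(self, stage):
        self.stages[stage.name] = stage


workflow = Workflow("Workflow Example")
workflow.add_stage(Stage("Extract"))
workflow.stages["Extract"].add_job(Job("Extract metadata"))
workflow.add_stage(Stage("Analyze"))
workflow.stages["Analyze"].add_job(Job("Execute IM standard"))

# Names double as lookup keys, so they need to be unique within a container.
stage_names = list(workflow.stages)
analyze_job_names = list(workflow.stages["Analyze"].jobs)
```

Under this assumption, the name passed to a stage or job constructor is also the key used to retrieve it later, which is why the same strings reappear in the lookups.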

Running workflows

Which execution method to use depends on whether you need to run the entire workflow, a single stage, or an individual job.

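# The three calls in the loop are alternatives shown together for illustration;
# in practice, keep only the one that matches the scope you need.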
for doc_version in session.document_versions:
    # Run all workflow stages
    workflow.run_all_stages_sequentially(doc_version, session)

    # Run all jobs in a specific stage
    workflow.stages['Extract'].run_all_jobs_sequentially(doc_version, session)

    # Run a specific job
    workflow.stages['Analyze'].jobs['Execute IM standard'].run(doc_version, session)
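
One way to picture the three scopes is as nested delegation: running a workflow runs each stage, and running a stage runs each job. The following is a minimal sketch under that assumption, using hypothetical stand-in classes and a plain list in place of the real session object; Workbench's actual behavior may differ.

```python
# Hypothetical stand-ins, NOT workbench's implementation. The "session" here
# is just a list used to record which jobs ran, in order.


class Job:
    def __init__(self, name):
        self.name = name

    def run(self, doc_version, session):
        session.append((self.name, doc_version))


class Stage:
    def __init__(self, name, jobs):
        self.name = name
        self.jobs = {job.name: job for job in jobs}

    def run_all_jobs_sequentially(self, doc_version, session):
        for job in self.jobs.values():
            job.run(doc_version, session)


class Workflow:
    def __init__(self, stages):
        self.stages = {stage.name: stage for stage in stages}

    def run_all_stages_sequentially(self, doc_version, session):
        for stage in self.stages.values():
            stage.run_all_jobs_sequentially(doc_version, session)


workflow = Workflow([
    Stage("Extract", [Job("Extract metadata")]),
    Stage("Analyze", [Job("Execute IM standard")]),
])

session = []
workflow.run_all_stages_sequentially("v1", session)
whole_workflow = list(session)  # every job in every stage

session.clear()
workflow.stages["Extract"].run_all_jobs_sequentially("v1", session)
one_stage = list(session)  # only the jobs in "Extract"

session.clear()
workflow.stages["Analyze"].jobs["Execute IM standard"].run("v1", session)
one_job = list(session)  # only the named job
```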

Example: Running a job for a single document

When running a job on its own, you can pass keyword arguments to its run() method. This is not possible when running all jobs in a stage or all stages in a workflow, because each job may expect different keyword arguments.

classifiers = {}

for classifier in session.classifiers.values():
    if classifier['id'] == 'ISO1':
        classifiers['ISO1'] = classifier

doc_version = session.document_versions[1]

# The job can be run directly with job.run(*args, **kwargs) or, as here, through the stage's run_job() method.
workflow.stages['Analyze'].run_job('Execute IM standard', doc_version, session, classifiers=classifiers)
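
To illustrate why keyword arguments only make sense for a single job, here is a minimal sketch in which a stage forwards its arguments unchanged to one named job. The classes are hypothetical stand-ins, and the assumption that run_job() forwards *args and **kwargs is inferred from the call above rather than taken from Workbench's source.

```python
# Hypothetical stand-ins, NOT the real workbench classes.


class ClassifierJob:
    """A made-up job that accepts an optional `classifiers` keyword."""

    def __init__(self, name):
        self.name = name

    def run(self, doc_version, session, classifiers=None):
        used = sorted(classifiers or {})
        return f"{self.name}: {doc_version} with {used}"


class Stage:
    def __init__(self, name):
        self.name = name
        self.jobs = {}

    def add_job(self, job):
        self.jobs[job.name] = job

    def run_job(self, job_name, *args, **kwargs):
        # Assumed behavior: pass everything through to the one selected job.
        return self.jobs[job_name].run(*args, **kwargs)


stage = Stage("Analyze")
stage.add_job(ClassifierJob("Execute IM standard"))

classifiers = {"ISO1": {"id": "ISO1"}}

# Both routes reach the same run() method with the same keyword argument.
direct = stage.jobs["Execute IM standard"].run("v1", None, classifiers=classifiers)
via_stage = stage.run_job("Execute IM standard", "v1", None, classifiers=classifiers)
```

A method that runs every job in a stage has no single set of keywords that suits all of them, so a job that does not declare `classifiers` would raise a TypeError if it received that argument.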