Bindings

Bindings are how Workbench exposes sessions (batches) of documents for analysis. This guide expands on the concepts introduced in Overview.

Hoppa uses Azure Blob Storage in its document management layer, so the default session binding is AzureBlobSession. To bind to a session you need the parent organization, the parent workspace, the session_id and, optionally, a user_id. Supplying a user_id is recommended when using Workbench with the Hoppa logging service so that all workflow log entries are traceable.

Custom session binding (advanced users)

If your documents are already hosted in a blob storage system (such as AWS S3 or Azure Blob Storage) that supports blob requests authenticated by signed URL or by request header, you can write your own logic for connecting to the source and mapping documents into the Workbench session binding schema. We recommend starting from the base Session class in the workbench.bindings sub-package and using the AzureBlobSession binding as guidance.
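As a rough illustration of the mapping step described above, the sketch below lists blobs under a session prefix and turns each one into a document record carrying a signed URL. Everything in it is an assumption made for the example: the Session class here is a local stand-in, not the real base class from workbench.bindings, and DocumentRecord, SignedUrlSession, list_blobs, sign_url and the organization/workspace/session_id path layout are all hypothetical. Check the real Session class and AzureBlobSession for the actual schema and method names.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DocumentRecord:
    """Hypothetical stand-in for one entry in the session binding schema."""
    name: str
    url: str         # signed URL a client could use to fetch the blob
    size_bytes: int


class Session(ABC):
    """Local stand-in for the base class (NOT the real workbench.bindings.Session)."""

    def __init__(self, organization: str, workspace: str,
                 session_id: str, user_id: Optional[str] = None) -> None:
        self.organization = organization
        self.workspace = workspace
        self.session_id = session_id
        self.user_id = user_id
        self.documents: List[DocumentRecord] = []

    @abstractmethod
    def initialize(self) -> None:
        """Bind the session to its documents."""


class SignedUrlSession(Session):
    """Hypothetical custom binding for a store that issues signed URLs."""

    def __init__(self, organization: str, workspace: str, session_id: str,
                 user_id: Optional[str] = None, *,
                 list_blobs: Callable[[str], List[Dict]],
                 sign_url: Callable[[str], str]) -> None:
        super().__init__(organization, workspace, session_id, user_id)
        self._list_blobs = list_blobs
        self._sign_url = sign_url

    def initialize(self) -> None:
        # Assumed layout: <organization>/<workspace>/<session_id>/<document path>
        prefix = f"{self.organization}/{self.workspace}/{self.session_id}/"
        self.documents = [
            DocumentRecord(
                name=blob["key"][len(prefix):],
                url=self._sign_url(blob["key"]),
                size_bytes=blob["size"],
            )
            for blob in self._list_blobs(prefix)
        ]


# In-memory fake store so the sketch runs without any cloud SDK.
fake_store = {
    "acme/site-a/s-001/report.pdf": 2048,
    "acme/site-a/s-001/drawings/plan.dwg": 4096,
    "acme/site-a/s-002/other.txt": 10,
}


def list_blobs(prefix: str) -> List[Dict]:
    return [{"key": k, "size": v}
            for k, v in sorted(fake_store.items()) if k.startswith(prefix)]


def sign_url(key: str) -> str:
    # Placeholder signature; a real store would return a time-limited URL.
    return f"https://example.invalid/{key}?sig=placeholder"


session = SignedUrlSession("acme", "site-a", "s-001",
                           list_blobs=list_blobs, sign_url=sign_url)
session.initialize()
print([doc.name for doc in session.documents])
```

Passing list_blobs and sign_url in as callables keeps the storage SDK out of the binding itself, so in this sketch the same class could sit in front of S3, Azure Blob Storage or a test double.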

When interacting with a third-party data source, such as Microsoft SharePoint, the user_id is no longer an optional argument. This is because Hoppa can only perform read/write operations on files and directory structures that the user has been granted permissions for. Without a user_id the initialize() method will fail.
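A custom binding might surface this requirement early instead of waiting for initialize() to fail. The following is a minimal sketch under stated assumptions: check_user_id, the source labels and the choice of ValueError are hypothetical and are not part of Workbench.

```python
from typing import Optional

# Hypothetical labels for sources that enforce per-user permissions.
THIRD_PARTY_SOURCES = {"sharepoint"}


def check_user_id(source: str, user_id: Optional[str]) -> None:
    """Raise early if a third-party source is used without a user id."""
    if source.lower() in THIRD_PARTY_SOURCES and not user_id:
        raise ValueError(
            f"user_id is required for third-party source '{source}'"
        )


# Optional for the default blob-backed binding, so this passes silently.
check_user_id("azure_blob", None)

# Required for a third-party source, so this raises.
try:
    check_user_id("sharepoint", None)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```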

Instantiating a session object loads basic properties such as the session metadata standard and a storage client for interacting directly with the underlying blob directory (if required). Document versions are bound separately, by calling the initialize() method. This two-step pattern is preferred because indexing every document in a large session can be resource-intensive and time-consuming, and it is not always needed, for example when the session binding is only being used to review the session workflow.

from workbench.bindings import AzureBlobSession

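# Instantiate the session. organization, workspace, session_id and user_id
# are assumed to be defined already; user_id is optional for this binding.
session = AzureBlobSession(organization, workspace, session_id, user_id)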

# Bind to session documents
session.initialize()
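The two-step pattern used above (cheap construction, then an explicit initialize() call) can be sketched in isolation. This is an illustration only: LazySession, load_metadata and index_documents are hypothetical names, and the sketch makes no claim about how AzureBlobSession is implemented internally.

```python
from typing import Callable, Dict, List, Optional


class LazySession:
    """Hypothetical stand-in showing deferred document binding."""

    def __init__(self, session_id: str,
                 load_metadata: Callable[[str], Dict[str, str]],
                 index_documents: Callable[[str], List[str]]) -> None:
        self.session_id = session_id
        # Cheap properties are loaded eagerly at construction time.
        self.metadata = load_metadata(session_id)
        # The expensive document index is deferred until initialize().
        self._index_documents = index_documents
        self.documents: Optional[List[str]] = None

    def initialize(self) -> "LazySession":
        # Index only once; repeated calls reuse the existing result.
        if self.documents is None:
            self.documents = self._index_documents(self.session_id)
        return self


index_calls: List[str] = []


def fake_index(session_id: str) -> List[str]:
    # Records each call so the deferred behaviour is observable.
    index_calls.append(session_id)
    return ["report.pdf", "plan.dwg"]


session = LazySession("s-001", lambda sid: {"standard": "placeholder"}, fake_index)
documents_before = session.documents   # still None: nothing indexed yet

session.initialize()
session.initialize()                   # second call does not re-index
print(len(index_calls), session.documents)
```

A caller that only needs the session metadata never pays the indexing cost, which is the trade-off the paragraph above describes.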

Under the hood, the session binding manages all connections to third-party data sources. This means you can collate documents from many data sources in one session and analyse them in bulk.