
Bindings

Bindings are how Workbench connects to remotely stored documents for analysis.

AzureBlobSession

Bases: Session

get_processing_status

get_processing_status()

Check whether all files in the session have finished processing and return an overall status.

Returns:

Type Description
str

'completed' if every file has status 'completed'; 'error' if any file has status 'failed'; 'processing' if no file has failed and at least one file has status 'processing'; otherwise 'draft'.
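The precedence rules above can be sketched as a standalone function. This is not the library's implementation: the name aggregate_status is hypothetical, per-file statuses are assumed to arrive as a plain list of strings, and treating an empty session as 'draft' is an assumption.

```python
from typing import Iterable


def aggregate_status(file_statuses: Iterable[str]) -> str:
    """Collapse per-file statuses into one session status (hypothetical helper)."""
    statuses = list(file_statuses)
    # 'completed' only when there is at least one file and every file is done
    # (assumption: an empty session falls through to 'draft').
    if statuses and all(s == "completed" for s in statuses):
        return "completed"
    # Any failure outranks in-flight work.
    if any(s == "failed" for s in statuses):
        return "error"
    if any(s == "processing" for s in statuses):
        return "processing"
    return "draft"
```

Checking 'completed' first is safe: if every file is completed, none can have failed.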

results_to_df

results_to_df()

Convert all document version results to a pandas DataFrame by:

  1. Inserting results into the DataFrame columns that match their field names
  2. Merging user edits and AI edits into their correct cell positions

Returns:

Type Description
DataFrame

Each row in the dataframe is a document version.
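A minimal sketch of the two steps, assuming a hypothetical input shape: each version is a dict with an "id", a "results" list of {"name", "value"} entries, and optional "ai_edits" and "user_edits" dicts keyed by field name. The function name versions_to_dataframe is also hypothetical, and letting user edits override AI edits is an assumption about precedence.

```python
import pandas as pd


def versions_to_dataframe(versions: list[dict]) -> pd.DataFrame:
    """Build one row per document version (hypothetical input shape)."""
    rows = []
    for version in versions:
        row = {"document_version_id": version["id"]}
        # Step 1: place each result in the column named after its field.
        for result in version.get("results", []):
            row[result["name"]] = result["value"]
        # Step 2: overlay edits; user edits are applied last so they win (assumption).
        row.update(version.get("ai_edits", {}))
        row.update(version.get("user_edits", {}))
        rows.append(row)
    # pandas aligns columns across rows; fields missing from a version become NaN.
    return pd.DataFrame(rows)
```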

set_file_count

set_file_count(new_file_count)

Set the file count in the session properties JSON.

Parameters:

Name Type Description Default
new_file_count int

The number of files.

required

Returns:

Type Description
bool

True if the operation succeeded, otherwise False.

set_processing_status

set_processing_status(new_status)

Set the processing status in the session properties JSON.

Parameters:

Name Type Description Default
new_status str

The status to set.

required

Returns:

Type Description
bool

True if the operation succeeded, otherwise False.
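Both setters follow the same pattern: read the properties JSON, update one key, write it back, and report success as a boolean. The sketch below keeps the JSON in memory; the class name SessionProperties and the keys "file_count" and "processing_status" are hypothetical, and the real methods presumably persist the document to remote storage.

```python
import json
from typing import Any


class SessionProperties:
    """Hypothetical holder for a session's properties JSON document."""

    def __init__(self, raw: str = "{}") -> None:
        self.raw = raw

    def _set(self, key: str, value: Any) -> bool:
        try:
            data = json.loads(self.raw)
            data[key] = value
            # Only replace the stored JSON once serialization has succeeded.
            self.raw = json.dumps(data)
        except (ValueError, TypeError):
            # Malformed JSON, a non-object document, or a non-serializable value.
            return False
        return True

    def set_file_count(self, new_file_count: int) -> bool:
        return self._set("file_count", new_file_count)

    def set_processing_status(self, new_status: str) -> bool:
        return self._set("processing_status", new_status)
```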

AuditableMixin

Bases: ABC

Abstract mixin class for entities that require audit tracking.

DocumentVersion

Bases: AuditableMixin, SignedEntity

bind_chunk abstractmethod

bind_chunk(content=None, sheet_index=0, chunk_index=0)

Shorthand method to upload a text chunk to a sheet. If content is not provided, initializes an empty chunk on the sheet.

Returns:

Name Type Description
signed_url str

Signed URL to read/write the chunk.

bind_image abstractmethod

bind_image(content=None, sheet_index=0, image_index=0)

Shorthand method to upload an image to a sheet. If content is not provided, initializes an empty image on the sheet.

Returns:

Name Type Description
signed_url str

Signed URL to read/write the image.

bind_row abstractmethod

bind_row(content=None, sheet_index=0, row_index=0)

Shorthand method to upload a row to a sheet. If content is not provided, initializes an empty sheet row.

Returns:

Name Type Description
signed_url str

Signed URL to read/write the row.
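bind_chunk, bind_image, and bind_row share one shape: locate a slot on a sheet, store the content (or an empty placeholder), and hand back a URL for that slot. The in-memory sketch below illustrates that shape only. InMemoryDocumentVersion is hypothetical, the memory:// URLs stand in for real signed URLs, and using "" as the empty placeholder is an assumption; this sketch creates missing sheets as empty dicts, whereas the real class may pad with None via pad_sheets.

```python
from typing import Optional


class InMemoryDocumentVersion:
    """Hypothetical stand-in showing the shared shape of the bind_* helpers."""

    def __init__(self) -> None:
        self.sheets: list[dict[str, list]] = []

    def _bind(self, kind: str, content: Optional[object],
              sheet_index: int, item_index: int) -> str:
        # Create empty sheets until sheet_index exists.
        while len(self.sheets) <= sheet_index:
            self.sheets.append({"chunks": [], "images": [], "rows": []})
        items = self.sheets[sheet_index][kind]
        # Pad earlier item slots with None so item_index is addressable.
        while len(items) <= item_index:
            items.append(None)
        # No content means "initialize an empty slot" ("" in this sketch).
        items[item_index] = content if content is not None else ""
        # Placeholder for the signed URL the real methods return.
        return f"memory://sheets/{sheet_index}/{kind}/{item_index}"

    def bind_chunk(self, content=None, sheet_index=0, chunk_index=0) -> str:
        return self._bind("chunks", content, sheet_index, chunk_index)

    def bind_image(self, content=None, sheet_index=0, image_index=0) -> str:
        return self._bind("images", content, sheet_index, image_index)

    def bind_row(self, content=None, sheet_index=0, row_index=0) -> str:
        return self._bind("rows", content, sheet_index, row_index)
```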

bind_sheet abstractmethod

bind_sheet(sheet_index=0)

Shorthand method to add a sheet of the relevant concrete class (e.g. AzureDocumentVersionSheet) to the DocumentVersion at the specified index. If a concrete object already exists at the specified sheet_index, it is not overwritten.
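The no-overwrite rule, combined with padding the sheets list with None (as pad_sheets describes), can be sketched as follows. Sheet and DocumentVersionSketch are hypothetical stand-ins; returning the sheet from bind_sheet, and padding until sheet_number is a valid index, are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sheet:
    """Hypothetical concrete sheet (stands in for e.g. AzureDocumentVersionSheet)."""
    index: int
    chunks: list = field(default_factory=list)


class DocumentVersionSketch:
    def __init__(self) -> None:
        self.sheets: list[Optional[Sheet]] = []

    def pad_sheets(self, sheet_number: int) -> None:
        # Append None placeholders until sheet_number is a valid index.
        while len(self.sheets) <= sheet_number:
            self.sheets.append(None)

    def bind_sheet(self, sheet_index: int = 0) -> Sheet:
        self.pad_sheets(sheet_index)
        # Only create a sheet if the slot is empty; never overwrite an existing one.
        if self.sheets[sheet_index] is None:
            self.sheets[sheet_index] = Sheet(index=sheet_index)
        return self.sheets[sheet_index]
```

Because binding is idempotent, callers can call bind_sheet before every write without risking loss of content already attached to the sheet.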

content abstractmethod

content()

Shorthand method that returns the text content of every sheet.

get

get(max_retries=3, backoff_factor=1.5, page_limit=10, max_download_size=400 * 1024 * 1024)

Fetch data from the signed URL with exponential backoff retry logic.

For PDFs, only the first N pages (page_limit) can be extracted to reduce memory usage. The PDF page extractor recognizes end-of-file (EOF) markers that appear before the final bytes of a file, so pages can be extracted from PDFs substantially larger than the download size limit.

For ZIP files, extracts the file matching self.file_name from the archive.
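The ZIP step can be sketched with the standard-library zipfile module. Here file_name is passed explicitly in place of self.file_name, the helper name extract_member is hypothetical, and matching on the entry's base name (so files inside folders still match) is an assumption; the real method may require an exact path.

```python
import io
import zipfile


def extract_member(archive_bytes: bytes, file_name: str) -> bytes:
    """Return the bytes of the archive entry whose base name is file_name."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            # Compare base names so "docs/report.pdf" matches "report.pdf" (assumption).
            if info.filename.rsplit("/", 1)[-1] == file_name:
                return archive.read(info)
    raise FileNotFoundError(f"{file_name!r} not found in ZIP archive")
```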

Parameters:

Name Type Description Default
max_retries int

Maximum number of retry attempts. Defaults to 3.

3
backoff_factor float

Exponential backoff multiplier for retry delays. Defaults to 1.5.

1.5
page_limit int

Number of pages to extract (applies only to PDFs). Defaults to 10.

10
max_download_size int

Maximum number of bytes to attempt to download. Defaults to 400 MiB (400 * 1024 * 1024 bytes).

400 * 1024 * 1024

Returns:

Name Type Description
bytes bytes

The downloaded document content.

Raises:

Type Description
RuntimeError

If all retry attempts fail.

ValueError

If the file size exceeds max_download_size.

FileNotFoundError

If the specified file is not found in a ZIP archive.
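The retry and size-limit behaviour described above can be sketched as a standalone function. fetch is a hypothetical callable standing in for the request against the signed URL, and content_length stands in for a size learned beforehand (for example from a HEAD request). Treating max_retries as the total number of attempts and using backoff_factor ** attempt as the delay are assumptions; PDF page limiting and ZIP handling are not shown.

```python
import time
from typing import Callable, Optional


def fetch_with_backoff(
    fetch: Callable[[], bytes],
    content_length: Optional[int] = None,
    max_retries: int = 3,
    backoff_factor: float = 1.5,
    max_download_size: int = 400 * 1024 * 1024,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    # Refuse oversized downloads up front; retrying would not help.
    if content_length is not None and content_length > max_download_size:
        raise ValueError(
            f"File size {content_length} exceeds max_download_size {max_download_size}"
        )
    last_error: Optional[Exception] = None
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt < max_retries - 1:
                # Exponential backoff: 1.0, 1.5, 2.25, ... seconds for factor 1.5.
                sleep(backoff_factor ** attempt)
    raise RuntimeError(f"All {max_retries} attempts failed") from last_error
```

Injecting sleep keeps the function testable without real delays.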

pad_sheets

pad_sheets(sheet_number)

Pad the sheets list with None until it reaches the desired index.

EntityMixin

Bases: SerializableMixin

Combined mixin providing data access, caching, serialization, and a dictionary-like interface.

__contains__

__contains__(key)

Check if a key exists using 'in' operator.

__getitem__

__getitem__(key)

Allow dictionary-style access to data attributes.

This works with cached data and automatically converts nested objects.

get abstractmethod

get()

Fetch data from the underlying data store. Must be implemented by concrete classes.

keys

keys()

Return all public attribute names, properties, and data keys.

set abstractmethod

set(data=None)

Persist data to the underlying data store. Must be implemented by concrete classes.
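A minimal sketch of how abstract get() and set() can combine with cached, dictionary-style access. EntitySketch and DictEntity are hypothetical; the dict-backed store stands in for remote storage, keys() here returns data keys only (the documented method also includes public attributes and properties), and nested-object conversion is omitted.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class EntitySketch(ABC):
    """Hypothetical minimal EntityMixin: cached data plus dict-style access."""

    def __init__(self) -> None:
        self._cache: Optional[dict] = None

    @abstractmethod
    def get(self) -> dict: ...

    @abstractmethod
    def set(self, data: Optional[dict] = None) -> None: ...

    def _data(self) -> dict:
        # Fetch once, then serve later lookups from the cache.
        if self._cache is None:
            self._cache = self.get()
        return self._cache

    def __getitem__(self, key: str) -> Any:
        return self._data()[key]

    def __contains__(self, key: object) -> bool:
        return key in self._data()

    def keys(self) -> list:
        return list(self._data().keys())


class DictEntity(EntitySketch):
    """Concrete example backed by a plain dict standing in for remote storage."""

    def __init__(self, store: dict) -> None:
        super().__init__()
        self.store = store
        self.fetches = 0  # counts calls to get(), to show caching

    def get(self) -> dict:
        self.fetches += 1
        return dict(self.store)

    def set(self, data: Optional[dict] = None) -> None:
        if data is not None:
            self.store.update(data)
        # Invalidate the cache so the next read sees persisted data.
        self._cache = None
```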

MetadataSpecification

Bases: AuditableMixin, EntityMixin

excel_from_classifiers

excel_from_classifiers()

Export classifiers to Excel workbook with:

  1. Summary sheet with all classifiers and their top-level properties
  2. Individual sheets for each classifier's picklist options

Returns:

Type Description
BytesIO

Excel workbook as an in-memory bytes buffer that can be downloaded or sent via an API.

turtle_from_classifiers

turtle_from_classifiers()

Convert the information standard classifiers dictionary into Turtle, a textual syntax for RDF triples that can be imported into other systems.

Returns:

Type Description
str

The classifiers serialized as an RDF Turtle string.
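To illustrate what a Turtle export looks like, the sketch below serializes a classifiers dictionary by hand using the W3C SKOS vocabulary. Everything specific here is an assumption: the input shape ({name: {"label", "picklist"}}), the example.org namespace, the choice of SKOS, and the function name classifiers_to_turtle.

```python
import re

PREFIXES = """@prefix ex: <https://example.org/classifiers#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
"""


def _literal(text: str) -> str:
    # Escape backslashes, quotes, and newlines per Turtle string rules.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def _local(name: str) -> str:
    # Turn an arbitrary label into a safe prefixed-name local part.
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


def classifiers_to_turtle(classifiers: dict) -> str:
    lines = [PREFIXES]
    for name, spec in classifiers.items():
        scheme = f"ex:{_local(name)}"
        # Each classifier becomes a concept scheme...
        lines.append(f"{scheme} a skos:ConceptScheme ;")
        lines.append(f"    skos:prefLabel {_literal(spec.get('label', name))} .")
        # ...and each picklist option a concept within it.
        for option in spec.get("picklist", []):
            lines.append(f"ex:{_local(name)}_{_local(option)} a skos:Concept ;")
            lines.append(f"    skos:prefLabel {_literal(option)} ;")
            lines.append(f"    skos:inScheme {scheme} .")
    return "\n".join(lines) + "\n"
```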

Results

Bases: EntityMixin

add abstractmethod

add(id, name, value, method='workflow', certainty=None, explanation=None)

Add a new PropertyValue to the result object. This does not write results to storage; call set() to persist them.
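The stage-then-persist split between add() and set() can be sketched in memory. PropertyValue's fields mirror the documented add() parameters, but the dataclass itself, ResultsSketch, the "stored" list standing in for remote storage, and returning the new value from add() are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class PropertyValue:
    """Hypothetical shape of a single extracted result."""
    id: str
    name: str
    value: Any
    method: str = "workflow"
    certainty: Optional[float] = None
    explanation: Optional[str] = None


@dataclass
class ResultsSketch:
    pending: list = field(default_factory=list)
    stored: list = field(default_factory=list)  # stands in for remote storage

    def add(self, id, name, value, method="workflow", certainty=None, explanation=None):
        # Stage the value in memory only; nothing is written yet.
        pv = PropertyValue(id, name, value, method, certainty, explanation)
        self.pending.append(pv)
        return pv

    def set(self) -> None:
        # Persist everything staged by add() in one write.
        self.stored.extend(self.pending)
        self.pending.clear()
```

Batching writes this way lets a workflow add many values and pay the storage round trip once.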

SerializableMixin

Mixin providing data serialization and a dictionary-like interface.

__contains__

__contains__(key)

Check if a key exists using 'in' operator.

__getitem__

__getitem__(key)

Allow dictionary-style access to data attributes.

This works with cached data and automatically converts nested objects.

__setitem__

__setitem__(key, value)

Allow dictionary-style setting of data attributes.

items

items()

Return public key-value pairs.

keys

keys()

Return all public attribute names, properties, and data keys.

to_dict

to_dict()

Convert object attributes to dictionary format.

Recursively converts the object and all its properties into dictionaries for serialization purposes. Only includes public attributes and properties.

values

values()

Return all public attribute values, property values, and data values.
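The dictionary-like interface and recursive to_dict() can be sketched as below. SerializableSketch and _convert are hypothetical; treating names that start with "_" as private and discovering properties via the class MRO are assumptions about how "public attributes and properties" are selected.

```python
from typing import Any


class SerializableSketch:
    """Hypothetical mixin: public attributes and properties become dict keys."""

    def keys(self) -> list[str]:
        names = [n for n in vars(self) if not n.startswith("_")]
        # Include public properties defined on the class hierarchy as well.
        for cls in type(self).__mro__:
            for n, attr in vars(cls).items():
                if isinstance(attr, property) and not n.startswith("_") and n not in names:
                    names.append(n)
        return names

    def values(self) -> list[Any]:
        return [getattr(self, n) for n in self.keys()]

    def items(self) -> list[tuple[str, Any]]:
        return [(n, getattr(self, n)) for n in self.keys()]

    def __contains__(self, key: object) -> bool:
        return key in self.keys()

    def __getitem__(self, key: str) -> Any:
        if key not in self.keys():
            raise KeyError(key)
        return getattr(self, key)

    def __setitem__(self, key: str, value: Any) -> None:
        setattr(self, key, value)

    def to_dict(self) -> dict:
        return {n: _convert(v) for n, v in self.items()}


def _convert(value: Any) -> Any:
    # Recurse into nested serializable objects and containers.
    if isinstance(value, SerializableSketch):
        return value.to_dict()
    if isinstance(value, (list, tuple)):
        return [_convert(v) for v in value]
    if isinstance(value, dict):
        return {k: _convert(v) for k, v in value.items()}
    return value
```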

Session

Bases: AuditableMixin, SerializableMixin

__call__

__call__()

Initialize the session and return all public Session properties.

Returns:

Name Type Description
list list[tuple[str, Any]]

List of (key, value) tuples for all public Session properties. Includes properties inherited from SerializableMixin and AuditableMixin.

Example

>>> session_data = session()
>>> for key, value in session_data:
...     print(f"{key}: {value}")

flat

flat()

Generator that yields (id, object) tuples for the document hierarchy.

Returns a flat view of the nested document structure, yielding each object with its id as the key. This does not modify the Session, it only provides an iterable view.

Yields:

Type Description
tuple[str, DocumentVersion | DocumentVersionSheet | SheetItem]

(id, object) pairs for:

  - DocumentVersion objects
  - DocumentVersionSheet objects
  - SheetItem objects from the chunks, images, and rows lists

Example

>>> for obj_id, obj in session.flat():
...     print(f"{obj_id}: {type(obj).__name__}")
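A generator that flattens the hierarchy without modifying it can be sketched as follows. The dataclasses are simplified stand-ins that reuse the documented class names, and the depth-first order (version, then sheet, then chunks, images, rows) and skipping None padding slots are assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class SheetItem:
    id: str


@dataclass
class DocumentVersionSheet:
    id: str
    chunks: list = field(default_factory=list)
    images: list = field(default_factory=list)
    rows: list = field(default_factory=list)


@dataclass
class DocumentVersion:
    id: str
    sheets: list = field(default_factory=list)


def flat(versions: list) -> Iterator[tuple[str, object]]:
    """Yield depth-first (id, object) pairs; nothing is copied or modified."""
    for version in versions:
        yield version.id, version
        for sheet in version.sheets:
            if sheet is None:  # skip padded placeholder slots (assumption)
                continue
            yield sheet.id, sheet
            for item in (*sheet.chunks, *sheet.images, *sheet.rows):
                if item is not None:
                    yield item.id, item
```

Because it is a generator, dict(flat(...)) builds an id lookup table in one pass, while a plain for loop streams objects without materializing the whole hierarchy.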

SignedEntity

Bases: EntityMixin, ABC

Abstract base class for entities with secure URL access.

Represents a container whose data is read and written through temporary signed URLs, with automatic regeneration of expired URLs.

signed_url property writable

signed_url

Get a valid signed URL, regenerating if necessary.

__init__

__init__(signed_url=None, url_generator=None)

Initialize an information container.

Parameters:

Name Type Description Default
signed_url str | None

Initial signed URL for the container.

None
url_generator Callable[[], str] | None

Function to regenerate expired URLs.

None

get abstractmethod

get()

Fetch data from the signed URL or return cached body.

Subclasses should implement this method with their own signature and logic as needed.
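The regenerate-on-read behaviour of signed_url can be sketched with a writable property. SignedUrlHolder is hypothetical, and because the documented url_generator returns only a string, this sketch assumes a fixed lifetime (ttl_seconds) measured from when a URL was set; raising RuntimeError when no generator is available is also an assumption.

```python
import time
from typing import Callable, Optional


class SignedUrlHolder:
    """Hypothetical sketch: cache a signed URL and regenerate it once it expires."""

    def __init__(self, signed_url: Optional[str] = None,
                 url_generator: Optional[Callable[[], str]] = None,
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._generator = url_generator
        self._ttl = ttl_seconds
        self._clock = clock
        self._url: Optional[str] = None
        self._issued_at = 0.0
        if signed_url is not None:
            self.signed_url = signed_url  # goes through the setter below

    @property
    def signed_url(self) -> str:
        expired = self._url is None or self._clock() - self._issued_at >= self._ttl
        if expired:
            if self._generator is None:
                raise RuntimeError("Signed URL expired and no url_generator was provided")
            self.signed_url = self._generator()
        return self._url

    @signed_url.setter
    def signed_url(self, value: str) -> None:
        # Record when this URL was issued so expiry can be checked on read.
        self._url = value
        self._issued_at = self._clock()
```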