Classifiers

Classifiers categorise an input. Some classifiers use generative AI models to do this, while others use quantitative techniques like vector similarity search.

azure_openai_json_classification

azure_openai_json_classification(classifier, description, filename, content=None, metadata=None, image_url=None, max_characters=8000, system_prompt=None, api_key=None, azure_endpoint=None, api_version=None, model=None, max_retries=5)

Classify an artefact (e.g. a file or page) into a category from a provided list of classifiers. Uses an LLM provided by the Azure OpenAI service.

Parameters:

Name Type Description Default
classifier List[Dict[str, Any]]

A list of classifiers.

required
filename str

Title of the artefact.

required
description str

Description of the artefact.

required
content str

Main body content of the artefact.

None
metadata Dict

Metadata properties about the artefact.

None
image_url str

A URL to an image of the artefact. Must be a signed or public URL that the OpenAI service can access.

None
max_characters int

Maximum number of characters of content to use; longer content is truncated.

8000
system_prompt str

Override the default prompt with custom instructions.

None
api_key str

API key for the Azure resource. If not provided, defaults to the environment variable AZURE_OPENAI_API_KEY.

None
azure_endpoint str

Your Azure endpoint, including the resource, e.g. https://example-resource.azure.openai.com/. If not provided, defaults to the environment variable AZURE_OPENAI_ENDPOINT.

None
api_version str

API version for the Azure resource.

None
model str

Model deployment name within the Azure resource. If not provided, defaults to the environment variable AZURE_OPENAI_DEPLOYMENT.

None
max_retries int

Number of retry attempts to the Azure OpenAI API service.

5
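
The defaults above imply two behaviours a caller may want to mirror when preparing inputs: content longer than max_characters is truncated, and api_key, azure_endpoint and model fall back to environment variables when left as None. The following is a minimal sketch of both, not the library's implementation. It assumes truncation keeps the leading characters, and the helper names truncate_content and resolve_setting are hypothetical.

```python
import os
from typing import Optional


def truncate_content(content: Optional[str], max_characters: int = 8000) -> Optional[str]:
    """Cut content to at most max_characters (assumption: leading characters are kept)."""
    if content is None:
        return None
    return content[:max_characters]


def resolve_setting(value: Optional[str], env_name: str) -> Optional[str]:
    """Prefer an explicit argument; otherwise read the named environment variable."""
    if value is not None:
        return value
    return os.environ.get(env_name)


# An explicit value wins; None falls back to the environment (or stays None if unset).
endpoint = resolve_setting("https://example-resource.azure.openai.com/", "AZURE_OPENAI_ENDPOINT")
api_key = resolve_setting(None, "AZURE_OPENAI_API_KEY")
body = truncate_content("x" * 9000)  # trimmed to the default 8000 characters
```

Checking these values before the call makes it easier to tell a missing environment variable apart from a service error.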

Returns:

Type Description
Dict[str, Any]

JSON-formatted dictionary containing:

  1. code: selected classification code.
  2. title: title or description of the classification code.
  3. certainty: confidence score, either "low", "medium" or "high".
  4. explanation: concise explanation of why this code was chosen.
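
Because the result comes from an LLM, a caller may want to check the returned dictionary against the four documented keys before relying on it. The sketch below is one way to do that; validate_classification is a hypothetical helper, and the example dictionary holds illustrative values rather than real service output.

```python
from typing import Any, Dict

EXPECTED_KEYS = {"code", "title", "certainty", "explanation"}
CERTAINTY_LEVELS = {"low", "medium", "high"}


def validate_classification(result: Dict[str, Any]) -> Dict[str, Any]:
    """Raise ValueError if a documented key is missing or certainty is not a documented level."""
    missing = EXPECTED_KEYS - set(result)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if result["certainty"] not in CERTAINTY_LEVELS:
        raise ValueError(f"unexpected certainty: {result['certainty']!r}")
    return result


# Illustrative values only; not real output from the service.
example = {
    "code": "A1",
    "title": "Example category",
    "certainty": "high",
    "explanation": "Illustrative explanation.",
}
validate_classification(example)
```
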

vector_similarity_search

vector_similarity_search(query, storage_location='./storage', parquet_name='vectors')

Perform a cosine similarity search between an input query and a set of embeddings stored in a local Parquet file.

Parameters:

Name Type Description Default
query str

The input string to run the similarity search for.

required
storage_location str

Storage location of the Parquet embeddings files.

'./storage'
parquet_name str

The name of the Parquet embeddings files.

'vectors'

Returns:

Type Description
Dict[Any, Any]

A dictionary object of the most similar match.
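
The cosine similarity step behind vector_similarity_search can be illustrated with a small, self-contained sketch. It is not the library's implementation: it assumes the query has already been embedded, uses an in-memory list in place of the Parquet file, and the names most_similar, "label", "embedding" and "score" are hypothetical.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_vector: Sequence[float], rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of the row whose embedding scores highest, with the score attached."""
    best = max(rows, key=lambda row: cosine_similarity(query_vector, row["embedding"]))
    return {**best, "score": cosine_similarity(query_vector, best["embedding"])}


# Toy data standing in for the stored embeddings.
rows = [
    {"label": "invoice", "embedding": [1.0, 0.0, 0.0]},
    {"label": "contract", "embedding": [0.0, 1.0, 0.0]},
]
match = most_similar([0.9, 0.1, 0.0], rows)
print(match["label"])
```

Cosine similarity compares direction rather than magnitude, so vectors of different lengths in the geometric sense (norms) can still score as close matches.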