Searchers

Searchers answer questions about the input data they are given. These range from simple named entity extraction (places, people, things) to more complex queries such as summarising key recommendations.

azure_ai_language_search(df, column_name, confidence_threshold=0.98, batch_size=10000)

Extracts entities from text data using the Azure AI Language client and keeps only those entities that meet the specified confidence threshold. The function processes the DataFrame in batches and returns the original DataFrame with additional columns for the recognized entities.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame | A DataFrame containing a column with text data to be analyzed. | required |
| column_name | str | The name of the DataFrame column containing text fields to search for entities. | required |
| confidence_threshold | float | A threshold value between 0 and 1 to filter entities based on their confidence score. Only entities with a confidence score greater than or equal to this value are retained. | 0.98 |
| batch_size | int | The size of each batch to process in parallel. | 10000 |
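
To make the role of batch_size concrete, the following is a minimal sketch of how a row count might be split into batches. The helper name `iter_batches` is hypothetical and not part of this library, and the parallel dispatch of each batch is left out.

```python
def iter_batches(n_rows, batch_size=10000):
    """Yield (start, stop) row offsets covering n_rows in batch_size chunks.

    Hypothetical helper for illustration only; the library's own batching
    logic may differ.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, n_rows, batch_size):
        # The final batch may be shorter than batch_size.
        yield start, min(start + batch_size, n_rows)


# A 25,000-row input with the default batch size gives three batches.
batches = list(iter_batches(25000))
```

Each (start, stop) pair could then be used to slice the input, for example with `df.iloc[start:stop]`.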

Returns:

| Type | Description |
| --- | --- |
| DataFrame | The original pandas DataFrame with additional columns for recognized entities, where each entity category (e.g., 'Organization', 'Location') is added as a new column. Where an entity column name matches an existing DataFrame column name, the entity column is suffixed with '_entities'. |
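
The threshold filter and the '_entities' suffix rule described above can be sketched in plain Python. This is an illustration under assumptions, not the library's implementation: the helper names, the dictionary keys (`text`, `category`, `confidence_score`) and the sample entities are all hypothetical.

```python
from collections import defaultdict


def filter_entities(entities, confidence_threshold=0.98):
    """Keep entities whose confidence score is >= the threshold."""
    return [e for e in entities if e["confidence_score"] >= confidence_threshold]


def entity_columns(entities, existing_columns):
    """Group entity text by category, suffixing names that clash with existing columns."""
    grouped = defaultdict(list)
    for entity in entities:
        name = entity["category"]
        if name in existing_columns:
            # Avoid overwriting a column that is already in the DataFrame.
            name = f"{name}_entities"
        grouped[name].append(entity["text"])
    return dict(grouped)


# Hypothetical recognition results for a single row of text.
sample = [
    {"text": "Contoso", "category": "Organization", "confidence_score": 0.99},
    {"text": "Leeds", "category": "Location", "confidence_score": 0.98},
    {"text": "Monday", "category": "DateTime", "confidence_score": 0.61},
]

kept = filter_entities(sample)
# Assume the input DataFrame already has columns named "text" and "Location".
columns = entity_columns(kept, existing_columns={"text", "Location"})
```

With the default threshold of 0.98, the low-confidence "Monday" entity is dropped, and the "Location" category is stored under "Location_entities" because a column named "Location" is assumed to exist already.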

azure_openai_search(search_term, content, metadata=None, image_url=None, max_characters=8000, system_prompt=None, api_key=None, azure_endpoint=None, api_version=None, model=None, max_retries=5)

Searches a document to find any free-text content that matches a search query.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| search_term | str | The search query. | required |
| content | str | The main body of the prompt. | required |
| metadata | dict | Any additional information describing the content. | None |
| image_url | str | A signed or public URL where an image can be found. | None |
| max_characters | int | Maximum number of text characters (excluding images) from content to be included in the prompt. Any additional characters are trimmed from content. | 8000 |
| system_prompt | str | Overrides the default prompt. | None |
| api_key | str | Access key for the Azure OpenAI resource. | None |
| azure_endpoint | str | Your Azure endpoint, including the resource, e.g. https://example-resource.azure.openai.com/. If not provided, defaults to the environment variable AZURE_OPENAI_ENDPOINT. | None |
| api_version | str | API version for the Azure resource. | None |
| model | str | Model deployment name within the Azure resource. If not provided, defaults to the environment variable AZURE_OPENAI_DEPLOYMENT. | None |
| max_retries | int | Maximum number of unsuccessful call attempts to the OpenAI service before returning an error. | 5 |
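
Three of the behaviours described by these parameters (environment-variable fallback, trimming to max_characters, and retrying failed calls) can be sketched as follows. The helper names are hypothetical and the details are assumptions: in particular, this sketch trims from the end of content and retries on any exception, which may not match the library.

```python
import os


def resolve_setting(value, env_var):
    """Return the explicit value if given, otherwise the environment variable (or None)."""
    return value if value is not None else os.environ.get(env_var)


def trim_content(content, max_characters=8000):
    """Keep at most max_characters characters (assumption: the tail is dropped)."""
    return content[:max_characters]


def call_with_retries(call, max_retries=5):
    """Call `call()` until it succeeds or max_retries attempts have failed."""
    last_error = None
    for _ in range(max_retries):
        try:
            return call()
        except Exception as exc:  # assumption: every failure is retryable
            last_error = exc
    raise RuntimeError(f"gave up after {max_retries} failed attempts") from last_error


# Set the variable here only so the example is self-contained.
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://example-resource.azure.openai.com/"
endpoint = resolve_setting(None, "AZURE_OPENAI_ENDPOINT")
prompt_body = trim_content("x" * 10000)
```

A production retry loop would usually also wait between attempts; that is omitted here to keep the sketch short.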

Returns:

| Type | Description |
| --- | --- |
| str | Plain-text answer to the query, or an empty string if no answer is found in the document. |
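
Because an empty string signals that no answer was found, callers may want to handle that case explicitly. The sketch below shows one way to do so; `answer_or_default` and `fake_search` are hypothetical names, and `fake_search` is only an offline stand-in that accepts the same first two arguments as azure_openai_search.

```python
def answer_or_default(search, search_term, content, default="No answer found"):
    """Run a search function and replace an empty-string result with a default."""
    answer = search(search_term, content)
    return answer if answer else default


def fake_search(search_term, content):
    """Offline stand-in for azure_openai_search, for illustration only."""
    if "capital" in search_term and "Paris" in content:
        return "Paris"
    return ""


found = answer_or_default(fake_search, "What is the capital?", "Paris is the capital of France.")
missing = answer_or_default(fake_search, "Who wrote the report?", "Paris is the capital of France.")
```

In real use, the actual search function would be passed in place of `fake_search`, for example wrapped in a lambda that supplies any extra keyword arguments.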