Searchers

Searchers answer questions about the input data they are given. A query can be as simple as named entity extraction (places, people, things) or as complex as summarising key recommendations.

azure_ai_language_search(df, column_name, confidence_threshold=0.98, batch_size=10000)

Extracts entities from text data using the Azure AI Language client and keeps only those that meet the specified confidence threshold. The function processes the DataFrame in batches and returns the original DataFrame with additional columns for the recognized entities.
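The batching described above can be pictured with a small sketch. This is not the library's implementation: `iter_batches` is a hypothetical helper, and it only shows how a list of texts might be split into consecutive chunks of at most `batch_size` items before each chunk is sent for analysis.

```python
from typing import Iterator, List, Sequence


def iter_batches(texts: Sequence[str], batch_size: int = 10000) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


# 25 short documents split into batches of 10: the last batch is smaller.
texts = [f"document {i}" for i in range(25)]
batch_lengths = [len(batch) for batch in iter_batches(texts, batch_size=10)]
print(batch_lengths)  # [10, 10, 5]
```

A smaller `batch_size` means more, lighter requests; how the real function schedules batches in parallel is not shown here.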

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame | A DataFrame containing a column with text data to be analyzed. | required |
| column_name | str | The name of the DataFrame column containing text fields to search for entities. | required |
| confidence_threshold | float | A threshold value between 0 and 1 to filter entities based on their confidence score. Only entities with a confidence score greater than or equal to this value are retained. | 0.98 |
| batch_size | int | The size of each batch to process in parallel. | 10000 |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | The original DataFrame with additional columns for recognized entities. Each entity category (e.g., 'Organization', 'Location') is added as a new column. Where an entity column would match an existing DataFrame column name, it is suffixed with '_entities'. |
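The threshold filter and the column naming rule can be illustrated with a short, pure-Python sketch. It is an approximation under stated assumptions rather than the function's actual code: `entities_to_columns` is a hypothetical helper, and the entities are plain dicts with assumed `text`, `category` and `confidence_score` keys.

```python
from typing import Dict, Iterable, List


def entities_to_columns(
    entities: Iterable[dict],
    existing_columns: Iterable[str],
    confidence_threshold: float = 0.98,
) -> Dict[str, List[str]]:
    """Group retained entity texts by category, renaming clashing columns."""
    existing = set(existing_columns)
    columns: Dict[str, List[str]] = {}
    for entity in entities:
        # Keep entities whose score is greater than or equal to the threshold.
        if entity["confidence_score"] < confidence_threshold:
            continue
        name = entity["category"]
        # Avoid overwriting a column that already exists in the input data.
        if name in existing:
            name = f"{name}_entities"
        columns.setdefault(name, []).append(entity["text"])
    return columns


# Assumed sample data; the input already has a column called 'Location'.
sample = [
    {"text": "Contoso", "category": "Organization", "confidence_score": 0.99},
    {"text": "Leeds", "category": "Location", "confidence_score": 0.98},
    {"text": "Monday", "category": "DateTime", "confidence_score": 0.61},
]
result = entities_to_columns(sample, existing_columns=["id", "Location"])
print(result)
# {'Organization': ['Contoso'], 'Location_entities': ['Leeds']}
```

With the default threshold of 0.98, only high-confidence entities survive, so lowering the value is the obvious lever when the output columns look too sparse.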

azure_openai_search(search_term, content, metadata=None, image_url=None, max_characters=200000, system_prompt=None, api_key=None, azure_endpoint=None, api_version=None, model=None, max_retries=5, response_format=None)

Searches a document to find any free-text content that matches a search query.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| search_term | str | The search query. | required |
| content | str | The main body of the prompt. | required |
| metadata | dict \| None | Any additional information describing the content. | None |
| image_url | str \| None | A signed or public URL where an image can be found. | None |
| max_characters | int | Maximum number of text characters (excluding images) from the content to include in the prompt. Any additional characters are trimmed. | 200000 |
| system_prompt | str \| None | Overrides the default prompt. | None |
| api_key | str \| None | Access key for the Azure OpenAI resource. | None |
| azure_endpoint | str \| None | Your Azure endpoint, including the resource, e.g. https://example-resource.azure.openai.com/. If not provided, defaults to the environment variable AZURE_OPENAI_ENDPOINT. | None |
| api_version | str \| None | API version for the Azure resource. | None |
| model | str \| None | Model deployment name within the Azure resource. If not provided, defaults to the environment variable AZURE_OPENAI_DEPLOYMENT. | None |
| max_retries | int | Maximum number of unsuccessful call attempts to the OpenAI service before returning an error. | 5 |
| response_format | dict \| None | Optional response format specification (e.g., {"type": "json_object"}). | None |

Returns:

| Type | Description |
| --- | --- |
| str | Plain-text answer to the query, or an empty string if no answer is found in the document. |
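Three of the parameters above describe behaviour that is easy to picture in code: falling back to an environment variable, trimming long content, and retrying failed calls. The sketch below is illustrative only. `resolve_setting`, `trim_content` and `call_with_retries` are hypothetical helpers, not part of the library, and the sketch assumes that trimming keeps the leading characters and that every failed attempt counts towards `max_retries`.

```python
import os
from typing import Callable, Optional


def resolve_setting(value: Optional[str], env_var: str) -> Optional[str]:
    """Return the explicit value if given, otherwise the environment variable."""
    return value if value is not None else os.environ.get(env_var)


def trim_content(content: str, max_characters: int = 200000) -> str:
    """Keep at most max_characters characters (assumed: the leading ones)."""
    return content[:max_characters]


def call_with_retries(call: Callable[[], str], max_retries: int = 5) -> str:
    """Try the call up to max_retries times, then raise an error."""
    last_error: Optional[Exception] = None
    for _ in range(max_retries):
        try:
            return call()
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
    raise RuntimeError(f"gave up after {max_retries} failed attempts") from last_error


# Environment fallback versus an explicit argument (the second URL is made up).
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://example-resource.azure.openai.com/"
endpoint = resolve_setting(None, "AZURE_OPENAI_ENDPOINT")
explicit = resolve_setting("https://other.example/", "AZURE_OPENAI_ENDPOINT")

# Trimming a ten-character string to four characters.
trimmed = trim_content("abcdefghij", max_characters=4)

# A stand-in for the service call that fails twice before succeeding.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "answer"


answer = call_with_retries(flaky, max_retries=5)
print(endpoint, explicit, trimmed, answer, attempts["count"])
```

Passing an explicit argument always wins over the environment in this sketch, which matches the "if not provided" wording in the parameter descriptions.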