ExperimentsResource

ExperimentsResource ¶

ExperimentsResource(client: Union[Client, AsyncClient])

Bases: BaseResource

arun `async` ¶

arun(experiment: Experiment, rating_mode: RatingMode = DETAILED, n_epochs: int = 1, block: bool = True, timeout: float | None = None, generation_params: GenerationParams | None = None) -> Experiment

Run an existing unrun experiment to generate responses and ratings (async).
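
As a rough sketch of how `arun` might be used to run several unrun experiments concurrently: the helper below accepts any object that exposes the `arun` coroutine with the signature above (presumably an `ExperimentsResource` built on an `AsyncClient`) and awaits all runs with `asyncio.gather`. The helper name `run_all` and the default timeout are illustrative assumptions, not part of the library.

```python
import asyncio
from typing import Any, Sequence


async def run_all(
    experiments_resource: Any,
    experiments: Sequence[Any],
    timeout: float = 600.0,
) -> list:
    """Run several unrun experiments concurrently and wait for all of them.

    `run_all` is a hypothetical helper; `experiments_resource` is assumed to
    expose the `arun` coroutine documented above.
    """
    pending = [
        experiments_resource.arun(experiment, block=True, timeout=timeout)
        for experiment in experiments
    ]
    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(*pending)
```

From synchronous code, the helper could be driven with `asyncio.run(run_all(resource, experiments))`.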

create ¶

create(name: str, prompt_template: PromptTemplate | None, collection: TemplateVariablesCollection, llm_config: LLMConfig | None = None, criterion_set: CriterionSet | None = None, description: str = '', generate: bool = False, rating_mode: RatingMode = DETAILED, n_epochs: int = 1, block: bool = False, timeout: float | None = None, generation_params: GenerationParams | None = None, rating_version: str | None = None, response_column_id: int | None = None, evaluation_mode: Literal['STANDARD', 'AGENTIC'] = 'STANDARD', agent_config: dict | None = None, evaluation_agent_config: dict | None = None) -> Experiment

Creates a new experiment.

Note: When block=True and generate=True, this method uses async streaming internally and delegates to the async implementation.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the experiment.	required
`prompt_template`	`PromptTemplate \| None`	Optional prompt template to use for the experiment. If omitted, the collection must contain a Conversation or Raw Input column.	required
`collection`	`TemplateVariablesCollection`	The collection of template variables to use for the experiment.	required
`llm_config`	`LLMConfig \| None`	Optional LLMConfig to use for the experiment. Uses platform default if not specified.	`None`
`criterion_set`	`CriterionSet \| None`	Optional criterion set to evaluate against. If omitted, falls back to the prompt template's linked criterion set (if template is provided).	`None`
`description`	`str`	Optional description for the experiment.	`''`
`generate`	`bool`	Whether to generate responses and ratings immediately. Defaults to False.	`False`
`rating_mode`	`RatingMode`	The rating mode to use if generating responses (Only used if generate=True). Defaults to RatingMode.DETAILED.	`DETAILED`
`n_epochs`	`int`	Number of times to run the experiment for each input. Defaults to 1.	`1`
`block`	`bool`	Whether to block until the experiment is executed, only relevant if generate=True. Defaults to False.	`False`
`timeout`	`float \| None`	The timeout for the experiment execution, only relevant if generate=True and block=True. Defaults to None.	`None`
`generation_params`	`GenerationParams \| None`	Optional sampling parameters to override LLMConfig defaults for this experiment. Defaults to None (uses LLMConfig defaults).	`None`
`rating_version`	`str \| None`	Version of core rating to use. If not provided, uses project's default_rating_version. Use "mock" in test environments to avoid actual LLM calls for ratings.	`None`
`response_column_id`	`int \| None`	Optional ID of a collection column to use as pre-existing responses. When set, no LLM generation occurs; responses are taken directly from this column. llm_config and generation_params are ignored, and n_epochs is forced to 1.	`None`
`evaluation_mode`	`Literal['STANDARD', 'AGENTIC']`	The evaluation mode ("STANDARD" or "AGENTIC"). Defaults to "STANDARD".	`'STANDARD'`
`agent_config`	`dict \| None`	Optional agent configuration for agentic experiments.	`None`
`evaluation_agent_config`	`dict \| None`	Optional evaluation agent configuration for agentic experiments.	`None`

Returns:

Name	Type	Description
`Experiment`	`Experiment`	The newly created experiment object. If generate=True, responses and ratings will be generated. The returned experiment object will then include a generation task ID that can be used to check the status of the generation.

Raises:

Type	Description
`HTTPStatusError`	If an experiment with the same name already exists.
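
The interplay of `generate`, `block`, and `timeout` can be sketched as follows. This is a minimal illustration that only uses the parameters listed above; `create_and_wait` is a hypothetical wrapper, and `experiments_resource` is assumed to be an already-constructed `ExperimentsResource`.

```python
from typing import Any, Optional


def create_and_wait(
    experiments_resource: Any,
    name: str,
    prompt_template: Any,
    collection: Any,
    timeout: Optional[float] = 600.0,
) -> Any:
    """Create an experiment and wait for generation to finish.

    Hypothetical wrapper around the documented `create` method.
    """
    # generate=True starts response and rating generation immediately.
    # block=True waits for completion; timeout is only relevant when both are set.
    return experiments_resource.create(
        name=name,
        prompt_template=prompt_template,
        collection=collection,
        generate=True,
        block=True,
        timeout=timeout,
    )
```

With `block=False` (the default), the same call would return as soon as the experiment is created, leaving generation to be tracked through the generation task ID.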

delete ¶

delete(experiment: Experiment) -> None

Deletes an experiment.

Parameters:

Name	Type	Description	Default
`experiment`	`Experiment`	The experiment to delete.	required

Raises:

Type	Description
`HTTPStatusError`	If the experiment doesn't exist or belongs to a different project.

get ¶

get(*, name: str | None = None, id: int | None = None, fetch_responses: bool = True, categorical_filters: dict[str, list[str]] | None = None) -> Experiment

Get an experiment by name or id.

Parameters:

Name	Type	Description	Default
`name`	`str \| None`	The name of the experiment to get.	`None`
`id`	`int \| None`	The id of the experiment to get.	`None`
`fetch_responses`	`bool`	Whether to fetch responses for the experiment. Defaults to True for backward compatibility. Set to False to save API calls when responses aren't needed.	`True`
`categorical_filters`	`dict[str, list[str]] \| None`	Filter experiment results by categorical column values. Maps column names to lists of values. Multiple values for the same column use OR logic; multiple columns use AND logic. Example: {"category": ["A", "B"], "region": ["US"]} filters for (category=A OR category=B) AND (region=US).	`None`

Returns:

Name	Type	Description
`Experiment`	`Experiment`	The experiment object.

Raises:

Type	Description
`ValueError`	If neither or both of name and id are provided, or if the experiment is not found.
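
The OR/AND semantics of `categorical_filters` can be illustrated with a small local model. This sketch does not call the API; `matches_filters` is a hypothetical function and the rows are made-up sample data, used only to show which combinations the documented filter would keep.

```python
from typing import Mapping, Sequence


def matches_filters(
    row: Mapping[str, str],
    categorical_filters: Mapping[str, Sequence[str]],
) -> bool:
    """Return True if `row` satisfies every column filter.

    Within one column, any listed value may match (OR);
    every filtered column must match (AND).
    """
    return all(
        row.get(column) in allowed_values
        for column, allowed_values in categorical_filters.items()
    )


filters = {"category": ["A", "B"], "region": ["US"]}
rows = [
    {"category": "A", "region": "US"},  # kept: category and region both match
    {"category": "B", "region": "EU"},  # dropped: region does not match
    {"category": "C", "region": "US"},  # dropped: category does not match
]
kept = [row for row in rows if matches_filters(row, filters)]
print(kept)  # [{'category': 'A', 'region': 'US'}]
```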

get_or_create ¶

get_or_create(name: str, prompt_template: PromptTemplate | None, collection: TemplateVariablesCollection, llm_config: LLMConfig | None = None, criterion_set: CriterionSet | None = None, description: str = '', generate: bool = False, rating_mode: RatingMode = DETAILED, n_epochs: int = 1, block: bool = False, timeout: float | None = None, generation_params: GenerationParams | None = None, rating_version: str | None = None, response_column_id: int | None = None, evaluation_mode: Literal['STANDARD', 'AGENTIC'] = 'STANDARD', agent_config: dict | None = None, evaluation_agent_config: dict | None = None) -> tuple[Experiment, bool]

Gets an existing experiment by name or creates a new one if it doesn't exist.

The existence of an experiment is determined solely by its name. If an experiment with the given name exists, it will be returned regardless of its other properties. If no experiment exists with that name, a new one will be created with the provided parameters.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the experiment to get or create.	required
`prompt_template`	`PromptTemplate \| None`	Optional prompt template to use if creating a new experiment. If omitted, the collection must contain a Conversation or Raw Input column.	required
`collection`	`TemplateVariablesCollection`	The collection of template variables to use if creating a new experiment.	required
`llm_config`	`LLMConfig \| None`	Optional LLMConfig to use if creating a new experiment.	`None`
`criterion_set`	`CriterionSet \| None`	Optional criterion set to use if creating a new experiment. If omitted, falls back to the prompt template's linked criterion set (if template is provided).	`None`
`description`	`str`	Optional description if creating a new experiment.	`''`
`generate`	`bool`	Whether to generate responses and ratings immediately. Defaults to False.	`False`
`rating_mode`	`RatingMode`	The rating mode to use if generating responses. Defaults to RatingMode.DETAILED.	`DETAILED`
`n_epochs`	`int`	Number of times to run the experiment for each input. Defaults to 1.	`1`
`block`	`bool`	Whether to block until the experiment is executed when creating a new experiment, only relevant if generate=True. Defaults to False.	`False`
`timeout`	`float \| None`	The timeout for the experiment execution when creating a new experiment, only relevant if generate=True and block=True. Defaults to None.	`None`
`generation_params`	`GenerationParams \| None`	Optional sampling parameters to override LLMConfig defaults for this experiment. Defaults to None (uses LLMConfig defaults).	`None`
`rating_version`	`str \| None`	Version of core rating to use. If not provided, uses project's default_rating_version. Use "mock" in test environments to avoid actual LLM calls for ratings.	`None`
`response_column_id`	`int \| None`	Optional ID of a collection column to use as pre-existing responses. When set, no LLM generation occurs; responses are taken directly from this column.	`None`
`evaluation_mode`	`Literal['STANDARD', 'AGENTIC']`	The evaluation mode ("STANDARD" or "AGENTIC"). Defaults to "STANDARD".	`'STANDARD'`
`agent_config`	`dict \| None`	Optional agent configuration for agentic experiments.	`None`
`evaluation_agent_config`	`dict \| None`	Optional evaluation agent configuration for agentic experiments.	`None`

Returns:

Type	Description
`tuple[Experiment, bool]`	A tuple containing the experiment object (either existing or newly created) and a boolean indicating whether a new experiment was created (True) or an existing one was returned (False).
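
Because lookup is by name only, a second call with different parameters returns the existing experiment unchanged. The in-memory sketch below models that behavior under stated assumptions: `get_or_create_by_name` and the dictionary store are hypothetical stand-ins, not the library's implementation.

```python
from typing import Any, Callable, Dict, Tuple


def get_or_create_by_name(
    store: Dict[str, Any],
    name: str,
    factory: Callable[[], Any],
) -> Tuple[Any, bool]:
    """Return (item, created), matching existing items on name alone."""
    if name in store:
        # An existing item wins regardless of what the factory would build.
        return store[name], False
    store[name] = factory()
    return store[name], True


store: Dict[str, Any] = {}
first, created_first = get_or_create_by_name(
    store, "baseline", lambda: {"name": "baseline", "n_epochs": 1}
)
second, created_second = get_or_create_by_name(
    store, "baseline", lambda: {"name": "baseline", "n_epochs": 5}
)
print(created_first, created_second, second["n_epochs"])  # True False 1
```

The second call asks for `n_epochs=5` but receives the original object, which is why the returned boolean is worth checking before assuming the requested settings are in effect.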

list ¶

list(prompt_template: PromptTemplate | None = None, collection: TemplateVariablesCollection | None = None, llm_config: LLMConfig | None = None) -> list[Experiment]

Get a list of experiments sorted by creation date.

Parameters:

Name	Type	Description	Default
`prompt_template`	`PromptTemplate \| None`	The prompt template to filter by.	`None`
`collection`	`TemplateVariablesCollection \| None`	The collection to filter by.	`None`
`llm_config`	`LLMConfig \| None`	The LLM config to filter by.	`None`

Returns:

Type	Description
`list[Experiment]`	A list of experiments.

run ¶

run(experiment: Experiment, rating_mode: RatingMode = DETAILED, n_epochs: int = 1, block: bool = True, timeout: float | None = None, generation_params: GenerationParams | None = None) -> Experiment

Run an existing unrun experiment to generate responses and ratings.

This method triggers generation for an experiment that was created without running it (i.e., using client.create_experiment() without generate=True).

Parameters:

Name	Type	Description	Default
`experiment`	`Experiment`	The experiment to run.	required
`rating_mode`	`RatingMode`	The rating mode to use (FAST or DETAILED). Defaults to DETAILED.	`DETAILED`
`n_epochs`	`int`	Number of times to run for each input. Defaults to 1.	`1`
`block`	`bool`	Whether to block until generation completes. Defaults to True.	`True`
`timeout`	`float \| None`	Optional timeout in seconds. Only relevant if block=True.	`None`
`generation_params`	`GenerationParams \| None`	Optional sampling parameters to override LLMConfig defaults.	`None`

Returns:

Type	Description
`Experiment`	The experiment with generation_task_id set. If block=True, the experiment will include the generated responses and ratings.

Raises:

Type	Description
`HTTPStatusError`	If the experiment has already been run or doesn't exist.
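
The two-step workflow described above (create without generating, then run later) might look like the following sketch. It uses only the documented `create` and `run` parameters; `create_then_run` is a hypothetical helper, and `experiments_resource` is assumed to be an already-constructed `ExperimentsResource`.

```python
from typing import Any, Optional


def create_then_run(
    experiments_resource: Any,
    name: str,
    prompt_template: Any,
    collection: Any,
    n_epochs: int = 1,
    timeout: Optional[float] = None,
) -> Any:
    """Create an experiment without generating, then run it explicitly.

    Hypothetical helper built on the documented `create` and `run` methods.
    """
    # Step 1: create the experiment only; no responses or ratings yet.
    experiment = experiments_resource.create(
        name=name,
        prompt_template=prompt_template,
        collection=collection,
        generate=False,
    )
    # Step 2: trigger generation and block until it completes.
    return experiments_resource.run(
        experiment,
        n_epochs=n_epochs,
        block=True,
        timeout=timeout,
    )
```

Separating the two steps leaves room to inspect or adjust the experiment between creation and generation. Note that `run` raises `HTTPStatusError` if the experiment has already been run.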
