API Documentation

The primary method of using RSMTool is via the command-line scripts rsmtool, rsmeval, rsmpredict, rsmcompare, and rsmsummarize. However, the rsmtool API also provides certain functions that advanced users may find useful in their own Python code. We document these functions below.

rsmtool Package

rsmtool.run_experiment(config_file_or_obj_or_dict: str | Configuration | Dict[str, Any] | Path, output_dir: str, overwrite_output: bool = False, logger: Logger | None = None, wandb_run: Run | RunDisabled | None = None) None[source]

Run an rsmtool experiment using the given configuration.

Run rsmtool experiment using the given configuration file, object, or dictionary. All outputs are generated under output_dir. If overwrite_output is True, any existing output in output_dir is overwritten.

Parameters:
  • config_file_or_obj_or_dict (Union[str, Configuration, Dict[str, Any], Path]) – Path to the experiment configuration file, either as a string or as a pathlib.Path object. Users can also pass a Configuration object that is in memory or a Python dictionary with keys corresponding to fields in the configuration file. Given a configuration file, any relative paths in the configuration file will be interpreted relative to the location of the file. Given a Configuration object, relative paths will be interpreted relative to the configdir attribute, which must be set. Given a dictionary, the reference path is set to the current directory.

  • output_dir (str) – Path to the experiment output directory.

  • overwrite_output (bool) – If True, overwrite any existing output under output_dir. Defaults to False.

  • logger (Optional[logging.Logger]) – A Logger object. If None is passed, a logger will be created from __name__. Defaults to None.

  • wandb_run (Union[wandb.wandb_run.Run, wandb.sdk.lib.RunDisabled, None]) – A wandb run object that will be used to log artifacts and tables. If None is passed, a new wandb run will be initialized if wandb is enabled in the configuration. Defaults to None.

Raises:
  • FileNotFoundError – If any of the files contained in config_file_or_obj_or_dict cannot be located.

  • IOError – If output_dir already contains the output of a previous experiment and overwrite_output is False.

  • ValueError – If the current configuration specifies a non-linear model but output_dir already contains the output of a previous experiment that used a linear model with the same experiment ID.

Return type:

None
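For example, here is a minimal sketch of calling run_experiment() with a dictionary-based configuration; the experiment ID, model name, and file paths are hypothetical placeholders:

from rsmtool import run_experiment

# A minimal configuration dictionary; the experiment ID, model name,
# and file paths below are hypothetical placeholders.
config = {
    "experiment_id": "toy_experiment",
    "model": "LinearRegression",
    "train_file": "train.csv",
    "test_file": "test.csv",
}

# Relative paths in a dictionary-based configuration are resolved
# against the current working directory.
run_experiment(config, "output", overwrite_output=True)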

rsmtool.run_evaluation(config_file_or_obj_or_dict: str | Configuration | Dict[str, Any] | Path, output_dir: str, overwrite_output: bool = False, logger: Logger | None = None, wandb_run: Run | RunDisabled | None = None) None[source]

Run an rsmeval experiment using the given configuration.

All outputs are generated under output_dir. If overwrite_output is True, any existing output in output_dir is overwritten.

Parameters:
  • config_file_or_obj_or_dict (Union[str, Configuration, Dict[str, Any], Path]) – Path to the experiment configuration file, either as a string or as a pathlib.Path object. Users can also pass a Configuration object that is in memory or a Python dictionary with keys corresponding to fields in the configuration file. Given a configuration file, any relative paths in the configuration file will be interpreted relative to the location of the file. Given a Configuration object, relative paths will be interpreted relative to the configdir attribute, which must be set. Given a dictionary, the reference path is set to the current directory.

  • output_dir (str) – Path to the experiment output directory.

  • overwrite_output (bool) – If True, overwrite any existing output under output_dir. Defaults to False.

  • logger (Optional[logging.Logger]) – A Logger object. If None is passed, a logger will be created from __name__. Defaults to None.

  • wandb_run (Union[wandb.wandb_run.Run, wandb.sdk.lib.RunDisabled, None]) – A wandb run object that will be used to log artifacts and tables. If None is passed, a new wandb run will be initialized if wandb is enabled in the configuration. Defaults to None.

Raises:
  • FileNotFoundError – If any of the files contained in config_file_or_obj_or_dict cannot be located.

  • IOError – If output_dir already contains the output of a previous experiment and overwrite_output is False.

Return type:

None
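As an illustration, the following sketch evaluates an existing predictions file; the experiment ID, paths, column name, and trim values are hypothetical placeholders:

from rsmtool import run_evaluation

# Evaluate an existing file of system predictions; all values below
# are hypothetical placeholders.
config = {
    "experiment_id": "toy_evaluation",
    "predictions_file": "predictions.csv",
    "system_score_column": "score",
    "trim_min": 1,
    "trim_max": 6,
}

run_evaluation(config, "eval_output", overwrite_output=True)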

rsmtool.run_comparison(config_file_or_obj_or_dict: str | Configuration | Dict[str, Any] | Path, output_dir: str) None[source]

Run an rsmcompare experiment using the given configuration.

Use the given configuration file, object, or dictionary and generate the report in the given directory.

Parameters:
  • config_file_or_obj_or_dict (Union[str, Configuration, Dict[str, Any], Path]) – Path to the experiment configuration file, either as a string or as a pathlib.Path object. Users can also pass a Configuration object that is in memory or a Python dictionary with keys corresponding to fields in the configuration file. Given a configuration file, any relative paths in the configuration file will be interpreted relative to the location of the file. Given a Configuration object, relative paths will be interpreted relative to the configdir attribute, which must be set. Given a dictionary, the reference path is set to the current directory.

  • output_dir (str) – Path to the experiment output directory.

Raises:
  • FileNotFoundError – If either of the two input directories specified in config_file_or_obj_or_dict does not exist.

  • FileNotFoundError – If the directories do not contain rsmtool outputs at all.

Return type:

None
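A minimal sketch, assuming two previously run rsmtool experiments; the comparison ID, experiment IDs, directories, and descriptions are hypothetical placeholders:

from rsmtool import run_comparison

# Compare two previously run rsmtool experiments; all values below
# are hypothetical placeholders.
config = {
    "comparison_id": "old_vs_new",
    "experiment_id_old": "old_experiment",
    "experiment_dir_old": "old_experiment_dir",
    "description_old": "baseline model",
    "experiment_id_new": "new_experiment",
    "experiment_dir_new": "new_experiment_dir",
    "description_new": "model with additional features",
}

run_comparison(config, "comparison_output")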

rsmtool.run_summary(config_file_or_obj_or_dict: str | Configuration | Dict[str, Any] | Path, output_dir: str, overwrite_output: bool = False, logger: Logger | None = None, wandb_run: Run | RunDisabled | None = None) None[source]

Run an rsmsummarize experiment using the given configuration.

Summarize several rsmtool experiments using the given configuration file, object, or dictionary. All outputs are generated under output_dir. If overwrite_output is True, any existing output in output_dir is overwritten.

Parameters:
  • config_file_or_obj_or_dict (Union[str, Configuration, Dict[str, Any], Path]) – Path to the experiment configuration file, either as a string or as a pathlib.Path object. Users can also pass a Configuration object that is in memory or a Python dictionary with keys corresponding to fields in the configuration file. Given a configuration file, any relative paths in the configuration file will be interpreted relative to the location of the file. Given a Configuration object, relative paths will be interpreted relative to the configdir attribute, which must be set. Given a dictionary, the reference path is set to the current directory.

  • output_dir (str) – Path to the experiment output directory.

  • overwrite_output (bool) – If True, overwrite any existing output under output_dir. Defaults to False.

  • logger (Optional[logging.Logger]) – A Logger object. If None is passed, a logger will be created from __name__. Defaults to None.

  • wandb_run (Union[wandb.wandb_run.Run, wandb.sdk.lib.RunDisabled, None]) – A wandb run object that will be used to log artifacts and tables. If None is passed, a new wandb run will be initialized if wandb is enabled in the configuration. Defaults to None.

Raises:

IOError – If output_dir already contains the output of a previous experiment and overwrite_output is False.

Return type:

None
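A minimal sketch, assuming several previously run rsmtool experiments; the summary ID and directory names are hypothetical placeholders:

from rsmtool import run_summary

# Summarize several previously run rsmtool experiments; the summary ID
# and directory names below are hypothetical placeholders.
config = {
    "summary_id": "toy_summary",
    "experiment_dirs": ["experiment_one", "experiment_two"],
}

run_summary(config, "summary_output", overwrite_output=True)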

rsmtool.compute_and_save_predictions(config_file_or_obj_or_dict: str | Configuration | Dict[str, Any] | Path, output_file: str, feats_file: str | None = None, logger: Logger | None = None, wandb_run: Run | RunDisabled | None = None) None[source]

Run rsmpredict using the given configuration.

Generate predictions using the given configuration file, object, or dictionary. Predictions are saved in output_file. Optionally, pre-processed feature values are saved in feats_file, if specified.

Parameters:
  • config_file_or_obj_or_dict (Union[str, Configuration, Dict[str, Any], Path]) – Path to the experiment configuration file, either as a string or as a pathlib.Path object. Users can also pass a Configuration object that is in memory or a Python dictionary with keys corresponding to fields in the configuration file. Given a configuration file, any relative paths in the configuration file will be interpreted relative to the location of the file. Given a Configuration object, relative paths will be interpreted relative to the configdir attribute, which must be set. Given a dictionary, the reference path is set to the current directory.

  • output_file (str) – The path to the output file.

  • feats_file (Optional[str]) – Path to the output file for saving preprocessed feature values. Defaults to None.

  • logger (Optional[logging.Logger]) – A Logger object. If None is passed, a logger will be created from __name__. Defaults to None.

  • wandb_run (Union[wandb.wandb_run.Run, wandb.sdk.lib.RunDisabled, None]) – A wandb run object that will be used to log artifacts and tables. If None is passed, a new wandb run will be initialized if wandb is enabled in the configuration. Defaults to None.

Raises:
  • FileNotFoundError – If any of the files contained in config_file_or_obj_or_dict cannot be located.

  • FileNotFoundError – If experiment_dir does not exist.

  • FileNotFoundError – If experiment_dir does not contain the required output needed from an rsmtool experiment.

  • RuntimeError – If the name of the output file does not end in “.csv”, “.tsv”, or “.xlsx”.

Return type:

None
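A minimal sketch, assuming a previously trained rsmtool experiment; the experiment ID and paths are hypothetical placeholders:

from rsmtool import compute_and_save_predictions

# Generate predictions for new data using a previously trained
# rsmtool model; all values below are hypothetical placeholders.
config = {
    "experiment_id": "toy_experiment",
    "experiment_dir": "existing_experiment",
    "input_features_file": "new_responses.csv",
}

# The extension of the output file determines its format.
compute_and_save_predictions(config, "predictions.csv")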

rsmtool.fast_predict(input_features: Dict[str, float], modeler: Modeler, df_feature_info: DataFrame | None = None, trim: bool = False, trim_min: float | None = None, trim_max: float | None = None, trim_tolerance: float | None = None, scale: bool = False, train_predictions_mean: float | None = None, train_predictions_sd: float | None = None, h1_mean: float | None = None, h1_sd: float | None = None, logger: Logger | None = None) Dict[str, float][source]

Compute predictions for a single instance against the given model.

The main difference between this function and the compute_and_save_predictions() function is that the latter is meant for batch prediction: it reads all of its inputs from disk and writes its outputs to disk. This function, by contrast, is meant for real-time inference rather than batch prediction and therefore operates entirely in memory. Note that there is still some overlap between the two computation paths since we want to reuse the RSMTool API as much as possible.

This function should only be used when the goal is to generate predictions using RSMTool models in production. The user should read everything from disk in a separate thread/function and pass the inputs to this function.

Note that this function only computes regular predictions, not expected scores.

Parameters:
  • input_features (Dict[str, float]) – A dictionary containing the features for the instance for which to generate the model predictions. The keys should be names of the features on which the model was trained and the values should be the raw feature values.

  • modeler (Modeler) – The RSMTool Modeler object from which the predictions are to be generated. This object should be created from the already existing .model file in the “output” directory of the previously run RSMTool experiment.

  • df_feature_info (Optional[pandas.DataFrame]) –

    A DataFrame containing the information regarding the model features. The index of the dataframe should be the names of the features and the columns should be:

    • “sign”: 1 or -1. Indicates whether the feature value needs to be multiplied by -1.

    • “transform”: transformation that needs to be applied to this feature.

    • “train_mean”, “train_sd”: mean and standard deviation for outlier truncation.

    • “train_transformed_mean”, “train_transformed_sd”: mean and standard deviation for computing z-scores.

    This dataframe should be read from the “feature.csv” file under the “output” directory of the previously run RSMTool experiment. If None, this function will try to extract this information from modeler.

    Defaults to None.

  • trim (bool) – Whether to trim the predictions. If True, trim_min and trim_max must be specified or be available as attributes of the modeler. Defaults to False.

  • trim_min (Optional[float]) – The lowest possible score that the machine should predict. If None, this function will try to extract this value from modeler; if no such attribute exists and trim=True, a ValueError will be raised. Defaults to None.

  • trim_max (Optional[float]) – The highest possible score that the machine should predict. If None, this function will try to extract this value from modeler; if no such attribute exists and trim=True, a ValueError will be raised. Defaults to None.

  • trim_tolerance (Optional[float]) – The single numeric value that will be used to pad the trimming range specified in trim_min and trim_max. If None, this function will try to extract this value from modeler. If no such attribute can be found, the value will default to 0.4998. Defaults to None.

  • scale (bool) – Whether to scale predictions. If True, all of train_predictions_mean, train_predictions_sd, h1_mean, and h1_sd must be specified or be available as attributes of modeler. Defaults to False.

  • train_predictions_mean (Optional[float]) – The mean of the predictions on the training set used to re-scale the predictions. May be read from the “postprocessing_params.csv” file under the “output” directory of the RSMTool experiment used to train the model. If None, this function will try to extract this value from modeler; if no such attribute exists and scale=True, a ValueError will be raised. Defaults to None.

  • train_predictions_sd (Optional[float]) – The standard deviation of the predictions on the training set used to re-scale the predictions. May be read from the “postprocessing_params.csv” file under the “output” directory of the RSMTool experiment used to train the model. If None, this function will try to extract this value from modeler; if no such attribute exists and scale=True, a ValueError will be raised. Defaults to None.

  • h1_mean (Optional[float]) – The mean of the human scores in the training set, also used to re-scale the predictions. May be read from the “postprocessing_params.csv” file under the “output” directory of the RSMTool experiment used to train the model. If None, this function will try to extract this value from modeler; if no such attribute exists and scale=True, a ValueError will be raised. Defaults to None.

  • h1_sd (Optional[float]) – The standard deviation of the human scores in the training set, also used to re-scale the predictions. May be read from the “postprocessing_params.csv” file under the “output” directory of the RSMTool experiment used to train the model. If None, this function will try to extract this value from modeler; if no such attribute exists and scale=True, a ValueError will be raised. Defaults to None.

  • logger (Optional[logging.Logger]) – A Logger object. If None is passed, a logger will be created from __name__. Defaults to None.

Returns:

A dictionary containing the raw, scaled, trimmed, and rounded predictions for the input features. It always contains the “raw” key and may contain the following additional keys depending on the availability of the various optional arguments: “raw_trim”, “raw_trim_round”, “scale”, “scale_trim”, and “scale_trim_round”.

Return type:

Dict[str, float]

Raises:
  • ValueError – If input_features contains any non-numeric features.

  • ValueError – If trimming/scaling is turned on but the related parameters are either not specified or cannot be found as (non-None) attributes of modeler.

  • ValueError – If trimming/scaling-related parameters are specified but trimming/scaling is turned off.

  • ValueError – If the feature information is neither specified nor available as a (non-None) attribute of modeler.
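A minimal sketch of real-time scoring with fast_predict(), assuming the Modeler.load_from_file helper to deserialize the saved model; the model path and feature names are hypothetical placeholders:

from rsmtool import fast_predict
from rsmtool.modeler import Modeler

# Load the model from the "output" directory of a previously run
# rsmtool experiment; the path below is a hypothetical placeholder.
modeler = Modeler.load_from_file("existing_experiment/output/toy_experiment.model")

# Raw feature values for a single new instance; the feature names must
# match those the model was trained on (hypothetical names here).
input_features = {"FEATURE1": 6.0, "FEATURE2": 4.5}

# If the trimming parameters are stored as attributes of the modeler,
# they do not need to be passed explicitly.
predictions = fast_predict(input_features, modeler, trim=True)
print(predictions["raw"])  # always present; other keys depend on the inputs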

rsmtool.generate_explanation(config_file_or_obj_or_dict: str | Configuration | Dict[str, Any] | Path, output_dir: str, overwrite_output: bool = False, logger: Logger | None = None, wandb_run: Run | RunDisabled | None = None)[source]

Generate a shap.Explanation object.

This function does all the heavy lifting. It loads the model, creates an explainer, and generates an explanation object. It then calls generate_report() in order to generate a SHAP report.

Parameters:
  • config_file_or_obj_or_dict (Union[str, Configuration, Dict[str, Any], Path]) – Path to the experiment configuration file, either as a string or as a pathlib.Path object. Users can also pass a Configuration object that is in memory or a Python dictionary with keys corresponding to fields in the configuration file. Given a configuration file, any relative paths in the configuration file will be interpreted relative to the location of the file. Given a Configuration object, relative paths will be interpreted relative to the configdir attribute, which must be set. Given a dictionary, the reference path is set to the current directory.

  • output_dir (str) – Path to the experiment output directory.

  • overwrite_output (bool) – If True, overwrite any existing output under output_dir. Defaults to False.

  • logger (Optional[logging.Logger]) – A Logger object. If None is passed, a logger will be created from __name__. Defaults to None.

  • wandb_run (Union[wandb.wandb_run.Run, wandb.sdk.lib.RunDisabled, None]) – A wandb run object that will be used to log artifacts and tables. If None is passed, a new wandb run will be initialized if wandb is enabled in the configuration. Defaults to None.

Raises:
  • FileNotFoundError – If any file contained in config_file_or_obj_or_dict cannot be located.

  • ValueError – If both sample_range and sample_size are defined in the configuration file.

From analyzer Module

Classes for analyzing RSMTool predictions, metrics, etc.

author: Jeremy Biggs (jbiggs@ets.org)

author: Anastassia Loukina (aloukina@ets.org)

author: Nitin Madnani (nmadnani@ets.org)

organization: ETS

class rsmtool.analyzer.Analyzer(logger: Logger | None = None)[source]

Bases: object

Class to perform analysis on all metrics, predictions, etc.

Initialize the Analyzer object.

static analyze_excluded_responses(df: DataFrame, features: List[str], header: str, exclude_zero_scores: bool = True, exclude_listwise: bool = False) DataFrame[source]

Compute statistics for responses excluded from analyses.

This method computes various statistics for the responses that were excluded from analyses, either in the training set or in the test set.

Parameters:
  • df (pandas.DataFrame) – Data frame containing the excluded responses.

  • features (List[str]) – List of column names containing the features to which we want to restrict the analyses.

  • header (str) – String to be used as the table header for the output data frame.

  • exclude_zero_scores (bool) – Whether or not the zero-score responses should be counted in the exclusion statistics. Defaults to True.

  • exclude_listwise (bool) – Whether or not the candidates were excluded based on a minimum number of responses. Defaults to False.

Returns:

df_full_crosstab – Two-dimensional data frame containing the exclusion statistics.

Return type:

pandas.DataFrame

static analyze_used_predictions(df_test: DataFrame, subgroups: List[str], candidate_column: str) DataFrame[source]

Compute various statistics for predictions used in analyses.

Parameters:
  • df_test (pandas.DataFrame) – Data frame containing the test set predictions.

  • subgroups (List[str]) – List of column names that contain grouping information.

  • candidate_column (str) – Column name that contains candidate identification information.

Returns:

df_analysis – Data frame containing information about the used predictions.

Return type:

pandas.DataFrame

static analyze_used_responses(df_train: DataFrame, df_test: DataFrame, subgroups: List[str], candidate_column: str) DataFrame[source]

Compute statistics for responses used in analyses.

This method computes various statistics on the responses that were used in analyses, either in the training set or in the test set.

Parameters:
  • df_train (pandas.DataFrame) – Data frame containing the response information for the training set.

  • df_test (pandas.DataFrame) – Data frame containing the response information for the test set.

  • subgroups (List[str]) – List of column names that contain grouping information.

  • candidate_column (str) – Column name that contains candidate identification information.

Returns:

df_analysis – Data frame containing information about the used responses.

Return type:

pandas.DataFrame

static check_frame_names(data_container: DataContainer, dataframe_names: List[str]) None[source]

Check that all specified dataframes are available.

This method checks to make sure all specified DataFrames are in the given data container object.

Parameters:
  • data_container (DataContainer) – A DataContainer object.

  • dataframe_names (List[str]) – The names of the DataFrames expected in the DataContainer object.

Raises:

KeyError – If a given dataframe_name is not in the DataContainer object.

Return type:

None

static check_param_names(configuration_obj: Configuration, parameter_names: List[str]) None[source]

Check that all specified parameters are available.

This method checks to make sure all specified parameters are in the given configuration object.

Parameters:
  • configuration_obj (Configuration) – A Configuration object.

  • parameter_names (List[str]) – The names of the parameters (keys) expected in the Configuration object.

Raises:

KeyError – If a given parameter_name is not in the Configuration object.

Return type:

None

static compute_basic_descriptives(df: DataFrame, selected_features: List[str]) DataFrame[source]

Compute basic descriptive statistics for columns in the given data frame.

Parameters:
  • df (pandas.DataFrame) – Input data frame containing the feature values.

  • selected_features (List[str]) – List of feature names for which to compute the descriptives.

Returns:

df_desc – DataFrame containing the descriptives for each of the features.

Return type:

pandas.DataFrame
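For example, a small sketch with toy feature values:

import pandas as pd

from rsmtool.analyzer import Analyzer

# Toy feature values for illustration.
df = pd.DataFrame({"FEATURE1": [1.0, 2.0, 3.0, 4.0],
                   "FEATURE2": [0.5, 0.7, 0.9, 1.1]})

df_desc = Analyzer.compute_basic_descriptives(df, ["FEATURE1", "FEATURE2"])
print(df_desc)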

compute_correlations_by_group(df: DataFrame, selected_features: List[str], target_variable: str, grouping_variable: str, include_length: bool = False) Tuple[DataFrame, DataFrame, DataFrame][source]

Compute marginal and partial correlations against target variable.

This method computes various marginal and partial correlations of the given columns in the given data frame against the target variable for all data and for each level of the grouping variable.

Parameters:
  • df (pandas.DataFrame) – Input data frame.

  • selected_features (List[str]) – List of feature names for which to compute the correlations.

  • target_variable (str) – Feature name indicating the target variable, i.e., the dependent variable.

  • grouping_variable (str) – Feature name containing the grouping information.

  • include_length (bool) – Whether or not to include the length when computing the partial correlations. Defaults to False.

Return type:

Tuple[DataFrame, DataFrame, DataFrame]

Returns:

  • df_target_cors (pandas.DataFrame) – Data frame containing Pearson’s correlation coefficients for marginal correlations between features and target_variable.

  • df_target_partcors (pandas.DataFrame) – Data frame containing Pearson’s correlation coefficients for partial correlations between each feature and target_variable after controlling for all other features. If include_length is set to True, the “length” column will not be included in the partial correlation computation.

  • df_target_partcors_no_length (pandas.DataFrame) – If include_length is set to True: Data frame containing Pearson’s correlation coefficients for partial correlations between each feature and target_variable after controlling for “length”. Otherwise, it will be an empty data frame.

compute_degradation_and_disattenuated_correlations(df: DataFrame, use_all_responses: bool = True) Tuple[DataFrame, DataFrame][source]

Compute the degradation in performance when using the system score.

This method computes the degradation in performance when using the system to predict the score instead of a second human, as well as the disattenuated correlations between human and system scores. The latter are computed as the Pearson’s correlation between the human score and the system score divided by the square root of the correlation between the two human raters.

For this, we can compute the system performance either only on the double scored data or on the full dataset. Both options have their pros and cons. The default is to use the full dataset. This function also assumes that the sc2 column exists in the given data frame, in addition to sc1 and the various types of predictions.

Parameters:
  • df (pandas.DataFrame) – Input data frame.

  • use_all_responses (bool) – Use the full data set instead of only using the double-scored subset. Defaults to True.

Return type:

Tuple[DataFrame, DataFrame]

Returns:

  • df_degradation (pandas.DataFrame) – Data frame containing the degradation statistics.

  • df_correlations (pandas.DataFrame) – Data frame containing the human-system correlations, human-human correlations and disattenuated correlation.

static compute_disattenuated_correlations(human_system_corr: Series, human_human_corr: Series) DataFrame[source]

Compute disattenuated correlations between human and system scores.

These are computed as the Pearson’s correlation between the human score and the system score divided by the square root of the correlation between the two human raters.

Parameters:
  • human_system_corr (pandas.Series) – Series containing Pearson’s correlation coefficients for human-system correlations.

  • human_human_corr (pandas.Series) – Series containing Pearson’s correlation coefficients for human-human correlations. This can contain a single value or have an index matching that of the human-system correlations.

Returns:

df_correlations – Data frame containing the human-system correlations, human-human correlations, and disattenuated correlations.

Return type:

pandas.DataFrame
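For example, a small sketch with hypothetical correlation values for two subgroups:

import pandas as pd

from rsmtool.analyzer import Analyzer

# Hypothetical human-system and human-human correlations for two groups.
human_system = pd.Series([0.70, 0.65], index=["group_a", "group_b"])
human_human = pd.Series([0.80, 0.75], index=["group_a", "group_b"])

df_correlations = Analyzer.compute_disattenuated_correlations(human_system,
                                                              human_human)
print(df_correlations)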

compute_metrics(df: DataFrame, compute_shortened: bool = False, use_scaled_predictions: bool = False, include_second_score: bool = False, population_sd_dict: Dict[str, float | None] | None = None, population_mn_dict: Dict[str, float | None] | None = None, smd_method: str = 'unpooled', use_diff_std_means: bool = False) Tuple[DataFrame, DataFrame, DataFrame][source]

Compute association metrics for scores in the given data frame.

This method computes association metrics for all score types. If include_second_score is True, it is assumed that a column called sc2 containing a second human score is available, and it will be used to compute the human-human evaluation statistics and the performance degradation statistics.

If compute_shortened is True, then this function also computes a shortened version of the full human-system metrics data frame. See filter_metrics() for the description of the default columns included in the shortened data frame.

Parameters:
  • df (pandas.DataFrame) – Input data frame.

  • compute_shortened (bool) – Also compute a shortened version of the full metrics data frame. Defaults to False.

  • use_scaled_predictions (bool) – Use evaluations based on scaled predictions in the shortened version of the metrics data frame. Defaults to False.

  • include_second_score (bool) – Whether a second human score is available. Defaults to False.

  • population_sd_dict (Optional[Dict[str, Optional[float]]]) – Dictionary containing the population standard deviation for each column containing human or system scores. This is used to compute SMD for subgroups. If None, a dummy dictionary is created that sets the standard deviation for all columns to None. Defaults to None.

  • population_mn_dict (Optional[Dict[str, Optional[float]]]) – Dictionary containing the population mean for each column containing human or system scores. This is used to compute SMD for subgroups. If None, a dummy dictionary is created that sets the mean for all columns to None. Defaults to None.

  • smd_method (str) –

    The SMD method to use, only used if use_diff_std_means is False. All methods have the same numerator mean(y_pred) - mean(y_true_observed) and the following denominators:

    • “williamson”: pooled population standard deviation of human and system scores computed based on values in population_sd_dict.

    • “johnson”: population standard deviation of human scores computed based on values in population_sd_dict.

    • “pooled”: pooled standard deviation of y_true_observed and y_pred for this group.

    • “unpooled”: standard deviation of y_true_observed for this group.

    Defaults to “unpooled”.

  • use_diff_std_means (bool) – Whether to use the difference of standardized means, rather than the standardized mean difference. This is most useful with subgroup analysis. Defaults to False.

Return type:

Tuple[DataFrame, DataFrame, DataFrame]

Returns:

  • df_human_system_eval (pandas.DataFrame) – Data frame containing the full set of evaluation metrics.

  • df_human_system_eval_filtered (pandas.DataFrame) – A shortened version of the first data frame; empty if compute_shortened is False.

  • df_human_human_eval (pandas.DataFrame) – Data frame containing the human-human statistics; empty if include_second_score is False.

compute_metrics_by_group(df_test: DataFrame, grouping_variable: str, use_scaled_predictions: bool = False, include_second_score: bool = False) Tuple[DataFrame, DataFrame][source]

Compute a subset of evaluation metrics by subgroups.

This method computes a subset of evaluation metrics for the scores in the given data frame by the group specified in grouping_variable. See filter_metrics() above for a description of the subset that is selected.

Parameters:
  • df_test (pandas.DataFrame) – Input data frame.

  • grouping_variable (str) – Feature name indicating the column that contains grouping information.

  • use_scaled_predictions (bool) – Include scaled predictions when computing the evaluation metrics. Defaults to False.

  • include_second_score (bool) – Include human-human association statistics. Defaults to False.

Return type:

Tuple[DataFrame, DataFrame]

Returns:

  • df_human_system_by_group (pandas.DataFrame) – Data frame containing the human-system association statistics.

  • df_human_human_by_group (pandas.DataFrame) – Data frame that either contains the human-human statistics or is an empty data frame, depending on whether include_second_score is True.

static compute_outliers(df: DataFrame, selected_features: List[str]) DataFrame[source]

Compute number and percentage of outliers for given columns.

This method computes the number and percentage of outliers that lie outside the range mean +/- 4 SD for each of the given columns in the given data frame.

Parameters:
  • df (pandas.DataFrame) – Input data frame containing the feature values.

  • selected_features (List[str]) – List of feature names for which to compute outlier information.

Returns:

df_output – Data frame containing outlier information for each of the features.

Return type:

pandas.DataFrame

static compute_pca(df: DataFrame, selected_features: List[str]) Tuple[DataFrame, DataFrame][source]

Compute PCA decomposition of the given features.

This method computes the PCA decomposition of features in the data frame, restricted to the given columns. The number of components is set to be min(n_features, n_samples).

Parameters:
  • df (pandas.DataFrame) – Input data frame containing feature values.

  • selected_features (List[str]) – List of feature names to be used in the PCA decomposition.

Return type:

Tuple[DataFrame, DataFrame]

Returns:

  • df_components (pandas.DataFrame) – Data frame containing the PCA components.

  • df_variance (pandas.DataFrame) – Data frame containing the variance information.

static compute_percentiles(df: DataFrame, selected_features: List[str], percentiles: List[int] | None = None) DataFrame[source]

Compute percentiles and outliers for columns in the given data frame.

Parameters:
  • df (pandas.DataFrame) – Input data frame containing the feature values.

  • selected_features (List[str]) – List of feature names for which to compute the percentile descriptives.

  • percentiles (Optional[List[int]]) – The percentiles to calculate. If None, use the percentiles {1, 5, 25, 50, 75, 95, 99}. Defaults to None.

Returns:

df_output – Data frame containing the percentile information for each of the features.

Return type:

pandas.DataFrame

static correlation_helper(df: DataFrame, target_variable: str, grouping_variable: str, include_length: bool = False) Tuple[DataFrame, DataFrame, DataFrame][source]

Compute marginal and partial correlations for all columns.

This helper method computes marginal and partial correlations of all the columns in the given data frame against the target variable separately for each level of the grouping variable. If include_length is True, it additionally computes partial correlations of each column in the data frame against the target variable after controlling for the “length” column.

Parameters:
  • df (pandas.DataFrame) – Input data frame containing numeric feature values, the numeric target variable and the grouping variable.

  • target_variable (str) – The name of the column used as a reference for computing correlations.

  • grouping_variable (str) – The name of the column defining groups in the data.

  • include_length (bool) – If True, compute additional partial correlations of each column in the data frame against the target variable, controlling only for the “length” column. Defaults to False.

Return type:

Tuple[DataFrame, DataFrame, DataFrame]

Returns:

  • df_target_cors (pandas.DataFrame) – Data frame containing Pearson’s correlation coefficients for marginal correlations between features and target_variable.

  • df_target_partcors (pandas.DataFrame) – Data frame containing Pearson’s correlation coefficients for partial correlations between each feature and target_variable after controlling for all other features. If include_length is set to True, the “length” column will not be included in the partial correlation computation.

  • df_target_partcors_no_length (pandas.DataFrame) – If include_length is set to True: Data frame containing Pearson’s correlation coefficients for partial correlations between each feature and target_variable after controlling for “length”. Otherwise, it will be an empty data frame.

filter_metrics(df_metrics: DataFrame, use_scaled_predictions: bool = False, chosen_metric_dict: Dict[str, List[str]] | None = None) DataFrame[source]

Filter data frame to retain only the given metrics.

This method filters the data frame df_metrics – containing all of the metric values by all score types (raw, raw_trim etc.) – to retain only the metrics as defined in the given dictionary chosen_metric_dict. This dictionary maps score types (“raw”, “scale”, “raw_trim” etc.) to metric names. The available metric names are:

  • “corr”

  • “kappa”

  • “wtkappa”

  • “exact_agr”

  • “adj_agr”

  • “SMD” or “DSM”, depending on what is in df_metrics.

  • “RMSE”

  • “R2”

  • “sys_min”

  • “sys_max”

  • “sys_mean”

  • “sys_sd”

  • “h_min”

  • “h_max”

  • “h_mean”

  • “h_sd”

  • “N”

Parameters:
  • df_metrics (pandas.DataFrame) – The DataFrame to filter.

  • use_scaled_predictions (bool) – Whether to use scaled predictions. Defaults to False.

  • chosen_metric_dict (Optional[Dict[str, List[str]]]) – The dictionary mapping each score type to the metrics that should be computed for it. Defaults to None.

Returns:

df_filtered_metrics – The filtered DataFrame.

Return type:

pandas.DataFrame

Note

The last five metrics will be the same for all score types. If chosen_metric_dict is not specified, the following default dictionary with the recommended metrics is used:

{"X_trim": ["N", "h_mean", "h_sd", "sys_mean", "sys_sd", "wtkappa",
              "corr", "RMSE", "R2", "SMD"],
 "X_trim_round": ["sys_mean", "sys_sd", "kappa",
                    "exact_agr", "adj_agr", "SMD"]}

where X = “raw” or “scale” depending on whether use_scaled_predictions is False or True, respectively.

static metrics_helper(human_scores: Series, system_scores: Series, population_human_score_sd: float | None = None, population_system_score_sd: float | None = None, population_human_score_mn: float | None = None, population_system_score_mn: float | None = None, smd_method: str = 'unpooled', use_diff_std_means: bool = False) Series[source]

Compute basic association metrics between system and human scores.

Parameters:
  • human_scores (pandas.Series) – Series containing numeric human (reference) scores.

  • system_scores (pandas.Series) – Series containing numeric scores predicted by the model.

  • population_human_score_sd (Optional[float]) – Reference standard deviation for human scores. This must be specified when the function is used to compute association metrics for a subset of responses, for example, responses from a particular demographic subgroup. If smd_method is set to “williamson” or “johnson”, this should be the standard deviation for the whole population (in most cases, the standard deviation for the whole test set). If use_diff_std_means is True, this must be the standard deviation for the whole population and population_human_score_mn must also be specified. Otherwise, it is ignored. Defaults to None.

  • population_system_score_sd (Optional[float]) – Reference standard deviation for system scores. This must be specified when the function is used to compute association metrics for a subset of responses, for example, responses from a particular demographic subgroup. If smd_method is set to “williamson”, this should be the standard deviation for the whole population (in most cases, the standard deviation for the whole test set). If use_diff_std_means is True, this must be the standard deviation for the whole population and population_system_score_mn must also be specified. Otherwise, it is ignored. Defaults to None.

  • population_human_score_mn (Optional[float]) – Reference mean for human scores. This must be specified when the function is used to compute association metrics for a subset of responses, for example, responses from a particular demographic subgroup. If use_diff_std_means is True, this must be the mean for the whole population (in most cases, the full test set) and population_human_score_sd must also be specified. Otherwise, it is ignored. Defaults to None.

  • population_system_score_mn (Optional[float]) – Reference mean for system scores. This must be specified when the function is used to compute association metrics for a subset of responses, for example, responses from a particular demographic subgroup. If use_diff_std_means is True, this must be the mean for the whole population (in most cases, the full test set) and population_system_score_sd must also be specified. Otherwise, it is ignored. Defaults to None.

  • smd_method (str) –

    The SMD method to use, only used if use_diff_std_means is False. All methods have the same numerator mean(y_pred) - mean(y_true_observed) and the following denominators:

    • “williamson”: pooled population standard deviation of y_true_observed and y_pred computed using population_human_score_sd and population_system_score_sd.

    • “johnson”: population_human_score_sd.

    • “pooled”: pooled standard deviation of y_true_observed and y_pred for this group.

    • “unpooled”: standard deviation of y_true_observed for this group.

    Defaults to “unpooled”.

  • use_diff_std_means (bool) – Whether to use the difference of standardized means, rather than the standardized mean difference. This is most useful with subgroup analysis. Defaults to False.

Returns:

metrics – Series containing different evaluation metrics comparing human and system scores. The following metrics are included:

  • kappa: unweighted Cohen’s kappa

  • wtkappa: quadratic weighted kappa

  • exact_agr: exact agreement

  • adj_agr: adjacent agreement with tolerance set to 1

  • One of the following:

    • SMD: standardized mean difference, if use_diff_std_means is False.

    • DSM: difference of standardized means, if use_diff_std_means is True.

  • corr: Pearson’s r

  • R2: r squared

  • RMSE: root mean square error

  • sys_min: min system score

  • sys_max: max system score

  • sys_mean: mean system score (ddof=1)

  • sys_sd: standard deviation of system scores (ddof=1)

  • h_min: min human score

  • h_max: max human score

  • h_mean: mean human score (ddof=1)

  • h_sd: standard deviation of human scores (ddof=1)

  • N: total number of responses

Return type:

pandas.Series
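For example, a small sketch with toy scores:

import pandas as pd

from rsmtool.analyzer import Analyzer

# Toy human and system scores for illustration.
human_scores = pd.Series([1, 2, 3, 4, 3, 2])
system_scores = pd.Series([1.2, 2.1, 2.8, 3.9, 3.2, 2.4])

metrics = Analyzer.metrics_helper(human_scores, system_scores)
print(metrics[["corr", "RMSE", "SMD"]])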

run_data_composition_analyses_for_rsmeval(data_container: DataContainer, configuration: Configuration) Tuple[Configuration, DataContainer][source]

Run all data composition analyses for RSMEval.

Parameters:
  • data_container (DataContainer) – The DataContainer object which must include the following DataFrames: {“test_metadata”, “test_excluded”}.

  • configuration (Configuration) – The Configuration object which must include the following parameters (keys): {“subgroups”, “candidate_column”, “exclude_zero_scores”, “exclude_listwise”}.

Return type:

Tuple[Configuration, DataContainer]

Returns:

  • configuration (Configuration) – The input Configuration object that is passed through unmodified.

  • data_container (DataContainer) – A new DataContainer object with the following DataFrames:

    • test_excluded_composition

    • data_composition

    • data_composition_by_*

run_data_composition_analyses_for_rsmtool(data_container: DataContainer, configuration: Configuration) Tuple[Configuration, DataContainer][source]

Run all data composition analyses for RSMTool.

Parameters:
  • data_container (DataContainer) – The DataContainer object which must include the following DataFrames: {“test_metadata”, “train_metadata”, “train_excluded”, “test_excluded”, “train_features”}.

  • configuration (Configuration) – The Configuration object which must include the following parameters (keys): {“subgroups”, “candidate_column”, “exclude_zero_scores”, “exclude_listwise”}.

Return type:

Tuple[Configuration, DataContainer]

Returns:

  • configuration (Configuration) – The input Configuration object that is passed through unmodified.

  • data_container (DataContainer) – A new DataContainer object with the following DataFrames:

    • test_excluded_composition

    • train_excluded_composition

    • data_composition

    • data_composition_by_*

run_prediction_analyses(data_container: DataContainer, configuration: Configuration, wandb_run: Run | RunDisabled | None = None) Tuple[Configuration, DataContainer][source]

Run all analyses on the system scores (predictions).

Parameters:
  • data_container (DataContainer) – The DataContainer object which must include the following DataFrames: {“train_features”, “train_metadata”, “train_preprocessed_features”, “train_length”}.

  • configuration (Configuration) – The Configuration object which must include the following parameters (keys): {“subgroups”, “second_human_score_column”, “use_scaled_predictions”}.

  • wandb_run (Union[wandb.wandb_run.Run, wandb.sdk.lib.RunDisabled, None]) – The wandb run object if wandb is enabled, None otherwise. If enabled, all the output data frames will be logged to this run as tables. Defaults to None.

Return type:

Tuple[Configuration, DataContainer]

Returns:

  • configuration (Configuration) – The input Configuration object that is passed through unmodified.

  • data_container (DataContainer) – A new DataContainer object with the following DataFrames:

    • eval

    • eval_short

    • consistency

    • degradation

    • disattenuated_correlations

    • confMatrix

    • confMatrix_h1h2

    • score_dist

    • eval_by_*

    • consistency_by_*

    • disattenuated_correlations_by_*

    • true_score_eval

run_training_analyses(data_container: DataContainer, configuration: Configuration) Tuple[Configuration, DataContainer][source]

Run all analyses on the training data.

Parameters:
  • data_container (DataContainer) – The DataContainer object which must include the following DataFrames: {“train_features”, “train_metadata”, “train_preprocessed_features”, “train_length”}.

  • configuration (Configuration) – The Configuration object which must include the following parameters (keys): {“length_column”, “subgroups”, “selected_features”}.

Return type:

Tuple[Configuration, DataContainer]

Returns:

  • configuration (Configuration) – The input Configuration object that is passed through unmodified.

  • data_container (DataContainer) – A new DataContainer object with the following DataFrames:

    • feature_descriptives

    • feature_descriptivesExtra

    • feature_outliers

    • cors_orig

    • cors_processed

    • margcor_score_all_data

    • pcor_score_all_data

    • pcor_score_no_length_all_data

    • margcor_length_all_data

    • pcor_length_all_data

    • pca

    • pcavar

    • margcor_length_by_*

    • pcor_length_by_*

    • margcor_score_by_*

    • pcor_score_by_*

    • pcor_score_no_length_by_*

From comparer Module

Classes for comparing outputs of two RSMTool experiments.

author: Jeremy Biggs (jbiggs@ets.org)

author: Anastassia Loukina (aloukina@ets.org)

author: Nitin Madnani (nmadnani@ets.org)

organization: ETS

class rsmtool.comparer.Comparer[source]

Bases: object

Class to perform comparisons between two RSMTool experiments.

static compute_correlations_between_versions(df_old: DataFrame, df_new: DataFrame, human_score: str = 'sc1', id_column: str = 'spkitemid') DataFrame[source]

Compute correlations between old and new feature values.

This method computes correlations between old and new feature values in the two given frames as well as the correlations between each feature value and the human score.

Parameters:
  • df_old (pandas.DataFrame) – Data frame with feature values for the ‘old’ model.

  • df_new (pandas.DataFrame) – Data frame with feature values for the ‘new’ model.

  • human_score (str) – Name of the column containing the human score. Must be the same for both data sets. Defaults to "sc1".

  • id_column (str) – Name of the column containing the ID for each response. Must be the same for both data sets. Defaults to "spkitemid".

Returns:

df_correlations – Data frame with a row for each feature and the following columns:

  • “N”: total number of responses

  • “human_old”: correlation with human score in the old frame

  • “human_new”: correlation with human score in the new frame

  • “old_new”: correlation between old and new frames

Return type:

pandas.DataFrame

Raises:
  • ValueError – If there are no shared features between the two sets.

  • ValueError – If there are no shared responses between the two sets.
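For example, a small sketch with toy feature values from an “old” and a “new” experiment:

import pandas as pd

from rsmtool.comparer import Comparer

# Toy feature values from an "old" and a "new" experiment; the column
# names follow the defaults ("spkitemid" and "sc1").
df_old = pd.DataFrame({"spkitemid": [1, 2, 3, 4],
                       "sc1": [2, 3, 4, 3],
                       "FEATURE1": [0.1, 0.4, 0.5, 0.3]})
df_new = pd.DataFrame({"spkitemid": [1, 2, 3, 4],
                       "sc1": [2, 3, 4, 3],
                       "FEATURE1": [0.2, 0.3, 0.6, 0.4]})

df_correlations = Comparer.compute_correlations_between_versions(df_old, df_new)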

load_rsmtool_output(filedir: str, figdir: str, experiment_id: str, prefix: str, groups_eval: List[str]) Tuple[Dict[str, DataFrame], Dict[str, str], str][source]

Load all of the outputs of an rsmtool experiment.

For each type of output, we first check whether the file exists to allow comparing experiments with different sets of outputs.

Parameters:
  • filedir (str) – Path to the directory containing output files.

  • figdir (str) – Path to the directory containing output figures.

  • experiment_id (str) – Original experiment_id used to generate the output files.

  • prefix (str) – Must be set to "scale" or "raw". Indicates whether the score is scaled or not.

  • groups_eval (List[str]) – List of subgroup names used for subgroup evaluation.

Return type:

Tuple[Dict[str, DataFrame], Dict[str, str], str]

Returns:

  • files (Dict[str, pd.DataFrame]) – A dictionary mapping data frame names to the actual pandas data frames. If a particular type of output did not exist for the experiment, its value will be an empty data frame.

  • figs (Dict[str, str]) – A dictionary mapping figure names to the paths of the files containing the figures.

  • file_format (str) – The file format used for the output files, e.g., csv, tsv, or xlsx.

static make_summary_stat_df(df: DataFrame) DataFrame[source]

Compute summary statistics for the data in the given frame.

Parameters:

df (pandas.DataFrame) – Data frame containing numeric data.

Returns:

res – Data frame containing summary statistics for data in the input frame.

Return type:

pandas.DataFrame

static process_confusion_matrix(conf_matrix: DataFrame) DataFrame[source]

Add “human” and “machine” to column names in the confusion matrix.

Parameters:

conf_matrix (pandas.DataFrame) – data frame containing the confusion matrix.

Returns:

conf_matrix_renamed – Data frame containing the confusion matrix with the columns renamed.

Return type:

pandas.DataFrame

From configuration_parser Module

Configuration parser.

Classes related to parsing configuration files and creating configuration objects.

author: Jeremy Biggs (jbiggs@ets.org)

author: Anastassia Loukina (aloukina@ets.org)

author: Nitin Madnani (nmadnani@ets.org)

organization: ETS

class rsmtool.configuration_parser.Configuration(configdict: Dict[str, Any], *, configdir: str | Path | None = None, context: str = 'rsmtool', logger: Logger | None = None)[source]

Bases: object

Configuration class.

Encapsulates all of the configuration parameters and methods to access these parameters.

Create an object of the Configuration class.

This method can be used to directly instantiate a Configuration object.

Parameters:
  • configdict (Dict[str, Any]) – A dictionary of configuration parameters. The dictionary must be a valid configuration dictionary with default values filled as necessary.

  • configdir (Optional[Union[str, Path]]) – The reference path used to resolve any relative paths in the configuration object. When None, it will be set during initialization to the current working directory. Defaults to None.

  • context (str) – The context of the tool. One of {“rsmtool”, “rsmeval”, “rsmcompare”, “rsmpredict”, “rsmsummarize”}. Defaults to “rsmtool”.

  • logger (Optional[logging.Logger]) – A Logger object. If None is passed, a logger will be created from __name__. Defaults to None.
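For example, a minimal sketch of instantiating a Configuration directly from a dictionary; the field values are hypothetical placeholders:

from rsmtool.configuration_parser import Configuration

# Instantiate a Configuration directly from a dictionary; the field
# values below are hypothetical placeholders.
config = Configuration(
    {
        "experiment_id": "toy_experiment",
        "model": "LinearRegression",
        "train_file": "train.csv",
        "test_file": "test.csv",
    },
    configdir="/path/to/experiment",
)

print(config.get("model"))              # "LinearRegression"
print(config.get("missing_key", "NA"))  # default returned for absent keys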

check_exclude_listwise() bool[source]

Check for candidate exclusion.

Check if we are excluding candidates based on the number of responses, and add this to the configuration file.

Returns:

exclude_listwise – Whether to exclude list-wise.

Return type:

bool

check_flag_column(flag_column: str = 'flag_column', partition: str = 'unknown') Dict[str, List[str]][source]

Make sure the column name in flag_column is correctly specified.

Get flag columns and values for filtering, if any, and convert single values to lists. Raises an exception if the column name in flag_column is not correctly specified in the configuration file.

Parameters:
  • flag_column (str) – The flag column name to check. Currently used names are “flag_column” or “flag_column_test”. Defaults to “flag_column”.

  • partition (str) – The data partition which is filtered based on the flag column name. One of {“train”, “test”, “both”, “unknown”}. Defaults to “unknown”.

Returns:

new_filtering_dict – Properly formatted dictionary for the column name in flag_column.

Return type:

Dict[str, List[str]]

Raises:
  • ValueError – If the specified value of the column name in flag_column is not a dictionary.

  • ValueError – If the value of partition is not in the expected list.

  • ValueError – If the value of partition does not match the flag_column.

property configdir: str

Get the path to configuration directory.

Get the path to the configuration reference directory that will be used to resolve any relative paths in the configuration.

Returns:

configdir – The path to the configuration reference directory.

Return type:

str

property context: str

Get the context.

copy(deep: bool = True) Configuration[source]

Return a copy of the object.

Parameters:

deep (bool) – Whether to perform a deep copy. Defaults to True.

Returns:

copy – A new configuration object.

Return type:

Configuration

get(key: str, default: Any = None) Any[source]

Get value or default for the given key.

Parameters:
  • key (str) – Key to check in the Configuration object.

  • default (Any) – The default value to return, if no key exists. Defaults to None.

Returns:

value – The value for the given key in the configuration object, or the default if the key does not exist.

Return type:

Any

get_default_converter() Dict[str, Any][source]

Get default converter dictionary for data reader.

Returns:

default_converter – The default converter for a train or test file.

Return type:

Dict[str, Any]

get_names_and_paths(keys: List[str], names: List[str]) Tuple[List[str], List[str]][source]

Get values (paths) for the given keys and names.

This method is mainly used to retrieve values that are paths and it skips any values that are None.

Parameters:
  • keys (List[str]) – A list of keys whose values to retrieve.

  • names (List[str]) – The names corresponding to the keys.

Return type:

Tuple[List[str], List[str]]

Returns:

  • existing_names (List[str]) – The names for values that were not None.

  • existing_paths (List[str]) – The paths (values) for the given keys that were not None.

Raises:

ValueError – If there are any duplicate keys or names.

get_rater_error_variance() float[source]

Get specified rater error variance, if any, and make sure it’s numeric.

Returns:

rater_error_variance – Specified rater error variance.

Return type:

float

get_trim_min_max_tolerance() Tuple[float | None, float | None, float | None][source]

Get trim min, trim max, and tolerance values.

Get the specified trim min and max, and trim_tolerance if any, and make sure they are numeric.

Return type:

Tuple[Optional[float], Optional[float], Optional[float]]

Returns:

  • spec_trim_min (Optional[float]) – Specified trim min value.

  • spec_trim_max (Optional[float]) – Specified trim max value.

  • spec_trim_tolerance (Optional[float]) – Specified trim tolerance value.

items() List[Tuple[str, Any]][source]

Return configuration items as a list of tuples.

Returns:

items – A list of (key, value) tuples in the configuration object.

Return type:

List[Tuple[str, Any]]

keys() List[str][source]

Return keys as a list.

Returns:

keys – A list of keys in the configuration object.

Return type:

List[str]

pop(key: str, default: Any = None) Any[source]

Remove and return an element from the object having the given key.

Parameters:
  • key (str) – Key to pop in the configuration object.

  • default (Any) – The default value to return, if no key exists. Defaults to None.

Returns:

value – The value removed from the object.

Return type:

Any

save(output_dir: str | None = None) None[source]

Save the configuration file to the output directory specified.

Parameters:

output_dir (Optional[str]) – The path to the output directory. If None, the current directory is used. Defaults to None.

Return type:

None

to_dict() Dict[str, Any][source]

Get a dictionary representation of the configuration object.

Returns:

config – The configuration dictionary.

Return type:

Dict[str, Any]

values() List[Any][source]

Return configuration values as a list.

Returns:

values – A list of values in the configuration object.

Return type:

List[Any]

class rsmtool.configuration_parser.ConfigurationParser(pathlike: str | Path, logger: Logger | None = None)[source]

Bases: object

ConfigurationParser class to create Configuration objects.

Instantiate a ConfigurationParser for a given config file path.

Parameters:
  • pathlike (Union[str, Path]) – A string containing the path to the configuration file that is to be parsed. A pathlib.Path instance is also acceptable.

  • logger (Optional[logging.Logger]) – Custom logger object to use, if not None. Otherwise a new logger is created. Defaults to None.

Raises:
  • FileNotFoundError – If the given path does not exist.

  • OSError – If the given path is a directory, not a file.

  • ValueError – If the file at the given path does not have a valid extension (“.json”).

logger = None

parse(context: str = 'rsmtool') Configuration[source]

Parse configuration file.

Parse the configuration file for which this parser was instantiated.

Parameters:

context (str) – Context of the tool in which we are validating. One of: {“rsmtool”, “rsmeval”, “rsmpredict”, “rsmcompare”, “rsmsummarize”, “rsmxval”, “rsmexplain”}. Defaults to “rsmtool”.

Returns:

configuration – A configuration object containing the parameters in the file that we instantiated the parser for.

Return type:

Configuration
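For example, a minimal sketch assuming a JSON configuration file on disk (hypothetical path):

from rsmtool.configuration_parser import ConfigurationParser

# Parse an on-disk JSON configuration file into a Configuration
# object for the rsmtool context.
parser = ConfigurationParser("config_rsmtool.json")
configuration = parser.parse(context="rsmtool")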

classmethod process_config(config: Dict[str, Any]) Dict[str, Any][source]

Process the given configuration dictionary.

Convert fields that are read in as strings to the appropriate format. Fields that can take multiple string values are converted to lists if they have not already been formatted as such.

Parameters:

config (Dict[str, Any]) – Given Configuration dictionary to be processed.

Returns:

new_config – A copy of the given configuration dictionary with all fields converted to the appropriate format.

Return type:

Dict[str, Any]

Raises:
  • NameError – If config does not exist, or config could not be read.

  • ValueError – If boolean configuration fields contain a value other than “true” or “false” (in JSON).

classmethod validate_config(config: Dict[str, Any], context: str = 'rsmtool') Dict[str, Any][source]

Validate the given configuration dictionary.

Ensure that all required fields are specified, add default values for all unspecified fields, and ensure that all specified fields are valid.

Parameters:
  • config (Dict[str, Any]) – Given configuration dictionary to be validated.

  • context (str) – Context of the tool in which we are validating. One of {“rsmtool”, “rsmeval”, “rsmpredict”, “rsmcompare”, “rsmsummarize”}. Defaults to “rsmtool”.

Returns:

new_config – A copy of the given configuration dictionary with all required fields specified and with the default values for unspecified fields.

Return type:

Dict[str, Any]

Raises:

ValueError – If the Configuration object does not contain all required fields, has any unrecognized fields, or has any fields with invalid values.

rsmtool.configuration_parser.configure(context: str, config_file_or_obj_or_dict: str | Configuration | Dict[str, Any] | Path) Configuration[source]

Create a Configuration object.

Get the configuration for context from the input config_file_or_obj_or_dict.

Parameters:
  • context (str) – The context that is being configured. Must be one of “rsmtool”, “rsmeval”, “rsmcompare”, “rsmsummarize”, “rsmpredict”, or “rsmxval”.

  • config_file_or_obj_or_dict (Union[str, Configuration, Dict[str, Any], Path]) – Path to the experiment configuration file as either a string or a pathlib.Path object. Users can also pass a Configuration object that is in memory or a Python dictionary with keys corresponding to fields in the configuration file. Given a configuration file, any relative paths in the configuration file will be interpreted relative to the location of the file. Given a Configuration object, relative paths will be interpreted relative to the configdir attribute, which must be set. Given a dictionary, the reference path is set to the current directory.

Returns:

configuration – The Configuration object for the tool.

Return type:

Configuration

Raises:
  • AttributeError – If the configdir attribute for the Configuration input is not set.

  • ValueError – If config_file_or_obj_or_dict contains anything except a string, a path, a dictionary, or a Configuration object.
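Example: a minimal sketch of using configure() to obtain a Configuration object; “rsmtool.json” is a hypothetical file path.

    from rsmtool.configuration_parser import configure

    # create a Configuration for the rsmtool context from a file path
    configuration = configure("rsmtool", "rsmtool.json")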

From container Module

Class to encapsulate data contained in multiple pandas DataFrames.

It represents each of the multiple data sources as a “dataset”. Each dataset is represented by three properties:

  • “name” : the name of the dataset

  • “frame” : the pandas DataFrame that contains the actual data

  • “path” : the path to the file on disk from which the data was read

author:

Jeremy Biggs (jbiggs@ets.org)

author:

Anastassia Loukina (aloukina@ets.org)

author:

Nitin Madnani (nmadnani@ets.org)

organization:

ETS

class rsmtool.container.DataContainer(datasets: List[DatasetDict] | None = None)[source]

Bases: object

Class to encapsulate datasets.

Initialize a DataContainer object.

Parameters:

datasets (Optional[List[DatasetDict]]) – A list of dataset dictionaries. Each dict should have the following keys: “name” containing the name of the dataset, “frame” containing the dataframe object representing the dataset, and “path” containing the path to the file from which the frame was read.
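Example: a minimal sketch of building and querying a DataContainer; the dataset names and frame contents are purely illustrative.

    import pandas as pd
    from rsmtool.container import DataContainer

    # each dataset dictionary needs "name", "frame", and "path" keys
    train_dict = {"name": "train",
                  "frame": pd.DataFrame({"spkitemid": ["a", "b"],
                                         "sc1": [3, 4]}),
                  "path": "train.csv"}

    container = DataContainer([train_dict])
    df_train = container.get_frame("train")    # look up a frame by name
    container.add_dataset({"name": "test",
                           "frame": pd.DataFrame({"spkitemid": ["c"],
                                                  "sc1": [2]}),
                           "path": None})
    print(container.keys())                    # ["train", "test"]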

add_dataset(dataset_dict: DatasetDict, update: bool = False) None[source]

Add a new dataset (or update an existing one).

Parameters:
  • dataset_dict (DatasetDict) – The dataset dictionary to add or update with the “name”, “frame”, and “path” keys.

  • update (bool) – Update an existing DataFrame, if True. Defaults to False.

Return type:

None

copy(deep: bool = True) DataContainer[source]

Return a copy of the container object.

Parameters:

deep (bool) – If True, create a deep copy of the underlying data frames. Defaults to True.

Returns:

data_container – A copy of the input container object.

Return type:

DataContainer

drop(name: str) DataContainer[source]

Drop a given dataset from the container and return the instance.

Parameters:

name (str) – The name of the dataset to drop.

Returns:

data_container – The input container object with the dataset dropped.

Return type:

DataContainer

get_frame(name: str, default: DataFrame | None = None) DataFrame | None[source]

Get the data frame given the dataset name.

Parameters:
  • name (str) – The name for the dataset.

  • default (Optional[pandas.DataFrame]) – The default value to return if the named dataset does not exist. Defaults to None.

Returns:

frame – The data frame for the named dataset.

Return type:

Optional[pandas.DataFrame]

get_frames(prefix: str | None = None, suffix: str | None = None) Dict[str, DataFrame][source]

Get all data frames with a given prefix or suffix in their name.

Note that the selection by prefix or suffix is case-insensitive.

Parameters:
  • prefix (Optional[str]) – Only return frames with the given prefix. If None, then do not exclude any frames based on their prefix. Defaults to None.

  • suffix (Optional[str]) – Only return frames with the given suffix. If None, then do not exclude any frames based on their suffix. Defaults to None.

Returns:

frames – A dictionary with the data frames that contain the specified prefix and/or suffix in their corresponding names. The names are the keys and the frames are the values.

Return type:

Dict[str, pandas.DataFrame]

get_path(name: str, default: str | None = None) str | None[source]

Get the path for the dataset given the name.

Parameters:
  • name (str) – The name for the dataset.

  • default (Optional[str]) – The default path to return if the named dataset does not exist. Defaults to None.

Returns:

path – The path for the named dataset.

Return type:

Optional[str]

items() List[Tuple[str, DataFrame]][source]

Return the container items as a list of (name, frame) tuples.

Returns:

items – A list of (name, frame) tuples in the container object.

Return type:

List[Tuple[str, pandas.DataFrame]]

keys() List[str][source]

Return the container keys (dataset names) as a list.

Returns:

keys – A list of keys (names) in the container object.

Return type:

List[str]

rename(name: str, new_name: str) DataContainer[source]

Rename a given dataset in the container and return the instance.

Parameters:
  • name (str) – The name of the current dataset in the container object.

  • new_name (str) – The new name for the dataset in the container object.

Returns:

data_container – The input container object with the dataset renamed.

Return type:

DataContainer

static to_datasets(data_container: DataContainer) List[DatasetDict][source]

Convert container object to a list of dataset dictionaries.

Each dictionary will contain the “name”, “frame”, and “path” keys.

Parameters:

data_container (DataContainer) – The container object to convert.

Returns:

dataset_dicts – A list of dataset dictionaries.

Return type:

List[DatasetDict]

values() List[DataFrame][source]

Return all data frames as a list.

Returns:

values – A list of all data frames in the container object.

Return type:

List[pandas.DataFrame]

class rsmtool.container.DatasetDict[source]

Bases: TypedDict

Type definition for a dataset dictionary.

frame: DataFrame
name: str
path: Optional[str]

From convert_feature_json Module

rsmtool.convert_feature_json.convert_feature_json_file(json_file: str, output_file: str, delete=False) None[source]

Convert given feature JSON file into tabular format.

The specific format is inferred by the extension of the output file.

Parameters:
  • json_file (str) – Path to feature JSON file to be converted.

  • output_file (str) – Path to CSV/TSV/XLSX output file.

  • delete (bool) – Whether to delete the original file after conversion. Defaults to False.

Raises:
  • RuntimeError – If the given input file is not a valid feature JSON file.

  • RuntimeError – If the output file has an unsupported extension.

Return type:

None
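Example: a sketch of converting a legacy feature JSON file to tabular format; both file names are hypothetical.

    from rsmtool.convert_feature_json import convert_feature_json_file

    # the output format (here CSV) is inferred from the extension
    convert_feature_json_file("features.json", "features.csv", delete=False)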

From fairness_utils Module

rsmtool.fairness_utils.get_fairness_analyses(df: DataFrame, group: str, system_score_column: str, human_score_column: str = 'sc1', base_group: str | None = None) Tuple[Dict[str, RegressionResults], DataContainer][source]

Compute analyses from Loukina et al. 2019.

The function computes how much variance group membership explains in overall score accuracy (osa), overall score difference (osd), and conditional score difference (csd). See the paper for more details.

Parameters:
  • df (pandas.DataFrame) – A dataframe containing columns with numeric human scores, columns with numeric system scores and a column with group membership.

  • group (str) – Name of the column containing group membership.

  • system_score_column (str) – Name of the column containing system scores.

  • human_score_column (str) – Name of the column containing human scores. Defaults to "sc1".

  • base_group (Optional[str]) – Name of the group to use as the reference category. If None, the group with the largest number of cases will be used as the reference category. Ties are broken alphabetically. Defaults to None.

Return type:

Tuple[Dict[str, RegressionResults], DataContainer]

Returns:

  • model_dict (Dict[str, RegressionResults]) – A dictionary with different proposed metrics as keys and fitted models as values.

  • fairness_container (DataContainer) –

    A datacontainer with the following datasets:

    • "estimates_<METRIC>_by_<GROUP>" where <GROUP> corresponds to the given group and <METRIC> can be osa, osd and csd estimates for each group computed by the respective models.

    • "fairness_metrics_by_<GROUP>" - a summary of model fits (R^2 and p values).

From modeler Module

Class for training and predicting with built-in or SKLL models.

author:

Jeremy Biggs (jbiggs@ets.org)

author:

Anastassia Loukina (aloukina@ets.org)

author:

Nitin Madnani (nmadnani@ets.org)

organization:

ETS

class rsmtool.modeler.Modeler(logger: Logger | None = None)[source]

Bases: object

Class to train model and generate predictions with built-in or SKLL models.

Note

The learner and scaling-/trimming-related attributes are set to None.

Initialize empty Modeler object.

Parameters:

logger (Optional[logging.Logger]) – Logger object to use for logging messages. If None, a new logger instance will be created. Defaults to None.

create_fake_skll_learner(df_coefficients: DataFrame) Learner[source]

Create a fake SKLL linear regression learner from given coefficients.

Parameters:

df_coefficients (pandas.DataFrame) – The data frame containing the linear coefficients we want to create the fake SKLL model with.

Returns:

learner – SKLL Learner object representing a LinearRegression model with the specified coefficients.

Return type:

skll.learner.Learner

get_coefficients() ndarray | None[source]

Get the coefficients of the model, if available.

Returns:

coefficients – The coefficients of the model, if available.

Return type:

Optional[np.ndarray]

get_feature_names() List[str] | None[source]

Get the feature names, if available.

Returns:

feature_names – A list of feature names, or None if no learner was trained.

Return type:

Optional[List[str]]

get_intercept() float | None[source]

Get the intercept of the model, if available.

Returns:

intercept – The intercept of the model.

Return type:

Optional[float]

classmethod load_from_file(path: str) Modeler[source]

Load a Modeler object from a file on disk.

The file must contain either a Modeler or a SKLL Learner, in which case a Modeler object will be created from the Learner.

Parameters:

path (str) – File path from which to load the modeler object.

Returns:

model – A Modeler instance.

Return type:

Modeler

Raises:

ValueError – If path does not end with “.model”.
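Example: a sketch of loading a trained model from disk; the path is hypothetical and must end in “.model” as noted above.

    from rsmtool.modeler import Modeler

    modeler = Modeler.load_from_file("output/example_experiment.model")
    print(modeler.get_feature_names())   # None if no learner is available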

classmethod load_from_learner(learner: Learner) Modeler[source]

Create a new Modeler object with a pre-populated learner.

Parameters:

learner (skll.learner.Learner) – A SKLL Learner object.

Returns:

modeler – The newly created Modeler object.

Return type:

Modeler

Raises:

TypeError – If learner is not a SKLL Learner instance.

static model_fit_to_dataframe(fit: RegressionResults) DataFrame[source]

Extract fit metrics from a statsmodels fit object into a data frame.

Parameters:

fit (statsmodels.regression.linear_model.RegressionResults) – Model fit object obtained from a linear model trained using statsmodels.OLS.

Returns:

df_fit – The output data frame with the main model fit metrics.

Return type:

pandas.DataFrame

static ols_coefficients_to_dataframe(coefs: Series) DataFrame[source]

Convert series containing OLS coefficients to a data frame.

Parameters:

coefs (pandas.Series) – Series with feature names in the index and the coefficient values as the data, obtained from a linear model trained using statsmodels.OLS.

Returns:

df_coef – Data frame with two columns: the feature name and the coefficient value.

Return type:

pandas.DataFrame

Note

The first row in the output data frame is always for the intercept and the rest are sorted by feature name.

predict(df: DataFrame, min_score: float | None = None, max_score: float | None = None, predict_expected: bool = False) DataFrame[source]

Get raw predictions from given SKLL model on data in given data frame.

Parameters:
  • df (pandas.DataFrame) – Data frame containing features on which to make the predictions. The data must contain pre-processed feature values, an ID column named “spkitemid”, and a label column named “sc1”.

  • min_score (Optional[float]) – Minimum score level to be used if computing expected scores. If None, trying to compute expected scores will raise an exception. Defaults to None.

  • max_score (Optional[float]) – Maximum score level to be used if computing expected scores. If None, trying to compute expected scores will raise an exception. Defaults to None.

  • predict_expected (bool) – Predict expected scores for classifiers that return probability distributions over scores. This will be ignored with a warning if the specified model does not support probability distributions. Note also that this assumes that the score range consists of contiguous integers, starting at min_score and ending at max_score. Defaults to False.

Returns:

df_predictions – Data frame containing the raw predictions, the IDs, and the human scores.

Return type:

pandas.DataFrame

Raises:
  • ValueError – If no model has been trained yet.

  • ValueError – If the model cannot predict probability distributions and predict_expected is set to True.

  • ValueError – If the score range specified by min_score and max_score does not match what the model predicts in its probability distribution.

  • ValueError – If predict_expected is True but min_score and max_score are None.
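Example: a continuation of the loading sketch above; df_features stands for a hypothetical pre-processed data frame containing the feature columns plus “spkitemid” and “sc1”.

    from rsmtool.modeler import Modeler

    modeler = Modeler.load_from_file("output/example_experiment.model")

    # raw predictions for the pre-processed responses
    df_predictions = modeler.predict(df_features)

    # expected scores for probabilistic classifiers on a 1-6 scale
    df_expected = modeler.predict(df_features, min_score=1, max_score=6,
                                  predict_expected=True)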

predict_train_and_test(df_train: DataFrame, df_test: DataFrame, configuration: Configuration) Tuple[Configuration, DataContainer][source]

Generate raw, scaled, and trimmed predictions on given data.

Parameters:
  • df_train (pandas.DataFrame) – Data frame containing the pre-processed training set features.

  • df_test (pandas.DataFrame) – Data frame containing the pre-processed test set features.

  • configuration (Configuration) – A configuration object containing “trim_max” and “trim_min” keys.

Return type:

Tuple[Configuration, DataContainer]

Returns:

  • configuration (Configuration) – A copy of the given configuration object also containing the “train_predictions_mean”, “train_predictions_sd”, “human_labels_mean”, “human_labels_sd”, “trim_min”, and “trim_max” parameters.

  • data_container (DataContainer) – A data container object containing the “pred_train”, “pred_test”, and “postprocessing_params” data sets.

save(model_path: str) None[source]

Save an instance of this class to disk.

Parameters:

model_path (str) – Destination path for the model file.

Return type:

None

scale_coefficients(configuration: Configuration) DataContainer[source]

Scale coefficients using human scores & training set predictions.

This procedure approximates what is done in an operational setting but does not apply trimming to the predictions.

Parameters:

configuration (Configuration) – A configuration object containing the “train_predictions_mean”, “train_predictions_sd”, and “human_labels_sd” parameters.

Returns:

data_container – A container object containing the “coefficients_scaled” dataset. The frame for this dataset contains the scaled coefficients and the feature names, along with the intercept.

Return type:

DataContainer

Raises:

RuntimeError – If the model is non-linear and no coefficients are available.

static skll_learner_params_to_dataframe(learner: Learner) DataFrame[source]

Extract parameters from the given SKLL learner into a data frame.

Parameters:

learner (skll.learner.Learner) – A SKLL learner object.

Returns:

df_coef – The data frame containing the model parameters from the given SKLL Learner object.

Return type:

pandas.DataFrame

Note

  1. We use the coef_ attribute of the scikit-learn model underlying the SKLL learner instead of the latter’s model_params attribute. This is because model_params ignores zero coefficients, which we do not want.

  2. The first row in the output data frame is always for the intercept and the rest are sorted by feature name.

train(configuration: Configuration, data_container: DataContainer, filedir: str, file_format: str = 'csv')[source]

Train the given model on the given data and save the results.

The main driver function to train the given model on the given data and save the results in the given directories using the given experiment ID as the prefix.

Parameters:
  • configuration (Configuration) – A configuration object containing “experiment_id” and “model_name” parameters.

  • data_container (DataContainer) – A data container object containing “train_preprocessed_features” data set.

  • filedir (str) – Path to the “output” experiment output directory.

  • file_format (str) – The format in which to save files. One of {"csv", "tsv", "xlsx"}. Defaults to "csv".

Returns:

model – The trained SKLL Learner object.

Return type:

skll.learner.Learner

train_builtin_model(model_name: str, df_train: DataFrame, experiment_id: str, filedir: str, file_format: str = 'csv') Learner[source]

Train one of the built-in linear regression models.

Parameters:
  • model_name (str) – Name of the built-in model to train.

  • df_train (pandas.DataFrame) – Data frame containing the features on which to train the model. The data frame must contain the ID column named “spkitemid” and the numeric label column named “sc1”.

  • experiment_id (str) – The experiment ID.

  • filedir (str) – Path to the output experiment output directory.

  • file_format (str) – The format in which to save files. One of {"csv", "tsv", "xlsx"}. Defaults to "csv".

Returns:

learner – SKLL LinearRegression Learner object containing the coefficients learned by training the built-in model.

Return type:

skll.learner.Learner

train_equal_weights_lr(df_train: DataFrame, feature_columns: List[str]) Tuple[Learner, RegressionResults, DataFrame, List[str]][source]

Train an “EqualWeightsLR” model.

This model assigns the same weight to all features.

Parameters:
  • df_train (pandas.DataFrame) – Data frame containing the features on which to train the model.

  • feature_columns (List[str]) – A list of feature columns to use in training the model.

Return type:

Tuple[Learner, RegressionResults, DataFrame, List[str]]

Returns:

  • learner (skll.learner.Learner) – The SKLL learner object.

  • fit (statsmodels.regression.linear_model.RegressionResults) – A statsmodels regression results object.

  • df_coef (pandas.DataFrame) – Data frame containing the model coefficients.

  • used_features (List[str]) – A list of features used in the final model.

train_lasso_fixed_lambda(df_train: DataFrame, feature_columns: List[str]) Tuple[Learner, None, DataFrame, List[str]][source]

Train a “LassoFixedLambda” model.

This is a Lasso model with a fixed lambda.

Parameters:
  • df_train (pandas.DataFrame) – Data frame containing the features on which to train the model.

  • feature_columns (List[str]) – A list of feature columns to use in training the model.

Return type:

Tuple[Learner, None, DataFrame, List[str]]

Returns:

  • learner (skll.learner.Learner) – The SKLL learner object.

  • fit (None) – This is always None since there is no OLS model fitted in this case.

  • df_coef (pandas.DataFrame) – Data frame containing the model coefficients.

  • used_features (List[str]) – A list of features used in the final model.

train_lasso_fixed_lambda_then_lr(df_train: DataFrame, feature_columns: List[str]) Tuple[Learner, RegressionResults, DataFrame, List[str]][source]

Train a “LassoFixedLambdaThenLR” model.

First do feature selection using lasso regression with a fixed lambda and then use only those features to train a second linear regression.

Parameters:
  • df_train (pandas.DataFrame) – Data frame containing the features on which to train the model.

  • feature_columns (List[str]) – A list of feature columns to use in training the model.

Return type:

Tuple[Learner, RegressionResults, DataFrame, List[str]]

Returns:

  • learner (skll.learner.Learner) – The SKLL learner object.

  • fit (statsmodels.regression.linear_model.RegressionResults) – A statsmodels regression results object.

  • df_coef (pandas.DataFrame) – Data frame containing the model coefficients.

  • used_features (List[str]) – A list of features used in the final model.

train_lasso_fixed_lambda_then_non_negative_lr(df_train: DataFrame, feature_columns: List[str]) Tuple[Learner, RegressionResults, DataFrame, List[str]][source]

Train a “LassoFixedLambdaThenNNLR” model.

First do feature selection using lasso regression with positive-only weights. Then fit an NNLR model (see above) on those features.

Parameters:
  • df_train (pandas.DataFrame) – Data frame containing the features on which to train the model.

  • feature_columns (List[str]) – A list of feature columns to use in training the model.

Return type:

Tuple[Learner, RegressionResults, DataFrame, List[str]]

Returns:

  • learner (skll.learner.Learner) – The SKLL learner object.

  • fit (statsmodels.regression.linear_model.RegressionResults) – A statsmodels regression results object.

  • df_coef (pandas.DataFrame) – Data frame containing the model coefficients.

  • used_features (List[str]) – A list of features used in the final model.

train_linear_regression(df_train: DataFrame, feature_columns: List[str]) Tuple[Learner, RegressionResults, DataFrame, List[str]][source]

Train a “LinearRegression” model.

This model is a simple linear regression model.

Parameters:
  • df_train (pandas.DataFrame) – Data frame containing the features on which to train the model.

  • feature_columns (List[str]) – A list of feature columns to use in training the model.

Return type:

Tuple[Learner, RegressionResults, DataFrame, List[str]]

Returns:

  • learner (skll.learner.Learner) – The SKLL learner object.

  • fit (statsmodels.regression.linear_model.RegressionResults) – A statsmodels regression results object.

  • df_coef (pandas.DataFrame) – Data frame containing the model coefficients.

  • used_features (List[str]) – A list of features used in the final model.

train_non_negative_lr(df_train: DataFrame, feature_columns: List[str]) Tuple[Learner, RegressionResults, DataFrame, List[str]][source]

Train an “NNLR” model.

To do this, we first do feature selection using non-negative least squares (NNLS) and then use only its non-zero features to train another linear regression (LR) model. We do the regular LR at the end since we want an LR object so that we have access to R^2 and other useful statistics. There should be no difference between the non-zero coefficients from NNLS and the coefficients that end up coming out of the subsequent LR.

Parameters:
  • df_train (pandas.DataFrame) – Data frame containing the features on which to train the model.

  • feature_columns (List[str]) – A list of feature columns to use in training the model.

Return type:

Tuple[Learner, RegressionResults, DataFrame, List[str]]

Returns:

  • learner (skll.learner.Learner) – The SKLL learner object.

  • fit (statsmodels.regression.linear_model.RegressionResults) – A statsmodels regression results object.

  • df_coef (pandas.DataFrame) – Data frame containing the model coefficients.

  • used_features (List[str]) – A list of features used in the final model.

train_non_negative_lr_iterative(df_train: DataFrame, feature_columns: List[str]) Tuple[Learner, RegressionResults, DataFrame, List[str]][source]

Train an “NNLR_iterative” model.

For applications where there is a concern that standard NNLS may not converge, this model provides an alternative way of training an NNLR by iteratively fitting OLS models, checking the coefficients, and dropping variables with negative coefficients. First, fit an OLS model. Then, identify any variables whose coefficients are negative and drop them from the model. Finally, refit the model. If any coefficients are still negative, set them to zero.

Parameters:
  • df_train (pandas.DataFrame) – Data frame containing the features on which to train the model.

  • feature_columns (List[str]) – A list of feature columns to use in training the model.

Return type:

Tuple[Learner, RegressionResults, DataFrame, List[str]]

Returns:

  • learner (skll.learner.Learner) – The SKLL learner object.

  • fit (statsmodels.regression.linear_model.RegressionResults) – A statsmodels regression results object.

  • df_coef (pandas.DataFrame) – Data frame containing the model coefficients.

  • used_features (List[str]) – A list of features used in the final model.

train_positive_lasso_cv(df_train: DataFrame, feature_columns: List[str]) Tuple[Learner, None, DataFrame, List[str]][source]

Train a “PositiveLassoCV” model.

Do feature selection using lasso regression optimized for log likelihood using cross validation. All coefficients are constrained to have positive values.

Parameters:
  • df_train (pandas.DataFrame) – Data frame containing the features on which to train the model.

  • feature_columns (List[str]) – A list of feature columns to use in training the model.

Return type:

Tuple[Learner, None, DataFrame, List[str]]

Returns:

  • learner (skll.learner.Learner) – The SKLL learner object.

  • fit (None) – This is always None since there is no OLS model fitted in this case.

  • df_coef (pandas.DataFrame) – Data frame containing the model coefficients.

  • used_features (List[str]) – A list of features used in the final model.

train_positive_lasso_cv_then_lr(df_train: DataFrame, feature_columns: List[str]) Tuple[Learner, RegressionResults, DataFrame, List[str]][source]

Train a “PositiveLassoCVThenLR” model.

First do feature selection using lasso regression optimized for log likelihood using cross validation and then use only those features to train a second linear regression.

Parameters:
  • df_train (pandas.DataFrame) – Data frame containing the features on which to train the model.

  • feature_columns (List[str]) – A list of feature columns to use in training the model.

Return type:

Tuple[Learner, RegressionResults, DataFrame, List[str]]

Returns:

  • learner (skll.learner.Learner) – The SKLL learner object.

  • fit (statsmodels.regression.linear_model.RegressionResults) – A statsmodels regression results object.

  • df_coef (pandas.DataFrame) – Data frame containing the model coefficients.

  • used_features (List[str]) – A list of features used in the final model.

train_rebalanced_lr(df_train: DataFrame, feature_columns: List[str]) Tuple[Learner, RegressionResults, DataFrame, List[str]][source]

Train a “RebalancedLR” model.

This model balances empirical weights by changing betas.

Parameters:
  • df_train (pandas.DataFrame) – Data frame containing the features on which to train the model.

  • feature_columns (List[str]) – A list of feature columns to use in training the model.

Return type:

Tuple[Learner, RegressionResults, DataFrame, List[str]]

Returns:

  • learner (skll.learner.Learner) – The SKLL learner object.

  • fit (statsmodels.regression.linear_model.RegressionResults) – A statsmodels regression results object.

  • df_coef (pandas.DataFrame) – Data frame containing the model coefficients.

  • used_features (List[str]) – A list of features used in the final model.

train_score_weighted_lr(df_train: DataFrame, feature_columns: List[str]) Tuple[Learner, RegressionResults, DataFrame, List[str]][source]

Train a “ScoreWeightedLR” model.

This is a linear regression model weighted by score.

Parameters:
  • df_train (pandas.DataFrame) – Data frame containing the features on which to train the model.

  • feature_columns (List[str]) – A list of feature columns to use in training the model.

Return type:

Tuple[Learner, RegressionResults, DataFrame, List[str]]

Returns:

  • learner (skll.learner.Learner) – The SKLL learner object.

  • fit (statsmodels.regression.linear_model.RegressionResults) – A statsmodels regression results object.

  • df_coef (pandas.DataFrame) – Data frame containing the model coefficients.

  • used_features (List[str]) – A list of features used in the final model.

train_skll_model(model_name: str, df_train: DataFrame, custom_fixed_parameters: Dict[str, Any] | None = None, custom_objective: str | None = None, predict_expected_scores: bool = False, skll_grid_search_jobs: int = 1) Tuple[Learner, str][source]

Train a SKLL classification or regression model.

Parameters:
  • model_name (str) – Name of the SKLL model to train.

  • df_train (pandas.DataFrame) – Data frame containing the features on which to train the model.

  • custom_fixed_parameters (Optional[Dict[str, Any]]) – A dictionary containing any fixed parameters for the SKLL model. Defaults to None.

  • custom_objective (Optional[str]) – Name of custom user-specified objective. If not specified or None, “neg_mean_squared_error” is used as the objective. Defaults to None.

  • predict_expected_scores (bool) – Whether we want the trained classifiers to predict expected scores. Defaults to False.

  • skll_grid_search_jobs (int) – Number of folds to run in parallel when using SKLL grid search. Defaults to 1.

Return type:

Tuple[Learner, str]

Returns:

  • learner (skll.learner.Learner) – A SKLL Learner object of the appropriate type.

  • objective (str) – The chosen tuning objective.
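Example: a hedged sketch of training a SKLL model; df_train stands for a hypothetical pre-processed training frame, and “RandomForestRegressor” and “spearman” are assumed to be valid SKLL learner and objective names.

    from rsmtool.modeler import Modeler

    modeler = Modeler()
    learner, objective = modeler.train_skll_model(
        "RandomForestRegressor", df_train,
        custom_objective="spearman",
        skll_grid_search_jobs=2)
    print(objective)   # the chosen tuning objective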

From preprocessor Module

Classes for preprocessing input data in various contexts.

author:

Jeremy Biggs (jbiggs@ets.org)

author:

Anastassia Loukina (aloukina@ets.org)

author:

Nitin Madnani (nmadnani@ets.org)

organization:

ETS

class rsmtool.preprocessor.FeaturePreprocessor(logger: Logger | None = None)[source]

Bases: object

Class to preprocess features in training and testing sets.

Initialize the FeaturePreprocessor object.

check_model_name(model_name: str) str[source]

Check that the given model name is valid and determine its type.

Parameters:

model_name (str) – Name of the model.

Returns:

model_type – One of “BUILTIN” or “SKLL”.

Return type:

str

Raises:

ValueError – If the model is not supported.

check_subgroups(df: DataFrame, subgroups: List[str]) DataFrame[source]

Validate subgroup names in the given data.

Check that all subgroups, if specified, correspond to columns in the provided data frame, and replace all NaNs in subgroup values with “No info” for later convenience.

Raises an exception if any specified subgroup columns are missing.

Parameters:
  • df (pandas.DataFrame) – Input data frame with subgroups to check.

  • subgroups (List[str]) – List of column names that contain grouping information.

Returns:

df – Modified input data frame with NaNs replaced.

Return type:

pandas.DataFrame

Raises:

KeyError – If the data does not contain columns for all specified subgroups.

filter_data(df: DataFrame, label_column: str, id_column: str, length_column: str | None, second_human_score_column: str | None, candidate_column: str, requested_feature_names: List[str], reserved_column_names: List[str], given_trim_min: float | None, given_trim_max: float | None, flag_column_dict: Dict[str, Any], subgroups: List[str], exclude_zero_scores: bool = True, exclude_zero_sd: bool = False, feature_subset_specs: DataFrame | None = None, feature_subset: str | None = None, min_candidate_items: int | None = None, use_fake_labels: bool = False) Tuple[DataFrame, DataFrame, DataFrame, DataFrame, DataFrame, DataFrame, DataFrame, float, float, List[str]][source]

Filter rows with zero/non-numeric values for label_column.

Check whether any features that are specifically requested in requested_feature_names are missing from the data. If no feature names are requested, the feature list is generated based on column names and subset information, if available. The function then excludes non-numeric values for any feature. It will also exclude zero scores if exclude_zero_scores is True. If the user requested to exclude candidates with fewer than min_candidate_items items, such candidates are also excluded.

It also generates fake labels between 1 and 10 if use_fake_labels is True. Finally, it renames the ID and label columns and splits the data into: (a) a data frame with feature values and scores, (b) a data frame with information about subgroups and candidates (metadata), and (c) a data frame with all other columns.

Parameters:
  • df (pandas.DataFrame) – The data frame to filter.

  • label_column (str) – The label column in the data.

  • id_column (str) – The ID column in the data.

  • length_column (Optional[str]) – The length column in the data.

  • second_human_score_column (Optional[str]) – The second human score column in the data.

  • candidate_column (str) – The candidate column in the data.

  • requested_feature_names (List[str]) – A list of requested feature names.

  • reserved_column_names (List[str]) – A list of reserved column names.

  • given_trim_min (Optional[float]) – The minimum trim value.

  • given_trim_max (Optional[float]) – The maximum trim value.

  • flag_column_dict (Dict[str, Any]) – A dictionary of flag columns.

  • subgroups (List[str]) – List containing subgroup names.

  • exclude_zero_scores (bool) – Whether to exclude zero scores. Defaults to True.

  • exclude_zero_sd (bool) – Whether to exclude zero standard deviation. Defaults to False.

  • feature_subset_specs (Optional[pandas.DataFrame]) – The data frame containing the feature subset specifications. Defaults to None.

  • feature_subset (Optional[str]) – The feature subset group (e.g. ‘A’). Defaults to None.

  • min_candidate_items (Optional[int]) – The minimum number of items needed to include candidate. Defaults to None.

  • use_fake_labels (bool) – Whether to use fake labels. Defaults to False.

Return type:

Tuple[DataFrame, DataFrame, DataFrame, DataFrame, DataFrame, DataFrame, DataFrame, float, float, List[str]]

Returns:

  • df_filtered_features (pandas.DataFrame) – Data frame with filtered features.

  • df_filtered_metadata (pandas.DataFrame) – Data frame with filtered metadata.

  • df_filtered_other_columns (pandas.DataFrame) – Data frame with other columns filtered.

  • df_excluded (pandas.DataFrame) – Data frame with excluded records.

  • df_filtered_length (pandas.DataFrame) – Data frame with length column(s) filtered.

  • df_filtered_human_scores (pandas.DataFrame) – Data frame with human scores filtered.

  • df_responses_with_excluded_flags (pandas.DataFrame) – Data frame containing responses with excluded flags.

  • trim_min (float) – The minimum trim value.

  • trim_max (float) – The maximum trim value.

  • feature_names (List[str]) – A list of feature names.

filter_on_column(df: DataFrame, column: str, exclude_zeros: bool = False, exclude_zero_sd: bool = False) Tuple[DataFrame, DataFrame][source]

Filter out rows containing non-numeric values.

Filter out the rows in the given data frame that contain non-numeric (or zero, if specified) values in the specified column. Additionally, it may exclude any columns if they have a standard deviation (\(\sigma\)) of 0.

Parameters:
  • df (pandas.DataFrame) – The data frame containing the data to be filtered.

  • column (str) – Name of the column from which to filter out values.

  • exclude_zeros (bool) – Whether to exclude responses containing zeros in the specified column. Defaults to False.

  • exclude_zero_sd (bool) – Whether to perform the additional filtering step of removing columns that have \(\sigma = 0\). Defaults to False.

Return type:

Tuple[DataFrame, DataFrame]

Returns:

  • df_filtered (pandas.DataFrame) – Data frame containing the responses that were not filtered out.

  • df_excluded (pandas.DataFrame) – Data frame containing the non-numeric or zero responses that were filtered out.

Note

The columns with \(\sigma = 0\) are removed from both output data frames, assuming exclude_zero_sd is True.
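Example: a small sketch of filtering zero and non-numeric labels from a toy frame; the expected counts are illustrative.

    import pandas as pd
    from rsmtool.preprocessor import FeaturePreprocessor

    df = pd.DataFrame({"spkitemid": ["a", "b", "c"],
                       "sc1": [3, 0, "N/A"]})

    preprocessor = FeaturePreprocessor()
    df_filtered, df_excluded = preprocessor.filter_on_column(
        df, "sc1", exclude_zeros=True)
    print(len(df_filtered), len(df_excluded))   # expected: 1 2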

filter_on_flag_columns(df: DataFrame, flag_column_dict: Dict[str, Any]) Tuple[DataFrame, DataFrame][source]

Filter based on specific flag columns.

Check that all flag_columns are present in the given data frame, convert these columns to strings and filter out the values which do not match the condition in flag_column_dict.

Parameters:
  • df (pandas.DataFrame) – The DataFrame to filter on.

  • flag_column_dict (Dict[str, Any]) – Dictionary containing the flag column information.

Return type:

Tuple[DataFrame, DataFrame]

Returns:

  • df_responses_with_requested_flags (pandas.DataFrame) – Data frame containing the responses remaining after filtering using the specified flag columns.

  • df_responses_with_excluded_flags (pandas.DataFrame) – Data frame containing the responses filtered out using the specified flag columns.

Raises:
  • KeyError – If the columns listed in the dictionary are not actually present in the data frame.

  • ValueError – If no responses remain after filtering based on the flag column information.

generate_feature_names(df: DataFrame, reserved_column_names: List[str], feature_subset_specs: DataFrame, feature_subset: str | None) List[str][source]

Generate feature names from column names in data frame.

This method also selects the specified subset of features.

Parameters:
  • df (pandas.DataFrame) – The data frame from which to generate feature names.

  • reserved_column_names (List[str]) – Names of reserved columns.

  • feature_subset_specs (pandas.DataFrame) – Feature subset specifications.

  • feature_subset (Optional[str]) – Feature subset column.

Returns:

feature_names – List of generated features names.

Return type:

List[str]

preprocess_feature(values: ndarray, feature_name: str, feature_transform: str, feature_mean: float, feature_sd: float, exclude_zero_sd: bool = False, raise_error: bool = True, truncations: DataFrame | None = None, truncate_outliers: bool = True) ndarray[source]

Remove outliers and transform the values in given numpy array.

Use the given outlier and transformation parameters.

Parameters:
  • values (numpy.ndarray) – The feature values to preprocess.

  • feature_name (str) – Name of the feature being pre-processed.

  • feature_transform (str) – Name of the transformation function to apply.

  • feature_mean (float) – Mean value to use for outlier detection instead of the mean of the given feature values.

  • feature_sd (float) – Std. dev. value to use for outlier detection instead of the std. dev. of the given feature values.

  • exclude_zero_sd (bool) – Exclude the feature if it has zero standard deviation. Defaults to False.

  • raise_error (bool) – Raise an error if any of the transformations lead to “inf” values or may change the ranking of feature values. Defaults to True.

  • truncations (Optional[pandas.DataFrame]) – Set of pre-defined truncation values. Defaults to None.

  • truncate_outliers (bool) – Whether to truncate outlier values. Defaults to True.

Returns:

transformed_feature – Numpy array containing the transformed and clamped feature values.

Return type:

numpy.ndarray

Raises:

ValueError – If the preprocessed feature values have zero standard deviation and exclude_zero_sd is set to True.

preprocess_features(df_train: DataFrame, df_test: DataFrame, df_feature_specs: DataFrame, standardize_features: bool = True, use_truncations: bool = False, truncate_outliers: bool = True) Tuple[DataFrame, DataFrame, DataFrame][source]

Preprocess features in given data using corresponding specifications.

Preprocess the feature values in the training and testing data frames whose specifications are contained in df_feature_specs. Also returns a third data frame containing the feature specifications and other information.

Parameters:
  • df_train (pandas.DataFrame) – Data frame containing the raw feature values for the training set.

  • df_test (pandas.DataFrame) – Data frame containing the raw feature values for the test set.

  • df_feature_specs (pandas.DataFrame) – Data frame containing the various specifications from the feature file.

  • standardize_features (bool) – Whether to standardize the features. Defaults to True.

  • truncate_outliers (bool) – Truncate outlier values if set in the configuration file. Defaults to True.

  • use_truncations (bool) – Whether we should use the truncation set for removing outliers. Defaults to False.

Return type:

Tuple[DataFrame, DataFrame, DataFrame]

Returns:

  • df_train_preprocessed (pandas.DataFrame) – Data frame with preprocessed training data.

  • df_test_preprocessed (pandas.DataFrame) – Data frame with preprocessed test data.

  • df_feature_info (pandas.DataFrame) – Data frame with feature information.

preprocess_new_data(df_input: DataFrame, df_feature_info: DataFrame, standardize_features: bool = True, truncate_outliers: bool = True) Tuple[DataFrame, DataFrame][source]

Preprocess feature values using the parameters in df_feature_info.

For more details on what these preprocessing parameters are, see the documentation.

Parameters:
  • df_input (pandas.DataFrame) – Data frame with raw feature values that will be used to generate the scores. Each feature is stored in a separate column. Each row corresponds to one response. There should also be a column named “spkitemid” containing a unique ID for each response.

  • df_feature_info (pandas.DataFrame) –

    Data frame with preprocessing parameters in the following columns:

    • “feature” : the name of the feature; should match the feature names in df_input.

    • “sign” : 1 or -1. Indicates whether the feature value needs to be multiplied by -1.

    • “transform” : transformation that needs to be applied to this feature.

    • “train_mean”, “train_sd” : mean and standard deviation for outlier truncation.

    • “train_transformed_mean”, “train_transformed_sd” : mean and standard deviation for computing z-scores.

  • standardize_features (bool) – Whether the features should be standardized prior to prediction. Defaults to True.

  • truncate_outliers (bool) – Whether outliers should be truncated prior to prediction. Defaults to True.

Return type:

Tuple[DataFrame, DataFrame]

Returns:

  • df_features_preprocessed (pandas.DataFrame) – Data frame with processed feature values.

  • df_excluded (pandas.DataFrame) – Data frame with responses excluded from further analysis due to non-numeric feature values in the original file or after applying transformations. This data frame always contains the original feature values.

Raises:
  • KeyError – If some of the features specified in df_feature_info are not present in df_input.

  • ValueError – If all responses have at least one non-numeric feature value and, therefore, no score can be generated for any of the responses.

process_data(config_obj: Configuration, data_container_obj: DataContainer, context: str = 'rsmtool') Tuple[Configuration, DataContainer][source]

Process and setup the data for an experiment in the given context.

Parameters:
  • config_obj (Configuration) – The configuration object.

  • data_container_obj (DataContainer) – The data container object.

  • context (str) – The tool context: one of {“rsmtool”, “rsmeval”, “rsmpredict”}. Defaults to “rsmtool”.

Return type:

Tuple[Configuration, DataContainer]

Returns:

  • config_obj (Configuration) – New configuration object containing the updated configuration.

  • data_container (DataContainer) – New data container object containing the preprocessed data.

Raises:

ValueError – If the context is not one of {“rsmtool”, “rsmeval”, “rsmpredict”}.

process_data_rsmeval(config_obj: Configuration, data_container_obj: DataContainer) Tuple[Configuration, DataContainer][source]

Set up rsmeval experiment by loading & preprocessing evaluation data.

This function takes a configuration object and a container object as input and returns the same types of objects as output after the loading, normalizing, and preprocessing.

Parameters:
  • config_obj (Configuration) – The configuration object.

  • data_container_obj (DataContainer) – The data container object.

Return type:

Tuple[Configuration, DataContainer]

Returns:

  • config_obj (Configuration) – New configuration object containing the updated rsmeval configuration.

  • data_container (DataContainer) – New data container object containing the preprocessed data.

Raises:
  • KeyError – If columns specified in the configuration do not exist in the predictions file.

  • ValueError – If the columns containing the human scores and the system scores in the predictions file have the same name.

  • ValueError – If the columns containing the first set of human scores and the second set of human scores in the predictions file have the same name.

  • ValueError – If the predictions file contains the same response ID more than once.

  • ValueError – If no responses are left after filtering out zero or non-numeric values for the various columns.

process_data_rsmexplain(config_obj: Configuration, data_container_obj: DataContainer) Tuple[Configuration, DataContainer][source]

Process data for rsmexplain experiments.

This function takes a configuration object and a container object as input and returns the same types of objects as output after the loading, normalizing, and preprocessing.

Parameters:
  • config_obj (Configuration) – The configuration object.

  • data_container_obj (DataContainer) – The data container object.

Return type:

Tuple[Configuration, DataContainer]

Returns:

  • config_obj (Configuration) – New configuration object containing the updated rsmexplain configuration.

  • data_container (DataContainer) – New data container object containing the preprocessed data.

Raises:

ValueError – If data contains duplicate response IDs.

process_data_rsmpredict(config_obj: Configuration, data_container_obj: DataContainer) Tuple[Configuration, DataContainer][source]

Process data for rsmpredict experiments.

This function takes a configuration object and a container object as input and returns the same types of objects as output after the loading, normalizing, and preprocessing.

Parameters:
  • config_obj (Configuration) – The configuration object.

  • data_container_obj (DataContainer) – The data container object.

Return type:

Tuple[Configuration, DataContainer]

Returns:

  • config_obj (Configuration) – New configuration object containing the updated rsmpredict configuration.

  • data_container (DataContainer) – New data container object containing the preprocessed data.

Raises:
  • KeyError – If columns specified in the configuration do not exist in the data.

  • ValueError – If data contains duplicate response IDs.

process_data_rsmtool(config_obj: Configuration, data_container_obj: DataContainer) Tuple[Configuration, DataContainer][source]

Set up rsmtool experiment by loading & preprocessing train/test data.

This function takes a configuration object and a container object as input and returns the same types of objects as output after the loading, normalizing, and preprocessing.

Parameters:
  • config_obj (Configuration) – The configuration object.

  • data_container_obj (DataContainer) – The data container object.

Return type:

Tuple[Configuration, DataContainer]

Returns:

  • config_obj (Configuration) – New configuration object containing the updated rsmtool configuration.

  • data_container (DataContainer) – New data container object containing the preprocessed data.

Raises:
  • ValueError – If columns specified in the configuration do not exist in the data.

  • ValueError – If the test label column and second human score columns have the same name.

  • ValueError – If the length column is requested as a feature.

  • ValueError – If the second human score column is requested as a feature.

  • ValueError – If “use_truncations” was specified in the configuration, but no feature CSV file was found.

process_predictions(df_test_predictions: DataFrame, train_predictions_mean: float, train_predictions_sd: float, human_labels_mean: float, human_labels_sd: float, trim_min: float, trim_max: float, trim_tolerance: float = 0.4998) DataFrame[source]

Process predictions to create scaled, trimmed and rounded predictions.

Parameters:
  • df_test_predictions (pandas.DataFrame) – Data frame containing the test set predictions.

  • train_predictions_mean (float) – The mean of the predictions on the training set.

  • train_predictions_sd (float) – The std. dev. of the predictions on the training set.

  • human_labels_mean (float) – The mean of the human scores used to train the model.

  • human_labels_sd (float) – The std. dev. of the human scores used to train the model.

  • trim_min (float) – The lowest score on the score point, used for trimming the raw regression predictions.

  • trim_max (float) – The highest score on the score point, used for trimming the raw regression predictions.

  • trim_tolerance (float) – Tolerance to be added to trim_max and subtracted from trim_min. Defaults to 0.4998.

Returns:

df_pred_processed – Data frame containing the various trimmed and rounded predictions.

Return type:

pandas.DataFrame

static remove_outliers(values: ndarray, mean: float | None = None, sd: float | None = None, sd_multiplier: int = 4) ndarray[source]

Remove outliers from given array of values by clamping them.

Clamp any given values that are ± sd_multiplier (\(m\)) standard deviations (\(\sigma\)) away from the mean (\(\mu\)). Use the given mean and sd instead of computing \(\mu\) and \(\sigma\) from the data, if specified. The values are clamped to the interval:

\[[\mu - m\sigma,\ \mu + m\sigma]\]
Parameters:
  • values (numpy.ndarray) – The values from which to remove outliers, usually corresponding to a given feature.

  • mean (Optional[float]) – Use the given mean value when computing outliers instead of the mean from the data. Defaults to None.

  • sd (Optional[float]) – Use the given std. dev. value when computing outliers instead of the std. dev. from the data. Defaults to None.

  • sd_multiplier (int) – Use the given multiplier for the std. dev. when computing the outliers. Defaults to 4.

Returns:

new_values – Numpy array with the outliers clamped.

Return type:

numpy.ndarray
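Example: a sketch of the clamping behavior; with mean 0, std. dev. 1, and the default multiplier of 4, values outside [-4, 4] are clamped to the boundary.

    import numpy as np
    from rsmtool.preprocessor import FeaturePreprocessor

    values = np.array([-10.0, -1.0, 0.0, 2.0, 12.0])
    clamped = FeaturePreprocessor.remove_outliers(values, mean=0.0, sd=1.0)
    print(clamped)   # expected: [-4. -1.  0.  2.  4.]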

remove_outliers_using_truncations(values: ndarray, feature_name: str, truncations: DataFrame) ndarray[source]

Remove outliers using pre-specified truncation groups.

This is different from remove_outliers(), which calculates the outlier boundaries from the training set rather than looking up the truncation values in a pre-specified data frame.

Parameters:
  • values (numpy.ndarray) – The values from which to remove outliers, usually corresponding to a given feature.

  • feature_name (str) – Name of the feature whose outliers are being clamped.

  • truncations (pandas.DataFrame) – A data frame with truncation values. The features should be set as the index.

Returns:

new_values – Numpy array with the outliers clamped.

Return type:

numpy.ndarray

rename_default_columns(df: DataFrame, requested_feature_names: List[str], id_column: str, first_human_score_column: str | None, second_human_score_column: str | None, length_column: str | None, system_score_column: str | None, candidate_column: str | None) DataFrame[source]

Standardize column names and rename columns with reserved column names.

RSMTool reserves some column names for internal use, e.g., “sc1”, “spkitemid” etc. If the given data already contains columns with these names, then they must be renamed to prevent conflict. This method renames such columns to “##NAME##”, e.g., an existing column named “sc1” will be renamed to “##sc1##”.

Parameters:
  • df (pandas.DataFrame) – The data frame containing the columns to rename.

  • requested_feature_names (List[str]) – List of feature column names that we want to include in the scoring model.

  • id_column (str) – Column name containing the response IDs.

  • first_human_score_column (Union[str, None]) – Column name containing the H1 scores. Should be None if no H1 scores are available.

  • second_human_score_column (Union[str, None]) – Column name containing the H2 scores. Should be None if no H2 scores are available.

  • length_column (Union[str, None]) – Column name containing response lengths. Should be None if lengths are not available.

  • system_score_column (Union[str, None]) – Column name containing the score predicted by the system. This is only used for rsmeval.

  • candidate_column (Union[str, None]) – Column name containing identifying information at the candidate level. Should be None if such information is not available.

Returns:

df – Modified input data frame with all the appropriate renamings.

Return type:

pandas.DataFrame

select_candidates(df: DataFrame, N: int, candidate_col: str = 'candidate') Tuple[DataFrame, DataFrame][source]

Select candidates which have responses to N or more items.

Parameters:
  • df (pandas.DataFrame) – The data frame from which to select candidates with N or more items.

  • N (int) – Minimum number of items per candidate.

  • candidate_col (str) – Name of the column that contains candidate IDs. Defaults to “candidate”.

Return type:

Tuple[DataFrame, DataFrame]

Returns:

  • df_included (pandas.DataFrame) – Data frame with responses from candidates with responses to N or more items.

  • df_excluded (pandas.DataFrame) – Data frame with responses from candidates with responses to fewer than N items.

trim(values: List[float] | ndarray, trim_min: float, trim_max: float, tolerance: float = 0.4998) ndarray[source]

Trim values in given numpy array.

The trimming uses trim_min - tolerance as the floor and trim_max + tolerance as the ceiling.

Parameters:
  • values (Union[List[float], numpy.ndarray]) – The values to trim.

  • trim_min (float) – The lowest score on the score point, used for trimming the raw regression predictions.

  • trim_max (float) – The highest score on the score point, used for trimming the raw regression predictions.

  • tolerance (float) – The tolerance that will be used to compute the trim interval. Defaults to 0.4998.

Returns:

trimmed_values – Numpy array containing the trimmed values.

Return type:

numpy.ndarray

Raises:

ValueError – If trim_min, trim_max, or tolerance are None.
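Example: a sketch of trimming raw predictions to a 1-6 scale with the default tolerance of 0.4998, which gives a floor of 0.5002 and a ceiling of 6.4998.

    import numpy as np
    from rsmtool.preprocessor import FeaturePreprocessor

    preprocessor = FeaturePreprocessor()
    trimmed = preprocessor.trim(np.array([0.1, 3.2, 7.4]),
                                trim_min=1, trim_max=6)
    print(trimmed)   # expected: [0.5002 3.2    6.4998]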

class rsmtool.preprocessor.FeatureSpecsProcessor(logger: Logger | None = None)[source]

Bases: object

Encapsulate feature file processing methods.

Initialize the FeatureSpecsProcessor object.

find_feature_sign(feature: str, sign_dict: Dict[str, str]) float[source]

Get the feature sign from the feature CSV file.

Parameters:
  • feature (str) – The name of the feature.

  • sign_dict (dict) – A dictionary of feature signs.

Returns:

feature_sign_numeric – The numeric sign for the feature.

Return type:

float

generate_default_specs(feature_names: List[str]) DataFrame[source]

Generate default feature “specifications” for given feature names.

The specifications are stored as a data frame with three columns “feature”, “transform”, and “sign”.

Parameters:

feature_names (List[str]) – List of feature names for which to generate specifications.

Returns:

feature_specs – A dataframe with feature specifications that can be saved as a feature list file.

Return type:

pandas.DataFrame

Note

Since these are default specifications, the values for the “transform” column for each feature will be “raw” and the value for the “sign” column will be 1.
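Example: a sketch of generating default specifications for two hypothetical feature names; per the note above, the transform is “raw” and the sign is 1 for every feature.

    from rsmtool.preprocessor import FeatureSpecsProcessor

    processor = FeatureSpecsProcessor()
    df_specs = processor.generate_default_specs(["grammar", "vocabulary"])
    print(df_specs)
    #       feature transform  sign
    # 0     grammar       raw     1
    # 1  vocabulary       raw     1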

generate_specs(df: DataFrame, feature_names: List[str], train_label: str, feature_subset: DataFrame | None = None, feature_sign: int | None = None) DataFrame[source]

Generate feature specifications using the feature CSV file.

Compute the specifications for “sign” and the correlation with score to identify the best transformation.

Parameters:
  • df (pandas.DataFrame) – The input data frame from which to generate the specifications.

  • feature_names (List[str]) – A list of feature names.

  • train_label (str) – The label column for the training data.

  • feature_subset (Optional[pandas.DataFrame]) – A data frame containing the feature subset specifications, if any. Defaults to None.

  • feature_sign (Optional[int]) – The sign of the feature. Defaults to None.

Returns:

df_feature_specs – The output data frame containing the feature specifications.

Return type:

pandas.DataFrame

validate_feature_specs(df: DataFrame, use_truncations: bool = False) DataFrame[source]

Validate given feature specifications.

Check given feature specifications to make sure that there are no duplicate feature names and that all columns are in the right format. Add the default values for “transform” and “sign” if none are given.

Parameters:
  • df (pandas.DataFrame) – The feature specification DataFrame to validate.

  • use_truncations (bool) – Whether to use truncation values. If this is True and truncation values are not specified, an exception is raised. Defaults to False.

Returns:

df_specs_new – The output data frame with normalized values.

Return type:

pandas.DataFrame

Raises:
  • KeyError – If the input data frame does not have a “feature” column.

  • ValueError – If there are duplicate values in the “feature” column.

  • ValueError – If the “sign” column contains invalid values.

  • ValueError – If use_truncations is set to True, and no “min” and “max” columns exist in the data set.

class rsmtool.preprocessor.FeatureSubsetProcessor(logger: Logger | None = None)[source]

Bases: object

Class to encapsulate feature sub-setting methods.

Initialize the FeatureSubsetProcessor object.

check_feature_subset_file(df: DataFrame, subset: str | None = None, sign: str | None = None) None[source]

Check that feature subset file is complete and in the correct format.

Raises an exception if it finds any errors but otherwise returns nothing.

Parameters:
  • df (pandas.DataFrame) – The data frame containing the feature subset file.

  • subset (Optional[str]) – Name of a pre-defined feature subset. Defaults to None.

  • sign (Optional[str]) – Value of the sign. Defaults to None.

Raises:
  • ValueError – If any columns are missing from the subset file.

  • ValueError – If any of the columns contain invalid values.

Return type:

None

select_by_subset(feature_columns: List[str], feature_subset_specs: DataFrame, subset: str) List[str][source]

Select feature columns using feature subset specifications.

Parameters:
  • feature_columns (List[str]) – A list of feature columns.

  • feature_subset_specs (pandas.DataFrame) – The feature subset specification data frame.

  • subset (str) – The name of the subset to select.

Returns:

feature_names – A list of feature names to include.

Return type:

List[str]

From prmse Module

rsmtool.utils.prmse.prmse_true(system: ndarray, human_scores: ndarray, variance_errors_human: float | None = None) float | None[source]

Compute PRMSE when predicting true score from system scores.

PRMSE = Proportional Reduction in Mean Squared Error. The formula to compute PRMSE implemented in RSMTool was derived at ETS by Matthew S. Johnson. See Loukina et al. (2020) for further information about PRMSE.

Parameters:
  • system (numpy.ndarray) – System scores for each response of shape (n_samples,).

  • human_scores (numpy.ndarray) – Human ratings for each response of shape (n_samples, n_ratings).

  • variance_errors_human (Optional[float]) – Estimated variance of errors in human scores. If None, the variance will be estimated from the data. In this case at least some responses must have more than one human score. Defaults to None.

Returns:

prmse – Proportional reduction in mean squared error. If the variance of errors in human scores is not available and cannot be estimated from the data, returns None.

Return type:

Optional[float]

Raises:

ValueError – If variance of true scores or MSE could not be computed.
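
A minimal sketch with simulated scores; each response has two human ratings so that the error variance can be estimated from the data (all values are made up):

    import numpy as np
    from rsmtool.utils.prmse import prmse_true

    system = np.array([1.1, 2.0, 2.9, 4.2])
    # shape (n_samples, n_ratings): two human ratings per response
    human_scores = np.array([[1, 2], [2, 2], [3, 3], [4, 5]])

    # variance_errors_human is omitted, so it is estimated from the data
    prmse = prmse_true(system, human_scores)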

rsmtool.utils.prmse.variance_of_errors(human_scores: ndarray) float | None[source]

Estimate the variance of errors in human scores.

Parameters:

human_scores (numpy.ndarray) – Human ratings for each response of shape (n_samples, n_ratings).

Returns:

variance_of_errors – Estimated variance of errors in human scores. If the variance of errors cannot be estimated from the data, returns None.

Return type:

Optional[float]
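
A minimal usage sketch with the same kind of double-scored data as above (values are made up):

    import numpy as np
    from rsmtool.utils.prmse import variance_of_errors

    human_scores = np.array([[1, 2], [2, 2], [3, 3], [4, 5]])
    v_e = variance_of_errors(human_scores)  # estimated error variance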

From reader Module

Classes for reading data files (or dictionaries) into DataContainer objects.

author:

Jeremy Biggs (jbiggs@ets.org)

author:

Anastassia Loukina (aloukina@ets.org)

author:

Nitin Madnani (nmadnani@ets.org)

organization:

ETS

class rsmtool.reader.DataReader(filepaths: List[str], framenames: List[str], file_converters: Dict[str, Dict[str, Any]] | None = None)[source]

Bases: object

Class to generate DataContainer objects.

Initialize a DataReader object.

Parameters:
  • filepaths (List[str]) – A list of paths to files that are to be read in. Some of the paths can be empty strings.

  • framenames (List[str]) – A list of names for the data sets to be included in the container.

  • file_converters (Optional[Dict[str, Dict[str, Any]]]) – A dictionary of file converter dicts. The keys are the data set names and the values are the converter dictionaries to be applied to the corresponding data set. Defaults to None.

Raises:
  • AssertionError – If len(filepaths) does not equal len(framenames).

  • ValueError – If file_converters is not a dictionary or if any of its values is not a dictionary.

  • NameError – If a key in file_converters does not exist in framenames.

  • ValueError – If any of the specified file paths is None.

static locate_files(filepaths: str | List[str], configdir: str) List[str][source]

Locate an experiment file or a list of experiment files.

If a given path does not exist as specified, it is also tried relative to the location of the configuration file. If the file still cannot be found, the corresponding entry in the returned list is an empty string.

Parameters:
  • filepaths (Union[str, List[str]]) – Name(s) of the experiment file we want to locate.

  • configdir (str) – Path to the reference configuration directory (usually the directory of the config file).

Returns:

retval – List of absolute paths to the located files. If a file does not exist, the corresponding element in the list is an empty string.

Return type:

List[str]

Raises:

ValueError – If filepaths is not a string or a list.

read(kwargs_dict: Dict[str, Dict[str, Any]] | None = None) DataContainer[source]

Read all files contained in self.dataset_paths.

Parameters:

kwargs_dict (Optional[Dict[str, Dict[str, Any]]]) – Dictionary with the names of the datasets as keys and dictionaries of keyword arguments to pass to the pandas reader for each dataset as values. The keys in those dictionaries are the names of the keyword arguments and the values are the values of the keyword arguments. Defaults to None.

Returns:

datacontainer – A data container object.

Return type:

DataContainer

Raises:

FileNotFoundError – If any of the files in self.dataset_paths does not exist.
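
A minimal usage sketch (the file paths and frame names are hypothetical; frames are assumed to be retrievable from the returned DataContainer by name):

    from rsmtool.reader import DataReader

    reader = DataReader(
        filepaths=["train.csv", "test.csv"],
        framenames=["train", "test"],
    )
    container = reader.read()
    df_train = container["train"]  # look up a frame by its name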

static read_from_file(filename: str, converters: Dict[str, Any] | None = None, **kwargs)[source]

Read a CSV/TSV/XLSX/JSONLINES/SAS7BDAT file and return a data frame.

Parameters:
  • filename (str) – Name of file to read.

  • converters (Optional[Dict[str, Any]]) – A dictionary specifying how the types of the columns in the file should be converted. Specified in the same format as for pandas.read_csv(). Defaults to None.

Returns:

df – Data frame containing the data in the given file.

Return type:

pandas.DataFrame

Raises:

ValueError – If the file does not have one of the supported extensions.

Note

Any additional keyword arguments are passed to the underlying pandas IO reader function.

rsmtool.reader.read_jsonlines(filename: str, converters: Dict[str, Any] | None = None) DataFrame[source]

Read a data file in .jsonlines format into a data frame.

Normalize nested JSON objects with up to one level of nesting.

Parameters:
  • filename (str) – Name of file to read.

  • converters (Optional[Dict[str, Any]]) – A dictionary specifying how the types of the columns in the file should be converted. Specified in the same format as for pandas.read_csv(). Defaults to None.

Returns:

df – Data frame containing the data in the given file.

Return type:

pandas.DataFrame

rsmtool.reader.try_to_load_file(filename: str, converters: Dict[str, Any] | None = None, raise_error: bool = False, raise_warning: bool = False, **kwargs) DataFrame | None[source]

Read a single file, if it exists.

If the file cannot be found, this function can optionally raise an error or warning; if neither is requested, it returns None.

Parameters:
  • filename (str) – Name of file to read.

  • converters (Optional[Dict[str, Any]]) – A dictionary specifying how the types of the columns in the file should be converted. Specified in the same format as for pandas.read_csv(). Defaults to None.

  • raise_error (bool) – Raise an error if the file cannot be located. Defaults to False.

  • raise_warning (bool) – Raise a warning if the file cannot be located. Defaults to False.

Returns:

df – DataFrame containing the data in the given file, or None if the file does not exist.

Return type:

Optional[pandas.DataFrame]

Raises:

FileNotFoundError – If raise_error is True and the file cannot be located.
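
A minimal usage sketch (the file name is hypothetical):

    from rsmtool.reader import try_to_load_file

    # returns a data frame if the file exists and None otherwise;
    # pass raise_error=True to get a FileNotFoundError instead
    df = try_to_load_file("extra_metadata.csv", raise_warning=True)
    if df is None:
        print("file not found; proceeding without it")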

From reporter Module

Classes for dealing with report generation.

author:

Jeremy Biggs (jbiggs@ets.org)

author:

Anastassia Loukina (aloukina@ets.org)

author:

Nitin Madnani (nmadnani@ets.org)

organization:

ETS

class rsmtool.reporter.Reporter(logger: Logger | None = None, wandb_run: Run | RunDisabled | None = None)[source]

Bases: object

Class to generate Jupyter notebook reports and convert them to HTML.

Initialize the Reporter object.

Parameters:
  • logger (Optional[logging.Logger]) – A Logger object. If None is passed, get logger from __name__. Defaults to None.

  • wandb_run (Union[wandb.wandb_run.Run, wandb.sdk.lib.RunDisabled, None]) – A wandb run object that will be used to log artifacts and tables. If None is passed, a new wandb run will be initialized if wandb is enabled in the configuration. Defaults to None.

static check_section_names(specified_sections: List[str], section_type: str, context: str = 'rsmtool') None[source]

Validate the specified section names.

This function checks whether the specified section names are valid and raises an exception if they are not.

Parameters:
  • specified_sections (List[str]) – List of report section names.

  • section_type (str) – One of “general” or “special”.

  • context (str) – Context in which we are validating the section names. One of {"rsmtool", "rsmeval", "rsmcompare"}. Defaults to "rsmtool".

Raises:

ValueError – If any of the section names of the given type are not valid in the context of the given tool.

Return type:

None

static check_section_order(chosen_sections: List[str], section_order: List[str]) None[source]

Check the order of the specified sections.

Parameters:
  • chosen_sections (List[str]) – List of chosen section names.

  • section_order (List[str]) – An ordered list of the chosen section names.

Raises:

ValueError – If any sections specified in the order are missing from the list of chosen sections or vice versa.

Return type:

None

static convert_ipynb_to_html(notebook_file: str, html_file: str)[source]

Convert given Jupyter notebook (.ipynb) to HTML file.

Parameters:
  • notebook_file (str) – Path to input Jupyter notebook file.

  • html_file (str) – Path to output HTML file.

Note

This function is also exposed as the render_notebook command-line utility.
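
A minimal usage sketch (the notebook and output paths are hypothetical):

    from rsmtool.reporter import Reporter

    # static method, so no Reporter instance is needed
    Reporter.convert_ipynb_to_html("report.ipynb", "report.html")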

create_comparison_report(config, csvdir_old, figdir_old, csvdir_new, figdir_new, output_dir)[source]

Generate an HTML report for comparing two rsmtool experiments.

Parameters:
  • config (configuration_parser.Configuration) – A configuration object.

  • csvdir_old (str) – The old experiment CSV output directory.

  • figdir_old (str) – The old experiment figure output directory.

  • csvdir_new (str) – The new experiment CSV output directory.

  • figdir_new (str) – The new experiment figure output directory.

  • output_dir (str) – The output directory for the new report.

create_explanation_report(config, csv_dir, output_dir)[source]

Generate an HTML report for rsmexplain.

Parameters:
  • config (configuration_parser.Configuration) – A configuration object.

  • csv_dir (str) – The experiment output directory containing CSV files with SHAP values.

  • output_dir (str) – The directory for the HTML report.

create_report(config: Configuration, csvdir: str, figdir: str, context: str = 'rsmtool') None[source]

Generate HTML report for an rsmtool/rsmeval experiment.

Parameters:
  • config (Configuration) – A configuration object.

  • csvdir (str) – The CSV output directory.

  • figdir (str) – The figure output directory.

  • context (str) – Context of the tool in which we are validating. One of {"rsmtool", "rsmeval"}. Defaults to "rsmtool".

Raises:

KeyError – If the test_file_location or pred_file_location fields are not specified in the given configuration.

Return type:

None

create_summary_report(config, all_experiments, csvdir)[source]

Generate an HTML report for summarizing the given rsmtool experiments.

Parameters:
  • config (configuration_parser.Configuration) – A configuration object.

  • all_experiments (List[str]) – A list of experiments to summarize.

  • csvdir (str) – The experiment output directory.

determine_chosen_sections(general_sections: List[str], custom_sections: List[str], subgroups: List[str], context: str = 'rsmtool') List[str][source]

Compile a combined list of section names to be included in the report.

Parameters:
  • general_sections (List[str]) – List of specified general section names.

  • custom_sections (List[str]) – List of specified custom sections, if any.

  • subgroups (List[str]) – List of column names that contain grouping information.

  • context (str) – Context of the tool in which we are validating. One of {"rsmtool", "rsmeval", "rsmcompare"}. Defaults to "rsmtool".

Returns:

chosen_sections – Final list of chosen sections that are to be included in the HTML report.

Return type:

List[str]

Raises:

ValueError – If a subgroup report section is requested but no subgroups were specified in the configuration file.

get_ordered_notebook_files(general_sections: List[str], custom_sections: List[str] = [], section_order: List[str] | None = None, subgroups: List[str] = [], model_type: str | None = None, context: str = 'rsmtool') List[str][source]

Check all section names and the order of the sections.

Combine all section names with the appropriate file mapping, and generate an ordered list of notebook files that are needed to generate the final report.

Parameters:
  • general_sections (List[str]) – List of specified general sections.

  • custom_sections (List[str]) – List of specified custom sections, if any. Defaults to [].

  • section_order (Optional[List[str]]) – Ordered list in which the user wants the specified sections. Defaults to None.

  • subgroups (List[str]) – List of column names that contain grouping information. Defaults to [].

  • model_type (Optional[str]) – Type of the model. Possible values are {"BUILTIN", "SKLL", None}. We allow None here so that rsmeval can use the same function. Defaults to None.

  • context (str) – Context of the tool in which we are validating. One of {"rsmtool", "rsmeval", "rsmcompare"}. Defaults to "rsmtool".

Returns:

chosen_notebook_files – List of the IPython notebook files that have to be rendered into the HTML report.

Return type:

List[str]

get_section_file_map(custom_sections: List[str], model_type: str | None = None, context: str = 'rsmtool') Dict[str, str][source]

Map section names to IPython notebook filenames.

Parameters:
  • custom_sections (List[str]) – List of custom sections.

  • model_type (Optional[str]) – Type of the model. One of {"BUILTIN", "SKLL", None}. We allow None here so that rsmeval can use the same function. Defaults to None.

  • context (str) – Context of the tool in which we are validating. One of {"rsmtool", "rsmeval", "rsmcompare"}. Defaults to "rsmtool".

Returns:

section_file_map – Dictionary mapping each section name to the corresponding IPython notebook filename.

Return type:

dict

static locate_custom_sections(custom_report_section_paths: List[str], configdir: str) List[str][source]

Locate custom report section files.

Get the absolute paths for custom report sections and check that the files exist. If a file does not exist, raise an exception.

Parameters:
  • custom_report_section_paths (List[str]) – List of paths to IPython notebook files representing the custom sections.

  • configdir (str) – Path to the experiment configuration directory.

Returns:

custom_report_sections – List of absolute paths to the custom section notebooks.

Return type:

List[str]

Raises:

FileNotFoundError – If any of the files cannot be found.

static merge_notebooks(notebook_files: List[str], output_file: str) None[source]

Merge the given Jupyter notebooks into a single Jupyter notebook.

Parameters:
  • notebook_files (List[str]) – List of paths to the input Jupyter notebook files.

  • output_file (str) – Path to output Jupyter notebook file.

Return type:

None

From transformer Module

Class for transforming features.

author:

Jeremy Biggs (jbiggs@ets.org)

author:

Anastassia Loukina (aloukina@ets.org)

author:

Nitin Madnani (nmadnani@ets.org)

organization:

ETS

class rsmtool.transformer.FeatureTransformer(logger: Logger | None = None)[source]

Bases: object

Encapsulate feature transformation methods.

Initialize the FeatureTransformer object.

Parameters:

logger (Optional[logging.Logger]) – Logger object to use in the transformer. If not provided, a logger will be created with the name of this class.

apply_add_one_inverse_transform(name: str, values: ndarray, raise_error: bool = True) ndarray[source]

Apply the “addOneInv” (add one and invert) transform to values.

Parameters:
  • name (str) – Name of the feature to transform.

  • values (numpy.ndarray) – Numpy array containing the feature values.

  • raise_error (bool) – If True, raises an error if the transform is applied to a feature that has zero or negative values. Defaults to True.

Returns:

new_data – Numpy array containing the transformed feature values.

Return type:

numpy.ndarray

Raises:

ValueError – If the transform is applied to a feature that has negative values and raise_error is True.

apply_add_one_log_transform(name: str, values: ndarray, raise_error: bool = True) ndarray[source]

Apply the “addOneLn” (add one and log) transform to values.

Parameters:
  • name (str) – Name of the feature to transform.

  • values (numpy.ndarray) – Numpy array containing the feature values.

  • raise_error (bool) – If True, raises an error if the transform is applied to a feature that has zero or negative values. Defaults to True.

Returns:

new_data – Numpy array that contains the transformed feature values.

Return type:

numpy.ndarray

Raises:

ValueError – If the transform is applied to a feature that has negative values and raise_error is True.

apply_inverse_transform(name: str, values: ndarray, raise_error: bool = True, sd_multiplier: int = 4) ndarray[source]

Apply the “inv” (inverse) transform to values.

Parameters:
  • name (str) – Name of the feature to transform.

  • values (numpy.ndarray) – Numpy array containing the feature values.

  • raise_error (bool) – If True, raises an error if the transform is applied to a feature that has zero values or to a feature that has both positive and negative values. Defaults to True.

  • sd_multiplier (int) – Use this std. dev. multiplier to compute the ceiling and floor for outlier removal and check that these are not equal to zero. Defaults to 4.

Returns:

new_data – Numpy array containing the transformed feature values.

Return type:

numpy.ndarray

Raises:

ValueError – If the transform is applied to a feature that has zero values or to a feature that can have both positive and negative values, and raise_error is True.

apply_log_transform(name: str, values: ndarray, raise_error: bool = True) ndarray[source]

Apply the “log” transform to values.

Parameters:
  • name (str) – Name of the feature to transform.

  • values (numpy.ndarray) – Numpy array containing the feature values.

  • raise_error (bool) – If True, raises an error if the transform is applied to a feature that has zero or negative values. Defaults to True.

Returns:

new_data – Numpy array containing the transformed feature values.

Return type:

numpy.ndarray

Raises:

ValueError – If the transform is applied to a feature that has zero or negative values and raise_error is True.

apply_sqrt_transform(name: str, values: ndarray, raise_error: bool = True) ndarray[source]

Apply the “sqrt” transform to values.

Parameters:
  • name (str) – Name of the feature to transform.

  • values (numpy.ndarray) – Numpy array containing the feature values.

  • raise_error (bool) – If True, raises an error if the transform is applied to a feature that has negative values. Defaults to True.

Returns:

new_data – Numpy array containing the transformed feature values.

Return type:

numpy.ndarray

Raises:

ValueError – If the transform is applied to a feature that has negative values and raise_error is True.

find_feature_transform(feature_name: str, feature_value: Series, scores: Series) str[source]

Identify best transformation for feature given correlation with score.

The best transformation is chosen based on the absolute Pearson correlation with human score.

Parameters:
  • feature_name (str) – Name of feature for which to find the transformation.

  • feature_value (pandas.Series) – Series containing feature values.

  • scores (pandas.Series) – Numeric human scores.

Returns:

best_transformation – The name of the transformation which gives the highest correlation between the feature values and the human scores. See documentation for the full list of transformations.

Return type:

str

transform_feature(values: ndarray, column_name: str, transform: str, raise_error: bool = True) ndarray[source]

Apply given transform to all values in the given numpy array.

The values are assumed to be for the feature with the given name.

Parameters:
  • values (numpy.ndarray) – Numpy array containing the feature values.

  • column_name (str) – Name of the feature to transform.

  • transform (str) – Name of the transform to apply. One of {"inv", "sqrt", "log", "addOneInv", "addOneLn", "raw", "org"}.

  • raise_error (bool) – If True, raise a ValueError if a transformation leads to invalid values or may change the ranking of the responses. Defaults to True.

Returns:

new_data – Numpy array containing the transformed feature values.

Return type:

numpy.ndarray

Raises:

ValueError – If the given transform is not recognized.

Note

Many of these transformations may be meaningless for features which span both negative and positive values. Some transformations may throw errors for negative feature values.
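
A minimal usage sketch (the feature name and values are hypothetical):

    import numpy as np
    from rsmtool.transformer import FeatureTransformer

    transformer = FeatureTransformer()
    values = np.array([1.0, 4.0, 9.0])

    # apply the "sqrt" transform to a hypothetical feature
    transformed = transformer.transform_feature(values, "word_count", "sqrt")
    # -> array([1., 2., 3.])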

From utils Module

class rsmtool.utils.commandline.ConfigurationGenerator(context: str, as_string: bool = False, suppress_warnings: bool = False, use_subgroups: bool = False)[source]

Class to encapsulate automated batch-mode and interactive generation of configuration files.

context

Name of the command-line tool for which we are generating the configuration file.

Type:

str

as_string

If True, return a formatted and indented string representation of the configuration, rather than a dictionary. Note that this only affects the batch-mode generation. Interactive generation always returns a string. Defaults to False.

Type:

bool

suppress_warnings

If True, do not generate any warnings for batch-mode generation. Defaults to False.

Type:

bool

use_subgroups

If True, include subgroup-related sections in the list of general sections in the configuration file. Defaults to False.

Type:

bool

Create a new ConfigurationGenerator instance.

See attributes above.

ConfigurationGenerator.generate() str | Dict[str, Any][source]

Automatically generate an example configuration in batch mode.

Returns:

configuration – The generated configuration either as a dictionary or a formatted string, depending on the value of the as_string attribute.

Return type:

Union[str, Dict[str, Any]]
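
A minimal usage sketch for batch-mode generation:

    from rsmtool.utils.commandline import ConfigurationGenerator

    # generate an example rsmtool configuration as a formatted string
    generator = ConfigurationGenerator("rsmtool", as_string=True)
    config_string = generator.generate()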

rsmtool.utils.metrics.agreement(score1: List[int], score2: List[int], tolerance: int = 0) float[source]

Compute the agreement between two raters, under the given tolerance.

Parameters:
  • score1 (List[int]) – List of rater 1 scores.

  • score2 (List[int]) – List of rater 2 scores.

  • tolerance (int) – Difference in scores that is acceptable. Defaults to 0.

Returns:

agreement_value – The percentage agreement between the two scores.

Return type:

float
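
A minimal usage sketch (the score lists are made up):

    from rsmtool.utils.metrics import agreement

    score1 = [1, 2, 3, 4]
    score2 = [1, 2, 4, 4]

    exact = agreement(score1, score2)                  # exact agreement
    adjacent = agreement(score1, score2, tolerance=1)  # within one point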

rsmtool.utils.metrics.difference_of_standardized_means(y_true_observed: ndarray, y_pred: ndarray, population_y_true_observed_mn: float | None = None, population_y_pred_mn: float | None = None, population_y_true_observed_sd: float | None = None, population_y_pred_sd: float | None = None, ddof: int = 1) float | None[source]

Calculate the difference between standardized means.

First, standardize both observed and predicted scores to z-scores using mean and standard deviation for the whole population. Then calculate differences between standardized means for each subgroup.

Parameters:
  • y_true_observed (numpy.ndarray) – The observed scores for the group or subgroup.

  • y_pred (numpy.ndarray) – The predicted scores for the group or subgroup.

  • population_y_true_observed_mn (Optional[float]) – The population true score mean. When the DSM is being calculated for a subgroup, this should be the mean for the whole population. Defaults to None.

  • population_y_pred_mn (Optional[float]) – The predicted score mean. When the DSM is being calculated for a subgroup, this should be the mean for the whole population. Defaults to None.

  • population_y_true_observed_sd (Optional[float]) – The population true score standard deviation. When the DSM is being calculated for a subgroup, this should be the standard deviation for the whole population. Defaults to None.

  • population_y_pred_sd (Optional[float]) – The predicted score standard deviation. When the DSM is being calculated for a subgroup, this should be the standard deviation for the whole population. Defaults to None.

  • ddof (int) – The delta degrees of freedom. The divisor used in calculations is N - ddof where N represents the number of elements. Defaults to 1.

Returns:

difference_of_std_means – The difference of standardized means.

Return type:

Optional[float]

Raises:
  • ValueError – If only one of population_y_true_observed_mn and population_y_true_observed_sd is not None.

  • ValueError – If only one of population_y_pred_mn and population_y_pred_sd is not None.

rsmtool.utils.metrics.partial_correlations(df: DataFrame) DataFrame[source]

Implement the R pcor function from the ppcor package in Python.

This computes partial correlations of each pair of variables in the given data frame df, excluding all other variables.

Parameters:

df (pandas.DataFrame) – Data frame containing the feature values.

Returns:

df_pcor – Data frame containing the partial correlations of each pair of variables in the given data frame df, excluding all other variables.

Return type:

pandas.DataFrame
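
A minimal usage sketch (the feature values are made up):

    import pandas as pd
    from rsmtool.utils.metrics import partial_correlations

    df = pd.DataFrame({
        "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
        "f2": [2.0, 1.5, 3.5, 3.0, 5.5],
        "f3": [1.0, 3.0, 2.0, 5.0, 4.0],
    })
    df_pcor = partial_correlations(df)  # pairwise partial correlations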

rsmtool.utils.metrics.quadratic_weighted_kappa(y_true_observed: ndarray, y_pred: ndarray, ddof: int = 0) float[source]

Calculate quadratic-weighted kappa for both discrete and continuous values.

The formula to compute quadratic-weighted kappa for continuous values was developed at ETS by Shelby Haberman. See Haberman (2019) for the full derivation. The discrete case is simply treated as a special case of the continuous one.

The formula is as follows:

\(QWK = \displaystyle\frac{2 \cdot Cov(M, H)}{Var(H) + Var(M) + (\bar{M} - \bar{H})^2}\), where

  • \(Cov\) - covariance with normalization by \(N\) (the total number of observations)

  • \(H\) - the human score

  • \(M\) - the system score

  • \(\bar{H}\) - mean of \(H\)

  • \(\bar{M}\) - mean of \(M\)

  • \(Var(X)\) - variance of \(X\)

Parameters:
  • y_true_observed (numpy.ndarray) – The observed scores.

  • y_pred (numpy.ndarray) – The predicted scores.

  • ddof (int) – Means Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. When ddof is set to zero, the results for discrete case match those from the standard implementations. Defaults to 0.

Returns:

kappa – The quadratic weighted kappa.

Return type:

float

Raises:

AssertionError – If the number of elements in y_true_observed is not equal to the number of elements in y_pred.
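
A minimal usage sketch; the continuous system scores below are made up:

    import numpy as np
    from rsmtool.utils.metrics import quadratic_weighted_kappa

    human = np.array([1, 2, 3, 4, 4])
    system = np.array([1.2, 2.1, 2.8, 3.9, 4.4])  # continuous values are fine

    qwk = quadratic_weighted_kappa(human, system)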

rsmtool.utils.metrics.standardized_mean_difference(y_true_observed: ndarray, y_pred: ndarray, population_y_true_observed_sd: float | None = None, population_y_pred_sd: float | None = None, method: str = 'unpooled', ddof: int = 1) float[source]

Compute the standardized mean difference between system and human scores.

The numerator is calculated as mean(y_pred) - mean(y_true_observed) for all of the available methods.

Parameters:
  • y_true_observed (numpy.ndarray) – The observed scores for the group or subgroup.

  • y_pred (numpy.ndarray) – The predicted scores for the group or subgroup.

  • population_y_true_observed_sd (Optional[float]) – The population true score standard deviation. When the SMD is being calculated for a subgroup, this should be the standard deviation for the whole population. Defaults to None.

  • population_y_pred_sd (Optional[float]) – The predicted score standard deviation. When the SMD is being calculated for a subgroup, this should be the standard deviation for the whole population. Defaults to None.

  • method (str) –

    The SMD method to use. Possible options are:

    • "williamson": Denominator is the pooled population standard deviation of y_true_observed and y_pred computed using population_y_true_observed_sd and population_y_pred_sd.

    • "johnson": Denominator is population_y_true_observed_sd.

    • "pooled": Denominator is the pooled standard deviation of y_true_observed and y_pred for this group.

    • "unpooled": Denominator is the standard deviation of y_true_observed for this group.

    Defaults to "unpooled".

  • ddof (int) – The delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. Defaults to 1.

Returns:

smd – The SMD for the given group or subgroup.

Return type:

float

Raises:
  • ValueError – If method is “williamson” and either population_y_true_observed_sd or population_y_pred_sd is None.

  • ValueError – If method is “johnson” and population_y_true_observed_sd is None.

  • ValueError – If method is not one of {“unpooled”, “pooled”, “williamson”, “johnson”}.

Note

  • The “williamson” implementation was recommended by Williamson et al. (2012).

  • The metric is only applicable when both sets of scores are on the same scale.
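
A minimal usage sketch (the scores and population standard deviations are made up):

    import numpy as np
    from rsmtool.utils.metrics import standardized_mean_difference

    human = np.array([2.0, 3.0, 3.0, 4.0])
    system = np.array([2.5, 3.0, 3.5, 4.0])

    # default "unpooled" method: denominator is the SD of the human scores
    smd = standardized_mean_difference(human, system)

    # "williamson" requires the population standard deviations
    smd_w = standardized_mean_difference(
        human,
        system,
        population_y_true_observed_sd=0.8,  # hypothetical population SDs
        population_y_pred_sd=0.7,
        method="williamson",
    )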

rsmtool.utils.metrics.compute_expected_scores_from_model(model: Learner, featureset: FeatureSet, min_score: float, max_score: float) ndarray[source]

Compute expected scores using probability distributions over labels.

This function only works with SKLL models.

Parameters:
  • model (skll.learner.Learner) – The SKLL learner object to use for computing the expected scores.

  • featureset (skll.data.FeatureSet) – The SKLL featureset object for which predictions are to be made.

  • min_score (float) – Minimum score level to be used for computing expected scores.

  • max_score (float) – Maximum score level to be used for computing expected scores.

Returns:

expected_scores – A numpy array containing the expected scores.

Return type:

numpy.ndarray

Raises:
  • ValueError – If the given model cannot predict probability distributions.

  • ValueError – If the score range specified by min_score and max_score does not match what the model predicts in its probability distribution.

rsmtool.utils.notebook.get_thumbnail_as_html(path_to_image: str, image_id: int, path_to_thumbnail: str | None = None) str[source]

Generate HTML for a clickable thumbnail of given image.

Given the path to an image file, generate the HTML for a clickable thumbnail version of the image. When clicked, this HTML will open the full-sized version of the image in a new window.

Parameters:
  • path_to_image (str) – The absolute or relative path to the image. If an absolute path is provided, it will be converted to a relative path.

  • image_id (int) – The id of the <img> tag in the HTML. This must be unique for each <img> tag.

  • path_to_thumbnail (Optional[str]) – If you would like to use a different thumbnail image, specify the path to this thumbnail. Defaults to None.

Returns:

image – The HTML string generated for the image.

Return type:

str

Raises:

FileNotFoundError – If the image file cannot be located.

rsmtool.utils.notebook.show_thumbnail(path_to_image: str, image_id: int, path_to_thumbnail: str | None = None) None[source]

Display the HTML for an image thumbnail in a Jupyter notebook.

Given the path to an image file, generate the HTML for its thumbnail and display it in the notebook.

Parameters:
  • path_to_image (str) – The absolute or relative path to the image. If an absolute path is provided, it will be converted to a relative path.

  • image_id (int) – The id of the <img> tag in the HTML. This must be unique for each <img> tag.

  • path_to_thumbnail (Optional[str]) – If you would like to use a different thumbnail image, specify the path to the thumbnail. Defaults to None.

Return type:

None

rsmtool.utils.files.parse_json_with_comments(pathlike: str | Path) Dict[str, Any][source]

Parse a JSON file after removing any comments.

Comments can use either // for single-line comments or /* ... */ for multi-line comments. The input filepath can be a string or pathlib.Path.

Parameters:

pathlike (Union[str, Path]) – Path to the input JSON file either as a string or as a pathlib.Path object.

Returns:

obj – JSON object representing the input file.

Return type:

Dict[str, Any]
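
A minimal usage sketch ("config.json" is a hypothetical path):

    from rsmtool.utils.files import parse_json_with_comments

    # the file may contain // single-line and /* multi-line */ comments
    config_dict = parse_json_with_comments("config.json")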

From writer Module

Class for writing DataContainer frames to disk.

author:

Jeremy Biggs (jbiggs@ets.org)

author:

Anastassia Loukina (aloukina@ets.org)

author:

Nitin Madnani (nmadnani@ets.org)

organization:

ETS

class rsmtool.writer.DataWriter(experiment_id: str | None = None, context: str | None = None, wandb_run: Run | RunDisabled | None = None)[source]

Bases: object

Class to write out DataContainer objects.

Initialize the DataWriter object.

Parameters:
  • experiment_id (Optional[str]) – The experiment name to be used in the output file names. Defaults to None.

  • context (Optional[str]) – The context in which this writer is used. Defaults to None.

  • wandb_run (Union[wandb.wandb_run.Run, wandb.sdk.lib.RunDisabled, None]) – The wandb run object if wandb is enabled, None otherwise. If enabled, all the output data frames will be logged to this run as tables. Defaults to None.

write_experiment_output(csvdir: str, container_or_dict: DataContainer | Dict[str, DataFrame], dataframe_names: List[str] | None = None, new_names_dict: Dict[str, str] | None = None, include_experiment_id: bool = True, reset_index: bool = False, file_format: str = 'csv', index: bool = False, **kwargs) None[source]

Write out each of the named frames to disk.

This function writes out each of the given list of data frames as a “.csv”, “.tsv”, or “.xlsx” file in the given directory. Each data frame was generated as part of running an RSMTool experiment. All files are prefixed with the given experiment ID and suffixed with either the name of the data frame in the DataContainer (or dict) object, or a new name if new_names_dict is specified. Additionally, the indexes in the data frames are reset if so specified.

Parameters:
  • csvdir (str) – Path to the output experiment sub-directory that will contain the CSV files corresponding to each of the data frames.

  • container_or_dict (Union[container.DataContainer, Dict[str, pd.DataFrame]]) – A DataContainer object or dict, where keys are data frame names and values are pandas.DataFrame objects.

  • dataframe_names (Optional[List[str]]) – List of data frame names, one for each of the data frames. Defaults to None.

  • new_names_dict (Optional[Dict[str, str]]) – New dictionary with new names for the data frames, if desired. Defaults to None.

  • include_experiment_id (bool) – Whether to include the experiment ID in the file name. Defaults to True.

  • reset_index (bool) – Whether to reset the index of each data frame before writing to disk. Defaults to False.

  • file_format (str) – The file format in which to output the data. One of {"csv", "xlsx", "tsv"}. Defaults to "csv".

  • index (bool) – Whether to include the index in the output file. Defaults to False.

Raises:

KeyError – If file_format is not valid, or a named data frame is not present in container_or_dict.

Return type:

None
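
A minimal usage sketch; the experiment ID, frame name, and output directory are hypothetical:

    import os

    import pandas as pd
    from rsmtool.writer import DataWriter

    writer = DataWriter(experiment_id="my_experiment")
    frames = {"scores": pd.DataFrame({"id": [1, 2], "score": [3.0, 4.0]})}

    os.makedirs("output", exist_ok=True)
    # expected to produce "output/my_experiment_scores.csv"
    writer.write_experiment_output("output", frames, file_format="csv")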

write_feature_csv(featuredir: str, data_container: DataContainer, selected_features: List[str], include_experiment_id: bool = True, file_format: str = 'csv') None[source]

Write out the selected features to disk.

Parameters:
  • featuredir (str) – Path to the experiment output directory where the selected feature file will be saved.

  • data_container (DataContainer) – A data container object.

  • selected_features (List[str]) – List of features that were selected for model building.

  • include_experiment_id (bool) – Whether to include the experiment ID in the file name. Defaults to True.

  • file_format (str) – The file format in which to output the data. One of {"csv", "tsv", "xlsx"}. Defaults to "csv".

Return type:

None

static write_frame_to_file(df: DataFrame, name_prefix: str, file_format: str = 'csv', index: bool = False, **kwargs) None[source]

Write given data frame to disk with given name and file format.

Parameters:
  • df (pandas.DataFrame) – Data frame to write to disk.

  • name_prefix (str) – The complete prefix for the file to be written to disk. This includes everything except the extension.

  • file_format (str) – The file format (extension) for the file to be written to disk. One of {"csv", "xlsx", "tsv"}. Defaults to "csv".

  • index (bool) – Whether to include the index in the output file. Defaults to False.

Raises:

KeyError – If file_format is not valid.

Return type:

None