decision_rules.survival

decision_rules.survival.kaplan_meier

class decision_rules.survival.kaplan_meier.KaplanMeierEstimator(surv_info: SurvInfo | None = None)

Bases: object

static average(estimators: list[KaplanMeierEstimator]) KaplanMeierEstimator
calcualte_indicators() tuple[float, float]
calculate_bounds(times: array, probabilities: array, cumulative_sq: array, alpha=0.05) DataFrame
calculate_interval() DataFrame
calculate_median_survival_time(survival_function: DataFrame) float | DataFrame
calculate_probabilities(surv_info: SurvInfo) SurvInfo
static compare_estimators(kme1: KaplanMeierEstimator, kme2: KaplanMeierEstimator)
fit(survival_time: ndarray, survival_status: ndarray, skip_sorting: bool = False) KaplanMeierEstimator

Fit the Kaplan-Meier estimator on the given data.

Parameters:
  • survival_time (np.ndarray) – survival time data

  • survival_status (np.ndarray) – survival status data

  • skip_sorting (bool, optional) – If True, skip sorting the data by survival time. This can speed up the computation when the provided data is already sorted in ascending order of survival time. Defaults to False (the method sorts the data internally).

Returns:

fitted estimator

Return type:

KaplanMeierEstimator
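For orientation, the product-limit computation behind a Kaplan-Meier fit can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name `kaplan_meier_sketch` and the status encoding (1 = event observed, 0 = censored) are assumptions made for the example.

```python
def kaplan_meier_sketch(survival_time, survival_status, skip_sorting=False):
    """Return (times, at_risk, events, probabilities) of a product-limit estimate.

    Assumed encoding: survival_status 1 = event observed, 0 = censored.
    """
    pairs = list(zip(survival_time, survival_status))
    if not skip_sorting:
        pairs.sort(key=lambda pair: pair[0])  # ascending by survival time

    times, at_risk, events, probabilities = [], [], [], []
    n_at_risk = len(pairs)
    prob = 1.0
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        n_events = n_censored = 0
        # Group all observations that share the same time point.
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1] == 1:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        # Product-limit step: multiply by the fraction surviving past t.
        prob *= 1.0 - n_events / n_at_risk
        times.append(t)
        at_risk.append(n_at_risk)
        events.append(n_events)
        probabilities.append(prob)
        n_at_risk -= n_events + n_censored
    return times, at_risk, events, probabilities


times, at_risk, events, probs = kaplan_meier_sketch(
    [5, 3, 8, 3, 6], [1, 1, 0, 0, 1]
)
# times   -> [3, 5, 6, 8]
# at_risk -> [5, 3, 2, 1]
# events  -> [1, 1, 1, 0]
```

Passing `skip_sorting=True` to this sketch is only safe when the input is already ordered by time, which mirrors the intent of the flag described above.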

get_at_risk_count_at(time: int) int
get_dict() KaplanMeierEstimatorDict
get_events_count_at(time: int) int
get_probability_at(time: int) float
static log_rank(survival_time: ndarray, survival_status: ndarray, covered_examples: ndarray, uncovered_examples: ndarray) float
process_surv_info(surv_info: SurvInfo)
qth_survival_time(q: float, survival_function: DataFrame | Series) float

Returns the time when a single survival function reaches the qth percentile, that is, solves \(q = S(t)\) for \(t\).

qth_survival_times(q: float, survival_functions: DataFrame) float | DataFrame

Find the times when one or more survival functions reach the qth percentile.
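The percentile lookup described for the two methods above can be sketched as follows. The helper names `percentile_time` and `percentile_times` are hypothetical, the curves are plain lists rather than DataFrames, and the convention of returning the first time with \(S(t) \le q\) (or infinity if the curve never drops that low) is an assumption, not a statement about the library's behavior.

```python
import math


def percentile_time(q, times, probabilities):
    """Smallest time t with S(t) <= q, or math.inf if the curve stays above q."""
    for t, p in zip(times, probabilities):
        if p <= q:
            return t
    return math.inf


def percentile_times(q, curves):
    """Apply percentile_time to several named curves given as (times, probabilities)."""
    return {name: percentile_time(q, t, p) for name, (t, p) in curves.items()}


times = [3, 5, 6, 8]
probabilities = [0.8, 0.5, 0.25, 0.25]

median = percentile_time(0.5, times, probabilities)          # -> 5
never_reached = percentile_time(0.1, times, probabilities)   # -> math.inf

curves = {"rule_a": (times, probabilities), "rule_b": ([1, 2], [0.9, 0.4])}
medians = percentile_times(0.5, curves)  # -> {"rule_a": 5, "rule_b": 2}
```

With q = 0.5 the result is the median survival time of the curve.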

reverse() KaplanMeierEstimator
property surv_info: SurvInfo
update(kaplan_meier_estimator_dict: KaplanMeierEstimatorDict, update_additional_indicators: bool = False) KaplanMeierEstimator
class decision_rules.survival.kaplan_meier.KaplanMeierEstimatorDict

Bases: TypedDict

at_risk_count: ndarray
censored_count: ndarray
events_count: ndarray
probabilities: ndarray
times: ndarray
class decision_rules.survival.kaplan_meier.SurvInfo(time: ndarray, events_count: ndarray, censored_count: ndarray, at_risk_count: ndarray, probability: ndarray)

Bases: object

decision_rules.survival.metrics

Contains class for calculating rule metrics for survival rules

class decision_rules.survival.metrics.SurvivalRulesMetrics(rules: list[AbstractRule])

Bases: AbstractRulesMetrics

Class for calculating rule metrics for survival rules

calculate_p_value(coverage: Coverage | None = None, rule: SurvivalRule | None = None, y: ndarray | None = None) float

Abstract method to calculate p-value

Parameters:
  • coverage (Optional[Coverage], optional) – Coverage object for classification rules. Defaults to None.

  • rule (Optional[SurvivalRule], optional) – The rule for which the p-value is to be calculated. Defaults to None.

  • y (Optional[np.ndarray], optional) – Target labels for regression rules. Defaults to None.

get_metrics_calculator(rule: SurvivalRule, X: DataFrame, y: Series) dict[str, Callable[[], Any]]

Returns a metrics calculator in the form of a dictionary whose keys are metric names and whose values are parameterless callables that calculate those metrics.

Examples

>>> {
>>>  'p': lambda: rule.coverage.p,
>>>  'n': lambda: rule.coverage.n,
>>>  'P': lambda: rule.coverage.P,
>>>  'N': lambda: rule.coverage.N,
>>>  'coverage': lambda: measures.coverage(rule.coverage),
>>>  ...
>>> }
Parameters:
  • rule (AbstractRule) – rule

  • X (pd.DataFrame) – data

  • y (pd.Series) – labels

Returns:

metrics calculator object

Return type:

dict[str, Callable[[], Any]]
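Because every value in the calculator is a parameterless callable, a metric is only computed when its callable is invoked. The sketch below illustrates that pattern with made-up metrics; `evaluate_metrics`, `expensive` and the dictionary contents are hypothetical and not part of the library.

```python
from typing import Any, Callable


def evaluate_metrics(
    calculator: dict[str, Callable[[], Any]], names: list[str]
) -> dict[str, Any]:
    """Invoke only the callables whose metric names were requested."""
    return {name: calculator[name]() for name in names}


calls = []  # records which callables actually ran


def expensive():
    calls.append("expensive_metric")
    return 42


calculator = {
    "p": lambda: 12,
    "n": lambda: 3,
    "expensive_metric": expensive,
}

selected = evaluate_metrics(calculator, ["p", "n"])
# selected -> {"p": 12, "n": 3}; expensive() was never called
```

Deferring the computation this way keeps unused metrics from costing anything when only a subset is requested.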

property supported_metrics: list[str]

Returns:

list of names of all supported metrics

Return type:

list[str]

decision_rules.survival.prediction

class decision_rules.survival.prediction.BestRulePredictionStrategy(rules: list[AbstractRule], default_conclusion: AbstractConclusion)

Bases: BestRulePredictionStrategy

Best rule prediction strategy for survival prediction.

class decision_rules.survival.prediction.SurvivalPrediction

Bases: TypedDict

Object describing survival prediction. It contains times and probabilities of the predicted Kaplan-Meier curve. It also contains median survival time.

static from_kaplan_meier(km: KaplanMeierEstimator | None) SurvivalPrediction | None
median_survival_time: float
probabilities: ndarray
times: ndarray
to_kaplan_meier() KaplanMeierEstimator | None
class decision_rules.survival.prediction.VotingPredictionStrategy(rules: list[AbstractRule], default_conclusion: AbstractConclusion)

Bases: PredictionStrategy

Voting prediction strategy for survival prediction.

Based on the article: Wróbel et al., Learning rule sets from survival data, BMC Bioinformatics (2017) 18:285. The learned rule set can be applied for an estimation of the survival function of new observations based on the values taken by their covariates. The estimation is performed by rules covering the given observation. If an observation is not covered by any of the rules, then it is assigned the default survival estimate computed on the entire training set. Otherwise, the final survival estimate is calculated as an average of the survival estimates of all rules covering the observation.
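The averaging scheme described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: curves are (times, probabilities) pairs with ascending times, each curve is treated as a right-continuous step function equal to 1.0 before its first time, the mean is unweighted, and the helper names are hypothetical.

```python
def step_value(times, probabilities, t):
    """Value at time t of a right-continuous survival step function."""
    value = 1.0  # survival probability before the first recorded time
    for time, prob in zip(times, probabilities):
        if time > t:
            break
        value = prob
    return value


def average_curves(curves):
    """Average several step functions on the union of their time points."""
    grid = sorted({t for times, _ in curves for t in times})
    avg = [
        sum(step_value(times, probs, t) for times, probs in curves) / len(curves)
        for t in grid
    ]
    return grid, avg


def predict(covering_curves, default_curve):
    """Use the default estimate when no rule covers the observation."""
    if not covering_curves:
        return default_curve
    return average_curves(covering_curves)


rule_a = ([2, 4], [0.8, 0.4])
rule_b = ([3, 4], [0.6, 0.2])
default = ([1, 5], [0.9, 0.1])

grid, avg = predict([rule_a, rule_b], default)  # grid -> [2, 3, 4]
uncovered = predict([], default)                # falls back to the default curve
```

Evaluating every curve on the union of time points is one simple way to average step functions that change at different times.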

decision_rules.survival.prediction_indicators

class decision_rules.survival.prediction_indicators.SurvivalGeneralPredictionIndicators

Bases: TypedDict

Covered_by_prediction: int
Not_covered_by_prediction: int
ibs: float
class decision_rules.survival.prediction_indicators.SurvivalPredictionIndicators

Bases: TypedDict

general: SurvivalGeneralPredictionIndicators
type_of_problem: str
decision_rules.survival.prediction_indicators.calculate_for_survival(ruleset: SurvivalRuleSet, X: DataFrame, y_true: ndarray, y_pred: ndarray, calculate_only_for_covered_examples: bool = False) SurvivalRuleSet

Calculate prediction indicators for survival problem.

Parameters:
  • ruleset (SurvivalRuleSet) – ruleset

  • X (pd.DataFrame) – Dataset

  • y_true (np.ndarray) – Survival status column

  • y_pred (np.ndarray) – Array containing the model predictions.

  • calculate_only_for_covered_examples (bool, optional) – If true, it will calculate indicators only for the examples where prediction was not empty. Otherwise, it will calculate indicators for all the examples. Defaults to False.

Returns:

A dictionary containing prediction indicators

Return type:

SurvivalPredictionIndicators

decision_rules.survival.rule

Contains survival rule and conclusion classes.

class decision_rules.survival.rule.SurvivalConclusion(value: float, column_name: str, fixed: bool = False)

Bases: AbstractConclusion

Conclusion part of the survival rule

property estimator: KaplanMeierEstimator

Returns:

the KaplanMeierEstimator of this conclusion

Return type:

KaplanMeierEstimator

is_empty() bool

Returns whether conclusion is empty or not.

static make_empty(column_name: str) SurvivalConclusion

Creates an empty conclusion. Use it when you don’t want to use the default conclusion during prediction.

Parameters:

column_name (str) – decision column name

Returns:

empty conclusion

Return type:

AbstractConclusion

positives_mask(y: ndarray) ndarray

Calculates positive examples mask

Parameters:

y (np.ndarray)

Returns:

1-dimensional numpy array of booleans specifying whether the given examples are consistent with the conclusion.

Return type:

np.ndarray

class decision_rules.survival.rule.SurvivalRule(premise: AbstractCondition, conclusion: SurvivalConclusion, column_names: list[str], survival_time_attr: str = None)

Bases: AbstractRule

Survival decision rule.

calculate_coverage(X: ndarray, y: ndarray = None, P: int = None, N: int = None, **kwargs) Coverage
Parameters:
  • X (np.ndarray)

  • y (np.ndarray, optional) – if None then P and N params should be passed. Defaults to None.

  • P (int, optional) – optional number of all examples from the rule decision class. Defaults to None.

  • N (int, optional) – optional number of all examples not from the rule decision class. Defaults to None.

Raises:

ValueError – if y is None and either P or N is None too

Returns:

rule coverage

Return type:

Coverage

get_coverage_dict() dict
set_survival_time_attr(survival_time_attr: str)

decision_rules.survival.ruleset

Contains survival ruleset class.

class decision_rules.survival.ruleset.SurvivalRuleSet(rules: list[SurvivalRule], survival_time_attr: str)

Bases: AbstractRuleSet

Survival ruleset that can perform prediction on data.

calculate_attribute_importances(condition_importances: dict[str, float]) dict[str, float]

Calculate importances of attributes in RuleSet based on condition importances

Parameters:
  • condition_importances (Union[dict[str, float], dict[str, dict[str, float]]]) – condition importances

Returns:

attribute importances; in the case of classification, additionally returns information about class (dict[str, dict[str, float]])

Return type:

dict[str, float]

calculate_condition_importances(X: DataFrame, y: Series, *args) dict[str, float]

Calculate importances of conditions in RuleSet

Parameters:
  • X (pd.DataFrame)

  • y (pd.Series)

  • measure (Callable[[Coverage], float]) – measure used to count importance

Returns:

condition importances; in the case of classification, additionally returns information about class (dict[str, dict[str, float]])

Return type:

dict[str, float]

calculate_rules_metrics(X: DataFrame, y: Series, metrics_to_calculate: list[str] | None = None) dict[dict[str, str, float]]

Calculate rules metrics for each rule such as precision, coverage, TP, FP etc. This method should be called after updating or calculating rules coverages.

Parameters:
  • X (pd.DataFrame)

  • y (pd.Series)

  • metrics_to_calculate (Optional[list[str]], optional) – list of metrics names to calculate. Defaults to None.

Raises:

InvalidStateError – if a rule’s coverage has not been calculated

Returns:

metrics for each rule

Return type:

dict[dict[str, str, float]]

calculate_rules_weights(measure=None)
Parameters:

measure – quality measure function. For survival rulesets it is always log_rank; if not specified, the voting weight is 1.

Raises:

ValueError – if any of the rules in ruleset has uncalculated coverage

get_default_prediction_strategy_class() Type[PredictionStrategy]

Returns default prediction strategy class used when user doesn’t specify any.

Returns:

class implementing PredictionStrategy interface

Return type:

Type[PredictionStrategy]

get_metrics_object_instance() AbstractRulesMetrics

Returns metrics object instance.

integrated_bier_score(X: DataFrame, y: Series, y_pred: ndarray | None = None) float

Calculate Integrated Brier Score (IBS)

Parameters:
  • X (pd.DataFrame) – dataset

  • y (pd.Series) – survival status column

  • y_pred (Optional[np.ndarray], optional) – Model predictions. If not provided, this method will perform prediction on the provided dataset. Defaults to None.

Returns:

Integrated Brier Score value

Return type:

float
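As a rough illustration of what an integrated Brier score measures, the sketch below integrates the time-dependent Brier score over a time grid. It is deliberately simplified and is not the library's implementation: it assumes every event time is observed (no censoring), whereas a full IBS weights the terms by the inverse probability of censoring. All names are hypothetical.

```python
def brier_score_at(t, event_times, predicted_survival):
    """Mean squared error between the indicator 1[T_i > t] and the predicted S_i(t)."""
    errors = [
        ((1.0 if T > t else 0.0) - S(t)) ** 2
        for T, S in zip(event_times, predicted_survival)
    ]
    return sum(errors) / len(errors)


def integrated_brier_score_uncensored(event_times, predicted_survival, grid):
    """Trapezoidal integral of the Brier score over grid, divided by the grid span."""
    scores = [brier_score_at(t, event_times, predicted_survival) for t in grid]
    area = sum(
        (grid[i + 1] - grid[i]) * (scores[i] + scores[i + 1]) / 2.0
        for i in range(len(grid) - 1)
    )
    return area / (grid[-1] - grid[0])


def perfect_curve(T):
    """Survival curve that is 1.0 before the true event time T and 0.0 afterwards."""
    return lambda t: 1.0 if t < T else 0.0


event_times = [2.0, 4.0]
grid = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

# An uninformative model that always predicts 0.5.
ibs_constant = integrated_brier_score_uncensored(
    event_times, [lambda t: 0.5, lambda t: 0.5], grid
)
# A model that knows the true event times.
ibs_perfect = integrated_brier_score_uncensored(
    event_times, [perfect_curve(T) for T in event_times], grid
)
```

Lower values are better: the perfect curves score 0, while the constant 0.5 prediction scores 0.25.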

local_explainability(x: Series) tuple[list[str], SurvivalPrediction]

Calculate local explainability of ruleset for given instance.

Parameters:

x (pd.Series) – Instance to explain

Returns:

list of UUIDs of the rules covering the instance, and a SurvivalPrediction with the Kaplan-Meier estimate of the examples covered by those rules

Return type:

tuple[list[str], SurvivalPrediction]

property prediction_strategies_choice: dict[str, Type[PredictionStrategy]]

Specifies prediction strategies available for this model.

Returns:

Dictionary containing available prediction strategies. Keys are prediction strategy names and values are classes implementing the PredictionStrategy interface for this model.

Return type:

dict[str, Type[PredictionStrategy]]

update(X_train: DataFrame, y_train: Series, _measure=None) ndarray

Updates the ruleset using the training dataset. This method should be called both after creating a new ruleset and after manipulating any of its rules or internal conditions. It recalculates rule coverages and voting weights, making the ruleset ready for prediction.

Parameters:
  • X_train (pd.DataFrame)

  • y_train (pd.Series)

  • measure (Callable[[Coverage], float]) – voting measure function

Raises:

ValueError – if called on empty ruleset with no rules

update_using_coverages(coverages_info: dict[str, SurvivalCoverageInfodict], columns_names: list[str] = None, *args, **kwargs)