decision_rules.survival
decision_rules.survival.kaplan_meier
- class decision_rules.survival.kaplan_meier.KaplanMeierEstimator(surv_info: SurvInfo | None = None)
Bases:
object
- static average(estimators: list[KaplanMeierEstimator]) KaplanMeierEstimator
- binary_search(arr, target)
- calcualte_indicators() tuple[float, float]
- calculate_bounds(times: array, probabilities: array, cumulative_sq: array, alpha=0.05) DataFrame
- calculate_interval() DataFrame
- calculate_median_survival_time(survival_function: DataFrame) float | DataFrame
- static compare_estimators(kme1: KaplanMeierEstimator, kme2: KaplanMeierEstimator)
- fit(survival_time: ndarray, survival_status: ndarray, skip_sorting: bool = False) KaplanMeierEstimator
Fits the Kaplan-Meier estimator to the given data.
- Parameters:
survival_time (np.ndarray) – survival time data
survival_status (np.ndarray) – survival status data
skip_sorting (bool, optional) – If True, skips sorting by survival time. Use it to speed up the computation when the provided data is already sorted in ascending order of survival time. Defaults to False (the method sorts the data internally).
- Returns:
fitted estimator
- Return type:
KaplanMeierEstimator
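As an illustration of the quantities a fitted estimator exposes (times, event counts, censored counts, at-risk counts and probabilities), here is a minimal pure-Python sketch of the Kaplan-Meier product-limit computation. It is not the library's implementation: the function name `kaplan_meier`, the row layout, and the convention that status 1 means an observed event and 0 means censoring are assumptions made for this example.

```python
from collections import Counter


def kaplan_meier(survival_time, survival_status):
    """Product-limit estimate (hypothetical helper, not the library API).

    Assumes status 1 = event observed, 0 = censored.
    """
    events, censored = Counter(), Counter()
    for t, s in zip(survival_time, survival_status):
        if s == 1:
            events[t] += 1
        else:
            censored[t] += 1

    at_risk = len(survival_time)  # everyone is at risk before the first time
    probability = 1.0
    rows = []
    for t in sorted(set(events) | set(censored)):
        d, c = events[t], censored[t]
        # Survival drops only when events occur: S(t) *= 1 - d / n
        probability *= 1.0 - d / at_risk
        rows.append({
            "time": t,
            "events_count": d,
            "censored_count": c,
            "at_risk_count": at_risk,
            "probability": probability,
        })
        at_risk -= d + c  # events and censored cases both leave the risk set
    return rows


curve = kaplan_meier([5, 6, 6, 2, 4, 4], [1, 0, 1, 1, 1, 0])
for row in curve:
    print(row)
```

Sorting happens inside the loop header here; an input that is already sorted by time would make that step redundant, which is the situation the `skip_sorting` flag addresses.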
- get_at_risk_count_at(time: int) int
- get_dict() KaplanMeierEstimatorDict
- get_events_count_at(time: int) int
- get_probability_at(time: int) float
- static log_rank(survival_time: ndarray, survival_status: ndarray, covered_examples: ndarray, uncovered_examples: ndarray) float
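The entry above gives only the signature. For orientation, the following is a sketch of the textbook two-group log-rank chi-square statistic, assuming that `covered_examples` and `uncovered_examples` are index collections into the time and status arrays and that status 1 marks an observed event. Whether the library returns this statistic or a value derived from it is not stated here, so the function name `log_rank_statistic` and its return value are illustrative only.

```python
def log_rank_statistic(time, status, covered, uncovered):
    """Two-group log-rank chi-square statistic (illustrative sketch)."""
    members = list(covered) + list(uncovered)
    event_times = sorted({time[i] for i in members if status[i] == 1})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Risk sets: examples whose observed time is at least t
        n1 = sum(1 for i in covered if time[i] >= t)
        n2 = sum(1 for i in uncovered if time[i] >= t)
        # Events occurring exactly at t
        d1 = sum(1 for i in covered if time[i] == t and status[i] == 1)
        d2 = sum(1 for i in uncovered if time[i] == t and status[i] == 1)
        n, d = n1 + n2, d1 + d2
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the event count in the first group
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 0.0
    return observed_minus_expected ** 2 / variance


identical = log_rank_statistic([1, 2, 3, 1, 2, 3], [1] * 6, [0, 1, 2], [3, 4, 5])
separated = log_rank_statistic([1, 2, 3, 4], [1] * 4, [0, 1], [2, 3])
print(identical, separated)
```

Larger values indicate a bigger difference between the survival experience of the two groups; identical groups give a statistic of zero.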
- qth_survival_time(q: float, survival_function: DataFrame | Series) float
Returns the time when a single survival function reaches the qth percentile, that is, solves \(q = S(t)\) for \(t\).
- qth_survival_times(q: float, survival_functions: DataFrame) float | DataFrame
Find the times when one or more survival functions reach the qth percentile.
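A minimal sketch of solving \(q = S(t)\) on a step-shaped survival function follows. It assumes the curve is given as parallel lists of ascending times and non-increasing probabilities, and it adopts the convention of returning the first time at which \(S(t) \le q\), or infinity when the curve never drops that low. The function below is a standalone illustration with a hypothetical list-based signature, not the method documented above.

```python
import math


def qth_survival_time(q, times, probabilities):
    """First time at which the step function S(t) is at or below q."""
    for t, p in zip(times, probabilities):
        if p <= q:
            return t
    return math.inf  # the curve never reaches the requested level


times = [2, 4, 5, 6]
probabilities = [5 / 6, 2 / 3, 4 / 9, 2 / 9]

median = qth_survival_time(0.5, times, probabilities)
never = qth_survival_time(0.1, times, probabilities)
print(median, never)
```

With q = 0.5 this yields the median survival time of the curve.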
- reverse() KaplanMeierEstimator
- update(kaplan_meier_estimator_dict: KaplanMeierEstimatorDict, update_additional_indicators: bool = False) KaplanMeierEstimator
- class decision_rules.survival.kaplan_meier.KaplanMeierEstimatorDict
Bases:
TypedDict
- at_risk_count: ndarray
- censored_count: ndarray
- events_count: ndarray
- probabilities: ndarray
- times: ndarray
- class decision_rules.survival.kaplan_meier.SurvInfo(time: ndarray, events_count: ndarray, censored_count: ndarray, at_risk_count: ndarray, probability: ndarray)
Bases:
object
decision_rules.survival.metrics
Contains class for calculating rule metrics for survival rules
- class decision_rules.survival.metrics.SurvivalRulesMetrics(rules: list[AbstractRule])
Bases:
AbstractRulesMetrics
Class for calculating rule metrics for survival rules
- calculate_p_value(coverage: Coverage | None = None, rule: SurvivalRule | None = None, y: ndarray | None = None) float
Abstract method to calculate p-value
- Parameters:
coverage (Optional[Coverage], optional) – Coverage object for classification rules. Defaults to None.
rule (Optional[SurvivalRule], optional) – The rule for which the p-value is to be calculated. Defaults to None.
y (Optional[np.ndarray], optional) – Target labels for regression rules. Defaults to None.
- get_metrics_calculator(rule: SurvivalRule, X: DataFrame, y: Series) dict[str, Callable[[], Any]]
Returns a metrics calculator in the form of a dictionary whose keys are metric names and whose values are parameterless callables that calculate those metrics.
Examples
>>> {
>>>     'p': lambda: rule.coverage.p,
>>>     'n': lambda: rule.coverage.n,
>>>     'P': lambda: rule.coverage.P,
>>>     'N': lambda: rule.coverage.N,
>>>     'coverage': lambda: measures.coverage(rule.coverage),
>>>     ...
>>> }
- Parameters:
rule (AbstractRule) – rule
X (pd.DataFrame) – data
y (pd.Series) – labels
- Returns:
metrics calculator object
- Return type:
dict[str, Callable[[], Any]]
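One reason to return callables rather than values is that only the requested metrics need to be computed. The sketch below illustrates that pattern with a hand-built calculator dictionary; `evaluate_metrics`, the `expensive_p_value` stand-in and the numbers are hypothetical and not part of the library.

```python
def evaluate_metrics(calculator, metrics_to_calculate=None):
    """Call only the requested metric callables (all of them if None)."""
    names = list(calculator) if metrics_to_calculate is None else metrics_to_calculate
    return {name: calculator[name]() for name in names}


calls = []


def expensive_p_value():
    # Stand-in for a costly computation; records that it was invoked
    calls.append("p_value")
    return 0.01


calculator = {
    "p": lambda: 12,
    "n": lambda: 3,
    "p_value": expensive_p_value,
}

subset = evaluate_metrics(calculator, ["p", "n"])
print(subset, calls)
```

Because `p_value` was not requested, its callable is never invoked.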
- property supported_metrics: list[str]
Returns: list[str]: list of names of all supported metrics
decision_rules.survival.prediction
- class decision_rules.survival.prediction.BestRulePredictionStrategy(rules: list[AbstractRule], default_conclusion: AbstractConclusion)
Bases:
BestRulePredictionStrategy
Best rule prediction strategy for survival prediction.
- class decision_rules.survival.prediction.SurvivalPrediction
Bases:
TypedDict
Object describing a survival prediction. It contains the times and probabilities of the predicted Kaplan-Meier curve, as well as the median survival time.
- static from_kaplan_meier(km: KaplanMeierEstimator | None) SurvivalPrediction | None
- median_survival_time: float
- probabilities: ndarray
- times: ndarray
- to_kaplan_meier() KaplanMeierEstimator | None
- class decision_rules.survival.prediction.VotingPredictionStrategy(rules: list[AbstractRule], default_conclusion: AbstractConclusion)
Bases:
PredictionStrategy
Voting prediction strategy for survival prediction.
Based on the article: Wróbel et al., Learning rule sets from survival data, BMC Bioinformatics (2017) 18:285. The learned rule set can be applied to estimate the survival function of new observations based on the values taken by their covariates. The estimation is performed by the rules covering a given observation. If an observation is not covered by any of the rules, it is assigned the default survival estimate computed on the entire training set. Otherwise, the final survival estimate is calculated as the average of the survival estimates of all rules covering the observation.
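The averaging described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: each survival estimate is represented as a pair of lists (ascending times, probabilities), the curve equals 1 before its first time point, and the average is evaluated on the union of all time points. The helper names `probability_at`, `average_curves` and `predict` are hypothetical.

```python
from bisect import bisect_right


def probability_at(times, probabilities, t):
    """Value of a right-continuous step curve at t (1.0 before the first step)."""
    k = bisect_right(times, t)
    return 1.0 if k == 0 else probabilities[k - 1]


def average_curves(curves):
    """Pointwise mean of several step curves on the union of their time grids."""
    grid = sorted({t for times, _ in curves for t in times})
    probs = [
        sum(probability_at(ts, ps, t) for ts, ps in curves) / len(curves)
        for t in grid
    ]
    return grid, probs


def predict(covering_curves, default_curve):
    """Use the default estimate when no rule covers the observation."""
    if not covering_curves:
        return default_curve
    return average_curves(covering_curves)


rule_a = ([1, 3], [0.8, 0.4])
rule_b = ([2], [0.5])
default = ([1, 2, 3], [0.9, 0.7, 0.5])

grid, probs = predict([rule_a, rule_b], default)
print(grid, probs)
```

An observation covered by no rule receives the default curve unchanged.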
decision_rules.survival.prediction_indicators
- class decision_rules.survival.prediction_indicators.SurvivalGeneralPredictionIndicators
Bases:
TypedDict
- Covered_by_prediction: int
- Not_covered_by_prediction: int
- ibs: float
- class decision_rules.survival.prediction_indicators.SurvivalPredictionIndicators
Bases:
TypedDict
- general: SurvivalGeneralPredictionIndicators
- type_of_problem: str
- decision_rules.survival.prediction_indicators.calculate_for_survival(ruleset: SurvivalRuleSet, X: DataFrame, y_true: ndarray, y_pred: ndarray, calculate_only_for_covered_examples: bool = False) SurvivalRuleSet
Calculate prediction indicators for survival problem.
- Parameters:
ruleset (SurvivalRuleSet) – ruleset
X (pd.DataFrame) – Dataset
y_true (np.ndarray) – Survival status column
y_pred (np.ndarray) – Array containing the predicted class labels.
calculate_only_for_covered_examples (bool, optional) – If true, it will calculate indicators only for the examples where prediction was not empty. Otherwise, it will calculate indicators for all the examples. Defaults to False.
- Returns:
A dictionary containing prediction indicators
- Return type:
decision_rules.survival.rule
Contains survival rule and conclusion classes.
- class decision_rules.survival.rule.SurvivalConclusion(value: float, column_name: str, fixed: bool = False)
Bases:
AbstractConclusion
Conclusion part of the survival rule
- property estimator: KaplanMeierEstimator
Returns: KaplanMeierEstimator: the Kaplan-Meier estimator associated with this conclusion
- is_empty() bool
Returns whether conclusion is empty or not.
- static make_empty(column_name: str) SurvivalConclusion
Creates an empty conclusion. Use it when you don’t want to use the default conclusion during prediction.
- Parameters:
column_name (str) – decision column name
- Returns:
empty conclusion
- Return type:
SurvivalConclusion
- positives_mask(y: ndarray) ndarray
Calculates positive examples mask
- Parameters:
y (np.ndarray)
- Returns:
1-dimensional numpy array of booleans specifying whether the given examples are consistent with the conclusion.
- Return type:
np.ndarray
- class decision_rules.survival.rule.SurvivalRule(premise: AbstractCondition, conclusion: SurvivalConclusion, column_names: list[str], survival_time_attr: str = None)
Bases:
AbstractRule
Survival decision rule.
- calculate_coverage(X: ndarray, y: ndarray = None, P: int = None, N: int = None, **kwargs) Coverage
- Parameters:
X (np.ndarray)
y (np.ndarray, optional) – if None then P and N params should be passed. Defaults to None.
P (int, optional) – optional number of all examples from the rule's decision class. Defaults to None.
N (int, optional) – optional number of all examples not from the rule's decision class. Defaults to None.
- Raises:
ValueError – if y is None and either P or N is None too
- Returns:
rule coverage
- Return type:
Coverage
- get_coverage_dict() dict
- set_survival_time_attr(survival_time_attr: str)
decision_rules.survival.ruleset
Contains survival ruleset class.
- class decision_rules.survival.ruleset.SurvivalRuleSet(rules: list[SurvivalRule], survival_time_attr: str)
Bases:
AbstractRuleSet
Survival ruleset that can perform prediction on data.
- calculate_attribute_importances(condition_importances: dict[str, float]) dict[str, float]
Calculates importances of attributes in the RuleSet based on condition importances.
- Parameters:
condition_importances (Union[dict[str, float], dict[str, dict[str, float]]]) – condition importances
- Returns:
attribute importances; in the case of classification, additionally returns information about the class (dict[str, dict[str, float]])
- Return type:
dict[str, float]
- calculate_condition_importances(X: DataFrame, y: Series, *args) dict[str, float]
Calculate importances of conditions in RuleSet
- Parameters:
X (pd.DataFrame)
y (pd.Series)
measure (Callable[[Coverage], float]) – measure used to count importance
- Returns:
condition importances; in the case of classification, additionally returns information about the class (dict[str, dict[str, float]])
- Return type:
dict[str, float]
- calculate_rules_metrics(X: DataFrame, y: Series, metrics_to_calculate: list[str] | None = None) dict[dict[str, str, float]]
Calculate rules metrics for each rule such as precision, coverage, TP, FP etc. This method should be called after updating or calculating rules coverages.
- Parameters:
X (pd.DataFrame)
y (pd.Series)
metrics_to_calculate (Optional[list[str]], optional) – list of metrics names to calculate. Defaults to None.
- Raises:
InvalidStateError – if a rule’s coverage has not been calculated
- Returns:
metrics for each rule
- Return type:
dict[dict[str, str, float]]
- calculate_rules_weights(measure=None)
- Parameters:
measure – quality measure function; for survival rulesets it is always log_rank. If it is not specified, voting_weight is 1.
- Raises:
ValueError – if any of the rules in ruleset has uncalculated coverage
- get_default_prediction_strategy_class() Type[PredictionStrategy]
Returns default prediction strategy class used when user doesn’t specify any.
- Returns:
class implementing the PredictionStrategy interface
- Return type:
Type[PredictionStrategy]
- get_metrics_object_instance() AbstractRulesMetrics
Returns metrics object instance.
- integrated_bier_score(X: DataFrame, y: Series, y_pred: ndarray | None = None) float
Calculate Integrated Brier Score (IBS)
- Parameters:
X (pd.DataFrame) – dataset
y (pd.Series) – survival status column
y_pred (Optional[np.ndarray], optional) – Model predictions. If not provided, this method will perform prediction on the provided dataset. Defaults to None.
- Returns:
Integrated Brier Score value
- Return type:
float
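For reference, a commonly used definition of the Integrated Brier Score under right censoring (following Graf et al.) is given below. Whether the library uses exactly this variant, including its treatment of ties and the integration range, is not stated here. The symbols are assumptions of this note: \(\hat{S}(t \mid x_i)\) is the predicted survival function for example \(i\), \(T_i\) its observed time, \(\delta_i\) its event indicator (1 for an observed event), \(\hat{G}\) the Kaplan-Meier estimate of the censoring distribution, and \(t_{\max}\) the upper integration limit.

```latex
\mathrm{BS}(t) = \frac{1}{N} \sum_{i=1}^{N} \left[
  \frac{\hat{S}(t \mid x_i)^2 \, \mathbb{1}(T_i \le t,\ \delta_i = 1)}{\hat{G}(T_i)}
  + \frac{\bigl(1 - \hat{S}(t \mid x_i)\bigr)^2 \, \mathbb{1}(T_i > t)}{\hat{G}(t)}
\right],
\qquad
\mathrm{IBS} = \frac{1}{t_{\max}} \int_{0}^{t_{\max}} \mathrm{BS}(t)\, dt
```

Lower values indicate better calibrated and more discriminative survival predictions; examples censored before \(t\) contribute nothing to \(\mathrm{BS}(t)\) directly and are accounted for through the weights \(1/\hat{G}\).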
- local_explainability(x: Series) tuple[list[str], SurvivalPrediction]
Calculate local explainability of ruleset for given instance.
- Parameters:
x (pd.Series) – Instance to explain
- Returns:
a list of the UUIDs of the rules covering the instance, and a SurvivalPrediction holding the Kaplan-Meier estimate of the examples covered by those rules
- Return type:
tuple[list[str], SurvivalPrediction]
- property prediction_strategies_choice: dict[str, Type[PredictionStrategy]]
Specifies prediction strategies available for this model.
- Returns:
Dictionary containing the available prediction strategies. Keys are prediction strategy names and values are classes implementing the PredictionStrategy interface for this model.
- Return type:
dict[str, Type[PredictionStrategy]]
- update(X_train: DataFrame, y_train: Series, _measure=None) ndarray
Updates the ruleset using the training dataset. Call this method both after creating a new ruleset and after manipulating any of its rules or internal conditions. It recalculates rule coverages and voting weights, making the ruleset ready for prediction.
- Parameters:
X_train (pd.DataFrame)
y_train (pd.Series)
measure (Callable[[Coverage], float]) – voting measure function
- Raises:
ValueError – if called on empty ruleset with no rules
- update_using_coverages(coverages_info: dict[str, SurvivalCoverageInfodict], columns_names: list[str] = None, *args, **kwargs)