Use rules in textual form

In this tutorial, we will load a set of regression rules in textual form, evaluate them, and modify them.

Load and prepare dataset

We begin by loading the boston-housing dataset into a DataFrame and splitting it into the features X and the target MEDV (y).

[24]:
import pandas as pd
BOSTON_HOUSING_PATH = (
    'https://raw.githubusercontent.com/ruleminer/decision-rules/'
    'refs/heads/docs/docs-src/source/tutorials/resources/boston-housing.csv'
)
boston_housing_df = pd.read_csv(BOSTON_HOUSING_PATH)
display(boston_housing_df)
print('Columns: ', boston_housing_df.columns.values)
X = boston_housing_df.drop("MEDV", axis=1)
y = boston_housing_df["MEDV"]
index CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT MEDV
0 0 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15 396.90 4.98 24.0
1 1 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17 396.90 9.14 21.6
2 2 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17 392.83 4.03 34.7
3 3 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18 394.63 2.94 33.4
4 4 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18 396.90 5.33 36.2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
501 501 0.06263 0 11.93 0 0.573 6.593 69.1 2.4786 1 273 21 391.99 9.67 22.4
502 502 0.04527 0 11.93 0 0.573 6.120 76.7 2.2875 1 273 21 396.90 9.08 20.6
503 503 0.06076 0 11.93 0 0.573 6.976 91.0 2.1675 1 273 21 396.90 5.64 23.9
504 504 0.10959 0 11.93 0 0.573 6.794 89.3 2.3889 1 273 21 393.45 6.48 22.0
505 505 0.04741 0 11.93 0 0.573 6.030 80.8 2.5050 1 273 21 396.90 7.88 11.9

506 rows × 15 columns

Columns:  ['index' 'CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX'
 'PTRATIO' 'B' 'LSTAT' 'MEDV']

Load the ruleset in textual form

Next, we load the ruleset, which is stored in a text file with one rule per line.

[ ]:
import urllib.request  # a bare `import urllib` does not reliably expose urllib.request

FILE_PATH: str = (
    'https://raw.githubusercontent.com/ruleminer/decision-rules/'
    'refs/heads/docs/docs-src/source/tutorials/resources/regression/text_ruleset.txt'
)

with urllib.request.urlopen(FILE_PATH) as response:
    text_rules_model = response.read().decode('utf-8').splitlines()

text_rules_model
['IF AGE >= 80.05 AND RM < 7.20 AND LSTAT >= 14.74 AND CRIM >= 1.06',
 'IF LSTAT >= 14.43 AND AGE >= 77.95 AND CRIM >= 0.24',
 'IF TAX >= 300.00 AND CRIM < 15.72 AND RM >= 5.06 AND LSTAT < 32.00 AND LSTAT >= 14.73',
 'IF RM < 6.45 AND RM >= 5.75 AND AGE < 91.05 AND LSTAT < 14.16',
 'IF RM < 6.59 AND B >= 198.44 AND LSTAT < 16.12 AND RM >= 5.64 AND DIS >= 1.15']
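Each line is a premise only: a conjunction of `attribute operator threshold` conditions joined by `AND`. To make that structure explicit, here is a minimal sketch of how such a line could be split into conditions and matched against rows. The helpers `parse_rule` and `covers` and the toy rows are hypothetical illustrations; this is not how `TextRuleSetFactory` parses rules internally.

```python
import re

import pandas as pd

# One condition: attribute name, comparison operator, numeric threshold
CONDITION_RE = re.compile(r"^\s*(\w+)\s*(<=|>=|<|>)\s*(-?\d+(?:\.\d+)?)\s*$")

OPERATORS = {
    "<": lambda s, t: s < t,
    "<=": lambda s, t: s <= t,
    ">": lambda s, t: s > t,
    ">=": lambda s, t: s >= t,
}


def parse_rule(text: str) -> list[tuple[str, str, float]]:
    """Hypothetical helper: split 'IF A >= 1 AND B < 2' into (attr, op, threshold) triples."""
    body = text.strip()
    if body.startswith("IF "):
        body = body[3:]
    conditions = []
    for part in body.split(" AND "):
        match = CONDITION_RE.match(part)
        if match is None:
            raise ValueError(f"Cannot parse condition: {part!r}")
        attribute, operator, threshold = match.groups()
        conditions.append((attribute, operator, float(threshold)))
    return conditions


def covers(df: pd.DataFrame, conditions) -> pd.Series:
    """Boolean mask of the rows that satisfy every condition (a conjunction)."""
    mask = pd.Series(True, index=df.index)
    for attribute, operator, threshold in conditions:
        mask &= OPERATORS[operator](df[attribute], threshold)
    return mask


conditions = parse_rule("IF LSTAT >= 14.43 AND AGE >= 77.95 AND CRIM >= 0.24")
print(conditions)

# Made-up rows for illustration, not taken from the dataset
toy = pd.DataFrame({"LSTAT": [20.0, 10.0, 15.0], "AGE": [90.0, 95.0, 70.0], "CRIM": [1.0, 2.0, 0.5]})
print(covers(toy, conditions).tolist())
```

Only the first toy row satisfies all three conditions; the second fails on LSTAT and the third on AGE.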

Convert the textual ruleset to a decision-rules model

Now that the rules are loaded, we convert them into a decision-rules model using the TextRuleSetFactory from the decision-rules library. The factory takes the textual rules together with the data (X, y), and the resulting model can be evaluated and modified programmatically.

[26]:
from decision_rules.ruleset_factories._factories.regression import TextRuleSetFactory

factory = TextRuleSetFactory()
ruleset = factory.make(text_rules_model, X, y)

Once converted, the model can be displayed rule by rule. Each rule now includes a conclusion for MEDV and its coverage (p, n, P, N), computed from the data.

[27]:
for rule in ruleset.rules:
    print(rule)
IF AGE >= 80.05 AND RM < 7.20 AND LSTAT >= 14.74 AND CRIM >= 1.06 THEN MEDV = {13.13} [9.41, 16.84] (p=72, n=31, P=105, N=401)
IF LSTAT >= 14.43 AND AGE >= 77.95 AND CRIM >= 0.24 THEN MEDV = {14.08} [10.07, 18.10] (p=100, n=36, P=125, N=381)
IF TAX >= 300.00 AND CRIM < 15.72 AND RM >= 5.06 AND LSTAT < 32.00 AND LSTAT >= 14.73 THEN MEDV = {15.23} [11.46, 19.00] (p=89, n=32, P=139, N=367)
IF RM < 6.45 AND RM >= 5.75 AND AGE < 91.05 AND LSTAT < 14.16 THEN MEDV = {22.05} [18.49, 25.62] (p=129, n=11, P=227, N=279)
IF RM < 6.59 AND B >= 198.44 AND LSTAT < 16.12 AND RM >= 5.64 AND DIS >= 1.15 THEN MEDV = {21.75} [18.05, 25.45] (p=184, n=33, P=236, N=270)
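The text file contained only premises, so the conclusions and coverage counts above were derived from the data. The conclusion values match the `y_covered_avg` column in the rule metrics table below, which suggests that each value is the mean MEDV of the covered examples. The sketch below re-derives a regression rule's conclusion and coverage under further assumptions that the tutorial does not confirm: the interval is taken as mean ± one (population) standard deviation, and an example counts as positive when its target falls inside that interval. `regression_rule_stats` is a hypothetical helper, not a decision-rules function.

```python
import numpy as np


def regression_rule_stats(covered: np.ndarray, y: np.ndarray) -> dict:
    """Hypothetical re-derivation of a regression rule's conclusion and coverage.

    Assumptions: value = mean of covered targets, interval = value +/- std
    (ddof=0), and positives are examples whose target lies in the interval.
    """
    y_cov = y[covered]
    value = y_cov.mean()
    std = y_cov.std()  # population standard deviation (assumed)
    low, high = value - std, value + std
    positive = (y >= low) & (y <= high)
    return {
        "value": value,
        "low": low,
        "high": high,
        "p": int(np.sum(covered & positive)),   # covered and inside the interval
        "n": int(np.sum(covered & ~positive)),  # covered but outside the interval
        "P": int(np.sum(positive)),             # all examples inside the interval
        "N": int(np.sum(~positive)),            # all examples outside the interval
    }


# Toy targets and a toy coverage mask (made up for illustration)
y = np.array([10.0, 12.0, 14.0, 30.0, 22.0, 11.0])
covered = np.array([True, True, True, False, False, False])
rule_stats = regression_rule_stats(covered, y)
print(rule_stats)
```

Under these assumptions, p + n is always the number of covered examples and P + N the size of the dataset, which matches the printed rules (P + N = 506 for every rule).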

Analyze the ruleset statistics

The decision-rules model provides several metrics for evaluating the ruleset, both as a whole and rule by rule.

We start by calculating and displaying the general characteristics of the ruleset.

[28]:
ruleset_stats = ruleset.calculate_ruleset_stats(X, y)
print(ruleset_stats)
{'rules_count': 5, 'avg_conditions_count': 4.2, 'avg_precision': 0.79, 'avg_coverage': 0.69, 'total_conditions_count': 21}
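These summary values are consistent with simple averages over the rules: precision as p / (p + n) and coverage as p / P, each averaged across the five rules, plus the condition counts (4, 3, 5, 4, 5). The check below recomputes them from the (p, n, P, N) tuples printed earlier. It is a sketch of our reading of the output, not the library's own implementation.

```python
# (p, n, P, N) copied from the printed rules above
coverages = [
    (72, 31, 105, 401),
    (100, 36, 125, 381),
    (89, 32, 139, 367),
    (129, 11, 227, 279),
    (184, 33, 236, 270),
]
# Number of conditions in each rule's premise
conditions_counts = [4, 3, 5, 4, 5]

precisions = [p / (p + n) for p, n, _, _ in coverages]      # share of covered examples that are positive
coverage_ratios = [p / P for p, _, P, _ in coverages]       # share of positives that are covered

recomputed_stats = {
    "rules_count": len(coverages),
    "avg_conditions_count": sum(conditions_counts) / len(conditions_counts),
    "avg_precision": round(sum(precisions) / len(precisions), 2),
    "avg_coverage": round(sum(coverage_ratios) / len(coverage_ratios), 2),
    "total_conditions_count": sum(conditions_counts),
}
print(recomputed_stats)
```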

Next, let’s calculate metrics for each rule. To make the output easier to read and interpret, we collect them in a DataFrame with one row per rule.

[29]:
rule_metrics = ruleset.calculate_rules_metrics(X, y)
rule_metrics_df = pd.DataFrame([
    {
        'Rule': f"r{i+1}",
        'p': metrics['p'],
        'n': metrics['n'],
        'P': metrics['P'],
        'N': metrics['N'],
        'unique_in_pos': metrics['unique_in_pos'],
        'unique_in_neg': metrics['unique_in_neg'],
        'p_unique': metrics['p_unique'],
        'n_unique': metrics['n_unique'],
        'all_unique': metrics['all_unique'],
        'support':round(metrics['support'],3),
        'conditions_count': metrics['conditions_count'],
        'y_covered_avg': round(metrics['y_covered_avg'],3),
        'y_covered_median': round(metrics['y_covered_median'],3),
        'y_covered_min': metrics['y_covered_min'],
        'y_covered_max': metrics['y_covered_max'],
        'mae': round(metrics['mae'],3),
        'rmse': round(metrics['rmse'],3),
        'mape': round(metrics['mape'],3),
        'p-value': round(metrics['p-value'],3)

    }
    for i, (_, metrics) in enumerate(rule_metrics.items())
])
display(rule_metrics_df)
Rule p n P N unique_in_pos unique_in_neg p_unique n_unique all_unique support conditions_count y_covered_avg y_covered_median y_covered_min y_covered_max mae rmse mape p-value
0 r1 72 31 105 401 3 6 0 0 0 0.204 4 13.128 13.40 5.0 27.5 10.096 13.148 0.398 0.0
1 r2 100 36 125 381 10 7 3 3 6 0.269 3 14.081 13.95 5.0 30.7 9.397 12.484 0.374 0.0
2 r3 89 32 139 367 13 18 10 11 21 0.239 5 15.233 14.90 6.3 27.5 8.650 11.735 0.351 0.0
3 r4 129 11 227 279 1 4 1 2 3 0.277 4 22.053 21.70 11.9 50.0 6.577 9.201 0.352 0.0
4 r5 184 33 236 270 53 25 43 15 58 0.429 5 21.747 21.40 11.9 50.0 6.549 9.222 0.346 0.0
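The support column can be read directly off the coverage counts: it equals the fraction of all examples covered by the rule, (p + n) / (P + N). For r1 that is (72 + 31) / 506 ≈ 0.204. The snippet below reproduces the whole column from the counts in the table.

```python
# (p, n, P, N) per rule, copied from the table above
coverages = {
    "r1": (72, 31, 105, 401),
    "r2": (100, 36, 125, 381),
    "r3": (89, 32, 139, 367),
    "r4": (129, 11, 227, 279),
    "r5": (184, 33, 236, 270),
}

# Support: covered examples divided by all examples, rounded like the table
support = {rule: round((p + n) / (P + N), 3) for rule, (p, n, P, N) in coverages.items()}
print(support)
```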

We can also calculate condition importances; here we use the C2 quality measure from decision_rules.measures.

[30]:
from decision_rules.measures import c2
condition_importances = ruleset.calculate_condition_importances(X, y, measure=c2)
condition_importances
[30]:
[{'condition': 'LSTAT >= 14.43',
  'attributes': ['LSTAT'],
  'importance': 0.32767459373187646},
 {'condition': 'LSTAT >= 14.74',
  'attributes': ['LSTAT'],
  'importance': 0.16689142510435745},
 {'condition': 'LSTAT >= 14.73',
  'attributes': ['LSTAT'],
  'importance': 0.15660227492592682},
 {'condition': 'RM < 6.45',
  'attributes': ['RM'],
  'importance': 0.1464841812280583},
 {'condition': 'LSTAT < 14.16',
  'attributes': ['LSTAT'],
  'importance': 0.14473028867871895},
 {'condition': 'RM < 6.59',
  'attributes': ['RM'],
  'importance': 0.1174108026185524},
 {'condition': 'LSTAT < 16.12',
  'attributes': ['LSTAT'],
  'importance': 0.1090749271904711},
 {'condition': 'AGE < 91.05',
  'attributes': ['AGE'],
  'importance': 0.08332534928637518},
 {'condition': 'CRIM >= 1.06',
  'attributes': ['CRIM'],
  'importance': 0.07594939786770724},
 {'condition': 'AGE >= 77.95',
  'attributes': ['AGE'],
  'importance': 0.051180099750744834},
 {'condition': 'RM < 7.20',
  'attributes': ['RM'],
  'importance': 0.04416566101660634},
 {'condition': 'AGE >= 80.05',
  'attributes': ['AGE'],
  'importance': 0.03676886132555876},
 {'condition': 'CRIM < 15.72',
  'attributes': ['CRIM'],
  'importance': 0.030665695828604723},
 {'condition': 'RM >= 5.75',
  'attributes': ['RM'],
  'importance': 0.030479048373156292},
 {'condition': 'B >= 198.44',
  'attributes': ['B'],
  'importance': 0.028619349699875198},
 {'condition': 'RM >= 5.64',
  'attributes': ['RM'],
  'importance': 0.02736509389238543},
 {'condition': 'TAX >= 300.00',
  'attributes': ['TAX'],
  'importance': 0.02714594443889693},
 {'condition': 'CRIM >= 0.24',
  'attributes': ['CRIM'],
  'importance': 0.020564473495815715},
 {'condition': 'RM >= 5.06',
  'attributes': ['RM'],
  'importance': 0.01044814267795384},
 {'condition': 'LSTAT < 32.00',
  'attributes': ['LSTAT'],
  'importance': 0.003366620183006248},
 {'condition': 'DIS >= 1.15',
  'attributes': ['DIS'],
  'importance': 0.003149404172213438}]
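For reference, a commonly used definition of the C2 rule quality measure (as found in RuleKit) combines a precision term rescaled against the base rate with a coverage term. The function below is a sketch that assumes `decision_rules.measures.c2` follows this definition; it only scores a single (p, n, P, N) coverage and does not reproduce how the library aggregates measure values into condition importances.

```python
def c2(p: int, n: int, P: int, N: int) -> float:
    """C2 rule quality, assuming the RuleKit-style definition:
    ((P + N) * precision - P) / N * (1 + coverage) / 2
    """
    precision = p / (p + n)
    coverage = p / P
    return ((P + N) * precision - P) / N * (1 + coverage) / 2


# Coverage of the first rule in the ruleset: (p=72, n=31, P=105, N=401)
print(round(c2(72, 31, 105, 401), 3))
```

A rule that covers every example has precision equal to the base rate P / (P + N), so its precision term, and hence its C2 value, is zero.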

Modify the ruleset

The decision-rules model can also be edited directly. For example, we will create a new rule “IF RM < 6.95 AND TAX >= 219.00 AND LSTAT < 14.17 THEN MEDV = 23.35” and add it to the ruleset.

[31]:
from decision_rules.regression.rule import RegressionConclusion
from decision_rules.regression.rule import RegressionRule
from decision_rules.conditions import ElementaryCondition, CompoundCondition

rule = RegressionRule(
    premise=CompoundCondition(
        subconditions=[
            # Condition:  RM < 6.95
            ElementaryCondition(
                column_index=X.columns.get_loc('RM'),
                left=float('-inf'),
                right=6.95,
                left_closed=False,
                right_closed=False
            ),
            # Condition: TAX >= 219.00
            ElementaryCondition(
                column_index=X.columns.get_loc('TAX'),
                left=219.00,
                right=float('inf'),
                left_closed=True,
                right_closed=False
            ),
            # Condition: LSTAT < 14.17
            ElementaryCondition(
                column_index=X.columns.get_loc('LSTAT'),
                left=float('-inf'),
                right=14.17,
                left_closed=False,
                right_closed=False
            ),
        ]
    ),
    conclusion=RegressionConclusion(
        value=23.35,
        column_name='MEDV',
        low=20.0,
        high=25.0
    ),
    column_names=X.columns,
)
print(rule)

IF RM < 6.95 AND TAX >= 219.00 AND LSTAT < 14.17 THEN MEDV = {23.35} [20.00, 25.00]
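Each `ElementaryCondition` above describes an interval on one column: `RM < 6.95` becomes the open interval (-inf, 6.95), and `TAX >= 219.00` becomes [219.00, inf), with `left_closed`/`right_closed` choosing between strict and non-strict comparison. The hypothetical helpers below spell out that mapping and the resulting membership test in plain Python. They mirror the argument names used in this tutorial but are not part of decision-rules.

```python
import math


def interval_for(operator: str, threshold: float) -> dict:
    """Hypothetical helper: map 'attr <op> threshold' to interval arguments."""
    if operator == "<":
        return dict(left=-math.inf, right=threshold, left_closed=False, right_closed=False)
    if operator == "<=":
        return dict(left=-math.inf, right=threshold, left_closed=False, right_closed=True)
    if operator == ">":
        return dict(left=threshold, right=math.inf, left_closed=False, right_closed=False)
    if operator == ">=":
        return dict(left=threshold, right=math.inf, left_closed=True, right_closed=False)
    raise ValueError(f"Unsupported operator: {operator}")


def satisfies(value: float, left: float, right: float, left_closed: bool, right_closed: bool) -> bool:
    """True if value lies in the interval; closed ends include the boundary."""
    above = value >= left if left_closed else value > left
    below = value <= right if right_closed else value < right
    return above and below


print(interval_for("<", 6.95))
print(satisfies(6.95, **interval_for("<", 6.95)))   # boundary excluded by '<'
print(satisfies(6.95, **interval_for("<=", 6.95)))  # boundary included by '<='
```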
[32]:
rule.coverage = rule.calculate_coverage(X.to_numpy(), y.to_numpy())
print(rule.coverage)
(p=203, n=42, P=255, N=251)
[33]:
ruleset.rules.append(rule)

print("Updated Ruleset:")
for rule in ruleset.rules:
    print(rule)
Updated Ruleset:
IF AGE >= 80.05 AND RM < 7.20 AND LSTAT >= 14.74 AND CRIM >= 1.06 THEN MEDV = {13.13} [9.41, 16.84] (p=72, n=31, P=105, N=401)
IF LSTAT >= 14.43 AND AGE >= 77.95 AND CRIM >= 0.24 THEN MEDV = {14.08} [10.07, 18.10] (p=100, n=36, P=125, N=381)
IF TAX >= 300.00 AND CRIM < 15.72 AND RM >= 5.06 AND LSTAT < 32.00 AND LSTAT >= 14.73 THEN MEDV = {15.23} [11.46, 19.00] (p=89, n=32, P=139, N=367)
IF RM < 6.45 AND RM >= 5.75 AND AGE < 91.05 AND LSTAT < 14.16 THEN MEDV = {22.05} [18.49, 25.62] (p=129, n=11, P=227, N=279)
IF RM < 6.59 AND B >= 198.44 AND LSTAT < 16.12 AND RM >= 5.64 AND DIS >= 1.15 THEN MEDV = {21.75} [18.05, 25.45] (p=184, n=33, P=236, N=270)
IF RM < 6.95 AND TAX >= 219.00 AND LSTAT < 14.17 THEN MEDV = {23.28} [18.17, 28.39] (p=203, n=42, P=255, N=251)

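Note that in the updated ruleset the new rule reads MEDV = {23.28} [18.17, 28.39] rather than the {23.35} [20.00, 25.00] we specified, which suggests that calculate_coverage also re-estimates the conclusion from the covered examples.

Next, let’s remove the condition “AGE >= 77.95” from the rule “IF LSTAT >= 14.43 AND AGE >= 77.95 AND CRIM >= 0.24 THEN MEDV = {14.08} [10.07, 18.10]”.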

[34]:
condition_to_remove = ruleset.rules[1].premise.subconditions[1]
ruleset.rules[1].premise.subconditions.remove(condition_to_remove)
ruleset.rules[1].coverage = ruleset.rules[1].calculate_coverage(X.to_numpy(), y.to_numpy())

print("Updated Ruleset:")
for rule in ruleset.rules:
    print(rule)
Updated Ruleset:
IF AGE >= 80.05 AND RM < 7.20 AND LSTAT >= 14.74 AND CRIM >= 1.06 THEN MEDV = {13.13} [9.41, 16.84] (p=72, n=31, P=105, N=401)
IF LSTAT >= 14.43 AND CRIM >= 0.24 THEN MEDV = {14.22} [10.01, 18.44] (p=105, n=41, P=134, N=372)
IF TAX >= 300.00 AND CRIM < 15.72 AND RM >= 5.06 AND LSTAT < 32.00 AND LSTAT >= 14.73 THEN MEDV = {15.23} [11.46, 19.00] (p=89, n=32, P=139, N=367)
IF RM < 6.45 AND RM >= 5.75 AND AGE < 91.05 AND LSTAT < 14.16 THEN MEDV = {22.05} [18.49, 25.62] (p=129, n=11, P=227, N=279)
IF RM < 6.59 AND B >= 198.44 AND LSTAT < 16.12 AND RM >= 5.64 AND DIS >= 1.15 THEN MEDV = {21.75} [18.05, 25.45] (p=184, n=33, P=236, N=270)
IF RM < 6.95 AND TAX >= 219.00 AND LSTAT < 14.17 THEN MEDV = {23.28} [18.17, 28.39] (p=203, n=42, P=255, N=251)
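Because a premise is a conjunction, dropping one condition can only keep or enlarge the set of covered examples; here the edited rule covers 105 + 41 = 146 examples instead of 100 + 36 = 136. The sketch below demonstrates this on synthetic columns (random values, only roughly in the ranges of the Boston data) using the thresholds from the edited rule.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic feature columns, made up for illustration
lstat = rng.uniform(1, 38, size=500)
age = rng.uniform(2, 100, size=500)
crim = rng.uniform(0, 90, size=500)

# Coverage masks before and after removing AGE >= 77.95
with_age = (lstat >= 14.43) & (age >= 77.95) & (crim >= 0.24)
without_age = (lstat >= 14.43) & (crim >= 0.24)

print(with_age.sum(), without_age.sum())
# Every example covered by the longer rule is also covered by the shorter one
print(bool(np.all(without_age[with_age])))
```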

We can also change the threshold of a condition. In the rule “IF RM < 6.45 AND RM >= 5.75 AND AGE < 91.05 AND LSTAT < 14.16 THEN MEDV = {22.05} [18.49, 25.62]”, we will change the condition “AGE < 91.05” to “AGE <= 71.5”.

[36]:
ruleset.rules[3].premise.subconditions[2].right = 71.5
ruleset.rules[3].premise.subconditions[2].right_closed = True
ruleset.rules[3].coverage = ruleset.rules[3].calculate_coverage(X.to_numpy(), y.to_numpy())

print("Updated Ruleset:")
for rule in ruleset.rules:
    print(rule)
Updated Ruleset:
IF AGE >= 80.05 AND RM < 7.20 AND LSTAT >= 14.74 AND CRIM >= 1.06 THEN MEDV = {13.13} [9.41, 16.84] (p=72, n=31, P=105, N=401)
IF LSTAT >= 14.43 AND CRIM >= 0.24 THEN MEDV = {14.22} [10.01, 18.44] (p=105, n=41, P=134, N=372)
IF TAX >= 300.00 AND CRIM < 15.72 AND RM >= 5.06 AND LSTAT < 32.00 AND LSTAT >= 14.73 THEN MEDV = {15.23} [11.46, 19.00] (p=89, n=32, P=139, N=367)
IF RM < 6.45 AND RM >= 5.75 AND AGE <= 71.50 AND LSTAT < 14.16 THEN MEDV = {22.28} [19.69, 24.87] (p=85, n=22, P=173, N=333)
IF RM < 6.59 AND B >= 198.44 AND LSTAT < 16.12 AND RM >= 5.64 AND DIS >= 1.15 THEN MEDV = {21.75} [18.05, 25.45] (p=184, n=33, P=236, N=270)
IF RM < 6.95 AND TAX >= 219.00 AND LSTAT < 14.17 THEN MEDV = {23.28} [18.17, 28.39] (p=203, n=42, P=255, N=251)
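Tightening a bound works the other way: the new condition AGE <= 71.5 accepts a subset of the values accepted by AGE < 91.05, so the rule's coverage can only shrink (here from 129 + 11 = 140 to 85 + 22 = 107 examples). Setting `right_closed = True` makes the boundary value 71.5 itself satisfy the condition. A small illustration with made-up AGE values:

```python
import numpy as np

# Made-up AGE values, chosen to sit around both thresholds
age = np.array([45.0, 71.5, 71.6, 85.0, 91.05, 95.0])

old = age < 91.05   # original condition: open right bound
new = age <= 71.5   # edited condition: closed right bound (right_closed=True)

print(old.tolist())
print(new.tolist())
# The edited condition accepts a subset of what the original accepted
print(bool(np.all(old[new])))
```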