{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Evaluate regression rules in decision-rules\n", "\n", "In this tutorial we will evaluate decision rules for regression." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we load the Boston Housing dataset. The column `MEDV` (median home value, in $1000s) is our target variable `y`; the remaining columns are the predictors `X`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO \\\n", "0 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15 \n", "1 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17 \n", "2 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17 \n", "3 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18 \n", "4 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18 \n", ".. ... .. ... ... ... ... ... ... ... ... ... \n", "501 0.06263 0 11.93 0 0.573 6.593 69.1 2.4786 1 273 21 \n", "502 0.04527 0 11.93 0 0.573 6.120 76.7 2.2875 1 273 21 \n", "503 0.06076 0 11.93 0 0.573 6.976 91.0 2.1675 1 273 21 \n", "504 0.10959 0 11.93 0 0.573 6.794 89.3 2.3889 1 273 21 \n", "505 0.04741 0 11.93 0 0.573 6.030 80.8 2.5050 1 273 21 \n", "\n", " B LSTAT MEDV \n", "0 396.90 4.98 24.0 \n", "1 396.90 9.14 21.6 \n", "2 392.83 4.03 34.7 \n", "3 394.63 2.94 33.4 \n", "4 396.90 5.33 36.2 \n", ".. ... ... ... \n", "501 391.99 9.67 22.4 \n", "502 396.90 9.08 20.6 \n", "503 396.90 5.64 23.9 \n", "504 393.45 6.48 22.0 \n", "505 396.90 7.88 11.9 \n", "\n", "[506 rows x 14 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv('resources/boston.csv')\n", "display(df)\n", "X = df.drop(columns=['MEDV'])\n", "y = df['MEDV']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to predict the values of `y` from `X` using a set of decision rules. The rules have already been created, so we will load them from a JSON file." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "IF AGE >= 16.35 AND PTRATIO < 17.50 AND RM >= 7.48 AND LSTAT < 6.25 THEN MEDV = {47.78} [44.45, 51.11] (p=17, n=2, P=23, N=483)\n", "IF RM < 8.35 AND ZN < 92.50 AND CRIM < 0.59 AND CRIM >= 0.02 AND RM >= 7.42 THEN MEDV = {43.01} [38.58, 47.43] (p=13, n=6, P=14, N=492)\n", "IF LSTAT >= 3.15 AND RM < 8.32 AND CRIM >= 0.04 AND INDUS < 18.84 AND AGE >= 24.45 AND RM >= 7.26 THEN MEDV = {36.57} [27.64, 45.49] (p=16, n=5, P=85, N=421)\n", "IF RM < 7.28 AND PTRATIO < 18.50 AND ZN < 92.50 AND CRIM >= 0.01 AND CHAS < 0.50 AND B >= 363.19 AND RM >= 7.08 THEN MEDV = {35.09} [33.44, 36.74] (p=9, n=3, P=16, N=490)\n", "IF AGE < 82.95 AND INDUS >= 1.34 AND LSTAT < 9.10 AND NOX >= 0.40 AND DIS < 9.06 THEN MEDV = {28.86} [21.29, 36.43] (p=110, n=31, P=211, N=295)\n", "IF DIS < 1.89 AND DIS >= 1.50 AND CRIM >= 13.08 AND CRIM < 43.64 AND NOX >= 0.66 THEN MEDV = {9.09} [7.17, 11.02] (p=11, n=2, P=29, N=477)\n", "IF LSTAT >= 20.70 AND LSTAT < 33.01 AND CRIM < 43.64 AND DIS >= 1.37 AND CRIM >= 11.34 AND NOX >= 0.66 THEN MEDV = {9.02} [7.08, 10.96] (p=16, n=3, P=28, N=478)\n", "IF CRIM < 32.15 AND CRIM >= 6.99 AND NOX >= 0.66 THEN MEDV = {12.23} [6.30, 18.16] (p=53, n=2, P=147, N=359)\n", "IF DIS >= 1.18 AND NOX < 0.72 AND DIS < 2.76 AND RM < 7.14 AND NOX >= 0.66 AND CRIM >= 6.60 THEN MEDV = {11.06} [6.95, 15.18] (p=39, n=7, P=94, N=412)\n", "IF RM >= 4.33 AND CRIM >= 6.84 AND CHAS < 0.50 THEN MEDV = {13.06} [7.00, 19.12] (p=70, n=11, P=175, N=331)\n", "IF RM < 7.17 AND TAX >= 222.50 AND AGE >= 28.25 AND LSTAT < 7.96 THEN MEDV = {26.67} [20.21, 33.13] (p=61, n=5, P=225, N=281)\n", "IF RM < 6.44 AND TAX >= 223.50 AND CHAS < 0.50 AND LSTAT < 10.14 AND CRIM < 30.18 THEN MEDV = {23.00} [18.05, 27.95] (p=75, n=4, P=253, N=253)\n", "IF CRIM < 0.57 AND RM >= 6.43 AND LSTAT < 9.92 THEN MEDV = {31.93} [24.31, 39.55] (p=84, n=38, P=116, N=390)\n", "IF CRIM < 7.53 
AND LSTAT >= 9.43 AND AGE >= 28.00 AND B >= 363.17 AND RM < 6.95 THEN MEDV = {19.97} [16.57, 23.36] (p=128, n=41, P=211, N=295)\n", "IF B < 396.55 AND CRIM < 6.99 AND CHAS < 0.50 AND B >= 86.04 AND AGE >= 84.50 AND LSTAT >= 14.43 THEN MEDV = {16.22} [12.89, 19.55] (p=49, n=6, P=144, N=362)\n", "IF LSTAT >= 10.54 AND CRIM < 7.25 AND INDUS >= 7.17 AND DIS < 3.97 AND RM < 6.43 AND CRIM >= 0.13 AND INDUS < 20.73 AND RM >= 5.38 AND LSTAT < 14.14 THEN MEDV = {20.65} [18.59, 22.70] (p=23, n=8, P=142, N=364)\n", "IF B >= 6.50 AND B < 368.37 AND CRIM < 7.90 AND RM >= 5.41 AND LSTAT >= 14.40 THEN MEDV = {15.36} [12.41, 18.31] (p=27, n=7, P=109, N=397)\n" ] } ], "source": [ "import json\n", "\n", "from decision_rules import measures\n", "from decision_rules.serialization import JSONSerializer\n", "from decision_rules.regression.ruleset import RegressionRuleSet\n", "\n", "# Read the JSON file.\n", "ruleset_path = 'resources/boston_ruleset.json'\n", "with open(ruleset_path) as fp:\n", "    json_ruleset = json.load(fp)\n", "\n", "# Create a RegressionRuleSet object from the dict that was stored in JSON.\n", "ruleset: RegressionRuleSet = JSONSerializer.deserialize(json_ruleset, RegressionRuleSet)\n", "# Recalculate the rules' coverage and voting weights on (X, y); measures.c2 is the rule-quality measure used for the weights.\n", "ruleset.update(X, y, measure=measures.c2)\n", "# Print the rules in the rule set.\n", "for rule in ruleset.rules:\n", "    print(rule)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The rule set is loaded and ready for prediction. Let's generate predictions for the examples in the dataset." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO \\\n", "0 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15 \n", "1 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17 \n", "2 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17 \n", "3 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18 \n", "4 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18 \n", ".. ... .. ... ... ... ... ... ... ... ... ... \n", "501 0.06263 0 11.93 0 0.573 6.593 69.1 2.4786 1 273 21 \n", "502 0.04527 0 11.93 0 0.573 6.120 76.7 2.2875 1 273 21 \n", "503 0.06076 0 11.93 0 0.573 6.976 91.0 2.1675 1 273 21 \n", "504 0.10959 0 11.93 0 0.573 6.794 89.3 2.3889 1 273 21 \n", "505 0.04741 0 11.93 0 0.573 6.030 80.8 2.5050 1 273 21 \n", "\n", " B LSTAT MEDV MEDV_pred \n", "0 396.90 4.98 24.0 29.104371 \n", "1 396.90 9.14 21.6 22.998734 \n", "2 392.83 4.03 34.7 32.171973 \n", "3 394.63 2.94 33.4 30.458412 \n", "4 396.90 5.33 36.2 32.171973 \n", ".. ... ... ... ... \n", "501 391.99 9.67 22.4 26.221980 \n", "502 396.90 9.08 20.6 25.626787 \n", "503 396.90 5.64 23.9 29.214162 \n", "504 393.45 6.48 22.0 29.214162 \n", "505 396.90 7.88 11.9 25.982855 \n", "\n", "[506 rows x 15 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "y_pred = ruleset.predict(X)\n", "df['MEDV_pred'] = y_pred\n", "display(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can calculate the basic regression metrics using `calculate_for_regression`. The returned value is a dict with keys:\n", "\n", "- `general`: general metrics, such as RMSE.\n", "- `histogram`: a histogram of errors." 
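] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `general` metrics follow the usual regression error definitions. As a cross-check, the next cell is a minimal NumPy sketch that recomputes them by hand from `y` and `y_pred`. The helper `regression_metrics` is hypothetical and is not part of decision-rules. The formulas for `rRMSE` and `rMAE` (RMSE and MAE divided by the mean of `y`) are an assumption: they are consistent with the values printed below, but they are not taken from the library source." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Hypothetical helper (not part of decision-rules): standard regression\n", "# error metrics computed by hand. rRMSE and rMAE are assumed to be RMSE and\n", "# MAE divided by the mean of the true values.\n", "def regression_metrics(y_true, y_pred):\n", "    y_true = np.asarray(y_true, dtype=float)\n", "    y_pred = np.asarray(y_pred, dtype=float)\n", "    err = y_true - y_pred\n", "    rmse = np.sqrt(np.mean(err ** 2))\n", "    mae = np.mean(np.abs(err))\n", "    return {\n", "        'RMSE': rmse,\n", "        'MAE': mae,\n", "        'MAPE': np.mean(np.abs(err / y_true)),  # a fraction, not a percentage\n", "        'rRMSE': rmse / np.mean(y_true),\n", "        'rMAE': mae / np.mean(y_true),\n", "        'maxError': np.max(np.abs(err)),\n", "        # Coefficient of determination: 1 - SSE / SST.\n", "        'R^2': 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2),\n", "    }\n", "\n", "regression_metrics(y, y_pred)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "If decision-rules uses these standard definitions, the values returned by `regression_metrics(y, y_pred)` should agree with `metrics['general']` computed below. Any difference would point to a different convention in the library." 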
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "General metrics\n" ] }, { "data": { "text/plain": [ "{'RMSE': 4.90202509087278,\n", " 'MAE': 3.2308461981595062,\n", " 'MAPE': 0.15617752551676722,\n", " 'rRMSE': 0.21755058026782434,\n", " 'rMAE': 0.14338410190400555,\n", " 'maxError': 37.77272727272727,\n", " 'R^2': 0.7153520927414723}" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Histogram of errors\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAh8AAAGdCAYAAACyzRGfAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAgTUlEQVR4nO3dfXBU5f2/8fcGyCY87IYEyJKaQHwiUAU1SFjUVjA1ZRgLQ6TVoQrKQHUiLcSqpFUQR02KU6EwPKhDg7ZSlGmBUhWr0eJ0TFAjVMUxRQsmEHZpq9lFvs0mkvv3h8P+XAmaTTb3ZsP1mjkz7jknJx9uGbjmcHbjMMYYAQAAWJIU7wEAAMDZhfgAAABWER8AAMAq4gMAAFhFfAAAAKuIDwAAYBXxAQAArCI+AACAVX3jPcBXtbW1qbGxUYMGDZLD4Yj3OAAAoAOMMTp+/LiysrKUlPT19zZ6XHw0NjYqOzs73mMAAIBOaGho0DnnnPO15/S4+Bg0aJCkL4Z3uVxxngYAAHREMBhUdnZ2+O/xr9Pj4uPUP7W4XC7iAwCABNORRyZ44BQAAFhFfAAAAKuIDwAAYBXxAQAArCI+AACAVcQHAACwivgAAABWER8AAMAq4gMAAFhFfAAAAKuIDwAAYBXxAQAArCI+AACAVcQHAACwqm+8BwAA9DwjlzzXofMOVUzr5knQG3HnAwAAWEV8AAAAq4gPAABgFfEBAACsIj4AAIBVxAcAALCK+AAAAFYRHwAAwCriAwAAWEV8AAAAq6KKj5EjR8rhcJy2lZSUSJKam5tVUlKijIwMDRw4UMXFxfL7/d0yOAAASExRxcebb76po0ePhreXXnpJkjRr1ixJ0uLFi7Vz505t3bpVu3fvVmNjo2bOnBn7qQEAQMKK6gfLDR06NOJ1RUWFzjvvPH33u99VIBDQxo0btXnzZk2ZMkWSVFlZqdGjR6umpkYTJ06M3dQAACBhdfqZj5aWFv3+97/XrbfeKofDodraWrW2tqqwsDB8Tl5ennJyclRdXX3G64RCIQWDwYgNAAD0Xp2Oj+3bt6upqUlz586VJPl8PiUnJystLS3ivMzMTPl8vjNep7y8XG63O7xlZ2d3diQAAJAAOh0fGzdu1NSpU5WVldWlAcrKyhQIBMJbQ0NDl64HAAB6tqie+Tjl448/1ssvv6w//elP4X0ej0ctLS1qamqKuPvh9/vl8XjOeC2n0ymn09mZMQAAQALq1J2PyspKDRs2TNOmTQvvy8/PV79+/VRVVRXeV1dXp/r6enm93q5PCgAAeoWo73y0tbWpsrJSc+bMUd++///L3W635s2bp9LSUqWnp8vlcmnhwoXyer280wUAAIRFHR8vv/yy6uvrdeutt552bOXKlUpKSlJxcbFCoZCKioq0bt26mAw
KAAB6B4cxxsR7iC8LBoNyu90KBAJyuVzxHgcAzkojlzzXofMOVUz75pNwVojm729+tgsAALCK+AAAAFYRHwAAwCriAwAAWEV8AAAAq4gPAABgFfEBAACsIj4AAIBVxAcAALCK+AAAAFYRHwAAwCriAwAAWEV8AAAAq4gPAABgFfEBAACsIj4AAIBVxAcAALCK+AAAAFYRHwAAwCriAwAAWEV8AAAAq4gPAABgFfEBAACsIj4AAIBVxAcAALCK+AAAAFb1jfcAAAB7Ri55Lt4jANz5AAAAdhEfAADAKuIDAABYRXwAAACriA8AAGAV8QEAAKwiPgAAgFXEBwAAsIr4AAAAVhEfAADAqqjj48iRI/rxj3+sjIwMpaam6uKLL9Zbb70VPm6M0dKlSzV8+HClpqaqsLBQBw4ciOnQAAAgcUUVH59++qmuuOIK9evXTy+88ILef/99/frXv9bgwYPD56xYsUKrV6/Whg0btGfPHg0YMEBFRUVqbm6O+fAAACDxRPWD5X71q18pOztblZWV4X25ubnh/zbGaNWqVbr33ns1ffp0SdJTTz2lzMxMbd++XTfccEOMxgYAAIkqqjsff/7znzV+/HjNmjVLw4YN06WXXqonnngifPzgwYPy+XwqLCwM73O73SooKFB1dXW71wyFQgoGgxEbAADovaKKj3/9619av369LrjgAr344ou6/fbb9dOf/lRPPvmkJMnn80mSMjMzI74uMzMzfOyrysvL5Xa7w1t2dnZnfh0AACBBRBUfbW1tuuyyy/Twww/r0ksv1YIFCzR//nxt2LCh0wOUlZUpEAiEt4aGhk5fCwAA9HxRxcfw4cM1ZsyYiH2jR49WfX29JMnj8UiS/H5/xDl+vz987KucTqdcLlfEBgAAeq+o4uOKK65QXV1dxL5//vOfGjFihKQvHj71eDyqqqoKHw8Gg9qzZ4+8Xm8MxgUAAIkuqne7LF68WJMmTdLDDz+sH/7wh3rjjTf0+OOP6/HHH5ckORwOLVq0SA8++KAuuOAC5ebm6r777lNWVpZmzJjRHfMDAIAEE1V8XH755dq2bZvKysr0wAMPKDc3V6tWrdLs2bPD59x99906ceKEFixYoKamJl155ZXatWuXUlJSYj48AABIPA5jjIn3EF8WDAbldrsVCAR4/gMAYmzkkudier1DFdNiej0krmj+/uZnuwAAAKuIDwAAYBXxAQAArCI+AACAVcQHAACwivgAAABWER8AAMCqqD5kDADQM8X68zti/X35PBB8GXc+AACAVcQHAACwivgAAABWER8AAMAq4gMAAFhFfAAAAKuIDwAAYBXxAQAArCI+AACAVcQHAACwivgAAABWER8AAMAq4gMAAFhFfAAAAKuIDwAAYBXxAQAArCI+AACAVcQHAACwivgAAABWER8AAMAq4gMAAFhFfAAAAKuIDwAAYBXxAQAArCI+AACAVcQHAACwivgAAABWER8AAMAq4gMAAFgVVXzcf//9cjgcEVteXl74eHNzs0pKSpSRkaGBAwequLhYfr8/5kMDAIDEFfWdj29/+9s6evRoePv73/8ePrZ48WLt3LlTW7du1e7du9XY2KiZM2fGdGAAAJDY+kb9BX37yuPxnLY/EAho48aN2rx5s6ZMmSJJqqys1OjRo1VTU6OJEyd2fVoAAJDwor7zceDAAWVlZencc8/V7NmzVV9fL0mqra1Va2urCgsLw+fm5eUpJydH1dXVZ7xeKBRSMBiM2AAAQO8V1Z2PgoICbdq0SaNGjdLRo0e1fPlyXXXVVXrvvffk8/mUnJystLS0iK/JzMyUz+c74zXLy8u1fPnyTg0PAEgMI5c816HzDlVM6+ZJ0BNEFR9Tp04N//fYsWNVUFCgESNG6Nlnn1VqamqnBigrK1NpaWn4dTAYVHZ2dqeuBQAAer4uvdU2LS1NF154oT788EN5PB61tLSoqakp4hy/39/uMyKnOJ1OuVyuiA0AAPReXYqPzz77TB9
99JGGDx+u/Px89evXT1VVVeHjdXV1qq+vl9fr7fKgAACgd4jqn11+/vOf67rrrtOIESPU2NioZcuWqU+fPrrxxhvldrs1b948lZaWKj09XS6XSwsXLpTX6+WdLgAAICyq+Dh8+LBuvPFG/fe//9XQoUN15ZVXqqamRkOHDpUkrVy5UklJSSouLlYoFFJRUZHWrVvXLYMDQG/X0Yc0gUTjMMaYeA/xZcFgUG63W4FAgOc/AJzVzsb44N0uiSuav7/52S4AAMAq4gMAAFhFfAAAAKuIDwAAYBXxAQAArCI+AACAVcQHAACwivgAAABWER8AAMAq4gMAAFhFfAAAAKuIDwAAYBXxAQAArCI+AACAVcQHAACwivgAAABWER8AAMAq4gMAAFhFfAAAAKuIDwAAYBXxAQAArCI+AACAVcQHAACwivgAAABWER8AAMAq4gMAAFhFfAAAAKuIDwAAYBXxAQAArCI+AACAVcQHAACwivgAAABWER8AAMAq4gMAAFhFfAAAAKuIDwAAYBXxAQAArOpSfFRUVMjhcGjRokXhfc3NzSopKVFGRoYGDhyo4uJi+f3+rs4JAAB6iU7Hx5tvvqnHHntMY8eOjdi/ePFi7dy5U1u3btXu3bvV2NiomTNndnlQAADQO3QqPj777DPNnj1bTzzxhAYPHhzeHwgEtHHjRj366KOaMmWK8vPzVVlZqddff101NTUxGxoAACSuTsVHSUmJpk2bpsLCwoj9tbW1am1tjdifl5ennJwcVVdXt3utUCikYDAYsQEAgN6rb7RfsGXLFr399tt68803Tzvm8/mUnJystLS0iP2ZmZny+XztXq+8vFzLly+PdgwAAJCgorrz0dDQoJ/97Gd6+umnlZKSEpMBysrKFAgEwltDQ0NMrgsAAHqmqOKjtrZWx44d02WXXaa+ffuqb9++2r17t1avXq2+ffsqMzNTLS0tampqivg6v98vj8fT7jWdTqdcLlfEBgAAeq+o/tnlmmuu0bvvvhux75ZbblFeXp7uueceZWdnq1+/fqqqqlJxcbEkqa6uTvX19fJ6vbGbGgAAJKyo4mPQoEG66KKLIvYNGDBAGRkZ4f3z5s1TaWmp0tPT5XK5tHDhQnm9Xk2cODF2UwMAgIQV9QOn32TlypVKSkpScXGxQqGQioqKtG7dulh/GwAAkKAcxhgT7yG+LBgMyu12KxAI8PwHgLPayCXPxXsE6w5VTIv3COikaP7+5me7AAAAq4gPAABgFfEBAACsIj4AAIBVxAcAALCK+AAAAFYRHwAAwCriAwAAWEV8AAAAq4gPAABgFfEBAACsIj4AAIBVxAcAALCK+AAAAFYRHwAAwCriAwAAWEV8AAAAq4gPAABgFfEBAACsIj4AAIBVxAcAALCK+AAAAFYRHwAAwCriAwAAWEV8AAAAq4gPAABgFfEBAACs6hvvAQAAOGXkkuc6dN6himndPAm6E3c+AACAVcQHAACwivgAAABWER8AAMAq4gMAAFhFfAAAAKuIDwAAYBXxAQAArCI+AACAVVHFx/r16zV27Fi5XC65XC55vV698MIL4ePNzc0qKSlRRkaGBg4cqOLiYvn9/pgPDQAAEldU8XHOOeeooqJCtbW1euuttzRlyhRNnz5d+/fvlyQtXrxYO3fu1NatW7V79241NjZq5syZ3TI4AABITA5jjOnKBdLT0/XII4/o+uuv19ChQ7V582Zdf/31kqQPPvhAo0ePVnV1tSZOnNih6wWDQbndbgUCAblcrq6MBgAJraM/5+RsxM926Xmi+fu70898nDx5Ulu2bNGJEyfk9XpVW1ur1tZWFRYWhs/Jy8tTTk6OqqurO/ttAABALxP1T7V999135fV61dzcrIEDB2rbtm0aM2aM9u3bp+TkZKWlpUWcn5mZKZ/Pd8brhUIhhUKh8OtgMBjtSAAAIIFEfedj1KhR2rdvn/bs2aPbb79dc+bM0fvvv9/pAcrLy+V2u8NbdnZ2p68FAAB
6vqjjIzk5Weeff77y8/NVXl6ucePG6Te/+Y08Ho9aWlrU1NQUcb7f75fH4znj9crKyhQIBMJbQ0ND1L8IAACQOLr8OR9tbW0KhULKz89Xv379VFVVFT5WV1en+vp6eb3eM3690+kMv3X31AYAAHqvqJ75KCsr09SpU5WTk6Pjx49r8+bN+tvf/qYXX3xRbrdb8+bNU2lpqdLT0+VyubRw4UJ5vd4Ov9MFAAD0flHFx7Fjx3TzzTfr6NGjcrvdGjt2rF588UV973vfkyStXLlSSUlJKi4uVigUUlFRkdatW9ctgwMAgMTU5c/5iDU+5wMAvsDnfJwZn/PR81j5nA8AAIDOID4AAIBVxAcAALAq6k84BQAg3jr6PAzPhvRM3PkAAABWER8AAMAq4gMAAFhFfAAAAKuIDwAAYBXxAQAArCI+AACAVcQHAACwivgAAABWER8AAMAq4gMAAFhFfAAAAKuIDwAAYBXxAQAArCI+AACAVcQHAACwivgAAABWER8AAMAq4gMAAFhFfAAAAKuIDwAAYBXxAQAArCI+AACAVcQHAACwivgAAABWER8AAMAq4gMAAFhFfAAAAKuIDwAAYBXxAQAArCI+AACAVcQHAACwivgAAABWER8AAMCqqOKjvLxcl19+uQYNGqRhw4ZpxowZqqurizinublZJSUlysjI0MCBA1VcXCy/3x/ToQEAQOKKKj52796tkpIS1dTU6KWXXlJra6uuvfZanThxInzO4sWLtXPnTm3dulW7d+9WY2OjZs6cGfPBAQBAYuobzcm7du2KeL1p0yYNGzZMtbW1+s53vqNAIKCNGzdq8+bNmjJliiSpsrJSo0ePVk1NjSZOnBi7yQEAQELq0jMfgUBAkpSeni5Jqq2tVWtrqwoLC8Pn5OXlKScnR9XV1e1eIxQKKRgMRmwAAKD3iurOx5e1tbVp0aJFuuKKK3TRRRdJknw+n5KTk5WWlhZxbmZmpnw+X7vXKS8v1/Llyzs7BgAknJFLnov3CEBcdfrOR0lJid577z1t2bKlSwOUlZUpEAiEt4aGhi5dDwAA9GyduvNxxx136C9/+Ytee+01nXPOOeH9Ho9HLS0tampqirj74ff75fF42r2W0+mU0+nszBgAACABRXXnwxijO+64Q9u2bdMrr7yi3NzciOP5+fnq16+fqqqqwvvq6upUX18vr9cbm4kBAEBCi+rOR0lJiTZv3qwdO3Zo0KBB4ec43G63UlNT5Xa7NW/ePJWWlio9PV0ul0sLFy6U1+vlnS4AAEBSlPGxfv16SdLVV18dsb+yslJz586VJK1cuVJJSUkqLi5WKBRSUVGR1q1bF5NhAQBA4osqPowx33hOSkqK1q5dq7Vr13Z6KAAA0Hvxs10AAIBVxAcAALCK+AAAAFYRHwAAwCriAwAAWEV8AAAAq4gPAABgFfEBAACsIj4AAIBVxAcAALCK+AAAAFYRHwAAwCriAwAAWEV8AAAAq4gPAABgFfEBAACsIj4AAIBVxAcAALCK+AAAAFYRHwAAwCriAwAAWEV8AAAAq4gPAABgFfEBAACsIj4AAIBVxAcAALCK+AAAAFYRHwAAwCriAwAAWEV8AAAAq4gPAABgVd94DwAAvcXIJc/FewQgIXDnAwAAWEV8AAAAq4gPAABgFfEBAACsIj4AAIBVUcfHa6+9puuuu05ZWVlyOBzavn17xHFjjJYuXarhw4crNTVVhYWFOnDgQKzmBQAACS7q+Dhx4oTGjRuntWvXtnt8xYoVWr16tTZs2KA9e/ZowIABKioqUnNzc5eHBQAAiS/qz/mYOnWqpk6d2u4xY4xWrVqle++9V9OnT5ckPfXUU8rMzNT27dt1ww03dG1aAACQ8GL6zMfBgwfl8/lUWFgY3ud2u1VQUKDq6up2vyYUCikYDEZsAACg94rpJ5z6fD5JUmZmZsT+zMzM8LGvKi8v1/Lly2M5BgB0SEc/kfRQxbRungTdhf/HPVPc3+1SVlamQCA
Q3hoaGuI9EgAA6EYxjQ+PxyNJ8vv9Efv9fn/42Fc5nU65XK6IDQAA9F4xjY/c3Fx5PB5VVVWF9wWDQe3Zs0derzeW3woAACSoqJ/5+Oyzz/Thhx+GXx88eFD79u1Tenq6cnJytGjRIj344IO64IILlJubq/vuu09ZWVmaMWNGLOcGAAAJKur4eOuttzR58uTw69LSUknSnDlztGnTJt199906ceKEFixYoKamJl155ZXatWuXUlJSYjc1AHwNfrQ90LNFHR9XX321jDFnPO5wOPTAAw/ogQce6NJgAACgd4r7u10AAMDZhfgAAABWER8AAMCqmH7CKQD0RjzACsQWdz4AAIBVxAcAALCK+AAAAFYRHwAAwCoeOAUAnPU6+lDxoYpp3TzJ2YE7HwAAwCriAwAAWEV8AAAAq4gPAABgFfEBAACsIj4AAIBVxAcAALCK+AAAAFYRHwAAwCriAwAAWEV8AAAAq4gPAABgFfEBAACsIj4AAIBVxAcAALCK+AAAAFYRHwAAwCriAwAAWNU33gMA6DlGLnmuQ+cdqpjWzZMA6M248wEAAKwiPgAAgFXEBwAAsIr4AAAAVvHAKXoVHphMTB39/wbg6yXKn4Hc+QAAAFYRHwAAwCriAwAAWHXWPfORKP8e9k34dUCK37MSPKOBs1Wsf++frX+2ddudj7Vr12rkyJFKSUlRQUGB3njjje76VgAAIIF0S3w888wzKi0t1bJly/T2229r3LhxKioq0rFjx7rj2wEAgATSLfHx6KOPav78+brllls0ZswYbdiwQf3799dvf/vb7vh2AAAggcT8mY+WlhbV1taqrKwsvC8pKUmFhYWqrq4+7fxQKKRQKBR+HQgEJEnBYDDWo0mS2kL/16Hzuuv7xwq/DjvX6+k6+usF0DPF+s+ieP4ZeOqaxphvPtnE2JEjR4wk8/rrr0fsv+uuu8yECRNOO3/ZsmVGEhsbGxsbG1sv2BoaGr6xFeL+bpeysjKVlpaGX7e1temTTz5RRkaGHA6HgsGgsrOz1dDQIJfLFcdJexfWtfuwtt2Hte0+rG33OVvW1hij48ePKysr6xvPjXl8DBkyRH369JHf74/Y7/f75fF4Tjvf6XTK6XRG7EtLSzvtPJfL1av/p8UL69p9WNvuw9p2H9a2+5wNa+t2uzt0XswfOE1OTlZ+fr6qqqrC+9ra2lRVVSWv1xvrbwcAABJMt/yzS2lpqebMmaPx48drwoQJWrVqlU6cOKFbbrmlO74dAABIIN0SHz/60Y/073//W0uXLpXP59Mll1yiXbt2KTMzM+prOZ1OLVu27LR/mkHXsK7dh7XtPqxt92Ftuw9rezqHMR15TwwAAEBs8IPlAACAVcQHAACwivgAAABWER8AAMCqHh8foVBIl1xyiRwOh/bt2xdx7J133tFVV12llJQUZWdna8WKFfEZMsH84Ac/UE5OjlJSUjR8+HDddNNNamxsjDiHtY3eoUOHNG/ePOXm5io1NVXnnXeeli1bppaWlojzWNvOeeihhzRp0iT179+/3Q8ilKT6+npNmzZN/fv317Bhw3TXXXfp888/tztoglq7dq1GjhyplJQUFRQU6I033oj3SAnntdde03XXXaesrCw5HA5t37494rgxRkuXLtXw4cOVmpqqwsJCHThwID7DxlmPj4+777673Y9qDQaDuvbaazVixAjV1tbqkUce0f3336/HH388DlMmlsmTJ+vZZ59VXV2d/vjHP+qjjz7S9ddfHz7O2nbOBx98oLa2Nj322GPav3+/Vq5cqQ0bNugXv/hF+BzWtvNaWlo0a9Ys3X777e0eP3nypKZNm6aWlha9/vrrevLJJ7Vp0yYtXbrU8qSJ55lnnlFpaamWLVumt99+W+PGjVNRUZGOHTsW79ESyokTJzRu3DitXbu23eMrVqzQ6tWrtWHDBu3Zs0cDBgxQUVGRmpubLU/aA8Tkp8l1k+eff97k5eWZ/fv3G0lm79694WPr1q0zgwc
PNqFQKLzvnnvuMaNGjYrDpIltx44dxuFwmJaWFmMMaxtLK1asMLm5ueHXrG3XVVZWGrfbfdr+559/3iQlJRmfzxfet379euNyuSLWG6ebMGGCKSkpCb8+efKkycrKMuXl5XGcKrFJMtu2bQu/bmtrMx6PxzzyyCPhfU1NTcbpdJo//OEPcZgwvnrsnQ+/36/58+frd7/7nfr373/a8erqan3nO99RcnJyeF9RUZHq6ur06aef2hw1oX3yySd6+umnNWnSJPXr108SaxtLgUBA6enp4desbfeprq7WxRdfHPFhhkVFRQoGg9q/f38cJ+vZWlpaVFtbq8LCwvC+pKQkFRYWqrq6Oo6T9S4HDx6Uz+eLWGe3262CgoKzcp17ZHwYYzR37lzddtttGj9+fLvn+Hy+0z4x9dRrn8/X7TMmunvuuUcDBgxQRkaG6uvrtWPHjvAx1jY2PvzwQ61Zs0Y/+clPwvtY2+7D2nbOf/7zH508ebLdtWPdYufUWrLOX7AaH0uWLJHD4fja7YMPPtCaNWt0/PhxlZWV2RwvoXV0bU+56667tHfvXv31r39Vnz59dPPNN8vwYbftinZtJenIkSP6/ve/r1mzZmn+/Plxmrzn68zaAkh83fKzXc7kzjvv1Ny5c7/2nHPPPVevvPKKqqurT/sc/PHjx2v27Nl68skn5fF45Pf7I46feu3xeGI6dyLo6NqeMmTIEA0ZMkQXXnihRo8erezsbNXU1Mjr9bK2XxHt2jY2Nmry5MmaNGnSaQ+SsraRol3br+PxeE57h8bZvLYdNWTIEPXp06fd35esW+ycWku/36/hw4eH9/v9fl1yySVxmip+rMbH0KFDNXTo0G88b/Xq1XrwwQfDrxsbG1VUVKRnnnlGBQUFkiSv16tf/vKXam1tDT+r8NJLL2nUqFEaPHhw9/wCerCOrm172traJH3xtmaJtf2qaNb2yJEjmjx5svLz81VZWamkpMibi6xtpK78vv0qr9erhx56SMeOHdOwYcMkfbG2LpdLY8aMicn36I2Sk5OVn5+vqqoqzZgxQ9IXfyZUVVXpjjvuiO9wvUhubq48Ho+qqqrCsREMBrVnz54zvoOrV4v3E68dcfDgwdPe7dLU1GQyMzPNTTfdZN577z2zZcsW079/f/PYY4/Fb9AEUFNTY9asWWP27t1rDh06ZKqqqsykSZPMeeedZ5qbm40xrG1nHT582Jx//vnmmmuuMYcPHzZHjx4Nb6ewtp338ccfm71795rly5ebgQMHmr1795q9e/ea48ePG2OM+fzzz81FF11krr32WrNv3z6za9cuM3ToUFNWVhbnyXu+LVu2GKfTaTZt2mTef/99s2DBApOWlhbxziF8s+PHj4d/X0oyjz76qNm7d6/5+OOPjTHGVFRUmLS0NLNjxw7zzjvvmOnTp5vc3Fzzv//9L86T25ew8WGMMf/4xz/MlVdeaZxOp/nWt75lKioq4jNgAnnnnXfM5MmTTXp6unE6nWbkyJHmtttuM4cPH444j7WNXmVlpZHU7vZlrG3nzJkzp921ffXVV8PnHDp0yEydOtWkpqaaIUOGmDvvvNO0trbGb+gEsmbNGpOTk2OSk5PNhAkTTE1NTbxHSjivvvpqu79H58yZY4z54u229913n8nMzDROp9Ncc801pq6uLr5Dx4nDGJ4yBAAA9vTIt9oCAIDei/gAAABWER8AAMAq4gMAAFhFfAAAAKuIDwAAYBXxAQAArCI+AACAVcQHAACwivgAAABWER8AAMAq4gMAAFj1/wDkfdkegl/j7QAAAABJRU5ErkJggg==", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from decision_rules.regression.prediction_indicators import calculate_for_regression\n", "import matplotlib.pyplot as plt\n", "\n", "print('General metrics')\n", "metrics = calculate_for_regression(y, y_pred)\n", "display(metrics['general'])\n", "\n", "print('Histogram of errors')\n", "bins = metrics['histogram']['bin_edges']\n", "counts = metrics['histogram']['histogram']\n", "_ = plt.stairs(counts, bins, fill=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The coverage matrix shows which rules cover each example. We can calculate it by calling the `calculate_coverage_matrix` method of the rule set object. It accepts one argument, `X`, which is a `DataFrame` of examples to check.\n", "\n", "The method returns a 2D boolean NumPy array with one row per example in `X` and one column per rule in the rule set. A `True` value in row *i* and column *j* means that example *i* is covered by rule *j*." 
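] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because the coverage matrix is a plain boolean NumPy array, standard NumPy reductions can summarize it. The next cell is a minimal sketch that computes the matrix for the whole `X`, then counts how many rules cover each example, how many examples each rule covers, and how many examples no rule covers. The variable names are illustrative and are not part of decision-rules." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Coverage matrix for the whole data set: one row per example, one column per rule.\n", "full_coverage = ruleset.calculate_coverage_matrix(X)\n", "\n", "rules_per_example = full_coverage.sum(axis=1)  # number of rules covering each example\n", "examples_per_rule = full_coverage.sum(axis=0)  # number of examples covered by each rule\n", "uncovered = ~full_coverage.any(axis=1)  # True for examples covered by no rule\n", "\n", "print('Examples covered by no rule:', int(uncovered.sum()))\n", "print('Max number of rules covering one example:', int(rules_per_example.max()))\n", "print('Examples covered by each rule:', examples_per_rule.tolist())\n", "print('Rules covering the first example:', np.flatnonzero(full_coverage[0]).tolist())" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The indices returned by `np.flatnonzero` are column positions in the matrix. This sketch assumes that the columns follow the order of `ruleset.rules`." 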
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[False, False, False, False, True, False, False, False, False,\n", " False, True, False, True, False, False, False, False],\n", " [False, False, False, False, False, False, False, False, False,\n", " False, False, True, False, False, False, False, False],\n", " [False, False, False, True, True, False, False, False, False,\n", " False, False, False, True, False, False, False, False],\n", " [False, False, False, False, True, False, False, False, False,\n", " False, False, False, True, False, False, False, False],\n", " [False, False, False, True, True, False, False, False, False,\n", " False, False, False, True, False, False, False, False]])" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Number of examples: 5\n", "Number of rules: 17\n" ] } ], "source": [ "coverage_matrix = ruleset.calculate_coverage_matrix(X.iloc[:5])\n", "display(coverage_matrix)\n", "print(f'Number of examples: {coverage_matrix.shape[0]}')\n", "print(f'Number of rules: {coverage_matrix.shape[1]}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `calculate_rules_metrics` method of the rule set object computes metrics describing each rule. It returns a dict that maps each rule ID to a dict of that rule's metrics. Explanations of the metrics can be found in the [documentation of ruleminer](https://github.com/ruleminer/ruleminer/wiki/10-Description-of-the-results-obtained)." 
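] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because the result is a dict of dicts, it converts directly into a pandas `DataFrame` with one row per rule. This table is easier to sort and compare than the printed dicts. The next cell is a minimal sketch; the name `rules_table` and the choice of columns are illustrative." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "# calculate_rules_metrics returns {rule_id: {metric_name: value, ...}, ...};\n", "# orient='index' turns each rule into one row and each metric into one column.\n", "rules_table = pd.DataFrame.from_dict(ruleset.calculate_rules_metrics(X, y), orient='index')\n", "\n", "# Show the rules covering the most examples first, with a few selected columns.\n", "rules_table.sort_values('support', ascending=False)[['p', 'n', 'support', 'conditions_count', 'y_covered_avg', 'rmse', 'p-value']]" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Any other metric column, such as `rmse` or `p-value`, can be used as the sort key in the same way." 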
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Rule fc70c54b-02f9-4cd5-81d0-7f73d85dc367\n", "{'p': 17, 'n': 2, 'P': 23, 'N': 483, 'p_unique': 17, 'n_unique': 17, 'support': 19, 'conditions_count': 4, 'y_covered_avg': 47.77894736842106, 'y_covered_median': 50.0, 'y_covered_min': 37.6, 'y_covered_max': 50.0, 'mae': 25.395548158934893, 'rmse': 26.8660974796374, 'mape': 1.5125990035583592, 'p-value': 4.231965993998835e-06}\n", "Rule 1634ea74-f2aa-400b-8ba4-1b658578b00a\n", "{'p': 13, 'n': 6, 'P': 14, 'N': 492, 'p_unique': 13, 'n_unique': 13, 'support': 19, 'conditions_count': 5, 'y_covered_avg': 43.00526315789473, 'y_covered_median': 43.5, 'y_covered_min': 33.4, 'y_covered_max': 50.0, 'mae': 21.032764718119402, 'rmse': 22.439720251538724, 'mape': 1.2702153772414488, 'p-value': 0.00032332389088428164}\n", "Rule cdc83a4b-d169-4372-a508-af418d02b556\n", "{'p': 16, 'n': 5, 'P': 85, 'N': 421, 'p_unique': 16, 'n_unique': 16, 'support': 21, 'conditions_count': 6, 'y_covered_avg': 36.56666666666666, 'y_covered_median': 35.2, 'y_covered_min': 15.0, 'y_covered_max': 50.0, 'mae': 15.40527009222661, 'rmse': 16.774051158576192, 'mape': 0.9493332806979319, 'p-value': 0.46645377271528804}\n", "Rule 6fb80d0a-b7ae-4325-8c5e-76226b8d2a03\n", "{'p': 9, 'n': 3, 'P': 16, 'N': 490, 'p_unique': 9, 'n_unique': 9, 'support': 12, 'conditions_count': 7, 'y_covered_avg': 35.091666666666676, 'y_covered_median': 35.15, 'y_covered_min': 31.6, 'y_covered_max': 37.9, 'mae': 14.181785243741773, 'rmse': 15.560993839083396, 'mape': 0.8775897305716386, 'p-value': 2.1589703599932363e-07}\n", "Rule a6d5f154-e5d8-42a7-a7fb-7999ee8edfdc\n", "{'p': 110, 'n': 31, 'P': 211, 'N': 295, 'p_unique': 110, 'n_unique': 110, 'support': 141, 'conditions_count': 5, 'y_covered_avg': 28.858156028368793, 'y_covered_median': 26.6, 'y_covered_min': 11.9, 'y_covered_max': 50.0, 'mae': 9.648011100832562, 'rmse': 11.154801882477512, 
'mape': 0.5935712142448168, 'p-value': 0.00125917486075591}\n", "Rule 7666c01f-c21d-409b-9162-c1450927bfee\n", "{'p': 11, 'n': 2, 'P': 29, 'N': 477, 'p_unique': 11, 'n_unique': 11, 'support': 13, 'conditions_count': 5, 'y_covered_avg': 9.092307692307696, 'y_covered_median': 8.5, 'y_covered_min': 5.6, 'y_covered_max': 12.7, 'mae': 13.573852234721796, 'rmse': 16.280864830458583, 'mape': 0.5432146614616068, 'p-value': 3.648142869045188e-07}\n", "Rule 13f4ff45-93e7-4bd3-b6bd-8658d2f00063\n", "{'p': 16, 'n': 3, 'P': 28, 'N': 478, 'p_unique': 16, 'n_unique': 16, 'support': 19, 'conditions_count': 6, 'y_covered_avg': 9.021052631578947, 'y_covered_median': 8.8, 'y_covered_min': 5.0, 'y_covered_max': 13.1, 'mae': 13.639192843769504, 'rmse': 16.339738186513586, 'mape': 0.546143930494049, 'p-value': 4.972293644367493e-10}\n", "Rule 66b5f17f-130f-4758-bba2-d5773a8ae8d2\n", "{'p': 53, 'n': 2, 'P': 147, 'N': 359, 'p_unique': 53, 'n_unique': 53, 'support': 55, 'conditions_count': 3, 'y_covered_avg': 12.227272727272727, 'y_covered_median': 11.7, 'y_covered_min': 5.6, 'y_covered_max': 50.0, 'mae': 10.821451670858785, 'rmse': 13.806649806224579, 'mape': 0.4261063230868248, 'p-value': 4.539164119741508e-05}\n", "Rule 2ef97491-a4fb-44ec-9f4b-5637da1fd9d5\n", "{'p': 39, 'n': 7, 'P': 94, 'N': 412, 'p_unique': 39, 'n_unique': 39, 'support': 46, 'conditions_count': 6, 'y_covered_avg': 11.06304347826087, 'y_covered_median': 10.7, 'y_covered_min': 5.0, 'y_covered_max': 27.9, 'mae': 11.806547516755456, 'rmse': 14.696088455647036, 'mape': 0.46617491141164896, 'p-value': 1.2859048511740222e-09}\n", "Rule 3cffdf44-dc64-4044-b430-ec95e153c4ad\n", "{'p': 70, 'n': 11, 'P': 175, 'N': 331, 'p_unique': 70, 'n_unique': 70, 'support': 81, 'conditions_count': 3, 'y_covered_avg': 13.058024691358028, 'y_covered_median': 12.3, 'y_covered_min': 5.0, 'y_covered_max': 50.0, 'mae': 10.150968623432389, 'rmse': 13.198145443376216, 'mape': 0.4001944889170244, 'p-value': 2.310205784360766e-06}\n", "Rule 
a974919d-9fbc-49f7-a34a-a23a024ac082\n", "{'p': 61, 'n': 5, 'P': 225, 'N': 281, 'p_unique': 61, 'n_unique': 61, 'support': 66, 'conditions_count': 4, 'y_covered_avg': 26.668181818181818, 'y_covered_median': 24.65, 'y_covered_min': 11.9, 'y_covered_max': 50.0, 'mae': 8.369080129356808, 'rmse': 10.07575737268071, 'mape': 0.5042875491291532, 'p-value': 0.00019747617416798985}\n", "Rule 46605476-d4c6-42cc-8893-9ac43588fe71\n", "{'p': 75, 'n': 4, 'P': 253, 'N': 253, 'p_unique': 75, 'n_unique': 75, 'support': 79, 'conditions_count': 5, 'y_covered_avg': 22.998734177215184, 'y_covered_median': 22.6, 'y_covered_min': 11.9, 'y_covered_max': 50.0, 'mae': 6.743183068994845, 'rmse': 9.199817656913865, 'mape': 0.3749461786251191, 'p-value': 1.0236828474464975e-10}\n", "Rule c5de0cc6-0031-490b-b622-4ea4c43e4690\n", "{'p': 84, 'n': 38, 'P': 116, 'N': 390, 'p_unique': 84, 'n_unique': 84, 'support': 122, 'conditions_count': 3, 'y_covered_avg': 31.9344262295082, 'y_covered_median': 30.950000000000003, 'y_covered_min': 16.5, 'y_covered_max': 50.0, 'mae': 11.740452277586991, 'rmse': 13.145722232031703, 'mape': 0.7291748055269496, 'p-value': 0.0033602885206500148}\n", "Rule dd0ebdad-a5c9-468a-a20a-fb89653b8706\n", "{'p': 128, 'n': 41, 'P': 211, 'N': 295, 'p_unique': 128, 'n_unique': 128, 'support': 169, 'conditions_count': 5, 'y_covered_avg': 19.96568047337278, 'y_covered_median': 19.9, 'y_covered_min': 11.8, 'y_covered_max': 36.2, 'mae': 6.62361952428842, 'rmse': 9.539899962247627, 'mape': 0.3215163752153801, 'p-value': 3.3630561612817378e-43}\n", "Rule 44f26c78-8a18-491b-b123-5062719056d8\n", "{'p': 49, 'n': 6, 'P': 144, 'N': 362, 'p_unique': 49, 'n_unique': 49, 'support': 55, 'conditions_count': 6, 'y_covered_avg': 16.220000000000002, 'y_covered_median': 16.0, 'y_covered_min': 7.0, 'y_covered_max': 30.7, 'mae': 8.08403162055336, 'rmse': 11.147693924839217, 'mape': 0.3367071061321477, 'p-value': 1.9183487774190778e-15}\n", "Rule 2bb7b89b-b885-4944-b1b8-1648064d4f13\n", "{'p': 23, 'n': 
8, 'P': 142, 'N': 364, 'p_unique': 23, 'n_unique': 23, 'support': 31, 'conditions_count': 9, 'y_covered_avg': 20.64516129032258, 'y_covered_median': 20.6, 'y_covered_min': 16.1, 'y_covered_max': 24.5, 'mae': 6.547303327808237, 'rmse': 9.3799125758053, 'mape': 0.3286167130929514, 'p-value': 5.067990050751164e-15}\n", "Rule 2d879e71-b475-4cc4-9a5f-f525a2f927d9\n", "{'p': 27, 'n': 7, 'P': 109, 'N': 397, 'p_unique': 27, 'n_unique': 27, 'support': 34, 'conditions_count': 5, 'y_covered_avg': 15.358823529411763, 'y_covered_median': 15.05, 'y_covered_min': 7.0, 'y_covered_max': 23.7, 'mae': 8.5747733085329, 'rmse': 11.656997267512827, 'mape': 0.3491076881259912, 'p-value': 1.4991455332547193e-11}\n" ] } ], "source": [ "# Compute per-rule metrics; keys are rule IDs, values are metric dicts\n", "rules_metrics = ruleset.calculate_rules_metrics(X, y)\n", "for rule_id, rule_metrics in rules_metrics.items():\n", "    print('Rule', rule_id)\n", "    print(rule_metrics)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `calculate_ruleset_stats` method returns general statistics about the rule set as a whole, such as the number of rules and the average number of conditions per rule." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'rules_count': 17, 'avg_conditions_count': 5.12, 'avg_precision': 0.82, 'avg_coverage': 0.45, 'total_conditions_count': 87}\n" ] } ], "source": [ "general_stats = ruleset.calculate_ruleset_stats()\n", "print(general_stats)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use `calculate_condition_importances` to compute the importance of each condition in the rules, and then pass its result to `calculate_attribute_importances` to obtain the importance of each attribute in the dataset." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'condition': 'NOX >= 0.66',\n", " 'attributes': ['NOX'],\n", " 'importance': 0.41219564492890814},\n", " {'condition': 'CRIM >= 6.84',\n", " 'attributes': ['CRIM'],\n", " 'importance': 0.3463070344654806},\n", " {'condition': 'RM >= 7.48',\n", " 'attributes': ['RM'],\n", " 'importance': 0.28703662615958525},\n", " {'condition': 'CRIM >= 6.99',\n", " 'attributes': ['CRIM'],\n", " 'importance': 0.2581918553894471},\n", " {'condition': 'RM >= 7.42',\n", " 'attributes': ['RM'],\n", " 'importance': 0.21428470226221466},\n", " {'condition': 'LSTAT < 7.96',\n", " 'attributes': ['LSTAT'],\n", " 'importance': 0.1841743175107864},\n", " {'condition': 'RM >= 7.26',\n", " 'attributes': ['RM'],\n", " 'importance': 0.17189228602638446},\n", " {'condition': 'LSTAT < 9.92',\n", " 'attributes': ['LSTAT'],\n", " 'importance': 0.1597209591117324},\n", " {'condition': 'LSTAT < 10.14',\n", " 'attributes': ['LSTAT'],\n", " 'importance': 0.13977988686201942},\n", " {'condition': 'LSTAT >= 14.43',\n", " 'attributes': ['LSTAT'],\n", " 'importance': 0.13474623065608687},\n", " {'condition': 'CRIM < 0.57',\n", " 'attributes': ['CRIM'],\n", " 'importance': 0.13236275435336914},\n", " {'condition': 'CRIM >= 13.08',\n", " 'attributes': ['CRIM'],\n", " 'importance': 0.12967575688395577},\n", " {'condition': 'LSTAT >= 14.40',\n", " 'attributes': ['LSTAT'],\n", " 'importance': 0.12762735615087725},\n", " {'condition': 'RM >= 7.08',\n", " 'attributes': ['RM'],\n", " 'importance': 0.11911032403269536},\n", " {'condition': 'LSTAT < 9.10',\n", " 'attributes': ['LSTAT'],\n", " 'importance': 0.11608645865167616},\n", " {'condition': 'CRIM >= 11.34',\n", " 'attributes': ['CRIM'],\n", " 'importance': 0.10957886720572627},\n", " {'condition': 'RM < 6.44',\n", " 'attributes': ['RM'],\n", " 'importance': 0.10721755983765796},\n", " {'condition': 'CRIM >= 6.60',\n", " 'attributes': ['CRIM'],\n", " 
'importance': 0.107037596978978},\n", " {'condition': 'LSTAT < 6.25',\n", " 'attributes': ['LSTAT'],\n", " 'importance': 0.09987681138116272},\n", " {'condition': 'CRIM < 0.59',\n", " 'attributes': ['CRIM'],\n", " 'importance': 0.0972942505053715},\n", " {'condition': 'NOX >= 0.66',\n", " 'attributes': ['NOX'],\n", " 'importance': 0.09443159057752622},\n", " {'condition': 'RM < 7.17',\n", " 'attributes': ['RM'],\n", " 'importance': 0.08864968911934279},\n", " {'condition': 'LSTAT >= 9.43',\n", " 'attributes': ['LSTAT'],\n", " 'importance': 0.08830877842262368},\n", " {'condition': 'LSTAT >= 20.70',\n", " 'attributes': ['LSTAT'],\n", " 'importance': 0.08364177051884032},\n", " {'condition': 'PTRATIO < 17.50',\n", " 'attributes': ['PTRATIO'],\n", " 'importance': 0.0812812354219555},\n", " {'condition': 'AGE < 82.95',\n", " 'attributes': ['AGE'],\n", " 'importance': 0.07470814962300586},\n", " {'condition': 'RM >= 6.43',\n", " 'attributes': ['RM'],\n", " 'importance': 0.07395685376026782},\n", " {'condition': 'CRIM < 7.53',\n", " 'attributes': ['CRIM'],\n", " 'importance': 0.06852878406567027},\n", " {'condition': 'LSTAT < 14.14',\n", " 'attributes': ['LSTAT'],\n", " 'importance': 0.06500338170219208},\n", " {'condition': 'CRIM < 7.90',\n", " 'attributes': ['CRIM'],\n", " 'importance': 0.06484694679647318},\n", " {'condition': 'RM < 6.95',\n", " 'attributes': ['RM'],\n", " 'importance': 0.06142012277300351},\n", " {'condition': 'CRIM < 6.99',\n", " 'attributes': ['CRIM'],\n", " 'importance': 0.06072283131891407},\n", " {'condition': 'DIS < 1.89',\n", " 'attributes': ['DIS'],\n", " 'importance': 0.05913420105949188},\n", " {'condition': 'CHAS < 0.50',\n", " 'attributes': ['CHAS'],\n", " 'importance': 0.05575023428251599},\n", " {'condition': 'B < 368.37',\n", " 'attributes': ['B'],\n", " 'importance': 0.052711629950983545},\n", " {'condition': 'LSTAT >= 10.54',\n", " 'attributes': ['LSTAT'],\n", " 'importance': 0.052079922998162276},\n", " {'condition': 'RM < 
6.43',\n", " 'attributes': ['RM'],\n", " 'importance': 0.048826126018804365},\n", " {'condition': 'AGE >= 84.50',\n", " 'attributes': ['AGE'],\n", " 'importance': 0.04776356065683929},\n", " {'condition': 'PTRATIO < 18.50',\n", " 'attributes': ['PTRATIO'],\n", " 'importance': 0.046696713324452715},\n", " {'condition': 'DIS >= 1.50',\n", " 'attributes': ['DIS'],\n", " 'importance': 0.043589660805792366},\n", " {'condition': 'B >= 363.17',\n", " 'attributes': ['B'],\n", " 'importance': 0.0433463463241028},\n", " {'condition': 'ZN < 92.50',\n", " 'attributes': ['ZN'],\n", " 'importance': 0.04309189275034518},\n", " {'condition': 'INDUS >= 7.17',\n", " 'attributes': ['INDUS'],\n", " 'importance': 0.04185396665287753},\n", " {'condition': 'CRIM < 43.64',\n", " 'attributes': ['CRIM'],\n", " 'importance': 0.04018867408245981},\n", " {'condition': 'RM < 7.28',\n", " 'attributes': ['RM'],\n", " 'importance': 0.03903537520801033},\n", " {'condition': 'RM < 7.14',\n", " 'attributes': ['RM'],\n", " 'importance': 0.038817813173326096},\n", " {'condition': 'TAX >= 222.50',\n", " 'attributes': ['TAX'],\n", " 'importance': 0.03358080804623055},\n", " {'condition': 'CRIM < 7.25',\n", " 'attributes': ['CRIM'],\n", " 'importance': 0.029954141206563537},\n", " {'condition': 'B >= 363.19',\n", " 'attributes': ['B'],\n", " 'importance': 0.029321745968898253},\n", " {'condition': 'RM < 8.35',\n", " 'attributes': ['RM'],\n", " 'importance': 0.026316696161776582},\n", " {'condition': 'CRIM < 32.15',\n", " 'attributes': ['CRIM'],\n", " 'importance': 0.02330906239533667},\n", " {'condition': 'RM < 8.32',\n", " 'attributes': ['RM'],\n", " 'importance': 0.018934742866803526},\n", " {'condition': 'RM >= 5.41',\n", " 'attributes': ['RM'],\n", " 'importance': 0.018498186505621655},\n", " {'condition': 'CRIM >= 0.02',\n", " 'attributes': ['CRIM'],\n", " 'importance': 0.017740808626785134},\n", " {'condition': 'B >= 86.04',\n", " 'attributes': ['B'],\n", " 'importance': 0.01698260625974729},\n", " 
{'condition': 'TAX >= 223.50',\n", " 'attributes': ['TAX'],\n", " 'importance': 0.015646623545917445},\n", " {'condition': 'DIS >= 1.37',\n", " 'attributes': ['DIS'],\n", " 'importance': 0.014608935287077046},\n", " {'condition': 'NOX < 0.72',\n", " 'attributes': ['NOX'],\n", " 'importance': 0.014313462697020378},\n", " {'condition': 'CRIM >= 0.04',\n", " 'attributes': ['CRIM'],\n", " 'importance': 0.01426262284060611},\n", " {'condition': 'LSTAT >= 3.15',\n", " 'attributes': ['LSTAT'],\n", " 'importance': 0.013348883895085673},\n", " {'condition': 'CRIM >= 0.01',\n", " 'attributes': ['CRIM'],\n", " 'importance': 0.01306363096456618},\n", " {'condition': 'AGE >= 28.25',\n", " 'attributes': ['AGE'],\n", " 'importance': 0.011369879785042732},\n", " {'condition': 'RM >= 5.38',\n", " 'attributes': ['RM'],\n", " 'importance': 0.011358876636887215},\n", " {'condition': 'DIS >= 1.18',\n", " 'attributes': ['DIS'],\n", " 'importance': 0.0110300559444215},\n", " {'condition': 'CRIM < 30.18',\n", " 'attributes': ['CRIM'],\n", " 'importance': 0.010494744135007401},\n", " {'condition': 'AGE >= 16.35',\n", " 'attributes': ['AGE'],\n", " 'importance': 0.009804987963755939},\n", " {'condition': 'CRIM >= 0.13',\n", " 'attributes': ['CRIM'],\n", " 'importance': 0.00925020629003368},\n", " {'condition': 'DIS < 2.76',\n", " 'attributes': ['DIS'],\n", " 'importance': 0.009243016697363543},\n", " {'condition': 'B < 396.55',\n", " 'attributes': ['B'],\n", " 'importance': 0.008241849524085656},\n", " {'condition': 'AGE >= 28.00',\n", " 'attributes': ['AGE'],\n", " 'importance': 0.007296220587578672},\n", " {'condition': 'RM >= 4.33',\n", " 'attributes': ['RM'],\n", " 'importance': 0.005964485712736271},\n", " {'condition': 'INDUS >= 1.34',\n", " 'attributes': ['INDUS'],\n", " 'importance': 0.0035914514368505924},\n", " {'condition': 'LSTAT < 33.01',\n", " 'attributes': ['LSTAT'],\n", " 'importance': 0.003345159046207844},\n", " {'condition': 'DIS < 9.06',\n", " 'attributes': ['DIS'],\n", 
" 'importance': 0.003239267344266099},\n", " {'condition': 'B >= 6.50',\n", " 'attributes': ['B'],\n", " 'importance': 0.003128072049983466},\n", " {'condition': 'NOX >= 0.40',\n", " 'attributes': ['NOX'],\n", " 'importance': 0.0006305615911450331},\n", " {'condition': 'INDUS < 20.73',\n", " 'attributes': ['INDUS'],\n", " 'importance': 0.00029380923899731247},\n", " {'condition': 'AGE >= 24.45',\n", " 'attributes': ['AGE'],\n", " 'importance': -0.0004797895116285027},\n", " {'condition': 'INDUS < 18.84',\n", " 'attributes': ['INDUS'],\n", " 'importance': -0.0008170150896906769},\n", " {'condition': 'DIS < 3.97',\n", " 'attributes': ['DIS'],\n", " 'importance': -0.0059103572995956415}]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "condition_importances = ruleset.calculate_condition_importances(X, y, measure=measures.c2)\n", "display(condition_importances)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'CRIM': 1.5328105685047448,\n", " 'RM': 1.3313204662551184,\n", " 'LSTAT': 1.2677399169074535,\n", " 'NOX': 0.5215712597945998,\n", " 'B': 0.15373225007780103,\n", " 'AGE': 0.150463009104594,\n", " 'DIS': 0.13493477983881683,\n", " 'PTRATIO': 0.12797794874640822,\n", " 'CHAS': 0.05575023428251599,\n", " 'TAX': 0.049227431592148,\n", " 'INDUS': 0.04492221223903476,\n", " 'ZN': 0.04309189275034518}" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "attribute_importances = ruleset.calculate_attribute_importances(condition_importances)\n", "display(attribute_importances)" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 2 }