{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Generating rules using the CN2 algorithm\n", "\n", "In this tutorial we will learn how to integrate the **decision-rules** package with other rule induction\n", "packages. We will cover topics such as:\n", "* creating decision-rules rule sets programmatically,\n", "* writing custom rule quality measures.\n", "\n", "We will use the implementation of the CN2 algorithm provided by the [Orange](https://orange3.readthedocs.io/projects/orange-data-mining-library/en/latest/) package, together with\n", "the popular [titanic](https://www.kaggle.com/c/titanic/data) dataset. Later, we will write a custom rule set factory class that transforms an Orange rule set into an instance of the `decision_rules.classification.ClassificationRuleSet` class. Finally, we will briefly introduce the operations that can be performed on such an object.\n", "\n", "The custom classes and methods presented in this tutorial are already implemented and available in the **decision-rules** package, but this tutorial will teach you how to implement them yourself. This knowledge will enable you to add your own custom factories, quality measures, or prediction strategies in the future if needed.\n", "\n", "We will start by loading the [titanic](https://www.kaggle.com/c/titanic/data) dataset. We \n", "will use the `Orange.data` submodule here." ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Column type | Column name | Data type | Possible values |
|---|---|---|---|---|
| 0 | feature | status | discrete | (crew, first, second, third) |
| 1 | feature | age | discrete | (adult, child) |
| 2 | feature | sex | discrete | (female, male) |
| 3 | label | survived | discrete | (no, yes) |
| | status | age | sex |
|---|---|---|---|
| _o8804 | 1.0 | 0.0 | 1.0 |
| _o8805 | 1.0 | 0.0 | 1.0 |
| _o8806 | 1.0 | 0.0 | 1.0 |
| _o8807 | 1.0 | 0.0 | 1.0 |
| _o8808 | 1.0 | 0.0 | 1.0 |
| ... | ... | ... | ... |
| _o11000 | 0.0 | 0.0 | 0.0 |
| _o11001 | 0.0 | 0.0 | 0.0 |
| _o11002 | 0.0 | 0.0 | 0.0 |
| _o11003 | 0.0 | 0.0 | 0.0 |
| _o11004 | 0.0 | 0.0 | 0.0 |
2201 rows × 3 columns
\n", "