python实现

数据集来源《机器学习实战》:https://github.com/apachecn/MachineLearning/blob/master/input/3.DecisionTree/lenses.txtarrow-up-right

young    myope    no    reduced    no lenses
young    myope    no    normal    soft
young    myope    yes    reduced    no lenses
young    myope    yes    normal    hard
young    hyper    no    reduced    no lenses
young    hyper    no    normal    soft
young    hyper    yes    reduced    no lenses
young    hyper    yes    normal    hard
pre    myope    no    reduced    no lenses
pre    myope    no    normal    soft
pre    myope    yes    reduced    no lenses
pre    myope    yes    normal    hard
pre    hyper    no    reduced    no lenses
pre    hyper    no    normal    soft
pre    hyper    yes    reduced    no lenses
pre    hyper    yes    normal    no lenses
presbyopic    myope    no    reduced    no lenses
presbyopic    myope    no    normal    no lenses
presbyopic    myope    yes    reduced    no lenses
presbyopic    myope    yes    normal    hard
presbyopic    hyper    no    reduced    no lenses
presbyopic    hyper    no    normal    soft
presbyopic    hyper    yes    reduced    no lenses
presbyopic    hyper    yes    normal    no lenses

数据集的特征从左到右为 ['age', 'prescript', 'astigmatic', 'tearRate'],最后一个为类别,类别包含三种类型['no lenses', 'hard', 'soft' ]。

基本思路:构建一个树形结构的输出,选择信息增益最大的特征作为节点,非叶子节点都是特征,节点上的特征将数据集分成各个子集,然后再递归。

如果画成树形结构:

树形结构代码参考:https://github.com/apachecn/MachineLearning/blob/master/src/py3.x/3.DecisionTree/decisionTreePlot.pyarrow-up-right

Last updated