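We have a data set of expert diagnoses on whether a patient can wear contact lenses (from the book "Data Mining", 《数据挖掘》). This post tries to extract the decision rules from it using R.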
Building the data
> # Build the 24-case contact-lens data set, one factor per attribute
> age = factor(c(rep("young",8),rep("pre-presbyopic",8),rep("presbyopic",8)))
> spectacle = factor(rep(c(rep("myope",4),rep("hypermetrop",4)),3))
> astimatism = factor(rep(c("no","no","yes","yes"),6))
> tear = factor(rep(c("reduced","normal"),12))
> recommended = factor(c("none","soft","none","hard","none","soft","none","hard",
+                        "none","soft","none","hard","none","soft","none","none",
+                        "none","none","none","hard","none","soft","none","none"))
> # Named df2 because the model calls below read from df2
> df2 <- data.frame(age,spectacle,astimatism,tear,recommended)
Generating the rules
> library(rpart)
> model <- rpart(formula = recommended ~ ., data = df2)
> summary(model)
Call:
rpart(formula = recommended ~ ., data = df2)
  n= 24

         CP nsplit rel error   xerror      xstd
1 0.2222222      0 1.0000000 1.000000 0.2635231
2 0.0100000      1 0.7777778 1.333333 0.2721655

Variable importance
tear
 100

Node number 1: 24 observations,    complexity param=0.2222222
  predicted class=none  expected loss=0.375  P(node) =1
    class counts:     4    15     5
   probabilities: 0.167 0.625 0.208
  left son=2 (12 obs) right son=3 (12 obs)
  Primary splits:
      tear       splits as  RL,  improve=5.0833330, (0 missing)
      astimatism splits as  RL,  improve=1.7500000, (0 missing)
      age        splits as  RRL, improve=0.2916667, (0 missing)
      spectacle  splits as  RL,  improve=0.2500000, (0 missing)

Node number 2: 12 observations
  predicted class=none  expected loss=0  P(node) =0.5
    class counts:     0    12     0
   probabilities: 0.000 1.000 0.000

Node number 3: 12 observations
  predicted class=soft  expected loss=0.5833333  P(node) =0.5
    class counts:     4     3     5
   probabilities: 0.333 0.250 0.417
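The numbers in this summary can be checked by hand. The sketch below is not part of the original session: it assumes rpart is using its default Gini criterion for classification and that "improve" is the parent size times the drop in impurity. Under those assumptions the class counts copied from the output reproduce the reported expected losses and the improve value for the tear split.

```r
# Class counts copied from the summary() output (order: hard, none, soft).
root  <- c(hard = 4, none = 15, soft = 5)   # node 1, all 24 cases
left  <- c(hard = 0, none = 12, soft = 0)   # node 2, tear = reduced
right <- c(hard = 4, none = 3,  soft = 5)   # node 3, tear = normal

# Gini impurity of a node, computed from its class counts.
gini <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Expected loss: share of cases that fall outside the majority class.
expected_loss <- function(counts) 1 - max(counts) / sum(counts)

# Assumed definition of "improve": size-weighted impurity of the parent
# minus the size-weighted impurity of the two children.
improve <- sum(root) * gini(root) -
  (sum(left) * gini(left) + sum(right) * gini(right))

print(expected_loss(root))   # 0.375, as reported for node 1
print(expected_loss(right))  # 0.5833333, as reported for node 3
print(improve)               # 5.083333, as reported for the tear split
```

Node 3 still misclassifies 7 of its 12 cases, which is why a single split on tear is not enough to separate soft from hard lenses.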
Visualization
> par(xpd = TRUE)
> plot(model)
> text(model)
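Besides the plot, the fitted tree can also be read as text, which is closer to the goal of extracting rules. The following is a minimal sketch, not taken from the original session: it refits the same model with default settings and uses path.rpart() to list the conditions leading to each leaf. The variable names leaves and paths are introduced here only for illustration.

```r
library(rpart)  # ships with standard R installations

# Same 24 cases as above, rebuilt so the snippet runs on its own.
df2 <- data.frame(
  age         = factor(rep(c("young", "pre-presbyopic", "presbyopic"), each = 8)),
  spectacle   = factor(rep(rep(c("myope", "hypermetrop"), each = 4), 3)),
  astimatism  = factor(rep(c("no", "no", "yes", "yes"), 6)),
  tear        = factor(rep(c("reduced", "normal"), 12)),
  recommended = factor(c("none","soft","none","hard","none","soft","none","hard",
                         "none","soft","none","hard","none","soft","none","none",
                         "none","none","none","hard","none","soft","none","none"))
)

model <- rpart(recommended ~ ., data = df2)

# Leaf nodes are the rows of model$frame whose split variable is "<leaf>".
leaves <- as.numeric(rownames(model$frame)[model$frame$var == "<leaf>"])

# One path per leaf: the chain of conditions from the root to that node.
paths <- path.rpart(model, nodes = leaves)
```

Each element of paths is a character vector that starts at the root and ends with the condition defining the leaf, so it can be read as the "if" part of a rule.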
Summary of the C5.0 algorithm
> library(C50)
> summary(C5.0(formula = recommended ~ ., data = df2))

Call:
C5.0.formula(formula = recommended ~ ., data = df2)

C5.0 [Release 2.07 GPL Edition]      Mon Mar 09 14:47:09 2015
-------------------------------

Class specified by attribute `outcome'

Read 24 cases (5 attributes) from undefined.data

Decision tree:

tear = reduced: none (12)
tear = normal:
:...astimatism = no: soft (6/1)
    astimatism = yes: hard (6/2)


Evaluation on training data (24 cases):

        Decision Tree
      ----------------
      Size      Errors

         3    3(12.5%)   <<


       (a)   (b)   (c)    <-classified as
      ----  ----  ----
         4                (a): class hard
         2    12     1    (b): class none
                     5    (c): class soft


    Attribute usage:

    100.00%  tear
     50.00%  astimatism


Time: 0.0 secs
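The printed tree is small enough to write out as rules by hand. The sketch below does that in base R, without the C50 package, and checks the hand-written rules against the evaluation table above. It is my own transcription of the tree, not code from the original session, and the names predicted, confusion and errors are illustrative.

```r
# Only the columns the C5.0 tree uses, with the same values as in the data above.
tear        <- rep(c("reduced", "normal"), 12)
astimatism  <- rep(c("no", "no", "yes", "yes"), 6)
recommended <- c("none","soft","none","hard","none","soft","none","hard",
                 "none","soft","none","hard","none","soft","none","none",
                 "none","none","none","hard","none","soft","none","none")

# The printed tree, transcribed as nested rules:
#   tear = reduced                    -> none
#   tear = normal, astimatism = no    -> soft
#   tear = normal, astimatism = yes   -> hard
predicted <- ifelse(tear == "reduced", "none",
                    ifelse(astimatism == "no", "soft", "hard"))

# Rows are the expert's recommendation, columns are what the rules predict.
confusion <- table(actual = recommended, predicted = predicted)
print(confusion)

# Number of training cases the three rules get wrong.
errors <- sum(predicted != recommended)
print(errors)  # 3, i.e. 12.5% of 24 cases
```

All three errors are cases where the expert recommended no lenses although tear production was normal, which matches row (b) of the C5.0 table.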
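Both models show that the factor with the most influence on the expert's decision is tear production. When it is reduced, no contact lenses are recommended. When it is normal, the C5.0 tree uses astigmatism to choose between soft lenses (no astigmatism) and hard lenses (astigmatism).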
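Since the goal is a set of rules rather than a tree, C5.0 can also be asked for a rule set directly through its rules argument. This is a sketch of that variant and was not run in the original session, so no output is shown. The name rule_model is illustrative, and the snippet skips the fit when the C50 package is not installed.

```r
# Same 24 cases as above, rebuilt so the snippet runs on its own.
df2 <- data.frame(
  age         = factor(rep(c("young", "pre-presbyopic", "presbyopic"), each = 8)),
  spectacle   = factor(rep(rep(c("myope", "hypermetrop"), each = 4), 3)),
  astimatism  = factor(rep(c("no", "no", "yes", "yes"), 6)),
  tear        = factor(rep(c("reduced", "normal"), 12)),
  recommended = factor(c("none","soft","none","hard","none","soft","none","hard",
                         "none","soft","none","hard","none","soft","none","none",
                         "none","none","none","hard","none","soft","none","none"))
)

if (requireNamespace("C50", quietly = TRUE)) {
  # rules = TRUE asks C5.0 to convert the tree into a rule set.
  rule_model <- C50::C5.0(recommended ~ ., data = df2, rules = TRUE)
  print(summary(rule_model))
} else {
  message("Package 'C50' is not installed; skipping the rule-set fit.")
}
```

The summary of a rule-based model lists each rule with its conditions and predicted class, which may be easier to hand to a domain expert than the tree printout.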