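How to do data mining with R and PostgreSQL: association rules
In an earlier post I implemented a simplified association rule algorithm as a PostgreSQL function. This post tries the apriori algorithm from "arules", R's association rule package.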
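- Connect to the database and read the data

    library(RPostgreSQL)

    # Open a connection to the local PostgreSQL server
    drv <- dbDriver("PostgreSQL")
    con <- dbConnect(drv, user = 'postgres', dbname = 'steven', password = '', host = '127.0.0.1')

    # Fetch every (customer, brand) row into a data.frame
    rs <- dbSendQuery(con, "select customer_id,brand from trans;")
    results <- fetch(rs, n = -1)
    # Release the result set once the rows are in memory
    dbClearResult(rs)

  The structure of the trans table and some sample data are shown below:

    CREATE TABLE public.maoye (
        customer_id text NULL,
        last_brand text NULL,
        ...
    );

    customer_id,brand
    090534,匡威
    090534,阿迪达斯
    090534,森达
    090573,他她
    090573,匡威
    ...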
- Install and load the arules package

    install.packages("arules")
    library("arules")
- Convert the data.frame to transactions

    # Group brands by customer, then coerce the resulting list to transactions
    transactions <- as(
      split(as.vector(results$brand), as.vector(results$customer_id)),
      "transactions"
    )
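The split step above turns the query result into one vector of brands per customer. The sketch below shows that list shape in base R on made-up rows. The toy `results` data frame and the `deduped` and `baskets` names are assumptions for illustration, not part of the original code. It also drops repeated (customer_id, brand) pairs first, since arules typically rejects a basket that lists the same item twice.

```r
# Toy rows in the same shape as the query result (values are made up).
results <- data.frame(
  customer_id = c("c1", "c1", "c1", "c2", "c2", "c2"),
  brand       = c("A", "B", "A", "C", "A", "C"),
  stringsAsFactors = FALSE
)

# Drop repeated (customer_id, brand) pairs so each basket lists a brand once.
deduped <- unique(results[, c("customer_id", "brand")])

# One character vector of brands per customer: the list shape that is
# handed to as(..., "transactions") in the step above.
baskets <- split(deduped$brand, deduped$customer_id)

print(baskets)
```

If the trans table can hold the same brand more than once per customer, deduplicating like this (or with `select distinct` in the query) before the coercion is one way to avoid the problem.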
- Mine the rules (set thresholds such as support and confidence)

    rules <- apriori(
      transactions,
      parameter = list(support = 0.1, confidence = 0.6, maxlen = 2, minlen = 2)
    )
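To make the thresholds concrete, here is a minimal base R sketch that computes support, confidence and lift by hand for a single rule on made-up baskets. The `baskets` data and the `support_of` helper are hypothetical and only illustrate the definitions; apriori itself searches all candidate rules far more efficiently. With minlen = 2 and maxlen = 2, every rule has exactly two items, one on each side, like {A} => {B} here.

```r
# Toy baskets, one character vector per customer (made-up data).
baskets <- list(
  c1 = c("A", "B"),
  c2 = c("A", "B", "C"),
  c3 = c("A", "C"),
  c4 = c("B"),
  c5 = c("A", "B")
)

# Fraction of baskets that contain every item in `items`.
support_of <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

supp_ab <- support_of(c("A", "B"), baskets)    # support of {A, B}
conf_ab <- supp_ab / support_of("A", baskets)  # confidence of {A} => {B}
lift_ab <- conf_ab / support_of("B", baskets)  # lift of {A} => {B}

# Would {A} => {B} pass support = 0.1 and confidence = 0.6?
passes <- supp_ab >= 0.1 && conf_ab >= 0.6
print(c(support = supp_ab, confidence = conf_ab, lift = lift_ab))
```

Here the rule passes both thresholds, yet its lift is below 1, which means B is slightly less common among buyers of A than among all customers. This is why confidence alone can be misleading for very popular brands.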
- Analyze the results, or store them in the database

    # Inspect the rules
    inspect(rules)

    # Convert the rules to a data.frame and write it to the database
    associations <- as(rules, "data.frame")
    dbWriteTable(con, c("public", "arules"), associations)
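Once the rules are in a data.frame, ordinary data.frame operations can be used to narrow them down before writing. The sketch below is an assumption-laden illustration: the toy `associations` frame stands in for the real one, its numbers are made up, and the column names (`rules`, `support`, `confidence`, `lift`) are what I would expect arules to produce, so check `names(associations)` on the real result first.

```r
# Hypothetical stand-in for as(rules, "data.frame"); the numbers are made up.
associations <- data.frame(
  rules      = c("{A} => {B}", "{B} => {C}", "{C} => {A}"),
  support    = c(0.30, 0.12, 0.20),
  confidence = c(0.75, 0.60, 0.90),
  lift       = c(0.94, 1.50, 1.20),
  stringsAsFactors = FALSE
)

# Keep rules with lift above 1, strongest first.
strong <- associations[associations$lift > 1, ]
strong <- strong[order(-strong$lift), ]
print(strong)
```

Only the filtered `strong` frame would then need to be passed to dbWriteTable, if the weaker rules are of no interest.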