Stress-Testing Association Rule Mining in R with arules
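The previous post covered how to mine association rules with R and PostgreSQL. Here I stress-test the Apriori algorithm on real data (system configuration: Windows 2008 64-bit, SSD, 128 GB RAM). The dataset has 620 items and 163763 transactions. Minimum confidence and minimum support are both set to 0.00001 (values this low have no practical meaning), with minlen=2 and maxlen=5. The run outputs more than 350 million rules, and the rules object is shown to occupy 16.6 GB.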
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport support minlen maxlen target   ext
      1e-05    0.1    1 none FALSE            TRUE   1e-05      2      5  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 1

Warning in apriori(transactions, parameter = list(support = 1e-05, confidence = 1e-05, :
  You chose a very low absolute support count of 1. You might run out of memory! Increase minimum support.

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[620 item(s), 163763 transaction(s)] done [0.06s].
sorting and recoding items ... [614 item(s)] done [0.01s].
creating transaction tree ... done [0.07s].
checking subsets of size 1 2 3 4 5 done [37.09s].
writing ... [350487111 rule(s)] done [91.73s].
creating S4 object ... done [137.37s].
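For readers who want to try a run of the same shape, here is a minimal sketch of the kind of call that produces a log like the one above. The post's data is not available, so the `Groceries` dataset that ships with arules is used as a stand-in, and the thresholds are deliberately much milder than the post's 1e-05 so that the example finishes quickly. The `object.size()` step is an assumption about how a memory figure could be obtained; the post does not say how its 16.6 GB was measured.

```r
# Sketch only: Groceries stands in for the post's data (assumption).
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  data("Groceries")
  transactions <- Groceries

  # Same parameter names as in the log; support and confidence are raised
  # here (assumption) to keep the run small.
  rules <- apriori(
    transactions,
    parameter = list(support = 0.01, confidence = 0.5,
                     minlen = 2, maxlen = 5)
  )

  n_rules <- length(rules)                               # number of rules found
  rules_size <- format(object.size(rules), units = "MB") # in-memory size of the result
  cat("rules:", n_rules, " size:", rules_size, "\n")
}
```

Lowering `support` and `confidence` toward the values used in the post makes the rule count and the memory footprint grow very quickly, so it is worth stepping down gradually.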
Next, I raised maxlen to 6. The run failed with an out-of-memory error.
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport support minlen maxlen target   ext
      1e-05    0.1    1 none FALSE            TRUE   1e-05      2      6  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 1

Warning in apriori(transactions, parameter = list(support = 1e-05, confidence = 1e-05, :
  You chose a very low absolute support count of 1. You might run out of memory! Increase minimum support.

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[620 item(s), 163763 transaction(s)] done [0.06s].
sorting and recoding items ... [614 item(s)] done [0.01s].
creating transaction tree ... done [0.08s].
checking subsets of size 1 2 3 4 5 6 done [85.17s].
writing ... Error in apriori(transactions, parameter = list(support = 1e-05, confidence = 1e-05, :
  not enough memory. Increase minimum support!
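A rough way to see why one extra item per rule is enough to exhaust memory is to count the worst-case search space in base R. The sketch below assumes every combination of the 614 recoded items is frequent, which the real data is far from reaching (the maxlen=5 run produced about 350 million rules), so the numbers only illustrate the growth rate, not the actual rule counts. The variable names are illustrative.

```r
n_items <- 614  # items left after recoding, taken from the log

# Number of possible itemsets of each size k = 2..6
sizes <- 2:6
itemsets <- choose(n_items, sizes)
names(itemsets) <- sizes

# An itemset of size k yields at most k rules with a single-item
# consequent, which is the rule form apriori() in arules generates.
rules_bound <- sizes * itemsets

cum5 <- sum(rules_bound[sizes <= 5])  # worst case for maxlen = 5
cum6 <- sum(rules_bound)              # worst case for maxlen = 6
growth <- cum6 / cum5

print(format(rules_bound, big.mark = ",", scientific = FALSE))
cat("worst-case bound grows by a factor of about", round(growth),
    "from maxlen=5 to maxlen=6\n")

# Relative support of 1e-05 expressed in transactions: less than 2,
# which the log reports as an absolute minimum support count of 1.
abs_support <- 1e-05 * 163763
cat("support threshold in transactions:", abs_support, "\n")
```

With an absolute support count of 1, any itemset that occurs in a single transaction counts as frequent, so almost nothing is pruned and the search space behaves much closer to this worst case than it would with a realistic threshold.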
I also tested the same dataset with SAS Enterprise Miner Workstation 13.2, and that run failed too. The code is as follows:
libname datapath "E:\lib\mba\data";

data mba;
  set datapath.customer_brands;
run;

/******************** Association analysis ****************/
proc dmdb batch data=mba out=dmassoc dmdbcat=catassoc;
  id customer_id;
  class brand(desc);
run;

proc assoc data=mba dmdbcat=catassoc
           out=datassoc(label='Output from Proc Assoc')
           items=6 support=1;
  cust customer_id;
  target brand;
run;