Stress-testing association rules in R: arules
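The previous post showed how to mine association rules with R and PostgreSQL. Here I stress-test the Apriori algorithm on real data (system: Windows 2008 64-bit, SSD, 128 GB RAM) with 620 items and 163,763 transactions. Minimum confidence and minimum support were both set to 0.00001 (values this low are meaningless for analysis; the point here is the load), with minlen=2 and maxlen=5. The run produced more than 350 million rules (350,487,111), and the rules object was reported to take up 16.6 GB.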
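To make the parameters concrete, here is a toy sketch of the level-wise Apriori search in Python. It is not the arules implementation. It only mimics two arules conventions: minlen/maxlen count all items in a rule (LHS plus RHS), and each rule has a single item on the right-hand side. The function names `frequent_itemsets` and `apriori_rules`, the rounding of the support threshold, and the four baskets are all hypothetical, chosen for illustration.

```python
from itertools import combinations


def frequent_itemsets(tsets, min_count, maxlen):
    """Level-wise search: size-k candidates are built only from frequent (k-1)-itemsets."""
    level = {}
    for t in tsets:  # level 1: count single items
        for item in t:
            key = frozenset([item])
            level[key] = level.get(key, 0) + 1
    level = {s: c for s, c in level.items() if c >= min_count}
    frequent, k = {}, 1
    while level:
        frequent.update(level)
        if k == maxlen:
            break
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Count step: one pass over the transactions per level.
        level = {}
        for c in candidates:
            count = sum(1 for t in tsets if c <= t)
            if count >= min_count:
                level[c] = count
    return frequent


def apriori_rules(transactions, min_support, min_confidence, minlen=2, maxlen=5):
    """Return (lhs, rhs, support, confidence) tuples with a single-item rhs."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)
    # Relative support -> absolute count, truncated and never below 1 (assumption).
    min_count = max(1, int(min_support * n))
    freq = frequent_itemsets(tsets, min_count, maxlen)
    rules = []
    for itemset, count in freq.items():
        if len(itemset) < max(minlen, 2):  # this sketch skips empty-lhs rules
            continue
        for rhs_item in itemset:  # a k-itemset yields at most k rules
            lhs = itemset - {rhs_item}
            confidence = count / freq[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, frozenset([rhs_item]), count / n, confidence))
    return rules


baskets = [["bread", "milk"], ["bread", "butter"],
           ["bread", "milk", "butter"], ["milk"]]
for lhs, rhs, supp, conf in apriori_rules(baskets, 0.25, 0.00001):
    print(sorted(lhs), "=>", sorted(rhs), f"supp={supp:.2f} conf={conf:.2f}")
```

With thresholds as low as in the post, almost every itemset that occurs at all becomes a rule source, which is why the rule count is driven by maxlen rather than by the support and confidence settings.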


Apriori
Parameter specification:
confidence minval smax arem aval originalSupport support minlen maxlen target ext
1e-05 0.1 1 none FALSE TRUE 1e-05 2 5 rules FALSE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 1
Warning in apriori(transactions, parameter = list(support = 1e-05, confidence = 1e-05, :
You chose a very low absolute support count of 1. You might run out of memory! Increase minimum support.
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[620 item(s), 163763 transaction(s)] done [0.06s].
sorting and recoding items ... [614 item(s)] done [0.01s].
creating transaction tree ... done [0.07s].
checking subsets of size 1 2 3 4 5 done [37.09s].
writing ... [350487111 rule(s)] done [91.73s].
creating S4 object ... done [137.37s].
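Two numbers in this run can be checked by hand. The relative support of 1e-05 times 163,763 transactions is 1.63763 transactions. The log reports "Absolute minimum support count: 1", i.e. that value rounded down, so an itemset seen in a single basket already counts as frequent. The truncation is inferred from the log, not taken from the arules source. The reported 16.6 GB also gives a rough average cost per rule. The post does not say whether "G" means 10^9 or 2^30 bytes, so the sketch below computes both.

```python
import math

N_TRANSACTIONS = 163763
MIN_SUPPORT = 1e-05
N_RULES = 350487111
RULES_SIZE_G = 16.6  # from the post; whether "G" means 10**9 or 2**30 bytes is not stated

# Relative support threshold expressed as a number of transactions.
support_in_transactions = MIN_SUPPORT * N_TRANSACTIONS
abs_min_count = math.floor(support_in_transactions)  # matches "Absolute minimum support count: 1"

# Average memory per rule under both readings of "G".
bytes_per_rule = {unit: RULES_SIZE_G * factor / N_RULES
                  for unit, factor in (("GB", 10**9), ("GiB", 2**30))}

print(f"{support_in_transactions:.5f} transactions -> absolute count {abs_min_count}")
for unit, b in bytes_per_rule.items():
    print(f"16.6 {unit} / {N_RULES:,} rules = {b:.1f} bytes per rule")
```

The result is roughly 47 to 51 bytes per rule. This is only a back-of-the-envelope average over the whole rules object, including the LHS/RHS item matrices and the quality measures.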
Next, raising maxlen to 6 failed with an out-of-memory error:
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport support minlen maxlen target ext
1e-05 0.1 1 none FALSE TRUE 1e-05 2 6 rules FALSE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 1
Warning in apriori(transactions, parameter = list(support = 1e-05, confidence = 1e-05, :
You chose a very low absolute support count of 1. You might run out of memory! Increase minimum support.
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[620 item(s), 163763 transaction(s)] done [0.06s].
sorting and recoding items ... [614 item(s)] done [0.01s].
creating transaction tree ... done [0.08s].
checking subsets of size 1 2 3 4 5 6 done [85.17s].
writing ...
Error in apriori(transactions, parameter = list(support = 1e-05, confidence = 1e-05, :
not enough memory. Increase minimum support!
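In the log, subset checking for sizes 1 to 6 completes in about 85 s, and the error is raised during the writing step. This suggests that storing the rules, rather than counting, is what runs out of memory. A rough way to see the growth: with an absolute support count of 1, every k-subset of every transaction is frequent. arules also puts exactly one item on the right-hand side, so each k-itemset yields at most k rules. The sketch below bounds the rule count under those assumptions. `rule_upper_bound` is a hypothetical helper, and the 30-item basket is made up, since the post does not give the basket-size distribution.

```python
from math import comb

N_ITEMS = 614  # items remaining after "sorting and recoding items" in the log


def rule_upper_bound(k, basket_sizes, n_items=N_ITEMS):
    """Upper bound on rules from k-itemsets when the absolute support count is 1.

    Every k-subset of every basket is frequent, so there are at most
    sum(C(L, k)) frequent k-itemsets (and never more than C(n_items, k)).
    With a single-item right-hand side, each k-itemset gives at most k rules.
    """
    subsets = sum(comb(size, k) for size in basket_sizes)
    return k * min(comb(n_items, k), subsets)


# Hypothetical example: one basket holding 30 distinct items.
for k in (5, 6):
    print(k, rule_upper_bound(k, [30]))

# Going from k=5 to k=6 over all 614 items multiplies C(614, k) by (614 - 5) / 6.
print(comb(614, 6) / comb(614, 5))
```

For a basket of L items, the ratio C(L, 6)/C(L, 5) is (L - 5)/6. Any basket with more than 11 items therefore contributes more 6-itemsets than 5-itemsets, and each of those yields 6 rules instead of 5. For the hypothetical 30-item basket, the bound grows fivefold.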
The same dataset was also tested with SAS Enterprise Miner Workstation 13.2, which failed as well. The code was:
libname datapath "E:\lib\mba\data";

/* Copy the customer/brand table into WORK */
data mba;
  set datapath.customer_brands;
run;

/* Association analysis */
/* Build the DMDB catalog that PROC ASSOC reads */
proc dmdb batch data=mba out=dmassoc dmdbcat=catassoc;
  id customer_id;
  class brand(desc);
run;

/* items=6: itemsets of up to 6 items; support=1: minimum support count of 1 transaction */
proc assoc data=mba dmdbcat=catassoc
  out=datassoc(label='Output from Proc Assoc')
  items=6 support=1;
  cust customer_id;
  target brand;
run;
