PostgreSQL-关联规则的纯SQL实现
该SQL是基于Apriori算法的一个尝试(代码比较粗糙),并且只实现了一推一的情况,对照MadLib的结果做了验证,没有问题。
CREATE OR REPLACE FUNCTION assoc.rules() RETURNS void LANGUAGE plpgsql AS $function$ begin /* create view assoc.trans as select * from trans where purchase_date between '2015-08-01' and '2016-07-31'; */ create TEMP table assoc_customer_brands ON commit drop as select distinct on(customer_id, brand) customer_id, brand from assoc.trans; create TEMP TABLE assoc_customer_brands_0 ON commit drop as SELECT a.customer_id,a.brand _from, b.brand _to from assoc_customer_brands a,assoc_customer_brands b where a.customer_id=b.customer_id and a.brand<>b.brand; create TEMP TABLE assoc_customer_brands_1 ON commit drop as select _from, _to, count(*) as count from assoc_customer_brands_0 GROUP BY 1,2; CREATE TEMP TABLE assoc_customer_brands_2 ON commit drop as SELECT brand, count(*) cnt from assoc_customer_brands GROUP BY 1; DROP TABLE IF EXISTS assoc_customer_brands_3; CREATE TEMP table assoc_customer_brands_3 ON commit drop as SELECT count(DISTINCT customer_id) as total from assoc_customer_brands; -- EXP_CONF=rhs_count/total; -- LIFT= CONF/EXP_CONF -- CONF=COUNT/lhs_count -- SUPPORT= COUNT/total DROP TABLE IF EXISTS assoc.rules; CREATE table assoc.rules as SELECT r.cnt::DOUBLE PRECISION/total as exp_conf, a.count::DOUBLE PRECISION/l.cnt as conf , a.count::DOUBLE PRECISION/total as support, (a.count::DOUBLE PRECISION*total)/(l.cnt*r.cnt) lift, total , a.count, a._from||'==>'||a._to as rule, a._from _lhand, a._to _rhand , a._from item1 , '====================>'::text item2 , a._to item3 , NULL::text item4 from assoc_customer_brands_1 a ,assoc_customer_brands_2 l ,assoc_customer_brands_2 r ,assoc_customer_brands_3 t where a._from=l.brand and a._to=r.brand ; end; $function$
关于transaction数据的准备,customer_id 可以是order_id(购物篮), 也可以基于customer_id, brand可以是品牌,产品,品类中的一种,或者品类和品牌的组合(如果需要的话,比如category||’~’||brand)。
结果,可以根据需要对confidence, support和lift进行筛选。
Todo,后续可能会依据PostgreSQL的Array来把所有可能的子集都查出来,然后生成完整的关联规则。