Collaborative Filtering Recommendation Model Based on Rough User Clustering

Cite this article

Wang Xiaoyun, Qian Lu, Huang Shiyou. Collaborative Filtering Recommendation Model Based on Rough User Clustering. New Technology of Library and Information Service, 31(1): 45-51

Editorial Office of New Technology of Library and Information Service

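Wang Xiaoyun, Qian Lu, Huang Shiyou

Management School, Hangzhou Dianzi University, Hangzhou 310012, China

Corresponding author: Qian Lu, ORCID: 0000-0001-6025-3665, E-mail: 15158113182@163.com.

Author contributions:

Wang Xiaoyun, Qian Lu: proposed the research idea and designed the research plan;

Huang Shiyou: collected and analyzed the data and ran the experiments;

Qian Lu: drafted the paper;

Wang Xiaoyun: revised the final version of the paper.

Funding: This work is one of the outcomes of the Hangzhou Dianzi University Graduate Research Innovation Fund project "Improvement and Application of Collaborative Filtering Recommendation Algorithms Based on Rough Sets" (No. KYCX2013JJ028).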

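Abstract

[Objective] To improve recommendation quality by introducing rough sets into collaborative filtering based on user clustering. [Methods] A collaborative filtering recommendation model based on rough user clustering is proposed. Offline, a rough K-means user clustering algorithm assigns each user to the upper and lower approximations of K clusters according to the user's similarity to the cluster centers, forming the user's initial neighbor set. Online, the model searches the target user's initial neighbor set for the nearest neighbors, predicts item ratings, and generates recommendations for the target user. [Results] Experimental comparison shows that the model lowers the mean absolute error (MAE) by about 14% compared with traditional and item-based collaborative filtering, and by about 10% compared with collaborative filtering based on user clustering. [Limitations] In examining the importance of the upper and lower approximations for adjusting the cluster centers, the effect of varying the number of user clusters and the threshold on the number of users in the nearest-neighbor set was not considered. [Conclusions] The model effectively improves recommendation accuracy and is both feasible and of practical significance.

Keywords: Rough set; User clustering; Collaborative filtering; Upper and lower approximation

CLC number: G254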

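1 Introduction

Faced with the vast amount of product information on the Internet, users (consumers) often find it hard to quickly locate the products that interest them most. They would like e-commerce systems to act as a kind of purchasing assistant that helps them choose products and recommends those they are most likely to be interested in^[1]. Recommender systems emerged to meet this need, and collaborative filtering is currently the most studied and most successfully applied recommendation technique^{[2, 3]}. Collaborative filtering is usually divided into user-based and item-based approaches. The former computes similarities between users and uses the ratings that the target user's most similar neighbors gave to other products to predict the target user's preference for a particular product, and recommends accordingly^[4]. It rests on the assumption that users who rate some items similarly will also rate other items similarly. However, as the numbers of users and items grow rapidly, traditional collaborative filtering faces serious problems such as data sparsity^[5], scalability^[6], cold start^[7], and real-time performance (recommendation speed)^[8]. These problems directly or indirectly reduce recommendation accuracy and sharply degrade recommendation quality.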

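Researchers have proposed many improvements to address these problems, and clustering is frequently combined with collaborative filtering^{[9, 10]}. The collaborative filtering algorithm based on user clustering^[11] clusters users by the similarity of their item ratings, so that highly similar users fall into the same cluster. When a target user arrives, the nearest neighbors are searched only within that user's cluster, which greatly reduces the search space, improves recommendation speed and scalability, and alleviates the sparsity problem to some extent^[12]. The algorithm has therefore been widely used, but two problems remain. First, users at the edge of a cluster have low similarity to the cluster center, so recommendations for them tend to be less accurate^[13]. Second, each user belongs to exactly one fixed cluster, which does not match the fact that real consumers often belong to several consumer groups; the algorithm therefore cannot reflect users' multiple interests, which seriously harms recommendation accuracy. Researchers currently tend to address this with methods for handling uncertainty. Uncertain clustering^[14] is an effective family of clustering algorithms that takes the uncertainty of sample membership into account. Applied to collaborative filtering based on user clustering, it does not simply label a user as "belonging" or "not belonging" to a cluster; instead, it expresses the uncertainty of user membership and assigns users to multiple clusters, which addresses the shortcomings above and improves recommendation accuracy.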

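2 Research Background

Uncertain clustering is widely used in data mining, machine learning, expert systems, and other fields; its typical representatives are fuzzy clustering and rough clustering. Ruspini^[15] first proposed the concept of fuzzy clustering, using membership degrees to describe the uncertainty with which a data object belongs to each cluster. Rough set theory is a mathematical tool for characterizing incompleteness and uncertainty, proposed by the Polish scholar Pawlak^[16] in 1982. Lingras et al.^[17] were the first to introduce rough sets into clustering and proposed the rough clustering algorithm. When computing sample membership, the algorithm uses the idea of upper and lower approximations: a sample is assigned to the lower approximation of a cluster if it definitely belongs to that cluster, and to the upper approximation if it possibly belongs to it, which improves clustering accuracy at cluster boundaries.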

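Many researchers have tried to introduce uncertain clustering into recommender systems. On the fuzzy clustering side, Verma et al.^[18] proposed a hybrid recommender system combining fuzzy C-means clustering and collaborative filtering to address sparsity and scalability. Birtolo et al.^[19] proposed a collaborative filtering algorithm based on fuzzy clustering of items and showed experimentally that it achieves high recommendation accuracy. Li Hua et al.^[20] proposed a collaborative filtering algorithm based on fuzzy clustering of user context, which applies fuzzy clustering to user context information to obtain groups of users with similar contexts, improving on the data sparsity and real-time problems. Wang Mingjia et al.^[21] clustered items with fuzzy clustering and showed that this effectively improves the accuracy of similarity computation under cold start. On the rough clustering side, Saha et al.^[22] proposed a rough clustering method for clustering user transaction data, in order to analyze users' transaction behavior and alleviate sparsity. Tseng et al.^[23] proposed RSCF, a recommendation algorithm based on rough sets and collaborative filtering that combines collaborative information and content features on the basis of rough sets to predict user preferences and overcome the shortcomings of traditional collaborative filtering. To address sparsity in collaborative filtering, Chen et al.^[24] used rough set theory to predict and fill in the target user's unrated items and then performed fuzzy user clustering on the resulting user-item ratings. Du Jintao^[25] proposed a rough-set-based collaborative recommendation model that applies rough set theory to both user clustering and the prediction and filling of item ratings, with good recommendation results.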

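A review of the literature in China and abroad shows that uncertain clustering is often applied to item clustering, while user clustering has received comparatively little attention. In addition, fuzzy clustering in recommender systems has been studied extensively and is relatively mature, whereas the application of rough clustering to recommender systems has been studied much less and is still at an early stage. Rough clustering holds that the multiple facets of a data object's attributes are what make cluster boundaries uncertain. We consider this very close to the description of users' multiple interests, which makes rough clustering better suited to user clustering than fuzzy clustering. This paper therefore introduces rough clustering into collaborative filtering based on user clustering and aims to address the shortcomings of user clustering in collaborative filtering in the following respects: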

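(1) The absolute difference between a user's similarities to the cluster centers is used to capture how much those similarities differ, and serves as the basis for deciding user membership. Traditional rough clustering algorithms and the algorithm in [25] both use Euclidean distance as the membership criterion; their clustering results are strongly affected by outliers^[26], they do not consider the differences between similarities, and they are therefore not well suited to collaborative filtering.

(2) A rough K-means user clustering algorithm is proposed. Users whose similarity differences are large (membership is clear) are assigned to the lower approximation of the cluster, and users whose similarity differences are small (membership is unclear) are assigned to the upper approximation, which avoids leaving users at the edge of a cluster. Moreover, users in an upper approximation usually belong to several clusters, which reflects users' multiple interests.

(3) The model is designed in an offline part and an online part. Offline, user membership is determined and the initial neighbor sets are formed; online, the nearest neighbors are searched within the upper and lower approximations of the clusters the user belongs to, which effectively reduces the search space and time and improves recommendation speed.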

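3 Collaborative Filtering Recommendation Model Based on Rough User Clustering

To address the shortcomings of collaborative filtering based on user clustering, this paper introduces rough sets and proposes a collaborative filtering recommendation model based on rough K-means user clustering. The model consists of an offline part and an online part. Offline, the similarity between each user and each cluster center is computed with the adjusted cosine similarity; the rough K-means user clustering algorithm then clusters users by these similarities, assigning every user to the upper and lower approximations of K user clusters and forming the users' initial neighbor sets. Online, the nearest neighbors of the target user are searched within the target user's initial neighbor set, the target user's item ratings are predicted, and a Top-N recommendation is produced. The overall model is shown in Figure 1.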

Figure 1 Collaborative filtering recommendation model based on rough K-means user clustering

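3.1 Rough K-means User Clustering

(1) Rough clustering algorithm

Rough clustering differs from ordinary clustering in two ways. First, it uses upper and lower approximations when computing sample membership: based on the similarity between a sample and the cluster centers, samples that definitely belong to a cluster are placed in its lower approximation, and samples whose membership is uncertain are placed in its upper approximation. Second, each updated cluster center is a linearly weighted combination of the arithmetic mean of the samples in the lower approximation and the arithmetic mean of the samples in the upper approximation. Pawlak^[16] gives a schematic of the upper and lower approximations in rough set theory (see Figure 2), and Lingras et al.^[17] give three properties of rough clustering; both help in understanding the algorithm:

Property 1: An object can belong to the lower approximation of at most one cluster;

Property 2: If an object belongs to the lower approximation of a cluster, it also belongs to the upper approximation of that cluster;

Property 3: If an object does not belong to the lower approximation of any cluster, it belongs to the upper approximations of at least two clusters.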
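The three properties can be read as a consistency check on any rough clustering result. The following is a minimal sketch in Python, assuming that the lower and upper approximations are stored as lists of sets of user ids (one set per cluster); the function name and the data layout are illustrative choices, not part of the original algorithm description.

```python
def check_rough_properties(users, lower, upper):
    """Return True if every user satisfies Properties 1-3.

    lower and upper are lists of sets of user ids, one set per cluster
    (an assumed layout chosen for this sketch).
    """
    for u in users:
        in_lower = [k for k, s in enumerate(lower) if u in s]
        in_upper = [k for k, s in enumerate(upper) if u in s]
        if len(in_lower) > 1:                          # Property 1
            return False
        if any(k not in in_upper for k in in_lower):   # Property 2
            return False
        if not in_lower and len(in_upper) < 2:         # Property 3
            return False
    return True
```

For example, with two clusters, a user that appears in both upper approximations and in neither lower approximation passes the check, while a user that appears in only one upper approximation and in no lower approximation fails it.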

Figure 2 Upper and lower approximations of a rough set^[16]

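(2) Rough user clustering algorithm

This paper introduces rough K-means into collaborative filtering based on user clustering and proposes a rough K-means user clustering algorithm. The algorithm clusters users roughly according to their similarity to the cluster centers and assigns them to the upper and lower approximations of K user clusters. Clusters are thus allowed to overlap, which reflects users' multiple interests and at the same time avoids leaving users at the edge of a cluster.

The computation of user similarity is one of the keys to the success of this algorithm. Reference [27] lists five measures commonly used for user similarity in collaborative filtering: Pearson correlation (COR), cosine similarity (COS), adjusted cosine similarity (ACOS)^[28], constrained Pearson correlation (CPC), and Spearman rank correlation (SRC). Comparing the five, we find that ACOS is better suited to sparse data than COR and CPC, which are limited by the number of items two users have both rated. ACOS also accounts for differences in users' rating scales by subtracting each user's mean rating, so it reflects correlation better than COS. The absolute difference between similarities can therefore be used directly to express how much they differ, which makes ACOS well suited as the basis for deciding user membership in rough user clustering: a large similarity difference means membership is clear, and a small one means it is unclear. This paper therefore uses the adjusted cosine similarity in the rough K-means user clustering algorithm.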

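$$\mathrm{sim}(i, j) = \frac{\sum_{c \in I_{i, j}} (R_{i, c} - \bar{R}_i)(R_{j, c} - \bar{R}_j)}{\sqrt{\sum_{c \in I_i} (R_{i, c} - \bar{R}_i)^2} \, \sqrt{\sum_{c \in I_j} (R_{j, c} - \bar{R}_j)^2}} \tag{1}$$

where R_{i, c} is the rating of user i on item c, \bar{R}_i and \bar{R}_j are the mean item ratings of users i and j, I_{i, j} is the set of items rated by both user i and user j, and I_i and I_j are the sets of items rated by user i and user j, respectively.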
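A minimal Python sketch of the adjusted cosine similarity may make the roles of I_{i, j}, I_i and I_j concrete. It assumes the common form in which the numerator runs over the co-rated items and each norm runs over all items the respective user rated, as the symbol definitions above suggest. The dictionary layout (item id to rating), the function names, and the convention of returning 0 for a user whose ratings are all identical are assumptions of this sketch.

```python
from math import sqrt


def mean_rating(ratings):
    """Mean of one user's ratings; `ratings` maps item id -> rating."""
    return sum(ratings.values()) / len(ratings)


def acos_sim(ri, rj):
    """Adjusted cosine similarity between two users' rating dicts."""
    if not ri or not rj:
        return 0.0
    mi, mj = mean_rating(ri), mean_rating(rj)
    common = ri.keys() & rj.keys()                         # I_{i,j}
    num = sum((ri[c] - mi) * (rj[c] - mj) for c in common)
    den_i = sqrt(sum((r - mi) ** 2 for r in ri.values()))  # over I_i
    den_j = sqrt(sum((r - mj) ** 2 for r in rj.values()))  # over I_j
    if den_i == 0 or den_j == 0:
        return 0.0   # constant rater: convention assumed in this sketch
    return num / (den_i * den_j)
```

Two users whose ratings rise and fall together on the same items obtain a similarity close to 1, and two users with opposite tastes obtain a value close to -1, regardless of whether one of them rates systematically higher than the other.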

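The rough K-means user clustering algorithm proceeds as follows.

Input: the user-item rating matrix, the number of user clusters K, the threshold for the upper and lower approximation sets threshold, and the cluster center adjustment parameters w_l and w_u.

Output: K user clusters, each consisting of an upper and a lower approximation set, and the cluster label(s) of each user.

Steps: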

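① Randomly select the rating vectors c_1, c_2, …, c_K of K users in the n-dimensional item space as the initial cluster centers.

② Compute the similarity between users and cluster centers

Let u be a user to be clustered. Use formula (1) to compute the similarity between u and each of the K cluster centers, and let sim(u, c_i) denote the largest of these similarities, that is, sim(u, c_i) = \max_{1 \le j \le K} sim(u, c_j).

③ Determine user membership

The difference between similarities can be expressed by their absolute difference. Given the threshold threshold for the upper and lower approximation sets, consider the set T = \{ j : |sim(u, c_i) - sim(u, c_j)| \le threshold, j \ne i \}. Following the membership rules proposed by Lingras et al.^[17], all users can be assigned to the upper and lower approximation sets of the K user clusters, and each user is labeled with the cluster(s) it belongs to.

Rule 1: If T \ne \emptyset, then u \in \overline{U_i}, and u \in \overline{U_j} for all j \in T. That is, user u is highly similar to several cluster centers, so its membership is unclear and it does not belong to the lower approximation of any of these user clusters. By Property 3, it belongs to the upper approximations of these user clusters.

Rule 2: If T = \emptyset, then u \in \underline{U_i}. That is, user u is highly similar to one cluster center and much less similar to the others, so its membership is clear and it belongs to the lower approximation of that user cluster. By Property 2, it also belongs to the upper approximation of that user cluster.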

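④ Adjust the cluster centers

The adjustment of a cluster center depends on the users in its upper and lower approximations: it is a weighted combination of the arithmetic mean of the user vectors that definitely belong to the cluster and the arithmetic mean of the user vectors that possibly belong to it. The formula^[17] is as follows.

$$c_i = w_l \cdot \frac{\sum_{x \in \underline{U_i}} x}{|\underline{U_i}|} + w_u \cdot \frac{\sum_{x \in \overline{U_i}} x}{|\overline{U_i}|} \tag{2}$$

where x ranges over the rating vectors of the users in the respective set, the parameters w_l and w_u define the importance of the lower and upper approximations to the user cluster, with w_l + w_u = 1; |\underline{U_i}| is the number of users in the lower approximation of U_i, and |\overline{U_i}| is the number of users in the upper approximation of U_i.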

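⑤ Repeat steps ② to ④ until the criterion function converges.

This offline algorithm performs the rough clustering of all users according to their similarity to the cluster centers and forms the users' initial neighbor sets, each of which consists of an upper approximation set and a lower approximation set.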
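Steps ② to ⑤ can be sketched end to end in Python. This is an illustration under several stated assumptions rather than the authors' Matlab implementation: users are dense rating vectors of equal length (unrated items are assumed to be filled in beforehand), a mean-centered cosine on dense vectors stands in for formula (1), the initial centers are passed in instead of being drawn at random, the upper-approximation mean is taken over the whole upper approximation as the description of step ④ reads, and "convergence of the criterion function" is approximated by the assignments no longer changing. All function and parameter names are hypothetical.

```python
from math import sqrt


def centered_cosine(a, b):
    """Cosine of two dense vectors after subtracting each vector's own mean."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a)) * sqrt(sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def mean_vector(vectors):
    """Component-wise arithmetic mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def rough_kmeans(users, centers, threshold, w_l, w_u, max_iter=100):
    """Return (lower, upper, centers); lower/upper are lists of sets of user indices."""
    k_count = len(centers)
    lower = upper = None
    for _ in range(max_iter):
        new_lower = [set() for _ in range(k_count)]
        new_upper = [set() for _ in range(k_count)]
        for idx, u in enumerate(users):
            sims = [centered_cosine(u, c) for c in centers]          # step 2
            i = max(range(k_count), key=lambda k: sims[k])
            close = [j for j in range(k_count)
                     if j != i and sims[i] - sims[j] <= threshold]   # step 3
            if close:                  # Rule 1: unclear, upper approximations only
                for k in [i] + close:
                    new_upper[k].add(idx)
            else:                      # Rule 2: lower and upper of cluster i
                new_lower[i].add(idx)
                new_upper[i].add(idx)
        if new_lower == lower and new_upper == upper:
            break                      # assignments stable: treated as converged
        lower, upper = new_lower, new_upper
        new_centers = []
        for k in range(k_count):       # step 4
            lo = [users[i] for i in sorted(lower[k])]
            up = [users[i] for i in sorted(upper[k])]
            if not up:
                new_centers.append(centers[k])        # empty cluster: keep old center
            elif not lo:
                new_centers.append(mean_vector(up))   # no certain members
            else:
                new_centers.append([w_l * a + w_u * b
                                    for a, b in zip(mean_vector(lo), mean_vector(up))])
        centers = new_centers
    return lower, upper, centers
```

On a toy input with two clearly separated taste groups, every user ends up in the lower approximation of one cluster; lowering the separation or raising `threshold` moves borderline users into several upper approximations instead.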

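3.2 Searching for the Target User's Nearest Neighbors

When the target user is online, the following search algorithm finds the target user's nearest-neighbor set within the target user's initial neighbor set.

Input: the target user, the users' initial neighbor sets, and the threshold N_u on the number of users in the nearest-neighbor set.

Output: the nearest-neighbor set of the target user.

Steps (two cases):

(1) The target user belongs to the lower approximation of a user cluster

① Let N be the number of users in the nearest-neighbor set. Take the lower approximation set the target user belongs to as the nearest-neighbor set. If N \ge N_u, stop; otherwise go to the next step.

② Take the upper approximation set the target user belongs to as the nearest-neighbor set. If N \ge N_u, stop; otherwise go to the next step.

③ Count the cluster labels of all users in the upper approximation set the target user belongs to, and add the users of the most frequent cluster to the target user's nearest-neighbor set. If N \ge N_u, stop; otherwise go to the next step.

④ Add the users of the next most frequent cluster to the target user's nearest-neighbor set, and so on, until N \ge N_u holds.

(2) The target user belongs to the upper approximations of two or more user clusters

① Let N be the number of users in the nearest-neighbor set. Merge the upper approximation sets the target user belongs to and take the result as the nearest-neighbor set. If N \ge N_u, stop; otherwise go to the next step.

② Expand the nearest-neighbor set as in steps ③ and ④ of case (1) until N \ge N_u holds.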
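The two search cases can be sketched as one Python function. Several details are interpretations rather than statements from the text: the target user is not counted as a neighbor, the "cluster labels" counted in step ③ are taken to be the other clusters whose upper approximations contain users already in the neighbor set, "the users of a cluster" is read as that cluster's upper approximation, and ties in frequency are broken by the smaller cluster index. The function name and the list-of-sets layout are likewise assumptions.

```python
from collections import Counter


def find_neighbors(target, lower, upper, n_u):
    """Return the nearest-neighbor set of `target` as a set of user ids.

    lower and upper are lists of sets of user ids, one set per cluster.
    """
    home = [k for k, s in enumerate(lower) if target in s]
    if home:                                   # case (1), step 1
        neighbors = set(lower[home[0]]) - {target}
        if len(neighbors) >= n_u:
            return neighbors
        used = {home[0]}
    else:                                      # case (2)
        used = {k for k, s in enumerate(upper) if target in s}
    # case (1) step 2 / case (2) step 1: switch to the upper approximation(s)
    neighbors = set().union(*(upper[k] for k in used)) - {target}
    # steps 3 and 4: count the other clusters these users also belong to
    counts = Counter(k for u in neighbors
                     for k, s in enumerate(upper) if u in s and k not in used)
    for k in sorted(counts, key=lambda c: (-counts[c], c)):
        if len(neighbors) >= n_u:
            break
        neighbors |= upper[k] - {target}
    return neighbors
```

If the threshold N_u cannot be reached because no further clusters overlap with the current neighbor set, the sketch simply returns the neighbors found so far; the original description does not say what should happen in that situation.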

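The two cases of the search algorithm above yield the target user's nearest-neighbor set. Based on the user-item ratings of these nearest neighbors, a weighted-average strategy is used to predict the target user's item ratings, and a Top-N recommendation is finally produced. Following the collaborative filtering framework in [4], if user v is a nearest neighbor of target user u, the predicted rating of u for an unrated item i is:

$$P_{u, i} = \bar{R}_u + \frac{\sum_{v \in NN_u} \mathrm{sim}(u, v)\,(R_{v, i} - \bar{R}_v)}{\sum_{v \in NN_u} |\mathrm{sim}(u, v)|} \tag{3}$$

where NN_u is the nearest-neighbor set of u, and \bar{R}_u and \bar{R}_v are the mean ratings of users u and v.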
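The weighted-average prediction and the Top-N step can be sketched in Python as follows. The sketch assumes the widely used mean-centered form of the prediction from the framework in [4]: the target user's mean rating plus the similarity-weighted average of the neighbors' deviations from their own means. Only neighbors who rated the item contribute, and the target user's mean is returned when no neighbor rated it; both choices, along with the names and data layout, are assumptions of this sketch.

```python
def predict(target_ratings, neighbors, item):
    """Predict the target user's rating for `item`.

    target_ratings maps item id -> rating for the target user.
    neighbors is a list of (similarity, ratings dict) pairs.
    """
    mean_u = sum(target_ratings.values()) / len(target_ratings)
    num = den = 0.0
    for sim, ratings in neighbors:
        if item in ratings:                       # only neighbors who rated it
            mean_v = sum(ratings.values()) / len(ratings)
            num += sim * (ratings[item] - mean_v)
            den += abs(sim)
    return mean_u if den == 0 else mean_u + num / den


def top_n(target_ratings, neighbors, candidates, n):
    """Return the n unrated candidate items with the highest predictions."""
    scored = [(predict(target_ratings, neighbors, i), i)
              for i in candidates if i not in target_ratings]
    scored.sort(key=lambda t: (-t[0], t[1]))      # ties broken by item id
    return [i for _, i in scored[:n]]
```

Subtracting each neighbor's own mean means that a generous rater and a strict rater who both liked an item push the prediction in the same direction, which matches the motivation given earlier for preferring the adjusted cosine similarity.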

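4 Experimental Design and Results

To test the performance of the proposed collaborative filtering recommendation model based on rough user clustering, the experiments were run on a PC with an Intel(R) Core(TM)2 Duo CPU T7250 @2.00GHz and 2GB of DDRII memory, running Windows XP; all algorithms were implemented in Matlab R2009a.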

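4.1 Dataset

The experiments use the test dataset provided by the MovieLens site^[29]. The site was created by the GroupLens research project at the University of Minnesota, USA; it provides users with movie recommendation lists based on their movie ratings and is widely used in research on personalized recommendation. The data package downloaded from GroupLens contains 100 000 ratings of 1 682 movies by 943 users, divided into five mutually disjoint subsets, four of which are merged into the Base set while the remaining one serves as the Test set. The first 150 users and their item ratings are selected for the experiments. Table 1 summarizes the experimental data; the data sparsity level reaches 92.31%, which is extremely sparse.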
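The sparsity level is not defined in the text; a common definition, assumed here, is the share of empty cells in the user-item rating matrix. The short Python sketch below applies that definition to the full-dataset counts quoted above (943 users, 1 682 movies, 100 000 ratings). The 92.31% figure refers to the 150-user subset, whose counts appear in Table 1 and are not reproduced here.

```python
def sparsity_level(n_ratings, n_users, n_items):
    """Fraction of user-item cells that carry no rating (assumed definition)."""
    return 1.0 - n_ratings / (n_users * n_items)


# Full MovieLens package described in the text, not the 150-user subset.
full = sparsity_level(100_000, 943, 1_682)   # about 0.937
```

Under this definition even the complete dataset leaves roughly 94 of every 100 cells empty, which is why similarity measures that depend on many co-rated items become unreliable.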

Table 1 Statistics of the experimental data

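4.2 Evaluation Metric

The mean absolute error (MAE), a commonly used measure of recommendation quality, is adopted as the evaluation metric^{[30, 31]}. It measures the deviation between predicted and actual user ratings; the smaller the MAE, the higher the recommendation accuracy. Let the set of predicted user ratings be \{p_1, p_2, …, p_N\} and the corresponding set of actual ratings be \{q_1, q_2, …, q_N\}; the MAE is then computed as^[30]:

$$MAE = \frac{\sum_{i=1}^{N} |p_i - q_i|}{N} \tag{4}$$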
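As a small worked illustration, the MAE can be computed in Python from two equally long lists of predicted and actual ratings. The function name and the choice to raise an error on empty or mismatched input are assumptions of this sketch.

```python
def mae(predicted, actual):
    """Mean absolute error between paired predicted and actual ratings."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / len(predicted)
```

For predictions 4, 3, 5 against actual ratings 5, 3, 3, the absolute errors are 1, 0 and 2, so the MAE is 1.0.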

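4.3 Experimental Results and Analysis

(1) Because the parameters w_l and w_u define the importance of the lower and upper approximations to a user cluster when the cluster centers are updated, their values strongly affect recommendation accuracy. Experiment 1 varies w_l from 0 to 1.0 in steps of 0.1 (so w_u goes from 1.0 to 0 in steps of 0.1) and observes the change in MAE. Since this experiment is mainly intended to test the effect of w_l and w_u on MAE, the following variables are held fixed: the number of user clusters K is 7, the threshold N_u on the number of users in the nearest-neighbor set is 20, and the threshold for the upper and lower approximation sets is set to the empirical value 0.05. The results are shown in Figure 3: recommendation is clearly best when w_l = 0.8 and w_u = 0.2. In other words, users in the lower approximation have more influence on the cluster center than users in the upper approximation, which agrees with practice.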

Figure 3 Effect of parameters w_l and w_u on MAE

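(2) To verify the effectiveness of the proposed collaborative filtering based on rough user clustering (RUC-CF), we ran comparative experiments. The algorithms compared are: the traditional collaborative filtering algorithm (CF)^[32], the item-based collaborative filtering algorithm (ICF) proposed by Sarwar et al.^[28], and the collaborative filtering algorithm based on multiple-level-similarity user clustering (Clustering Basal Users Based Collaborative Filtering, UCCF) proposed by Li Tao et al.^[11]. In the experiments, the threshold N_u on the number of users in the nearest-neighbor set varies from 10 to 30 in steps of 5. The other parameter settings are listed in Table 2.

Table 2 Parameter settings for Experiment 2

The results are shown in Figure 4. The MAE of RUC-CF is about 14% lower than that of CF and ICF and about 10% lower than that of UCCF, which indicates that the proposed model gives better recommendation quality than the other algorithms and effectively improves recommendation accuracy.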
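The percentages quoted above read most naturally as relative reductions in MAE, although the text does not state this explicitly; that reading is an assumption. Under it, the comparison could be computed as in the Python sketch below. The numbers in the usage line are placeholders chosen only to show the arithmetic and are not values read from Figure 4.

```python
def relative_reduction(baseline_mae, model_mae):
    """Relative MAE reduction of a model with respect to a baseline."""
    if baseline_mae <= 0:
        raise ValueError("baseline MAE must be positive")
    return (baseline_mae - model_mae) / baseline_mae


# Placeholder values for illustration only, not results from the paper.
example = relative_reduction(0.80, 0.688)   # close to 0.14
```

Reporting the reduction relative to each baseline separately, as the text does, avoids mixing baselines whose absolute MAE levels differ.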

Figure 4 MAE of RUC-CF and the compared algorithms

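5 Conclusion

This paper proposes a collaborative filtering recommendation model based on rough K-means user clustering. Introducing rough sets not only addresses users' multiple interests but also avoids leaving users at the edge of a cluster. For similarity, the absolute difference of adjusted cosine similarities is used to decide user membership, which suits collaborative filtering better than the Euclidean-distance criterion of traditional rough clustering. In addition, the offline/online design reduces the space and time needed to search for the target user's nearest neighbors online and effectively improves recommendation speed. Finally, experiments show that the model effectively improves recommendation accuracy and is both feasible and of practical significance.

The limitations of this paper are:

(1) The rough K-means user clustering algorithm randomly selects the rating vectors of K users as the initial cluster centers and does not consider how different choices of initial centers affect the clustering result;

(2) When testing the effect of w_l and w_u on MAE, the number of user clusters K and the threshold N_u on the number of users in the nearest-neighbor set were held constant, and it was not verified whether changes in these values would affect the experimental results.

Collaborative filtering recommendation is a key area of research and application, and as users' needs continue to rise, research on recommendation algorithms keeps developing and improving. Besides resolving the two limitations above, future work should also determine the scope of application of the model.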

References

[1]	Lu L Y, Medo M, Yeung C H, et al. Recommender Systems[J]. Physics Reports-Review Section of Physics Letters, 2012, 519(1): 1-49.
[2]	Breese J S, Heckerman D, Kadie C. Empirical Analysis of Predictive Algorithms for Collaborative Filtering[C]. In: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, Madison, USA. San Francisco: Morgan Kaufmann Publishers, 1998: 43-52.
[3]	Park D H, Kim H K, Choi I Y, et al. A Literature Review and Classification of Recommender Systems Research[J]. Expert Systems with Applications, 2012, 39(11): 10059-10072.
[4]	Herlocker J L, Konstan J A, Borchers A, et al. An Algorithmic Framework for Performing Collaborative Filtering[C]. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, USA. ACM, 1999: 230-237.
[5]	Liang Changyong, Li Cong, Yang Shanlin. A Nearest-Neighbor Collaborative Filtering Algorithm Based on Rough Set Theory[J]. Journal of the China Society for Scientific and Technical Information, 2009, 28(5): 712-719. (in Chinese)
[6]	Takacs G, Pilaszy I, Nemeth B, et al. Scalable Collaborative Filtering Approaches for Large Recommender Systems[J]. Journal of Machine Learning Research, 2009, 10: 623-656.
[7]	Kim H N, Ji A T, Ha I, et al. Collaborative Filtering Based on Collaborative Tagging for Enhancing the Quality of Recommendation[J]. Electronic Commerce Research and Applications, 2010, 9(1): 73-83.
[8]	Braak P T, Abdullah N, Xu Y. Improving the Performance of Collaborative Filtering Recommender Systems through User Profile Clustering[C]. In: Proceedings of IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies, Milan, Italy. IEEE, 2009: 147-150.
[9]	Ungar L H, Foster D P, Andre E, et al. Clustering Methods for Collaborative Filtering[C]. In: Proceedings of 1998 Workshop on Recommender Systems. AAAI Press, 1998: 114-129.
[10]	Li Tao, Wang Jiandong, Ye Feiyue, et al. Collaborative Filtering Recommendation Algorithm Based on Clustering Basal Users[J]. Systems Engineering and Electronics, 2007, 29(7): 1178-1182. (in Chinese)
[11]	Li Tao, Wang Jiandong. Clustering Basal Users Based Recommendation Algorithm Using Multiple-Level Similarity[J]. Journal of Nanjing University of Aeronautics & Astronautics, 2006, 38(6): 717-721. (in Chinese)
[12]	Gong S, Huang C. Employing Fuzzy Clustering to Alleviate the Sparsity Issue in Collaborative Filtering Recommendation Algorithms[C]. In: Proceedings of 2008 International Pre-Olympic Congress on Computer Science. Liverpool, UK: World Academic Press, 2008: 449-454.
[13]	Deshpande M, Karypis G. Item-based Top-N Recommendation Algorithms[J]. ACM Transactions on Information Systems, 2004, 22(1): 143-177.
[14]	Zhou Tao. Adaptive Rough K-means Clustering Algorithm[J]. Computer Engineering and Applications, 2010, 46(26): 7-10. (in Chinese)
[15]	Ruspini E H. A New Approach to Clustering[J]. Information and Control, 1969, 15(1): 22-32.
[16]	Pawlak Z. Rough Sets[J]. International Journal of Computer and Information Sciences, 1982, 11(5): 341-356.
[17]	Lingras P, West J. Interval Set Clustering of Web Users with Rough K-means[J]. Journal of Intelligent Information Systems, 2004, 23(1): 5-16.
[18]	Verma S K, Mittal N, Agarwal B. Hybrid Recommender System Based on Fuzzy Clustering and Collaborative Filtering[C]. In: Proceedings of the 4th International Conference on Computer and Communication Technology, Allahabad, India. IEEE, 2013: 116-120.
[19]	Birtolo C, Ronca D. Advances in Clustering Collaborative Filtering by Means of Fuzzy C-means and Trust[J]. Expert Systems with Applications, 2013, 40(17): 6997-7009.
[20]	Li Hua, Zhang Yu, Sun Junhua. Research on Collaborative Filtering Recommendation Based on User Fuzzy Clustering[J]. Computer Science, 2012, 39(12): 83-86. (in Chinese)
[21]	Wang Mingjia, Han Jingti, Han Songqiao. Collaborative Filtering Algorithm Based on Fuzzy Clustering[J]. Computer Engineering, 2012, 38(24): 50-52. (in Chinese)
[22]	Saha A, Das D, Karmakar D, et al. Clustering Customer Transactions: A Rough Set Based Approach[C]. In: Proceedings of the ISCA 22nd International Conference Computers and Their Applications in Industry and Engineering, San Francisco, USA. Cary, North Carolina, USA: ISCA, 2009: 213-218.
[23]	Tseng V S, Su J H, Wang B W, et al. A Novel Recommendation Method Based on Rough Set and Integrated Feature Mining[C]. In: Proceedings of the 3rd International Conference on Innovative Computing Information and Control, Dalian, China. IEEE, 2008. DOI: 10.1109/ICICIC.2008.612.
[24]	Chen D E, Ying Y L, Gong S J. A Collaborative Filtering Algorithm Based on Rough Set and Fuzzy Clustering[C]. In: Proceedings of the 5th International Conference on Fuzzy System and Knowledge Discovery, Shandong, China. IEEE, 2008: 17-20.
[25]	Du Jintao. A Study on Collaborative Recommendation Model Based on Rough Set[D]. Hangzhou: Hangzhou Dianzi University, 2009. (in Chinese)
[26]	Zhang Tengfei, Cheng Long, Li Yun. Rough K-means Clustering Based on Unbalanced Degree of Cluster[J]. Control and Decision, 2013, 28(10): 1479-1484. (in Chinese)
[27]	Ahn H J. A New Similarity Measure for Collaborative Filtering to Alleviate the New User Cold-Starting Problem[J]. Information Sciences, 2008, 178(1-2): 37-51.
[28]	Sarwar B, Karypis G, Konstan J, et al. Item-based Collaborative Filtering Recommendation Algorithms[C]. In: Proceedings of the 10th International Conference on World Wide Web (WWW'01). ACM, 2001: 285-295.
[29]	GroupLens Research. MovieLens Movie Rating Data Set[EB/OL]. [2014-01-16]. http://movielens.umn.edu/login.
[30]	Herlocker J L, Konstan J A, Terveen L G, et al. Evaluating Collaborative Filtering Recommender Systems[J]. ACM Transactions on Information Systems, 2004, 22(1): 5-53.
[31]	Jeong B, Lee J, Cho H. Improving Memory-Based Collaborative Filtering via Similarity Updating and Prediction Modulation[J]. Information Sciences, 2010, 180(5): 602-612.
[32]	Resnick P, Iacovou N, Suchak M, et al. GroupLens: An Open Architecture for Collaborative Filtering of Netnews[C]. In: Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work (CSCW'94), Chapel Hill, North Carolina, USA. ACM, 1994: 175-186.
