Query Expansion Oriented Algorithm of Feature-words Frequent Itemsets Mining

doi:10.11925/infotech.1003-3513.2011.04.08

New Technology of Library and Information Service (现代图书情报技术)

2011, Vol. 27, Issue 4: 48-51

Knowledge Organization and Knowledge Management

Query Expansion Oriented Algorithm of Feature-words Frequent Itemsets Mining

Huang Mingxuan¹, Ma Ruixing², Lan Huihong¹

1. Department of Math and Computer Science, Guangxi College of Education, Nanning 530023, China;
2. Department of Computer Science, Guangxi Economic Management Cadre College, Nanning 530007, China

Abstract: To obtain high-quality expansion terms for query expansion, this paper proposes an algorithm for mining feature-word frequent itemsets from a text database. The algorithm uses support to measure feature-word frequent itemsets, introduces a new pruning strategy, and, guided by the original query, mines only those frequent itemsets that contain both query terms and non-query terms, which improves mining efficiency. Experimental results show that the algorithm is more efficient and more feasible than traditional mining algorithms.

Key words: Frequent itemset; Mining; Support; Query expansion
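The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the new pruning strategy is not described on this page, so the sketch falls back on classic Apriori subset pruning and simply filters the output to itemsets that mix query and non-query terms. The function name `mine_query_itemsets`, its parameters, and the representation of each document as a set of feature words are all assumptions made for illustration.

```python
from itertools import combinations


def mine_query_itemsets(docs, query_terms, min_support, max_size=3):
    """Level-wise (Apriori-style) mining over documents given as sets of words.

    Returns {itemset: support} for frequent itemsets that contain at least
    one query term and at least one non-query term. Hypothetical helper:
    names and signature are illustrative, not taken from the paper.
    """
    n = len(docs)
    query = set(query_terms)

    def support(itemset):
        # Fraction of documents that contain every word of the itemset.
        return sum(1 for d in docs if itemset <= d) / n

    # Level 1: frequent single feature words.
    vocabulary = sorted({w for d in docs for w in d})
    current = [frozenset([w]) for w in vocabulary
               if support(frozenset([w])) >= min_support]

    results = {}
    k = 1
    while current and k < max_size:
        k += 1
        previous = set(current)
        candidates = set()
        for a, b in combinations(current, 2):
            c = a | b
            # Classic Apriori pruning (stand-in for the paper's strategy):
            # every (k-1)-subset of a candidate must itself be frequent.
            if len(c) == k and all(
                frozenset(s) in previous for s in combinations(c, k - 1)
            ):
                candidates.add(c)
        current = []
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current.append(c)
                # Keep only itemsets mixing query and non-query terms.
                if c & query and c - query:
                    results[c] = s
    return results


if __name__ == "__main__":
    sample_docs = [
        {"data", "mining", "rule"},
        {"data", "mining"},
        {"data", "rule"},
        {"mining", "query"},
    ]
    for itemset, s in mine_query_itemsets(sample_docs, {"data"}, 0.5).items():
        print(sorted(itemset), s)
```

In this sketch the non-query words of each returned itemset would serve as candidate expansion terms for the original query. A dedicated pruning rule that discards candidates unable to contain a query term could skip more work than filtering at the end, which is the kind of saving the abstract attributes to the proposed strategy.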

Received: 2011-02-15; Published: 2011-06-11

CLC number: TP391

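Funding:

This work is one of the research results of the Scientific Research Project of the Education Department of Guangxi, "Research on Text Information Retrieval Technology Based on Weighted Negative Association Rule Mining" (No. 201010LX679), and of the 2010 Institute-level Key Project of Guangxi College of Education, "Research on Information Retrieval Technology Based on Positive and Negative Association Rules" (No. 桂教院科研[2010]7号(重点)-3).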

Cite this article:

Huang Mingxuan, Ma Ruixing, Lan Huihong. Query Expansion Oriented Algorithm of Feature-words Frequent Itemsets Mining[J]. New Technology of Library and Information Service, 2011, 27(4): 48-51.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2011.04.08 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2011/V27/I4/48

