Data Analysis and Knowledge Discovery  2016, Vol. 32 Issue (12): 50-56    DOI: 10.11925/infotech.1003-3513.2016.12.07
Original Article
Mining Document Topics Based on Association Rules
Guangce Ruan, Lei Xia
Department of Information Management, East China Normal University, Shanghai 200241, China
Shanghai Library, Shanghai 200031, China
Abstract  

[Objective] This study aims to accurately identify latent knowledge associations in textual information and thereby enrich the methodology of knowledge mining. [Methods] We combined a topic model with association rules. First, we used the LDA model to extract a topic set from the texts, which both reduced the dimensionality of the text and represented it in a semantic space. Then, we analyzed the semantic ties among the topics with association rules. [Results] The method effectively found latent knowledge associations in the document texts at reasonable levels of support and confidence, improving the model's "understanding" of the textual content. [Limitations] During data preprocessing, the self-defined dictionary had some negative effects on the results. [Conclusions] The proposed method can extract latent semantic associations from unstructured textual information and thus improve the performance of knowledge discovery systems.
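
The second stage described above (association rules over topics) can be illustrated with a minimal sketch. It assumes each document has already been reduced to a set of topic labels, for example by thresholding an LDA document-topic distribution; the function name `mine_topic_rules`, the topic labels, and the threshold values are hypothetical and are not taken from the paper. Only pairwise rules are considered, using the standard definitions of support and confidence.

```python
from itertools import combinations


def mine_topic_rules(doc_topics, min_support=0.3, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for topic pairs.

    doc_topics: list of sets, one set of topic labels per document
    (assumed to come from an earlier topic-modelling step).
    """
    n_docs = len(doc_topics)
    single_counts = {}  # number of documents containing each topic
    pair_counts = {}    # number of documents containing each topic pair
    for topics in doc_topics:
        for t in topics:
            single_counts[t] = single_counts.get(t, 0) + 1
        for a, b in combinations(sorted(topics), 2):
            pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n_docs  # share of documents with both topics
        if support < min_support:
            continue
        # Test the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / single_counts[x]  # P(y | x)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


# Hypothetical toy corpus: four documents, three topics.
doc_topics = [
    {"T1", "T2"},
    {"T1", "T2", "T3"},
    {"T1", "T3"},
    {"T2"},
]
rules = mine_topic_rules(doc_topics, min_support=0.5, min_confidence=0.6)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

In this toy corpus the pair (T2, T3) co-occurs in only one of four documents, so it falls below the support threshold and yields no rule. A fuller treatment would extend the search to larger topic sets, for example with an Apriori-style procedure.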

Keywords: Association rules; Topic model; Text topics
Received: 07 September 2016      Published: 22 January 2017

Cite this article:

Guangce Ruan, Lei Xia. Mining Document Topics Based on Association Rules. Data Analysis and Knowledge Discovery, 2016, 32(12): 50-56.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2016.12.07     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2016/V32/I12/50
