Mining Document Topics Based on Association Rules
Guangce Ruan, Lei Xia
Department of Information Management, East China Normal University, Shanghai 200241, China
Shanghai Library, Shanghai 200031, China
Abstract [Objective] This study aims to accurately identify potential knowledge correlations in textual information and thereby enrich the methodology of knowledge mining. [Methods] First, we combined the topic model with association rules. Second, we used the LDA model to extract a topic set from the texts, which both reduced the dimensionality of the text and expressed it in a semantic space. Finally, we analyzed the semantic ties among the topics with association rules. [Results] The method effectively found potential knowledge associations in the document texts at reasonable levels of support and confidence, and improved the model’s “understanding” of the textual content. [Limitations] During data preprocessing, the self-defined dictionary had some negative effects on the results. [Conclusions] The proposed method can extract latent semantic associations from unstructured textual information and thereby improve the performance of knowledge discovery systems.
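The association-rule step described in the abstract can be illustrated with a minimal sketch. It assumes that an LDA model has already reduced each document to a set of dominant topic labels; the labels `T1` to `T3`, the function name `mine_topic_rules`, and both thresholds are hypothetical and are not taken from the paper. The sketch only considers rules between pairs of topics and is not the authors' implementation.

```python
from itertools import combinations


def mine_topic_rules(doc_topics, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for topic pairs.

    doc_topics: list of sets, one set of topic labels per document
    (assumed here to come from a previously fitted LDA model).
    """
    n_docs = len(doc_topics)
    single_counts = {}  # number of documents containing each topic
    pair_counts = {}    # number of documents containing both topics of a pair
    for topics in doc_topics:
        for t in topics:
            single_counts[t] = single_counts.get(t, 0) + 1
        for a, b in combinations(sorted(topics), 2):
            pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1

    rules = []
    for (a, b), count in pair_counts.items():
        # support = share of documents in which both topics co-occur
        support = count / n_docs
        if support < min_support:
            continue
        # test the rule in both directions: a -> b and b -> a
        for lhs, rhs in ((a, b), (b, a)):
            # confidence = P(rhs | lhs), estimated from document counts
            confidence = count / single_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical topic assignments for five documents.
doc_topics = [
    {"T1", "T2"},
    {"T1", "T2", "T3"},
    {"T1", "T3"},
    {"T2", "T3"},
    {"T1", "T2"},
]
rules = mine_topic_rules(doc_topics)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With this toy input, only the rules between `T1` and `T2` pass both thresholds; the pairs involving `T3` meet the support threshold but fall short on confidence.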
Received: 07 September 2016
Published: 22 January 2017