[Objective] This paper reviews semantic text mining techniques for intelligence analysis. [Coverage] We surveyed leading research on semantic text mining for intelligence analysis published in the last ten years, along with a few earlier studies. [Methods] We first discussed semantic text mining methodologies and algorithms at the word, sentence, and paragraph levels. We then analyzed these techniques from the perspectives of topic evolution and the applications of mining technologies. [Results] Compared with traditional intelligence analysis methods, semantic text mining approaches can process unstructured data and handle multi-layer structured data. [Limitations] We reviewed only the leading studies and their applications in the scientific field. [Conclusions] Semantic text mining improves the performance of traditional intelligence analysis systems and is becoming the future direction of research methodology. More research is needed to enrich outlier semantic resources.
Zhao Dongxiao, Wang Xiaoyue, Bai Rujiang, Liu Ziqiang. Semantic Text Mining Methodologies for Intelligence Analysis[J]. New Technology of Library and Information Service, 2016, 32(10): 13-24.