Measurement and Distribution of Index Quality in Research Topics from Academic Databases
Li Keyu, Wang Hao, Gong Lijuan, Tang Huihui
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
|
Abstract [Objective] This paper measures the quality of index terms for research topics in academic databases and explores their distribution characteristics. [Methods] First, we collected the index terms of research topics in the humanities, social sciences and natural sciences from Web of Science (WoS) and CNKI. Second, we constructed terminology spaces by research topic, domain and database. Third, we used term discriminative capacity (TDC) to evaluate the quality of these index terms. Finally, we conducted ANOVA tests to explore how index term quality is distributed across databases and domains. [Results] The quality of index terms for research topics followed the order “Abstract” > average level > “Keyword”. The “Title” terms in CNKI (and the “Keyword Plus” terms in Web of Science) scored lower than the “Abstract” terms, while the “Title” terms in WoS scored below the average level. [Limitations] The number of research topics in this study needs to be expanded. [Conclusions] The TDC measurement method is stable and reliable and can help improve information retrieval services and the quality of index terms.
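The two computational steps in [Methods], scoring each term's discriminative capacity within a terminology space and then comparing groups of scores with ANOVA, can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the classic term discrimination value (the change in average pairwise cosine similarity of a document space when one term is removed) as a stand-in for TDC, whose exact definition may differ; documents are assumed to be dictionaries mapping index terms to weights; and all function names and toy data are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def space_density(docs):
    """Average pairwise cosine similarity of a document space (needs >= 2 docs)."""
    return mean(cosine(a, b) for a, b in combinations(docs, 2))


def discrimination_value(docs, term):
    """Classic term discrimination value (a stand-in for TDC, by assumption).

    Density without `term` minus density with it: a positive value means the
    term spreads documents apart (good discriminator); a negative value means
    it pulls them together (poor discriminator).
    """
    without = [{t: w for t, w in d.items() if t != term} for d in docs]
    return space_density(without) - space_density(docs)


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA over several lists of scores.

    Assumes at least two groups and non-zero within-group variance.
    """
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    k, n = len(groups), len(scores)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Hypothetical toy space: binary index terms for three documents.
    docs = [{"retrieval": 1, "indexing": 1},
            {"retrieval": 1, "topic": 1},
            {"retrieval": 1, "anova": 1}]
    for term in sorted({t for d in docs for t in d}):
        print(term, round(discrimination_value(docs, term), 4))
    # Hypothetical scores for terms from two index fields, compared with ANOVA.
    print(one_way_anova_f([[0.12, 0.10, 0.15], [0.02, 0.04, 0.01]]))
```

The pairwise-similarity definition of density is exact but quadratic in the number of documents; a centroid-based density is a common cheaper approximation. In practice, a library routine such as SciPy's `f_oneway` would also report the p-value that the hand-written F statistic omits.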
Received: 08 November 2019
Published: 07 July 2020
|
|
Corresponding Author:
Wang Hao
E-mail: ywhaowang@nju.edu.cn