Data Analysis and Knowledge Discovery, 2023, Vol. 7, Issue 10: 109-118. DOI: 10.11925/infotech.2096-3467.2022.0915
Extracting Value Elements and Constructing Index System for Calligraphy Works Based on Hyperplane-BERT-Louvain Optimized LDA Model
Pan Xiaoyu¹, Ni Yuan²,³, Jin Chunhua², Zhang Jian²,³
¹ Computer School, Beijing Information Science & Technology University, Beijing 100192, China
² School of Economics & Management, Beijing Information Science & Technology University, Beijing 100192, China
³ Beijing Key Laboratory of Big Data Decision Making for Green Development, Beijing 100192, China

[Objective] This paper uses big data and artificial intelligence techniques to identify the value elements of calligraphy works, addressing the lack of standards for assessing such works and providing technical support for their trading. [Methods] First, we combined the hyperplane algorithm with the BERT model to preprocess the calligraphy literature, filtering out domain stop words and expanding similar feature words to produce an optimized, more discriminative corpus. Second, we constructed a complex semantic network of the calligraphy literature and applied the Louvain algorithm to determine the optimal number of topics by maximizing the modularity of the community network. Finally, we developed a new method, “Hyperplane-BERT-Louvain-LDA” (HBL-LDA), to construct an assessment index system for calligraphy value. [Results] Compared with LDA, HBL-LDA raised the precision and F value of topic recognition by 45.00 and 29.46 percentage points, respectively. The average topic quality rate (TQR) fell by 0.96, with more high-quality topics identified. We also used regression models to verify the evaluation index system on representative calligraphy works, reaching a highest accuracy of 84.00%. [Limitations] The evaluation system was constructed only for calligraphy works and cannot be applied to other artworks. The BERT model lacks topic-level semantic information, which makes it difficult to expand similar feature words. [Conclusions] The proposed model for evaluating calligraphy value offers a new direction for constructing index systems in other fields.
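
The role of modularity in choosing the number of topics can be illustrated with a minimal sketch. It does not implement the Louvain search itself, which greedily moves nodes between communities; it only evaluates the modularity score that Louvain maximizes, for two hand-written candidate partitions. The term graph, its edge weights, and the candidate partitions are hypothetical and are not taken from the paper's corpus.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected weighted graph."""
    total_weight = sum(w for _, _, w in edges)      # m: total edge weight
    intra = defaultdict(float)    # edge weight falling inside each community
    degree = defaultdict(float)   # summed weighted degree of each community
    for u, v, w in edges:
        degree[community_of[u]] += w
        degree[community_of[v]] += w
        if community_of[u] == community_of[v]:
            intra[community_of[u]] += w
    # Q = sum over communities of (L_c / m) - (d_c / 2m)^2
    return sum(
        intra[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )

# Hypothetical co-occurrence graph: two tightly linked term groups
# joined by a single bridging edge (all weights are illustrative).
EDGES = [
    ("brushwork", "ink", 1.0), ("ink", "stroke", 1.0), ("brushwork", "stroke", 1.0),
    ("auction", "price", 1.0), ("price", "collector", 1.0), ("auction", "collector", 1.0),
    ("stroke", "auction", 1.0),
]
TERMS = sorted({t for u, v, _ in EDGES for t in (u, v)})

candidates = {
    "one community": {t: 0 for t in TERMS},
    "two communities": {
        "brushwork": 0, "ink": 0, "stroke": 0,
        "auction": 1, "price": 1, "collector": 1,
    },
}

scores = {name: modularity(EDGES, part) for name, part in candidates.items()}
best = max(scores, key=scores.get)
n_topics = len(set(candidates[best].values()))  # communities in the best partition
print(scores, "-> number of topics:", n_topics)
```

In this sketch the number of communities in the highest-scoring partition plays the role of the topic number passed to LDA; a full pipeline would let a Louvain implementation search over partitions instead of comparing a fixed list.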

Key words: Evaluation Index System; LDA; Field Stop Words; Louvain; BERT
Received: 29 August 2022      Published: 20 December 2023
CLC Number (ZTFLH): TP391
Fund: Young Scientist Project of National Key R&D Program (2021YFF0900200)
Corresponding Author: Ni Yuan, ORCID: 0000-0002-0600-2619

Cite this article:

Pan Xiaoyu, Ni Yuan, Jin Chunhua, Zhang Jian. Extracting Value Elements and Constructing Index System for Calligraphy Works Based on Hyperplane-BERT-Louvain Optimized LDA Model. Data Analysis and Knowledge Discovery, 2023, 7(10): 109-118.


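| Dimension structure | Indicator dimensions | Source |
|---|---|---|
| Three dimensions | Image tension, cultural connotation, aesthetic taste | Zhang Zhiqiang et al. [19] |
| | Ideological value, artistic value, academic value | Zhao Changqing [20] |
| Four dimensions | Philosophical concept value, aesthetic identity value, | |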
Summary of Evaluation Index System of Calligraphy Value
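| Optimization target | Method | Characteristics |
|---|---|---|
| Determining the number of topics | Manual setting | Highly arbitrary |
| | Determined from perplexity and similarity indicators [24] | The selected number of topics tends to be too large, with many noise topics and heavy overlap between topics [25] |
| | Automatic determination [26] | Markedly better results, but high algorithmic complexity and low efficiency |
| Stop-word filtering | Static general stop-word lists and rule-based methods [27] | Ignore stop words specific to individual domains |
| | Word frequency statistics, document frequency, auxiliary sets [28] | Prone to removing highly discriminative words, which reduces the model's key features and weakens its generalization |
| Semantic enhancement | Word2Vec | Limited by window size; cannot capture information from the whole sentence |
| | BERT | Produces dynamic word vectors that express the rich meanings of words |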
Research on Topic Number Choice, Stop Word Filtering and Semantic Enhancement
The Framework of Construction and Verification of Evaluation Index System
Schematic Diagram Based on Similar Feature Word Expansion
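| Dataset type | Topic | Number of texts |
|---|---|---|
| Target set | Calligraphy | 5,362 |
| Auxiliary set 1 | Painting | 5,362 |
| | Film | 5,362 |
| Auxiliary set 2 | Medicine | 5,362 |
| | Science and technology | 5,362 |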
The Data Distribution of Target and Auxiliary Sets
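| Model | Number of topics | Domain stop-word filtering and similar-feature-word expansion | T_extract | T_correct | T_standard | Precision/% | Recall/% | F value/% |
|---|---|---|---|---|---|---|---|---|
| LDA | 55 | No | 32 | 8 | 11 | 25.00 | 72.72 | 37.20 |
| HB-LDA | 55 | Yes | 34 | 9 | 11 | 26.47 | 81.81 | 39.99 |
| L-LDA | 10 | No | 10 | 6 | 11 | 60.00 | 54.54 | 57.13 |
| HBL-LDA | 10 | Yes | 10 | 7 | 11 | 70.00 | 63.63 | 66.66 |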
The Results of Topic Extraction by Different Models
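
The precision, recall, and F columns can be recomputed from the three count columns. The sketch below assumes the usual definitions, precision = T_correct / T_extract, recall = T_correct / T_standard, and F as their harmonic mean; these definitions are inferred from the reported numbers, not quoted from the paper. The last digit may differ from the table, whose recall and F values appear to be truncated.

```python
def topic_metrics(t_extract, t_correct, t_standard):
    """Return (precision, recall, F) for one topic-extraction run.

    t_extract  -- number of topics the model extracted
    t_correct  -- number of extracted topics judged correct
    t_standard -- number of topics in the reference standard
    """
    precision = t_correct / t_extract
    recall = t_correct / t_standard
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value

# Counts copied from the table: (T_extract, T_correct, T_standard).
counts = {
    "LDA": (32, 8, 11),
    "HB-LDA": (34, 9, 11),
    "L-LDA": (10, 6, 11),
    "HBL-LDA": (10, 7, 11),
}

results = {model: topic_metrics(*c) for model, c in counts.items()}
for model, (p, r, f) in results.items():
    print(f"{model:8s} P={100 * p:.2f}%  R={100 * r:.2f}%  F={100 * f:.2f}%")

# Gain of HBL-LDA over plain LDA, in percentage points.
precision_gain = 100 * (results["HBL-LDA"][0] - results["LDA"][0])
print(f"precision gain: {precision_gain:.2f} points")
```

The precision gain computed this way matches the 45.00 figure quoted in the abstract, which indicates that the reported improvements are differences in percentage points.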
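| Topic No. | Topic words | Topic | TQR |
|---|---|---|---|
| 1 | theory, idea, stele studies, method, relationship, calligraphy treatises, problem, concept, system, approach | Creative concept | 8.02 |
| 2 | part, literature, scholarship, calligraphy studies, seal script, achievement, scholar, paper, colophon, poetry | Linguistic attainment | 7.89 |
| 3 | epitaph, calligraphic style, calligraphy works, calligrapher, calligraphy history, calligraphy master, status, problem, article, inquiry | Style characteristics, author renown | 7.47 |
| 4 | style, work, brush technique, seal carving, master, characteristic, brush and ink, brushwork, line, running script | Style characteristics, brush technique | 7.25 |
| 5 | aesthetics, painting, spirit, calligraphy and painting, literati, life, Chinese painting, philosophy, connotation, personality | Spiritual connotation | 7.34 |
| 6 | education, teaching, curriculum, primary school, Chinese language, fine arts, handwriting, problem, current situation, school | Aesthetic education | 7.66 |
| 7 | society, era, people, characteristic, politics, life, function, period, environment, activity | Historical background | 7.65 |
| 8 | stele inscription, period, stone carving, documents, regular script, script style, materials, historical materials, successive dynasties, large quantity | Documentary and historical materials | 7.54 |
| 9 | design, form, application, element, connotation, nation, language, space, fusion, vision | Composition and layout | 7.18 |
| 10 | writing, Chinese characters, script, cursive script, typeface, feature, dissemination, structure, method, form | Character form | 7.19 |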
The Top10 Keywords and TQR of Topics by LDA Model
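| Topic No. | Topic words | Topic | TQR |
|---|---|---|---|
| 1 | part, scholarship, summary, achievement, paper, comprehensive, foundation, scholar, emergence, focus | Author achievement | 8.22 |
| 2 | Chinese language, handwriting, cursive script, teaching, curriculum, school, teacher, discipline, evaluation, literacy | Aesthetic education | 4.90 |
| 3 | people, education, function, activity, philosophy, approach, life, characteristic, system, resource | Aesthetic education | 8.69 |
| 4 | brushwork, regular script, epitaph, brush technique, seal carving, composition, line, the author, structure, variation | Brush technique, composition and layout | 4.85 |
| 5 | aesthetics, literature, spirit, approach, consciousness, perspective, level, foundation, system, factor | Spiritual connotation | 9.22 |
| 6 | literati, brush and ink, life, Chinese painting, philosophy, relationship, calligraphy and painting, painting, spirit, personality | Brush-and-ink technique, spiritual connotation | 4.98 |
| 7 | calligraphy studies, calligraphy works, classics, calligraphy treatises, calligraphy master, calligrapher, calligraphy history, inquiry, calligraphy circles, stele rubbings and model books | Classics by famous masters | 4.64 |
| 8 | stele inscription, stone carving, documents, materials, historical materials, large quantity, collation, situation, region, landscape | Documentary and historical materials | 8.70 |
| 9 | theory, society, era, idea, background, representative, phenomenon, core, viewpoint, individuality | Historical background | 6.45 |
| 10 | Chinese characters, design, language, nation, element, dissemination, fusion, application, typeface, form | Cultural dissemination, character form | 4.93 |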
The Top10 Keywords and TQR of Topics by HBL-LDA Model
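
The change in average TQR between the two topic tables can be checked with a few lines of arithmetic. The values below are copied from the TQR columns of the LDA and HBL-LDA tables; how TQR itself is computed is not shown in this excerpt, so the sketch only averages the reported values.

```python
from statistics import mean

# TQR per topic, in table order (topics 1 to 10).
lda_tqr = [8.02, 7.89, 7.47, 7.25, 7.34, 7.66, 7.65, 7.54, 7.18, 7.19]
hbl_lda_tqr = [8.22, 4.90, 8.69, 4.85, 9.22, 4.98, 4.64, 8.70, 6.45, 4.93]

lda_avg = mean(lda_tqr)
hbl_avg = mean(hbl_lda_tqr)
reduction = lda_avg - hbl_avg  # positive when HBL-LDA has the lower average

print(f"LDA average TQR:     {lda_avg:.3f}")
print(f"HBL-LDA average TQR: {hbl_avg:.3f}")
print(f"reduction:           {reduction:.2f}")
```

The difference agrees with the reduction of 0.96 stated in the abstract, which treats a lower average TQR as the better outcome.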
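| First-level indicator | Second-level indicator | Third-level indicator |
|---|---|---|
| | Cultural character | Number of citations in other works |
| | Artistic character | Type of script |
| Social value | Dissemination | Number of appearances in CNKI literature, mainstream news, and newspapers |
| | Recognition | Total number of the author's works held in museum collections |
| Economic value | Investment potential | Number of collectors' seals |
| | Collectibility | Type of material |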
The Evaluation Index System of Calligraphy Value
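| Data split ratio | Regression decision tree (%) | Bagging regression (%) | Random forest (%) |
|---|---|---|---|
| 7:3 | 78 | 66 | 67 |
| 8:2 | 78 | 72 | 72 |
| 9:1 | 84 | 78 | 72 |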
The Accuracy of Regression Model Prediction
[1] Wang Yuzhuo, Min Huasong. Robot Calligraphy System Based on Brush Modeling[J]. CAAI Transactions on Intelligent Systems, 2021, 16(4): 707-716.
[2] Lv Xingjia. Misunderstanding and Reflection on Contemporary Calligraphy Criticism[J]. Journal of Art Communication, 2021(1): 39-44.
[3] Liu Xiangyu. Research on the Trading Mechanism of Contemporary Art in China[D]. Jinan: Shandong University, 2012.
[4] Zhu Shuai. Reflections on the Establishment of Contemporary Calligraphy Evaluation System[J]. Art Observation, 2015(8): 26-27.
[5] Yu Yan, Zhao Naixuan. Automatic Selection of Domain-Specific Stopwords in Topic Model of Patent Text[J]. Library and Information Service, 2018, 62(11): 120-126.
doi: 10.13266/j.issn.0252-3116.2018.11.014
[6] Luhn H P. The Automatic Creation of Literature Abstracts[J]. IBM Journal of Research and Development, 1958, 2(2): 159-165.
doi: 10.1147/rd.22.0159
[7] An Yi, Zhang Yifu. On the Construction of the Evaluation System of “China Calligraphy”[J]. Chinese Calligraphy, 2016(14): 82-84.
[8] Zhang Qianqian. Research on the Construction Method of Enterprise Technology Innovation Index System Based on Text Mining[D]. Beijing: Beijing Jiaotong University, 2021.
[9] Feng Kun, Yang Qiang, Chang Xinyi, et al. Customer Satisfaction Evaluation Method for Fresh E-commerce Based on Online Reviews and Stochastic Dominance Rules[J]. Chinese Journal of Management Science, 2021, 29(2): 205-216.
[10] Du Xingye. Research on Intelligent Evaluation of Key Indicators of Academic Papers[D]. Changchun: Jilin University, 2019.
[11] Xu Xuanhua, Hou Yuzhou, He Jishan. Large Group Decision Making Method Based on Incomplete Probabilistic Linguistic Evaluation Information Considering Authoritative Expert and Its Application in Site Selection of Hot Dry Rock Exploration[J]. Operations Research and Management Science, 2021, 30(8): 7-13.
doi: 10.12005/orms.2021.0240
[12] Peng Dinghong, Rao Hongwei. Hesitant Fuzzy Group Decision Making Method with Multiple Biases[J]. Fuzzy Systems and Mathematics, 2022, 36(2): 49-59.
[13] Liu Jiaqi. A Study on User Experience of O2O Takeaway Website: Take Beijing as an Example[D]. Beijing: Capital University of Economics and Business, 2018.
[14] Wang Lian, Li Ran, Xu Xiaofei, et al. Multi-dimensional Evaluation Model of Regional Cultural Product Modeling[J]. Packaging Engineering, 2021, 42(20): 389-394, 401.
[15] Wang Heng. Influencing Factors and Optimization Orientation of Cultural Tourism Preference: Based on Discrete Choice Model[J]. Social Scientist, 2022(1): 42-51.
[16] Zhang Yitao, Wan Changxuan, Liu Xiping, et al. Mining Unstructured Economic Indicators Based on PSP_HDP Topic Model[J]. Journal of Software, 2020, 31(3): 845-865.
[17] Liu Jingtao, Li Xiuxia, Shao Zuoyun. A Book Evaluation Method Based on Topic Extraction and Sentiment Analysis[J]. Information Research, 2022(5): 43-49.
[18] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[19] Zhang Zhiqiang, Wang Jiayi, Wei Ming. Exploring the Value of Calligraphy in the Environment[J]. Chinese Calligraphy, 2014(5): 166-169.
[20] Zhao Changqing. On the Value and Realization of Calligraphy[J]. Chinese Calligraphy, 2011(1): 39-40.
[21] Li Shumin. Value Orientation and Development Direction of Contemporary Calligraphy[J]. Chinese Calligraphy, 2020(9): 176-178.
[22] Chen Zhenlian. Contemporary Calligraphy Evaluation System Construction[M]. 1st ed. Shanghai: Shanghai Calligraphy and Painting Publishing House, 2019: 113-191.
[23] Huang Lin, Wang Liya, Ming Xinguo. Product Service Requirement Identification Based on Modified-LDA Model[J]. Industrial Engineering and Management, 2023, 28(1): 42-50.
[24] Guan Peng, Wang Yuefen. Identifying Optimal Topic Numbers from Sci-Tech Information with LDA Model[J]. New Technology of Library and Information Service, 2016(9): 42-50.
[25] Yang Yang, Jiang Kaizhong, Yuan Mingjun, et al. Selecting Optimal LDA Numbers to Identify News Topics[J]. Data Analysis and Knowledge Discovery, 2022, 6(11): 72-78.
[26] Wang Tingting, Han Man, Wang Yu. Optimizing LDA Model with Various Topic Numbers: Case Study of Scientific Literature[J]. Data Analysis and Knowledge Discovery, 2018, 2(1): 29-40.
[27] Ladani D J, Desai N P. Automatic Stopword Identification Technique for Gujarati Text[C]// Proceedings of 2021 International Conference on Artificial Intelligence and Machine Vision, Gandhinagar, India. Piscataway, NJ: IEEE, 2021: 1-5.
[28] Yu Yan, Zhao Naixuan. Choosing Stopwords for Patent Topic Analysis Based on Auxiliary Set[J]. Data Analysis and Knowledge Discovery, 2018, 2(11): 95-103.
[29] Alshanik F, Apon A, Herzog A, et al. Accelerating Text Mining Using Domain-Specific Stop Word Lists[C]// Proceedings of 2020 IEEE International Conference on Big Data, Atlanta, GA, USA. Piscataway, NJ: IEEE, 2020: 2639-2648.
[30] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv Preprint, arXiv: 1810.04805.
[31] Li Feifei. Research and Realization of the Search Engine in the Field of Education Based on C-LDA[D]. Beijing: Beijing Jiaotong University, 2018.
[32] Blondel V D, Guillaume J L, Lambiotte R, et al. Fast Unfolding of Communities in Large Networks[J]. Journal of Statistical Mechanics: Theory and Experiment, 2008, 2008(10): P10008.
[33] Xu Jin, Deng Leling. Study on Community Detection of Railway Passenger Social Networks Based on Louvain Algorithm[J]. Journal of Shandong Agricultural University (Natural Science Edition), 2018, 49(4): 722-725.
[34] Guan Qin, Deng Sanhong, Wang Hao. Chinese Stopwords for Text Clustering: A Comparative Study[J]. Data Analysis and Knowledge Discovery, 2017, 1(3): 72-80.
[35] Zhang Xiyuan. Research on Classification and Topic Evolution of Blog Based on LDA[D]. Shenyang: Northeastern University, 2019.