To extract e-mail information accurately, this paper analyzes the structure and content features of e-mail and designs an e-mail preprocessing system based on the MIME mail structure. Using block processing and feature identification, the system copes with the informal style of e-mail text and filters out reply lines and advertising lines. The system achieves its intended goal of extracting e-mail information quickly and accurately.
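As a rough illustration of the idea described in the abstract, the following minimal sketch walks the MIME structure of a message, takes each text/plain part as a block, and drops lines that look like quoted replies or advertising. It is not the authors' implementation: the regular expressions REPLY_LINE, REPLY_HEADER and AD_LINE, the helper names text_blocks and extract_body, and the sample message RAW are all assumptions made for this example, and the paper's actual features are not given here. Only Python's standard email package is used.

```python
import re
from email import message_from_string

# Hypothetical line features; the paper does not list its actual ones.
REPLY_LINE = re.compile(r"^\s*>")                     # quoted reply text
REPLY_HEADER = re.compile(r"^\s*On .+ wrote:\s*$")    # reply attribution line
AD_LINE = re.compile(r"unsubscribe|click here|special offer", re.IGNORECASE)


def text_blocks(msg):
    """Yield the decoded text of each non-attachment text/plain part."""
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text of their own
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "utf-8"
        yield payload.decode(charset, errors="replace")


def extract_body(raw):
    """Return the message text with reply and advertising lines removed."""
    msg = message_from_string(raw)
    kept = []
    for block in text_blocks(msg):
        for line in block.splitlines():
            if REPLY_LINE.match(line) or REPLY_HEADER.match(line):
                continue
            if AD_LINE.search(line):
                continue
            kept.append(line)
    return "\n".join(kept).strip()


# Made-up sample message used only to exercise the sketch.
RAW = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "Subject: Re: meeting\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "\n"
    "See you at 10.\n"
    "On Mon, Bob wrote:\n"
    "> When shall we meet?\n"
    "Click here for a special offer!\n"
    "--XYZ\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "\n"
    "<p>See you at 10.</p>\n"
    "--XYZ--\n"
)

if __name__ == "__main__":
    print(extract_body(RAW))
```

Working per MIME part keeps the HTML alternative and any attachments out of the line filter, so the line-level features only ever see plain text. A real system would likely need richer features than these three patterns.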
胡燕,滕桂法,董素芬,王聃. 基于MIME邮件结构的邮件内容提取技术的研究[J]. 现代图书情报技术, 2008, 24(5): 85-88.
Hu Yan, Teng Guifa, Dong Sufen, Wang Dan. Research on Extracting E-mail Information Based on Structure of MIME Mail[J]. New Technology of Library and Information Service, 2008, 24(5): 85-88.