An Overview of Research on Electronic Medical Record Oriented Named Entity Recognition and Entity Relation Extraction

Yang Jin-Feng (杨锦锋); Yu Qiu-Bin (于秋滨); Guan Yi (关毅); Jiang Zhi-Peng (蒋志鹏)

doi: 10.3724/SP.J.1004.2014.01537

1. Web Intelligence Laboratory, Language Technology Center, Harbin Institute of Technology, Harbin 150001;
2. Medical Record Room, The 2nd Affiliated Hospital of Harbin Medical University, Harbin 150086

Funds:

Supported by National Natural Science Foundation of China (60975077)

Author biography:
Yang Jin-Feng: Ph.D. candidate at Harbin Institute of Technology. Main research interests: natural language processing and information extraction from electronic medical records. E-mail: yangjinfeng2010@gmail.com

Corresponding author:
Guan Yi: Professor at Harbin Institute of Technology. Main research interests: intelligent information retrieval, web mining, natural language processing, and cognitive linguistics. E-mail: guanyi@hit.edu.cn

Publication history
- Received: 2013-08-30
- Revised: 2013-12-18
- Published: 2014-08-20

Abstract

Electronic medical records (EMRs) are generated during clinical treatment. The named entities and entity relations they contain reflect patients' health conditions and carry a large amount of medical knowledge closely tied to those conditions, so recognizing and extracting them is an important extension of information extraction research into the medical domain. This paper first discusses the linguistic and structural characteristics of EMR narratives. Then, after outlining the general approaches to named entity recognition and entity relation extraction, it analyzes the specific tasks of EMR named entity recognition, entity assertion recognition, and entity relation extraction, together with the main methods used for each task. It also introduces the related shared evaluation tasks and annotated corpora, as well as several important dictionaries and knowledge bases in the medical domain. Finally, it outlines the problems that remain to be solved and future research directions for the field.

Key words:
- Electronic medical record (EMR)
- named entity recognition
- entity relation extraction
- shared task
