摘要看(Ferucci et al 2010, AI magazine)文章 — 这是一个“科普”文章。里面的细节拎出来写，大概可以写一百篇文章。
Building Watson: An Overview of the DeepQA Project
问题回答Question Answering (QA)是信息检索Information Retrieval (IR)的子领域。QA的特点
- 回答自然语言问题(who, where, when, what, etc.)
- 分而治之：If the clue contains more than one statement, Watson decomposes it into subclues.
- 问题分类：The next step is to identify what kind of answer the system is looking for – Lexical Answer Type (LAT); basically this means figuring out which word in the clue that represents the answer.
- 知识来源：本地保存了大量Web文档，做原始知识库。处理生成web snippet + downloadable knowledge bases (dbpedia, freebase, wordnet, yago)
- 结构化知识的作用有限： structured knowledge sources只在2%的时间能直接给出答案
- 问题分析：The next step is to analyse the clue further in order to find out how to search the knowledge base （question analysis ）,主要是NLP
- 从句子到逻辑：The clue fragments are parsed, first syntactically and then semantically, to extract a logical form
- 语义角色分析：Semantic role labeling is a major breakthrough in NLP, giving us algorithms that can identify the type of the subject and object in a parsed sentence: Coreference，named entity recognition， relation extraction
- 搜索的第一步：完备性 （查到尽可能多的文章）The process of searching the extended corpus is divided into two steps – at first the focus is on recall.It does this by querying its corpus in varied ways (Lucene and SPARQL are mentioned),
- 搜索的第二步：精确性（上百个小算法来打分） This list of possible answers is then scored by a small army of standalone algorithms, each of which employs state-of-the-art computational lingusitics techniques to examine a particular aspect of the answer. The focus here is on precision
- 核心贡献：Watson is reading documents and understanding the relations between predicates, subjects, and objects in all kinds of noisy and sometimes contradictory data
下一步，看“摘要”(Ferucci et al 2010, AI magazine)，做摘要的西瓜大丸子汤之摘要。