首页 > 笔记, 语义网 > 笔记: IBM沃森系统摘要的摘要的摘要

笔记: IBM沃森系统摘要的摘要的摘要

沃森系统主页:http://www-03.ibm.com/innovation/us/watson/

摘要看(Ferucci et al 2010, AI magazine)文章 — 这是一个“科普”文章。里面的细节拎出来写,大概可以写一百篇文章。

Building Watson: An Overview of the DeepQA Project
http://www.stanford.edu/class/cs124/AIMagzine-DeepQA.pdf

摘要的摘要看 Deep Question Answering – The AI techniques behind IBM’s Watson

下面是我对摘要的摘要的摘要

问题回答Question Answering (QA)是信息检索Information Retrieval (IR)的子领域。QA的特点

  • 回答自然语言问题(who, where, when, what, etc.)
  • 返回明确答案,IR往往返回整个文档。
其他有趣的点
  • 问题被读出来的时间,一般三秒,计算完毕
  • 分而治之:If the clue contains more than one statement, Watson decomposes it into subclues.
  • 问题分类:The next step is to identify what kind of answer the system is looking for –  Lexical Answer Type (LAT);  basically this means figuring out which word in the clue that represents the answer.
  • 知识来源:本地保存了大量Web文档,做原始知识库。处理生成web snippet + downloadable knowledge bases (dbpediafreebasewordnetyago)
  • 结构化知识的作用有限: structured knowledge sources只在2%的时间能直接给出答案
  • 问题分析:The next step is to analyse the clue further in order to find out how to search the knowledge base (question analysis ),主要是NLP
  • 从句子到逻辑:The clue fragments are parsed, first syntactically and then semantically, to extract a logical form
  • 语义角色分析:Semantic role labeling is a major breakthrough in NLP, giving us algorithms that can identify the type of the subject and object in a parsed sentence: Coreferencenamed entity recognition, relation extraction
  • 搜索的第一步:完备性 (查到尽可能多的文章)The process of searching the extended corpus is divided into two steps – at first the focus is on recall.It does this by querying its corpus in varied ways (Lucene and SPARQL are mentioned),
  • 搜索的第二步:精确性(上百个小算法来打分) This list of possible answers is then scored by a small army of standalone algorithms, each of which employs state-of-the-art computational lingusitics techniques to examine a particular aspect of the answer. The focus here is on precision
  • 硬件就没什么好说的了
  • 核心贡献:Watson is reading documents and understanding the relations between predicates, subjects, and objects in all kinds of noisy and sometimes contradictory data

下一步,看“摘要”(Ferucci et al 2010, AI magazine),做摘要的西瓜大丸子汤之摘要。

Advertisements
分类:笔记, 语义网 标签:

发表评论

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / 更改 )

Twitter picture

You are commenting using your Twitter account. Log Out / 更改 )

Facebook photo

You are commenting using your Facebook account. Log Out / 更改 )

Google+ photo

You are commenting using your Google+ account. Log Out / 更改 )

Connecting to %s

%d 博主赞过: