首页 > 笔记, 语义网 > 笔记:DeepQA (IBM沃森)(3)答案之生成

笔记:DeepQA (IBM沃森)(3)答案之生成

David A. Ferrucci et al,Building Watson: An Overview of the DeepQA Project. AI Magazine 31(3): 59-79 (2010).

==假设生成==

Hypothesis generation takes the results of question analysis and produces candidate answers by searching the system’s sources and extracting answer-sized snippets from the search results

这一步是要收集尽可能多的假设: the system generates the correct answer as a candidate answer for 85 percent of the questions somewhere within the top 250 ranked candidates

综合运用多种搜索方法,Indri, Lucene, SPARQL。triple store的作用主要是对线索里提到的名词或者关系做查找。

找到文献后,要做Candidate Answer Generation:取出和问题相关的字句,准备答案

==软过滤==

Soft Filtering。得到的假设(候选答案很多),要先用一些简单的算法“软”过滤一下,比如算“韩寒”是一个“省”的概率有多大。后面还有另一层过滤,用比较昂贵的方法。过滤后大概剩下100个假设。

==找证据==

对每个假设,要看看有没有充分的证据支持。一个方法是把假设代进问题本身,再去搜索,看能不能搜到。

然后对每个证据打分。有很多不同的打分算法,多于50个。这里有许多推理,比如时态关系,空间关系,还有证据来源的可信度。

==合并和排序==

The goal of final ranking and merging is to evaluate the hundreds of hypotheses based on potentially hundreds of thousands of scores to identify
the single best-supported hypothesis given the evidence and to estimate its confidence — the likelihood it is correct.

很多假设(候选答案)是等价的,要先合并之。

用一个机器学习方法做基于证据的排序。

==其他==

硬件(并行),抢答策略(博弈论)略。

技术讨论完。

Advertisements
分类:笔记, 语义网
  1. 还没有评论。
  1. 2012/04/16 @ 01:31

发表评论

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / 更改 )

Twitter picture

You are commenting using your Twitter account. Log Out / 更改 )

Facebook photo

You are commenting using your Facebook account. Log Out / 更改 )

Google+ photo

You are commenting using your Google+ account. Log Out / 更改 )

Connecting to %s

%d 博主赞过: