Archive for the ‘Slides’ Category

24 Weapons for ISWC Data [2010]

2012/01/05 1 comment

Originally written on 2010-11-08

http://tw.rpi.edu/weblog/2010/11/08/15-ways-to-explore-iswc-2010-data/

15 (and counting) Ways to Explore ISWC 2010 Data

This year at ISWC, while working on the metadata, we had a Data Consuming task force to develop tools that can browse and visualize the data in many different ways, e.g., a faceted browser, a filter browser, and a mobile browser.

As soon as we published the basic dataset, we immediately got feedback from people about off-the-shelf tools that can work with the data. The list is growing quickly. I collected screenshots of some working instances (including tools the metadata committee has built) in a slide deck. I have no doubt that the number “15” will change when the main conference begins … in 2.5 hours! So expect some updates very soon.

What strikes me is the number and diversity of data browsers currently available, many of which are clearly reaching a level of maturity that lets non-expert users explore the data. That was not the case even one year ago. So much has changed for the Semantic Web in 2010!

P.S. 2012-01-05: I later added more browsing tools, making the count 24.

A Short Semantic MediaWiki Tutorial

2012/01/03 2 comments

Originally written on 2009-07-21 and presented at the RPI Web Science Summer Research Week (http://tw.rpi.edu/wiki/SummerProgram2009). Duration: 30 minutes; level: beginner.

Jesse Wang and others gave an excellent full SMW tutorial at ASWC; if you are interested, see his SlideShare page: http://www.slideshare.net/jiaxinwang

Photo from the talk: http://tw.rpi.edu/wiki/Image:IMG_0881.jpg

P.S. Semantic MediaWiki 1.7.0 was released today.

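Writing Applications with a Semantic Wiki

2011/12/27 1 comment

Abstract: Semantic Web applications suit situations where the data keeps changing. Another characteristic is that they can break down the boundaries between applications and between services. The examples of applications built on a semantic wiki are of little value in themselves; it is the approach that I think may prove useful later.

Note: for the basics of semantic wikis, see “A Short Semantic MediaWiki Tutorial” (2009-07-21).

Today I was talking with someone about some characteristics of Semantic Web applications, and I gave a few examples of building applications with a semantic wiki. Most of what I quote below comes from an earlier paper of mine: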

Jie Bao, Li Ding, Rui Huang, Paul Smart, Dave Braines, Gareth Jones. A Semantic Wiki based Light-Weight Web Application Model, In Proceedings of the 4th Asian Semantic Web Conference, pp. 168-183, 2009

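The paper gives several application examples, such as a map application and an ontology editor.

First, I should point out that the paper does not claim that semantic wikis are already good development tools, or that all Semantic Web applications should follow this model. The concrete development tools, such as Semantic MediaWiki (SMW), are still at a very early stage: there are no IDEs or libraries (for example, reusable templates) yet, and it may take another ten years for them to mature. Compare JavaScript in 1995, which became an indispensable tool only after it later developed into AJAX. What I want to talk about here is a paradigm, that is, one way in which I think good semantic applications might be developed.

I have said before that Semantic Web applications suit cases where the data keeps changing. It is hard to define a fixed data schema once and for all. Instead, your application should be able to keep up with the times. If users' needs change, your application should be able to follow very quickly, even without you doing anything to the application or users doing anything special: the change shows up in the data that users themselves produce, and your application picks it up.

This ability to keep up requires application development to take a new path as well. Development on a semantic wiki, for example, moves much of the data modeling, business logic, and interface construction into territory that the “user” can control (see the figure below). Concretely, a large set of templates that can be edited right in the browser turns the application into something that can be updated anytime and anywhere. The traditional boundary between server and client blurs, and so does the boundary between data, metadata, and business logic (which used to be hard-coded). The benefit is like that of writing a blog in the browser compared with the traditional way of uploading HTML pages over FTP: it is not that some previously impossible function becomes possible, but that the ability to evolve goes up and the cost of evolving goes down.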

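Another characteristic of semantic applications, I think, is that they can break down the boundaries between applications and between services. In our experimental development at RPI, we built blogs, task lists, calendars, mailing lists, a bibliography management system, a personal homepage system, and many other information management tools on a wiki, and the underlying data is nothing more than wiki pages. This kind of modeling is no longer based on the “file” as the unit of organization; instead, it uniformly treats all data organization as relations, and an application is just one projection of a large network of relations. So the calendar, the mail, and so on are just user interfaces that some templates construct over one unified structured knowledge base (see the figure below). It is hard to say which triple belongs to which application. A change in one application (say, the calendar) can then automatically trigger a change in another (say, the personal homepage). What is semantics? Relations are semantics, and deriving new relations from existing ones is stronger semantics. Freeing the structure of data completely from the application's interface, and moving intelligence out of the code and into the data itself, is a very powerful change. Of course, the toy applications we built on the wiki are of little value in themselves; it is the approach that I think may prove useful later.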
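As a rough illustration of "applications as projections of one relation network", here is a minimal Python sketch. It is not SMW code: the property names ("has date", "has attendee"), the sample event, and both view functions are hypothetical stand-ins for wiki templates.

```python
# One shared set of triples stands in for the wiki's structured knowledge base.
triples = set()

def calendar_view(triples):
    """'Calendar' application: (date, event) pairs, sorted by date."""
    dates = {s: o for (s, p, o) in triples if p == "has date"}
    return sorted((d, e) for e, d in dates.items())

def homepage_view(triples, person):
    """'Personal homepage' application: events this person attends."""
    return sorted(s for (s, p, o) in triples
                  if p == "has attendee" and o == person)

# A single edit to the shared data...
triples.add(("Group meeting", "has date", "2011-12-27"))
triples.add(("Group meeting", "has attendee", "Jie"))

# ...is visible in both "applications" without touching either one.
calendar = calendar_view(triples)
homepage = homepage_view(triples, "Jie")
```

Neither view owns the triples it reads, which is the sense in which a triple does not belong to any one application.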

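More than two years have passed since then, and I have some new thoughts. Semantic Web application development should give rise to new programming models and new programming languages, just as the Web itself gave rise to many new languages. What we see now is only embryonic and hard to use, but it has a sound core and should keep being developed.

The complete slides for the 2009 paper are here:

Categories: Semantic Web, Slides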

The Unbearable Lightness in Wikis

2011/12/22 2 comments

The Unbearable Lightness of Wiking

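This is a slide deck I presented in May 2010 at the Spring SMW Conference 2010 (MIT). It summarizes the results of an experiment on semantic wiki usability in the KAHT project (supported by DARPA). We found that ordinary users' semantic modeling ability rarely produces meaningful semantic data. This is a problem not only of the system itself (Semantic MediaWiki) but, more deeply, of human cognitive ability. Many things that knowledge representation scholars take for granted draw completely different and wildly varied ideas from “ordinary people”.

I should add: looking at these slides now, the conclusion in them (i.e., that SMW needs to be extended) is not necessarily right. What deserves more thought is the metadata ecosystem life cycle and user psychology.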

Towards Webtop [2008]

2011/11/24 2 comments

http://tw.rpi.edu/wiki/Blog:Baojie/Item-50
http://tw.rpi.edu/weblog/2008/07/25/towards-webtop/

2008-07-25

Some of our Tetherless World researchers, including me, have just written a short paper to sell the idea of constructing a “webtop” using semantic technologies. In short, a webtop is a desktop on the web that does similar jobs, such as managing files, word processing, managing contacts, scheduling tasks, and emailing. Please see some examples of webtops with pretty GUIs.

Almost a decade ago, the concept of the “network computer” was hot for a while. At that time, a network computer meant a low-end computer with limited storage and computational capacity that relied on the network for its power. The webtop idea reminds me of the network computer: while they differ in many respects, they share the idea of empowering users with networked infrastructure. Ten years ago, this vision was tested with physical computers and largely failed; today, with the advance of technology, it is revived by allowing users to create virtual computers that exist only on the web. I have many reasons to believe that this time it will not only survive but also prevail.

[P.S. 2011-11-24 It’s dubbed “Cloud” this time. That is, the bogus “cloud” hype. The cloud is really not the key. The key is knowledge management: extracting knowledge from user behavior and user-generated data (note: not mining but extraction, which is relatively easy).]

One reason comes from my personal experience. About two years ago, I stopped installing much of the software that had been with me for many years: Encarta is replaced by Wikipedia.com, Outlook by Gmail, MS Street by Google Maps, MS Word by writing in a wiki, and PowerPoint by online LaTeX writing with the Beamer package, among a long list of other things. The browser is the application I stay in for more than 80% of the time I am on my computers. I have a strong need to organize all these online applications and data; simple bookmarking is barely a solution. I need something that can organize them, give me quick access to them, and, last but not least, look pretty and neat. A webtop does exactly those things.

How do semantic technologies help in providing a webtop? Actually, long before the term “ontology” became popular, users were already creating ontologies on a daily basis: classifying email, creating file folder trees, grouping contacts, or naming a photo “Wedding picture at Troy”. All those efforts create relations between things or attach a “meaning” to an entity. With semantic technologies, those relations and annotations can be made explicit so that data can be more easily managed and queried. For example, I may ask to “find all 2005 photos of my friends”, or “show all meetings (even if they are not called meetings, such as “briefings”) in the past month”. A webtop based on semantic technologies will make such an ability universal to any application built on top of it.
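The “show all meetings, even if they are called briefings” query can be sketched in a few lines of Python once the relation between the two terms is made explicit. Everything here is an assumption for illustration: the `subclass_of` table, the class names, and the sample calendar items are hypothetical, and a real system would read them from annotations rather than hard-code them.

```python
# Hypothetical explicit "is a kind of" relations between item types.
subclass_of = {"Briefing": "Meeting", "Standup": "Meeting"}

# Hypothetical calendar items as (title, declared type).
items = [
    ("Budget briefing", "Briefing"),
    ("Lunch", "Meal"),
    ("Weekly sync", "Meeting"),
]

def is_a(cls, target):
    """Walk up the (acyclic) subclass chain from cls looking for target."""
    while cls is not None:
        if cls == target:
            return True
        cls = subclass_of.get(cls)
    return False

# Items that count as meetings, whatever they happen to be called.
meetings = [title for title, cls in items if is_a(cls, "Meeting")]
```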

[P.S. 2011-11-24 Yes, this is semantic search over a personal “knowledge” base. It may appear on the market in the not-too-distant future.]

There have been controversies about the Semantic Web ever since the term was coined. I think this is partly because the Semantic Web community as a whole has failed to provide enough end-user-friendly tools that do something helpful in daily life. I wish to see more tools that help with daily web activities: semantic email, semantic blogs, semantic calendars, semantic abstracts of news (a little more than RSS), tagging files (pictures, MP3s, …) with a taxonomy, etc. Even more important, to survive, such an application should never ask users to learn RDF or anything that takes more than 3 minutes to understand. Bring such applications together, and you have a webtop. I believe something like this is one of the killer apps the community has long been waiting for.

[P.S. 2011-11-24 Looking back at this blog post from three years ago, I feel regret: why did I waste three years without implementing these ideas? It is not that I did not want to; my “execution” simply fell short: the authority and ability to control my own time, the work of securing a stable base, turning ideas into technically feasible setups, the ability to pitch with PPT, personal connections, and so on. These are what I need to focus on learning in the coming year.]

{{BlogInfo
|page=Blog:Baojie
|title=Towards Webtop
|visitor=User:Baojie
|date=2008/07/25 00:00 EDT
|source=http://tw.rpi.edu/weblog/2008/07/25/towards-webtop/
|tag=Jie’s_SW_Blog, Webtop
}}

Reference:

Jie Bao, Li Ding, Deborah L. McGuinness, James A. Hendler. Towards Social Webtops Using Semantic Wiki, In International Semantic Web Conference (ISWC), Poster Track, 2008.

Semantic Web and Recommendation (3): Recommender System Basics

2011/11/21 1 comment

I looked up some introductory slides to read. Whether it is semantic or not actually matters little.

Recommender Systems http://www.slideshare.net/T212/recommender-systems-1311490 [very basic]

Recommender Engines http://www.slideshare.net/antiraum/recommender-engines [likewise; a survey of general methods]

Tutorial: Recommender Systems http://www.recommenderbook.net/media/Tutorial_IJCAI_2011.pdf [tutorial at IJCAI 2011, by Dietmar Jannach & Gerhard Friedrich]

Wang Shoukun – Douban's Practice and Thinking in Recommendation http://www.slideshare.net/clickstone/ss-2756065 [quite good; some lessons from experience]

How to build a recommender system http://www.slideshare.net/blueace/how-to-build-a-recommender-system-presentation [Wakoopa; on the choice of data, interesting]

Music Recommendation Tutorial http://www.slideshare.net/ocelma/music-recommendation-tutorial [about music, but the techniques are general]

Music Recommendation and Discovery in the Long Tail http://www.slideshare.net/ocelma/celma-ph-d-defense-1067735 [Oscar Celma's PhD defense, 2009]

Social Recommender Systems Tutorial – WWW 2011 http://www.slideshare.net/idoguy/social-recommender-systems-tutorial-www-2011-7446137

Google Tech Talk on Social Recommendation http://www.slideshare.net/dancarroll56/google-tech-talk-on-social-recommendation

Categories: Semantic Web, Slides

Enhancing the Expressivity of Semantic Wiki Query Answering

2011/05/12 1 comment

Expressive Query Answering For  Semantic Wikis

May 11th, 2011 at Cambridge Semantic Web Meetup

#1 Welcome

#2 Semantic Wiki as a Data Store

Semantic wikis have become increasingly popular in the past few years. Their popularity may be attributed to many features of “wikiness”: they are collaborative, simple, easy to learn, tolerant of informality, and able to evolve. A semantic wiki allows you to start from unstructured, raw data and gradually add structure or even semantics to the data, by yourself or with others. This approach often works better than many other knowledge management approaches for non-expert users.

The part I love most about semantic wikis is that I can use them as a Web-based light-weight database. A wiki acts as an abstraction over the real data, regardless of whether it is in a relational database, in a triple store, or somewhere else online. It also offers an easily accessible interface through which I can do almost all data management tasks from a browser: modeling, querying, and some inferencing. On top of the wiki abstraction of data, we may build other interesting applications, such as maps, blogs, to-do lists, bibliography repositories, and many other things.

#3 Semantic Media Wiki (SMW)

Semantic MediaWiki is arguably the most popular semantic wiki system currently available. There are a couple of reasons for the success of semantic wikis in general, and of SMW in particular.

One prominent property shared by almost all semantic wikis is their simplicity and low cost. Traditionally, to build a semantic application, one needs tools for building ontologies, for annotating data with the ontologies, for querying data, and for reasoning with the data and the ontologies, plus languages to build the user interface. This involves learning a whole set of languages and tools, such as OWL, Protégé, SPARQL, Jena, Pellet, and Java.

For many developers or users, the adoption cost of semantic web technologies is too high and the reward is relatively low. For example, if a gym manager wants to build a website with a little bit of semantics, does it make sense for him to learn the above set of languages, or to hire a semantic web programmer?

Semantic wikis fill the gap with a low-cost solution for light-weight semantic applications. SMW, for example, provides an integrated environment for ontology building, for data annotation, for reasoning and querying, and for UI building. As it is built on top of MediaWiki, there are many extensions, from visualization to I/O, that we can use to build applications.

SMW provides a simple modeling language and a query language, which are considerably simpler than RDF and SPARQL, respectively. It is in fact quite a powerful tool: it can be seen as a light-weight triple store on top of which we can build applications.

#4 However, we often need more expressivity

However, despite its power, we often feel that the expressivity of SMW is too limited. For example, there are no inverse properties in SMW: I cannot say that “has author” is the inverse of “author of”. Developers often need to use complicated templates and other tricks to work around this limitation.
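To make the missing feature concrete, here is a minimal Python sketch of what declaring an inverse property would infer. It works on plain tuples, not on SMW itself; the function `add_inverse` and the sample page names ("Paper1", "Alice") are hypothetical.

```python
def add_inverse(triples, prop, inverse_prop):
    """Return the triples plus (o, inverse_prop, s) for every (s, prop, o)."""
    inferred = {(o, inverse_prop, s) for (s, p, o) in triples if p == prop}
    return set(triples) | inferred

# Only one direction is asserted on the wiki page...
facts = {("Paper1", "has author", "Alice")}

# ...and the other direction follows from the inverse declaration.
facts_with_inverse = add_inverse(facts, "has author", "author of")
```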

Another frequently needed feature is transitive properties. For example, I may want to say that Nashua is a part of New Hampshire, and New Hampshire is a part of the United States; therefore, Nashua is a part of the United States.
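The Nashua example amounts to computing a transitive closure. The sketch below does this by naive fixpoint iteration over plain Python tuples; it illustrates the inference only and is not how SMW or the extension described later implements it. The function name is hypothetical.

```python
def transitive_closure(triples, prop):
    """Return the triples plus every fact implied by treating prop as transitive."""
    closed = set(triples)
    changed = True
    while changed:  # repeat until no new fact can be derived
        changed = False
        pairs = [(s, o) for (s, p, o) in closed if p == prop]
        for (a, b) in pairs:
            for (c, d) in pairs:
                if b == c and (a, prop, d) not in closed:
                    closed.add((a, prop, d))
                    changed = True
    return closed

facts = {
    ("Nashua", "part of", "New Hampshire"),
    ("New Hampshire", "part of", "United States"),
}
closed_facts = transitive_closure(facts, "part of")
```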

Similarly, we often need additional expressivity in the query language of SMW. One example is negation, such as to find cities that are not capitals. Another example is counting, for example, to find professors who advise more than 5 students.
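Both query types are easy to state over a set of triples, which is what makes their absence noticeable. The following Python sketch shows the intended answers; the data, the property names ("type", "capital of", "advises"), and the function names are hypothetical, and negation here means "not stated in the data".

```python
from collections import Counter

facts = {
    ("Berlin", "type", "City"),
    ("Hamburg", "type", "City"),
    ("Berlin", "capital of", "Germany"),
}
# Hypothetical advising data: ProfA has six students, ProfB has two.
facts |= {("ProfA", "advises", f"student{i}") for i in range(6)}
facts |= {("ProfB", "advises", f"student{i}") for i in range(2)}

def cities_not_capitals(facts):
    """Negation: cities for which no 'capital of' fact is stated."""
    cities = {s for (s, p, o) in facts if p == "type" and o == "City"}
    capitals = {s for (s, p, o) in facts if p == "capital of"}
    return cities - capitals

def advisors_with_more_than(facts, n):
    """Counting: subjects with more than n distinct 'advises' facts."""
    counts = Counter(s for (s, p, o) in facts if p == "advises")
    return {s for s, c in counts.items() if c > n}
```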

#5 Desired Expressivity 

To pick the right level of expressivity for semantic wiki modeling, we need to balance expressiveness and simplicity. For example, why not pick OWL 2 QL, as SMW data is stored in a relational database anyway? Or why not OWL 2 RL, which can be implemented with rule-based reasoning?

To find the right mix of supported features, I believe that what matters the most is not whether the set is maximally expressive, or whether it is tractable in terms of worst-case time complexity. The right criteria might be:

1) If users need it
2) If the adoption cost is low

Keeping this in mind, I selected OWL Prime as the subset of OWL supported in the extended SMW modeling language.

For the query language, I extended SMW-QL with negation as failure and cardinality queries.

#6 Formalization

The next question is what semantics to use. OWL adopts the open world assumption (OWA), that is, if something cannot be proven true, it is not necessarily false. Databases and many rule systems, on the other hand, adopt the closed world assumption (CWA).

A semantic wiki is in fact closer to a database than to a knowledge base with OWA. When we query a wiki, we are, most of the time, only interested in the knowledge mentioned in the wiki. If something is not said in the wiki, we assume that it is false. If we list two authors for a paper, then by default the paper has just the two authors and no others. For another example, if Berlin is not said to be a person, then Berlin is not a person.

The right semantics for SMW is therefore not that of OWL, but a closed world semantics. For this research, I used datalog, which has a declarative, closed world semantics, well-understood complexity, and mature tool support.

In the interest of time, I will not cover the full details of modeling SMW in datalog, only the new features. You may find more details in the backup slides.
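The difference between the two assumptions can be shown with a toy lookup in Python. This is only a sketch of the intuition, not of OWL or datalog semantics in full; the facts and function names are hypothetical, and `None` stands for "unknown".

```python
facts = {
    ("Paper1", "has author", "Alice"),
    ("Paper1", "has author", "Bob"),
    ("Berlin", "type", "City"),
}

def holds_cwa(facts, triple):
    """Closed world: anything not stated is taken to be false."""
    return triple in facts

def holds_owa(facts, triple):
    """Open world: stated facts are true, everything else is unknown (None)."""
    return True if triple in facts else None

def authors_cwa(facts, paper):
    """Under CWA the listed authors are the complete author list."""
    return {o for (s, p, o) in facts if s == paper and p == "has author"}

berlin_is_person = ("Berlin", "type", "Person")
```

Under the closed world reading, `holds_cwa(facts, berlin_is_person)` is a definite "no", while the open world reading leaves the question undecided.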

#7 SMW-ML+

This slide shows the translation of the extended SMW-ML into datalog. The meanings of the constructs are similar to those of the corresponding constructs in RDF or OWL, so I will not explain them in detail.

One thing worth noting is that the SameAs relation here is weaker than owl:sameAs, so that in counting, even if SameAs(x,y) is true, x and y are still counted as two individuals.

#8 Translation Rules for SMW-QL

This slide shows the translation of an SMW “ask” query into logic program rules. The query asks for cities that are the capital of something. The query is turned into a rule on the right. The head of the rule is a special predicate “result”, which is used to collect all matched results in query answering. Each selection condition is translated into a body item in the rule.

This is a very simple example. For other constructs, such as conjunction, disjunction, subquery, and property chain, etc., see the backup slides.
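The slide itself is not reproduced here, so the following Python sketch only approximates the idea for a tiny conjunctive subset: category conditions and "has any value" property conditions. The SMW condition syntax (`[[Category:City]]`, `[[capital of::+]]`) is standard; the output rule format, the lower-cased predicate names, and the functions `ask_to_rule` and `pred` are assumptions, not the author's actual translator.

```python
import re

def pred(name):
    """Turn a wiki name into a lower-case, underscore-separated predicate."""
    return re.sub(r"\W+", "_", name.strip()).lower()

def ask_to_rule(query):
    """Translate a conjunctive SMW-style ask query into a datalog-like rule."""
    body = []
    fresh = 0  # counter for fresh variables Y1, Y2, ...
    for cond in re.findall(r"\[\[(.+?)\]\]", query):
        if cond.startswith("Category:"):
            # [[Category:C]] becomes the unary atom c(X)
            body.append(f"{pred(cond[len('Category:'):])}(X)")
        elif cond.endswith("::+"):
            # [[p::+]] becomes the binary atom p(X, Yn) with a fresh variable
            fresh += 1
            body.append(f"{pred(cond[:-3])}(X, Y{fresh})")
        else:
            raise ValueError(f"unsupported condition: {cond}")
    return "result(X) :- " + ", ".join(body) + "."

rule = ask_to_rule("[[Category:City]] [[capital of::+]]")
```

Each selection condition contributes exactly one body atom, and the head is always the special `result` predicate.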

#9 SMW-QL+ : Negations

This slide shows the translation of the extended query language with negation into datalog.

For the second case, why not “C(X), not P(X,Y)” ?

If we have C(a), P(a,b), then the above query will return a, because C(a) and “not P(a,a)” are both true, even though a does have a P value. Thus, “C(X), not P(X,Y)” is not a right translation.
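The pitfall can be reproduced over the two-constant domain {a, b} in Python. The "projected" variant below is one common way to express the intended meaning (first collect every X that has some P value, then negate that); it is an assumption about the fix, not a copy of the rule on the slide, and the names `naive`, `has_p`, and `projected` are hypothetical.

```python
C = {"a"}
P = {("a", "b")}
domain = {"a", "b"}

# Naive reading of "C(X), not P(X,Y)": X qualifies if SOME Y makes P(X,Y) false.
naive = {x for x in C for y in domain if (x, y) not in P}

# Intended reading: X qualifies only if NO Y makes P(X,Y) true.
has_p = {x for (x, _y) in P}          # auxiliary projection of P onto X
projected = {x for x in C if x not in has_p}
```

With these facts the naive reading still returns a (via Y = a), while the projected reading correctly returns nothing.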

#10 SMW-QL+: (Non)qualified Cardinality

Qualified cardinality queries and nonqualified cardinality queries are translated into similar rules using the count function.

“Thing(x)” is added for safeness of the rule, that is, the rule will always return a result. We have a set of rules to ensure that everything is an instance of “thing”.
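One way to see why a guard such as Thing(X) is useful is an "at most n" query: individuals with no value at all for the property never appear in any property fact, so something else has to bind X. The Python sketch below illustrates this reading; it is an interpretation under that assumption, and the function name and sample data are hypothetical.

```python
def at_most(things, prop_pairs, n):
    """Individuals in `things` that have at most n values for the property."""
    # Starting from `things` plays the role of Thing(X): every individual
    # gets a count, including those with zero values for the property.
    counts = {x: 0 for x in things}
    for (s, _o) in set(prop_pairs):
        if s in counts:
            counts[s] += 1
    return {x for x, c in counts.items() if c <= n}

things = {"ProfA", "ProfB", "s1", "s2"}
advises = {("ProfA", "s1"), ("ProfA", "s2")}

few_advisees = at_most(things, advises, 1)
```

Without the guard, ProfB (who advises nobody) would be missed, because ProfB never occurs as the subject of an "advises" pair.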

#11 Implementation

A quick note on the implementation. The backend reasoner I used is DLV, which won the first ASP competition. In theory, other logic program solvers may be used as well. I have tried clasp, which was the winner of the second ASP competition. The performance of DLV and clasp is similar. I haven’t tried other solvers yet, such as smodels or cmodels, but it should not be too difficult to use them.

The implementation has a file-based mode and a database-based mode. In the database-based mode, real-time changes of instance data will be captured, but it is in general a little slower than the file-based mode.

As a side-benefit of this implementation, you are now able to decouple the content storage of the wiki and the semantic data storage of the wiki. As long as you provide an ODBC interface, your semantic data can be stored anywhere, not necessarily locally. This also enables remote querying of another wiki, or federated query of multiple wikis.

#12 Example 

This slide shows a screenshot. On the left are the modeling and query scripts of two pages, using an inverse property and a transitive property. The query result is shown on the right.

#13 Scalability: Data Complexity

The next two slides show the scalability results. For data complexity, we measure query time as a function of the dataset size, for a fixed query. It is almost linear. This is largely because building a result set, or in DLV’s terminology, an answer set, takes time linear in the number of facts when the number of non-fact rules is small. In this experiment, we have about 100k triples of facts, but fewer than 100 rules.

#14 Scalability: Query Complexity

In the second graph, we can see that the query complexity is almost constant. Query complexity measures, for a fixed dataset, how fast query time increases as a function of query size. I have tried several query patterns, and all of them show constant time behavior. This is not true of SMW itself, as it translates queries into SQL.

An explanation for the constant time complexity is that extended queries are translated into non-ground rules, which are small compared with the size of the ground facts. For this reason, DLV is sensitive to factbase size in a linear way (probably because of grounding), but is insensitive to the rule set size as long as the factbase is much larger.

As most semantic wikis as of today have fewer than 10k pages and 100k triples, the implementation is probably fast enough for typical wiki users.

#15 The SemanticQueryRDFS++ extension

We have released our work as an extension of Semantic MediaWiki, called SemanticQueryRDFS++. You may try it out.

We picked this name because the OWL Prime subset of OWL has been called RDFS 3.0 or RDFS++ by others, and we believe “RDFS++” may give the best intuition of what is supported by our extension.

#16 Some other work on SMW by us

[a list]

#17 Summary

In summary, we have shown that formalizing SMW using datalog allows us to extend SMW to an expressive subset of OWL, to implement an SMW query engine that is scalable for typical uses, and, though not covered in this talk because it may only interest logicians, to analyze the reasoning complexity of SMW and our extensions.

There are a couple of things we want to do in the future. We want to support incremental reasoning so that we don’t have to compute the answer set from scratch every time. We may support customized reasoning rules; if some users need more advanced reasoning, they should be able to have it. Finally, for exchanging data with other semantic web applications, it would be nice to have a translation between SPARQL and the query language of SMW.

[end]

Categories: Semantic Web, Slides