首页 > 语义网, 幻灯片 > 增强语义维基查询应答的表达力

增强语义维基查询应答的表达力

Expressive Query Answering For  Semantic Wikis

May 11th, 2001 at Cambridge Semantic Web Meetup

#1 Welcome

#2 Semantic Wiki as a Data Store

Semantic wikis have been increasingly popular in the past a few years. Their popularity may be attributed to many features of “wikiness”, such as being collaborative, simple, easy to learn, informality-tolerate, and evolving-capable. A semantic wiki allows you to start from unstructured, raw data, and gradually adding structures or even semantics to the data by yourself or by others. This approach often works better than many other knowledge management approaches for non-expert users.

The part I love most of semantic wikis is that I can use them as a Web-based light-weight database. A wiki acts as an abstraction over the real data, regardless whether it is in a relational database, in a triple store, or online somewhere else. It also offers an easily-accessible interface that I can do almost all data management tasks from a browser: modeling, querying, and some inferencing. On the top of the wiki abstraction of data, we may build other interesting applications, such as maps, blogs, to-do lists, bibliography repository, and many other things.

#3 Semantic Media Wiki (SMW)

Semantic MediaWiki can be said the most popular semantic wiki system currently available. There are a couple of reasons for the success of semantic wikis in general, and of SMW in particular.

One prominent property shared by almost all semantic wikis  is their simplicity and low-costness. Traditionally, to build a semantic application, one need tools for building ontologies, for annotating data with the ontologies, for querying data, for reasoning with the data and the ontologies, and languages to build the user interface. This involves learning a whole set of languages and tools, such as OWL, Protégé, SPARQL, Jena, Pellet and Java, etc.

For many developers or users, the adoption cost of semantic web technologies is too high and the reward is relatively low. For example, a gym manager wants to build a website with a little bit semantics, will it make sense for him to learn the above set of languages? or to hire a semantic web programmer?

Semantic wikis fill the gap with a low-cost solution for light-weight semantic applications. SMW, for example, provides an integrated environment for ontology building, for data annotation, for reasoning and querying, and for UI building. As it is built on the top of Mediawiki, there are many extensions, from visualization to I/O, that we can use to build applications.

SMW provides a simple modeling language and a query language, which are considerably simpler than RDF and SPARQL, respectively. It is in fact a quite powerful tool and can be seen as a light-weight triple store, and we can build applications on its top.

#4 However, we often need more expressivity

However, despite its power, we often feel that the expressivity of SMW is too limited. For example, there are not inverse properties in SMW: I can not say that “has author” is the inverse of “author of”. Developers often need to use complicated templates and other tricks to work around this limitation.

Another frequently needed feature is transitive property. For example, I may want to say that Nashua is a part of New Hampshire, and New Hampshire is a part of United States; therefore, Nashua is a part of United States.

Similarly, we often need additional expressivity in the query language of SMW. One example is negation, such as to find cities that are not capitals. Another example is counting, for example, to find professors who advise more than 5 students.

#5 Desired Expressivity 

To pick up a right set of expressivity for semantic wiki modeling, we need to balance between expressiveness and simplicity. For example, why not pick OWL 2 QL as SMW data is stored in a relational database anyway? Or why not OWL 2 RL which can be implemented with rule-based reasoning?

To find the right mix of supported features, I believe that what matters the most is not whether the set is maximally expressive, or whether it is tractable for the worst case time complexity. The right criteria might be

1)If users need it
2)If the adoption cost is low

Keeping this in mind, I selected OWL Prime as the subset of OWL supported in the extended SMW modeling language.

For the query language, I extended SMW-QL with negation as failure and cardinality queries.

#6 Formalization

The next question is what semantics to use. OWL adopts the open world assumption (OWA), that is, if something can not be proven true, it is not necessarily false. Databases and many rule systems, on the other hand, adopt the closed world assumption (CWA).

Semantic wiki, is in fact more close to a database than to a knowledge base with OWA. When we query against a wiki, we are, for most of time, only interested in the knowledge mentioned in the wiki. If something is not said in the wiki, we assume that it is false. If we list two authors for a paper, then by default the paper has just the two authors and no others. For another example, if Berlin is not said to be a person, then Berlin is not a person.

A right semantics for SMW, is therefore not that of OWL, but a closed world semantics. For this research, I used datalog, which has a descriptive, closed world semantics, and with well-understood complexity and mature tool support.

For the sake of time, I will not cover the full details of modeling SMW in datalog, but only on the new features. You may refer more details in the backup slides.

#7 SMW-ML+

This slide shows the translation of extended SMW-ML into datalog. Their meanings are similar to the corresponding constructs in RDF or OWL, thus I may not have to explain them in details.

One thing worth noting is that the SameAs relation here is weaker than owl:sameAs, so that in counting, even if SameAS(x,y) is true, x and y are still counted as two individuals.

#8 Translation Rules for SMW-QL

This slide shows the translation of a SMW “ask” query into logic program rules. The query asks for cities that are capital of something. The query is turned into a rule on the right. The head of the rule is a special predicate “result”, which is used to collect all matched results in query answering. Each selection condition is translated into a body item in the rule.

This is a very simple example. For other constructs, such as conjunction, disjunction, subquery, and property chain etc, see the backup slides

 #9 SMW-QL+ : Negations

This slide shows the translation of the extended query language with negation into datalog.

For the second case, why not “C(X), not P(X,Y)” ?

If we have C(a), P(a,b), then the above query will return {a,b}, because C(a) and “not P(a,a)” are both true. Thus, “C(X), not P(X,Y)”  is not a right translation.

#10 SMW-QL+: (Non)qualified Cardinality

Qualified cardinality queries and nonqualified cardinality queries are translated into similar rules using the count function.

“Thing(x)” is added for safeness of the rule, that is, the rule will always return a result. We have a set of rules to ensure that everything is an instance of “thing”.

#11 Implementation

A quick note on the implementation. The backend reasoner I used is DLV, which has won the first ASP competition. In theory, other logic program solvers may be used as well. I have tried clasp, which was the winner of the second ASP competition. The performance of DLV and clasp are similar. I didn’t tried other solvers yet, such as smodels or cmodels. But it should not be too difficult to use them.

The implementation has a file-based mode and a database-based mode. In the database-based mode, real-time changes of instance data will be captured, but it is in general a little slower than the file-based mode.

As a side-benefit of this implementation, you are now able to decouple the content storage of the wiki and the semantic data storage of the wiki. As long as you provide an ODBC interface, your semantic data can be stored anywhere, not necessarily locally. This also enables remote querying of another wiki, or federated query of multiple wikis.

#12 Example 

This page shows a screen shots. On the left we show modeling and query scripts of two pages, using inverse property and transitive property. The query result is shown on the right.

#13 Scalability: Data Complexity

The next two slides show the scalability results. For data complexity, we measure query time as a function of the dataset size, for a fixed query. It is almost linear. This is largely because building an result set, or in DLV’s terminology, an answer set, requires linear time to the number of facts when the number of non-fact rules are small. In this experiment, we have about 100k triples of facts, but only less than 100 rules.

#14 Scalability: Query Complexity

In the second graph, we can see that the query complexity is almost constant. Query complexity measures, for a fixed dataset, how fast query time increases as a function of query size. I have tried several query patterns, and all of them show constant time behavior. It is not true for SMW itself as it translates queries into SQL.

An explanation for the constant time complexity is that the extended query are translated into non-ground rules, which are small when compared with the size of ground facts. For this sake, DLV is sensitive to factbase size in a linear way (probably because of grounding), but is insensitive to the rule set size as long as the factbase size is much larger.

As most semantic wikis as of today have less than 10k pages and 100k triples, the implementation is probably fast enough for typical wiki users.

#15 The SemanticQueryRDFS++ extension

We have released our work as an extension of Semantic MediaWiki, called SemanticQueryRDFS++. You may try it out.

We pick up this name because the OWL Prime subset of OWL has been called but others as RDFS 3.0 or RDFS++, and we believe “RDFS++” may give the best intuition of what is supported by our extension.

#16 Some other work on SMW by us

[a list]

#17 Summary

Summary, we have shown that formalizing SMW using datalog allows us to extend SMW for an expressive subset of OWL,  to implement a SMW query engine that is scalable for typical uses, and, not mentioned in this talk because it only be interesting to logicians, to analyze the reasoning complexity of SMW and our extensions

There are a couple things we want to do in the future. We want to support incremental reasoning so that we don’t have to compute the answer set every time from the scratch. We may support customized reasoning rules; if some users need more advanced reasoning, they should be able to. Finally, for exchanging data with other semantic web application, it would be nice to a translation between SPARQL and the query language of SMW.

[end]

Advertisements
分类:语义网, 幻灯片
  1. 还没有评论。
  1. 2012/01/08 @ 10:50

发表评论

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / 更改 )

Twitter picture

You are commenting using your Twitter account. Log Out / 更改 )

Facebook photo

You are commenting using your Facebook account. Log Out / 更改 )

Google+ photo

You are commenting using your Google+ account. Log Out / 更改 )

Connecting to %s

%d 博主赞过: