首页 > 语义网, 工程创业 > schema.org浏览笔记

schema.org浏览笔记

schema.org是最近三大搜索引擎公司(Google, Yahoo, Microsoft)推出的元数据网站(参2006年三大联合推出了sitemaps.org的模式)。就目前的市场看,G是主,Y和M是陪客。这个东西在语义网界争议很大,在SemTech上听到的词都是step back, embarrassing, bizarre, terrible 等——因为它没有用W3C的标准。所以有人(Michael HausenblasRichard Cyganiak)建了个Schema.RDFS.org,把那些schema转化成RDF。

先说我的屁股:我的屁股不在W3C那边,虽然也不一定在Google这边。

带着这样的偏见,我看了看schema.org。

它的格式称为Microdata。该文档目前是HTML Working Group的Last Call Draft(也就是基本完稿了),算HTML5的一部分。不奇怪的是,编辑是Google的Ian Hickson

Microdata看起来象这样:

<div itemscope itemtype ="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <span>Director: <span itemprop="director">James Cameron</span> (born August 16, 1954)</span>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>

也可以内嵌,如

<div itemscope itemtype ="http://schema.org/Movie">
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
</div>

schema.org提供了一些标准的词汇(全部),如:

一些“高级”的元数据包括

  • 时间,例 <time itemprop=”startDate” datetime=”2011-05-08T19:30″>May 8, 7:30pm</time>
  • 说明取值范围 <link itemprop=”availability” href=”http://schema.org/InStock”/&gt;
  • 不显示的数据:<meta itemprop=”ratingValue” content=”4″ />

schema.org说

The data model used is very generic and derived from RDF Schema (which in turn was derived from CycL, which in turn …)

为什么提到CycL? 大概是因为Ramanathan Guha现在在Google,见他的Blog (Jun 2, 2011)

Google在FAQ里对它的技术路线做了辩护。它说Why not RDFa or microformats?

Focusing on microdata was a pragmatic decision. Supporting multiple syntaxes makes documentation for webmasters more complex and introduces more overhead in terms of defining new formats. Microformats are concise and easy to understand, but they don’t offer an open extensibility mechanism and the reuse of the class tag can cause conflicts with website CSS. RDFa is extensible and very expressive, but the substantial complexity of the language has contributed to slower adoption. Microdata is the most recent well-known standard, created along with HTML5. It strikes a balance between extensibility and simplicity, and is most suitable for building the schema.org. Google and Yahoo! have in the past supported both microformats and RDFa for certain schemas and will continue to support these syntaxes for those schemas. We will also be monitoring the web for RDFa and microformats adoption and if they pick up, we will look into supporting these syntaxes. Also read the section on the data model for more on RDFa. 【总之RDFa太复杂(更不用说RDF或者OWL),Microformat扩展性不好】

Why don’t you support other vocabularies such as FOAF, GoodRelations, etc?

In creating schema.org, one of our goals was to create a single place where webmasters could go to figure out how to mark up their content, with reasonable syntax and style consistency across types. This way, webmasters only need to learn one thing rather than having to understand different, often overlapping vocabularies. A lot of the vocabulary on schema.org was inspired by earlier work like Microformats, FOAF, GoodRelations, OpenCyc, etc. 【言下之意就是大家只学一个schema就够了,而我这个是最棒的】

Peter Mika (Yahoo之Semantic Search guy)说

one of the key problems of the Semantic Web design of the W3C has been that it considered only technical issues, and not the need for a social process that would lead to bootstrap the system with data and schemas… With regard to schemas — we used to call them ontologies, until we found it scared people away — the expectation was that they would be developed in a distributed manner and machines would do the hard job of schema matching or somehow agreements would emerge. However, schema matching is a hard problem to automate….Finding stable and mature schemas with sufficient adoption has eventually become a major pain point.

我基本同意。W3C是要草根民主和计划经济。在现实世界中,还是寡头垄断加市场经济效果更好。你觉得这两点矛盾吗?呵呵。

你和一个Webmaster说一天RDF,不如给他一个工具,给他的页面加了一些schema(5分钟),然后过了1分钟,这个页面就上Google了,这样他就有加metadata的动力。何必纠缠于RDF这个皮相呢?从这个角度,我对Schema.RDFS.org不以为然。

Semantic Web不可能毕其功于一役,有初级阶段(linked data)和初级阶段的初级阶段(schema)。Schema.org就是语义网有Google特色的初级阶段的初级阶段。

P.S. (2011-09-12) Schema.org本身的成败其实无所谓,关键是它在推动这个生态系统形成的过程中所起的示范和教育语义。让商家和个人觉得元数据有用,愿意来发布,目的就达到了。过几年,长江后浪推前浪,Schema.org死在沙滩上也就没什么了。

Advertisements

发表评论

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / 更改 )

Twitter picture

You are commenting using your Twitter account. Log Out / 更改 )

Facebook photo

You are commenting using your Facebook account. Log Out / 更改 )

Google+ photo

You are commenting using your Google+ account. Log Out / 更改 )

Connecting to %s

%d 博主赞过: