Summer semester begins for me

[2004-07-13] Mouth-watering



In Dog we trust


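Clearly, the issues that the editors of <枫华园> care about are all issues of the older generation of overseas students (the 1980s generation), such as memories of the Cultural Revolution, life in the countryside, and experiences of working odd jobs.

The future of <枫华园>, I'm afraid, is not very bright.

P.S. 2012-01-06 Before I went abroad, in 1999-200, 《枫华园》 was quite an enlightenment for me. Back then I marveled at how everyone with learning and insight seemed to be overseas. In those days I went online with a dial-up modem, and still spared no expense to download and read every issue of 《枫华园》. Every time the email subscription arrived (along with 《新语丝》), I read it in one sitting. After I went abroad myself, having read several hundred issues, I looked back a few years later and felt that the old 1980s overseas students behind 《枫华园》 could not keep up with the times and kept dwelling on the past, which was dull. In 2009, 《枫华园》 finally died (see 老郸: 《枫华园》自杀始末). Thinking back today, am I not repeating the experience of that earlier generation: living in the gap between cultures, and turning into a cultural living fossil the moment I stop paying attention.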


The future of my family…

P.S. 2012-01-06 This sweet dream came true, just without so many piglets. There is a picture as proof.


Busy days.

For Visa and INDUS

8:30-10:30 Install Tomcat for Hu


For many people, a blog is another form of exhibitionism




Easter: dinner with Andy, Tang & Ivy






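1- Quick to anger, both in real life and online. Getting honked at once while driving feels like a huge deal.
2- Wasting time online,
3- Procrastinating
4- Chatting up every girl I meet; whatever the topic, it always ends up on women
5- Trying hard to make my presence felt, posting grand essays to mailing lists for no reason.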



Ideas from today’s meeting

1- on intrusion data
2- on carson’s data
3- on reuters data






The Veishea riot, what a thrill






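Sony's QRIO is truly remarkable, simply breathtaking.
That is real research, not bragging

Reading the website of a certain cult, my only reaction is that the absurdities I thought could only be found in the Twenty-Four Histories are in fact happening right around us. Two thousand years on, why has there been no progress on this point?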


::                                    ::
::       4 years of single life       ::
:: 10 men 〓〓〓〓〓〓〓 1 saint + 9 animals ::
::                                    ::
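A: We're making games now :)

B: What?

A: Designing our own games. Have you ever done game programming?

B: No. What kind of game?

A: We've only just started learning, very basic stuff. OpenGL, Maya and so on, some old methods. I want to make games with fresh ideas, with entirely new modes of experience.

B: Like what?

A: My ideas always get shot down. In China, if all you have is a concept design, usually nobody wants to build it. Everyone just wants to make repetitive games on mature engines and platforms.

B: Of course.

A: Nobody is willing to try something new and unknown.

B: Who would do it with nothing to gain!

A: The prospects are great! It's just that a lot of technical people have no ideas of their own and simply follow behind foreign technology. So they don't want to develop their own things. A genius like John Carmack is almost unheard of in China.

B: We don't need geniuses. Geniuses aren't much use.

A: Then what do we need?

B: Workers, and politicians.

A: ……A lot of people say that.

B: Heh, because that has been proven to be the most effective way to organize society.

A: The head of an animation company told me the same thing.

B: You shouldn't be proving to others how novel you are.

A: Haha, no wonder so many geniuses starve to death.

B: You should be proving how much more money you can make.

B: Serves them right for starving.

A: ……

B: The people least deserving of sympathy in this world are those who starve to death. The disabled excepted.

A: But the geniuses who survive change one era after another.

B: The geniuses who survive are the ones who know how to compromise with the world.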

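A: Carmack is the creator of the 3D game world.

B: Honestly, I don't believe there are geniuses in the world.

A: I do. I've met some.

B: People's ideas are mostly random, like mutations in evolution.

A: The prickly kind, who don't fit in with society at all.

B: Anyone can come up with some novel ideas.

B: Heretics and geniuses are two different things.

A: I agree with that, but I admire the people who can really stick to their novel ideas and make them happen.

B: Besides, in modern society heresy is a fashion.

B: Yes, once it's realized, it's a different matter.

A: Geniuses are often heretics, but heretics aren't necessarily talented.

B: The point isn't whether one is a genius; the point is making it happen. To make it happen you have to learn to compromise.

A: There's also the radical attitude of withdrawing from the world. Of course compromise is more common. There are many such examples in the art world.

B: I've been a student for a long time; early on I rather worshipped geniuses too.

A: And now?

B: Over time I found that most scientific achievements weren't accomplished by geniuses.

A: Mm.

B: Geniuses are the few who make a difference at some critical point. Or, put differently, they are large mutations that happened to be selected correctly. On every problem there are lots of mutations, and in the end a lucky few are proven "right".

A: Mm, that's a bit like genetics. It's the same idea as "contingency" in notions of time and space.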

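B: Yes, basically. So the following two facts hold at the same time. One is that 90% of papers are garbage; the other is that our science and technology are advancing rapidly. Can we imagine eliminating that 90% of garbage and letting the remaining 10% of "geniuses" do the work? That way they would have more resources.

A: ……No.

B: Why not?

A: Because you don't know which 90% is garbage.

B: Right, for at least 50% of them you can't know in advance.

A: You don't know which part is really useful. That's the same principle as in genetics and genomics. Selection for advantage happens at the level of the whole.

B: So when faced with a rather odd idea, ordinary people naturally start out skeptical. In genetics, most mutations are neutral, neither here nor there. In scientific research, most papers are useless too, neither harmful nor helpful.

B: Yes, and markets are the same. Scientific research also follows the principles of a market economy. More broadly, evolutionary theory, market competition, and the process by which scientific theories arise all share an interesting model. The model I have in mind is evolutionary game theory. There are many similarities.

A: Makes sense. I've learned a bit about health care, and found that many physiological problems follow market-economy principles too. Bet you haven't heard that.

B: That's quite normal.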

I am tired of HTML and text in natural language only, tired of ftp, and tired of carefully maintaining all my online resources: Public BBS, Academic Notes, …. , and this homepage. Even I find it hard to locate useful topics in the homepage as it grows bigger and bigger.

It’s time for semantic integration and some intelligence.

I’m trying to build a new site based on the Semantic Web. Technically, it’s a wiki system supported by a controlled ontology, plus some code to convert the old homepage into wiki pages. It will probably be ready this month.

P.S. 2011-01-05 That month I did build a personal homepage based on JSPWiki, with a bit of ontology behind it for management. It later grew into a project, and I wrote a paper:

Jie Bao & Vasant Honavar (2004). Collaborative Ontology Building with Wiki@nt – A Multi-agent Based Ontology Building Environment. In Proceedings of the 3rd International Workshop on Evaluation of Ontology-Based Tools (EON), co-located with ISWC 2004 (p. 37-46).

After I left ISU (February 2008), the site went offline. The original URL was http://boole.cs.iastate.edu:9090/popeye

It can still be viewed at the Internet Archive: http://wayback.archive.org/web/*/http://boole.cs.iastate.edu:9090/popeye/Wiki.jsp?page=Main




2012/01/05



The US uses a different statistical standard for counting urban population. Only 9 cities are listed as having a population of one million or more:

  • Los Angeles,
  • San Diego,
  • Phoenix,
  • Dallas,
  • San Antonio,
  • Houston,
  • Chicago,
  • Philadelphia,
  • New York

[encarta map|http://encarta.msn.com/encnet/features/mapcenter/Map.aspx?name=United+States]

However, a big city usually covers more than one county, and the suburban area is not counted in the city’s population, whereas in China a large suburban agricultural population is also counted in.

In 2000, 49 cities in the USA had more than 1 million people.

This one says there were 47 such cities in 1998.

In 1999, China had 37 cities with over one million people; in 2001 the number was 41.


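“According to reports, the last twenty years have been the period of fastest urban development in China's history. Between 1990 and 2001, the number of prefecture-level cities nationwide increased from 188 to 269, and the number of megacities with a population over one million increased from 31 to 41.” [Citation|http://www.china.org.cn/chinese/OP-c/318455.htm]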

Ames details: [http://www.city-data.com/city/Ames-Iowa.html]

P.S. 2012-01-05. The number of Chinese megacities with a population over one million reached 49 in 2005 (Ministry of Construction data) and 65 in 2008.

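This Mel Gibson film is really, really dumb



The scene where the aliens show up and fight with them had me rolling with laughter. It is plainly a comedy! The more serious the actors, the funnier it gets!

In the end the male lead becomes a priest again and takes up once more the great mission of spreading the opium of the spirit, which is comedy of the highest order, just like 至尊宝 going to the West to fetch the scriptures.

Conversely, <大话西游> is also a science fiction film,

May the Buddha bless the whole world, and let God bless America all by himself.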


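【Film】Signs (灵异象限): Why Do the Aliens Hate America?


Unlike the Middle East with its constant wars, and unlike Europe, much of which was nearly leveled in the two world wars, the United States, after the end of its Civil War, went through two world wars as well as the Vietnam and Korean wars, yet fighting reached American soil only in the attack on Pearl Harbor. The 9/11 attacks therefore came as an enormous shock to America. The pain of losing loved ones had to be soothed; the shock, grief, or anger needed an outlet; and once emotions had settled, more people wanted an answer: why? What should be done? Many films of 2002 can be read against this background.

The publicity for Signs centered on crop circles. At first glance, one would expect a film in the style of The X-Files. But there is no southern English countryside here, no European witchcraft or ghost lore, and the hero's task is not to expose the worldwide secret organization of the Illuminati or the Knights Templar. Wrapped in the thin coat of crop circles is a thoroughly American pioneer spirit.

There are many films that, like Signs, take "a man defending his home" as their theme. As usual, Signs builds this image: a wooden house standing alone in the middle of a vast field. The man of the house, Graham Hess (who in other films would usually be shouldering a rifle taken from the closet), gazes into the distance, ready to hold off the outside threat alone. The police appear only as decoration; everything depends on the man's courage and willpower. And he asks for nothing except to protect his home. Is this not the humblest and yet the most sacred of wishes? This is a western made in 2002.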



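The protagonist Graham used to be a priest but lost his faith after his wife's car accident. The memory of the couple's last conversation after the accident is laid out in a contrived, heavy-handed way: the wife is pinned at the scene, beyond saving, yet conveniently still breathing, conveniently leaving the protagonist just enough time to finish his emotional lines. Such a wishful plot device is not about realism but about giving grief an outlet, as if in conversation with the dead and injured buried under the rubble of the Twin Towers. Graham also wants to ask: why? Why did fate arrange for the innocent to die? What is God's intention?

But the plot does not send the protagonist on an inner spiritual search; instead it gives the culprit a tangible form. The answer is not inner redemption but finding an enemy outside. The aliens appear, turning up in every corner of the world with lightning speed. Just as his wife's death struck directly at Graham's personal faith, the aliens' arrival changes humanity's understanding of the universe, shatters belief, and challenges every value. But this time the upheaval is made concrete in the image of "aliens". They travel a long way to this planet, yet apart from causing some malicious minor panic and establishing that they play the "bad guys", they accomplish nothing. This is another wishful device: we need an enemy. Once the answer to "why" appears, "what to do" becomes simple: drive them back.

At the climax that closes the film, the Hess family finally comes face to face with the aliens. The alien's first move is to take the little daughter hostage. The enemy of course has to behave like a "coward" in this way, which is precisely the wording Bush used to describe suicide attacks. This historic first meeting between humans and aliens has none of the sharp-witted dialogue between Jodie Foster and the alien intelligence in Contact, and no flashing high-tech weapons. Instead, the younger brother Merrill picks up a baseball bat, recovers the confidence of his days as a home run king, and swings to knock the alien down. Any viewer with some knowledge of sports knows that a batting swing is optimized for hitting an incoming baseball; used like kendo, it is just a reckless flailing full of openings. But that hardly seems to matter: the duel is symbolic rather than physical. It pits an unknown, evil alien civilization against an American youth swinging a baseball bat, against the childhood regret of never becoming a baseball player, against family values, against the resolve to defend home and country bare-handed, against the American dream and the American spirit.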


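The aliens leave. The protagonist's family returns to the house and regains its faith. Faced with the upheaval of 9/11 and the questions "Why? What should be done?", this is the story Signs wants to tell. But we cannot be satisfied with such a wishful story. The first half of the film raises many questions: where do they come from? What did they come to Earth for? Why make crop circles? Why are they afraid of water? Yet after the climax all of these can be left hanging. This is the worrying part. Faced with another civilization, the subconscious answer of the American public contains no dialogue or understanding. After all, "they" are aliens. We cannot see their true faces, but in any case we first board up the windows tightly. Later, having driven them off, each of us goes home, licks our wounds, and gets on with life. End of story. Who are they? It does not matter.

Gertie has changed…

In the film, the Hess family just keeps nailing boards over the windows. In the real world, this barrier can be pushed to a distant front line by the appeal to "protecting the homeland". On the eve of New Year 2003, Bush was asked in an interview whether launching a war was appropriate given the poor state of the economy. Bush replied that if Iraq used its weapons to hit us at home, the economy would be even worse. So we have to strike first.

At the turn of the year, the 1982 film E.T. was selected as one of the 100 greatest films of all time. Little Elliott meets an alien with healing powers who has come to Earth to collect plant specimens, and through this encounter mends family and friendship.

In 2002, a full 20 years later, Elliott has grown up. The aliens have come again….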

15 (and counting) Ways to Explore ISWC 2010 Data

This year at ISWC, when we worked on the metadata, we set up a Data Consuming task force to develop tools that can browse/visualize the data in many different ways, e.g., a faceted browser, a filter browser and a mobile browser.

As soon as we published the basic dataset, we got feedback from people about off-the-shelf tools that can work with the data. The list is growing quickly. I collected screenshots of some working instances (including tools the metadata committee has built) in a slide deck. I have no doubt that the number “15” will change when the main conference begins …. in 2.5 hours! So expect some updates very soon.

What strikes me is the number and diversity of data browsers currently available, many of which are clearly mature enough for non-expert users to explore. That was not the case even one year ago. So much has changed for the Semantic Web in 2010!

P.S. 2012-01-05 later I added more browsing tools, making the count 24.

ISWC Twitter Data Updates [2010]

2012/01/05



At ISWC 2010, there are several ongoing efforts to leverage Twitter data. The ones I’m aware of are:

Joshua Shinavier has helped to build a triple store (powered by AllegroGraph) that contains tweets related to the conference, along with basic ISWC metadata. Here is an example of querying the triple store with SPARQL (details about tweets tagged #iswc2010 and #iswc). More examples and a guide on how to use the triple store will be out soon.

URL: http://flux.franz.com/catalogs/demos/repositories/iswc2010#query

Marian Dörk helped us to visualize tweets at ISWC. You can see the relative traffic over time, the distribution of buzzwords at the conference, and who is tweeting about what. Marian is looking into interviewing our attendees about the tool. If you have comments, let him (mdoerk@ucalgary.ca) or me (baojie@cs.rpi.edu) know.

URL: http://ilab51.cpsc.ucalgary.ca/iswc2010

To be continued.

ISWC2010 Metadata is Online

Below is an announcement I just sent for the ISWC2010 Metadata.


Dear SWers and LODers

ISWC2010 is around the corner and we are very excited about the coming week!

As in previous years, ISWC 2010 provides its basic metadata in RDF. The dataset gives details about authors, organizers, papers, events (e.g., sessions and talks), and some mappings to other linked data. The data is freely available at http://data.semanticweb.org/conference/iswc/2010, and can be downloaded as a single RDF file. There is a SPARQL endpoint [1] for this dataset, as well as for some previous ISWC/ESWC/WWW conferences. For more details about access, please refer to [2].

You may view/use the data in many different ways. Any RDF-aware application should be able to access it, e.g., for browsing [3][4]. If you use an iPhone/iPad/iPod/Android device or Chrome/Safari, you can also look at a mobile browser at iswc.mobi [5] (provided by Alvaro Graves, RPI). Please also note that this year almost all pages on the ISWC 2010 website have some RDFa annotations that you can distill, e.g., with [6]. We are also working on other user interfaces and additional data, e.g., about workshops.

An initial list of tools, apps and visualizations for the ISWC 2010 metadata is on the W3C SW Wiki:


Feel free to expand the list if you know of other tools that can work with the dataset, or have developed mashups, visualizations or any other apps based on the dataset.

Please let me know if you notice missing information or errors in the dataset, or have any suggestion to improve the dataset.

The dataset is made possible by the work of the ISWC 2010 Metadata Committee and help from many members of the SW and LOD community. I would like to thank all of you who supported this work in one way or another.

I hope you will have fun playing with the data, as well as participating in the conference, either onsite or remotely!


[1] http://data.semanticweb.org/sparql
[2] http://data.semanticweb.org/documentation/user/faq#get_data
[3] http://linkeddata.uriburner.com/about/html/http://data.semanticweb.org/conference/iswc/2010/paper/498
[4] http://iwb.fluidops.com/resource/semantic:person/ian-horrocks
[5] http://iswc.mobi
[6] http://www.w3.org/2007/08/pyRdfa/extract?uri=http://iswc2010.semanticweb.org/accepted-papers/123

ISWC 2010 Attendees by Country

These are preliminary statistics of ISWC attendees by country. The data is from the registration list as of Oct 28, 2010, and the final data may be a little different. Also note that it’s different from the author/organizer data.

Not surprisingly, the host country China has the most attendees (15.4%). The US, UK and Germany follow as the next biggest players (14.9%, 14.4% and 12.6%, respectively). By continent, we have

Europe: 60.05%
Asia: 21.69%
Americas: 17.72%
Australia: 1.06%
Africa: 0.53%
Antarctica: 0% – not surprising, is it?

Clearly, Europe is still the most active in Semantic Web research. We also see this in various other statistics, e.g., the organizations involved in recent semantic web events.

======== geek separation line ========

The following tools are used:

  • The original data is in a spreadsheet. Countries are given as ISO 3166-1 alpha-2 codes, e.g., the Netherlands is NL, which many people are not familiar with. I found the code-to-name mapping from another source. However, as Excel can’t do a join, I imported the two CSVs into one RDF file using TopBraid Composer (here is how).
  • A SPARQL query is used to do the join; the result is saved as a Google Docs spreadsheet.
  • The spreadsheet is visualized by Datapress.

The SPARQL query I used

SELECT ?n (COUNT(?subject) AS ?count)
WHERE {
  ?subject a :Person ;
           :country [ :name ?n ] .
}
GROUP BY ?n
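
For comparison, the join-and-count step can also be sketched without RDF. The following is only an illustration in Python, not what was actually run for this post: the column names (`country`, `code`, `name`) and the sample rows are assumptions made up for the example, and the real spreadsheets may be laid out differently.

```python
import csv
import io
from collections import Counter


def count_by_country(attendees_csv, mapping_csv):
    """Join attendees to country names by ISO code and count per country."""
    # Build the code -> name lookup (this is the "join" table).
    code_to_name = {
        row["code"].strip().upper(): row["name"].strip()
        for row in csv.DictReader(io.StringIO(mapping_csv))
    }
    counts = Counter()
    for row in csv.DictReader(io.StringIO(attendees_csv)):
        code = row["country"].strip().upper()
        # Fall back to the raw code when the mapping has no entry for it.
        counts[code_to_name.get(code, code)] += 1
    return counts


# Made-up sample data, only to show the assumed shape of the two inputs.
ATTENDEES = "name,country\nA,NL\nB,CN\nC,cn\nD,XX\n"
MAPPING = "code,name\nNL,Netherlands\nCN,China\n"

if __name__ == "__main__":
    for country, n in count_by_country(ATTENDEES, MAPPING).most_common():
        print(country, n)
```

The dictionary lookup plays the role of the `:country [:name ?n]` pattern in the query, and the `Counter` plays the role of the per-country count.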

From the end user’s point of view, while all the tools I used are wonderful, what I really wish for is an integrated environment in which I can do all of the above together, ideally all in the browser, and, even more ideally, without having to know that there are RDF and SPARQL underneath, just as I now don’t have to worry about JavaScript, JSON or Google Charts, since they are all hidden behind the interface.

It’s quite likely that I missed some tools that could do the job more easily, as semantic tools have mushroomed in recent years. I will keep looking for better ways to visualize ISWC data. In the end, I wish everyone, especially non-programmers, could do it. That would show most of the beauty of semantics. I believe we are very close to that.



A Word Cloud for ISWC 2010


Sharon Myrtle Paradesi of the DIG group, MIT, has helped us to generate a tag cloud for ISWC 2010. The input is the abstracts of papers in the proceedings (i.e., research track, in-use track, doctoral consortium track, and invited talks) and in the poster/demo proceedings. While I don’t know the full details of Sharon’s techniques, she applied a set of NLP algorithms (e.g., stemming, case normalization, and stop word removal) to make the cloud.

As one may expect, Ontology, Semantic, Data, Query and Web are the most visible. OWL, RDF and SPARQL are of comparable size, while RIF is not seen. It is also worth mentioning that MediaWiki is in the cloud (confession: I’m biased).

It would be interesting to compare such word clouds of ISWC year by year to see how they evolve; ISWC actually has 8 years of metadata, going back to 2003.  I hope Sharon and I will find time to do it sometime in the future.
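
Since the actual pipeline is not known, the following is only a rough Python sketch of the kind of preprocessing named above (lowercasing, stop word removal, and a very crude stand-in for stemming) that produces the term counts a tag cloud is drawn from. The stop word list, the `naive_stem` helper and the sample abstracts are all assumptions for illustration, not Sharon's method.

```python
import re
from collections import Counter

# A tiny illustrative stop word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "we", "with"}


def naive_stem(word):
    """Very crude plural stripping, standing in for a real stemmer."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def term_frequencies(abstracts):
    """Count stemmed, lowercased, non-stop-word terms over all abstracts."""
    counts = Counter()
    for text in abstracts:
        # Lowercase first, then keep runs of letters as tokens.
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in STOP_WORDS:
                continue
            counts[naive_stem(token)] += 1
    return counts


# Made-up sample input, not taken from the proceedings.
SAMPLE_ABSTRACTS = [
    "Ontologies and the Semantic Web",
    "An ontology for query answering on Web data",
]

if __name__ == "__main__":
    print(term_frequencies(SAMPLE_ABSTRACTS).most_common(3))
```

In a tag cloud, the font size of each term would then be scaled from these counts.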

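P.S. 2012-01-04 This can be compared with the tag cloud of ISWC 2011. Note: the 2011 organizing committee did not use the 2010 tag cloud I cited above but made a new one, so the two differ.

Image source: http://semanticweb.com/report-from-day-3-at-iswc_b24204 (by Juan Sequeda)

The lady in the picture is Natasha Noy, the 2011 conference chair.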

Timeline of ISWC 2010 Main Conference Talks


This is another visualization using Datapress
It shows talks at the main conference of ISWC 2010.


Visualizing Data using Datapress

I attended a seminar at MIT last Friday (this year I’m a part-time member of the DIG group there). Edward Benson gave an impressive demo of Datapress, an extension of WordPress that enables non-geeks to import and visualize data in their blogs.

Since our TW blog is based on WordPress, I installed the extension and began to try it. The installation was surprisingly smooth: just a few clicks, and it was done in 30 seconds!

The first thing I want to try is to visualize the ISWC 2010 dataset I recently built. Since Datapress does not yet support importing from RDF, I created a spreadsheet using a SPARQL query in TopBraid Composer:

SELECT DISTINCT ?l ?lat ?long
WHERE {
  ?s swc:isSubEventOf iswc2010:research-track .
  ?s swc:isSuperEventOf ?p .
  ?p swc:hasRelatedDocument ?d .
  ?d foaf:maker ?m .
  ?m swrc:affiliation ?o .
  ?o rdfs:label ?l .
  ?o foaf:based_near ?b .
  ?b geo:lat ?lat .
  ?b geo:long ?long .
}

There are some minor format requirements (I didn’t get it right on the first try; Ted helped me to identify the problem):
* The first line of the spreadsheet should contain the headers, and the “key” line should have “{{label}}”
* To show on a map, coordinates should be given as Lat,Lng. Hence, I needed to combine the last two columns into one, separated by a comma.
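
The column merge can also be done outside the spreadsheet. Below is a minimal Python sketch, assuming the query result was exported as CSV with columns named `l`, `lat` and `long` (after the query variables); the `latlng` header and the sample row are placeholders invented for the example, and treating “{{label}}” as the header of the key column is my reading of the requirement above.

```python
import csv
import io


def to_datapress_rows(csv_text, label_col="l", lat_col="lat", long_col="long"):
    """Rewrite exported query rows as a label column plus one "Lat,Lng" column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header row: the key column is marked with {{label}}.
    # "latlng" is a made-up header name for the combined coordinates.
    writer.writerow(["{{label}}", "latlng"])
    for row in reader:
        # Combine the two coordinate columns into a single "Lat,Lng" cell.
        writer.writerow([row[label_col], f"{row[lat_col]},{row[long_col]}"])
    return out.getvalue()


# Made-up sample row, only to show the transformation.
SAMPLE = "l,lat,long\nSome University,42.73,-73.68\n"

if __name__ == "__main__":
    print(to_datapress_rows(SAMPLE))
```

The `csv` writer quotes the combined cell because it contains a comma, so the coordinates stay in one column when the file is imported into a spreadsheet.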

The next step is to upload it to Google Docs, and share it as a public document (can be viewed here)

Then, I can go back to the blog post that I’m writing, click a button on Datapress toolbar in the editing interface, add the data by giving it the URL to the Google Docs spreadsheet, and select Map visualization. The process is very user friendly.

You can add multiple visualizations to one post. This is a very handy way to generate visualizations using Exhibit. Actually, I had thought about visualizing ISWC data using Exhibit, but didn’t find the time (or was too lazy) to program it. Datapress saved me.

Ted will give the presentation about Datapress at ISWC next week. Don’t miss it if you will also be in Shanghai!

(The map shows locations of research track authors at ISWC 2010.)


OWL 2 Reference Card released

We’re pleased to announce the OWL 2 Reference Card [1]. The Card is meant to be a “cheat sheet” of OWL 2 features, printable on a single piece of paper (on both sides). It is based on the OWL 2 Quick Reference Guide [2], which is now a Proposed Recommendation [3] in the OWL 2 Web Ontology Language document set.

Background: OWL 2 [4] is an extension to OWL 1 with a few new functionalities. Some of the new features are syntactic sugar (e.g., disjoint union of classes) while others offer new expressivity, including:

* keys;
* property chains;
* richer datatypes, data ranges;
* qualified cardinality restrictions;
* asymmetric, reflexive, and disjoint properties; and
* enhanced annotation capabilities

Comments and suggestions on the Card are welcome (please send them to public-owl-comments@w3.org)

[1] http://www.w3.org/2007/OWL/refcard

[2] http://www.w3.org/2007/OWL/wiki/Quick_Reference_Guide

[3] http://www.w3.org/TR/2009/PR-owl2-quick-reference-20090922/

[4] http://www.w3.org/TR/owl2-overview/

Jie Bao

I will pay delicious $100 for hierarchical tagging

Just saw Jim’s post on What is the Semantic Web really all about?

I have been wondering about this problem too. What is the Semantic Web? Yesterday I asked the question “Why few (or none?) Web 2.0 sites provide hierarchical tagging?” on LinkedIn and got some pretty good answers:


For your convenience, I have attached my LinkedIn post at the end of this blog post.

There are two things in the answers that drew my attention:
* Many do _not_ believe tags, or even hierarchical tags, are semantic; to them, “semantics” means RDF or triples at least;
* Some believe that even implementing a hierarchical tagging system is not easy, for engineering or social reasons.

I think these two beliefs, among many other reasons, may partly explain why the “Semantic Web” is still far from reality. The first is about overestimating what “semantics” is: triples are one way to express semantics, but whether they are _the_ way is an open question. The second is about underestimating “Web” scale: realizing a knowledge system on the Web, even a conceptually “simple” one, can lead to serious scalability problems, both for machines (can you answer every query in under 1s?) and for people (who have to change their way of thinking).

Here is what I believe about the “semantic web” (note the lack of capitalization). First, it is not necessarily “the Semantic Web” (just as there is no “the Mobile Web”) as defined by W3C standards or the layer cake model. Semantics is a way of organizing things; RDF and OWL are some ways to express it, but other ways should be encouraged too and may sometimes work better. Second, tools and services should be “web-ish”, something like a semanticized version of YouTube or Gmail; after all, “web users” are rarely bioinformaticians, and few can master a Java-based ontology editor.  Third, start deployment with very, very basic semantics like trees (yeah, I know some will protest) and sameAs, but do it in a very, very efficient way. If we can’t even come up with a Web-efficient tree reasoner, how realistic is it to come up with a Web-efficient RDF or OWL reasoner?

Now I’m prepared to dodge tomatoes 😀

by Jie Bao


My original post on LinkedIn (reorganized a bit)

Why few (or none?) Web 2.0 sites provide hierarchical tagging?

Gmail labels and delicious tags are flat, which is a constant nuisance for me. I have to add many tags (unnecessarily) even when they could easily be inferred. I haven’t found an alternative that lets me organize my tags in a tree or network. Is there a technical or marketing reason?

People have been talking about the semantic web for a while and are looking for a killer app. It seems apparent that hierarchical tagging is semantic, is in high demand, and is relatively easy to do. Why is there none on popular sites?

PS 1: Let me clarify some situations where hierarchical tagging would save me a lot of time: recently I have been reading a book by Qian Mu, a historian, and tagging my notes on delicious with the tag “qianmu”; I also want all those notes to be tagged with “history”, but I always have to add both “qianmu” and “history”.

Sometimes I want more than one tag to be inferred. For example, when I add “wuxu” (the year 1898), I want the tags “qing”, “china” and “reform” to be added as well. You will see how troublesome it is to add all 4 tags each time when you have about 10 notes on “wuxu”.

In another example, I want to share my tags in both Chinese and English. If I could define two subclass relations between two tags, each in a different language, I would not always have to add both tags.

Now I have about 1000 tags on delicious. I am in really, really desperate need of a hierarchy. I’m willing to pay delicious $100 for such a service.

PS 2: Further clarification: I don’t believe I need a tagging system that always requires me to pick terms from a tree, DAG, or network. I can still add tags freely. But I need some way to clean up and organize my tags from time to time. It is just like how I clean up my “download” folder: put the files into different folders, and if a folder gets too big, make some subfolders.
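
The tag inference asked for in PS 1 can be sketched in a few lines of Python. This is only an illustration under assumptions: the `PARENTS` table is a hypothetical hierarchy built from the examples in the post (the edge from “qing” to “china” is my own guess at how the tags would be arranged), and `expand_tags` is not an API of delicious or any other service.

```python
def expand_tags(tags, parents):
    """Return the given tags plus every tag implied by the hierarchy."""
    result = set()
    stack = list(tags)
    while stack:
        tag = stack.pop()
        if tag in result:
            continue  # already handled; this also guards against cycles
        result.add(tag)
        # Follow all parent links, so DAGs work as well as trees.
        stack.extend(parents.get(tag, ()))
    return result


# Hypothetical hierarchy based on the examples in the post.
PARENTS = {
    "qianmu": ["history"],
    "wuxu": ["qing", "reform"],
    "qing": ["china"],
}

if __name__ == "__main__":
    print(sorted(expand_tags(["wuxu"], PARENTS)))
```

The user still types free tags; the hierarchy is only consulted afterwards, which matches the point in PS 2 that picking terms from a tree should never be required.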

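P.S. 2012-01-04 Users must never be required to maintain a tag tree! That is a huge cognitive burden. Do not put a mental burden on users. 丁一虹 (Yinghong Ding) put it well in a reply to my original post.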


A little, tiny semantics in action — from Google

I just read about Google’s Canonical Link Tag. It’s a small application of RDFa’s “rel” property. It is not a big thing yet, but I’m happy that it comes from Google, which has seemed quite remote from semantic web technologies so far.


“Last week Google, Yahoo, and Microsoft announced support for a new link element to clean up duplicate urls on sites. The syntax is pretty simple: An ugly url such as http://www.example.com/page.html?sid=asdf314159265 can specify in the HEAD part of the document the following:

<link rel="canonical" href="http://example.com/page.html"/>

That tells search engines that the preferred location of this url (the “canonical” location, in search engine speak) is http://example.com/page.html instead of http://www.example.com/page.html?sid=asdf314159265 .”
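
To make the mechanism concrete, here is a small Python sketch of how a crawler might read such a hint, i.e. a `link` element whose `rel` attribute contains `canonical`. It uses only the standard library `html.parser` module; the class and function names are made up for the example, and real search engines surely do far more than this.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also end up here by default.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the page, or None."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


PAGE = (
    "<html><head><title>t</title>"
    '<link rel="canonical" href="http://example.com/page.html"/>'
    "</head><body></body></html>"
)

if __name__ == "__main__":
    print(find_canonical(PAGE))
```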

[originally posted on 2008-10-28, http://tw.rpi.edu/weblog/2008/10/28/why-bother/]

Why Bother…

From Talis: “Jim Hendler at the INSEMTIVE 2008 Workshop”

that people will (and do) create metadata when there are obvious and immediate benefits in them doing so. No-one really consciously sits down to share or create metadata: they sit down to do a specific task and metadata drops out as a side-effect.”

I could not agree more. I once tried to tag all my blog posts; after a few weeks I got bored because there was no clear, immediate benefit in doing so. I would only tag things when I had to, for example to give my friends a list of posts on the same topic.

The only tagging system that has consistently worked for me is Gmail labeling: I organize mails related to the same task (like writing a paper) on a daily basis, because it is very useful, and immediately so. Even so, I only label a tiny fraction of all my emails.

I have seen too many people with (PC) desktops full of files who are too lazy to organize them, and I am one of them. Every year I have to set aside a day or two to reorganize my hard disk and dig out the hidden treasures of my “Downloads” folder. I believe that for the semantic web to be successful, creating an ontology should be at least as easy and as useful as organizing files on a hard disk.

In fact, people are creating metadata or even ontologies every day: every sorted email, every contact on a cell phone, every created folder, every calendar item, every wiki post, … We just need to make them explicit and, most of all, do so without bothering the user to click even one more button.

Jie Bao

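P.S. 2012-01-03 Looking back today, I again could not agree more! To build a practical system, you have to subtract work for the user rather than ask them to supply metadata.

[originally posted on 2008-08-13, http://tw.rpi.edu/weblog/2008/08/13/cuil-semantic-search/]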

Cuil, Semantic Search

Last week, Cuil.com caught my eye. It made a very good impression on me in just 5 seconds (BTW, 10 seconds is the maximum survival time of any website for me). First I tried, as many people might, my name. It didn’t disappoint me: it hit my pages quite precisely. I also love the grid-based layout. A few minutes later, I found its “Explore by Category” option. It looks like Cuil has some sort of ontology hierarchy for web pages.

A few “google” results reveal that Cuil may use some clustering technique to build such hierarchies. It is interesting to ask whether such hierarchies will indeed improve the search experience. When I search for “Semantic Web”, Cuil recommends that I browse “Ontology (computer Science)” and some of its subcategories; it also suggests that I look at “James Hendler”’s homepage. I would say that this will be very useful for exploring.

Building metadata using machine learning technology is a cool thing. On the other hand, I believe that human intervention is also critical. When Wikipedia knowledge is used in clustering, I expect some gain in recall or precision. As “Ontology (computer Science)” is a Wikipedia page, I guess that Cuil may already have used Wikipedia information in its results.

Also don’t forget the “network effect”. I have maintained a prefix-based, syntactic Gmail label hierarchy for a while. I would really like to share part of the hierarchy with my friends, so that when I send a mail labeled “party”, they don’t need to relabel it. If millions of users could share their small hierarchies (not only on Gmail, but also on Flickr, YouTube, Twine, etc.), each connected somehow to the hierarchies of friends and family, eventually we would have a very large network of ontologies, which might improve search much more than we can now. Just a random thought.

P.S. I found one interesting thing. Cuil caches my wiki page at Iowa State University. However, that page should have gone offline no later than May 2008, while Cuil officially went online only on July 28, 2008. It seems its crawler had been alive for a while.
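
As a rough sketch of what a prefix-based label hierarchy might look like in code, the Python below treats a label such as `social/party` as a path, derives the ancestor labels it implies, and selects the subtree one might share with friends. The `/` separator, the function names and the sample labels are assumptions for illustration; nothing here is an actual Gmail feature or API.

```python
def implied_labels(label, sep="/"):
    """List the label's ancestors (by prefix) followed by the label itself."""
    parts = label.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]


def shared_subtree(labels, prefix, sep="/"):
    """Select the labels at or below the given prefix, e.g. to share them."""
    return sorted(
        label
        for label in labels
        if label == prefix or label.startswith(prefix + sep)
    )


# Made-up labels, only to show the idea.
LABELS = ["work/paper/iswc", "social/party", "social", "socialism"]

if __name__ == "__main__":
    print(implied_labels("work/paper/iswc"))
    print(shared_subtree(LABELS, "social"))
```

Matching on `prefix + sep` rather than the bare prefix keeps unrelated labels that merely start with the same letters out of the shared part.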

Jie Bao

P.S.  2012-01-03 cuil.com is offline now; see its Wikipedia page: http://en.wikipedia.org/wiki/Cuil

Categories: Semantic Web, Old Posts