Thursday, February 01, 2007
What will Wikipedia be like 5 years from now?
With the continued growth of Wikipedia and its sister projects, it's worth asking what the Wikimedia ecosystem will look like down the road. Here's my vision of what it will and/or should be like.
Necessary functional improvements:
Search. Wikipedia's current internal search program is horrible. It is bizarrely sensitive to case, but lacks all the features we've come to expect from search. Quotation marks mean nothing. Results are often woefully incomplete (I often have to use a site-specific Google search to find what I'm looking for on Wikipedia). The interface is clunky, especially with all the check boxes at the bottom for different namespaces (and the fact that checking/unchecking only registers if you use the right search box, of the three available). But when search finally gets done right on Wikipedia, it will be a great thing; we'll need a new verb to complement "to google" ("look it up on Wikipedia" just doesn't have the same ring). Wikipedia search will be cross-project, with redirects and related entries (Wiktionary and Wikisaurus, Wikimedia Commons, articles in other languages) nested together. It should have some of the elements of Google's search algorithm; the readable text of piped links should affect results, and results should be ordered by a sort of internal PageRank with the option of reordering them by size, date of last edit, etc.
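A rough sketch of what "a sort of internal PageRank" could mean in practice: treat internal wiki links as votes and run the standard power iteration over them. Everything here is an assumption for illustration (the function names internal_pagerank and order_results, the damping value of 0.85, the toy link graph). It is not how MediaWiki search works, and it leaves out the piped-link text and the alternative sort orders mentioned above.

```python
def internal_pagerank(links, damping=0.85, iterations=50):
    """Score pages by incoming internal links.

    `links` maps a page title to the list of titles it links to.
    Returns a dict of title -> score; the scores sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


def order_results(matching_titles, rank):
    """Order search hits so the most linked-to pages come first."""
    return sorted(matching_titles, key=lambda title: rank[title], reverse=True)


# Hypothetical link graph, made up for this example.
toy_links = {
    "Bob Dylan": ["Folk music", "Blowin' in the Wind"],
    "Blowin' in the Wind": ["Bob Dylan", "Folk music"],
    "Folk music": ["Bob Dylan"],
    "Protest song": ["Bob Dylan", "Blowin' in the Wind"],
}
toy_rank = internal_pagerank(toy_links)
print(order_results(list(toy_links), toy_rank))
```

Reordering by size or date of last edit would just be a different sort key over the same list of hits.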
Stable versions and Approved versions. It's been in the works for a while now, but there is still no system for managing stable articles where acceptable edits are few and far between, nor is there a good way to flag vetted versions (e.g., a version approved as a Featured Article). Semi-protection is a mediocre substitute for version control, while proposals to implement similar features manually have been too complicated for the community to accept. For stable, largely complete articles, new edits should not show up until they have been screened by one or a few other editors. And for ultra-stable articles, there should be an integrated system for revision and draft work while the consensus version remains viewable to readers.
Audio/Visual accessibility. Because the major formats are all patented and could potentially have significant use limitations placed on them, Wikipedia uses Ogg files with free and open encoding to store and serve audio and video content. For the most part, users must go through a bit of trouble to play these files (i.e., downloading and installing codecs from off-site), although audio content now has rudimentary in-browser support. Obviously, the ideal would be integrated audio-video content without leaving the article; YouTube and Google Video have done this fairly well, though with proprietary technology (Adobe Flash with patented codecs). Video (both historical and user-created) will undoubtedly become a much bigger part of Wikipedia and Commons in the future.
Unified login. Obviously, it would be convenient to have a single account for all the Wikimedia projects. It's been in the works for a while now, but it's more of a convenience for editors (and a correction of a design flaw) than a major improvement.
Metadata handling. The current system of templates, categories, and other article metadata (beyond basic linking and formatting markup) is unintuitive, inconsistent, awkward, and intimidating to new editors, and the categories are difficult to navigate and far less useful than they could be. Something like a metadata namespace, for infoboxes, categories, Featured Article stars, interwiki links and the like, would be very beneficial.
Categories. Related to the metadata issue, the category system needs to be completely overhauled. In the current system, categories must be divided and subdivided to remain useful, yet editors (new and established) often apply overly general categories to new articles. Broad categories like "American people" or "Songs" must therefore be constantly monitored so they do not grow out of control, with their contents moved into more specific subcategories. For a given song, for example, the subdivision branches into a wilderness of partially overlapping subcategories like "songs by year", "songs by artist", "songs by lyricist", "songs by nationality", and "songs by genre", along with a host of other possible orthogonal categories like "songs with sexual themes" and "cat songs". Ideally, categorization would be both simpler and more flexible. Assigning broad categories ("songs", "folk music", "1963", "protest", "Bob Dylan"), with some semantic information ("is", "from", "related to", "performed by") should automatically create appropriate subcategories (Blowin' in the Wind is a song and is folk music, from 1963, related to protest, performed by Bob Dylan).
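The tagging idea can be sketched in a few lines, assuming each article carries a set of (relation, value) pairs and a "subcategory" is simply the set of articles sharing some combination of them. The TagIndex class and its method names are hypothetical, invented for this example; nothing like it is claimed to exist in MediaWiki. The second song and its tags are sample data added to give the intersections something to distinguish.

```python
from collections import defaultdict


class TagIndex:
    """Maps (relation, value) pairs to the articles tagged with them."""

    def __init__(self):
        self._members = defaultdict(set)

    def tag(self, article, relation, value):
        """Record one broad, semantic tag for an article."""
        self._members[(relation, value)].add(article)

    def category(self, *facts):
        """Return the articles that carry every (relation, value) pair given."""
        if not facts:
            return set()
        matches = [self._members.get(fact, set()) for fact in facts]
        return set.intersection(*matches)


index = TagIndex()

# Tags from the example in the text.
song = "Blowin' in the Wind"
index.tag(song, "is", "song")
index.tag(song, "is", "folk music")
index.tag(song, "from", "1963")
index.tag(song, "related to", "protest")
index.tag(song, "performed by", "Bob Dylan")

# A second article, added only as sample data.
other = "Like a Rolling Stone"
index.tag(other, "is", "song")
index.tag(other, "from", "1965")
index.tag(other, "performed by", "Bob Dylan")

# "Songs performed by Bob Dylan" and "songs from 1963" fall out of the tags;
# nobody had to create or maintain those subcategories by hand.
print(index.category(("is", "song"), ("performed by", "Bob Dylan")))
print(index.category(("is", "song"), ("from", "1963")))
```

The point of the design is that editors only ever apply the broad tags; any narrower grouping is computed on demand as an intersection.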
Hoped-for functional improvements:
Verifiability assessment. Eventually, Wikipedia will need a way to sort articles according to verifiability and sourcing (as a proxy for reliability, the direct measurement of which will always run into the problem of self-reference and the authority of editors). Readers should be able to tell immediately (before even beginning to read) whether an article is based on peer-reviewed articles and scholarly books, mainstream media sources, local or niche-oriented professional journalism, blogs and internet sites, primary sources, etc. Potentially, this could solve some of the perennial contentious issues about notability and the borderline of original research. The volume of material on minor topics (especially related to popular culture, current events, and minor/local institutions) is growing much faster than it can be strictly vetted (and deleted when appropriate) according to the current notability and verifiability guidelines, and there is a lot of material that is de facto acceptable, even if it doesn't strictly comply with the current rules. And a lot of this is good, accurate material that readers and editors find useful. If material with few or potentially unreliable sources is clearly flagged as such, there will be less incentive to wage futile wars of deletionism on what is undeniably valuable. In other words, a compromise between elitist and populist visions of what Wikipedia should be.
Discussion forums. I envision a discussion board for each article, separate from the talk page, where users (editors and readers alike) can discuss the subject of the article without the concern of trying to improve the article. This departs somewhat from the core mission of Wikipedia, but I think it would be beneficial in several ways. First, it would direct most of the irrelevant commentary away from talk pages, making collaboration among editors run more smoothly. Second, it could host ads for the support of the Wikimedia Foundation, without compromising the non-commercial nature of Wikipedia itself. And third, it would enhance the usefulness of Wikipedia at the borders of verifiability; readers who want more than the article has to offer can turn to the other forum participants for the speculation, rumor, and strained interpretation they seek.
Stat tracking. Mainly for performance reasons, Wikipedia does very little in the way of internal stat tracking. But in the long run, it would be useful, both for identifying popular articles and for studying Wikipedia itself. In addition to hit counters for every article, the site should track (without retaining any potential identifying information) visit paths as readers surf from one article to another. And for those with editcountitis, some automatic sophisticated contribution analysis (like what can be done through JavaScript hacks by knowledgeable editors now) would be nice: things like total content added, deleted, histograms of edit size and frequency, etc.
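As a minimal sketch of the two kinds of statistics described here, assuming visit paths arrive as plain lists of article titles with no user identifier attached, and edits as signed byte-count changes. The function names count_transitions and edit_size_histogram and the sample data are made up for illustration.

```python
from collections import Counter


def count_transitions(sessions):
    """Count how often readers go directly from one article to another.

    `sessions` is an iterable of visit paths (lists of article titles in
    the order visited). Only the article-to-article hops are kept.
    """
    transitions = Counter()
    for path in sessions:
        # Pair each article with the one visited immediately after it.
        for source, destination in zip(path, path[1:]):
            transitions[(source, destination)] += 1
    return transitions


def edit_size_histogram(size_deltas, bucket=100):
    """Group edits by size change, in buckets `bucket` bytes wide.

    Keys are the lower bound of each bucket; negative keys are removals.
    """
    histogram = Counter()
    for delta in size_deltas:
        lower_bound = (delta // bucket) * bucket
        histogram[lower_bound] += 1
    return histogram


# Hypothetical sample data.
paths = [
    ["Bob Dylan", "Folk music", "Protest song"],
    ["Bob Dylan", "Folk music"],
]
print(count_transitions(paths).most_common(1))
print(sorted(edit_size_histogram([10, 99, 100, -50]).items()))
```

Per-article hit counters are the degenerate case of the same idea: a Counter keyed on single titles instead of pairs.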
So what will the future Wikipedia be like in a broader sense? Its cultural authority and perceived reliability will continue to increase, but surely both will begin to level off within the next few years. Traditional non-specialist encyclopedias will simply be irrelevant, and probably bankrupt. Given the degree of brand success Wikipedia has already achieved, the chances for a successful fork are quickly approaching nil. Citizendium seemed like it had an outside chance at becoming a viable competitor, but it has been managed poorly thus far and I think the window of opportunity is closing rapidly. Citizendium membership is turning out to be an odd mix of people who don't edit Wikipedia because it doesn't respect (their) authority enough and people who don't edit it because it respects the authority (of published sources) too much; thus, many of the same issues that drive experts away from Wikipedia will show up in Citizendium if it grows large enough to matter. If it retains the GFDL license, Citizendium may have a place as a minor satellite of Wikipedia from which content is occasionally imported.
Wikipedia will also seriously eat away at the specialist encyclopedia market. I expect the viability of specialist encyclopedias will vary by field, according to which experts embrace and contribute to Wikipedia. In general, scientists (especially in the "harder" fields) and mathematicians have shown a great deal more enthusiasm than humanists, with social scientists somewhere between. (I find this ironic, because humanities fields have so much more to gain from an integrated and cross-linked ecology of knowledge; despite constant flux and discipline genesis at the borders and the current rhetorical vogue of "interdisciplinary" research, science topics are relatively self-contained compared to humanities topics.) It's an open question whether the academic culture of the humanities will get on board in a significant way. Unfortunately, I think the Ivory Tower mentality and its paradoxical counterpart of academic careerism (especially in the current tight job market) are too entrenched; I expect participation just to continue with incremental gains through the recruitment of individual humanist Wikipedians.
As more and more people look to Wikipedia as their first (and often only) source for arbitrary information, Wikipedia will begin to seriously encroach on the market share of the search companies. It's entirely possible that one or more of the major portals (most likely Ask.com and Yahoo!) will replace Wikipedia search results with mirrored content plus added advertising. And if implemented well, some users might even prefer this; after all, ad results are sometimes just what you were looking for. (Similarly, Wikipedia itself might implement optional ads, which would only appear if explicitly enabled by users.) The ecosystem of value-added and exploitative businesses making a living off of Wikipedia will expand dramatically, which is bound to create plenty of unforeseen issues and controversies. But I don't expect any major crises in that respect, since Wikipedia has always been built with the (legal and practical) potential for commercial exploitation.
The bigger problem will be professional PR and information management. In the next year or two, Wikipedia will have to create a system to deal with the complaints and requests of powerful economic and political entities. The recent Microsoft brouhaha over paid editing is the tip of the iceberg. It will be a challenge to create a system that is acceptable to the community but also acceptable enough to outsiders that they will use it instead of guerrilla editing. However, 5 years from now I think there will be some kind of stable equilibrium through a combination of an official system for dealing with accusations of bias from article subjects and vigilant groups of Wikipedians on the lookout for whitewashing.
In addition to encyclopedias, search, and PR, a number of other industries are going to feel pressure from the free content behemoth of Wikimedia projects. Wikimedia Commons will cut drastically into the market for stock photography, although Getty Images and Corbis will still have control of plenty of images that can't be reproduced, and free media from limited-access venues (like celebrity functions) will still be hard to come by. (Wikipedia has tried, unsuccessfully thus far, to get Wikipedian photographers into red carpet events and award shows.) The glut of easily available images is already prompting stock photography companies to go the MPAA/RIAA route of suing liberally over copyright.
Politically, Wikipedia will do a lot to foster the free culture movement and especially to improve the atmosphere for copyright reform. It's probably too optimistic to expect a reduction of copyright terms within the next five years, but at least any further extension (beyond the atrocious Copyright Term Extension Act of 1998) should be unlikely. Unfortunately, there's no good way to show people how lame 95-year copyright terms are until the great content from the 1920s, 30s, 40s, and 50s starts to come into the public domain. (That stuff is our cultural heritage and ought to be in the public domain already; I think something like 50 years or the life of the author plus 20 is more than enough protection to serve the intended purpose of copyright.)
That's all...the crystal's gone dark.
Posted by Sage at 3:05 PM
Labels: futurology, technology, Wikipedia
3 comments:
- godtvisken said...
- I'd also really like to see better linking between the different languages of wikipedia. Right now all the articles are linked manually and often out of traditional alphabetical order.
- 8:37 PM
- NCurse said...
- By far the best article on the subject. You've mentioned all the important issues of the future of wiki. BTW, are you an editor or an admin?
- 12:15 AM
- Sage said...
- Ncurse, I'm an editor (Ragesoss). I would go up for adminship, but it's more trouble than it's worth since I don't do vandal-whacking or join deletion discussions about articles I don't care about. The only things I would probably use the tools for are the occasional history merge or speedy deletion, and cleaning up after my students when I make them do Wikipedia assignments.
It's easy to predict the shower of opposes with some variant of "doesn't need the tools", and my fragile psyche can't handle the rejection. ;)
- 10:05 AM