
2010年的搜索引擎

The search engine of 2010

Here are some notional news flashes from the future:

May 2008. Google launches G-Life, a substitute for the fallible human memory. By searching your e-mail, instant messages and telephone calls, and with the help of voice recorders set up around the home, you can now recall everything you said or wrote.


November 2008. Yahoo!'s new MobileBuddy, a voice-activated search engine, gives you real answers wherever you are. No more long lists of websites to pick through: just ask it what you want to know. MobileBuddy will also vibrate if the groceries you are about to buy are available more cheaply elsewhere.

October 2009. Regulators uncover e-mails that hint at the scope of Microsoft's search engine ambitions. According to its critics, by building its search engine into Windows, Office and other software, Microsoft is on the way to controlling access to the world wide web.

January 2010. Ten years after America Online bought Time Warner, Google acquires Walt Disney. The mania for internet distribution again has the upper hand over entertainment "content".

Fanciful? Perhaps. But the search-engine business is at the beginning of a wave of innovation that could change many aspects of everyday life and reshape parts of the information industry. Google has demonstrated the power of search, Microsoft and Yahoo! are in hot pursuit and a crowd of other search companies are seeking a gap.

One way they are looking to gain an edge is by bringing more of the world's information within reach of the software "crawlers" that index data so it can be searched. The web represents only a small fraction of what is out there: information locked up in commercial databases, celluloid archives or personal filing cabinets remains beyond reach.
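As a rough illustration of what it means for a crawler to "index data so it can be searched", here is a minimal sketch of an inverted index in Python. The sample pages, the function names `build_index` and `search`, and the simple whitespace tokenising are all assumptions made for this example; the article does not describe how any particular search engine builds its index.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            # Strip basic punctuation so "web." and "web" match.
            index[word.strip(".,!?\"'")].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical sample data standing in for crawled pages.
documents = {
    "page1": "Google launches a video search service.",
    "page2": "Yahoo! offers video search on the web.",
    "page3": "Libraries digitise millions of books.",
}
index = build_index(documents)
print(sorted(search(index, "video search")))
```

Anything that can be turned into text, whether a scanned book or a transcribed phone call, could in principle be fed into an index of this kind, which is why so much of the effort described here goes into digitising material in the first place.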

"Five years from now, people will think about searching everything," says Craig Silverstein, the first employee to be hired by Google's founders and now technology director.

A spate of real headlines from recent months shows how fast this work is progressing. All four leading search engines - from Google, Yahoo!, Microsoft and AskJeeves - have come up with better ways to search the hard drives of personal computers. Google is to digitise tens of millions of books from five of the biggest academic libraries. Last week, Yahoo! began a search service to find video files on the web, while Google started to offer transcripts of television news.

This, though, is just scratching the surface. "In terms of information consumption, television is probably the largest information channel for consumers," says Mike Lynch, chief executive of Autonomy, a UK search technology company.

The technology to sift through and make sense of television or other video content is advancing (see the sidebar at the end of this article). But most of the content remains in unsearchable analogue form or under lock and key. Little work has gone into converting archives of old TV shows into digital form.

"This is not a technical challenge - it's really a question of the business model," says Karen Howe, head of Singing Fish, a search engine owned by America Online that looks for audio and video files. TV networks, like film studios, have yet to find a way to make money from their content over the internet: until they do, there is little incentive to do the laborious work of turning old celluloid into digital files. Many other commercial information companies, from newspaper publishers to database vendors, feel the same.

Even without these commercial sources of information, there is ample data from everyday life that could be digitised and made searchable. Personal telephone calls would be a treasure trove of useful information, says Mr Lynch. If making voice calls over the internet becomes common - turning conversations into streams of data like any other - building a searchable index of your phone calls may not be that far away.

Voice calls are only part of the conversation that occupies people's daily lives: much of the written communication, including e-mail and instant messages, is already in digital form. What if all this could be captured and searched - along with the personal photographs, online web logs and video journals that are becoming an increasing part of life? "There's this whole branch of ephemeral information that needn't be ephemeral," says Mr Silverstein at Google.

A second front in the search wars is being fought over ways to make it easier for people to filter and extract information from the simple list of web pages that most search engines produce.

"It's clear that a list of links, though very useful, doesn't match the way people give information to each other," says Mr Silverstein. The question that he says Google - like others - is now trying to address is: "How can the computer become more like your friend when answering your questions?"

That means giving direct answers to questions, extracting data from online sources rather than giving links to web pages. It also means doing a better job of divining what the searcher is looking for, tailoring results more closely to what, based on past experience, appear to be the user's particular interests.

Most of the new search engines appearing on the web do not claim to come up with better results to a query - in fact, many of them take their results straight from Google or other established engines. Instead, they aim to organise and present information in more useful ways.

Users will want more direct responses to their search queries, the experts acknowledge. "The biggest change we will see in the next five years will be in the way people use computers," says Mr Silverstein. Mobile handsets will become the most common way to find information on the internet, he adds. At that point, most queries would best be made and answered by voice.

If the search companies become a more integral part of everyday life, how far will their influence eventually extend - and what impact will they have on other companies that exist to create or distribute information?

The answer depends in part on how the balance of power evolves between the companies that create entertainment and media "content" and those that distribute it on the internet. This is not a new issue: it was the perceived power of the early internet distributors that led Time Warner to sell out to America Online at the height of the dotcom bubble.

Search engines bring a new twist. As omniscient distributors of information, they threaten to be both a content-creator's best friend and worst nightmare. Now, anyone can find your content, making everyone a potential customer - but your best customers can also just as easily find your competitors. For suppliers of information that can easily be turned into a commodity - weather forecasts, stock quotes, dictionary definitions, telephone numbers, maps - this is a serious threat.

As search engines get better at presenting this sort of information to users, it could take on a context that bears little relation to the one in which it was originally published. "Ten years from now, it will be much more difficult to distinguish searching for something from seeing and using it," says Charles Ferguson, a technology author.

Media and entertainment companies with unique content may feel they can wall themselves off from this commoditising trend. That would depend, however, on how much free information will eventually fall within the search engines' reach and whether it can start to rival the quality of proprietary information.

Already, internet blogs and communal internet pages known as wikis (from the Hawaiian for "speedy") are pushing the boundaries of what was known as "user-generated content". Results from Wikipedia, a free encyclopaedia maintained over the internet by volunteers, may not match the standards of publications produced by professional editors, but the service still manages to answer many common questions.

Even if this proves no ultimate threat to proprietary information, it is a reminder that publishers urgently need to find workable business models for delivering their content online.

Search engines themselves are likely to have their own long-term influence determined in part by the extent to which they can insinuate themselves into the wider fabric of internet activity. Microsoft, for instance, already talks of its search engine as a "platform" - a piece of software on top of which others will be able to build their own technology, much as other software developers have ridden on the Windows operating system.

Publishing the "application programming interfaces" of its search engine - the software links that other developers use to connect their own software - is an important part of Microsoft's strategy, says Adam Sohn, director of sales and marketing at its MSN offshoot. "We look for problems that are broad and horizontal," he says. Search is "a capability developers don't want to build every time they write an application", so why not encourage them to build on to the Microsoft technology?

This is a page straight out of Microsoft's usual playbook, says Mr Ferguson, who has long written about how a dominant "platform" seems to emerge during each successive phase of the computer age - from the IBM mainframe to Windows. Will the search market mirror these technologies, becoming a winner-takes-all business?

Apparently aware of the threat, Google also says it is considering opening its application programming interface to let other developers draw on its technology. Last week, fulfilling a promise made last year, it gave advertisers access to the interfaces for its search-related advertising service, giving them more power to influence how their advertisements are displayed.

"Everyone loves to call themselves a platform," says Mr Silverstein. But he adds that, while search may share some of the characteristics of other computing platforms, the amount of work still to be done to make the universe of information searchable will leave the field wide open for a long time to come.

Once this early rush of innovation has passed, will search engines still attract as much attention as they do now? The fluctuation of internet fashion suggests that they may not, says Mr Lynch. "In the early days [of the internet], search was 'it'," he says. "Then portals were 'it'. Now fashion is swinging back again."

The search engines will also have to figure out new ways to make money when search ceases to be a stand-alone activity conducted from a dedicated website but instead becomes a core feature of many other activities carried out over the internet.

The people behind the search engines, needless to say, see themselves as more than just the latest, passing internet fashion. But even they concede that the function of search engines should eventually be absorbed into the fabric of a more "intelligent" internet.

Finding information would not involve going to a separate place - a search engine - to ask a question. Instead, the answer would present itself wherever you happened to be, and in the most appropriate form. "Search will become more and more important and less and less visible," says Mr Silverstein at Google. "It will be ubiquitous and invisible."

At that stage, depending on your point of view, Google and its rivals would either be one of the most powerful forces shaping everyday life or just another invisible cog in the great Information Age machine that is being created out of the internet.

When it comes to the cutting edge of internet search technology, fans of television sport will probably see the benefits sooner than most.

Couch potatoes five years from now will be able to make their own highlights of the latest games, predicts Hong-Jiang Zhang, managing director of Microsoft's Beijing research centre. "You will just be able to see the exciting parts," he says.

To understand why, consider how sport differs from other video footage that appears on TV. Most information from the natural world does not follow recognisable laws of "grammar" or obey a circumscribed "vocabulary" - at least, not in ways we yet understand. The almost limitless variety in natural phenomena makes it difficult for a computer to parse the information contained in a random piece of video and "understand" what is going on, says Mr Zhang.

Sport is different: it follows rules. Turn those rules into recognisable visual signposts - a football hitting the back of the net, for instance - and events potentially become searchable.

Image recognition and movement analysis have reached a level where they can produce a searchable "index" of the events of a sports game using a standard PC, says Mr Zhang, though he adds that complete accuracy is some way off.

The visual clues are only part of the story - combine them with other sources of information and accuracy improves. This extra information, known as meta-data, can take different forms. The most basic is text, picked up from the subtitles in a film or the captioning in a TV show. The early video search services, such as one just launched by Yahoo!, rely on searching text like this.

Soundtracks offer a potentially more fruitful source of information. By combining elements of speech recognition and sound analysis (identifying the noise from an explosion, for instance), it becomes possible to guess at much of what is happening in a piece of video, says Mike Lynch, head of Autonomy, a UK search technology company. "For most things humans want to search for, sound recognition works very well indeed," he adds.

The internet has also introduced an important new layer of context. Led by Google, web search engines interpret the meaning of information based on the meta-data attached to web pages, as well as analysing the links between web pages to assess its relevance.
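To make the idea of "analysing the links between web pages" more concrete, here is a minimal sketch of a simplified, PageRank-style scoring loop in Python. The article does not say which algorithm any engine uses, so the function `link_scores`, the `damping` value and the three-page link graph are all assumptions chosen for illustration.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages by repeatedly passing each page's score along its links.

    `links` maps a page to the list of pages it links to (hypothetical data).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of links.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical link graph: pages "a" and "b" both link to "c"; "c" links to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = link_scores(links)
best = max(scores, key=scores.get)
print(best)
```

In this toy graph the page that attracts the most incoming links ends up with the highest score, which is the intuition behind treating links as a signal of relevance.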

Meta-data promises to bring other forms of visual content within reach of the search engines. Some digital cameras already encode information on a picture, such as the time it was taken. Global positioning sensors built into camera phones could add location information. Using the voice capabilities of a camera phone, the user could also append commentary when taking a snapshot, then use keywords to search for the picture later, says Adam Sohn, marketing director of Microsoft's MSN unit.

Ultimately, all the random, unstructured information contained on web pages and other data-repositories could be subjected to a form of structuring that made it more intelligible to machines. This is the idea behind the Semantic Web, a vision of the future internet promoted by Tim Berners-Lee, creator of the World Wide Web.

Even if this remains out of reach, much of the effort put into search will be dedicated to imprinting order on the chaos by organising information in more coherent ways. As Daniel Read, product management vice-president at AskJeeves, says: "We need data standards that go across the web, that are in all consumer products."
2010年的搜索引擎

假想这是一些未来的快讯:


2008年5月,Google推出G-life,作为易犯错的人类记忆的替代品。通过搜索你的电子邮件、即时信息和电话,并在家中到处安装的语音记录器的帮助下,现在你可以记起你说过或写过的一切。

2008年11月,无论你在哪里,雅虎(Yahoo!)新推出的语音激活搜索引擎MobileBuddy都将应声而答。再也无须从一长串网站列表里挑选了,你只要问一下想知道什么。如果你正准备买的杂货在别处更便宜,MobileBuddy还会振动提醒你。

2009年10月,监管机构发现一些电子邮件,这些邮件对微软(Microsoft)在搜索引擎上的雄心有所揭示。微软的批评者说,通过将其搜索引擎嵌入Windows、Office和其它软件,微软公司行将掌控进入万维网的通道。

2010年1月,美国在线(America Online)收购时代华纳(Time Warner)10年后,Google收购了沃尔特迪斯尼公司(Walt Disney)。互联网传播热潮再次占了娱乐“内容”的上风。

离奇吗?或许。但搜索引擎业务是一波创新浪潮的开端,这股潮流可能会改变日常生活的诸多方面,并重塑信息产业的某些部分。Google已经展示了搜索的力量,微软和雅虎正在穷追猛赶,而其它一大批搜索引擎公司正在寻找缺口。

它们期望获得优势的一个方法,是把世界上更多信息纳入“爬行”软件的信息检索范围,以便这些信息能被搜索到。万维网代表的仅是信息的一小部分:锁在商业数据库中的信息、胶片档案或个人档案柜目前仍在搜索范围之外。

“五年后,人们会想搜索一切,”Google创始人的首名雇员、现任技术总监的克雷格·西尔弗斯坦(Craig Silverstein)说道。

近几个月来,媒体的大量报道显示出这一工作进展飞快。所有四大搜索引擎公司Google、雅虎、微软和AskJeeves,都想出了搜索个人电脑硬盘的更好方法。Google正打算将五家最大的学术图书馆数千万册藏书数字化。上周,雅虎推出一项在网上搜索视频文件的服务,而Google则开始提供电视新闻的文字稿。

不过,这只是些初步行动。“就信息消费而言,电视可能是消费者最大的信息渠道,”英国搜索技术公司Autonomy的首席执行官迈克·林奇(Mike Lynch)说。

内容数字化

筛选并整理电视或其它视频内容的技术正在进步。但大多数内容仍以无法搜索的模拟格式存在,或是被锁定并需要密码。将老的电视节目档案转换成数字格式的工作做得很少。

“这不是个技术挑战,而是个商业模式的问题,”Singing Fish的负责人凯伦·豪(Karen Howe)说。Singing Fish是美国在线拥有的搜索引擎,专门搜索音频和视频文件。和电影公司一样,电视网络迄今仍未找到在互联网上利用内容赚钱的方法。在找到这种方法之前,它们没什么动力把旧胶片转换成数字文档,因为这是非常繁重的工作。其它很多商业信息公司都有同感,包括报纸出版商和数据库供应商等。

即使没有这些商业性的信息来源,日常生活中也有大量信息可供数字化,即可被搜索。林奇先生说,个人电话通话将是个有用信息的宝藏。如果通过互联网拨打语音电话变得常见(把交谈转化为数据流),那么建立一个对你通话的可搜索索引,或许不是那么遥不可及。

语音电话只是人们日常生活中交谈的一部分:很大一部分书面交流已经以数字格式存在,包括电子邮件和即时信息等。如果一切交流形式,包括日趋成为生活一部分的个人照片、网上日志和视频短片等都能被获取并可搜索,那会怎样呢?“所有这些转瞬即失的信息都不再短暂了,”Google的西尔弗斯坦先生说。

过滤和选取

在搜索战的另一个战场上,各公司正围绕这一点展开较量:如何让人们更容易从多数搜索引擎提供的简单网页列表中,过滤并选取信息。

“显然,尽管一个链接列表很有用,但它与人们彼此传递信息的方式不相称,”西尔弗斯坦先生说。他说,和其它引擎一样,Google正努力解决的一个问题是:“在回答你的提问时,电脑怎样才能更像你的朋友?”

这意味着要直接回答问题、从网上资源提取数据,而不是提供网页的链接。这还意味着,要更好地猜出搜索者在寻找什么,并根据以往的经验,猜测使用者特定的兴趣所在,而使搜索结果更贴近使用者的需求。

大多数网上出现的新搜索引擎并不声称能提供更好的搜索结果。事实上,许多新搜索引擎都直接从Google或其它老牌搜索引擎那里获取结果。不同的是,它们旨在以更有用的方式来组织并显示信息。

专家们承认,使用者会希望自己的搜索问题获得更多直接答复。“未来5年中,我们见到的最大变化将是,人们使用电脑的方式将有所不同,”西尔弗斯坦先生说。他补充说,移动电话将变成在互联网上寻找信息的最常见手段。到时候,大多数问题都最好通过声音进行问答。如果搜索引擎公司成为日常生活中更不可或缺的部分,那它们的影响力最终将延伸到多远呢?它们对其他一些为创造或传播信息而存在的公司会产生什么影响呢?

答案部分取决于两类公司之间的权力平衡将怎样演化,这两类公司分别是创造娱乐和媒体“内容”的企业和在互联网上传播这些内容的企业。这不是个新问题:正是早期网络传播商那种显著的力量,才导致时代华纳在网站泡沫的高峰将自己卖给美国在线。

搜索引擎与内容创造者

搜索引擎带来了新的转机。作为无所不知的信息传播者,搜索引擎既可以成为内容创造者的挚友,也可能成为它们的梦魇。现在,人人都可以找到你的内容,这样人人都成了你的潜在客户,但你最好的客户也能同样轻易地找到你的竞争对手。这对一些信息供应商来说是个严重的威胁,因为它们提供的信息可以轻易转变成商品,如天气预报、股票报价、字典定义、电话号码、地图等。

随着搜索引擎更善于向使用者提供这类信息,信息具有的背景与最初发布时的背景可能完全不同。“10年后,要区分搜索内容和见到并使用这个内容将难得多,”科技作家查尔斯·弗格森(Charles Ferguson)说。

拥有独特内容的媒体娱乐公司可能会觉得,它们能免受这一平价商品化趋势的影响。但这将取决于最终会有多少免费信息落入搜索引擎的搜索范围,以及这些免费信息能否开始与专有信息的质量相媲美。

互联网博客和“维基”(wiki,源自夏威夷语,意为“快速”)的社区网页已在拓展被称为“用户产生内容”的疆域。“维基百科全书”(Wikipedia)的结果或许达不到专业编辑所做出版物的水准,但该服务仍能回答许多常见问题。“维基百科全书”是一部由志愿者维护的免费网上百科全书。

即使事实证明,这些并不会对专有信息构成最终威胁,但它还是提醒出版商,它们亟需找到行得通的商业模式,以在网上提供自己的内容。

搜索引擎本身的长期影响或许将部分取决于,它们能在多大程度上逐步渗入更广泛的互联网构架中。以微软为例,它已将其搜索引擎描述为一个“平台”,即一个其他人能在上面开发技术的软件,就像其它软件开发商在Windows操作系统的基础上开发软件。

应用编程接口

发布其搜索引擎的“应用编程接口”(API),即其他软件开发商可用来连接自己软件的软件链,是微软策略的重要部分,MSN分部的销售及营销总监亚当·索恩(Adam Sohn)说。“我们寻求广泛、横向的问题,”他说。搜索不是“开发商每次编写应用程序时都想建立的功能”,何不鼓励它们以微软的技术为基础呢?

弗格森先生说,这是微软惯用的攻略之一。弗格森先生长期以来一直在写一些文章,内容是在电脑时代接连出现的每个阶段(从IBM的大型机到Windows)中,一个主导“平台”是如何兴起的。搜索市场是否会和这些技术一样,变成胜者通吃的业务呢?

Google显然意识到威胁,因此也表示,正考虑开放它的应用编程接口,以便其它开发商利用它的技术。上周,为了实践去年的承诺,Google向广告客户开放了用于搜索相关广告服务的界面,赋予客户更多权力来影响自己广告的显示方式。

“每个人都喜欢把自己称为一个平台,”西尔弗斯坦先生表示。但他补充说,虽然搜索可能具有其它计算平台的一些特性,但由于仍需开展大量工作,才能让整个信息世界都能被搜索,因此在今后很长一段时间内,搜索领域将留有大量有待填补的余地。

一旦这股早期创新潮流过去,搜索引擎还会像现在这样吸引这么多注意力吗?互联网时尚起起落落,意味着它们也许做不到这一点,林奇先生说。“在互联网早期,搜索是‘时尚’,”他说,“然后门户网站是‘时尚’。现在潮流又回转了。”

当搜索不再是人们在专门网站上实施的孤立活动,而成为互联网上开展的许多其它活动的核心特色时,各搜索引擎还必须找到赚钱的新途径。

不用说,各搜索引擎的供应商都认为自家引擎不光是最新的、短暂的网络时尚。但即使他们也承认,搜索引擎的功能最终应被纳入一个更“智能”互联网的构架中。

寻找信息将不是去某个专门的地方(一个搜索引擎)提出问题。相反,不管你在什么地方,答案都会主动出现,而且是以最恰当的形式呈现。“搜索将变得越来越重要,同时也越来越无形,”Google的西尔弗斯坦先生说,“它将无处不在而又无从觉察。”

到那时候,Google及其对手究竟是塑造日常生活最强大的力量之一,还是正在网上造就的、伟大的信息时代机器上的又一个小齿轮,那就全凭你怎么看了。