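Technical books published abroad are often put together by a group of PhD editors. They first set the overall direction, then issue a call for chapters so that many contributors can pool their ideas and publish their latest research in book form. The advantage is that such a book blends the expertise of many authors, and the call for chapters alone already shows where the field is heading.
Search engine technology has moved beyond keyword matching to semantic web analysis, recommendation-engine ranking, integration of heterogeneous sources, and integration of heterogeneous formats. Below is a call for book chapters from May 2010: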
*Introduction*
Scientific and economic organizations are confronted with handling an abundance of strategic information in their domain activities. One main challenge is to be able to find the right information quickly. In order to do so, organizations must master information access: getting relevant query results that are organized, sorted, and actionable. (Presenting retrieved results is no longer just a matter of listing them: content on similar topics has to be grouped together and ranked by a recommendation engine, and the community has to be able to interact around the data.)
Recent technological progress in computer science, Web technologies, and the constantly evolving information available on the Internet has drastically changed the landscape of search and access to information. Current search engines employ advanced techniques involving machine learning, social networks and semantic analysis.
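To make "relevant query results that are organized, sorted" a little more concrete, here is a minimal Python sketch of one classic way to sort results: scoring each document against the query with TF-IDF. This is an illustrative assumption on my part, not a technique named in the call for chapters; the names `tokenize` and `rank`, the smoothed IDF variant (`log(N/df) + 1`), and the sample documents are all hypothetical, and real engines combine many more signals (links, clicks, social data).

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Score each document with a simple TF-IDF sum over the query terms
    and return (doc_id, score) pairs, most relevant first."""
    n = len(docs)
    tokenized = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    # Document frequency: in how many documents each term appears.
    df: Counter = Counter()
    for counts in tokenized.values():
        df.update(counts.keys())
    scores = {}
    query_terms = set(tokenize(query))
    for doc_id, counts in tokenized.items():
        total = sum(counts.values()) or 1
        score = 0.0
        for term in query_terms:
            if term in counts:
                tf = counts[term] / total
                # +1 keeps a term that appears in every document from scoring zero.
                idf = math.log(n / df[term]) + 1.0
                score += tf * idf
        scores[doc_id] = score
    # Drop non-matching documents, then sort by descending score (ties by id).
    hits = [(d, s) for d, s in scores.items() if s > 0]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


docs = {
    "a": "semantic web search engine",
    "b": "mobile search",
    "c": "cooking recipes",
}
print(rank("semantic search", docs))  # "a" ranks above "b"; "c" is dropped
```

Document "a" wins because it matches the rarer term "semantic" as well as "search".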
*Objectives of the Book*
The main goal of this book is to transfer new research results from the fields of advanced computer science and information science in order to master access to information. Readers will gain a better understanding of the results of applied research. Obtaining relevant, organized, sorted and workable answers -- to name but a few -- from a search has become a daily need for enterprises and organizations and, to a greater extent, for anyone. It does not consist only of accessing structured information as in standard databases, nor does it consist of searching for information strictly by means of a combination of keywords. It goes far beyond that. The information sought must be identifiable by the topics it covers, that is to say by its textual, audio, video or graphical content. (How can results that integrate heterogeneous formats be presented through a well-designed GUI that delivers a high-quality user experience?) This is not a new issue. However, recent technological advances have completely changed the techniques used. The new Web technologies, the emergence of Intranet systems and the abundance of information on the Internet have created the need for efficient search and information access tools.
*Recommended topics include, but /are not limited to/, the following:*
. Semantic Web
More and more content producers, as a result of the W
. Generation of large-scale search engine index
. Video, audio and graphics indexing
. Query user interface: Controlled natural languages, natural language query, multilingual search, etc.
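. Index Data Structures: Suffix tree, tree, Inverted index, Ngram index, Term document matrix, etc. (Each of these index data structures helps cut retrieval time to the point where answers come back almost instantly. This is a good list of technical keywords to start from, and a way to learn the terminology of search engine indexing.)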
. Multi-sources and multi-formats (integrating and presenting heterogeneous sources and formats will be the key to high-quality search engines in the future) indexation: Most recent search engines can index many different information sources, such as:
- FTP servers,
- file systems,
- Web pages,
- DBMS such as Oracle, Sybase, DB2, SQL Server and others.
- Document-oriented databases such as Lotus Notes.
- Desktop application files such as the Microsoft Office suite (Word, PowerPoint), RTF format, ...
- Adobe's Portable Document Format (PDF)
- PostScript (PS)
- LaTeX
- The UseNet archive (NNTP) and other deprecated bulletin board formats
- XML and derivatives like RSS
- SGML (this is more of a general protocol)
. Emergence of new axes in the Next Generation of Search Engines
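- Real-time search, (today's search engines mostly index in batches, so results lag behind. How can search be done with zero lag? Only by searching in real time over a specific, limited set of data can search time be balanced against usefulness.)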
- Local search,
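- GPS sensitive search, (LBS search combined with a phone's GPS. Google Maps already provides an excellent search engine on mobile phones that instantly finds nearby shops, and it is easy to use in Taiwan as well. On my Nokia N97 mini, after installing the separate Google Maps app, I can instantly find a map of the IKEA near me and directions for driving there.)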
- Mobile search,
- Search in the Cloud,
- Search using Hadoop,
- Map reduce, etc.
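The inverted index from the "Index Data Structures" topic is the core of most of the topics above. Below is a minimal Python sketch, assuming a simple regex tokenizer and boolean AND queries; the class and method names (`InvertedIndex`, `add`, `search`) are hypothetical. Because `add` updates the postings immediately, a newly added document is searchable at once, which hints at how a small, focused index can support the real-time search mentioned above.

```python
import re
from collections import defaultdict


def terms_of(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to the set of ids of documents that contain it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        # The document becomes searchable as soon as this returns.
        for term in terms_of(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> list[int]:
        """Return sorted ids of documents containing every query term (AND)."""
        terms = terms_of(query)
        if not terms:
            return []
        # .get avoids creating empty entries; start from the smallest set.
        sets = sorted((self.postings.get(t, set()) for t in terms), key=len)
        result = set(sets[0])
        for s in sets[1:]:
            result &= s
        return sorted(result)


idx = InvertedIndex()
idx.add(1, "Real-time search with Hadoop")
idx.add(2, "Mobile search")
idx.add(3, "Hadoop map reduce")
print(idx.search("search hadoop"))  # [1]
```

Intersecting from the smallest posting set first keeps the work proportional to the rarest term, which is the usual reason engines store postings per term.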
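The "Ngram index" entry refers to indexing overlapping character sequences so that substring queries do not need a full scan. Here is a minimal sketch assuming character trigrams (`n = 3`) and a final verification step, since sharing n-grams does not guarantee a real match; `NgramIndex` and `ngrams` are hypothetical names.

```python
from collections import defaultdict


def ngrams(text: str, n: int = 3) -> set[str]:
    """All overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NgramIndex:
    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.docs: dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        for gram in ngrams(text, self.n):
            self.postings[gram].add(doc_id)

    def search(self, pattern: str) -> list[int]:
        """Return sorted ids of documents containing `pattern` as a substring."""
        grams = ngrams(pattern, self.n)
        if not grams:
            # Pattern shorter than n: no n-grams to look up, so scan everything.
            candidates = set(self.docs)
        else:
            # A match must contain every n-gram of the pattern.
            candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        # Sharing n-grams can give false positives, so verify each candidate.
        return sorted(d for d in candidates if pattern.lower() in self.docs[d].lower())


idx = NgramIndex()
idx.add(1, "PostScript")
idx.add(2, "Portable Document Format")
idx.add(3, "SGML")
print(idx.search("script"))  # [1]
```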
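For "GPS sensitive search", the basic step is ranking places by their distance from the user's coordinates. The sketch below uses the haversine great-circle formula with a mean Earth radius of 6371 km. The store names and coordinates are made-up sample data, and `Place` and `nearby` are hypothetical names. A real LBS service would not scan every place like this; it would typically use a spatial index (for example a geohash grid or an R-tree) to narrow the candidates first.

```python
import math
from typing import NamedTuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


class Place(NamedTuple):
    name: str
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(places: list[Place], lat: float, lon: float, radius_km: float) -> list[tuple[str, float]]:
    """Return (name, distance_km) for places within radius_km, nearest first."""
    hits = []
    for p in places:
        d = haversine_km(lat, lon, p.lat, p.lon)
        if d <= radius_km:
            hits.append((p.name, d))
    return sorted(hits, key=lambda item: item[1])


# Hypothetical sample data, not real store locations.
places = [
    Place("Store A", 25.0330, 121.5654),
    Place("Store B", 25.0478, 121.5170),
    Place("Store C", 24.1477, 120.6736),
]
for name, dist in nearby(places, 25.0400, 121.5600, radius_km=5.0):
    print(f"{name}: {dist:.2f} km")
```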
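"Search using Hadoop" and "Map reduce" usually mean building the index as a batch job: mappers emit (term, document id) pairs, the framework groups the pairs by term (the shuffle), and reducers turn each group into a posting list. The sketch below simulates those three phases in one Python process to show the data flow only. It does not use Hadoop's actual API, and the function names are hypothetical; on a real cluster the mappers and reducers run on many machines, and the shuffle is done by the framework.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator
from itertools import chain


def map_phase(doc_id: int, text: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit a (term, doc_id) pair for every token in one document."""
    for term in text.lower().split():
        yield term, doc_id


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term: str, doc_ids: list[int]) -> tuple[str, list[int]]:
    """Reducer: collapse one term's ids into a sorted, de-duplicated posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs: dict[int, str]) -> dict[str, list[int]]:
    intermediate = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    grouped = shuffle(intermediate)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())


docs = {1: "search using hadoop", 2: "map reduce on hadoop", 3: "mobile search"}
index = build_index(docs)
print(index["hadoop"])  # [1, 2]
```

Because each mapper sees only one document and each reducer sees only one term, both phases can be split across machines without coordination. That independence is what makes MapReduce suitable for generating large-scale indexes.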