2010-07-23

Software Development Team - Self-Assessment Method (inspired by the Joel Test)

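~~ Customer Involvement ~~

1. [UI Mockups] Did you walk through UI mockups with the customer before starting to write code? (Balsamiq)

2. [Test Cases] Were complete test cases in place before coding began, covering functional tests, user-experience usability tests, speed tests, and stress tests? (Test purpose, expected result, actual result)

3. [Weekly Priorities] Do both the customer and the software team have a designated person who adjusts task priorities weekly, so that engineers can focus on one thing at a time and are not disturbed by changes within the week? (Mantis priority control)

4. [Online Documents] Are all project-related specification documents kept online, where they can be updated and discussed at any time and edited collaboratively? (Google Apps)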

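~~ Software Team ~~

5. [Version Control] Do you use a version control system? (SVN)

6. [Task Tracking] Do you use a bug-tracking database? (Mantis bug tracking)

7. [Bugs First] Do you fix bugs before writing new code?

8. [Up-to-Date Schedule] Do you have an up-to-date schedule (not an expired one) that includes a confidence-level analysis?

9. [Dedicated Testers] Do you have dedicated test engineers?

10. [Hallway Testing] Do you run hallway usability tests as soon as a new feature is ready?

11. [Automated Build] Can you compile, build, and deploy the whole project with one button, using an automated script?

One point per question. Passing score: >= 8 is a bare pass.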
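
The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the function name `score_team`, the constant names, and the sample answers are hypothetical and not part of the original checklist.

```python
NUM_QUESTIONS = 11   # the checklist has 11 yes/no questions
PASS_THRESHOLD = 8   # from the checklist: >= 8 is a bare pass


def score_team(answers):
    """Return (score, passed) for a list of 11 yes/no answers.

    Each "yes" (True) is worth one point, as stated in the checklist.
    """
    if len(answers) != NUM_QUESTIONS:
        raise ValueError(f"expected {NUM_QUESTIONS} answers, got {len(answers)}")
    score = sum(1 for answer in answers if answer)
    return score, score >= PASS_THRESHOLD


if __name__ == "__main__":
    # Hypothetical team: "yes" on the first 8 questions, "no" on the rest.
    sample_answers = [True] * 8 + [False] * 3
    score, passed = score_team(sample_answers)
    print(f"score = {score}, passed = {passed}")
```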


2010-03-16

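Spotting future trends from the call for chapters of a book on "advanced search engines"

Hello, dear students :)

Foreign technical books are often put together by a group of PhD editors who first set the overall direction and then issue a call for chapters, so that contributors can pool their ideas and publish the latest research results in book form. The advantage is that the book blends the knowledge of many authors, and the call for chapters alone already reveals future trends.

Search engine technology has evolved beyond keywords to semantic web analysis, recommendation-engine ranking, heterogeneous source integration, and heterogeneous format integration. Below is a call for chapters from May 2010: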

*Introduction*

Scientific and economic organizations are confronted with handling an abundance of strategic information in their domain activities. One main challenge is to be able to find the right information quickly. In order to do so, organizations must master information access: getting relevant query results that are organized, sorted, and actionable. (Presenting retrieved data is no longer just a flat list: content on similar topics must be grouped together, ranked by a recommendation engine, and open to community interaction around the data.)


Recent technological progress in computer science, Web technologies, and the constantly evolving information available on the Internet has drastically changed the landscape of search and access to information. Current search engines employ advanced techniques involving machine learning, social networks and semantic analysis.

*Objectives of the Book*

The main goal of this book is to transfer the new research results from the fields of advanced computer sciences and information science in order to master the access to information. The readers will be able to have a better idea of the results in applied research. The achievement of relevant, organized, sorted and workable answers -- to name but a few -- from a search has become a daily need for enterprises and organizations, and, to a greater extent, for anyone. It does not consist only of accessing structured information as in standard databases; nor does it consist of searching for information strictly by means of a combination of keywords. It goes far beyond that. The information sought must be identifiable by the topics it covers, that is to say its textual, audio, video or graphical content. (Integrated presentation of heterogeneous formats: how can a well-designed GUI create a high-quality user experience?) This is not a new issue. However, recent technological advances have totally changed the techniques used. The new Web technologies, the emergence of Intranet systems and the abundance of information on the Internet have created the need for efficient search and information access tools.


*Recommended topics include, but /are not limited to/, the following:*

. Semantic Web

More and more content producers, as a result of the W3C recommendations on the semantic Web, index their databases with metadata or taxonomies (ontologies) (semantic web technology provides not only the data but also the relationships between the meanings of the data's fields) in order to allow the search engines to adapt to the semantic analyzers. Currently many algorithms are being developed for semantic information research systems that do not impose hit-or-miss keyword searches on the user.
. Generation of large-scale search engine index
. Video, audio and graphics indexing
. Query user interface: Controlled natural languages, natural language query, multilingual search, etc.

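. Index Data Structures: Suffix tree, tree, Inverted index, Ngram index, Term document matrix, etc. (These index data structures all help speed up retrieval to near-instant answers. This is a good list of keywords for technical orientation, from which one can learn the terminology of search engine indexing.)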

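. Multi-sources and multi-formats indexation (integrated presentation of heterogeneous sources and heterogeneous formats will be key to high-quality search engines of the future): Most recent search engines can index many different information sources, such as: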

- FTP servers,
- file systems,
- Web pages,
- DBMS such as Oracle, Sybase, DB2, SQL Server and others.
- Document-oriented databases such as Lotus Notes.
- Desktop applications files such as Microsoft Office suite (Word, PowerPoint),
RTF format, ...
- Adobe's Portable Document Format (PDF)
- PostScript (PS)
- LaTeX
- The UseNet archive (NNTP) and other deprecated bulletin board formats
- XML and derivatives like RSS
- SGML (this is more of a general protocol)

. Emergence of new axis in the Next Generation of Search Engines
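- Real-time search, (Today's search engines mostly run in batch mode with a time lag; how can search with zero lag be achieved? Only real-time search over a small, specific set of data can balance search time against usefulness.)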
- Local search,
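- GPS sensitive search, (LBS search combined with a phone's GPS. Google Maps has already built a very polished search engine on mobile phones that immediately finds nearby stores, and it is easy to use in Taiwan as well. I use a Nokia N97 mini with the separately installed Google Maps software and can instantly find a map of the IKEA near me, along with driving directions.)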
- Mobile search,
- Search in the Cloud,
- Search using Hadoop,
- Map reduce, etc.
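
To make one of the index data structures named in the topic list more concrete, here is a minimal sketch of an inverted index in Python. It is an illustration only, not the method of any particular search engine: the class name `InvertedIndex`, its methods, and the sample documents are all hypothetical, and tokenization is assumed to be simple lowercase whitespace splitting. Because documents are added one at a time, each becomes searchable immediately, which hints at how a small, specific data set can be searched without a batch delay.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: maps each term to the set of document IDs containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        """Index one document; it is searchable as soon as this returns."""
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return the IDs of documents that contain every term in the query."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = None
        for term in terms:
            postings = self._postings.get(term, set())
            # Intersect posting sets so only documents with all terms remain.
            result = set(postings) if result is None else result & postings
        return result


if __name__ == "__main__":
    index = InvertedIndex()
    index.add(1, "semantic web search")
    index.add(2, "real-time web search")
    print(index.search("web search"))
    print(index.search("semantic search"))
```

A lookup touches only the posting sets of the query terms rather than scanning every document, which is why this structure keeps keyword retrieval fast as the collection grows.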