User:OrenBochman/Search/Plan

From mediawiki.org

Milestone 1 - Working Prototype

  • Goal: Build and deploy a prototype to Labs

Tasks

  • Project Admin
    • Development stage: 10% Write a Test Plan
    • Development stage: 20% Finish Risk Assessment
  • Development stage: 10% Development (search engine components are listed below)
  • Development stage: 10% Prototype Deployment to Labs

Search Engine Components

  • Indexing
    • Development stage: 20% reading compressed dumps
    • Development stage: 00% reading HTML from cache
    • Development stage: 00% integrate HTML cleaning (Tika)
  • User Interface
    • Development stage: 00% JavaScript user interface
  • Admin & Deployment
    • Development stage: 80% continuous integration
    • Development stage: 00% configuration management
    • Development stage: 00% puppetise environment
  • Testing
    • Development stage: 30% Unit Test in CI
    • Development stage: 00% preliminary benchmarking reports via R in docs folder
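
The "reading compressed dumps" item can be illustrated with a minimal sketch. It assumes the dump is a bzip2-compressed XML export whose pages look like <page><title>…</title><revision><text>…</text></revision></page>; the function names iter_pages and iter_dump are hypothetical and are not part of the planned code base.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) for each <page> element in an XML export stream."""
    title, text = "", ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = "", ""
            elem.clear()  # drop the finished page's children to limit memory use


def iter_dump(path):
    """Stream pages from a .bz2 dump without unpacking it to disk first."""
    with bz2.open(path, "rb") as stream:
        yield from iter_pages(stream)
```

Streaming the pages this way keeps only one page in memory at a time, which matters for full-size dumps; an indexer would consume iter_dump(path) in a loop and hand each (title, text) pair to the analysis chain.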

Milestone 2 - Feature Compatibility with Lucene_Search 2.1

  • Goal: Match most of the Lucene_Search 2.1 features
  • Use TermPositionVectors
  • Fast Highlighting
  • Update analysis chain to work with the current API version
    • HTML Analyzer
    • Processing Wikicode - wikitokenizer
    • Lowercase
    • Hyperlink
    • Aliases
    • Title Shingles
  • Language support
    • Accent Normalization
    • Snowball
    • English
    • CJK filter
    • Serbian
    • Vietnamese
    • Russian
  • Wordnet
  • Spelling/Did you mean
    • Admin
  • JMX support
  • More Benchmarking Reports
  • Integrate Existing UI
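
Three of the analysis-chain items above (Lowercase, Accent Normalization, Title Shingles) can be sketched in a few lines. This is only an illustration of the ideas using the Python standard library, not the Lucene filters the plan refers to; the function names and the default shingle size of 2 are assumptions.

```python
import unicodedata


def fold_accents(text):
    """Decompose characters (NFKD) and drop combining marks, e.g. 'é' -> 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def shingles(tokens, size=2):
    """Return word n-grams ("shingles") of the given size, joined by a space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def analyze_title(title, size=2):
    """Lowercase, fold accents, split on whitespace, then emit shingles."""
    tokens = fold_accents(title.lower()).split()
    return shingles(tokens, size)
```

For example, analyze_title("Café au Lait") produces the shingles "cafe au" and "au lait". Note that accent folding by decomposition only handles characters with a canonical decomposition; letters such as "ø" or "ł" would need an explicit mapping table.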

Milestone 3: Production

  • Deployment to production environment

These should be operational from the prototype stage

  • Sharding
  • Replication
  • Update Mechanism (Incremental)
    • Get updates from the metadata repository
    • Get data via maintenance/dump.php
    • BitTorrent-based distribution of search index update dumps
  • Configuration Management
    • Standalone - LocalSettings.php
    • Multiple - CommonSettings.php
  • Real-time indexing
    • minimize edit-to-search time
    • update to special page on search
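
The plan does not say how documents are assigned to shards. One common approach, shown here purely as an assumption, is to hash a stable key such as the page title and take the result modulo the number of shards; shard_for and partition are hypothetical names.

```python
import zlib
from collections import defaultdict


def shard_for(title, num_shards):
    """Map a page title to a shard index in [0, num_shards) via CRC32."""
    return zlib.crc32(title.encode("utf-8")) % num_shards


def partition(titles, num_shards):
    """Group titles by the shard that would index them."""
    shards = defaultdict(list)
    for title in titles:
        shards[shard_for(title, num_shards)].append(title)
    return dict(shards)
```

Because the hash depends only on the title, an incremental update for a page is always routed to the shard that already holds it. The trade-off of plain modulo hashing is that changing the shard count moves most documents to a different shard.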

Phase 4: NG Features

  • UI improvements
  • More Language support
  • Result Clustering support
  • Result Faceting support
  • Disambiguation support
  • Search Analytics
  • Morphological Search
  • Ontology
  • Semantic Search
  • Entity Extraction
  • Integration of NLP tools
  • Memee