MediaWiki Developer Meet-Up 2009/Notes/WikiWord

  • WikiWord extracts a thesaurus from Wikipedia.
  • Thesaurus supplies relations:
    • term <-> concept (meaning relation)
    • concept <-> concept (related, similar, broader, narrower)
    • concepts = wiki articles
    • terms = title, redirect, anchor text, sort key, etc.
  • multilingual
    • concepts from multiple Wikipedias combined
    • terms in multiple languages referring to one concept
  • useful for indexing, disambiguation
  • plan: multilingual image search for Commons (German blog post)
  • ideas for improvement:
    • get magic names and patterns from pywikipediabot config
    • use incremental updates as much as possible
    • look at co-occurrence in paragraphs, look at co-co-occurrence
    • for image search: index by image caption (used images)
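
Sketch (not from the talk): one way to picture the relations listed above is a small in-memory model in which each concept stands for one wiki article, terms are attached per language, and a reverse index from term to concepts supports disambiguation. The class names, method names, and concept ids below are hypothetical and are not taken from WikiWord itself.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Concept:
    # One concept corresponds to one wiki article; the id format is an assumption.
    concept_id: str
    # Terms per language code, e.g. {"en": {"Jaguar"}, "de": {"Jaguar"}}.
    terms: dict = field(default_factory=lambda: defaultdict(set))
    # concept <-> concept relations, stored as sets of concept ids.
    broader: set = field(default_factory=set)
    narrower: set = field(default_factory=set)
    related: set = field(default_factory=set)
    similar: set = field(default_factory=set)


class Thesaurus:
    def __init__(self):
        self.concepts = {}
        # Reverse index: (language, case-folded term) -> concept ids.
        self.by_term = defaultdict(set)

    def _concept(self, concept_id):
        return self.concepts.setdefault(concept_id, Concept(concept_id))

    def add_term(self, concept_id, lang, term):
        """Record that `term` (title, redirect, anchor text, ...) means the concept."""
        self._concept(concept_id).terms[lang].add(term)
        self.by_term[(lang, term.casefold())].add(concept_id)

    def add_broader(self, narrow_id, broad_id):
        """Record broader/narrower as the two directions of one relation."""
        self._concept(narrow_id).broader.add(broad_id)
        self._concept(broad_id).narrower.add(narrow_id)

    def meanings(self, lang, term):
        """Return all concepts a term may refer to (the disambiguation candidates)."""
        return sorted(self.by_term.get((lang, term.casefold()), set()))


def build_demo():
    # Made-up example data, for illustration only.
    t = Thesaurus()
    t.add_term("jaguar_animal", "en", "Jaguar")
    t.add_term("jaguar_animal", "en", "Panthera onca")
    t.add_term("jaguar_animal", "de", "Jaguar")
    t.add_term("jaguar_cars", "en", "Jaguar")
    t.add_broader("jaguar_animal", "big_cat")
    return t
```

With the demo data, looking up the English term "Jaguar" yields two candidate concepts, while the German term yields one; choosing among candidates is the disambiguation step and is left out of this sketch.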