User:OrenBochman

  • Name: Oren Bochman
  • Main Project title: "Wikipedia Search"
  • Contact information:
  • my page on Wiktionary
This user is a proud MediaWiki extension developer and participant in WikiProject Extensions.

These Are A Few Of My Favourite Things

Ant [1] ANTLR [2] w:ApacheBench [3] Apertium [4] Bugzilla [5] Code Review [6]
carrot2 [7] DAWG[8] Etherpad Lite[9] Jenkins [10] Lucene[11] Maven[12]
Nutch[13] Open Relevance [14] R[15] Subversion [16] SOLR[17] Tika[18]
Translate Wiki [19] Vogella On Java[20] Wikilabs[21] UIMA[22] Solarium [23]

Quick IRC Channel Links

mediawiki mediawiki-dev mediawiki-ops wikimedia-tech wiktionary Openzim
Kiwix Lucene Solr Hadoop Nutch Semantic Media Wiki
Search NG Project
Todo List Operational Plan Test Plan Risk Assessment
NG Search Spec Search NG Analytics NLP Tools Search Tools
Search Labs Configuration Lucene-search Spec Old Code Review
Q&A



TODO

  • Set up SCP support - https://wikitech.wikimedia.org/wiki/User:Wikinaut/Help:Access_to_instances_with_PuTTY_and_WinSCP
  • Set up port forwarding with the Moodle 2.5 instance
  • Set up tmux

Translate Wiki

  • Solr on Labs
  • Solarium integration

Berlin Hackathon

Search:

OAI Extension

This extension needs to be updated to work a little differently. It needs to provide the content info.

  • Trigger updates on link change.
  • Output to be modified to the following format:
Output Schema
Content Type
DataDump (HTML for pages/JSON for wikidata / URI for files)
Metadata in JSON - pages listed below
Page Meta Data
Page Id
RevId
Title
Internal link List
External Links List
InterWiki Links List
Category List Visible
Category List Hidden
InterWiki List
GeoData List
Edit Count
Editor Count
Cache Hits Weekly
Cache Hits Weekly Normalised
AfD Nomination
Project List
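
A minimal sketch of how one record in the output schema above might be serialised to JSON. The key names (`page_id`, `internal_links`, and so on), the default values, and the `build_record` helper are all hypothetical, chosen only to mirror the field list; the notes do not specify the extension's actual keys or types.

```python
import json

# Hypothetical key names derived from the field list above; the real
# extension output keys are not specified in these notes.
LIST_FIELDS = (
    "internal_links", "external_links", "interwiki_links",
    "visible_categories", "hidden_categories", "interwiki",
    "geodata", "projects",
)
COUNT_FIELDS = ("edit_count", "editor_count", "cache_hits_weekly")


def build_record(content_type, data_dump, page_id, rev_id, title, **extra):
    """Return one output record: content type, data dump and page metadata."""
    metadata = {"page_id": page_id, "rev_id": rev_id, "title": title}
    metadata.update({name: [] for name in LIST_FIELDS})
    metadata.update({name: 0 for name in COUNT_FIELDS})
    metadata["cache_hits_weekly_normalised"] = 0.0
    metadata["afd_nomination"] = False
    # Reject keys that are not part of the (assumed) schema.
    unknown = set(extra) - set(metadata)
    if unknown:
        raise KeyError(f"unknown metadata fields: {sorted(unknown)}")
    metadata.update(extra)
    return {
        "content_type": content_type,
        "data_dump": data_dump,
        "metadata": metadata,
    }


record = build_record(
    "page", "<p>Example HTML</p>", page_id=1, rev_id=10, title="Example",
    internal_links=["Main Page"], edit_count=3,
)
print(json.dumps(record, indent=2))
```

For Wikidata items the `data_dump` value would be a JSON document, and for files a URI, as listed under DataDump.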

WikiData

Schedule video conference with Bugmeister

WikiData MetaData
Title
RevId
Internal Links
External Links
InterWiki
Categories
InterWiki
RevId
WikiData Page Data
Title
RevId
Internal Links
External Links
InterWiki
Categories
InterWiki
RevId

Hackathon 2013

  • Develop
    • Tron bot - Quality analytics + advice for new articles
    • Orwell01 - Sandbox edit test
      • Needs a ~/.description
    • Orwell02 - Group edit export to Gephi
    • Orwell03 - plsi + grammar check
    • Orwell04 - Configurable checks
    • SVG Comics gadget to display animated SVG comics, based on Arun Ganesh's D3-enhanced Map gadget.
    • Bootstrap skin for Moodle based on ....
  • Contacts:
    1. User:Ocaasi (roommate, USA) - does many projects, including The Wikipedia Library and The Wikipedia Adventure.
    2. Martijn Hoekstra - helps with AfC stats.
    3. User:Luis Felipe Schenone (roommate, Argentina) - helped with widget design.
    4. Evan Rosen (roommate) - wiki metrics developer from the analytics team.
    5. Chris Steipp - Senior Security Engineer working on OpenID & OAuth development.
    6. User:Yug - fr Wiktionary Wikidata migration & map design. Recommends http://commons.wikimedia.org/wiki/File:Israel_location_map.svg as the base SVG map for Israel.
    7. wikt:fr:User:Darkdadaah - also from fr Wiktionary.
    8. User:Kolossos - Czech dev in the Czech Wikipedia; we met in Berlin and Amsterdam (likes puzzles).
    9. Peter Bena - Czech project lead of Huggle, who is a volunteer Labs ops and knows the tool migration stuff.
    10. User:Yurik - works on Wikipedia Zero.
    11. Magioladitis - lead developer of AutoWikiBrowser.
    12. User:Erik Zachte - of the analytics team; makes monthly aggregates of Wikipedia dumps.
    13. User:Planemad (Arun Ganesh) - map developer.
    14. Susanna anas - PhD, interested in maps and memorabilia ...
    15. User:Kelson - openZIM and Kiwix!
    16. Antoine Musso - Jenkins and search.
    17. User:MarkAHershberger - old-timer like me.
    18. User:Henna - report issues with Vagrant (64-bit Python).
    19. User:TMg
    20. Merlijn van Deen - Pywikipedia bot assistance.
    21. Maarten Dammers - WLM Solr connection.
    22. Sebastiaan - interested in video teaching scripts.
    23. The Wiki Loves Art guy.
    24. kimmo.virtanen@gmail.com Kimmo (roommate) from Finland.
    25. Mike Rubio mikerubio@gmail.com (roommate) from the Philippines.
    26. user:lyhana8 - French developer of the Wiktionary project migration to Wikidata.

Extension Ideas

  • LaTeX Diagram Builder (LaTeX to SVG script)
    • Takes a LaTeX diagram in a <latexD></latexD> tag.
    • Outputs an SVG of the diagram.
    • Easy to do since LaTeX can run as a command-line application.
  • Gambit extension
    • Takes an extensive-form game.
    • Generates a diagram.
    • Generates solutions.
    • Easy to do since Gambit works as a command-line application.
    • Cannot make ESS reports.
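
A rough sketch of the command-line pipeline behind the LaTeX Diagram Builder idea above, assuming the `latex` and `dvisvgm` programs are installed. The helper names (`extract_diagrams`, `wrap_document`, `build_commands`, `render_svg`) and the minimal `article` wrapper are hypothetical; a real extension would do this from PHP inside a MediaWiki tag hook and would need to sandbox the LaTeX input.

```python
import re
import subprocess
import tempfile
from pathlib import Path

# Matches the body of each <latexD>...</latexD> tag in the page source.
TAG = re.compile(r"<latexD>(.*?)</latexD>", re.DOTALL)


def extract_diagrams(wikitext):
    """Return the LaTeX source found inside each <latexD> tag."""
    return [body.strip() for body in TAG.findall(wikitext)]


def wrap_document(latex_body):
    """Wrap a diagram body in a minimal document (assumed preamble)."""
    return "\n".join([
        r"\documentclass{article}",
        r"\pagestyle{empty}",
        r"\begin{document}",
        latex_body,
        r"\end{document}",
        "",
    ])


def build_commands(tex_path, svg_path):
    """Build the two command lines: LaTeX to DVI, then DVI to SVG."""
    tex_path = Path(tex_path)
    dvi_path = tex_path.with_suffix(".dvi")
    return [
        ["latex", "-interaction=nonstopmode",
         f"-output-directory={tex_path.parent}", str(tex_path)],
        ["dvisvgm", "--no-fonts", "-o", str(svg_path), str(dvi_path)],
    ]


def render_svg(latex_body):
    """Run the pipeline in a temporary directory and return the SVG text."""
    with tempfile.TemporaryDirectory() as workdir:
        tex_path = Path(workdir) / "diagram.tex"
        svg_path = Path(workdir) / "diagram.svg"
        tex_path.write_text(wrap_document(latex_body))
        for command in build_commands(tex_path, svg_path):
            subprocess.run(command, check=True, cwd=workdir)
        return svg_path.read_text()
```

Calling `render_svg(body)` for each item returned by `extract_diagrams(page_text)` would yield one SVG per tag. The same extract, build, run shape should carry over to the Gambit idea, with Gambit's command-line tools in place of `latex`.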

Conferences

SOLR

security: [1]

Stuff

  • Cooperate with
    • Google on NLP
    • Academia
    • Apertium
    • HFST

Summer Of Code

Lucene Lemma Analyzers based on Morphology Extraction from Wikipedia Text

  • Part 1: use and expand induction software to process existing languages.
  1. Lemmas to word sense:
    1. existing works
    2. semantic frames - the verb "think" (about) takes a noun complement XXX. In Hungarian this is more explicit. This can be a powerful format for representing knowledge in sentences, and could be used to convert text to relations (go, go to XXX, go from XXX to YYY). Not many relations are needed: verbs of motion, events,
    3. logic frames - map simple sentences to a Prolog-like logic structure
  • Part 2: extract semantic frames from a (part-of-speech tagged) corpus.
  • Deliverables:
  1. semantic networks used in Wikipedia
  2. search and retrieve sample sentences for semantic frame patterns
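
To make the frame idea above concrete, here is a toy sketch that maps the three "go" patterns mentioned (go, go to XXX, go from XXX to YYY) to relations and then to a Prolog-like fact string. It uses plain regular expressions over raw text rather than a part-of-speech tagged corpus, and the frame names and helper functions are invented for illustration only.

```python
import re

# Verb forms are listed by hand here; a real system would match on lemmas.
GO = r"\b(?:go|goes|went)"

# Most specific frame first, so "go from X to Y" wins over "go to Y".
FRAMES = [
    ("go_from_to", re.compile(GO + r"\s+from\s+(\w+)\s+to\s+(\w+)", re.I)),
    ("go_to", re.compile(GO + r"\s+to\s+(\w+)", re.I)),
    ("go", re.compile(GO + r"\b", re.I)),
]


def extract_relation(sentence):
    """Return (frame_name, arg, ...) for the first matching frame, or None."""
    for name, pattern in FRAMES:
        match = pattern.search(sentence)
        if match:
            return (name, *match.groups())
    return None


def to_fact(relation):
    """Format a relation tuple as a Prolog-like fact string."""
    name, *args = relation
    return f"{name}({', '.join(arg.lower() for arg in args)})"


relation = extract_relation("She went from Haifa to Berlin")
print(to_fact(relation))
```

Searching a corpus for sample sentences of a given frame, as in the second deliverable, would amount to running one of these patterns over every sentence and keeping the matches.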

Lucene - Automatic Query Expansion System

Use SVD or other methods to build cross-language word nets.

User Fingerprinting

  1. anonymous fingerprinting for:
    • free unregistered editor contribution.
    • sock puppet detection
  • probably not a good GSoC concept

Lucene - NG Wiki Parser Filter

Integrate the cutting-edge parser as a Lucene filter to allow offline indexing of wiki source.

  • Deliverable: an up-to-date Wikipedia parser.
  • Problem: no specs.
  • Problem: templates.

This will probably be one of my own projects if I get to work full time.

UIMA Content Extraction From Talk Pages

Use UIMA to automate content extraction from Talk and User Talk pages. This is to facilitate tracking of action on various policies. Produce a Q&A system.

This is on the fringe of content analytics.


Corpus Stuff

Foot notes

  1. Ant
  2. Grammars
  3. Benchmark
  4. Machine Translation
  5. QA
  6. Media Wiki's
  7. clustering
  8. data structure
  9. real time collaboration
  10. CI
  11. search lib
  12. language detection
  13. checking external links
  14. testing search
  15. Statistics & data mining
  16. source control
  17. search engine
  18. language detection
  19. translation memory
  20. tutorials
  21. testing
  22. content analytics framework
  23. SOLR PHP integration

Subpages

OrenBochman//Search/Resources OrenBochman//Search/Test Plan
OrenBochman/Bugs OrenBochman/Contacts OrenBochman/Dev Contacts
OrenBochman/Features OrenBochman/Header OrenBochman/HunSig
OrenBochman/HunSig/Development OrenBochman/HunSig/Research OrenBochman/Ideas
OrenBochman/Installation OrenBochman/Introduction OrenBochman/Lucene
OrenBochman/Main OrenBochman/ParserNG
OrenBochman/ParserNG/Preprocessor OrenBochman/ParserNG/Preprocessor Antlr OrenBochman/ParserNG/Sanitizer Antlr
OrenBochman/ParserNG/Tests OrenBochman/ParserNG/Tests/Test1 OrenBochman/ParserNG/Tests/Test2
OrenBochman/ParserNG/Tests/Test3 OrenBochman/ParserNG/Tests/Test4 OrenBochman/ParserNG/Transliterator Antlr
OrenBochman/ParserNG/WikiTable OrenBochman/ParserNG/antlr OrenBochman/Scratch
OrenBochman/Search OrenBochman/Search/Analytics OrenBochman/Search/BrainStorm
OrenBochman/Search/Conf OrenBochman/Search/Features
OrenBochman/Search/Labs OrenBochman/Search/NGSpec
OrenBochman/Search/NLP Tools OrenBochman/Search/NLP Tools/Morphology
OrenBochman/Search/Plan OrenBochman/Search/Porting
OrenBochman/Search/Risk Assesssment OrenBochman/Search/Spec OrenBochman/Search/Tab
OrenBochman/Search/Test Plan OrenBochman/Search/Todo OrenBochman/Search/Tools
OrenBochman/Search/Tools/ OrenBochman/SearchTools/Awk Antlr
OrenBochman/Social Wiki OrenBochman/Sul
OrenBochman/WikiJournal OrenBochman/bots OrenBochman/common.css
OrenBochman/common.js OrenBochman/new ssh key OrenBochman/skin