User:Physikerwelt/Sandbox/MathSearchTask

= Free Wikipedia Subtask =

Dataset overview
Three formats are available:
 * HTML
   * Standard output of future Wikipedia
   * Parsed with PHP
 * XHTML
   * Standard output of new Visual Editor editing
   * Parsed with JavaScript (Parsoid)
 * WikiText
   * Raw source data, not parsed at all

Format of math elements
The id attribute is composed as follows:
 * ID = math.$PageID.$EquationID
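A minimal sketch of how a system might split and rebuild such ids. It assumes that both $PageID and $EquationID are non-negative decimal integers, which the task description does not state explicitly; the names `MathId`, `parse_math_id` and `format_math_id` are hypothetical.

```python
import re
from typing import NamedTuple

# Assumption: both id components are non-negative decimal integers.
MATH_ID = re.compile(r"math\.(\d+)\.(\d+)")


class MathId(NamedTuple):
    page_id: int
    equation_id: int


def parse_math_id(value: str) -> MathId:
    """Split an id of the form math.$PageID.$EquationID into its parts."""
    match = MATH_ID.fullmatch(value)
    if match is None:
        raise ValueError(f"not a math element id: {value!r}")
    return MathId(int(match.group(1)), int(match.group(2)))


def format_math_id(page_id: int, equation_id: int) -> str:
    """Compose the id attribute from a page id and an equation id."""
    return f"math.{page_id}.{equation_id}"
```

The parsed page id can then be looked up in the Page-ID to Page-Title mapping.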

An additional list with Page-ID to Page-Title mappings can be found here:

Wrapper for the page content
In HTML format, the PHP output is wrapped in the following HTML elements:

Query 1: 1/〈x〉 ≤〈1/x〉

 * Uses standard Content MathML
 * Systems must use only the information given in the query
 * Systems must decide on result format
 * Pre-processing of the Wikipedia input data is necessary
 * Other languages can be used

Query 2: E=mc²

 * Content Query
 * Uses content dictionaries
 * Systems must use only the information given in the query, not the presentation layout

Performance queries

 * NTCIR10 Query format
 * The link provided must be included in the result
 * Participants should report the position of the reference hit
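One way to determine the position of the reference hit in a ranked result list is sketched below. It assumes that hits are identified by strings (for example math element ids) and that positions are counted from 1; neither is fixed by the task description, and the function name is hypothetical.

```python
from typing import Optional, Sequence


def reference_position(ranked_hits: Sequence[str], reference: str) -> Optional[int]:
    """Return the 1-based rank of the reference hit, or None if it is absent."""
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit == reference:
            return rank
    return None
```

A return value of None signals that the reference hit was not found, so the requirement that the provided link be included in the result is not met for that query.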

Final Presentations

 * No evaluation
 * Presentation judged by audience
 * Requirements
   * Length: 30 min (hard cut; not an official NTCIR session)


 * Structure (recommended)
   * System overview
   * Performance Queries
   * Demonstration Queries
   * Summary

Sample Structure

 * System overview
   * Test hardware description
   * Software used
   * Licenses and cost
   * Reproducibility of the setup
   * Effort spent, i.e. development time
 * Performance Queries
   * Time measurement
     * Indexing
     * Queries
     * Overall time, incl. reading the input and writing the result in NTCIR format or similar
   * Coverage
     * Were all seeds found?
     * Average position
   * Result format
     * How much information is returned to the user?
     * How is the portion of the result selected?
     * Goal: answer the user's information need very quickly
 * Demonstration Queries
   * What are the most impressive results?
   * How were those results achieved?
   * Why does that principle work independently of the particular queries?
   * Which key information was missing to achieve better results?
   * What are further steps?
 * Summary
   * Key features of the presented system
   * What are the differences from other systems?
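The coverage figures in the sample structure could be computed along the following lines. This is a sketch under assumptions: "seeds" are taken to be the reference hits of the performance queries, a missing seed is recorded as None, and the average position is taken over found seeds only. The function name and the input layout are hypothetical.

```python
from typing import Dict, Optional, Tuple


def summarize_positions(
    positions: Dict[str, Optional[int]]
) -> Tuple[float, Optional[float]]:
    """Return (coverage, average position) for a query -> position mapping.

    Coverage is the fraction of queries whose seed was found. The average
    position ignores queries with a missing seed (an assumption) and is
    None when no seed was found at all.
    """
    found = [p for p in positions.values() if p is not None]
    coverage = len(found) / len(positions) if positions else 0.0
    average = sum(found) / len(found) if found else None
    return coverage, average
```

For example, `summarize_positions({"q1": 1, "q2": 3, "q3": None})` yields a coverage of two thirds and an average position of 2.0.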

ToDos

 * Find a public web server
 * Update the URLs
 * Publish Task Description (as you see here)
 * Make video announcement (this presentation); probably a wontfix?