User:OrenBochman/Bugs

=Bugs Fix Plan for Search=

Bugzilla Links

 * bugs
 * patches

Specifying This Behaviour

 * 1) use case 1: readers who do not want to see templates in their search results.
 * 2) use case 2: editors who want to find a template to use (knowing its name).
 * 3) use case 3: editors who want to find a suitable template in a category.
 * 4) use case 4: a template dev would be interested in finding all the pages where a template is used.
 * 5) use case 5: a template dev would be interested in finding all templates that use a template.
 * 6) use case 6: a template dev would be interested in finding all templates in a template category.
 * 7) use case 7: admins who want to find all pages using a template.
 * 8) use case 8: admins who want to find all pages using a template with certain parameter values.
 * 9) use case 9: admins who want to find all pages using non-existent templates.
 * 10) use case 10: users who want to find all pages containing arbitrary code.

Open Questions

 * are there any more use cases?
 * how common are these situations?
 * what is the current practice for the above use cases?
   * use case 2: Special:WhatLinksHere.
   * use case 3: look at the templates category.
 * should the search results differentiate between templates that exist and templates that don't?
 * what about transclusion from outside the Template namespace?
 * when templates do not contain template syntax, should they be shown?
 * when a template is not in the Template namespace (say, in a user's), how can we know it is a template?

Analysis
Here are some possible approaches to implementing this feature.


 * Option 1: Quick and Dirty
   * storing the raw page's source in a   with unexpanded source
   * querying with a   and  .
   * it will double the index size, a WFTU per wiki.
   * it requires no UI change, just extra syntax and documentation.
   *  → to search for wiki source text
   *  → to search for exact wiki source text
   *  → to search for wiki source text
   *  → to search for wiki source text
   *  → to search for wiki source text
   * it may require its own ranking.
 * Option 2: Elegant
   * indexing and storing the page's parsed source in a  
   * and querying with a   to search the source
   * it would increase the index by a factor of a WFTU.
   * it could require a UI change.
   * it could require its own ranking.
 * Option 3: Efficient
   * indexing the page's parsed source in a flat  
   * querying using a   which would provide markup search capability.
   * it would increase the index by log(WFTU) (this is a guess).
   * it could require a UI change.
   * it could require its own ranking.
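
To make the raw-source idea of Option 1 concrete, here is a minimal sketch of a plain in-memory stand-in for such an index. It is not the Lucene implementation and not lsearch code: the class `TemplateUsageIndex` and all its method names are hypothetical, and the regular expression is deliberately naive (it treats magic words such as PAGENAME as templates, skips names containing a colon, and does not know about nowiki or comments). It only illustrates how use cases 4, 7 and 9 could be answered from unexpanded source.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of lsearch: maps template names to the pages that call them.
class TemplateUsageIndex {
    // Matches the name in "{{Name|...}}" or "{{Name}}".
    // The lookbehind skips parameters ("{{{1}}}"); '#' and ':' skip parser functions.
    private static final Pattern CALL =
        Pattern.compile("(?<!\\{)\\{\\{\\s*([^{}|#:\\n]+?)\\s*(?=\\||\\}\\})");

    private final Map<String, Set<String>> pagesByTemplate = new TreeMap<>();

    // Scans the unexpanded wiki source of one page and records every template call.
    void addPage(String title, String rawSource) {
        Matcher m = CALL.matcher(rawSource);
        while (m.find()) {
            pagesByTemplate
                .computeIfAbsent(normalize(m.group(1)), k -> new TreeSet<>())
                .add(title);
        }
    }

    // Use cases 4 and 7: all pages where a template is used.
    Set<String> pagesUsing(String template) {
        return pagesByTemplate.getOrDefault(normalize(template), Collections.emptySet());
    }

    // Use case 9: template names that are called somewhere but are not in the given set.
    Set<String> templatesNotIn(Set<String> existingTemplates) {
        Set<String> missing = new TreeSet<>(pagesByTemplate.keySet());
        for (String t : existingTemplates) {
            missing.remove(normalize(t));
        }
        return missing;
    }

    // Underscores equal spaces and the first letter is case-insensitive in page titles.
    static String normalize(String name) {
        String n = name.trim().replace('_', ' ');
        return n.isEmpty() ? n : Character.toUpperCase(n.charAt(0)) + n.substring(1);
    }
}
```

A real index would need its own query syntax and ranking, as noted in the options above; the sketch only shows the lookup side.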

Option 1 will likely be inefficient. To index wiki code effectively, a (Java) parser for wiki code would be required. The requirements are a parser that can process and tag:
 * templates
 * template parameters
 * magic words
 * parser functions
 * extensions
 * comments
 * nowiki
 * includeonly
 * noinclude
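
As a rough illustration of what "process and tag" could mean, here is a regex-based sketch that tags a few of the constructs listed above. It is an assumption-laden toy, not the preprocessor mentioned below: the class `WikiMarkupTagger`, the `Kind` enum and the `Span` type are hypothetical names, only innermost (non-nested) brace constructs are recognised, and magic words and extension tags are not handled at all. Nested templates are exactly where a real parser becomes necessary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: tags non-nested wiki markup constructs with their offsets.
class WikiMarkupTagger {
    // Order matters: it mirrors the capture groups of TOKEN below.
    enum Kind { COMMENT, NOWIKI, NOINCLUDE, INCLUDEONLY, PARAMETER, PARSER_FUNCTION, TEMPLATE }

    static final class Span {
        final Kind kind;
        final int start;
        final int end;
        final String text;

        Span(Kind kind, int start, int end, String text) {
            this.kind = kind;
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    // One capture group per Kind, in enum order; the first alternative that matches wins.
    private static final Pattern TOKEN = Pattern.compile(
          "(<!--.*?-->)"
        + "|(<nowiki>.*?</nowiki>)"
        + "|(<noinclude>.*?</noinclude>)"
        + "|(<includeonly>.*?</includeonly>)"
        + "|(\\{\\{\\{[^{}]*\\}\\}\\})"      // {{{parameter|default}}}
        + "|(\\{\\{\\s*#[^{}]*\\}\\})"       // {{#if: ... }}
        + "|(\\{\\{[^{}]*\\}\\})",           // {{Template|arg=value}}
        Pattern.DOTALL);

    // Returns the tagged spans in the order they appear in the source.
    static List<Span> tag(String source) {
        List<Span> spans = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            for (Kind kind : Kind.values()) {
                if (m.group(kind.ordinal() + 1) != null) {
                    spans.add(new Span(kind, m.start(), m.end(), m.group()));
                    break;
                }
            }
        }
        return spans;
    }
}
```

Because a whole nowiki section is consumed as one span, template syntax inside it is not tagged as a template, which is the behaviour an indexer would want.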


 * 1) I have been doing some work on writing a preprocessor, but the work is far from over; it could be completed to do this task.

Ranking & User Interface

 * it is possible to avoid UI change by adding a new search syntax
 * if the source search feature functions as a stand-alone application, its ranking will need just a little tweaking.
 * if it must be integrated with general search, it will require a more significant effort involving:
   * specification.
   * design.
   * implementation.

Specifying This Behaviour
Highlighted text in search results is sometimes corrupted when showing multibyte characters.

Open Questions

 * where is this behaviour taking place?
   * (analyzer) during indexing
   * (analyzer) during retrieval
   * (highlighter) during result rendering
   * later in PHP

Analysis

 * investigate by unit testing
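
One way to start the unit-testing investigation is a probe that reproduces a plausible failure mode in isolation. The sketch below assumes, without evidence from the bug itself, that the corruption comes from cutting UTF-8 text at a byte offset that falls inside a multibyte character. The class `SnippetProbe` and its methods are hypothetical and are not taken from lsearch or from the PHP side.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical probe: shows how a byte-offset cut can corrupt a multibyte character.
class SnippetProbe {
    static final char REPLACEMENT = '\uFFFD';

    // Cuts the UTF-8 bytes at maxBytes with no regard for character boundaries.
    // A cut inside a multibyte sequence decodes to U+FFFD (the suspected bug pattern).
    static String cutNaive(String text, int maxBytes) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        int len = Math.min(maxBytes, utf8.length);
        return new String(utf8, 0, len, StandardCharsets.UTF_8);
    }

    // Same cut, but backs up to the start of the character that would be split.
    static String cutSafe(String text, int maxBytes) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        int len = Math.min(maxBytes, utf8.length);
        // UTF-8 continuation bytes have the bit pattern 10xxxxxx.
        while (len > 0 && len < utf8.length && (utf8[len] & 0xC0) == 0x80) {
            len--;
        }
        return new String(utf8, 0, len, StandardCharsets.UTF_8);
    }

    // A snippet containing U+FFFD is a strong hint that decoding went wrong somewhere.
    static boolean looksCorrupted(String snippet) {
        return snippet.indexOf(REPLACEMENT) >= 0;
    }
}
```

Running `looksCorrupted` over the output of each stage (analyzer, highlighter, PHP rendering) with the same multibyte input would narrow down where the damage happens, whatever the actual cause turns out to be.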

Specifying This Behaviour
when running the update script the DTD download fails with "Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"

This is the explanation given by w3.org for the 503 response code:

10.5.4 503 Service Unavailable

The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header. If no Retry-After is given, the client SHOULD handle the response as it would for a 500 response.

Note: The existence of the 503 status code does not imply that a server must use it when becoming overloaded. Some servers may wish to simply refuse the connection.
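
Read as client behaviour, the quoted text suggests waiting and retrying rather than failing outright. The following is a minimal sketch of how a retry delay could be computed; the class `RetryDelay`, its constants and the backoff values are illustrative assumptions and are not part of the update script. Only the delta-seconds form of Retry-After is handled here; the HTTP-date form falls through to the backoff.

```java
// Hypothetical helper: decides how long to wait before retrying after a 503.
class RetryDelay {
    static final long BASE_DELAY_MS = 1_000L;   // assumed starting delay
    static final long MAX_DELAY_MS = 60_000L;   // assumed upper bound

    // retryAfterHeader: value of the Retry-After header, or null if absent.
    // attempt: number of retries already made (0 for the first retry).
    static long delayMillis(String retryAfterHeader, int attempt) {
        if (retryAfterHeader != null) {
            try {
                long seconds = Long.parseLong(retryAfterHeader.trim());
                if (seconds >= 0) {
                    // Clamp before multiplying so huge values cannot overflow.
                    return Math.min(seconds, MAX_DELAY_MS / 1000L) * 1000L;
                }
            } catch (NumberFormatException e) {
                // Not a number of seconds (possibly an HTTP-date): use the backoff below.
            }
        }
        // Exponential backoff: 1s, 2s, 4s, ... capped at MAX_DELAY_MS.
        int shift = Math.max(0, Math.min(attempt, 10));
        return Math.min(BASE_DELAY_MS << shift, MAX_DELAY_MS);
    }
}
```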

Open Questions

 * how to reproduce the error?

Analysis
Looking at the stack trace, the error occurs in:
 * org.wikimedia.lsearch.oai.OAIParser.parse(OAIParser.java:64) called by
 * org.wikimedia.lsearch.oai.OAIHarvester.read(OAIHarvester.java:64) called by
 * org.wikimedia.lsearch.oai.IncrementalUpdater line:191
 * workarounds
 * use  instead of   -- how to tell xerces
 * try to clear the proxy
 * testing
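
On the "how to tell xerces" question, one standard mechanism is a SAX `EntityResolver`, which lets the parser be handed a substitute for the DTD instead of fetching it from w3.org. The sketch below uses only the JDK's built-in JAXP parser; the class `OfflineDtdParsing` is hypothetical, and whether `OAIParser` can be configured this way is an assumption that has not been checked against the lsearch source. Note that substituting an empty DTD means named XHTML entities such as `&nbsp;` would no longer be defined, so a local copy of the DTD may be the better substitute.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical sketch: parse XML that references a W3C DTD without touching the network.
class OfflineDtdParsing {
    // Answers requests for w3.org entities with an empty document.
    // Returning null tells the parser to resolve the entity in its default way.
    static final EntityResolver NO_W3C_FETCH = (publicId, systemId) -> {
        if (systemId != null && systemId.startsWith("http://www.w3.org/")) {
            return new InputSource(new StringReader(""));
        }
        return null;
    };

    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(NO_W3C_FETCH);
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }
}
```

A unit test that parses a document with the XHTML Transitional DOCTYPE while the network is unavailable would also answer the open question of how to reproduce the error.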

multithreading
http://phplens.com/phpeverywhere/?q=node/view/254

missing pages
debugging page id of a missing main page

debugging page id of a missing category page

SQL schema
https://secure.wikimedia.org/wikipedia/mediawiki/wiki/File:MediaWiki_database_schema_1-17_%28r82044%29.png

=References=