Toolserver:Ghel

ghel (GeoHack External Links) is a package consisting of a robust URL parser for links pointing to the GeoHack service, a relational database of articles with geographic information, and an error log. Several other programs are sometimes included, such as a regular expression link-matching tool and an interwiki tool.

Access and query examples
Toolserver users can access these databases by connecting to the server for their respective wikis. phpMyAdmin helps introduce users to the layout of the tables. External users can either use the tool or download table dumps along with the necessary WMF tables.

Querying an article
Select the coordinates from Ridge Route (id: 809205). This featured article includes coordinates for all locations discussed.
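A minimal sketch of such a lookup follows. It assumes an in-memory SQLite table as a stand-in for u_dispenser_p.coord_enwiki (the real database is MySQL), models only a few of the columns, and uses made-up rows; the helper name coordinates_for_page is hypothetical.

```python
import sqlite3

# Toy stand-in for u_dispenser_p.coord_enwiki; only a few columns are modelled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE coord_enwiki ("
    " gc_from INTEGER NOT NULL, gc_lat REAL NOT NULL, gc_lon REAL NOT NULL,"
    " gc_primary INTEGER NOT NULL DEFAULT 0, gc_name TEXT NOT NULL)"
)
# Made-up rows for illustration; these are not the article's real coordinates.
conn.executemany(
    "INSERT INTO coord_enwiki VALUES (?, ?, ?, ?, ?)",
    [
        (809205, 34.5, -118.6, 1, "Sample point A"),
        (809205, 34.7, -118.7, 0, "Sample point B"),
        (12345, 40.0, -74.0, 1, "Unrelated page"),
    ],
)


def coordinates_for_page(db, page_id):
    """Return (lat, lon, name) rows for one page id, primary coordinate first."""
    query = (
        "SELECT gc_lat, gc_lon, gc_name FROM coord_enwiki"
        " WHERE gc_from = ? ORDER BY gc_primary DESC, gc_name"
    )
    return db.execute(query, (page_id,)).fetchall()


rows = coordinates_for_page(conn, 809205)
```

SQLite uses ? as the parameter placeholder; MySQL drivers for Python typically use %s instead, so the query string would need adjusting on the Toolserver.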

Location pages without images
Select the first 100 pages that have geographical coordinates but lack images. Optimization and more sophisticated image analysis are left as an exercise for the reader.
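One way to express this is an anti-join against the MediaWiki imagelinks table, sketched here against an in-memory SQLite stand-in with made-up rows. Ordering by page id to define "first" is an assumption, as is the reduced column set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE coord_enwiki (gc_from INTEGER NOT NULL);
CREATE TABLE imagelinks (il_from INTEGER NOT NULL, il_to TEXT NOT NULL);
-- Made-up data: pages 1, 2 and 3 have coordinates; only page 1 uses an image.
INSERT INTO coord_enwiki VALUES (1), (2), (2), (3);
INSERT INTO imagelinks VALUES (1, 'Example.jpg');
""")

# LEFT JOIN keeps every coordinate row; rows without a matching image link
# have NULL in il_from, so filtering on NULL leaves the image-less pages.
QUERY = (
    "SELECT DISTINCT c.gc_from FROM coord_enwiki AS c"
    " LEFT JOIN imagelinks AS il ON il.il_from = c.gc_from"
    " WHERE il.il_from IS NULL"
    " ORDER BY c.gc_from LIMIT 100"
)

pages_without_images = [row[0] for row in conn.execute(QUERY)]
```

This treats any image link as "has an image", which also counts icons and template images; that is the kind of image analysis the text leaves to the reader.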

Featured articles by number of contained coordinates
List the pages in Category:Featured articles, sorted by the number of coordinates each contains.
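A sketch of this ranking, assuming the MediaWiki categorylinks table with the category name stored in cl_to using underscores. It runs against an in-memory SQLite stand-in with made-up rows rather than the Toolserver database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE coord_enwiki (gc_from INTEGER NOT NULL);
CREATE TABLE categorylinks (cl_from INTEGER NOT NULL, cl_to TEXT NOT NULL);
-- Made-up data: page 10 has three coordinates, page 20 one, page 30 two.
INSERT INTO coord_enwiki VALUES (10), (10), (10), (20), (30), (30);
-- Pages 10 and 20 are featured; page 30 is not.
INSERT INTO categorylinks VALUES
    (10, 'Featured_articles'), (20, 'Featured_articles'), (30, 'Other_category');
""")

# Count coordinate rows per featured page, largest count first.
QUERY = (
    "SELECT c.gc_from, COUNT(*) AS coord_count FROM coord_enwiki AS c"
    " JOIN categorylinks AS cl ON cl.cl_from = c.gc_from"
    " WHERE cl.cl_to = 'Featured_articles'"
    " GROUP BY c.gc_from"
    " ORDER BY coord_count DESC, c.gc_from"
)

ranking = conn.execute(QUERY).fetchall()
```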

Moon articles and other languages
Select articles with lunar coordinates and list the other language editions they link to.
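This can be sketched as a join between the coordinate table and the MediaWiki langlinks table, filtered on gc_globe. The example uses an in-memory SQLite stand-in and made-up rows; grouping the result into a dictionary is an illustrative choice.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE coord_enwiki (gc_from INTEGER NOT NULL, gc_globe TEXT DEFAULT 'Earth');
CREATE TABLE langlinks (
    ll_from INTEGER NOT NULL, ll_lang TEXT NOT NULL, ll_title TEXT NOT NULL);
-- Made-up data: page 1 carries two lunar coordinates, page 2 is on Earth.
INSERT INTO coord_enwiki VALUES (1, 'Moon'), (1, 'Moon'), (2, 'Earth');
INSERT INTO langlinks VALUES
    (1, 'de', 'Beispiel'), (1, 'fr', 'Exemple'), (2, 'de', 'Anderes');
""")

# DISTINCT collapses the duplicates produced by pages with several coordinates.
QUERY = (
    "SELECT DISTINCT c.gc_from, ll.ll_lang FROM coord_enwiki AS c"
    " JOIN langlinks AS ll ON ll.ll_from = c.gc_from"
    " WHERE c.gc_globe = 'Moon'"
    " ORDER BY c.gc_from, ll.ll_lang"
)

# Group the language codes by page id.
languages = defaultdict(list)
for page_id, lang in conn.execute(QUERY):
    languages[page_id].append(lang)
```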

locateCoord tool
locateCoord.py (source code): a quick and simple tool to search the database. This query will give coordinates that are within 5 km of the center of New York City. A more sophisticated rewrite will be required, as the tool is pretty limited and needs support for multiple output formats such as JSON, XML, and YAML. Do not query this tool more than 10 times per minute.
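The radius search can be illustrated with a great-circle distance filter. This is a sketch under stated assumptions, not the code of locateCoord.py: the function names, the sample rows, and the coordinates used for the center of New York City (40.7128, -74.0060) are all chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(rows, lat, lon, radius_km):
    """Return (name, distance_km) for rows inside the radius, nearest first."""
    hits = []
    for name, row_lat, row_lon in rows:
        distance = haversine_km(lat, lon, row_lat, row_lon)
        if distance <= radius_km:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Assumed center point and made-up rows in (gc_name, gc_lat, gc_lon) form.
NYC_CENTER = (40.7128, -74.0060)
sample_rows = [
    ("at center", 40.7128, -74.0060),
    ("about 3 km north", 40.7428, -74.0060),
    ("about 11 km north", 40.8128, -74.0060),
]

nearby = within_radius(sample_rows, *NYC_CENTER, radius_km=5.0)
```

On a full table it would be cheaper to narrow the candidates with a bounding box on gc_lat and gc_lon before computing exact distances.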

To do

 * Develop an API capable of writing output in HTML, JSON, serialized PHP, KML, OSM, and XML.
 * Language-independent article ranking table (length, incoming links, interwiki links).
 * Reset the primary bit for multiple primary coordinates from the same article.
 * WikiMiniAtlas/OSM data integration under heavy load without killing the databases.
 * Reimplement features into GeoHack.
 * Documentation: source code should be documented so a novice could understand it.
 * Live updating: MySQL trigger functionality is required for this.

Fields
 This section is a rough draft of definitions.
 * gc_from - Page id
 * gc_lat - Latitude
 * gc_lon - Longitude
 * gc_alt - Elevation in meters above sea level
 * gc_head - The direction in degrees from north (if applicable)
 * gc_dim - The rough size of the object
 * gc_type - See w:Wikipedia:WikiProject Geographical coordinates/type
 * gc_size - City population size
 * gc_globe - Which body the coordinates are on (NOTE: get standards for other bodies)
 * gc_primary - Whether the coordinate represents the primary object in the photo or article (TODO: word this better)
 * gc_name - The name of the object; if none is given, the article title is used
 * gc_location - MBR point binary

Schema summary
 mysql> describe u_dispenser_p.coord_enwiki;
 +-------------+--------------------------+------+-----+---------+-------+
 | Field       | Type                     | Null | Key | Default | Extra |
 +-------------+--------------------------+------+-----+---------+-------+
 | gc_from     | int(8) unsigned          | NO   | MUL | NULL    |       |
 | gc_lat      | float                    | NO   |     | NULL    |       |
 | gc_lon      | float                    | NO   |     | NULL    |       |
 | gc_alt      | float                    | YES  |     | NULL    |       |
 | gc_head     | float                    | YES  |     | NULL    |       |
 | gc_dim      | float unsigned           | YES  |     | NULL    |       |
 | gc_type     | varchar(63)              | YES  |     | NULL    |       |
 | gc_size     | float                    | YES  |     | NULL    |       |
 | gc_region   | varchar(127)             | YES  |     | NULL    |       |
 | gc_globe    | enum('Earth','Moon',...) | YES  |     | Earth   |       |
 | gc_primary  | tinyint(1)               | NO   |     | 0       |       |
 | gc_name     | varchar(255)             | NO   |     | NULL    |       |
 | gc_location | point                    | NO   | MUL | NULL    |       |
 +-------------+--------------------------+------+-----+---------+-------+

Dumps
The ghel database is dumped every Thursday at 09:40 UTC and is accessible from http://toolserver.org/~dispenser/dumps/ as compressed SQL dumps.

Logs
Errors and warnings output by the tool are available at http://toolserver.org/~dispenser/logs/. Errors are items ghel could not parse, while warnings are items it could parse but that should be corrected so other programs can read them correctly.

Source code

 * geodbcompiler.py - Simple application to create and fill the database with the geographic data
 * ghel.py - GeoHack External Link parsing library
 * regioncheck.py - Produces reports using the Administrative Boundaries - First Level (ESRI) dataset. It retrieves all state boundary polygons and finds the shortest distance from a point to each one. If the point is found inside a polygon, it skips that point and moves to the next one. This way it gives the shortest distance for all points outside of the country.
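
The approach described for regioncheck.py can be sketched in plain Python. This is an illustration under simplifying assumptions, not the script's actual code: polygons are simple lists of (x, y) vertices, distances are planar rather than geodesic, and all function names and sample data are hypothetical.

```python
import math


def point_in_polygon(pt, poly):
    """Ray-casting test: True if pt lies inside the polygon (list of vertices)."""
    x, y = pt
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(pt, a, b):
    """Planar distance from pt to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = pt, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project pt onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polygon(pt, poly):
    """Shortest planar distance from pt to any edge of the polygon."""
    return min(
        distance_to_segment(pt, poly[i], poly[(i + 1) % len(poly)])
        for i in range(len(poly))
    )


def nearest_regions(points, regions):
    """Map each point outside every region to (nearest region, distance)."""
    report = {}
    for name, pt in points.items():
        if any(point_in_polygon(pt, poly) for poly in regions.values()):
            continue  # point lies inside a region: skip it
        report[name] = min(
            ((region, distance_to_polygon(pt, poly)) for region, poly in regions.items()),
            key=lambda item: item[1],
        )
    return report


# Made-up squares standing in for state boundary polygons.
regions = {"A": [(0, 0), (2, 0), (2, 2), (0, 2)], "B": [(5, 0), (7, 0), (7, 2), (5, 2)]}
points = {"inside": (1, 1), "outside": (3, 1)}
report = nearest_regions(points, regions)
```

Real boundary data is in latitude and longitude, so a production version would need geodesic distances and handling for polygons with holes or multiple parts.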