Toolserver:Ghel

Dispenser's geographical coordinate tools are all currently experimental.

Access
Dumps are available upon request; this may be automated in the future if there is demand. Toolserver users can access these databases by connecting to the server for their respective wikis. phpMyAdmin can help introduce users to the layout of the tables.

 SELECT page_id, gc_lat, gc_lon, gc_region, page_title
 FROM u_dispenser_p.coord_enwiki
 JOIN page ON page_id = gc_from
 WHERE page_namespace = 0
   AND gc_from NOT IN (SELECT DISTINCT il_from FROM imagelinks)
 LIMIT 100;

The above query yields up to 100 pages that have geographical coordinates but lack images. Optimization and more sophisticated image analysis are left as an exercise for the reader.
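One possible optimization is to replace the NOT IN subquery with a LEFT JOIN anti-join. The sketch below only demonstrates the idea: it runs against an in-memory SQLite database standing in for the Toolserver MySQL server, the table is named coord_enwiki without the u_dispenser_p prefix, and the rows are invented toy data. Whether this form is actually faster on the real databases has not been measured here.

```python
import sqlite3

# Toy stand-ins for the real tables; only the columns the query touches.
SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE coord_enwiki (gc_from INTEGER, gc_lat REAL, gc_lon REAL, gc_region TEXT);
CREATE TABLE imagelinks (il_from INTEGER, il_to TEXT);
"""

# Anti-join: keep coordinate rows for which no imagelinks row matches.
ANTI_JOIN = """
SELECT page_id, gc_lat, gc_lon, gc_region, page_title
FROM coord_enwiki
JOIN page ON page_id = gc_from
LEFT JOIN imagelinks ON il_from = gc_from
WHERE page_namespace = 0 AND il_from IS NULL
ORDER BY page_id
LIMIT 100
"""

def pages_without_images(conn):
    """Return coordinate rows for main-namespace pages that link no images."""
    return conn.execute(ANTI_JOIN).fetchall()

def demo():
    """Build the toy database (hypothetical data) and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [(1, 0, "Alpha"), (2, 0, "Beta"), (3, 1, "Talk")],
    )
    conn.executemany(
        "INSERT INTO coord_enwiki VALUES (?, ?, ?, ?)",
        [(1, 10.0, 20.0, "XX"), (2, 30.0, 40.0, "YY"), (3, 50.0, 60.0, "ZZ")],
    )
    conn.execute("INSERT INTO imagelinks VALUES (1, 'Alpha.jpg')")
    return pages_without_images(conn)
```

In the toy data, page 1 has an image and page 3 is outside namespace 0, so only page 2 is returned.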

locateCoord.py
locateCoord.py is a very simple, quickly coded tool that gives users the ability to retrieve data from the database. This query will give coordinates within 5 km of the center of New York City. Eventually a rewrite will be needed, with support for JSON/XML/YAML/etc. and better option handling, as the current architecture is limited.
 * locateCoord.py source code
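The kind of radius search described above can be sketched with the haversine great-circle distance. This is not taken from the locateCoord.py source; the function names, the (name, lat, lon) row layout, and the 6371 km mean Earth radius are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(rows, lat, lon, radius_km):
    """Filter (name, gc_lat, gc_lon) rows to those within radius_km.

    Returns (name, distance_km) pairs sorted by distance, nearest first.
    """
    hits = []
    for name, row_lat, row_lon in rows:
        d = haversine_km(lat, lon, row_lat, row_lon)
        if d <= radius_km:
            hits.append((name, d))
    return sorted(hits, key=lambda hit: hit[1])
```

For example, `within_radius(rows, 40.7128, -74.0060, 5)` would keep rows within 5 km of approximate coordinates for New York City (the center point is itself an assumption, not a value read from the tool).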

geosearch.py
geosearch.py is a simple tool to assist in tracking down pages where common errors or inappropriate data have been entered. Other languages/databases are supported via a parameter (e.g. commons, de, fr, ...). More examples
 * geosearch.py source code

iwcoord.py
iwcoord.py finds possible coordinates that can be copied from one language to another; it doesn't actually use ghel or the database.
 * iwcoord.py source code

regioncheck.py
regioncheck.py produces reports using the Administrative Boundaries - First Level (ESRI) dataset. It retrieves all state boundary polygons and finds the shortest distance from the point to each one. If the point is found inside a polygon, it skips that point and moves on to the next one. This way it gives the shortest distance for all points that lie outside of the country.
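The check described above can be sketched as a point-in-polygon test followed by a nearest-edge distance. This is a simplified illustration and not the regioncheck.py code: it treats longitude/latitude as flat x/y coordinates, takes each polygon as a plain list of vertices, and the function names and the `regions` mapping are hypothetical.

```python
import math

def point_in_polygon(pt, poly):
    """Ray-casting test: True if pt lies inside the polygon (list of (x, y))."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Count edges that a horizontal ray from pt to +infinity crosses.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def segment_distance(pt, a, b):
    """Shortest planar distance from pt to the segment a-b."""
    px, py = pt
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project pt onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def polygon_distance(pt, poly):
    """Shortest distance from pt to any edge of the polygon."""
    n = len(poly)
    return min(segment_distance(pt, poly[i], poly[(i + 1) % n]) for i in range(n))

def nearest_region(pt, regions):
    """Return None if pt is inside any region, else (name, distance) of the nearest."""
    best = None
    for name, poly in regions.items():
        if point_in_polygon(pt, poly):
            return None  # point is inside a region: skip it
        d = polygon_distance(pt, poly)
        if best is None or d < best[1]:
            best = (name, d)
    return best
```

A real report would need geodesic rather than planar distances, and polygons with holes or multiple parts would need extra handling.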

Wikipedia-World
It is reported that data from here is being used by the Wikipedia-World project.

Logs
Errors and warnings output by the tool are available at http://toolserver.org/~dispenser/logs/. Errors are items ghel could not parse, while warnings are items it could parse but that should be corrected so other programs can read them correctly.

Things left to do

 * Develop an API capable of writing out in HTML, JSON, serialized PHP, KML, OSM, and XML.
 * Language independent article ranking table (length, incoming links, interwiki links)
 * Reset primary bit for multiple primary coordinates from the same article
 * WikiMiniAtlas/OSM data integration under heavy load without killing the databases.
 * Reimplement features in GeoHack.
 * Documentation: source code should be documented so that a novice could understand it.
 * Live updating; MySQL trigger functionality is required for this.

Fields
 This section is a rough draft of definitions.
 * gc_from: Article ID
 * gc_lat: Latitude
 * gc_lon: Longitude
 * gc_alt: Elevation in meters above sea level
 * gc_head: The direction in degrees from north (if applicable)
 * gc_dim: The rough size of the object
 * gc_type: See w:Wikipedia:WikiProject Geographical coordinates/type:
 * gc_size: City population size
 * gc_globe: Which body the coordinates are on (NOTE: get standards for other bodies)
 * gc_primary: Whether the coordinate represents the primary object in the photo or article (TODO: word this better)
 * gc_name: The name of the object; if none is given, the article title will be used
 * gc_location: MBR point binary

Schema summary
 mysql> describe u_dispenser_p.coord_enwiki;
 +-------------+--------------------------+------+-----+---------+-------+
 | Field       | Type                     | Null | Key | Default | Extra |
 +-------------+--------------------------+------+-----+---------+-------+
 | gc_from     | int(8) unsigned          | NO   | MUL | NULL    |       |
 | gc_lat      | float                    | NO   |     | NULL    |       |
 | gc_lon      | float                    | NO   |     | NULL    |       |
 | gc_alt      | float                    | YES  |     | NULL    |       |
 | gc_head     | float                    | YES  |     | NULL    |       |
 | gc_dim      | float unsigned           | YES  |     | NULL    |       |
 | gc_type     | varchar(63)              | YES  |     | NULL    |       |
 | gc_size     | float                    | YES  |     | NULL    |       |
 | gc_region   | varchar(127)             | YES  |     | NULL    |       |
 | gc_globe    | enum('Earth','Moon',...) | YES  |     | Earth   |       |
 | gc_primary  | tinyint(1)               | NO   |     | 0       |       |
 | gc_name     | varchar(255)             | NO   |     | NULL    |       |
 | gc_location | point                    | NO   | MUL | NULL    |       |
 +-------------+--------------------------+------+-----+---------+-------+
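When the gc_location column is selected directly, a client would typically receive MySQL's internal geometry bytes: a 4-byte SRID followed by the WKB encoding of the point. The sketch below decodes that layout with the standard struct module. The function name is hypothetical, and the source does not say whether latitude or longitude is stored as X, so the raw x and y values are returned without interpretation.

```python
import struct

def decode_mysql_point(blob):
    """Decode MySQL's internal POINT layout: 4-byte SRID + 21-byte WKB point.

    Returns (srid, x, y). Which of x/y holds latitude is an open question
    for this table and is deliberately not assumed here.
    """
    if len(blob) != 25:
        raise ValueError("expected 25 bytes, got %d" % len(blob))
    (srid,) = struct.unpack_from("<I", blob, 0)      # SRID, little-endian
    endian = "<" if blob[4] == 1 else ">"            # WKB byte-order flag
    (geom_type,) = struct.unpack_from(endian + "I", blob, 5)
    if geom_type != 1:                               # 1 = Point in WKB
        raise ValueError("not a WKB point (type %d)" % geom_type)
    x, y = struct.unpack_from(endian + "dd", blob, 9)
    return srid, x, y
```

A value built with `struct.pack("<IBIdd", 0, 1, 1, x, y)` has this layout and can be used to try the function without a database connection.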

Dumps
The database is dumped weekly and is accessible from http://toolserver.org/~dispenser/dumps/ as compressed SQL dumps.

Dumping is scheduled for Thursdays at 9:40 UTC.

Source code

 * geodbcompiler.py - Simple application to create and fill the database with the geographic data
 * ghel.py - GeoHack External Link parsing library