User:Nealindia/Mock

Proof of Concept for my idea

I plan to create a Content Analyser tool that scans through a whole article.

Once we have the article, we extract references to places (cities, towns, special places, etc.) from it.

Using a geocoding service such as Google's (http://code.google.com/apis/maps/documentation/services.html#Geocoding), we can get the coordinates of each referenced place and mark it on the map, or mark an area of interest (KML).

Now, let's consider an example. I have copied the following paragraph from the Wikipedia page of Barack Obama: --

Barack Obama was born at Kapi'olani Maternity & Gynecological Hospital in Honolulu, Hawaii, United States,[4] to Stanley Ann Dunham,[5] an American of predominantly English descent from Wichita, Kansas,[6] and Barack Obama, Sr., a Luo from Nyang’oma Kogelo, Nyanza Province, Kenya Colony. Obama is the first President to have been born in Hawaii.[7][8] Obama's parents met in 1960 in a Russian language class at the University of Hawaii at Mānoa, where his father was a foreign student on scholarship.[9][10] The couple married on February 2, 1961,[11] and Barack was born later that year. His parents separated when he was two years old and they divorced in 1964.[10] Obama Sr. remarried and returned to Kenya, visiting Barack in Hawaii only once, in 1971. He died in an automobile accident in 1982.[12]

--


As we can see, the only relevant information we want shown on the map is in bold, and other geographical information that we don't need is in bold italics. Using NLP techniques, we can disregard irrelevant and repeated information and use only places that are referred to with phrases like "born at Hawaii", "lives at NY", "works at White House", etc.
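To make the filtering step concrete, here is a minimal sketch in Python. It is not a real NLP pipeline: it assumes a fixed, hypothetical list of trigger phrases (`TRIGGERS`) and treats any run of capitalised words after a trigger as a place candidate. The names `TRIGGERS`, `PLACE` and `extract_places` are made up for this illustration.

```python
import re

# Hypothetical trigger phrases, modelled on "born at", "lives at", "works at".
TRIGGERS = ("born at", "born in", "lives at", "lives in", "works at")

# A place candidate: capitalised words, optionally joined by commas.
PLACE = r"[A-Z][\w'’]*(?:,? [A-Z][\w'’]*)*"

PATTERN = re.compile(r"\b(" + "|".join(TRIGGERS) + r")\s+(" + PLACE + ")")


def extract_places(text):
    """Return (trigger, place) pairs in order, dropping repeated places."""
    seen = set()
    results = []
    for trigger, place in PATTERN.findall(text):
        key = place.lower()
        if key not in seen:  # disregard repeated information
            seen.add(key)
            results.append((trigger, place))
    return results
```

A pattern like this stops at the first word that is not capitalised, so a name such as "Kapi'olani Maternity & Gynecological Hospital" would be cut off at the ampersand. A real implementation would likely need a proper named-entity recogniser instead of a regular expression.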

Once we extract this information, the geocoding service mentioned above gives us the actual coordinates of each place, so we can easily integrate maps into our articles.
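As a rough sketch of this step, the Python snippet below turns a list of extracted place names into a KML document with one Placemark per place. The geocoding service itself is not called: `geocode` is a stand-in for whatever service is finally chosen, and `SAMPLE` is a hypothetical lookup table with approximate coordinates, used only for illustration.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def build_kml(places, geocode):
    """Build a KML string with one Placemark per place that can be geocoded.

    `geocode` is any callable mapping a place name to (lat, lon) or None.
    It stands in for a real geocoding service (an assumption of this sketch).
    """
    ET.register_namespace("", KML_NS)
    kml = ET.Element("{%s}kml" % KML_NS)
    doc = ET.SubElement(kml, "{%s}Document" % KML_NS)
    for name in places:
        coords = geocode(name)
        if coords is None:
            continue  # skip places the geocoder cannot resolve
        lat, lon = coords
        placemark = ET.SubElement(doc, "{%s}Placemark" % KML_NS)
        ET.SubElement(placemark, "{%s}name" % KML_NS).text = name
        point = ET.SubElement(placemark, "{%s}Point" % KML_NS)
        # KML orders coordinates as longitude,latitude.
        ET.SubElement(point, "{%s}coordinates" % KML_NS).text = "%f,%f" % (lon, lat)
    return ET.tostring(kml, encoding="unicode")


# Hypothetical stand-in for a geocoder: approximate coordinates only.
SAMPLE = {"Honolulu, Hawaii": (21.31, -157.86)}

kml_text = build_kml(["Honolulu, Hawaii", "Unknown Place"], SAMPLE.get)
```

Passing the geocoder in as a function keeps the map-building code independent of the service, which matters here because the license of the geocoding results is still an open question.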

This technique would be fundamental to integrating OpenStreetMap (OSM) into Wikipedia and other MediaWiki websites, allowing maps to be added without much manual work.

Also, I plan to place the locations of all contributors to an article on that page, to better understand the geographical awareness of that topic.

Please feel free to edit this article and put in your suggestions.

Discussion

Could you explain the benefit of your extraction in comparison with the existing geocoords extraction that we have?

It is updated daily using the database entries for links to geohack in the externallinks table.

Ans: Please refer to the question below. Basically, this is a built-in database of places and their coordinates, and these Python scripts search for all places within a radius of 10 km (or x km) of given coordinates. I am proposing to extract the references to geographical places from an article. Please read below for more details.

  • We also have a tool that shows all links in an article in a map:
http://maps.google.com/maps?f=q&source=s_q&hl=de&geocode=&q=http:%2F%2Ftoolserver.org%2F~para%2Fcgi-bin%2Fkmlexport%3Fproject%3Den%26article%3DDresden_Frauenkirche%26linksfrom%3D1&sll=51.051944,13.741666&sspn=0.000101,0.000279&ie=UTF8&t=h&z=3
What would be the benefit of your approach?

Ans: What I am suggesting is extracting references to places, cities, etc. from articles and then representing that information on the map. I am not proposing to show the links in an article on the map. The tool mentioned in the question above uses a geocoding service to show the place. My work would involve extracting the geographical information from articles and putting it on the map. I would also put contributors' geographical information on the map, which would help in analysing the article's origin and geographical importance.

  • If you make extensive use of Google's geocoding methods, we will need to talk about the license of the results. Are these results really free? In the European Union we may have some copyright problems there.
  • Why don't you want to use the toolserver, where you would have all the Wikipedia databases accessible?
I am open to any suggestions, but we can first start working with some random articles and later move to the Wikipedia database (toolserver).

I hope these questions help to clarify your proposal. --Kolossos 13:02, 4 April 2010 (UTC)

I like your idea of "contributors geographical information", but I'm skeptical about the other things. We have done well over the last years by doing our own geocoding. It's IMO more exact, and you get no problems when different objects have the same name. --Kolossos 21:07, 4 April 2010 (UTC)
But if we integrate it with automatic identification of location references, wouldn't that be great? Also, I would be glad to work on the "contributors geographical information" part. --Nealindia 21:44, 4 April 2010 (UTC)
I think there will be enough results if you use the coordinates that are in a coordinate template. We have nearly 2.8 million coordinates in 42 languages extracted into a database; nearly 800,000 should be on different objects. In the German Wikipedia we have a high level of geocoded articles. Perhaps it would be good to write a tool that helps to compare different coordinates that are connected by an interwiki link, to find errors, and then to write a bot that copies these coordinate templates to other Wikipedias with the help of interwiki links.
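A minimal sketch of such a comparison tool, in Python for illustration only. It assumes the coordinates of one object have already been collected per language into a dict, and it flags every pair of languages whose coordinates lie further apart than a threshold. The function names, the 1 km default and the input layout are assumptions, not part of any existing tool.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def find_conflicts(coords_by_lang, threshold_km=1.0):
    """Return (lang_a, lang_b, distance_km) for pairs further apart than the threshold.

    `coords_by_lang` maps a language code to the (lat, lon) found in that
    language's article about the same interwiki-linked object.
    """
    langs = sorted(coords_by_lang)
    conflicts = []
    for i, a in enumerate(langs):
        for b in langs[i + 1:]:
            d = haversine_km(*coords_by_lang[a], *coords_by_lang[b])
            if d > threshold_km:
                conflicts.append((a, b, d))
    return conflicts


# Made-up example: the "fr" longitude is off by one degree.
example = {
    "de": (51.051944, 13.741666),
    "en": (51.0519, 13.7416),
    "fr": (51.051944, 14.741666),
}
conflicts = find_conflicts(example)
```

A bot could then copy a coordinate template to another language only when no conflict is reported for that object, and leave the flagged pairs for manual review.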
Didn't know this stat! We can indeed do this and use the German Wikipedia. We can definitely build such a bot. I would prefer to build it in PHP, as I have a lot of experience in this language.
Do you know the Wikipedia-world-project? At the moment I'm working on moving it to PostGIS and merging all languages. For a long time I didn't push copying the coordinates to other languages, so as not to copy errors, but now I think the time is right to do this. --Kolossos 08:05, 5 April 2010 (UTC)
Another project could be to bring the Wikipedia coordinates into OpenStreetMap (http://wiki.openstreetmap.org/wiki/Key:wikipedia) in a semi-automatic way. It can only be semi-automatic because the existing OSM objects are points, lines or areas.
I didn't get why it can only be semi-automatic...
One reason is that objects have other names in OSM, because the name tag doesn't need to be unique in OSM. Then you also have areas, e.g. for a building, and I'm not sure that the Wikipedia coordinate for this building is in every case inside this area, or you might catch the polygon of a district that also covers this WP coordinate... The other reason is a legal problem, see [1]: unfortunately we use Google Maps as an aid in Wikipedia and think that this is OK for points, which seems to be true for Wikipedia, but for OpenStreetMap it is too high a risk. So I think a JOSM plugin could be the right way. If you only drag the Wikipedia attributes onto OSM objects, you would solve the legal problems, and you could detect errors in one of the two systems. I feel that enough people would help if the right tool existed. --Kolossos 08:05, 5 April 2010 (UTC)
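The area problem can be illustrated with a small point-in-polygon check, sketched here in Python. It uses the standard ray-casting test and treats latitude and longitude as flat coordinates, which is only a reasonable assumption for small areas such as buildings. The function name and the input layout are made up for this example.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test: is (lat, lon) inside the polygon?

    `polygon` is a list of (lat, lon) vertices in order. Coordinates are
    treated as planar, an approximation that is fine for small areas.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which the edge crosses that line.
            cross = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross:
                inside = not inside
    return inside


# Made-up square "building" outline and two test points.
square = [(0.0, 0.0), (0.0, 2.0), (2.0, 2.0), (2.0, 0.0)]
hit = point_in_polygon(1.0, 1.0, square)   # point inside the outline
miss = point_in_polygon(3.0, 1.0, square)  # point outside the outline
```

The check alone cannot decide which object is meant: a coordinate inside a building polygon is usually also inside the polygon of the surrounding district, which is why a person still has to confirm the match.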
All this would, in my eyes, bring more for Wikipedia than "contributors geographical information". A contributors' geo-fingerprint could be nice for a scientific analysis, but you also need to be careful with the personal-rights aspect. --Kolossos 23:30, 4 April 2010 (UTC)
We can club this project with any of the above projects. It shouldn't take much time to work on this. We can ask each contributor for permission to use his or her geographical information for this purpose.
Perhaps it's possible to club everything but I believe it's better to write a second proposal and try to find a mentor. --Kolossos 08:05, 5 April 2010 (UTC)
I have drafted a new proposal at User:Nealindia/GSoC2, and it's more about extending the Maps extension.