Analytics/Wikistats/Database API

New API calls for data analysis metrics

 * concept documentation for new API call

 * action=analytics *
   Collect data from the analytics database.

 Parameters
   metric      - Type of data to collect. Names include the source, to allow for alternate
                 sources of similar metrics, which are likely to differ in definition and interpretation
                 One value: comscore_unique_visitors
                            comscore_reach_percentage
                            squid_pageviews
                            dump_article_count
                            dump_binary_count
                            dump_edits
                            dump_new_registered_editors
                            dump_active_editors_5
                            dump_active_editors_100
                 Parameter is always required
   month_range - First and last month to include in time series
                 One value: single month as yyyy-mm
                            month range as yyyy-mm;yyyy-mm
                 Parameter is always required
   filter      - One or more filter criteria, separate with |
                 Values: 4 input modes to specify values (only one mode allowed per filter)
                   A: * for all valid codes, which returns required data per known code
                   B: one or more codes separated by comma, which returns required data per code specified
                   C: one or more codes separated by plus sign, which returns required data totalled for all specified codes
                   D: top:n, which returns required data for the n highest occurrences, using the most recent month
                 region_code=A or B or C
                    Values for comscore_.. metrics: for valid region codes see .. (?)
                 country_code=B or C or D
                    for valid country codes see .. (?)
                 web_properties=C or D
                    This parameter requires extra authorisation
                 project_codes=C
                    for valid project codes see .. (?)
                 wiki_codes=C
                    specify each wiki code as project:language, e.g. wp:en for English Wikipedia, wq:de for German Wikiquote
   data        - One or more types of data to be returned
                 Values: time_series returns ordered list of value pairs, one for each month within range
                         percentage_growth_last_month
                         percentage_growth_last_year
                         percentage_growth_series
                         growth percentages are relative to oldest value (80->100=25%)
   lang        - Language code, used for region and country names
                 Default: en
                 Supported: en
   mobile      - Y or N or *
   normalized  - Y or N
   modality    - absolute or indexed
   edits       - A[nonymous] or R[egistered] or *
   format      - (see elsewhere)

 Examples:
   api.php?action=analytics&month_range=2008-03;2011-03&metric=squid_pageviews&
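The month_range syntax and the four filter input modes could be parsed roughly as follows. This is a minimal sketch, not the implementation: the function names are hypothetical, a value with a single code is assumed to be mode B (the description leaves it ambiguous between B and C), and mode D is assumed to be written as top:n, as in the prototype calls.

```python
import re

# yyyy-mm with a month between 01 and 12
MONTH = re.compile(r"^\d{4}-(0[1-9]|1[0-2])$")


def parse_month_range(value):
    """Return (first, last) from 'yyyy-mm' or 'yyyy-mm;yyyy-mm'."""
    parts = value.split(";")
    if len(parts) == 1:
        parts = parts * 2  # a single month is treated as a range of one month
    if len(parts) != 2 or not all(MONTH.match(p) for p in parts):
        raise ValueError(f"bad month_range: {value!r}")
    first, last = parts
    if first > last:  # yyyy-mm strings compare correctly as text
        raise ValueError("first month is after last month")
    return first, last


def parse_filter_value(value):
    """Classify one filter value as input mode A, B, C or D."""
    if value == "*":
        return ("A", None)
    if value.startswith("top:"):
        return ("D", int(value[len("top:"):]))
    if "," in value and "+" in value:
        raise ValueError("only one input mode is allowed per filter")
    if "+" in value:
        return ("C", value.split("+"))
    return ("B", value.split(","))  # assumption: a single code counts as mode B


def parse_filter(raw):
    """Split 'name=value|name=value' into {name: (mode, payload)}."""
    result = {}
    for criterion in raw.split("|"):
        name, _, value = criterion.partition("=")
        result[name.strip()] = parse_filter_value(value.strip())
    return result


print(parse_month_range("2008-03;2011-03"))
print(parse_filter("region_code=*|country_code=top:10"))
```

Whether a given mode is actually valid for a given filter (for example D for region_code) is not checked here; that would need the per-filter table from the parameter description.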


 * API calls (prototyping phase)

 action=analytics
   &metric=[comscore_unique_visitors|comscore_reach_percentage]
   &month_range=yyyy-mm;yyyy-mm
   &filter=[ region_code=[aa,bb,..|*] | country_code=[aa,bb,..|*|top:10] | web_properties=top:10 ]
   &data=[time_series,time_series_cumulative,percentage_growth_last_month,percentage_growth_last_year,percentage_growth_series]
   [&lang=en]
   &format=[csv,json,text,...]
   &modality=[absolute|indexed]

 action=analytics
   &metric=squid_pageviews
   &month_range=yyyy-mm;yyyy-mm
   &filter=[ (none | ?) region_code=[aa,bb,..|*] | country_code=[aa,bb,..|*|top:10] | project_codes=[wb,wn,wk,...|*] (| wiki_codes=wp:en,wp:de,..|top:10) ]
   &data=[time_series,percentage_growth_last_month,percentage_growth_last_year,percentage_growth_series]
   &mobile=[Y|N|*]
   &normalized=[Y|N]
   [&lang=en]
   &returntype=[csv,json,text,...]
   &modality=[absolute|indexed]

 action=analytics
   &metric=[dump_article_count|dump_binary_count|dump_edits|dump_new_registered_editors|dump_active_editors_[5,100]]
   &month_range=yyyy-mm;yyyy-mm
   &filter=[ project_codes=[wb,wn,wk,...|*] | wiki_codes=wp:en,wp:de,..|top:10 ]
   &data=[time_series(,percentage_growth_last_month,percentage_growth_last_year,percentage_growth_series(?))]
   &normalized=[Y|N]
   &edits=[[un]registered|*]
   [&lang=en]
   &returntype=[csv,json,text,...]
   &modality=[absolute|indexed]
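The three prototype calls differ mainly in which filters each metric family accepts. A client-side helper could assemble such a request along these lines. This is a sketch under assumptions: build_query and ALLOWED_FILTERS are hypothetical names, the family is derived from the metric prefix (comscore, squid, dump), and api.php is used as a relative endpoint as in the example above.

```python
from urllib.parse import urlencode, parse_qs

# Filters accepted per metric family, read off the prototype calls.
ALLOWED_FILTERS = {
    "comscore": {"region_code", "country_code", "web_properties"},
    "squid": {"region_code", "country_code", "project_codes", "wiki_codes"},
    "dump": {"project_codes", "wiki_codes"},
}


def build_query(metric, month_range, filters, data, **extra):
    """Return an api.php query string for action=analytics.

    filters maps filter name to its raw value, e.g. {"country_code": "top:10"};
    data is a list such as ["time_series"]; extra carries optional
    parameters such as mobile, normalized, lang or modality.
    """
    family = metric.split("_", 1)[0]
    allowed = ALLOWED_FILTERS.get(family)
    if allowed is None:
        raise ValueError(f"unknown metric family: {metric!r}")
    unknown = set(filters) - allowed
    if unknown:
        raise ValueError(f"filters not valid for {metric}: {sorted(unknown)}")
    params = {
        "action": "analytics",
        "metric": metric,
        "month_range": month_range,
        "filter": "|".join(f"{name}={value}" for name, value in filters.items()),
        "data": ",".join(data),
    }
    params.update(extra)
    # urlencode percent-encodes ; | = : and , inside the values
    return "api.php?" + urlencode(params)


url = build_query(
    "squid_pageviews",
    "2008-03;2011-03",
    {"country_code": "top:10"},
    ["time_series"],
    mobile="N",
)
print(url)
# Decoding the query string recovers the original filter value.
print(parse_qs(url.split("?", 1)[1])["filter"])
```

Percent-encoding the values keeps the ; in month_range and the | between filter criteria from being misread as query-string separators.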


 * Notes
 * unique_visitors rather than visitors as comscore has variations that we might add later
 * concise syntax for month range as we'll use that a lot
 * * = all
 * lang=en default
 * is modality the right word? scale?
 * returned result will contain ordered (?) array of arrays containing region_code,region name,ordered set of yyyymm,value pairs (as it does now)
 * modality=absolute is also used for log scale (so call it modality rather than scale? scale suggests linear/logarithmic)
 * only time_series??
 * metric=squid_pageviews: also 'language_codes=en,de,..' meaning data for all projects (wiktionary, etc) for that language? (will we actually use this?)
 * 'filter=none' to make sure the requester really wants unfiltered data
 * verbose(?) or always return exact call issued, maybe even definition of metric from 'definitions' table
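The result shape mentioned in the notes (region_code, region name, ordered yyyymm,value pairs) together with the growth rule "relative to oldest value (80->100=25%)" can be illustrated with a small sketch. The sample row, the region code and the function names are made up for illustration, and percentage_growth_series is assumed here to measure each month against the first month in the range, which is only one possible reading.

```python
def growth_percentage(old, new):
    """Growth relative to the older value, e.g. 80 -> 100 gives 25.0."""
    if old == 0:
        raise ValueError("growth is undefined when the older value is 0")
    return (new - old) / old * 100.0


def percentage_growth_series(series):
    """Growth of every month relative to the oldest month in the series.

    series is a list of (yyyymm, value) pairs. Assumption: the oldest
    month in the range is the baseline for all later months.
    """
    ordered = sorted(series)  # yyyymm strings sort chronologically
    base = ordered[0][1]
    return [(month, growth_percentage(base, value)) for month, value in ordered]


# Hypothetical result row: region_code, region name, ordered (yyyymm, value) pairs.
row = ["EU", "Europe", [("201101", 80), ("201102", 90), ("201103", 100)]]

region_code, region_name, series = row
print(region_code, region_name, percentage_growth_series(series))
# prints: EU Europe [('201101', 0.0), ('201102', 12.5), ('201103', 25.0)]
```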