User:Slaporte/Article quality visualization

From mediawiki.org

What we have

  • reference count
  • intro paragraph
  • paragraph count
  • image count
  • category count
  • reference section count
  • external link section count
  • external link count
  • article assessment
  • Google web search results
  • Google news search results
  • page visits per day
  • likelihood of vandalism (from WikiTrust)
  • incoming links
  • outgoing links
  • number of editors
  • time since last edit

Areas

  • structure
  • trustworthy
  • complete
  • objective

Formula

  • reference count / paragraph count: ENOUGH REFERENCES
  • paragraph count / Google news search results: SIGNIFICANCE
  • paragraph count / Google web search results: SIGNIFICANCE
  • image count / paragraph count: ENOUGH IMAGES
  • category count: ENOUGH CATEGORIES
  • unique editor count: EDIT HISTORY
  • time since last revision: EDIT HISTORY
  • assessment: ASSESSMENT
  • feedback: FEEDBACK
  • incoming links: INTERCONNECTION
  • outgoing links: INTERCONNECTION
  • incoming links / outgoing links: INTERCONNECTION
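
The ratios in this list could be computed from raw counts roughly as follows. This is only a sketch: the function names and the property names on `counts` are made up for illustration, and returning `null` when a denominator is zero is an assumption, not something decided above.

```javascript
// Hypothetical helper: divide, but return null when the denominator is zero
// (e.g. an article with no paragraphs, or a subject with no search results).
function ratio(numerator, denominator) {
  return denominator > 0 ? numerator / denominator : null;
}

// Hypothetical: `counts` is a plain object holding the raw numbers
// gathered for one article. The property names are illustrative only.
function computeFormulaInputs(counts) {
  return {
    referencesPerParagraph: ratio(counts.references, counts.paragraphs),
    paragraphsPerNewsResult: ratio(counts.paragraphs, counts.newsResults),
    paragraphsPerWebResult: ratio(counts.paragraphs, counts.webResults),
    imagesPerParagraph: ratio(counts.images, counts.paragraphs),
    categories: counts.categories,
    uniqueEditors: counts.uniqueEditors,
    incomingPerOutgoing: ratio(counts.incomingLinks, counts.outgoingLinks)
  };
}
```

The result is a set of raw inputs only. How they are weighted into one quality score is still the open "quality algorithm" item.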

to do

  • quality algorithm
  • visualization on page
  • batch processing, page history
   API dependency?

brainstorming quality metrics

  • is the article at least a couple of paragraphs?
  • is the article as long as it is important? (e.g. is it proportional to the number of results on Google for the article's subject?)
  • does the article have at least 1 picture for every n paragraphs?
  • is the article in a category?
  • does the article have at least 1 source for every n sentences?
  • are there a large number of unique editors?
  • are a good proportion of the editors users with long histories of editing articles?
  • has the article been featured?
  • are any of the paragraphs or sentences too long?
  • are there any grammar or spelling errors?
  • has the article been edited recently?
  • how many flags does the article have? (e.g. neutrality, citation needed, weasel words, etc.)
  • what are the user-created page ratings of the article?
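
As one example, the "sentences too long" question could be approximated with a naive word count per sentence. The function name and the `maxWords` threshold below are hypothetical, and the sentence splitting is deliberately crude: it will mis-split on abbreviations such as "e.g.".

```javascript
// Hypothetical sketch: return the sentences in `text` that contain more
// than `maxWords` words. A sentence is taken to be a run of characters
// ending in '.', '!' or '?', which is only an approximation.
function findLongSentences(text, maxWords) {
  var sentences = text.match(/[^.!?]+[.!?]*/g) || [];
  return sentences
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; })
    .filter(function (s) { return s.split(/\s+/).length > maxWords; });
}
```

In the browser the input could be the text of each paragraph, for instance `$(p).text()` for every element matched by `$('.mw-content-ltr p')`.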





Minimum requirements -- Y/N
//would also be good to highlight which calls to action you want to encourage
/* would apply to all non-stub article pages? */

One infobox

       $('.infobox').length

One intro paragraph

       $('.mw-content-ltr p').length

Three incoming links

       API: http://en.wikipedia.org/w/api.php?action=query&format=json&list=backlinks&bltitle=Charizard&bllimit=100&blnamespace=0

n images

       $('img').length

n categories

       $('a[href*="/wiki/Category:"]').length

More than one editor

       API: http://en.wikipedia.org/w/api.php?action=query&format=json&prop=revisions&titles=Charizard&rvprop=user&rvlimit=500
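
A rough sketch of counting unique editors from the JSON returned by the revisions query. It assumes the default JSON layout, where revisions sit under `query.pages.<pageid>.revisions` and each revision carries a `user` field. The function name is made up for illustration.

```javascript
// Hypothetical sketch: count distinct user names in a prop=revisions
// response. With rvlimit=500 this only sees the most recent 500 revisions,
// so it is a lower bound for articles with long histories.
function countUniqueEditors(response) {
  var pages = (response.query && response.query.pages) || {};
  var seen = Object.create(null); // null prototype: any user name is a safe key
  Object.keys(pages).forEach(function (pageId) {
    (pages[pageId].revisions || []).forEach(function (rev) {
      if (rev.user) {
        seen[rev.user] = true;
      }
    });
  });
  return Object.keys(seen).length;
}
```

The "more than one editor" requirement would then be `countUniqueEditors(data) > 1`.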


not stub:

       if( !$('#siteSub').length ) return;
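
The Y/N checks above could be collected into one function along these lines. This is a sketch under assumptions: the function name, the property names on `counts`, and the `n` object holding the unspecified thresholds for images and categories are all hypothetical.

```javascript
// Hypothetical sketch: evaluate the minimum requirements as booleans.
// `counts` holds numbers gathered from the page and the API.
// `n` supplies the thresholds the notes leave open ("n images", "n categories").
function checkMinimumRequirements(counts, n) {
  var checks = {
    infobox: counts.infoboxes >= 1,
    introParagraph: counts.paragraphs >= 1,
    incomingLinks: counts.incomingLinks >= 3,
    images: counts.images >= n.images,
    categories: counts.categories >= n.categories,
    editors: counts.editors > 1
  };
  // The failed checks are the candidates for calls to action.
  var failed = Object.keys(checks).filter(function (key) {
    return !checks[key];
  });
  return { checks: checks, failed: failed, passes: failed.length === 0 };
}
```

For example, an article with only two incoming links would come back with `failed` equal to `['incomingLinks']`, which could drive a "link to this article" prompt.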

Content

  • by density (blah per n paragraphs/words etc)

References

   $('.reference').length

   Does it have references and external links sections?
   $('#References').length
   $('#External_links').length

Edits

  • by frequency/rate of edits (# edits/day, days since last edit)
  • by "demographics" of editors (total number of editors; percentage of editors that are registered; uniqueness; editor's; quality of editors)
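
The frequency measures could be derived from revision timestamps, for instance those returned when `timestamp` is added to `rvprop`. A minimal sketch, assuming ISO 8601 timestamp strings. The function name, and the choice to treat any history shorter than one day as one day, are assumptions.

```javascript
var MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical sketch: `timestamps` is an array of ISO 8601 strings,
// `now` is a time in milliseconds (e.g. Date.now()).
function editStats(timestamps, now) {
  if (timestamps.length === 0) {
    return null;
  }
  var times = timestamps.map(function (t) { return Date.parse(t); });
  var first = Math.min.apply(null, times);
  var last = Math.max.apply(null, times);
  // Use at least one day as the span to avoid dividing by zero.
  var spanDays = Math.max((last - first) / MS_PER_DAY, 1);
  return {
    editsPerDay: times.length / spanDays,
    daysSinceLastEdit: (now - last) / MS_PER_DAY
  };
}
```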

Significance/External

   Page visits per day: http://stats.grok.se/json/en/200804/Main_page
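
A sketch of turning that monthly JSON into a visits-per-day figure. It assumes the response contains a `daily_views` object mapping dates to view counts. That shape is an assumption and should be checked against a real response. The function name is hypothetical.

```javascript
// Hypothetical sketch: average the per-day counts in a monthly stats object.
// ASSUMPTION: `stats.daily_views` looks like { "2008-04-01": 123, ... }.
function averageDailyViews(stats) {
  var daily = stats.daily_views || {};
  var days = Object.keys(daily);
  if (days.length === 0) {
    return 0;
  }
  var total = days.reduce(function (sum, day) {
    return sum + daily[day];
  }, 0);
  return total / days.length;
}
```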

Quality Assessment - Sometimes available on the article's talk page

/*
 * @author Outriggr - created the script and used to maintain it
 * @author Pyrospirit - currently maintains and updates the script
 */
   getRating: function getRating (text) {
       this.callHooks('getRating_before');
       var rating = 'none';
       if (text.match(/\|\s*(class|currentstatus)\s*=\s*fa\b/i))
           rating = 'fa';
       else if (text.match(/\|\s*(class|currentstatus)\s*=\s*fl\b/i))
           rating = 'fl';
       else if (text.match(/\|\s*class\s*=\s*a\b/i)) {
           if (text.match(/\|\s*class\s*=\s*ga\b|\|\s*currentstatus\s*=\s*(ffa\/)?ga\b/i))
               rating = 'a/ga'; // A-class articles that are also GA's
           else rating = 'a';
       } else if (text.match(/\|\s*class\s*=\s*ga\b|\|\s*currentstatus\s*=\s*(ffa\/)?ga\b|\{\{\s*ga\s*\|/i)
                  && !text.match(/\|\s*currentstatus\s*=\s*dga\b/i))
           rating = 'ga';
       else if (text.match(/\|\s*class\s*=\s*b\b/i))
           rating = 'b';
       else if (text.match(/\|\s*class\s*=\s*bplus\b/i))
           rating = 'bplus'; // used by WP Math
       else if (text.match(/\|\s*class\s*=\s*c\b/i))
           rating = 'c';
       else if (text.match(/\|\s*class\s*=\s*start/i))
           rating = 'start';
       else if (text.match(/\|\s*class\s*=\s*stub/i))
           rating = 'stub';
       else if (text.match(/\|\s*class\s*=\s*list/i))
           rating = 'list';
       else if (text.match(/\|\s*class\s*=\s*sl/i))
           rating = 'sl'; // used by WP Plants
       else if (text.match(/\|\s*class\s*=\s*(dab|disambig)/i))
           rating = 'dab';
       else if (text.match(/\|\s*class\s*=\s*cur(rent)?/i))
           rating = 'cur';
       else if (text.match(/\|\s*class\s*=\s*future/i))
           rating = 'future';
       this.callHooks('getRating_after');
       return rating;
   }


Where the code goes

The eventual goal is to create a MediaWiki 'gadget' that users can enable at https://www.mediawiki.org/wiki/Special:Preferences#mw-prefsection-gadgets

 $('.reference').length

JS Fiddle: http://jsfiddle.net/MqfAZ/

JS Fiddle for UI fiddlin': http://jsfiddle.net/eSEFq/


Template for citation:

 $(".ambox-Refimprove:contains('citation')").length
 $('.ambox-Notability').length
 $('.ambox:contains("importance")').length
 $('.ambox:contains("advertisement")').length
 $('.ambox:contains("cleanup")').length
 $('.ambox:contains("confusing")').length
 $('.ombox:contains("deletion")').length
 $('.ambox:contains("quality standards")').length
 $('.haudio').length
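
The banner selectors above could be gathered into a single flag count, which would answer the "how many flags does the article have?" question. In this sketch the function name, the key names, and the idea of passing in a counting callback are assumptions. `.haudio` is left out because it matches audio content rather than a cleanup banner.

```javascript
// Selectors copied from the list above. The key names are illustrative.
var FLAG_SELECTORS = {
  refimprove: ".ambox-Refimprove:contains('citation')",
  notability: '.ambox-Notability',
  importance: '.ambox:contains("importance")',
  advertisement: '.ambox:contains("advertisement")',
  cleanup: '.ambox:contains("cleanup")',
  confusing: '.ambox:contains("confusing")',
  deletion: '.ombox:contains("deletion")',
  qualityStandards: '.ambox:contains("quality standards")'
};

// Hypothetical sketch: `countMatches` takes a selector string and returns
// how many elements match it. One banner can match several selectors,
// so `total` may count it more than once.
function countFlags(countMatches) {
  var result = { total: 0, flags: {} };
  Object.keys(FLAG_SELECTORS).forEach(function (name) {
    var matches = countMatches(FLAG_SELECTORS[name]);
    result.flags[name] = matches;
    result.total += matches;
  });
  return result;
}

// In the browser, with jQuery loaded:
// countFlags(function (selector) { return $(selector).length; });
```

Taking the counter as a parameter keeps the function free of any direct DOM dependency, so it can be tried outside the browser with a stub.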