In my quest to find all articles that have been edited by only one human editor, I wrote this query:

    USE enwiki_p;

    SELECT page_title
    FROM (
        SELECT p.page_title, r.rev_actor, a.actor_name
        FROM (
            SELECT page_title, page_id
            FROM page
            WHERE page_namespace = 0          # Mainspace
              AND NOT page_is_redirect
        ) AS p                                # All mainspace pages
        JOIN revision_userindex r ON r.rev_page = p.page_id
        JOIN actor a ON r.rev_actor = a.actor_id
        WHERE NOT IS_IPV4(a.actor_name)       # Ignore IP editors and bots
          AND NOT IS_IPV6(a.actor_name)
          # actor_name is VARBINARY, so convert it before LOWER() can fold case
          AND LOWER(CONVERT(a.actor_name USING utf8mb4)) NOT LIKE '%bot%'
          AND LOWER(CONVERT(a.actor_name USING utf8mb4)) NOT LIKE '%script%'
    ) AS pra
    GROUP BY page_title
    HAVING COUNT(DISTINCT rev_actor) = 1;     # Only 1 editor (distinct actors, not revisions)

Problem: It times out.
Question: how can I make it run faster?
For instance, I have thought about skipping pages that have more than 50 revisions to improve speed, but I am not sure how to implement that.
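One possible way to express the 50-revision cutoff is sketched below. It is untested on the replicas and I have not checked its query plan. The idea is a correlated scalar subquery that asks for the 51st revision of each page (`LIMIT 1 OFFSET 50`): it returns NULL when a page has 50 or fewer revisions, so the lookup can stop after at most 51 rows per page instead of reading the whole history. The `page_id BETWEEN 1 AND 1000000` condition is a hypothetical batch bound, added on the assumption that splitting the page table into slices helps stay under the timeout. Note that the cutoff also drops pages whose more than 50 revisions are mostly by bots or IPs, even if only one registered human edited them.

```sql
USE enwiki_p;

# Sketch (untested): skip pages with more than 50 revisions,
# then keep pages whose remaining human edits come from exactly one actor.
SELECT p.page_title
FROM page p
JOIN revision_userindex r ON r.rev_page = p.page_id
JOIN actor a ON a.actor_id = r.rev_actor
WHERE p.page_namespace = 0                    # Mainspace
  AND NOT p.page_is_redirect
  AND p.page_id BETWEEN 1 AND 1000000         # Hypothetical batch bounds; run in slices
  # A 51st revision exists only if the page has more than 50 revisions.
  # No ORDER BY is needed: any 51st row proves the page is too long,
  # and the subquery yields NULL when there is none.
  AND (SELECT r2.rev_id
       FROM revision_userindex r2
       WHERE r2.rev_page = p.page_id
       LIMIT 1 OFFSET 50) IS NULL
  AND NOT IS_IPV4(a.actor_name)               # Ignore IP editors
  AND NOT IS_IPV6(a.actor_name)
  AND LOWER(CONVERT(a.actor_name USING utf8mb4)) NOT LIKE '%bot%'
  AND LOWER(CONVERT(a.actor_name USING utf8mb4)) NOT LIKE '%script%'
GROUP BY p.page_id, p.page_title
HAVING COUNT(DISTINCT r.rev_actor) = 1;       # Exactly one human editor
```

Changing the two `page_id` bounds and re-running the query would cover the rest of the page table one slice at a time.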