[JobQueue] Improved refreshLinks/htmlCacheUpdate job de-duplication.
* Added JobQueue::deduplicateRootJob() function which uses cache
records of the last time a "root" job was initiated for a task
in order to invalidate prior jobs for that task. For refreshLinks,
the "task" is basically "enqueueing the refresh jobs for title X".
* (bug 27914) Also added a new Job::getDeduplicationFields() function
and used it in refreshLinks to exclude fields like 'masterPos'
from duplicate job comparisons.
* (bug 27914) Always resolve refreshLinks2 jobs down to refreshLinks jobs.
For each affected page, whichever job is popped first removes the
duplicates for that page, unless one copy is in a refreshLinks2 job
and the other is in a refreshLinks job.
* (bug 37731) Made LinksUpdate/HTMLCacheUpdate defer the large
backlinks query by doing it in an outer job.
* (bug 42065) HTMLCacheUpdate will no longer purge pages that were
already purged since the job was added.
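
The two de-duplication ideas above (root job timestamps and excluding
volatile fields from duplicate comparisons) can be sketched roughly as
follows. This is an illustrative Python model, not the MediaWiki PHP
code: root_job_cache, is_superseded(), dedup_key() and the "signature"
strings are hypothetical names chosen for the example, and a plain dict
stands in for the cache.

```python
import hashlib
import json

# Hypothetical stand-in for the cache holding, per task signature,
# the timestamp of the most recent "root" job.
root_job_cache = {}


def deduplicate_root_job(signature, timestamp):
    """Record a root job for a task, keeping only the newest timestamp."""
    current = root_job_cache.get(signature)
    if current is None or timestamp > current:
        root_job_cache[signature] = timestamp


def is_superseded(signature, timestamp):
    """A job is stale if a newer root job exists for the same task."""
    latest = root_job_cache.get(signature)
    return latest is not None and latest > timestamp


def dedup_key(job_type, title, params, ignored=("masterPos",)):
    """Hash only the fields that matter when comparing duplicate jobs."""
    fields = {k: v for k, v in params.items() if k not in ignored}
    payload = json.dumps([job_type, title, fields], sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()
```

In this model, a worker would call is_superseded() when popping a job
and skip it if a newer root job was recorded for the same task, while
dedup_key() lets two jobs that differ only in 'masterPos' compare as
duplicates.
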
Change-Id: I71b743e0a38e60a874ca856e80cb761bea06b689
12 files changed: