[JobQueue] Improved refreshLinks/htmlCacheUpdate job de-duplication.
author Aaron Schulz <aschulz@wikimedia.org>
Thu, 8 Nov 2012 22:01:40 +0000 (14:01 -0800)
committer Gerrit Code Review <gerrit@wikimedia.org>
Wed, 28 Nov 2012 09:29:41 +0000 (09:29 +0000)
commit 5ef62175bf8923eea565741e877820b2426b42e4
tree 6100d1e6e151074f474d08347dc1e2702ee51154
parent 502b0a22ebf969fb897b66045551bd31c499af46

* Added JobQueue::deduplicateRootJob() function which uses cache
  records of the last time a "root" job was initiated for a task
  in order to invalidate prior jobs for that task. For refreshLinks,
  the "task" is basically "enqueueing the refresh jobs for title X".
* (bug 27914) Also added a new Job::getDeduplicationFields() function
  and used it in refreshLinks to exclude fields such as 'masterPos'
  from duplicate-job comparisons.
* (bug 27914) Always resolve refreshLinks2 jobs down to refreshLinks jobs.
  For each affected page, one of its jobs will be popped first,
  which removes the duplicates for that page unless one duplicate
  is in a refreshLinks2 job and the other in a refreshLinks job.
* (bug 37731) Made LinksUpdate/HTMLCacheUpdate defer the large
  backlinks query by doing it in an outer job.
* (bug 42065) HTMLCacheUpdate will no longer purge pages that were
  already purged since the job was added.
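
The two de-duplication ideas above can be sketched roughly as follows.
This is a small Python model, not the MediaWiki PHP code: the dict-based
job layout, the in-memory root_job_cache, and every function name here
are hypothetical, and treating 'masterPos' as the only excluded field is
an assumption made for illustration.

```python
import hashlib
import json

# Hypothetical stand-in for the shared cache that records when the newest
# "root" job for each task was started. Keys are task signatures.
root_job_cache = {}


def dedup_fields(job):
    """Return the fields that define job identity, dropping volatile ones.

    'masterPos' is the example named in the commit message; excluding only
    that key is an assumption of this sketch.
    """
    params = {k: v for k, v in job["params"].items() if k != "masterPos"}
    return {"type": job["type"], "title": job["title"], "params": params}


def dedup_key(job):
    """Hash the identity fields so equivalent jobs map to the same key."""
    canonical = json.dumps(dedup_fields(job), sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def register_root_job(signature, timestamp):
    """Record the start time of the newest root job for a task."""
    previous = root_job_cache.get(signature)
    if previous is None or timestamp > previous:
        root_job_cache[signature] = timestamp


def is_superseded(signature, timestamp):
    """A job is stale if a newer root job exists for the same task."""
    latest = root_job_cache.get(signature)
    return latest is not None and latest > timestamp
```

In this model, two refreshLinks jobs for the same title that differ only
in 'masterPos' produce the same dedup_key(), and a job spawned by a root
job started at time 100 is reported as superseded once a root job for the
same task is registered at time 200.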

Change-Id: I71b743e0a38e60a874ca856e80cb761bea06b689
12 files changed:
includes/AutoLoader.php
includes/LinksUpdate.php
includes/cache/BacklinkCache.php
includes/cache/HTMLCacheUpdate.php
includes/job/Job.php
includes/job/JobQueue.php
includes/job/JobQueueDB.php
includes/job/JobQueueGroup.php
includes/job/jobs/DuplicateJob.php [new file with mode: 0644]
includes/job/jobs/HTMLCacheUpdateJob.php
includes/job/jobs/NullJob.php
includes/job/jobs/RefreshLinksJob.php