Preserve whitespace in search index text content
author Erik Bernhardson <ebernhardson@wikimedia.org>
Mon, 10 Sep 2018 23:39:09 +0000 (16:39 -0700)
committer Erik Bernhardson <ebernhardson@wikimedia.org>
Fri, 14 Sep 2018 18:10:35 +0000 (11:10 -0700)
commit 0d779c1ac6ee790663e4a97027d0f044522dfaa2
tree 13fad49683951f55b4cc8cbead484ca9441f1169
parent d0fd63da25fb4798a8bbee3032b5cea35812d1ca

Certain HTML tags imply a word break, but our HTML stripping does not
account for that, so words on either side of such a tag can run together
once the markup is removed. Adjust the HTML stripping to inject whitespace
for all block-level elements (per MDN) along with the <br> element.
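
The idea can be sketched outside MediaWiki as follows. This is a minimal illustration in Python, not the PHP code touched by this commit: the names `BLOCK_LEVEL_TAGS`, `TagStripper` and `strip_tags_preserving_breaks` are hypothetical, the tag set is only a partial subset of MDN's block-level list, and emitting a space on both the opening and closing tag is an assumption made for simplicity.

```python
from html.parser import HTMLParser

# Partial, illustrative subset of block-level element names.
# The commit refers to MDN's full list, which is not reproduced here.
BLOCK_LEVEL_TAGS = {
    "blockquote", "div", "h1", "h2", "h3", "hr",
    "li", "ol", "p", "pre", "table", "ul",
}


class TagStripper(HTMLParser):
    """Collects text content, injecting a space where a tag implies a break."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def _break_if_needed(self, tag):
        # HTMLParser lowercases tag names before calling the handlers.
        if tag in BLOCK_LEVEL_TAGS or tag == "br":
            self.parts.append(" ")

    def handle_starttag(self, tag, attrs):
        self._break_if_needed(tag)

    def handle_endtag(self, tag):
        self._break_if_needed(tag)

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags_preserving_breaks(html):
    """Return the text of `html` with tags removed and word breaks kept."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()  # flush any buffered trailing text
    return "".join(stripper.parts)
```

With this sketch, `<p>foo</p><p>bar</p>` yields text in which `foo` and `bar` are separate words, while an inline tag such as `<b>` adds no whitespace, so `<b>foo</b>bar` still strips to `foobar`. Runs of injected spaces are left as-is here; a caller that needs normalized text would collapse them separately.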

Bug: T195389
Change-Id: I9fbfac765ea88628e4f9b2794fb54e1cd0060203
includes/content/WikiTextStructure.php
includes/parser/RemexStripTagHandler.php
includes/parser/Sanitizer.php
tests/phpunit/includes/content/WikitextStructureTest.php
tests/phpunit/includes/parser/SanitizerTest.php