Improve RemexStripTagHandler working with tables
author Erik Bernhardson <ebernhardson@wikimedia.org>
Thu, 14 Mar 2019 20:06:27 +0000 (13:06 -0700)
committer Erik Bernhardson <ebernhardson@wikimedia.org>
Thu, 14 Mar 2019 20:11:59 +0000 (13:11 -0700)
HTML generated by some infoboxes, and possibly elsewhere, is stripped
in a way that merges words that should stay separate. Add tr, th, and
td to the list of tags that force word separation.

Bug: T218001
Change-Id: Ib374339628b1f543ea4e07f24aa3e3b76f3117b5

includes/parser/RemexStripTagHandler.php
tests/phpunit/includes/parser/SanitizerTest.php

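diff --git a/includes/parser/RemexStripTagHandler.php b/includes/parser/RemexStripTagHandler.php
index bf4c098..2d75c86 100644
--- a/includes/parser/RemexStripTagHandler.php
+++ b/includes/parser/RemexStripTagHandler.php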
@@ -87,7 +87,10 @@ class RemexStripTagHandler implements TokenHandler {
                'pre' => true,
                'section' => true,
                'table' => true,
+               'td' => true,
                'tfoot' => true,
+               'th' => true,
+               'tr' => true,
                'ul' => true,
                'video' => true,
        ];
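diff --git a/tests/phpunit/includes/parser/SanitizerTest.php b/tests/phpunit/includes/parser/SanitizerTest.php
index ad8aa1e..1f6f4e8 100644
--- a/tests/phpunit/includes/parser/SanitizerTest.php
+++ b/tests/phpunit/includes/parser/SanitizerTest.php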
@@ -527,6 +527,7 @@ class SanitizerTest extends MediaWikiTestCase {
                        ],
                        [ '1<span class="<?php">2</span>3', '123' ],
                        [ '1<span class="<?">2</span>3', '123' ],
+                       [ '<th>1</th><td>2</td>', '1 2' ],
                ];
        }
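The idea behind the patch (certain tags force a word break when markup is stripped) can be illustrated with a minimal sketch. It assumes a simple regex-based splitter instead of the token-handler approach that RemexStripTagHandler takes, and the function `stripTagsWithSeparation` and the `SEPARATING_TAGS` constant are hypothetical names; the tag list is only the subset visible in the diff above. It reproduces the new test case, not the exact behavior of the real handler.

```php
<?php
// Hypothetical sketch: a regex-based stand-in, NOT the real
// RemexStripTagHandler, which receives tokens from a tokenizer.

// Assumed subset of word-separating tags (those visible in the diff);
// td, th and tr are the ones this commit adds.
const SEPARATING_TAGS = ['pre', 'section', 'table', 'td', 'tfoot', 'th', 'tr', 'ul', 'video'];

function stripTagsWithSeparation(string $html): string
{
    // Split into text runs and tags; the capture group keeps the tags.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $out = '';
    $needSpace = false;
    foreach ($parts as $part) {
        $isTag = $part[0] === '<' && substr($part, -1) === '>';
        if ($isTag) {
            // Opening or closing tag: extract the element name and
            // remember that the next text run needs a separator.
            if (preg_match('#^</?\s*([a-zA-Z][a-zA-Z0-9]*)#', $part, $m)
                && in_array(strtolower($m[1]), SEPARATING_TAGS, true)) {
                $needSpace = true;
            }
            continue; // the tag itself is dropped
        }
        // Emit at most one separator, and never a leading one.
        if ($needSpace && $out !== '') {
            $out .= ' ';
        }
        $needSpace = false;
        $out .= $part;
    }
    return $out;
}

echo stripTagsWithSeparation('<th>1</th><td>2</td>'), "\n"; // 1 2
echo stripTagsWithSeparation('1<span>2</span>3'), "\n";     // 123
```

Inline tags such as span stay out of the list, so text around them still joins, matching the existing `'123'` test cases; only table cells and rows now produce a break.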