<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.occupyboston.org/wiki/index.php?action=history&amp;feed=atom&amp;title=Mediawiki_Deleted_Page_and_Spam_Account_Cleanup</id>
	<title>Mediawiki Deleted Page and Spam Account Cleanup - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.occupyboston.org/wiki/index.php?action=history&amp;feed=atom&amp;title=Mediawiki_Deleted_Page_and_Spam_Account_Cleanup"/>
	<link rel="alternate" type="text/html" href="https://wiki.occupyboston.org/wiki/index.php?title=Mediawiki_Deleted_Page_and_Spam_Account_Cleanup&amp;action=history"/>
	<updated>2026-06-11T19:16:31Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://wiki.occupyboston.org/wiki/index.php?title=Mediawiki_Deleted_Page_and_Spam_Account_Cleanup&amp;diff=15789&amp;oldid=prev</id>
		<title>Srevilak: Notes on mediawiki cleanup.  Initial revision.</title>
		<link rel="alternate" type="text/html" href="https://wiki.occupyboston.org/wiki/index.php?title=Mediawiki_Deleted_Page_and_Spam_Account_Cleanup&amp;diff=15789&amp;oldid=prev"/>
		<updated>2012-09-02T00:58:35Z</updated>

		<summary type="html">&lt;p&gt;Notes on mediawiki cleanup.  Initial revision.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;== Background ==&lt;br /&gt;
&lt;br /&gt;
I've decided to remove all of the deleted pages from Mediawiki's database, along with removing spammer user accounts.  My definition of spammer: (1) the account is currently blocked, or (2) the user's only revisions are deleted pages.&lt;br /&gt;
&lt;br /&gt;
I'm making notes about the process, so that we know what to do next time.  (Or, what not to do, if that happens to be the case).&lt;br /&gt;
&lt;br /&gt;
Every non-spam page has been marked as &amp;quot;patrolled&amp;quot;; all of the spam has been deleted.  This makes the cleanup a lot easier.&lt;br /&gt;
&lt;br /&gt;
== backup stuff ==&lt;br /&gt;
&lt;br /&gt;
First, we backup the entire database.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mysqldump --complete-insert ob_wiki | xz -c &amp;gt; obwiki-pre-cleanup.sql.xz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And, let's make a separate backup of the user table.  In case we accidentally delete an user we shouldn't, this will make it easy to put them back.  Note that this form of mysqldump writes each wrote as a separate insert statement.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mysqldump --extended-insert=0 \&lt;br /&gt;
  --complete-insert \&lt;br /&gt;
  --no-create-info \&lt;br /&gt;
  ob_wiki user  obwiki-user.sql&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Of course, you should rehearse this procedure, by loading the database dump into your own development environment.&lt;br /&gt;
&lt;br /&gt;
== deleted page removal ==&lt;br /&gt;
&lt;br /&gt;
First, truncate the archive table.  This removes metadata from deleted pages, along with their edit history.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MYSQL&amp;gt; truncate table archive;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Next, remove the text of delete pages.  (Media wiki store all text -- deleted and non-deleted in the same &amp;lt;tt&amp;gt;text&amp;lt;/tt&amp;gt; table).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
SHELL&amp;gt; php maintenance/purgeOldText.php --purge&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Removing Orphaned Accounts ==&lt;br /&gt;
&lt;br /&gt;
Mediawiki comes with &amp;lt;tt&amp;gt;maintenance/removeUnusedAccounts.php&amp;lt;/tt&amp;gt; which is a pretty conservative account removal tool.  It removes accounts that don't meet the following criteria:&lt;br /&gt;
&lt;br /&gt;
:(No edits, no deleted edits, no log entries, no current/old uploads)&lt;br /&gt;
&lt;br /&gt;
I've made a modified version of this program, that omits the &amp;quot;no log entries&amp;quot; condition.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
*** maintenance/removeUnusedAccounts.php        2012-09-01 19:33:53.875901952 -0400&lt;br /&gt;
--- maintenance/removeUnusedAccounts-obit.php   2012-09-01 19:33:45.672785090 -0400&lt;br /&gt;
***************&lt;br /&gt;
*** 91,98 ****&lt;br /&gt;
         */&lt;br /&gt;
        private function isInactiveAccount( $id, $master = false ) {&lt;br /&gt;
                $dbo = wfGetDB( $master ? DB_MASTER : DB_SLAVE );&lt;br /&gt;
!               $checks = array( 'revision' =&amp;gt; 'rev', 'archive' =&amp;gt; 'ar', 'logging' =&amp;gt; 'log',&lt;br /&gt;
!                                                'image' =&amp;gt; 'img', 'oldimage' =&amp;gt; 'oi', 'filearchive' =&amp;gt; 'fa' );&lt;br /&gt;
                $count = 0;&lt;br /&gt;
  &lt;br /&gt;
                $dbo-&amp;gt;begin();&lt;br /&gt;
--- 91,102 ----&lt;br /&gt;
         */&lt;br /&gt;
        private function isInactiveAccount( $id, $master = false ) {&lt;br /&gt;
                $dbo = wfGetDB( $master ? DB_MASTER : DB_SLAVE );&lt;br /&gt;
!               $checks = array( 'revision' =&amp;gt; 'rev', &lt;br /&gt;
!                                  'archive' =&amp;gt; 'ar', &lt;br /&gt;
!                                  //'logging' =&amp;gt; 'log',&lt;br /&gt;
!                                  'image' =&amp;gt; 'img', &lt;br /&gt;
!                                  'oldimage' =&amp;gt; 'oi', &lt;br /&gt;
!                                  'filearchive' =&amp;gt; 'fa' );&lt;br /&gt;
                $count = 0;&lt;br /&gt;
  &lt;br /&gt;
                $dbo-&amp;gt;begin();&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So, let's run this&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
php maintenance/removeUnusedAccounts-obit.php --delete&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally, let's remove users that appear in the blocked list.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-- See who we're about to delete&lt;br /&gt;
MYSQL&amp;gt; select ipb_address from ipblocks where ipb_address in (select user_name from user);&lt;br /&gt;
&lt;br /&gt;
-- now delete them&lt;br /&gt;
MYSQL&amp;gt; delete from user where user_name in (select ipb_address from ipblocks);&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Cleaning IP Blocks ==&lt;br /&gt;
&lt;br /&gt;
Now that we've removed a bunch of users, we no longer need blocks for those users.&lt;br /&gt;
&lt;br /&gt;
Range based blocks contain text in ipb_range_start and ipb_range_end.  For user blocks, these will be empty fields.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MYSQL&amp;gt; delete from ipblocks where ipb_range_start = '' and ipb_range_end = '';&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Stuff removed ==&lt;br /&gt;
&lt;br /&gt;
These statistics come from a 1 Sep. 2012 cleanup run.&lt;br /&gt;
&lt;br /&gt;
* 3019 pages from &amp;lt;tt&amp;gt;archive&amp;lt;/tt&amp;gt;, representing 1162 pages.&lt;br /&gt;
* 4133 records from &amp;lt;tt&amp;gt;text&amp;lt;/tt&amp;gt;&lt;br /&gt;
* 3010 accounts from &amp;lt;tt&amp;gt;user&amp;lt;/tt&amp;gt;&lt;br /&gt;
* 2006 records from &amp;lt;tt&amp;gt;ipblocks&amp;lt;/tt&amp;gt;&lt;/div&gt;</summary>
		<author><name>Srevilak</name></author>
	</entry>
</feed>