Moving at the Speed of Creativity by Wesley Fryer

MediaWiki spam cleanup recap and tutorial

The best way to learn how to use a new technology tool is to find or create a current, authentic purpose for using it. That is my present situation with MediaWiki (the free and open source wiki platform which powers Wikipedia and many other wiki sites worldwide) and our Storychasers’ wiki. In this post, I’ll share a bit of background about how I got to this point and some of the “cleanup” procedures I had to do recently to delete spammer user accounts, delete spammers’ pages, and revert other spam edits to our wiki. I’ve also added some additional protection to the installation, and I’ll describe that as well. MediaWiki is definitely NOT as easy and straightforward to use (from a site administrator’s perspective) as a commercially-supported wiki option like Google Sites, Wikispaces, or PBworks, but it IS extremely powerful and flexible, and I’m glad to have this opportunity to learn more about its functions. One of my personal goals for our Storychasers Mobile Learning Collaborative is to encourage more awareness and use of open source software tools, so learning more technical details about MediaWiki is a good thing… even though I wish some of this were a bit less tedious to figure out, and I can definitely think of several other ways I’d rather have spent a few hours this weekend than cleaning up wiki spam.

BACKGROUND

For the past three years we’ve used a Google Site as the content management system / wiki platform for our Celebrate Oklahoma Voices project, and that choice has served us very well, although it’s still limited in some ways (most notably the inability to use all types of embed code on pages, as you can with Wikispaces and PBworks). Rather than create a separate project wiki for Celebrate Kansas Voices and duplicate many of the resource pages we utilize in our workshops (like those for image resources, audio/music resources, and copyright), it seems more logical to use a central Storychasers wiki which can support not only CKV and COV, but also additional statewide oral history / digital storytelling projects we may start in the months ahead. Back in 2008, as ideas for the Storychasers project (and eventually non-profit organization) started to take shape, I installed MediaWiki on the Storychasers’ shared hosting account I’d paid to create, and set up the installation to be accessible from a subdomain of our storychasers.org site (at wiki.storychasers.org). As the start date for our first Celebrate Kansas Voices workshop (this past week) drew nearer, I decided to just use this existing MediaWiki installation instead of setting up a brand new wiki. I made this decision knowing it was NOT as secure as a Google Site could be, in terms of preventing spammers from editing it, but figured if and when I ran into those problems I’d just figure out how to resolve them. Those resolutions are detailed below.

WHY SPAM ON A WIKI?

Why would someone put spam on someone else’s wiki website? Can you “turn in” someone who has spammed your site? Both of these questions were asked this weekend by my 10-year-old daughter, who was interested to hear a little bit about what her dad was doing “cleaning up” one of his websites. The short answers to these questions are:

  1. Some people add their own links to other people’s wikis, blogs, and other websites in an attempt to gain higher Google PageRank, which comes (at least in part) from the number of times other websites link to your website. By adding their links to your site, spammers hope to increase the rank of their own sites when people search for the products they sell or the terms they use on their sites.
  2. No, there is generally NOT a way to turn someone in to the “spam police” if they do something like add spam to your wiki or blog. IP addresses which are known for sending out spam and running spambots can be blacklisted, but as far as I know there’s not any way to “turn in” wiki spammers like those I dealt with on our site.

HOW DO PEOPLE SPAM A WIKI?

Spam can be added to a wiki, and specifically a MediaWiki installation, in multiple ways. The list below is not necessarily comprehensive, but these are the spam forms I dealt with this weekend.

Sometimes a spammer will simply add a link to their site on an existing page of your wiki. This happened to our “Interviewing Techniques” page, when a newly registered user added a link to a commercial essay writing service.

Spam link on our MediaWiki

Spammers can also add their links to NEW PAGES which they create on your wiki site. The following two screenshots show examples.

This is an example of a SPAM page created on our MediaWiki install

If you are logged into your MediaWiki installation as an administrator, you’ll have a DELETE tab at the top of each article which can be used to delete spam pages.

Deleting spam page on wiki

Each article page on a MediaWiki installation includes a DISCUSSION tab at the top, which can be used by authors to converse about proposed and past edits to that particular article. On controversial Wikipedia articles, for example, it can be illuminating to view the discussion pages to get an “insider’s perspective” on the debates raging around particular topics. Discussion pages are, however, yet another place spammers can add their unwanted links, and this happened on our site a LONG time ago. This particular edit dated back to 2008, but I had never viewed it until this weekend so I didn’t know it existed.

Spam on the discussion page

Spam on discussion pages can be removed by simply clicking the EDIT tab on the discussion page in question, deleting the spam, and then saving the page.

HOW CAN YOU FIND NEW SPAM ON A MEDIAWIKI SITE?

There are several ways to find new spam on your site. One way is to click the link (usually in the left sidebar for MediaWiki) for RECENT CHANGES. It will show all the pages which have been changed in the past seven days, by default, and you can specify a different time interval if desired. In the case of our wiki, which only has a VERY limited number of genuine (non-spammer) authors, it’s pretty easy to see when someone you don’t know has edited or created a page.

Spam edits on MediaWiki installation

After clicking the DELETE tab on a spam page, MediaWiki will prompt you to identify the reason for the deletion. This information will be archived on the site. I chose “vandalism” as the reason for most of my page deletions this weekend.

Spam pages deleted - Hide Admin Edits

It’s also possible to use the automatically-generated MediaWiki page for “Dead End Pages” to identify spam pages, especially those which may not have been recently created. Dead end pages are ones which do not link to other pages on your wiki. Spammers will almost always just link OUT to their own sites, and not link to other pages on your wiki, so this is a quick way to find them in many cases.

Using Dead-end pages in MediaWiki to delete spam pages

When you want to remove spam on one of your own pages, you don’t want to DELETE the entire page, of course. Instead, you want to REVERT to the most recent “good” version of the page. Wikipedia has a good article on page reverting you can check out for more information about how to do this. I used the “Rollback” option this weekend to revert some spam edits made to my own pages.

HOW CAN YOU DELETE SPAM USERS FROM A MEDIAWIKI INSTALLATION?

It’s possible to “ban” a user account which a spammer has created on your MediaWiki site, but in my case I wanted to delete their accounts altogether. I did this by logging into the administrative control panel of our shared hosting server account, and accessing the program phpMyAdmin. I use this program periodically to back up the MySQL databases used in my WordPress installations. MediaWiki uses MySQL also. By viewing the USER table of my MediaWiki database in phpMyAdmin, I was able to select and DELETE about thirty spam users who had registered on my site to add unwanted content.

Deleting MediaWiki Spam accounts

HOW CAN SPAM BE PREVENTED AND KEPT OUT OF A MEDIAWIKI INSTALLATION?

MediaWiki is set up to be a very open platform for editing and user contribution, but this openness can be, and often IS, exploited by spammers with malicious intentions. Over a year ago I figured out how to restrict or prohibit anonymous editing on my MediaWiki installation. This is done by adding an extra line of code to the “LocalSettings.php” file in your MediaWiki installation. Once this is done, anonymous users (who have not created an account on your site) will be shown a permissions error when they try to edit ANY page on your site.

Only registered users can edit!
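
The exact line is not reproduced in this post, so here is a minimal sketch of the usual setting for turning off anonymous editing. It assumes a MediaWiki release that uses the `$wgGroupPermissions` array (1.5 or later); older releases handled this with different settings, so check the manual page for your version before copying it.

```php
# Goes near the end of LocalSettings.php.
# Assumption: the installation uses $wgGroupPermissions (MediaWiki 1.5 or later).

# '*' is the implicit group that every visitor belongs to, logged in or not.
# Turning off 'edit' here means anonymous visitors can no longer edit pages.
$wgGroupPermissions['*']['edit'] = false;

# Registered accounts belong to the 'user' group, which keeps its edit right.
$wgGroupPermissions['user']['edit'] = true;
```

With these lines in place, a logged-out visitor who clicks an edit link should see a permissions error instead of the edit form.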

I’ve found disabling anonymous editing DOES make a big difference, but it certainly won’t keep out all spammers since it’s a pretty straightforward process to create a new account on a MediaWiki installation. The official MediaWiki manual has an extensive page on strategies which can prevent unwanted access to your MediaWiki installation. In most cases, you want to strike a balance between locking everything down TOO much, and still allowing enough openness that others who want to make worthwhile contributions to your website can do so.
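
As one illustration of the stricter end of that balance, the sketch below turns off self-service account creation, so that only administrators can create accounts for people they know. This is an option from the MediaWiki permissions system, not a step described in this post, and it assumes the `$wgGroupPermissions` array is available (MediaWiki 1.5 or later).

```php
# Goes in LocalSettings.php. Assumption: MediaWiki 1.5 or later.

# Stop anonymous visitors from registering new accounts themselves.
$wgGroupPermissions['*']['createaccount'] = false;

# Administrators (the 'sysop' group) can still create accounts on request.
$wgGroupPermissions['sysop']['createaccount'] = true;
```

The trade-off is that every genuine contributor has to ask an administrator for an account before they can edit.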

At this point, I’ve chosen to set “page protection” on the main pages of our wiki, so only administrators can edit them. This is straightforward to do when you’re logged in as an administrator: Simply click the PROTECT tab at the top of an article/page to enable it for that page, and choose the desired restrictions.

Adding MediaWiki page protection

Down the road, I think it will be good to install the MediaWiki extension “ConfirmAccount” to create an approval queue for new accounts on our site. I’m not ready to do that now, however, because I’m still running an old version of MediaWiki (v 1.3) and have never upgraded it. The latest version (as of this writing) is 1.6, and my cursory review of the MediaWiki upgrade procedures indicates it can be tricky to upgrade an installation as old as mine. I’m also not entirely sure how I’ll be able to do this since I’m running MediaWiki on a shared hosting server, and don’t have command line access to my server. If you have any links that would help me for this upgrade process, I’d be most appreciative.

FINAL THOUGHTS

All of this probably seems fairly complicated, and I’m not going to contest that. Despite these complexities, however, I think it’s GREAT to gain further knowledge about the way MediaWiki works and how site administrators can revert as well as delete unwanted contributions. I’ve made a number of contributions to Wikipedia over the past few years, but my knowledge of the MediaWiki platform is still pretty limited. I’m glad to learn more and also have a chance to pass along some of this knowledge to you.

If you’re an educator considering what to use for your own classroom learning portal, I’d encourage you to go with an option like Google Sites, Wikispaces, or PBworks. It’s important to know how to contribute to Wikipedia as well as understand how the site works overall, in terms of edits and validity, but it’s probably not wise (unless you’re an uber-geek) to run a MediaWiki site as your primary professional learning site. If you’re a school technology director or other IT staff member, however, I definitely encourage you to look further at MediaWiki as a platform for student and educator use. You do have to provide your own server, but MediaWiki is a unique open-source solution which has important relevance for all learners because of its utilization by Wikipedia. Wikipedia doesn’t appear to be going away anytime soon, and as it continues to grow in breadth and depth, its value and importance will only increase.

If you want to learn more about Wikipedia, listen to “Free Speech, Free Minds and Free Markets,” a September 2008 presentation by Jimmy Wales (its founder) on Fora.tv. The summary of this 1 hour, 43 minute talk is:

Wikipedia founder Jimmy Wales joins journalist Christopher Lydon to address the direction of web 2.0 and how Objectivist philosophy guides his vision. Across the globe we are building, editing, and contributing to a growing body of knowledge and tools at everyone’s fingertips. Volunteers in leaderless organizations contribute to online initiatives and articles. Software developers spend their free time collaborating with complete strangers. Amazingly, these efforts are creating products of extraordinary quality, sometimes better than that of large for-profit organizations. Why do we do it? Why does it work?

Note that while Jimmy Wales spent time discussing Wikia Search in this presentation, that service was discontinued in mid-2009.

Wikipedia matters, and so does MediaWiki. It pays to become more informed about both.

Comments

14 responses to “MediaWiki spam cleanup recap and tutorial”

  1. Stephen M (Ethesis)

    btw, http://mmsd.org/webpub/aolpress.htm is no longer found on your website.

    I was thinking of reinstalling aolpress for some fast and dirty prototyping. It works well for that.

    Most download locations for it no longer work though, and in any case I’m wondering if it will work under Vista or Windows 7.

    Anyway, just a heads up about a broken link.

  2. Wesley Fryer

    Thanks for letting me know, Stephen – That is not a website I actually maintain, and I don’t have a copy of AOL Press, so I’m afraid I can’t help you with getting a copy. It’s kind of amazing that a simple keyword search for “AOL Press” on both Google and Bing today brings my 2003 website, Basics of AOL Press, up in the first five hits. The last sentence in section five of the tutorial really gives this away as an old webpage:

    Copy your entire html folder to a floppy disk and give it to the person responsible for maintaining your school website.

    🙂
    I’d recommend using the free program Kompozer now instead of AOL Press to edit basic webpages.

  3. Gregory J Kohs

    Note that Jimmy Wales is not “founder” of Wikipedia. Dr. Larry Sanger came to Jimmy Wales, the owner of some web servers, and asked him to install wiki software. Wales spent about 30 minutes installing wiki software. Sanger then named the project “Wikipedia”. Sanger issued the first public invitation to participate. And Sanger ironed out most of the key guidelines and policies that still govern Wikipedia today. For better or for worse, those who give this full-credit “founder” title to Wales are, in a word, lying about history.

  4. Mathieu Plourde

    Thanks for this post, Wesley. It’s very timely, as we are about to start playing with MediaWiki to host our Sakai documentation at the University of Delaware.

    We’re probably going to heavily restrict the number of people who will have access to edit the content, but we might open it up a bit as time progresses.

  5. Wesley Fryer

    Glad this is helpful, Mathieu.

    The problem with access permissions on MediaWiki is that it’s set up to use a blacklist rather than a whitelist of users. With Google Docs you can specify just a few editors and let everyone else have read-only access. With MediaWiki, by default anyone can edit. You can require user accounts, and that is what I did initially. Those people still have all edit rights, unless you lock individual pages.

    Since I locked all our individual pages on Aug 8th, I haven’t had any other spam edits. Theoretically someone could have created new spam pages with user accounts, but that hasn’t happened.

    I think there’s a free plugin available which lets you specify a required enrollment key to create a new user account. That might be helpful in your case. It’s burdensome to lock all pages, because then your “regular” users can’t edit. I’m not completely sure what the solution is in all cases, but at least there are options – AND at least there are ways (as this story demonstrates) to lock things down so spammers can’t take over. Larry Lessig is the first individual I saw running MediaWiki on his own site (wiki.lessig.org – now apparently not available there) and I think it was taken down on his site because of spam / lack of lockdown issues. Not sure though, but I think I read that on his blog at one point.

    I’ll be very curious to know how things go with your MediaWiki installation. There’s certainly a LOT which can be done with it, and a lot of value to both using it and learning how to manage it. If you think of it, please send me a tweet when you get your site up and running and I’d love to check it out.

    Good luck to you.

  6. Mathieu Plourde

    It does make sense to blacklist only and leave open by default. That’s what a wiki culture is. There are so many other tools available if what you’re looking for is lockdown control.

    I’ll definitely get back in touch, probably closer to December. Thanks again!

  7. Wesley Fryer

    Here’s that plugin, it’s called ConfirmAccount

  8. Wesley Fryer

    Right – I agree. With spambots that are programmed to put spam on wikis, however, that cultural ethic can certainly be put under siege, I think.

  9. Roy Grubb

    Useful article, thanks.

    Something I experience with my Mediawiki site puzzles me. I wonder if you have any idea what is happening.

    I don’t allow anonymous edits, but only my front page is locked for editing. I have practically no problem with spam – just a couple of spammers in two years, and the first was before I disabled anonymous edits.

    But I do get large numbers of accounts set up with obvious spammer account names like Silverseeker73x0g5. Every day I get a few, with the times between setting each one up being one or two hours.

    The accounts are never apparently used for editing. I block them as soon as I see them, and I check Recent Changes a couple of times a day. I like to block rather than delete them, because that blocks other attempts from the same IP address, which at least causes the spammer the minor inconvenience of having to go through a proxy.

    What worries me is that some editing might be going on that doesn’t, for some reason, show up in recent changes. Google’s webmaster tools doesn’t show any keywords on my site that don’t have relevance to the wiki.

    Have you seen this behavior? Any ideas?

    Roy

  10. Wesley Fryer

    Roy: I absolutely DO have this same problem of lots of spammers creating accounts on my MediaWiki site, and I’d like to put an end to it. I want to install the ConfirmAccount extension, which I think can prevent this, but I’m running an older version of MediaWiki and haven’t dedicated the time (yet) to figure out how to back up my entire site and upgrade. You might give that extension a try; I think it should stop the spammers creating accounts. I’d love to know if you do!

    The spam accounts which are created on my site aren’t currently being used for editing either, now that I’ve protected all my main pages. I am thinking these must be generated by spambots.

  11. Roy Grubb

    I think you’re right about bots. That might explain the hour or more gaps between account registration. Still I wonder at the motivation of the person running the bots.

    I’ll look into the extension, thanks for the link.

    I think it is advisable to upgrade version – there are (as so often) known security issues with older versions.

  12. Nutellacrepe

    thanks! Very handy summary.

  13. Alexandar Danilovic

    Just google “Wamp Server (or LAMP for linux)” and download it to create a local server on your computer. I am guessing you know how to back up your wiki after 2-3 years. Fire the wiki up on your local server, then perform the upgrade, and if you are satisfied with the result, you can upload the wiki to your host. Obviously, you should disable editing for a day or less via $wgReadOnly = "bla bla"; so that no edits are “lost” in the upgrade.

    If you can spare the time, you might be able to update the wiki and learn a thing or two as well. Cheers.

  14. Felicity Merriman

    Another method that can keep bots out of your wiki is to use the Check Spambots extension, along with Spam Blacklist and Questy Captcha. While the latter might be a little more inconvenient on the part of the user (as he has to answer a trivia question), that would certainly stop bots dead in their tracks.

    I’d advise against deleting users via the database, though; I heard that it can cause undesired consequences on a wiki site.