(redirected from WikiSpam)

[Home]ToothyWikiInternals/Censorship

ec2-3-136-154-103.us-east-2.compute.amazonaws.com | ToothyWiki | ToothyWikiInternals | RecentChanges | Login | Webcomic

MoonShadow has decided to forgo his idealism and venture into the heady waters of censorship. In particular, URLs posted by wiki spammers will be added to a list of censored material at MoonShadow's discretion. This will be accomplished by adding the following patch to DoPost, immediately before the line that requests an editing lock:


  # Check for banned content
  {
    my $bannedcontent = 0;

    $string =~ m'http:// (url1) /' and $bannedcontent++;
    $string =~ m'http:// (url2) /' and $bannedcontent++;
    $string =~ m'http:// (url3) /' and $bannedcontent++;
    # (add more here)

    if ($bannedcontent)
    {
      &ReportError(T('The posted text contains the URL of a commercial site that advertises using wiki spam. This practice is not tolerated and such URLs will therefore be censored.'));
      return;
    }
  }



Suggestion: allow the posting of spammed URLs to pages under Spam/ --DR
(PeterTaylor) Some spam is after people following the links directly, but some is simply after improving the position in Google search results, and in the latter case killing the links entirely is the way forward.

If you want to request that a URL blocked by the spam blocking system be unblocked, one possible page to use is:  MoonShadow/WhiteListRequest

Spam currently being censored (robots are blocked from past revisions using [robots.txt]):

Would it not be sufficient to obfuscate the edit URL parameters? --Bobacus
Sorry, I don't get it - what would that do? So far AFAICT we've been spammed by one bot and four live humans, one of which actually went back and checked RecentChanges and put the spam back a few times after people reverted it. - MoonShadow
My bad. I'd assumed it was being done by bots that assumed the syntax of the URL for editing (on the basis that many wikis may be based on the same code). I am surprised that people are bothering to spam wikis like this manually! --Bobacus
Actually, you're right on the whole - while it's still early days for wiki spam, judging by bigger wikis the balance is shifting towards bots. They've just not been a problem for us yet ^^; Yes, I should probably do something about the edit URL before something nasty gets here.. - MoonShadow

I was thinking of some sort of Wiki RBL... -- Senji

A [google search] shows they spammed 1,270 sites.  Time to edit "edit", methinks. --DR
I have a feeling that we are the target of a botnet driving a crawler.  I begin to see the rationale behind out-of-proportion penalties - deterrence really is required when things are this hard to stop.  --Vitenka (Suggest a 'this is a rollback' checkbox, to avoid appearing in the RecentChanges page?)
You need to take out the change which you're rolling back too (which is why I haven't been making my rollbacks 'Minor Edit's). -- Senji
Making them minor edits doesn't help, for the twin reasons that the spam probably wasn't a minor edit in the first place and that I tend to watch the wiki with minor edits visible.  I don't mind (so much) seeing a spam in that list if I'm the first person to see it and have to remove it - but it's annoying to see them when someone has already killed them.  --Vitenka
I think that concentrating on how to do the rollbacks is tackling the wrong part of the problem.  Disallow the spammed URLs, either in the text of the page or in comments (such as in the latest spate).  Block certain IP addresses from accessing the wiki.  Get people to log in before being allowed to post.  Things like that.  That way, the wiki doesn't get messed up in the first place. --M-A
Blocking spam URLs is done here, but there are too many to list and whitelisting is a bad option.  Blocking IPs can be done, but has the same problems.  RBLs require a trusted party, and I don't trust the antispammers as far as I can throw them.  Requiring login means that you decrease the rate of new people arriving, and would probably stop me from posting.  --Vitenka
(PeterTaylor) Require login only for edits which include a hyperlink?
How about MoonShadow create an option "destroy this edit", enabled (and visible) only to administrators, and distribute an admin password to a few trusted Wikizens? This would mean none of the rest of us get bothered by the edits once a Spambuster has zotted them, would make it very easy for those few people to do so (so it's minimal ongoing work), and wouldn't require anyone to log in. I suggest this as an addition to the current censorship lists. For that matter, allow those adminny types to add sites to the banned list too - usually this will go together with a spam reversion. The main drawback I can see is that it creates a two-tier wikizen hierarchy... --AlexChurchill
(PeterTaylor) There's already a two-tier hierarchy: MoonShadow and the rest of us. ;-)
Well someone seems to just want to come here and be petty - Senji's whole page got deleted. Fixing it is a matter of moments, but it could get very tedious (especially if someone malicious actually has the skill to write a simple script and especially especially if they have access to a BotNet) --Gwyntar
The only cure for that sort of pettiness is to ignore it. The spam isn't a major problem at the moment. I actually hadn't noticed it until this discussion started...so, I say we just continue doing what we're doing and anyone who notices mischief going on quietly undoes it. Hope it blows over. Don't let them force us to spend time instituting measures against them. -- Xarak
Ah yes, looks like it's two separate people who happened to attack at the same time.  One of whom at least is doing it by hand and is a MonkeyFacedFaecesEater.  --Vitenka
Requiem misparsed that epithet and now has a very strange mental image.
Oh thanks a lot Requiem, now I see it too ^_^ How would that even work? You'd have to specially sculpt it that way and you'd have problems with texture unless...ok I'll stop. -- Xarak




Where's a good place to discuss general approaches to solving this problem?  Semi-innovative things like "Let the spammer see his page, but hide it from normal users"  --Vitenka (Though that's hard to implement.  And silently failing to include the URL when you're not logged in might blow up normal users.)

Dunno. Could delete the redirect in WikiSpam - seems a likely place :) - MoonShadow




What happens to a wiki that isn't being maintained? [Click on pretty much any page in their RecentChanges]..
Oh, Ewww.  And I just found pretty much the same of CURSWiki.  Much as I enjoy the occasional discussion of Japanese and Russian, might it be worth banning any outgoing link text that contains non-English characters?  --Vitenka
Not any more it isn't. Since MoonShadow gave us the 'action=edit' code fix, we've got on top of it. --Requiem




On a related note, a relatively new ToothyWikizen who has the preference "Show differences on all pages" switched on contacted me in disgust about all the porn sites all over the main ToothyWiki page. On a yellow background. You see the problem. One workaround would be to have those who fix spam do another null edit immediately afterwards, but that's really kludgey. Another would be to get rid of the "Show diffs on all pages" checkbox, but that seems a bit of a shame. Not sure what can be done about this... The "destroy this edit" mentioned above?  Any other ideas? --AlexChurchill
Um... someone who has managed to log in and select that check box really ought to know better...  --Vitenka
Just tried it.  Either the main page is special, or null edits get thrown away automatically.  --Vitenka
You'd need to have "show minor edits" turned on; otherwise it rolls the spammer's edit and the minor edit undoing it into a single null edit. Sorry, but the whole thing reads to me like "I went out of my way to make sure I could see all the changes that would normally be hidden, I looked at such a change, and I am complaining because it showed it to me." It's WorkingAsIntended. I really don't see what you'd like me to do about it. - MoonShadow
Fair point. I don't know how this person got all those options checked: probably poking around Preferences trying things to see what they do. --AC

You misunderstood.  I just went to the main page and edited it, no changes, saved - went to RecentChanges, it didn't show up.  I 'do' have 'show minor edits' turned on.  Perhaps simply explaining what 'show all' means, and why it might be bad (which is counterintuitive, it is normally a good thing) might be sufficient?  --Vitenka
Ah, sorry. Yes, it does ignore null edits. What do you think, Alex - would altering the text from "Show minor edits" to something like "Show all edits, even minor ones like spelling corrections and spam removal", and default difference type from "Major" and "Minor" to "Relevant changes only" and "Everything, even spelling corrections and spam removal" help? - MoonShadow
Null edits Don't Work (have tried them before). The wiki's even smart enough to know that an 'enter, backspace' is the same as not doing anything at all... - SunKitten
Well, not smart - it doesn't actually see the... You don't want the technical stuff, do you?  Anyway, will bear that in mind, in future.  --Vitenka
Most spam cleanups are marked as such. Maybe you could gently suggest to the individual in question that they not click on diffs marked as spam, and we can try and encourage reliable spam marking - SunKitten
The problem is the individual in question didn't click on a diff link or see a revision comment - they have turned on the "Show diffs on all pages" option, which does a diff at the top of every page they visit, and set the diff type to "minor" (otherwise the spam and its fix would have been rolled into one and they wouldn't have seen it). - MoonShadow
Oh, I see. I've never used the 'show diffs on all pages' option - was happily unaware it existed :) - SunKitten
And the problem-problem is that any new wikizen might do just the same.  I think edumacating them as to why they might not want to set the 'show minor' option is a good idea.  (We're mostly pretty good about only using it for real minor things and spam edits)  Alternately, an action=ChokeToDeath (this change is a spam edit) which doesn't show up in the diff in the 'all diffs' might be nice, but harder to do.  --Vitenka

How about this: whenever I fix spam or spot spam that someone else has fixed, I currently add the URL(s) to a banned list so the spam does not recur. I could have the diff code check the revisions it's fetched and is about to diff against the banned list, and only display the diff if neither of them contain banned URLs. - MoonShadow
Oooh!  That's even clever!  --Vitenka  (I'm surprised by how spamless this place is, recently - is your script catching a lot?)
That sounds like a very cunning plan. Your wording changes above might be useful also (although the wording doesn't have to be quite that polarised, and I'd suggest "Default difference type" --> "Default difference to show"). But if you can plug it into the blockedlist, that would be really clever. --AC
One problem with this is that there will be a time delay between someone editing out the spam, and you adding the URLs to the banned list, during which people will see the spam in the diffs. --Admiral
There's also a gap between the spam arriving and being edited out.  Neither will be terribly long ;)  --Vitenka
There's a multitude of people purging spam from the wiki, but only one MoonShadow adding URLs to a banned list. --Admiral
Quite. I feel a better longterm solution to the time delay problems would be for me to tidy up the banned list code so that the list is obtained programmatically rather than hardwired into the wiki script, and stored in a more intuitive format than perl regexps (yes, to some people there is such a thing :) Probably shell-style * and ? wildcards). I could then place it on an edit-locked uncrawlable wiki page and hand out editor passwords to people that regularly tidy spam and ask for them (which would also save me a lot of fuss, both in terms of the mechanics of editing the list and also because I won't always be the one editing it). I might then publish the patch on the UseModWiki site :) - MoonShadow
That sounds like generally a good plan, although I would shy away from getting wrong what perl has already got right. I like the idea of using a wiki page itself as an editable config file. Of course, you realise that once you have implemented one user-authorisation-required page you'll be inundated with requests to make more. How many of your patches have you sent off to the UseModWiki site? --Admiral
More a case of using what shell has already got right than what perl has, since the learning curve for most users will be shallower. As for patches, this could be the first change major enough yet self-contained enough for me to be able to submit; since there are large numbers of changes and I have not maintained a revision history I am generally unable to provide patches for individual features, and so I've never submitted any before, though a number of people are running branches of ToothyWiki in preference to UseModWiki. - MoonShadow
I wonder how many projects start that way?  ;)  Doesn't the link-map already work as a user-editable page (though ours is locked down)?  --Vitenka
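
A minimal sketch of the shell-style banned list discussed above, assuming one pattern per line with blank lines and '#' comments ignored. The sub names (glob_to_regex, parse_banned_list, contains_banned) and the example.com patterns are made up for illustration and are not part of the ToothyWiki or UseModWiki source. Fetching the text from an edit-locked wiki page is left out.

```perl
use strict;
use warnings;

# Translate one shell-style pattern into a Perl regex.
# Assumption: '*' and '?' never match whitespace, so a pattern cannot
# stretch across two separate URLs in the posted text.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = join '', map {
          $_ eq '*' ? '\S*'
        : $_ eq '?' ? '\S'
        :             quotemeta($_)
    } split //, $glob;
    return qr/$re/i;
}

# Turn the raw text of the (hypothetical) list page into compiled patterns.
sub parse_banned_list {
    my ($text) = @_;
    my @patterns;
    for my $line (split /\r?\n/, $text) {
        (my $pattern = $line) =~ s/^\s+|\s+$//g;    # trim whitespace
        next if $pattern eq '' or $pattern =~ /^#/; # skip blanks and comments
        push @patterns, glob_to_regex($pattern);
    }
    return @patterns;
}

# True if the posted text matches any banned pattern.
sub contains_banned {
    my ($string, @patterns) = @_;
    for my $re (@patterns) {
        return 1 if $string =~ $re;
    }
    return 0;
}
```

In DoPost the hardwired run of matches would then shrink to a single contains_banned($string, @patterns) call, with the patterns reloaded whenever the list page changes.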

Another idea - when you check in a change, specify whether or not the default diff shown will be between the old version and the new, or between the old old version and the new. So effectively you would keep a counter of the "old" version number, update it when an edit is done and so specified, and then show diffs from that "old" version instead of always from the newest-but-one version. The only dangerous thing it would allow people to do is to make people see more changes in the diff than only the changes you made, but it would mean spam gets cancelled out and won't appear in the diff. --Admiral
That would need some careful thinking in order to work out quite how it interacts with user preferences. At the moment, we don't always show differences from the newest-but-one version - there are also concepts of author diff and major diff. Because of this, the decision on which two revisions to generate a diff between is performed entirely at display time.  A "skip this diff" flag might work, except that's basically what I'll be accomplishing with the banned list check *and* that wouldn't have the associated problem of abuse since no-one else but me can edit the banned list. - MoonShadow
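
The banned-list check on diffs that MoonShadow describes could look roughly like this. It is only a sketch: the @banned_patterns array stands in for the real list, the example.com patterns are placeholders, and should_show_diff is a hypothetical name rather than a function in the wiki script.

```perl
use strict;
use warnings;

# Placeholder banned list; the real one lives in the wiki script.
my @banned_patterns = (
    qr{http://spam\.example\.com/}i,
    qr{http://pills\.example\.net/}i,
);

# True if a revision's text contains any banned URL.
sub revision_has_banned_url {
    my ($text) = @_;
    for my $re (@banned_patterns) {
        return 1 if $text =~ $re;
    }
    return 0;
}

# Called after the two revisions have been chosen and fetched, just
# before diffing: show the diff only if neither revision contains a
# banned URL.
sub should_show_diff {
    my ($old_text, $new_text) = @_;
    return 0 if revision_has_banned_url($old_text);
    return 0 if revision_has_banned_url($new_text);
    return 1;
}
```

Because the check runs at display time on whichever pair of revisions the user's preferences select, it should not need any new per-revision flag.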




(MoonShadow) Whoever suggested changing the "edit this page" URL to a nonstandard one, your time has come.
 sham@hex:~/root/wiki$ grep action=edit /var/log/apache/access.log | wc -l
   1064
 sham@hex:~/root/wiki$ grep action=edit /var/log/apache/access.log | wc -l
   1074




So, I have noticed that quite a lot of bot spammers use a summary that is very easy to recognise. Like the last one, which was "hHUeuVeCnb". Could an automatic system be put in place that recognises a summary that contains exactly ten alpha characters with no spaces that isn't a word, and rejects the edit? --Admiral
MoonShadow said (in a spam revert comment): Nothing in that spam I could have filtered on :(.
I disagree. That particular spam had a summary (of "pnrQVOetMBt") that was very easily recognisable, as suggested above. Although it seems they have progressed beyond having exactly ten characters. --Admiral
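
One way the summary check suggested above might be approximated without a dictionary. Everything here is an assumption made for illustration: the 8 to 16 letter range, the mixed-case test standing in for "isn't a word", and the sub name looks_like_bot_summary. WikiWord-shaped summaries such as RecentChanges are deliberately let through.

```perl
use strict;
use warnings;

# Heuristic: a single token of letters only, with a capital somewhere
# after the first character, that is not shaped like a WikiWord.
sub looks_like_bot_summary {
    my ($summary) = @_;
    return 0 unless $summary =~ /\A[A-Za-z]{8,16}\z/;      # one token, letters only
    return 0 unless $summary =~ /[a-z]/;                   # has a lowercase letter
    return 0 unless substr($summary, 1) =~ /[A-Z]/;        # capital after first char
    return 0 if $summary =~ /\A(?:[A-Z][a-z]+)+\z/;        # looks like a WikiWord
    return 1;
}
```

A check like this would catch both summaries quoted above, but spammers can change the pattern as easily as they changed the length, so it is probably best treated as one signal among several.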



The posted text contains the URL of a commercial site that advertises using wiki spam. This practice is not tolerated and such URLs will therefore be censored. Your IP address has been recorded.
Problematic site(s): <something or other>
Posting host: user-544523ba.lns4-c11.dsl.pol.co.uk
The problem is, that URL was ALREADY ON the page - it wasn't part of the new stuff I added during that edit.
Is there any way you could set the spam checker to check the DIFF rather than the complete submitted page, so that old URLs (presumably intentionally put there, and not actual wikispam) don't get broken?  --DR

Ooops, on checking it appears that this time it was in the new stuff. i f o r m a t i o n w e e k to be precise.  Still a good idea, though, so I'll leave it in. --DR
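
A rough sketch of DR's suggestion of checking only what an edit adds. It compares the set of URLs in the old and new text rather than computing a real diff, which is an assumption made to keep the example short. The sub names and the loose URL pattern are illustrative only.

```perl
use strict;
use warnings;

# Pull http/https URLs out of a block of text (deliberately loose).
sub extract_urls {
    my ($text) = @_;
    my @urls = $text =~ m{(https?://[^\s\]\)>"']+)}gi;
    return @urls;
}

# URLs that appear in the new text but not in the old text.
sub added_urls {
    my ($old_text, $new_text) = @_;
    my %seen = map { $_ => 1 } extract_urls($old_text);
    return grep { !$seen{$_}++ } extract_urls($new_text);
}

# Newly added URLs that match the banned list; URLs that were already
# on the page are ignored.
sub banned_additions {
    my ($old_text, $new_text, @banned) = @_;
    my @hits;
    for my $url (added_urls($old_text, $new_text)) {
        push @hits, $url if scalar(grep { $url =~ $_ } @banned);
    }
    return @hits;
}
```

The returned list could also feed the "Problematic site(s)" line of the error message.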

Hit again, by the "The posted text contains the URL of a commercial site that advertises using wiki spam." message.  This time, the URL I was posting to SiteofTheMoment was an apple shortcut, of the format:
itms://deimos3.apple.com/WebObjects/Core.woa/Browse/researchchannel.org.1795918000.01795918006.1795787206?i=1394082387
which ToothyWiki didn't recognise and leaves looking like:
[itms://deimos3.apple.com/WebObjects?/Core?.woa/Browse?/researchchannel.org.1795918000.01795918006.1795787206?i=1394082387 a link]
so I had the bright idea of converting it to a T i n y U r l:
http://t i n y u r l.com/sidmeier
only it turns out that service is on your blacklist.

Is there any way you could make a whitelist of ToothyWiki login IDs (ones which have passwords) that can post URLs on the blacklist?  --DR
TinyUrl is blocked? That's surprising - I didn't know spammers used that. There are alternatives like http://snipurl.com and http://tr.im - try one of them (I seem to have been able to post them) --AC

Latest problem by DouglasReay:
On my page DouglasReay/KnowledgeStructure
I had a link to http://www.infoanarchy.org/wiki/wiki.pl?The_Circle
That site is no more, so I tried to link to the archived version from the Wayback Machine:
http://web.archive.org/web/20041115143951/http://www.infoanarchy.org/wiki/wiki.pl?The_Circle
and got the dread:
"The posted text contains the URL of a commercial site that advertises using wiki spam. This practice is not tolerated and such URLs will therefore be censored. Your IP address has been recorded."
(note, in adding this report to this page, I added " " into the URLs)
Should be fixed now. The actual thing being banned was unrelated, the regexp was slightly too greedy. --MoonShadow
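
The page does not say which pattern was at fault, so the following only illustrates one common way a banned-URL regexp ends up too greedy: matching a domain name anywhere in the text instead of against the host part of the URL. The domain names and sub names are invented for the example.

```perl
use strict;
use warnings;

# Extract the lowercased host from an http/https URL, or '' if none.
sub host_of {
    my ($url) = @_;
    return $url =~ m{\Ahttps?://([^/:\s]+)}i ? lc($1) : '';
}

# True only if the URL's host is a banned domain or a subdomain of one.
sub host_is_banned {
    my ($url, @banned_domains) = @_;
    my $host = host_of($url);
    for my $domain (@banned_domains) {
        return 1 if $host eq $domain;
        return 1 if $host =~ /\.\Q$domain\E\z/;
    }
    return 0;
}

# An unanchored pattern, for comparison: it also fires on hosts that
# merely contain the banned name.
my $loose_pattern = qr/pills\.example/i;
```

The trade-off is that a host-only check no longer catches a banned address embedded in the path of another site's URL, such as an archived copy.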

Last edited December 3, 2012 10:34 am (viewing revision 81, which is the newest) (diff)