<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Robert Accettura&#039;s Fun With Wordage &#187; bayesian</title>
	<atom:link href="http://robert.accettura.com/blog/tag/bayesian/feed/" rel="self" type="application/rss+xml" />
	<link>http://robert.accettura.com</link>
	<description>Robert Accettura&#039;s Personal Blog on Web Development and Tech</description>
	<lastBuildDate>Thu, 09 Feb 2012 01:43:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<atom:link rel='hub' href='http://robert.accettura.com/?pushpress=hub'/>
<cloud domain='robert.accettura.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
		<item>
		<title>Bayesian Spam Filter Poisoning With RSS</title>
		<link>http://robert.accettura.com/blog/2007/01/29/bayesian-spam-filter-poisoning-with-rss/</link>
		<comments>http://robert.accettura.com/blog/2007/01/29/bayesian-spam-filter-poisoning-with-rss/#comments</comments>
		<pubDate>Tue, 30 Jan 2007 01:41:48 +0000</pubDate>
		<dc:creator>Robert</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[bayesian]]></category>
		<category><![CDATA[news]]></category>
		<category><![CDATA[spamassassin]]></category>
		<category><![CDATA[Thunderbird]]></category>

		<guid isPermaLink="false">http://robert.accettura.com/archives/2007/01/29/bayesian-spam-filter-poisoning-with-rss/</guid>
		<description><![CDATA[Overview Bayesian Filtering is a great method for fighting spam. Unlike rule-based filtering, which spammers can easily adapt to with simple modifications, Bayesian filtering adapts to the spammers&#8217; changes, making it much more difficult for them to defeat the filtering. &#8230; <a href="http://robert.accettura.com/blog/2007/01/29/bayesian-spam-filter-poisoning-with-rss/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h3>Overview</h3>
<p><a href="http://www.paulgraham.com/spam.html">Bayesian Filtering</a> is a great method for fighting spam.  Unlike rule-based filtering, which spammers can easily adapt to with simple modifications, Bayesian filtering adapts to the spammers&#8217; changes, making it much more difficult for them to defeat.  As a result, it&#8217;s used in server-side mail filtering as well as client-side filtering in various products including <a href="http://www.mozilla.com/en-US/thunderbird/">Mozilla Thunderbird</a>, <a href="http://www.spamassassin.org">SpamAssassin</a>, and <a href="http://spambayes.sourceforge.net">SpamBayes</a>.  Despite this level of &#8220;intelligence&#8221;, it&#8217;s not foolproof.  Like anything that analyzes unsanitized input, it&#8217;s vulnerable to <a href="http://en.wikipedia.org/wiki/Bayesian_poisoning">poisoning</a>.  To be fair, there is a debate over whether it <a href="http://sunbeltblog.blogspot.com/2006/08/does-bayesian-poisoning-exist-maybe.html">exists or not</a>.  I personally believe it does exist.</p>
<p><span id="more-1237"></span></p>
<h3>So What Is This &#8220;Poisoning&#8221; You Speak Of?</h3>
<p>Poisoning refers to spammers putting non-spam words (gibberish, random words, or old texts) into spam.  The technique itself is nothing new: it has been used for years to help get around spam filters.  This is why some of your spam may contain things like:</p>
<blockquote><p>
Everything you can imagine is real.<br />
What this country needs is a good five cent cigar.<br />
What the eye does not admire the heart does not desire.<br />
Action is coarsened thought thought becomes concrete, obscure, and unconscious.<br />
A man profits more by the sight of an idiot than by the orations of the learned.
</p></blockquote>
<p>The above comes from spam trying to pitch a Canadian pharmacy!  Doesn&#8217;t sound very medical, does it?  That&#8217;s the point.  They then throw the URL and a quick &#8220;buy pills&#8221; somewhere in there.</p>
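<p>To make the effect concrete, here is a minimal Python sketch of how padding dilutes a score.  It assumes the combining rule described in the Paul Graham essay linked above (take the tokens farthest from neutral, then combine their probabilities).  The per-token probabilities, the <code>spam_score</code> name, and the sample messages are made up for illustration, and real filters such as Thunderbird or SpamAssassin differ in the details.</p>

```python
from math import prod

# Hypothetical per-token spam probabilities. A real filter learns these
# from the user's own spam and non-spam mail.
TOKEN_SPAM_PROB = {
    "pills": 0.99,
    "pharmacy": 0.98,
    "buy": 0.90,
    "imagine": 0.10,
    "country": 0.08,
    "heart": 0.05,
    "learned": 0.04,
}
UNKNOWN_PROB = 0.4  # assumed score for tokens the filter has never seen


def spam_score(tokens, top_n=15):
    """Combine the top_n most 'interesting' token probabilities."""
    unique = {t.lower() for t in tokens}
    probs = [TOKEN_SPAM_PROB.get(t, UNKNOWN_PROB) for t in unique]
    # Most interesting = farthest from the neutral value 0.5.
    probs = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


# The bare pitch, and the same pitch padded with innocent-looking words.
plain = "buy pills pharmacy".split()
padded = plain + "imagine country heart learned".split()

print(spam_score(plain))
print(spam_score(padded))
```

<p>With these made-up numbers the three-word pitch scores close to 1, while the padded version lands much nearer the neutral midpoint, which is the whole point of the padding.</p>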
<h3>What&#8217;s Now Going On</h3>
<p>My theory is that spammers are now using RSS feeds as an input source to make spam look more legitimate and keep the content timely (to avoid filtering).  RSS is easy to retrieve and parse, and it is extremely plentiful.  As a result it&#8217;s possible to have an endless sea of <a href="http://en.wikipedia.org/wiki/Salt_(cryptography)">salt</a> to try and get around the filters.</p>
<h3>Examples</h3>
<p>Here are a few examples I collected in about 10 minutes of skimming my spam folder, looking only at titles for ones that may have come from feeds.  Google searches seem to indicate most come from CNN RSS feeds.  To find the origin you need to be a little creative with your searches and make use of Google&#8217;s cache, since an article&#8217;s title can change through the life of the article.</p>
<p>I then decided to use Google Reader to display over 1,000 titles from the past week in my &#8220;General News&#8221; tag.  This includes a few but not all of CNN&#8217;s feeds (mainly U.S. and World News).  As a side note, this category is somewhat of an antique: I don&#8217;t read general news via RSS since I work for a <a href="http://www.cbsnews.com">news website</a>.  I get all the news I can tolerate from 9-5 <img src='http://robert.accettura.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> .  I&#8217;m also a feed junkie.</p>
<p>I clearly couldn&#8217;t find them all within the range of a week and 3 feeds, but I did find enough to make me wonder.  The screenshots are below:</p>
<p><strong>U.S., Iraqi forces battle insurgents</strong><br />
<a id="p1255" rel="attachment" class="imagelink" href="http://robert.accettura.com/?attachment_id=1255" title="U.S., Iraqi forces battle insurgents"><img id="image1255" src="http://robert.accettura.com/wp-content/uploads/2007/01/us_iraqi_forces_battle_insurgents_lg.thumbnail.gif" alt="U.S., Iraqi forces battle insurgents" /></a><br />
Story Date: Tue, 9 Jan 2007<br />
Email Sent: Wed, 10 Jan 2007</p>
<p><strong>Rice &#8216;loves&#8217; Fox News; CBS anchor &#8216;decent guy&#8217;</strong><br />
<a id="p1254" rel="attachment" class="imagelink" href="http://robert.accettura.com/?attachment_id=1254" title="Rice 'loves' Fox News; CBS anchor 'decent guy'"><img id="image1254" src="http://robert.accettura.com/wp-content/uploads/2007/01/rice_loves_fox_news_lg.thumbnail.gif" alt="Rice 'loves' Fox News; CBS anchor 'decent guy'" /></a><br />
Story Date: Thu, 11 Jan 2007<br />
Email Sent: Fri, 12 Jan 2007</p>
<p><strong>Rebel  &#8216;We aided bin Laden escape&#8217;</strong><br />
<a id="p1253" rel="attachment" class="imagelink" href="http://robert.accettura.com/?attachment_id=1253" title="Rebel  'We aided bin Laden escape'"><img id="image1253" src="http://robert.accettura.com/wp-content/uploads/2007/01/rebel_we_aided_bin_laden_escape_lg.thumbnail.gif" alt="Rebel  'We aided bin Laden escape'" /></a><br />
Story Date: Thu, 11 Jan 2007<br />
Email Sent: Sun, 14 Jan 2007</p>
<p><strong>Madonna defends Rosie</strong><br />
<a id="p1252" rel="attachment" class="imagelink" href="http://robert.accettura.com/?attachment_id=1252" title="Madonna defends Rosie"><img id="image1252" src="http://robert.accettura.com/wp-content/uploads/2007/01/madonna_defends_rosie_lg.thumbnail.gif" alt="Madonna defends Rosie" /></a><br />
Story Date: Thu, 11 Jan 2007<br />
Email Sent: Fri, 12 Jan 2007</p>
<p><strong>Swank  &#8216;I am in a relationship&#8217;</strong><br />
<a id="p1251" rel="attachment" class="imagelink" href="http://robert.accettura.com/?attachment_id=1251" title="Swank  'I am in a relationship'"><img id="image1251" src="http://robert.accettura.com/wp-content/uploads/2007/01/hilary_swank_yes_i_am_in_a_relationship_lg.thumbnail.gif" alt="Swank  'I am in a relationship'" /></a><br />
Story Date: Tue, 09 Jan 2007<br />
Email Sent: Wed, 10 Jan 2007</p>
<p><strong>Gwynn, Ripken in Hall, McGwire misses</strong><br />
<a id="p1250" rel="attachment" class="imagelink" href="http://robert.accettura.com/?attachment_id=1250" title="Gwynn, Ripken in Hall, McGwire misses"><img id="image1250" src="http://robert.accettura.com/wp-content/uploads/2007/01/gwynn_ripken_in_hall_lg.thumbnail.gif" alt="Gwynn, Ripken in Hall, McGwire misses" /></a><br />
Story Date: Tue, 09 Jan 2007<br />
Email Sent: Thu, 11 Jan 2007</p>
<p><strong>Court papers  Dancer cleared one Duke suspect</strong><br />
<a id="p1249" rel="attachment" class="imagelink" href="http://robert.accettura.com/?attachment_id=1249" title="Court papers  Dancer cleared one Duke suspect"><img id="image1249" src="http://robert.accettura.com/wp-content/uploads/2007/01/court_papers_dancer_cleared_one_duke_suspect_lg.thumbnail.gif" alt="Court papers  Dancer cleared one Duke suspect" /></a><br />
Story Date: Tue, 11 Jan 2007<br />
Email Sent: Fri, 12 Jan 2007</p>
<p>As you can see, many were sent the day after the story appeared in the feed.</p>
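<p>The lag between story and email can be tallied with a few lines of Python.  This is only a sketch: it takes the seven date pairs exactly as listed above, drops the weekday names, and assumes the default English month abbreviations.  The <code>lag_days</code> helper is a name invented here.</p>

```python
from collections import Counter
from datetime import datetime

# (story date, email sent date) pairs as listed above. Weekday names are
# dropped so only day, month, and year are compared.
PAIRS = [
    ("9 Jan 2007", "10 Jan 2007"),   # U.S., Iraqi forces battle insurgents
    ("11 Jan 2007", "12 Jan 2007"),  # Rice 'loves' Fox News
    ("11 Jan 2007", "14 Jan 2007"),  # Rebel 'We aided bin Laden escape'
    ("11 Jan 2007", "12 Jan 2007"),  # Madonna defends Rosie
    ("9 Jan 2007", "10 Jan 2007"),   # Swank 'I am in a relationship'
    ("9 Jan 2007", "11 Jan 2007"),   # Gwynn, Ripken in Hall, McGwire misses
    ("11 Jan 2007", "12 Jan 2007"),  # Court papers Dancer cleared one Duke suspect
]


def lag_days(story, sent, fmt="%d %b %Y"):
    """Whole days between a story's feed date and the spam that reused it."""
    return (datetime.strptime(sent, fmt) - datetime.strptime(story, fmt)).days


lags = [lag_days(story, sent) for story, sent in PAIRS]
lag_counts = Counter(lags)  # maps lag in days to number of emails
print(sorted(lag_counts.items()))
```

<p>Taking the dates as listed, five of the seven examples have a lag of one day and the longest lag is three days.</p>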
<p><em>I should note this is <strong>not</strong> the feed owners&#8217; fault in any way, nor is there any reasonable effort they can make to stop or prevent such misuse.  No need to go after blog owners or news sites.  Most of them get spammed more than you.</em></p>
<p>Here&#8217;s a list of the news-related subjects from spam emails I spotted over the past several days.  I&#8217;m not sure where a few of them came from (if anyone wants to dig deeper, feel free).  As of a week ago, several others could be found around the web by searching Google and viewing the Google cached version of some pages.  Headlines can change as a story evolves, which further complicates this research:</p>
<ul>
<li>Court papers  Dancer cleared one Duke suspect</li>
<li>Filing  Duke suspect just watched</li>
<li>Fortune  The 100 best companies to work for</li>
<li>Gwynn, Ripken in Hall, McGwire misses  MORE</li>
<li>Iranian officials detained in Iraq, U.S. official says</li>
<li>Kennedy threatens Bush Iraq plan</li>
<li>Madonna defends Rosie</li>
<li>Man in hot pants struts in boots, cheers city  MORE</li>
<li>Mom charged with stabbing kids</li>
<li>N.J. suspected as source of stench MORE</li>
<li>O&#8217;Reilly, Colbert on each&#8217;s shows.</li>
<li>Rebel  &#8216;We aided bin Laden escape&#8217;</li>
<li>Rice &#8216;loves&#8217; Fox News; CBS anchor &#8216;decent guy&#8217;</li>
<li>Sen. Johnson&#8217;s condition upgraded</li>
<li>Stem-cell funding passes House, faces veto threat</li>
<li>Swank  &#8216;I am in a relationship&#8217;</li>
<li>Teacher accused of taking improper photos found dead</li>
<li>U.S. gunships target al Qaeda suspects in Somalia</li>
<li>U.S., Iraqi forces battle insurgents</li>
<li>Witnesses  Al Qaeda targeted MORE</li>
</ul>
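<p>The manual comparison of spam subjects against feed titles could be partly automated.  The following Python sketch normalizes both sides so that lost punctuation and a trailing &#8220;MORE&#8221; do not prevent a match.  The function names are hypothetical, and the feed titles in the example are stand-ins (including the assumed colon in the first one), not titles retrieved from any actual feed.</p>

```python
import re


def normalize(title):
    """Lowercase, drop a trailing 'MORE' marker, and strip punctuation."""
    title = re.sub(r"\s+MORE$", "", title.strip())
    words = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(words)


def match_subjects(spam_subjects, feed_titles):
    """Map each spam subject to the feed title it appears to be copied from."""
    by_key = {normalize(title): title for title in feed_titles}
    matches = {}
    for subject in spam_subjects:
        key = normalize(subject)
        if key in by_key:
            matches[subject] = by_key[key]
    return matches


# Stand-in feed titles, assumed to be already extracted from a feed.
feed_titles = [
    "Court papers: Dancer cleared one Duke suspect",
    "Madonna defends Rosie",
    "An unrelated headline",
]
# Subjects taken from the list above.
spam_subjects = [
    "Court papers  Dancer cleared one Duke suspect",
    "Gwynn, Ripken in Hall, McGwire misses  MORE",
    "Madonna defends Rosie",
]

matches = match_subjects(spam_subjects, feed_titles)
for subject, title in matches.items():
    print(subject, "=", title)
```

<p>Exact matching after normalization is deliberately strict.  It will miss headlines that were edited after the spam went out, which is the same problem that complicates the manual search.</p>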
<h3>Outlook</h3>
<p>The potential for this to manifest itself more in the future seems somewhat high.  One could rather easily spider some blogging networks for a bunch of random blog RSS feeds to leech content rather than just the subject.  Such messages would resemble legitimate email even more than ones built from a news site could.</p>
<p>Will this seriously harm spam filters?  I doubt it.  It&#8217;s not drastically different from previous methods.  What&#8217;s so interesting is that they seem to be tapping a new fresh data source.</p>
<p>It&#8217;s hard to say how widespread this is exactly.  I&#8217;ve gotten at least a dozen in the past few days, all from different sources, and even to different addresses.  Because of how botnets can be used to send spam, it&#8217;s somewhat difficult to tell if they come from the same origin.</p>
<p>This may even <em>help</em> in the war on spam.  Because they are distributing copyrighted information, perhaps (I&#8217;m not a lawyer) this might qualify as copyright infringement.  AOL, whose parent company, like CNN&#8217;s, is Time Warner, may be interested.  Microsoft has MSNBC to look out for.  That&#8217;s two giant email providers who have sued spammers before, with news networks that have an online presence and may be ripped off for the purpose of spamming.</p>
<p>What&#8217;s interesting about the above emails is that most look strikingly similar in terms of actual contents.  The titles also share the theme of being from RSS feeds.  The headers indicate different origins, making it likely they were sent using a botnet but have the same master.</p>
<h3>Conclusion</h3>
<p>Real-time blacklisting may become more of a necessity for filtering to be truly effective in the long run, similar to how phishing is being handled.  The danger might not be spam getting through, but legitimate email looking more like the new spam and being caught.</p>
<p>I&#8217;d love to see someone like Google or Yahoo do an analysis of spam in comparison to their search indexes.  I can manually do only so many, and visually scan for relevant information.  I&#8217;m sure with Gmail or Yahoo Mail&#8217;s spam, and Google or Yahoo&#8217;s index, there could be some real insight.  The people at Google have already done some decent work on <a href="http://www.google.com/safebrowsing/report_phish/">Phishing</a> and <a href="http://www.mattcutts.com/blog/info-about-malware-warnings-and-how-to-appeal-them/">Malware</a>.  I think spam wouldn&#8217;t be far off.  Using what I could access from Google was very valuable in seeing how spammers are operating.  I bet they can see more than I can.</p>
<h3>Further Research</h3>
<p>I do have a copy of the emails referenced in this post.  I am not making them publicly accessible to prevent some immature wanna-be hacker from attacking someone&#8217;s PC because their IP address was previously issued to an infected computer.  By the time I strip all the headers out, they aren&#8217;t really any more useful than what&#8217;s already posted here.
<div id="rja_commentCountImage"><a href="http://robert.accettura.com/archives/2007/01/29/bayesian-spam-filter-poisoning-with-rss/#comments"><img src="http://robert.accettura.com/wp-content/commentCount/2007/01/a9eb812.gif" alt="Comment Count" style="border:0;" /></a></div>
]]></content:encoded>
			<wfw:commentRss>http://robert.accettura.com/blog/2007/01/29/bayesian-spam-filter-poisoning-with-rss/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Thunderbird 2.0 Beta 1</title>
		<link>http://robert.accettura.com/blog/2006/12/17/thunderbird-20-beta-1/</link>
		<comments>http://robert.accettura.com/blog/2006/12/17/thunderbird-20-beta-1/#comments</comments>
		<pubDate>Mon, 18 Dec 2006 03:37:29 +0000</pubDate>
		<dc:creator>Robert</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[bayesian]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Thunderbird]]></category>

		<guid isPermaLink="false">http://robert.accettura.com/archives/2006/12/17/thunderbird-20-beta-1/</guid>
		<description><![CDATA[Thunderbird 2.0b1 is out, I updated a few days ago. I really love the new tagging functionality. Being able to create your own tags makes organizing mail about 100X easier. The presets of 1.5 just weren&#8217;t enough. As far as &#8230; <a href="http://robert.accettura.com/blog/2006/12/17/thunderbird-20-beta-1/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.mozilla.com/en-US/thunderbird/releases/2.0b1.html">Thunderbird 2.0b1</a> is out, and I updated a few days ago.  I really love the new tagging functionality.  Being able to create your own tags makes organizing mail about 100X easier.  The presets of 1.5 just weren&#8217;t enough.  As far as the UI goes, I was initially not too fond of the earth tone coloring, but I think the new icons are starting to grow on me.  There is also new phishing detection (similar to Firefox).  To test it, I ran it against a few phishing emails from my spam folder.  So far so good.</p>
<p>There are only two downsides thus far.  The first is that Bayesian spam filtering is not performing as well as it did on 1.5.  I reset things; hopefully after a few days of learning it will resolve itself.  Or perhaps it&#8217;s a lingering regression in 2.0.  It is after all still in beta.  The other is that the new mail notification doesn&#8217;t seem to open mail if you click on it.  I was hoping it would open email when clicked.  Perhaps it&#8217;s just not obvious where to click.  The appearance and effect seem to be much better now.</p>
<p>It&#8217;s hard to write even a mini-review of beta software, since it is just beta and things are incomplete or subject to change.  I plan to write more on it closer to the 2.0 release.  Despite its lower-profile development (compared to Firefox) and more subtle changes, it&#8217;s really evolving.  The changes made really do make it a much better experience.
<div id="rja_commentCountImage"><a href="http://robert.accettura.com/archives/2006/12/17/thunderbird-20-beta-1/#comments"><img src="http://robert.accettura.com/wp-content/commentCount/2006/12/285ab94.gif" alt="Comment Count" style="border:0;" /></a></div>
]]></content:encoded>
			<wfw:commentRss>http://robert.accettura.com/blog/2006/12/17/thunderbird-20-beta-1/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

