@spam: The underground on 140 characters or less

This post is based on research to appear in CCS 2010; an advance PDF is available under my publications.

To understand spam propagating within Twitter, we plugged into Twitter’s streaming API and monitored tweets submitted to the site over the course of one month. Because we had no pre-existing notion of what spam ‘looks like’, we used three blacklists to flag URLs previously identified in email spam: Google Safebrowsing, URIBL, and Joewein. Because URL shortening services such as bit.ly and other obfuscation techniques can hide a link’s true destination, we crawled each URL until reaching the final landing page and used that page’s domain to determine blacklist presence.
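The crawl-then-check step can be sketched roughly as follows. This is a minimal illustration, not the crawler used in the study: the REDIRECTS table is a hypothetical stand-in for real HTTP redirects, BLACKLISTED_DOMAINS is a hypothetical stand-in for lookups against the three blacklists, and all function names and example domains are assumptions.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the HTTP redirects a crawler would observe.
REDIRECTS = {
    "http://bit.ly/abc": "http://short.example/xyz",
    "http://short.example/xyz": "http://spam.example/landing",
}

# Hypothetical stand-in for the combined blacklists.
BLACKLISTED_DOMAINS = {"spam.example"}


def resolve_landing_url(url, max_hops=10):
    """Follow redirects until reaching a URL that redirects no further."""
    seen = set()
    for _ in range(max_hops):
        # Stop on a redirect loop or when there is no further hop.
        if url in seen or url not in REDIRECTS:
            break
        seen.add(url)
        url = REDIRECTS[url]
    return url


def is_blacklisted(url):
    """Check the domain of the final landing page, not the shortened link."""
    landing = resolve_landing_url(url)
    domain = urlsplit(landing).hostname or ""
    return domain.lower() in BLACKLISTED_DOMAINS
```

Checking the landing domain rather than the tweeted link is what lets a blacklisted site be flagged even when it hides behind one or more shorteners; here, is_blacklisted("http://bit.ly/abc") is true only because the chain ends at spam.example.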

During our monitoring we gathered over 200 million tweets from the stream and crawled 25 million URLs. Over 3 million tweets were identified as spam. Of the URLs crawled, 2 million were identified as spam by blacklists, 8% of all unique links. Of these blacklisted URLs, 5% were malware and phishing, while the remaining 95% directed users towards scams.

Spam Breakdown by Type

Twitter presents itself as an entirely different delivery mechanism from email, with a vastly different audience. For that reason, we analyzed the breakdown of spam on Twitter to understand which players were involved and how their campaigns compare to spam sent by email. As shown in the figure below, many of the traditional email scams have found their way onto Twitter, but a new category has also appeared: scams that promise an easy way to gain Twitter followers, largely directing users to phishing pages that steal Twitter credentials.

Abuse of Twitter Features

Given the limitation of 140 characters to attract a victim’s attention, Twitter scams have evolved to abuse Twitter’s core features such as @mentions, #hashtags, and RT @ retweets.

Callouts are mentions used to target specific users in order to infiltrate their feed and appear personalized. In our data set, roughly 10% of scams were advertised using personalized mentions, while only 3% of phishing/malware used the feature. An example would be: Win an iTouch AND a $150 Apple gift card @victim! http://spam.com

Retweet Hijacking is an attempt to abuse the credibility of other users to draw a wider audience or increase trust. Given a tweet from a trusted user such as @barackobama A great battle is ahead of us…, a spammer will prepend a link and retweet the original text: http://spam.com RT @barackobama A great battle is ahead of us…. Because modifying and retweeting is common behavior, there is no simple mechanism to detect forgeries or malicious behavior.

Retweet Purchasing relies on other trusted parties to retweet spam tweets. Services such as retweet.it purport to retweet a message 50 times to 2,500 Twitter followers for $5 or 300 times to 15,000 followers for $30. The accounts used to retweet are other Twitter members (or bots) who sign up for the retweet service, allowing their accounts to be used to generate traffic.

Trend Setting is an attempt to create a trending topic on Twitter by abusing hundreds of compromised or fake accounts, all tweeting with the same #hashtag. We encountered a total of 12 different attempts to generate trends using roughly 2,000 accounts each, all promising users more followers in exchange for their account credentials. In our data set, roughly 70% of phishing tweets included a trend-setting #hashtag.

Trend Hijacking allows spammers to ride on the success of currently trending topics, allowing spam tweets to be syndicated to the entire Twittersphere rather than a limited audience of followers. Of all the #hashtags we encountered in spam, roughly 86% were user-generated topics. An example would be Help donate to #haiti relief: http://spam.com.
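To make the features above concrete, here is a minimal sketch that pulls @mentions, #hashtags, RT @ markers, and URLs out of a tweet. The regular expressions and the extract_features name are illustrative assumptions, not the parsing used in the study, and the sketch only extracts features; it does not decide whether a tweet is spam.

```python
import re

URL_RE = re.compile(r"https?://\S+")
RETWEET_RE = re.compile(r"\bRT\s+@(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def extract_features(tweet):
    """Return the URLs, retweeted users, mentions, and hashtags in a tweet."""
    # Trim sentence punctuation that trails a URL, e.g. "http://spam.com."
    urls = [u.rstrip(".,!?)") for u in URL_RE.findall(tweet)]
    text = URL_RE.sub(" ", tweet)
    retweet_of = RETWEET_RE.findall(text)
    # Remove "RT @user" so retweeted users are not counted as callouts.
    text_without_rt = RETWEET_RE.sub(" ", text)
    return {
        "urls": urls,
        "retweet_of": retweet_of,
        "mentions": MENTION_RE.findall(text_without_rt),
        "hashtags": HASHTAG_RE.findall(text),
    }
```

Run on the examples above, the callout tweet yields the mention victim, the hijacked retweet yields barackobama under retweet_of with no separate mentions, and the trend hijacking tweet yields the hashtag haiti.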

How Successful is Twitter Spam?

Despite widespread abuse of Twitter by spammers, the current mechanisms in place to prevent spam are fairly limited. Twitter currently uses Google’s Safebrowsing API to block malicious links, while also relying on account heuristics such as aggressive friending/unfriending and repeated tweets to detect spam behavior. Combined with a system designed for disseminating links and information, these limited defenses make Twitter an ideal propagation platform for spam.

To estimate Twitter clickthrough, we measure the ratio of the clicks a link receives, as reported by bit.ly, to the reach of the tweets that carried it. Given the broadcast nature of tweeting, we measure reach as a function of both the total tweets sent t and the followers exposed to each tweet f, where reach equals t × f. When multiple accounts with differing numbers of followers all tweet a single URL, we measure total reach as the sum of each individual account’s reach. Averaging the ratio of clicks to reach for 245,000 bit.ly URLs, we find roughly 0.13% of spam tweets generate a visit, orders of magnitude higher than the clickthrough rates of 0.003-0.006% reported for spam email.
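The reach and clickthrough calculation can be sketched as below. This is a minimal illustration of the arithmetic as described, with reach taken as tweets sent times followers and summed over accounts; the function names and all numbers in the usage note are hypothetical, not data from the study.

```python
def total_reach(accounts):
    """Sum tweets_sent * followers over every account tweeting one URL.

    accounts: iterable of (tweets_sent, followers) pairs.
    """
    return sum(t * f for t, f in accounts)


def clickthrough(clicks, accounts):
    """Ratio of clicks on a URL to its total reach (0.0 if reach is zero)."""
    reach = total_reach(accounts)
    return clicks / reach if reach else 0.0


def mean_clickthrough(urls):
    """Average the per-URL clickthrough, skipping URLs with zero reach.

    urls: iterable of (clicks, accounts) pairs, one per URL.
    """
    rates = [
        clickthrough(clicks, accounts)
        for clicks, accounts in urls
        if total_reach(accounts) > 0
    ]
    return sum(rates) / len(rates) if rates else 0.0
```

For a made-up URL tweeted twice by an account with 1,000 followers and once by an account with 500 followers, total reach is 2 × 1,000 + 1 × 500 = 2,500; if that link received 5 clicks, its clickthrough would be 5 / 2,500 = 0.2%.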

Twitter’s higher clickthrough rate compared to email has a number of possible explanations. First, users have only 140 characters on which to base their decision about whether a URL is spam. Second, users implicitly trust the accounts they befriend. Increased clickthrough thus potentially results from a mixture of naivety and lack of information. This result highlights the need for social networks to adapt quickly to spam threats, adopting controls similar to those used for email, though within a real-time framework.
