
The Spam Ecosystem: New Profits From Political Censorship

This post is based on research from “Adapting Social Spam Infrastructure for Political Censorship” published in LEET 2012 – a pdf is available under my publications. Any views or opinions discussed herein are my own.

In recent years social networks have emerged as a significant tool for both political discussion and dissent. Salient examples include the use of Twitter for a town hall with the White House. The Arab Spring that swept over the Middle East embraced Twitter and Facebook as tools for organization, while Mexicans have adopted social media as a means to communicate about violence at the hands of drug cartels in the absence of official news reports. Yet the response to the growing importance of social networks in some countries has been chilling, with the United Kingdom threatening to ban users from Facebook and Twitter in response to rioting in London, and Egypt blacking out Internet and cell phone coverage during its political upheaval. While nation states can exert their control over Internet access to outright block connections to social media, parties without such capabilities may still seek to control political expression.

Recently, I had a chance to retroactively examine an attack that occurred on Twitter surrounding the Russian parliamentary elections. For a little backstory: upon the announcement of the Russian election results, accusations of fraud quickly followed and protesters organized at Moscow’s Triumfalnaya Square. When discussions of the election results cropped up on Twitter, a wave of bots swarmed the hashtags legitimate users were using to communicate, in an attempt to control the conversation and stifle search results related to the election.

In total, there were 46,846 Twitter accounts discussing the election results. It turns out, 25,860 of these were controlled by attackers in order to send 440,793 misleading or nonsensical tweets, effectively shutting down conversations about the election. While abusing social networks for political motives is nothing new, the attack is noteworthy because (1) it relied on fraudulent accounts purchased from spam-as-a-service marketplaces and (2) it relied on over 10,000 compromised hosts located around the globe. These marketplaces — such as hxxp://buyaccs.com/ — are traditionally used to outfit spam campaigns, freeing spammers from registering accounts in exchange for a small fee. However, this attack shows that malicious parties can easily adapt these services for other forms of attacks, including political censorship and astroturfing.

Below is a preview of our results. For more details, check out the full LEET 2012 paper.

Tweets

We identify 20 hashtags that correspond to the Russian election, the top 10 of which are shown below.

Hashtag        Translation             Accounts
чп             Catastrophe               23,301
6дек           December 6th              18,174
5дек           December 5th              15,943
выборы         Election                  15,082
митинг         Rally                     13,479
триумфальная   Triumphal                 10,816
победазанами   Victory will be ours      10,380
5dec           December 5th               8,743
навальный      Alexey Navalny             8,256
ridus          Ridus                      6,116

We aggregate all of the accounts that participated in these hashtags and segment them into accounts since suspended by Twitter (25,860) and those that appear to be legitimate (20,986). We then aggregate all of the tweets sent by these accounts during the attack. As shown in the following figure, legitimate conversations (black line) follow a diurnal pattern over the course of two days, December 5th through December 6th. Conversely, the attack (blue line) occurs in two distinct waves, at times out-producing legitimate users.

If we restrict our analysis to only tweets with the relevant election hashtags, the impact of the attack is even starker.
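
As a rough illustration of how this segmentation works in code, here is a minimal Python sketch. It assumes tweets arrive as (timestamp, account_id) pairs and that the set of since-suspended account IDs is known; all names are illustrative, not our actual pipeline:

    from collections import Counter

    def hourly_volumes(tweets, suspended):
        """Bucket tweet volume per hour, split by suspended vs. legitimate."""
        attack, legit = Counter(), Counter()
        for timestamp, account_id in tweets:
            hour = timestamp.replace(minute=0, second=0, microsecond=0)
            if account_id in suspended:
                attack[hour] += 1
            else:
                legit[hour] += 1
        return attack, legit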

Accounts

One of the most interesting aspects of the attack was the accounts involved. They were registered in four distinct waves, each with a uniquely formatted account profile that can be captured by a regular expression; we call these waves Type-1 through Type-4. Accounts were acquired as far back as seven months preceding the attack. For most of that time the accounts lay dormant, though some came alive at various intervals to send politically oriented tweets prior to the attack.

Interestingly, all of the accounts were registered with mail.ru email addresses, which allows us to extend our analysis one step further. We take the regular expressions that capture the accounts used in the attack and apply them to all Twitter accounts registered with mail.ru emails in the last year. From this, we identify roughly 975,000 other spam accounts, 80% of which remain dormant with 0 following, 0 followers, and 0 tweets. These accounts were registered in disparate bursts over time and likely all belong to a single spam-as-a-service program. The registration times of these presumed spam accounts are shown below. Legitimate accounts show a relatively steady growth trend, while the anomalous bursts in registrations are attributed to a malicious party registering accounts to later sell.
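
To give a concrete sense of the approach, here is a hedged sketch of matching account profiles against registration waves. The patterns below are hypothetical stand-ins, not the actual expressions derived from the data:

    import re

    # Hypothetical profile-format patterns for two of the four waves.
    WAVE_PATTERNS = {
        "Type-1": re.compile(r"^[a-z]+_[a-z]+\d{2}$"),
        "Type-2": re.compile(r"^[A-Z][a-z]+\d{4}$"),
    }

    def classify_wave(screen_name):
        """Return the first registration wave whose format matches, if any."""
        for wave, pattern in WAVE_PATTERNS.items():
            if pattern.match(screen_name):
                return wave
        return None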

IP Addresses

We examine one final aspect of the attack: the geolocation of the IP addresses used to access accounts tweeting about the election. Legitimate accounts tend to be accessed from either the United States (20%) or Russia (56%). In contrast, the accounts controlled by the attacker were accessed from hosts located around the globe; only 1% of their logins originated from Russia. Furthermore, 39% of these IP addresses appear in blacklists, indicating that many of the hosts are simultaneously used in other spam-related activities. Combined, this information reveals that the attackers relied on compromised hosts, which again may have been purchased from the spam-as-a-service underground.
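
The analysis boils down to something like the following sketch, assuming logins maps each account to the IPs that accessed it, geolocate resolves an IP to a country code, and blacklisted is a set of known-bad IPs; all three are placeholders for the underlying data sources:

    from collections import Counter

    def login_origins(logins, geolocate, blacklisted):
        """Tally login countries and the fraction of login IPs on blacklists."""
        countries, flagged, total = Counter(), 0, 0
        for account, ips in logins.items():
            for ip in ips:
                countries[geolocate(ip)] += 1
                flagged += ip in blacklisted
                total += 1
        return countries, flagged / total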


Suspended Accounts In Retrospect

This post is based on research from IMC 2011 – a pdf is available under my publications. Any views or opinions discussed herein are my own and are based solely on research I conducted prior to working at Twitter.

As Twitter continues to grow in popularity, so does the marketplace for abusing Twitter as a service for spamming. To understand this phenomenon, we tracked the behavior of 1.1 million accounts suspended by Twitter for disruptive activities (e.g., spamming, aggressive following) over the course of seven months. In the process, we collected a dataset of 80 million tweets sent by spam accounts, in addition to 37.8 million URLs presumed to direct to spam. What follows is an analysis of the abuse of online social networks through the lens of the tools, techniques, and support infrastructure spammers rely upon.

Our Dataset and All Its Caveats

Our dataset was derived from Twitter’s garden hose, which provides a sample of all tweets appearing on Twitter. More precisely, we received 150 tweets/second, amounting to roughly 12 million tweets per day in the absence of network outages or errors. Rather than receive generic tweets, we specifically requested tweets that contain URLs, simply because they are more interesting from a spam perspective: they have a clear monetization angle we could manually analyze. In total, we collected 1.8 billion tweets from August 2010 through March 2011, only 80 million of which turned out to be spam. Here was our daily sample size, with breaks indicating an outage in our collection (oops, measurement is hard!):

Due to rate limiting performed by Twitter, our sample rate strictly decreased: we remained capped at 150 tweets/second while Twitter continued to grow in volume. As a result, the fraction of URL-bearing tweets we received dropped from 90% at the onset of our study to 60% at its completion:
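
To make the arithmetic explicit, here is a back-of-the-envelope sketch; the daily totals below are back-solved from the 90% and 60% figures and are illustrative only:

    CAP_PER_DAY = 150 * 60 * 60 * 24  # ~12.96M tweets/day at 150 tweets/sec

    def sample_fraction(url_tweets_per_day):
        """Fraction of URL-bearing tweets a capped stream can cover."""
        return min(1.0, CAP_PER_DAY / url_tweets_per_day)

    print(sample_fraction(14.4e6))  # 0.90, roughly the study's onset
    print(sample_fraction(21.6e6))  # 0.60, roughly its completion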

For further details on our collection methodology, validation, and sampling, check out the paper.

State of Twitter Spam – How Much?

Using our dataset, we counted the number of spam tweets sent by accounts suspended by Twitter each day from August 2010 through March 2011. The results are shown here:

Our calculations are a strict lower bound, as we rely on Twitter to identify spam, a process we know is imperfect. Based on manual analysis, we estimated that Twitter caught only 37% of spam, which means the actual number of spam tweets per day is likely much higher. Nevertheless, we can discern that at least half a million spam tweets are sent each day. Interestingly, the highest volume of spam preceded the holiday season; even spammers have gift suggestions for you and your family.
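
The extrapolation itself is one line of arithmetic; a minimal sketch, assuming the 37% catch rate and the half-million observed spam tweets above:

    observed_per_day = 500_000   # spam tweets from suspended accounts
    catch_rate = 0.37            # estimated fraction of spam Twitter catches

    estimated_total = observed_per_day / catch_rate
    print(f"{estimated_total:,.0f} spam tweets/day")  # ~1,351,351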

One noteworthy observation is that while the total volume of spam appears to be flat, our sample size was decreasing. This would indicate that spam on Twitter was actually increasing over time.

Spam Accounts – How Many, How Active, How Long?

To be continued…

Working @Twitter & Researching 4 Facebook

After a bit of juggling and finishing up research at Berkeley for the spring (fingers crossed for IMC 2011), I landed an internship at Twitter over the summer. My goal is to examine spam targeting their systems and to see whether any of the research ideas coming out of our Berkeley group are transferable. Plus, I get sweet sweet access to data.

Also, to my surprise, I won a fellowship from Facebook to continue performing security research on social networks (starting next fall). The meat of my proposal includes understanding malicious application usage, account abuse, and characterizing the monetization of social network spam. I’m delighted my proposal got some traction; now to just get the work done next year.


Monarch: Preventing Spam in Real-Time

This post is based on research from Oakland Security & Privacy 2011 – a pdf is available under my publications.

Recently we presented our research on Monarch, a real-time system that crawls URLs as they are submitted to web services and determines whether the URLs direct to spam. The system is geared towards environments such as email or social networks where messages are near-interactive and accessed within seconds after delivery.

The two major prior approaches to detecting and filtering spam are domain and IP blacklists for email and account-based heuristics in social networks that attempt to detect abusive user behavior. However, these approaches fail to protect web services. In particular, blacklists are too inaccurate and too slow to list new spam URLs. Similarly, account-based heuristics incur delays between a fraudulent account’s creation and its subsequent detection due to the need to build a history of (mis-)activity. Furthermore, these heuristics for automation fail to detect compromised accounts that exhibit a mixture of spam and benign behaviors. Given these limitations, we seek to design a system that operates in real-time to limit the period users are exposed to spam content; provides fine-grained decisions that allow services to filter individual messages posted by users; and functions in a manner generalizable to many forms of web services.

To do this, we develop a cloud-based system for crawling URLs in real-time that classifies whether a URL’s content, underlying hosting infrastructure, or page behavior exhibits spam properties. This decision can then be used by web services to either filter spam or as a signal for further analysis.

Design Goals

When we developed Monarch, we had six principles that influenced our architecture and approach:

  1. Real-time results. Social networks and email operate as near-interactive, real-time services. Thus, significant delays in filtering decisions degrade the protected service.
  2. Readily scalable to required throughput. We aim to provide viable classification for services such as Twitter that receive over 15 million URLs a day.
  3. Accurate decisions. We want the capability to emphasize low false positives in order to minimize mistaking non-spam URLs as spam.
  4. Fine-grained classification. The system should be capable of distinguishing between spam hosted on public services alongside non-spam content (i.e., classification of individual URLs rather than coarser-grained domain names).
  5. Tolerant to feature evolution. The arms-race nature of spam leads to ongoing innovation in spammers’ efforts to evade detection. Thus, we require the ability to easily retrain to adapt to new features.
  6. Context-independent classification. If possible, decisions should not hinge on features specific to a particular service, allowing use of the classifier for different types of web services.

Architecture

The architecture for Monarch consists of four components:

  1. URL Aggregation. Messages from web services (tweets and emails in our prototype) are inserted into a dispatch Kestrel queue.
  2. Feature Collection. URLs are dequeued and a cluster of EC2 machines crawls each one to fetch the HTML content, resolve all redirects, monitor all IP addresses contacted, and perform a number of host lookups and geolocation resolution. We optimize this phase with caching and whitelisting of popular benign content.
  3. Feature Extraction. The collected features are stored in a database and transformed into meaningful binary vectors.
  4. Classification. We obtain a labeled dataset from email spam traps as well as blacklists (our only means of obtaining a ground truth set of spam on Twitter). Using a distributed logistic regression with L1-regularization, which we detail in the paper, we are able to reduce from 50 million features down to the 100,000 most meaningful features and build a model of spam in 45 minutes for 1 million samples. During live operation, we simply use this model to classify the features of a URL.

Overall, it takes roughly 6 seconds from insertion into the dispatch queue to a final decision on whether a URL is spam, with network delay accounting for the majority of the overhead.
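
As a single-machine stand-in for the distributed classifier (our actual training pipeline is described in the paper), here is a minimal sketch of L1-regularized logistic regression using scikit-learn; the synthetic data is purely illustrative:

    import numpy as np
    from scipy.sparse import random as sparse_random
    from sklearn.linear_model import LogisticRegression

    # Illustrative stand-ins for sparse binary feature vectors and labels.
    X = sparse_random(1000, 5000, density=0.01, format="csr")
    y = np.random.randint(0, 2, 1000)

    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    clf.fit(X, y)

    # L1 drives most weights to exactly zero, the same mechanism that pares
    # ~50M candidate features down to the ~100K most meaningful.
    print("non-zero weights:", (clf.coef_ != 0).sum())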

Results

  • Training on both email and tweets, we are able to generate a unified model that correctly classifies 91% of samples, with 0.87% false positives and 17.6% false negatives.
  • Throughput of the system is 638,000 URLs/day when running on 20 EC2 instances.
  • Decision time for a single URL is roughly 6 seconds.

One of the unexpected results is that Twitter spam appears to be independent from email spam, with different campaigns occurring in both services simultaneously. This seems to indicate the actors targeting email haven’t modified their infrastructure to attack Twitter yet, though this may change over time.

Feedback

There remain a number of challenges in running a system like Monarch, some discussed in the paper and others pointed out by fellow researchers.

  • Feature Evasion: Spammers can attempt to game the machine learning system. Given the real-time feedback for whether a URL is spam, they can attempt to modify their content or hosting to avoid detection.
  • Time-based Evasion: URLs are crawled immediately upon their submission to the dispatch queue. This creates a time-of-check versus time-of-use problem: spammers can present benign content when sending an email/tweet, then change the content to spam after the URL has been cleared.
  • Crawler Evasion: Given that we operate from a limited IP space and use a single browser type, attackers can fingerprint both our hosting and our browser client. They can then redirect our crawlers to benign content while sending legitimate visitors to hostile content.
  • Side effects: Not all websites adhere to the standard that GET requests should have no side effects. In particular, subscribe and unsubscribe URLs as well as advertisements may trigger side effects when visited by our crawler.

Other interesting questions also remain to be answered. In particular, it would be useful to understand how accuracy holds up over time on a per-campaign basis. Some campaigns may last a long time, increasing our overall accuracy, while quickly churning campaigns that introduce new features may result in lower accuracy. Similarly, it would be useful to understand whether the features we identify appear in all campaigns (and are long lasting), or whether we are able to quickly adapt to the introduction of new features and new campaigns.