Defining a Taxonomy of Social Network Spam

In the process of working on my thesis, I’ve had to write some new background content on the taxonomy of social network spam. I figured I would share these ideas here, since the probability of someone reading my search-indexed blog >> the probability of someone reading a 150-page, non-indexed document. As usual, any views or opinions discussed herein are my own.

As the underground economy adapts its strategies to target users in social networks, attacks require three components: (1) account credentials, (2) a mechanism to engage with legitimate users (i.e. the victims that will be exploited to realize a profit), and (3) some form of monetizable content. The latter is typically a link that redirects a victim from the social network to a website that generates a profit via spamvertised products, fake software, clickfraud, banking theft, or malware that converts a victim’s machine or assets (e.g. credentials) into a commodity for the underground economy. With respect to Twitter, the underpinnings of each of these components can be outlined as follows:

[Figure: taxonomy of social network spam]

What becomes apparent from this taxonomy is that, while there are several ways to engage with victims (and more constantly emerge as new features are added — such as Vine), the ingress and egress points of abuse are much fewer. For this reason, I typically advocate for anti-spam teams to develop URL-based defenses and at-registration-time defenses. Strangling those two choke points collapses all the other pain points of social network spam and abuse, which are arguably harder to solve given the diverse ways legitimate users engage one another within social networks.
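To make the URL choke point concrete, here is a minimal Python sketch of the kind of egress check I have in mind. The blacklist contents, domains, and function names are all illustrative assumptions rather than any network's actual defenses.

```python
import re
from urllib.parse import urlparse

# Hypothetical blacklist of known-bad landing domains; in practice this would
# be fed by services such as Google Safe Browsing, SURBL, or URIBL.
BLACKLISTED_DOMAINS = {"spamvertised-pharma.example", "fake-av.example"}

URL_PATTERN = re.compile(r"https?://\S+")

def extract_urls(post_text: str) -> list[str]:
    """Pull every URL out of a post before it is published."""
    return URL_PATTERN.findall(post_text)

def is_blacklisted(url: str) -> bool:
    """Check the domain of a URL against the blacklist."""
    domain = urlparse(url).netloc.lower()
    return domain in BLACKLISTED_DOMAINS

def should_block_post(post_text: str) -> bool:
    """Egress choke point: reject a post if any embedded URL is known-bad."""
    return any(is_blacklisted(url) for url in extract_urls(post_text))
```

In practice the check would also resolve shortened URLs and follow redirect chains before consulting the blacklist, since spammers rarely expose the final landing domain directly.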

The rest of this post spends a little time defining the different components of this abuse taxonomy.

Credentials — The Ingress Point

In order to interact with users in a social network, criminals must first obtain credentials for either new or existing accounts. This has led to a proliferation of fraudulent accounts — automatically generated credentials used exclusively to disseminate scams, phishing, and malware — as well as compromised accounts: legitimate credentials that have fallen into the hands of miscreants, which criminals repurpose for nefarious ends. Notable sources of compromise include the brute force guessing of weak passwords, password reuse with compromised websites, as well as worms or phishing attacks that propagate within the network.
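As a hypothetical illustration of the registration-time choke point mentioned earlier, the sketch below flags bulk signups originating from a single IP address. The Signup fields and the threshold are stand-ins of my own; a production system would combine many more signals (email reputation, CAPTCHA outcomes, device fingerprints).

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Signup:
    username: str
    ip: str
    email_domain: str

def flag_bulk_signups(recent_signups: list[Signup],
                      max_per_ip: int = 5) -> set[str]:
    """Flag usernames registered from IPs with an implausible signup rate.

    The threshold is purely illustrative; the point is that fraudulent
    accounts are cheapest to stop before they ever act.
    """
    per_ip = Counter(s.ip for s in recent_signups)
    hot_ips = {ip for ip, count in per_ip.items() if count > max_per_ip}
    return {s.username for s in recent_signups if s.ip in hot_ips}
```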

Engaging Victims

Any of the multitude of features on Twitter can be targets of abuse in a criminal’s quest to draw an audience. While it’s possible to solve one facet of abuse, criminals are constantly evolving how they engage with users to leverage new features added to social networks as well as to adapt to defense mechanisms employed by online social network operators. The result is a reactive development cycle that never affords defenders any reprieve. To illustrate this point, here are just some of the ways in which criminals engage with users.

Mention Spam consists of sending an unsolicited @mention or @reply to a victim, bypassing any requirement of sharing a social connection with a victim. Spammers can either initiate a conversation or join an existing conversation to appear in the expanded list of tweets associated with a conversation between a victim and her followers.

Direct Message Spam is identical to mention spam, but requires that a criminal’s account be followed by a victim. As such, DM spam is typically used when an account has become compromised due to the low rate of fraudulent accounts (11% — “Suspended Accounts in Retrospect”) that form relationships with legitimate users.

Trend Poisoning relies on embedding popular #hashtags in a spam tweet, allowing the tweet to appear in real-time searches about breaking news and world events performed by victims. Even relevance-based searches can be gamed by inflating the popularity of a spam account or tweet, similar to search engine optimization.

Search Poisoning is identical to trend poisoning, but instead of emerging topics typified by #hashtags, spammers embed specific keywords/brands in their tweets such as “viagra” and “ipad”. From there, users that search for information relevant to a keyword/brand will be exposed to spam.

Fake Trends leverage the availability of thousands of accounts under the control of a single criminal to effectively generate a new trend. From there, victims looking at emerging content will be exposed to the criminal’s message.

Follow Spam occurs when a criminal leverages an account to generate hundreds of relationships with legitimate users. The aim of this approach is to either have a victim reciprocate the relationship or at least view the criminal’s account profile, which often has a URL embedded in its bio.

Favorite Spam relies on abusing functionality on Twitter which allows a user to favorite, or recommend, a tweet. Criminals will mass-favorite tweets from victims in the hopes they either reciprocate a relationship or view the criminal’s account profile, just like follow spam.

Fake Followers are distinct from follow spam, in that a criminal purchases relationships from the underground economy. The goal here is to inflate the popularity of a criminal’s account (often for SEO purposes).

Retweet Spam entails hundreds of spam accounts all retweeting another (spam) account’s tweet (often for SEO purposes).
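As a rough illustration, the sketch below tags a tweet with the engagement vectors described above. The Tweet fields, keyword lists, and labels are hypothetical; they are not how any real pipeline represents tweets.

```python
from dataclasses import dataclass, field

@dataclass
class Tweet:
    text: str
    is_direct_message: bool = False
    hashtags: list[str] = field(default_factory=list)
    mentions: list[str] = field(default_factory=list)

# Hypothetical sets a defender might maintain.
TRENDING_HASHTAGS = {"breakingnews", "worldcup"}
ABUSED_KEYWORDS = {"viagra", "ipad"}

def engagement_vectors(tweet: Tweet) -> set[str]:
    """Label which of the engagement vectors described above a tweet uses."""
    vectors = set()
    if tweet.is_direct_message:
        vectors.add("dm_spam")
    elif tweet.mentions:
        vectors.add("mention_spam")
    if any(tag.lower() in TRENDING_HASHTAGS for tag in tweet.hashtags):
        vectors.add("trend_poisoning")
    if any(keyword in tweet.text.lower() for keyword in ABUSED_KEYWORDS):
        vectors.add("search_poisoning")
    return vectors
```

For example, an unsolicited mention whose text contains both #breakingnews and a pharma keyword would be tagged with mention spam, trend poisoning, and search poisoning at once.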

Monetizing Victims

Profit lies at the heart of the criminal abuse ecosystem. Monetization strategies form a spectrum that ranges from selling products to a user with their consent to stealing from a victim without consent. In order to monetize a victim, users are funneled from Twitter to another website via a link. The exception to this rule is when abuse lacks a clear path for generating a profit. Examples of this are celebrities who buy fake followers to inflate their popularity (thus never requiring a link to achieve a payout — the payout is external to Twitter) as well as politically-motivated attacks such as censoring speech or controlling the message surrounding emerging trends (where the payout is political capital or damage control). While the latter attacks are realistic threats, the vast majority of abuse currently targeting social networks is more criminal in nature.

Spamvertised Products include advertisements for pharmaceuticals, replica goods, and pirated software. Spam in this case is a means to an end: getting users to willingly buy products, freely offering their credit card information in return for a product.

Fake Software includes any malware or webpage that prompts a user to buy ineffectual software. The most prominent approach here is selling rogue antivirus, where users are duped into paying an annual or lifetime fee in return for “anti-virus” software that in fact provides no protection.

Clickfraud generates revenue by compromising a victim’s machine or redirecting their traffic to simulate legitimate traffic to pay-per-click advertisements. These ads typically appear on pages controlled by miscreants, while the ads are syndicated from advertising networks such as Google AdSense. Money is thus siphoned from advertisers into the hands of criminals.

Banking Theft, epitomized by information stealers such as Zeus or SpyEye, relies on installing malware on a victim’s machine or phishing their credentials in order to harvest sensitive user data including documents, passwords, and banking credentials. A criminal can then sell access to these accounts or liquidate the account’s assets.

Underground Infrastructure is the final source of potential profit. Instead of directly going after assets controlled by a victim (e.g. wealth, traffic, credentials), criminals can sell access to a victim’s compromised machine and convert it into a proxy or web host. Alternatively, criminals can sell installs of malware to the pay-per-install market or exploit-as-a-service market, whereby another criminal that specializes in one of the aforementioned monetization techniques utilizes the compromised machine, paying a small finder’s fee to the criminal who actually compromised the host.

Summary

The process of monetizing victims in social networks is a complex chain of dependencies. If any component of that chain should fail, spam and abuse cannot be profitable. To simplify the abuse process for spammers, an underground economy has emerged that connects criminals with parties selling a range of specialized products and services including spam hosting, CAPTCHA solving services, pay-per-install hosts, and exploit kits. Even simple services such as garnering favorable reviews or writing web page content are for sale.

Specialization within this ecosystem is the norm. Organized criminal communities include carders that siphon credit card wealth; email spam affiliate programs; and browser exploit developers and traffic generators. These distinct roles allow miscreants to abstract away certain complexities of abuse, in turn selling their speciality to the underground market for a profit.

Research Survey: Social Network Abuse

I decided to compile a list of social networking papers that I’ve read. The list is likely incomplete, but gives shape to the current research pushes surrounding social network spam and abuse. Whenever appropriate, I detail the methodology for how a study was conducted; most data collection techniques carry an inherent bias that is worth being forthright about.

Social Network Spam and Abuse: Measurement and Detection

The following is a list of academic papers on the topics of social network spam and abuse. In particular, the papers cover (1) how spammers monetize social networks, (2) how spammers engage with social network users, and (3) how to detect both compromised and fraudulent accounts.

Detection falls into two categories: at-signup detection which attempts to identify spam accounts before an account takes any actions visible to social network users; and at-abuse detection which relies on identifying abusive behavior such as posting spam URLs or forming too many relationships. At-signup detection has yet to receive much academic treatment (mostly a data access problem), while at-abuse detection has been conducted based on tweet content; URL content and redirects; and account behaviors such as the frequency of tweeting or participation in trending topics. Papers that detect spam accounts based on the social graph are detailed in the following section.
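Before the paper list, here is a minimal sketch (my own, not taken from any of the papers below) of the kind of per-account features at-abuse detection systems tend to rely on: the fraction of tweets containing URLs or hashtags, account age, and the follower-following ratio. The Account structure and feature names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Account:
    age_days: int
    followers: int
    following: int
    tweets: list[str]

def account_features(account: Account) -> dict[str, float]:
    """Compute simple per-account features used in at-abuse detection."""
    total = max(len(account.tweets), 1)
    return {
        "frac_tweets_with_url": sum("http" in t for t in account.tweets) / total,
        "frac_tweets_with_hashtag": sum("#" in t for t in account.tweets) / total,
        "account_age_days": float(account.age_days),
        "follower_following_ratio":
            account.followers / max(account.following, 1),
    }
```

Feature vectors like these would then be handed to an off-the-shelf classifier trained on labeled accounts, which is roughly the template most of the detection papers below follow.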

  • [July, 2010] Detecting spammers on twitter: The authors develop a classifier to detect fraudulent accounts based on the fraction of URLs in tweets; the fraction of tweets with hashtags; account age; follower-following ratios; and other account-based features. The most discriminative features were the number of URLs sent and an account’s age.
    Dataset: 8,207 manually labeled Twitter accounts
    Time period: December 8, 2008 — September 24, 2010
    Source: Crawl of all accounts with user IDs less than 80 million; filtered to only include accounts that tweet topics including (1) #musicmonday, (2) Boyle, (3) Jackson.
  • [October, 2010] @spam: The underground on 140 characters or less: Our study of compromised Twitter accounts, the clickthrough rate on spam tweets, and the ineffectiveness of blacklists at detecting social network spam in a timely fashion. Detailed summary here.
    Dataset: 3 million spam tweets classified by whether the URL in the tweet was blacklisted by URIBL, Joewein, or Google Safebrowsing. Classification includes the initial URL, all redirect URLs, and the final landing URL.
    Time period: January, 2010 — February, 2010
    Source: Streaming API, predicated on containing URLs
  • [November, 2010] Detecting and characterizing social spam campaigns: The authors develop a classifier to detect Facebook spam accounts based on the bursty nature of spam campaigns (e.g. many messages sent in a short period) and diverse source of accounts (e.g. multiple accounts coordinating together, typically sending similar text). Surprisingly, 97% of the accounts identified were suspected of being compromised rather than fraudulent. The most popular spam types were “someone has a crush on you”-scams, ringtones, and pharma spam.
    Dataset: 187M wall posts, 212,863 of which are detected as spam sent by roughly 57,000 accounts (validated via blacklists or an obfuscation heuristic)
    Time period: January, 2008 — June, 2009
    Source: Crawl of Facebook networks, described in detail in User Interactions in Social Networks and their Implications
  • [December, 2010] Detecting spammers on social networks: The authors develop a classifier based on the fraction of messages containing URLs; the similarity of messages; total messages sent; and the total number of friends. The spam sent from detected accounts include dating, porn, ad-based monetization, and money making scams.
    Dataset: 11,699 Twitter accounts; 4,055 Facebook accounts
    Time period: June 6, 2009 — June 6, 2010
    Source: Spammers messaging or forming relationships with 300 passive honeypots each for Facebook and Twitter
  • [April, 2011] Facebook immune system: A brief summary of Facebook’s system for detecting spam. There is no mention of the volume of spam Facebook receives (though SEC filings say it’s roughly 1%) or the threats the service faces.
  • [May, 2011] Design and Evaluation of a Real-Time URL Spam Filtering Service: Our study on developing a classifier based on the content of URLs posted to social networks and email. Features include ngrams of the posted URL, interstitial redirects, and the final landing URL; HTML content; pop-ups and JavaScript event detection; headers; DNS data; and geolocation and routing information. Detailed summary here.
    Dataset: 567,784 spam URLs posted to Twitter (as identified by Google Safebrowsing, SURBL, URIBL, Anti-Phishing Work Group, and Phishtank); 1.25 million spam URLs in emails (as identified by spam traps)
    Time period: September, 2010 — October, 2010
    Source: Email spam traps; Twitter Streaming API
  • [November, 2011] Suspended accounts in retrospect: An analysis of twitter spam: Our analysis of fraudulent Twitter accounts, the tools used to generate spam, and the resulting spam campaigns and monetization strategies. Detailed summary here.
    Dataset: 1,111,776 accounts suspended by Twitter
    Time period: August 17, 2010 — March 4, 2011
    Source: Streaming API, predicated on containing URLs
  • [February, 2012] Towards Online Spam Filtering in Social Networks: The authors build a detection framework for Twitter spam that hinges on identifying duplicate content sent from multiple accounts. A number of other account-based features are used as well.
    Dataset: 217,802 spam wall posts from Facebook (from previous study); 467,390 spam tweets as identified by URL shorteners no longer serving the URL
    Time period: January, 2008 — June, 2009 for Facebook; June 1, 2011 — July 21, 2011 for Twitter
    Source: Facebook Crawl; Twitter API predicated on trending topics
  • [February, 2012] Warningbird: Detecting suspicious urls in twitter stream: In contrast to content-based spam detection or account-based spam detection, the authors rely on the URL redirect chain used to cloak spam content as a detection mechanism. The core idea is that for cloaking to occur (e.g. when an automated crawler is shown one page and a victim a second, distinct page), some site must perform the multiplexing and that site is frequently re-used across campaigns. The final features used for classification include the redirect chain length, position of URL in redirect chain, distinct URLs leading to interstitial, and distinct URLs pointed to by interstitial. A number of account-based features are also used including age, number of posters, tweet similarity, and following-follower ratio. (A minimal sketch of these redirect-chain features appears after this list.)
    Dataset: 263,289 accounts suspended by Twitter and the URLs
    Time period: April, 2011 — August, 2011
    Source: Streaming API predicated on containing URLs
  • [December, 2012] Twitter games: how successful spammers pick targets: The authors examine how spammers engage with Twitter users (e.g. mentions, hashtags, retweets, social graph) and the types of spam sent. The vast majority of spammers are considered unsuccessful based on the speed they are suspended; more long lasting accounts rely on Twitter’s social graph and unsolicited mentions to spam.
    Dataset: 82,274 accounts suspended by Twitter
    Time period: November 21, 2011 — November 26, 2011
    Source: Streaming API
  • [February, 2013] Social Turing Tests: Crowdsourcing Sybil Detection: The authors examine the accuracy of using crowdsourcing (specifically Mechanical Turk) to identify fraudulent accounts in social networks in addition to the best criteria for selecting experts.
    Dataset: 573 fraudulent Facebook accounts with profile images appearing in Google Image Search, later deactivated by Facebook; 1082 fraudulent Renren accounts deactivated and provided by Renren
    Time period: December, 2011 — January, 2012
    Source: Facebook Crawler; Renren data sharing agreement
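As referenced in the Warningbird entry above, here is a hedged sketch of what redirect-chain features might look like when computed for a suspected cloaking interstitial observed across many crawled chains. The function and feature names are my own, not the paper's implementation.

```python
def redirect_chain_features(chains: list[list[str]],
                            interstitial: str) -> dict[str, float]:
    """Features for one suspected cloaking interstitial across many chains.

    Each chain is the ordered list of URLs a crawler observed, e.g.
    [initial URL, redirect, ..., landing page].
    """
    containing = [c for c in chains if interstitial in c]
    n = max(len(containing), 1)
    entry_urls = {c[c.index(interstitial) - 1]
                  for c in containing if c.index(interstitial) > 0}
    exit_urls = {c[c.index(interstitial) + 1]
                 for c in containing if c.index(interstitial) < len(c) - 1}
    return {
        "avg_chain_length": sum(len(c) for c in containing) / n,
        "avg_interstitial_position":
            sum(c.index(interstitial) for c in containing) / n,
        "distinct_entry_urls": float(len(entry_urls)),
        "distinct_exit_urls": float(len(exit_urls)),
    }
```

A heavily re-used interstitial shows up with many distinct entry URLs feeding it, which is exactly the signal the paper exploits.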

Social Graphs of Spammers: Measurement and Detection

The following is a list of papers that leverage discrepancies in how legitimate users form social relationships, in contrast to spammers, as a detection mechanism. Many of these systems hinge on the assumption that spammers have a difficult time coercing legitimate users into following or befriending them, or alternatively, compromising accounts to seed relationships with. In practice, automated follow-back accounts and cultural norms may muddle their application. (For instance, in Brazil and Turkey most relationships are reciprocated; the social graph is more a status symbol than an act of curating interesting content or signifying trust.)

  • [September, 2006] Sybilguard: defending against sybil attacks via social networks: SybilGuard is an early work in detecting sybil accounts in social networks, though primarily for peer-to-peer networks rather than online social networks. The primary assumption is that there is a small cut between the legitimate social graph and spammers (who create a graph amongst themselves). If this is true, then random walks started from the legitimate network should rarely end in the sybil region due to the limited paths crossing the cut. (Existing work, including Measuring the mixing time of social graphs, shows mixing times in online social networks are slower in practice than expected, and thus some legitimate users may be construed as sybils due to slow mixing.) A minimal sketch of this random-walk intuition appears after this list.
  • [February, 2009] Sybilinfer: Detecting sybil nodes using social networks: SybilInfer is identical in reasoning to SybilGuard, but the way in which random walks are performed differs, offering a better performance bound.
  • [June, 2010] SybilLimit: A near-optimal social network defense against sybil attacks: SybilLimit is a follow-on work to SybilGuard, improving performance guarantees.
  • [September, 2011] Spam filtering in twitter using sender-receiver relationship: The authors present a spam detection system for unsolicited mentions whereby the distance between users is used to classify communication as spam or benign.
    Dataset: 308 Twitter spam accounts posting 10K tweets
    Time period: February, 2011 — March, 2011
    Source: User reported spam accounts to @spam Twitter handle.
  • [September, 2011] Die Free or Live Hard? Empirical Evaluation and New Design for Fighting Evolving Twitter Spammers: The authors develop a classifier of spam accounts (though biased towards likely phished accounts) with network-based features including clustering, the bi-directionality of relationships, and betweenness centrality.
    Dataset: 2,060 accounts posting phishing URLs on Twitter (from the perspective of Capture-HPC and Google Safebrowsing; includes redirects in blacklist check); drawn from a sample of 485,721 accounts
    Time period: Unspecified
    Source: Breadth first search over Twitter, seeded from Streaming API
  • [November, 2011] Uncovering Social Network Sybils in the Wild: The authors develop a sybil detection scheme based on clustering coefficients between accounts and the rate of incoming and outgoing relationship formation requests. A larger dataset is then used to understand how spammers form social relationships, where the authors find the vast majority of spam accounts do not form relationships amongst themselves.
    Dataset: 1000 fraudulent Renren accounts for classification; 660,000 fraudulent Renren accounts for study.
    Time period: Circa 2008 — February, 2011
    Source: Data sharing agreement with Renren
  • [April, 2012] Understanding and combating link farming in the twitter social network: The authors examine how spammers form relationships in Twitter and find that in 2009 the vast majority of spammers followed and were followed by ‘social capitalists’: legitimate users who automatically reciprocate relationships.
    Dataset: 41,352 suspended Twitter accounts that posted a blacklisted URL
    Time period: August, 2009 crawl; February, 2011 suspension check
    Source: Previous crawl of Twitter social graph in 2009
    Note: Lists of these users are available on blackhat forums; simply do a search for ‘twitter followback list’. Examples:

    • hxxp://www.blackhatworld.com/blackhat-seo/social-networking-sites/358556-free-list-18k-followback-twitter-users.html
    • hxxp://www.blackhatworld.com/blackhat-seo/social-networking-sites/365067-fresh-35k-twitter-followback-list.html
  • [April, 2012] Analyzing spammers’ social networks for fun and profit: a case study of cyber criminal ecosystem on twitter: The authors examine the social graph of a small subset of Twitter spammers (or compromised users) and determine that a large portion of their following arises from ‘social butterflies’ (e.g. ‘social capitalists’ in similar studies).
    Dataset: 2,060 accounts posting phishing URLs on Twitter (from the perspective of Capture-HPC and Google Safebrowsing; includes redirects in blacklist check); drawn from a sample of 485,721 accounts
    Time period: Unspecified
    Source: Breadth first search over Twitter, seeded from Streaming API
  • [April, 2012] Aiding the Detection of Fake Accounts in Large Scale Social Online Services: The authors develop a SybilGuard-like algorithm and deploy it on the Tuenti social network, detecting 200K likely sybil accounts.
  • [August, 2012] Poultry Markets: On the Underground Economy of Twitter Followers: The authors examine an emerging marketplace for purchased relationships, an alternative to link farming performed by follow-back accounts.
  • [October, 2012] Innocent by Association: Early Recognition of Legitimate Users: The authors develop a SybilGuard-like system, but rather than attempt to find sybil accounts, the system allows legitimate users to vouch for new users (through a transparent action such as sending a message) with the assumption that legitimate users would not otherwise associate with spammers. Unvouched users can then be throttled via CAPTCHAs or other automation barriers to reduce the impact they have on the system.
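As referenced in the SybilGuard entry above, the following is an intuition-level sketch of the random-walk reasoning shared by this family of systems: short walks started from a trusted seed should rarely land in the sybil region if only a small cut connects it to the legitimate graph. This is not any paper's actual algorithm, just the core idea.

```python
import random
from collections import Counter

def random_walk_scores(graph: dict[str, list[str]], seed: str,
                       walks: int = 1000, length: int = 5) -> Counter:
    """Count how often short random walks from a trusted seed land on each node.

    Under the small-cut assumption, nodes in the sybil region are rarely
    reached, so consistently low landing counts are suspicious.
    """
    landings = Counter()
    for _ in range(walks):
        node = seed
        for _ in range(length):
            neighbors = graph.get(node)
            if not neighbors:
                break
            node = random.choice(neighbors)
        landings[node] += 1
    return landings
```

The published systems (SybilGuard, SybilLimit, SybilInfer) build far more careful constructions on top of this intuition, using random routes and intersection checks to obtain provable bounds rather than a single heuristic score.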

Social Astroturfing & Political Abuse

Abuse of social networks is not limited to spam and criminal monetization; a number of politically-motivated attacks have occurred over the past few years. These attacks aim to either sway public opinion, disseminate false information, or disrupt the conversations of legitimate users.

Service Abuse

Monetization in social networks hinges on having URLs that convert traffic into a profit. In this process, a number of other services are abused, the most prominent of which are URL shorteners.

Social Malware & Malicious Applications

Compromised accounts in social networks allow miscreants to leverage the trust users place in their friends and family as a tool for increased clickthrough and propagation. The following is a list of papers detailing known social engineering campaigns or applications used to compromise social networking accounts.

Manufacturing Compromise: The Emergence of Exploit-as-a-Service

This post is based on research conducted in collaboration with Google, to appear in CCS 2012. A pdf is available under my publications. Any views or opinions discussed herein are my own and not those of Google.

Driveby downloads — webpages that attempt to exploit a victim’s browser or plugins (e.g. Flash, Java) — have emerged as one of the dominant vectors for infecting hosts with malware. This revolution in the underground ecosystem has been fueled by the exploit-as-a-service marketplace, where exploit kits such as Blackhole and Incognito provide easily configurable tools that handle all of the “dirty work” of exploiting a victim’s browser in return for a fee. This business model follows in the footsteps of a dramatic evolution in the world of for-profit malware over the last five years, where host compromise is now decoupled from host monetization. Specifically, the means by which a host initially falls under an attacker’s control are now independent of the means by which an(other) attacker abuses the host in order to realize a profit, such as sending spam, information theft, or fake anti-virus.

In the case of exploit kits, attackers can funnel traffic from compromised sites or SEO boosted content to exploit kits, taking control of a victim’s machine without any knowledge of the complexities surrounding browser and plugin vulnerabilities. These hosts can in turn be sold to the pay-per-install marketplace or directly monetized by the attacker. From the perspective of Google Chrome, driveby downloads outstrip social engineering as the most prominent threat, while Microsoft’s latest security intelligence report (SIRv12) highlights the growing threat of driveby downloads, shown below:

In order to understand the impact of the exploit-as-a-service paradigm on the malware ecosystem, we performed a detailed analysis of:

  1. The prevalence of exploit kits across malicious URLs
  2. The families of malware installed upon a successful browser exploit, compared to executables found in email spam, software torrents, the pay-per-install market, and live network traffic
  3. The traffic volume, lifetime, and popularity of malicious websites.

To carry out this study, we analyzed 77,000 malicious URLs provided to us by Google, along with a crowd-sourced feed of blacklisted URLs known to direct to exploit kits. These URLs led to over 10,000 distinct binaries, which we ran in a contained environment (i.e. no side-effects visible to the outside world) to determine the family of malware as well as its monetization approach. We also aggregated and executed over 50,000 distinct binaries pulled from email spam, software and warez torrents, pay-per-install distribution sites, and live network traffic containing malware from corporate settings.

Anatomy of a Driveby Download

From the time a victim accesses a malicious website up to the installation of malware on their system, there is a complex chain of events that underpins a successful driveby download. The infection chain for a real driveby that appeared in our study is shown below, where I obfuscate only the compromised website that launched the attack:

In this particular case, victims that visited a compromised website [1] were funneled through a chain of redirects [2] before finally being exposed to an exploit kit [3]. Depending on the time the compromised site was visited, either Blackhole or a yet unknown exploit kit would attempt to exploit the victim’s browser. If successful, different malware including SpyEye (information stealer), ZeroAccess (information stealer), and Rena (fake anti-virus) supplied by third parties [4] would be installed on the victim’s machine [5]. This chain highlights the multiple actors involved in the exploit-as-a-service market: attackers purchasing installs, exploit kit developers, and miscreants compromising websites and redirecting traffic to exploit kits. Depending on an attacker’s preference, all three roles can be conducted by a single party or outsourced to the underground marketplace.
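For illustration, a simple way to record the HTTP-level hops of a redirect chain like the one above is shown below, using Python's requests library. Note this only captures plain HTTP redirects; the JavaScript and iframe redirection that exploit kits favor requires an instrumented browser (honeyclient) to observe.

```python
import requests

def http_redirect_chain(url: str, timeout: float = 10.0) -> list[str]:
    """Record the HTTP-level hops from an initial URL to its landing page.

    JavaScript- or plugin-driven redirects used by exploit kits will not show
    up here; capturing those requires an instrumented browser environment.
    """
    response = requests.get(url, timeout=timeout, allow_redirects=True)
    return [r.url for r in response.history] + [response.url]
```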

Popular Exploit Kits

Of the 77,000 URLs we received from Google’s Safe Browsing list, over 47% of initial domains tied to driveby downloads terminate at an exploit kit. The remaining domains either lead directly to executables without an exploit pack (49%) or could not be classified (4%). The table below provides a detailed breakdown of the kits we identified:

| Rank | Exploit Kit | Initial Domains | Final Domains |
|------|-------------|-----------------|---------------|
| 1 | Blackhole | 28% | 29% |
| 2 | Incognito | 10% | 13% |
| 3 | Unknown.1 | 4% | 2% |
| 4 | Sakura | 2% | 3% |
| 5 | Crimepack | 1% | <1% |
| 6 | Unknown.2 | 1% | 1% |
| 7 | Bleeding Life | 1% | <1% |
| 8 | Phoenix | <1% | 2% |
| 9 | Elenore | <1% | <1% |
| - | Executable | 49% | 45% |

The most popular exploit kit is Blackhole, which anecdotally (based on screenshots like the one below) has a success rate of 7-12% at compromising a victim's browser. Incognito follows in popularity, along with a short list of other kits.

Our results show that exploit kits play a vital role in the driveby ecosystem. Surprisingly, only a handful of kits exist, making them one of the weakest links in the exploit-as-a-service marketplace. These types of bottlenecks are far more attractive for disruption compared to taking down the 6,300 unique domains hosting driveby exploits in our dataset (just a fraction of malicious sites in the wild).

Malware Dropped by Kits

We collect the unique binaries installed upon a successful exploit for each of the driveby domains in our dataset (10,308 binaries in total). During the same time period, we also acquire a feed of executables found in email spam attachments (2,817 binaries), pay-per-install programs (2,691 binaries from the droppers that install a client's software), warez and torrents (17,182 binaries and compressed files), and live network traffic (28,300 binaries from Arbor ASERT). We execute all of these binaries in a contained environment that prohibits outgoing network traffic other than what is permitted by manually crafted whitelist policies, in order to allow test connections and guide execution.

Through a combination of automated clustering and manual labeling by analysts, we classify the vast majority of binaries in our dataset, with the most prominent families per infection vector shown below. (Note: torrents and live traffic contained a number of benign binaries, bringing down the total fraction of malicious samples.)
| Rank | Driveby | Dropper | Attachment | Torrent | Live |
|------|---------|---------|------------|---------|------|
| 1 | Emit (12%) | Clickpotato (6%) | Lovegate (44%) | Unknown.Adware.A (0.1%) | TDSS (2%) |
| 2 | Fake WinZip (8%) | Palevo (3%) | MyDoom (6%) | Sefnit (0.07%) | Clickpotato (1%) |
| 3 | ZeroAccess (5%) | NGRBot (2%) | Bagle (1%) | OpenCandy (0.07%) | NGRBot (1%) |
| 4 | SpyEye (4%) | Gigabid (2%) | Sality (0.5%) | Unknown.Adware.B (0.06%) | Toggle Adware (0.5%) |
| 5 | Windows Custodian (4%) | ZeroAccess (2%) | TDSS (0.1%) | ZeroAccess (0.01%) | ZeroAccess (0.3%) |
| 6 | Karagany (4%) | Emit (1%) | (0.03%) | Whitesmoke (0.01%) | Gigabid (0.2%) |
| Total | 32 families | 19 families | 6 families | 6 families | 40 families |

Through passive DNS data collected from a number of ISPs (details available in the paper), we are able to determine which families are installed most frequently by driveby domains. This provides a more meaningful ranking than using unique MD5 sums, which only measures polymorphism. We also compare whether any of the families installed by drivebys appear in our other feeds: (D)roppers, (A)ttachments, (L)ive, and (T)orrents.

| Family | Monetization | Fraction of Installs | Other Feeds |
|--------|--------------|----------------------|-------------|
| ZeroAccess | Dropper | 35% | D;T;L |
| Windows Custodian | Fake AV | 10.3% | |
| Karagany | Dropper | 9.5% | D |
| SpyEye | Information Stealer | 8% | D |
| TDSS | Information Stealer | 5.6% | A;L |
| Cluster A | Browser Hijacking | 5.1% | |
| Zbot | Information Stealer | 5% | D |
| Multi Installer | Dropper | 3% | |
| Medfos | Browser Hijacking | 2.6% | |
| Cluster B | Fake AV | 2.2% | |
| Clickpotato | Adware | 2.1% | D;L |
| Perfect Keylogger | Information Stealer | 1.9% | D;L |
| Emit | Dropper | 1.8% | D;A;L |
| Sality | Dropper | 1.7% | A |
| Votwup | Denial of Service | 1.6% | |
| Fake Rena | Fake AV | 1.5% | |
| Cluster C | Information Stealer | 0.7% | |

Variants including ZeroAccess and Emit rely on multiple infection vectors, while many of the other prominent variants are distributed solely through drivebys. Given that we identify 32 families from drivebys and 19 from droppers, compared to only 6 each from attachments and torrents, it is clear that the exploit-as-a-service and pay-per-install marketplaces dominate the underground economy as a source of installs.

Catch Me If You Can

Using passive DNS data, we measure the time that a domain used to host an exploit kit receives traffic. We find malicious domains survive for a median of 2.5 hours before going dark, with 43% of compromised pages that siphon traffic towards exploit kits linking to more than one final domain. As such, attempting to detect sites hosting exploit kits is a losing battle where domain registration far outstrips the pace of detection. Instead, detection should concentrate on identifying compromised sites. Such detection should also occur in-browser in order to circumvent the challenges associated with cloaking or time-of-crawl vs. time-of-use variations.
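As a back-of-the-envelope sketch, domain lifetime can be estimated from passive DNS as the gap between a domain's first and last observed resolution. The data layout below is an assumption for illustration, not the format of any particular passive DNS feed.

```python
from statistics import median

def domain_lifetimes_hours(resolutions: dict[str, list[float]]) -> dict[str, float]:
    """Estimate each domain's lifetime as last-seen minus first-seen, in hours.

    `resolutions` maps a domain to the Unix timestamps at which passive DNS
    observed lookups for it (illustrative field layout).
    """
    return {domain: (max(ts) - min(ts)) / 3600.0
            for domain, ts in resolutions.items() if ts}

def median_lifetime_hours(resolutions: dict[str, list[float]]) -> float:
    lifetimes = list(domain_lifetimes_hours(resolutions).values())
    return median(lifetimes) if lifetimes else 0.0
```

This mirrors, at a very high level, the kind of measurement behind the 2.5-hour median reported above.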

Politics as Usual

Social media has emerged as an influential platform for political engagement, allowing users to directly call out political opponents and publicly debate hot-button issues. Tools such as the twindex directly tap into this real-time stream of public sentiment, predicting political outcomes in the same way a traditional Gallup poll would. But as social media matures and citizens come to trust sites like Facebook and Twitter as a source of political truth, that trust is misplaced: there are weaknesses in how popular content bubbles up and in how political accounts are ranked and recommended to users. Similarly, the viral nature of content makes mud slinging and misinformation all the more alluring — or, in other words, politics as usual.

Inflating Popularity

The only hurdle between me and gaining a million followers (besides having a private account) is $5,000. At least, that’s the case if I were to purchase followers on Twitter at a going rate of $5-20 per thousand. If popularity is simply a measure of counts rather than information diffusion (e.g. the thousands of Lady Gaga fans willing to retweet her content), then such metrics can be easily gamed due to the ease with which new Twitter accounts can be created. When it comes to social media, there is a fundamental tension between growth and security. Email confirmations, CAPTCHA solutions, and unique IP rules all stymie legitimate users from registering a new account, even if those same tools are necessary to dam the floodgates of spam. Similarly, the cost of false positives, where a legitimate user is banned from communicating, far outweighs the cost of false negatives (uncaught fraudulent accounts), tipping the balance in favor of miscreants and spammers accessing social media.

When Mitt Romney gained over 100,000 followers in a single day, the media questioned whether these accounts were real users. While political events can certainly trigger an influx of interest, all of the new followers were low in-degree accounts that no one else on Twitter was following. Now, well after the story broke, at least 60,000 of these accounts have been suspended, as seen from Romney’s twittercounter.

Whether the accounts were intentionally purchased by Mitt Romney or an adversary (or, equally likely, a glitch in spam software set to follow popular Twitter users in order to appear more realistic, with thousands of accounts consequently performing the same action) is unknown. But accusations of purchasing followers are nothing new, with similar rumors plaguing the Newt Gingrich campaign.

The Mitt Romney story also illustrates the susceptibility of brands in social media to smear attacks. If a political opponent purchases followers for a candidate and then cries wolf, the ‘evidence’ of new followers is plain to see, while the target of the attack can only vehemently deny involvement. Similarly, if a political brand launches a trending topic and it is co-opted by opponents (legitimately or not) in a 4chan-like manner that degenerates into offensive content (hello pedobear), the original brand has no control over their message once it hits social media, even though it’s directly linked to their brand (e.g. on a Facebook page or on a promoted hashtag where the affiliation is clear). It’s the double-edged sword of mass connectivity in social media.

Silencing Dissidents

One of the sadder applications of social media involves intentionally manipulating anti-spam tools to silence political dissidents. Both Facebook and Twitter grant users the ability to report offensive content, block messages from accounts, and report users for spam. These metrics in turn can be used for removing spam accounts, but are fragile to abuse. A prominent example of this abuse occurred during a political battle between far-right and far-left Israeli groups on Facebook, where thousands of users from one side would report-bomb [Hebrew] an account, resulting in its temporary expulsion from Facebook.

While nation states have their own (legal) means to censor and control social media, the aforementioned attack is a chilling reminder of the adversarial nature of user-generated input. Taken at face value, user reports of child pornography or spam can be used to shut down the accounts of political adversaries where the only real victim is free speech.

Controlling Discussions

Social media allows millions of users to connect and discuss political concerns, but whether those issues or the accounts participating are real is another matter entirely. On Facebook, public pages serve as a forum for commenting and discussion, while on Twitter trending topics allow users with no social connections to interact. The organic nature of how discussions are conceived and the fact that anyone can participate make fake accounts a valuable resource in skewing the tone of conversation. Fake stories, the co-opting of existing stories, and astroturfing are a growing problem in social media. As detailed in a previous post, the discussion surrounding the Russian parliamentary election on Twitter was swarmed with thousands of fake accounts, while topics like #freetibet continue to be attacked by politically-motivated bots. When topics are effectively voted up by users, there is no such thing as a direct democracy in the presence of thousands of fake accounts.