Discover millions of fake followers in Weibo

Y Zhang, J Lu - Social Network Analysis and Mining, 2016 - Springer
Abstract
Weibo is the Chinese counterpart of Twitter and has attracted hundreds of millions of users. Like other Online Social Networks (hereafter OSNs), Weibo has a large number of fake accounts. They are created to sell their following links to customers who want to boost their follower counts. These bogus accounts are difficult to identify individually, especially when they are created by sophisticated programs or controlled directly by human beings. This paper proposes a novel fake account detection method based on the very purpose for which these accounts exist: they are created to follow their targets en masse, which results in heavily overlapping follower lists among their customers. This paper investigates the top Weibo accounts whose follower lists duplicate or nearly duplicate each other (hereafter called near-duplicates). Discovering near-duplicates is challenging: the network is large, the data in their entirety are not available, and pair-wise comparison is very expensive. We developed a sampling-based approach to discover all the near-duplicates of the top accounts, i.e., those with at least 50,000 followers. In the experiment, we found 395 near-duplicates, which led us to 11.90 million fake accounts (4.56% of all users) that send 741.10 million links (9.50% of all edges). Furthermore, we characterize four typical structures of the spammers, cluster these spammers into 34 groups, and analyze the properties of each group.
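The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one plausible way to find near-duplicate follower lists without comparing every pair of accounts, assuming that follower lists of top accounts and followee lists of individual users can both be fetched. It samples followers of an account, tallies which other accounts those sampled users also follow, and confirms the resulting candidates with an exact Jaccard similarity check. Jaccard similarity, the thresholds, and every function name here (`jaccard`, `sampled_candidates`, `near_duplicates`, `toy_graph`) are hypothetical choices for illustration, not the authors' method.

```python
import random
from collections import Counter


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |A & B| / |A | B| of two follower sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def sampled_candidates(account, followers, followees, sample_size, threshold, rng):
    """Hypothetical sampling step, not the paper's exact procedure.

    Draw up to `sample_size` followers of `account`, look up whom each sampled
    user follows, and return every other account followed by at least a
    `threshold` fraction of the sample, mapped to that estimated fraction.
    """
    pool = followers[account]
    k = min(sample_size, len(pool))
    if k == 0:
        return {}
    counts = Counter()
    for user in rng.sample(pool, k):
        for other in followees.get(user, ()):
            if other != account:
                counts[other] += 1
    return {b: c / k for b, c in counts.items() if c / k >= threshold}


def near_duplicates(account, followers, followees, sample_size=1000,
                    sample_threshold=0.5, jaccard_threshold=0.5, seed=0):
    """Confirm sampled candidates with an exact Jaccard check on full follower lists."""
    rng = random.Random(seed)
    candidates = sampled_candidates(account, followers, followees,
                                    sample_size, sample_threshold, rng)
    base = set(followers[account])
    result = {}
    for other in candidates:
        sim = jaccard(base, set(followers.get(other, [])))
        if sim >= jaccard_threshold:
            result[other] = sim
    return result


def toy_graph():
    """Hypothetical toy data: ten fake users follow both A and B; real users do not."""
    fakes = [f"f{i}" for i in range(10)]
    reals = [f"r{i}" for i in range(10)]
    followers = {"A": fakes + reals[:5], "B": fakes[:], "C": reals[5:]}
    followees = {}
    for acct, users in followers.items():
        for user in users:
            followees.setdefault(user, set()).add(acct)
    return followers, followees


if __name__ == "__main__":
    followers, followees = toy_graph()
    print(near_duplicates("A", followers, followees))  # {'B': 0.6666666666666666}
```

On the toy graph, A and B share ten of their fifteen distinct followers, so B is reported as a near-duplicate of A with similarity 2/3, while C shares no followers with A and never becomes a candidate. The design point the sketch is meant to show is that the expensive exact comparison runs only on the few accounts that the sampled followers actually point to, rather than on all pairs of top accounts.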