Papers and Journal Articles by Kathleen M. Carley
Online social media has become an important platform for organizing around different socio-cultural and political topics. An extensive body of scholarship has discussed how people are divided into echo-chamber-like groups. However, little work has quantified hostile communication, or affective polarization, between two competing groups. This paper proposes a systematic, network-based methodology for examining affective polarization in online conversations. Further, we apply our framework to 100 weeks of Twitter discourse about climate change. We find that deniers of climate change (Disbelievers) are more hostile towards people who believe in the anthropogenic cause of climate change (Believers) than vice versa. Moreover, during more hostile weeks, Disbelievers use more words and hashtags related to natural disasters than Believers do. These findings bear implications for studying affective polarization in online discourse, especially concerning the subject of climate change. Lastly, we discuss our findings in the context of increasingly important climate change communication research.
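One simple way to make directional hostility concrete is to count, for each ordered pair of groups, what share of cross-group interactions are labelled hostile. The sketch below is a minimal illustration under stated assumptions, not the paper's framework: it assumes a reply or mention edge list in which each edge already carries a hostility label from some upstream classifier, plus a mapping from users to stance groups. The function name `directional_hostility` is hypothetical.

```python
from collections import defaultdict


def directional_hostility(edges, group_of):
    """Share of cross-group interactions labelled hostile, per direction.

    Hypothetical helper, not the paper's method.
    edges: iterable of (source_user, target_user, is_hostile) tuples, where
        is_hostile is assumed to come from an upstream hostility classifier.
    group_of: dict mapping user -> group label, e.g. "Believer"/"Disbeliever".
    Returns {(source_group, target_group): hostile_fraction} for every
    ordered cross-group pair that has at least one interaction.
    """
    totals = defaultdict(int)
    hostile = defaultdict(int)
    for src, dst, is_hostile in edges:
        g_src, g_dst = group_of.get(src), group_of.get(dst)
        # Skip unlabelled users and within-group interactions.
        if g_src is None or g_dst is None or g_src == g_dst:
            continue
        key = (g_src, g_dst)
        totals[key] += 1
        hostile[key] += int(is_hostile)
    return {key: hostile[key] / totals[key] for key in totals}
```

Comparing the two directions, for example `("Disbeliever", "Believer")` against `("Believer", "Disbeliever")`, gives an asymmetry measure; running the function on each week's edges separately would yield a weekly series. Within-group edges are skipped because they say nothing about hostility between the groups.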
The COVID-19 pandemic of 2021 led to a worldwide health crisis that was accompanied by an infodemic. A group of 12 social media personalities, dubbed the "Disinformation Dozen", were identified as key spreaders of disinformation regarding the COVID-19 virus, treatments, and vaccines. This study focuses on the spread of disinformation propagated by this group on Telegram, a mobile messaging and social media platform. After segregating users into three groups (the Disinformation Dozen, bots, and humans), we investigate a dataset of Telegram messages from January to June 2023, comparatively analyzing temporal, topical, and network features. We observe that the Disinformation Dozen are highly involved in the initial dissemination of disinformation but are not the main drivers of its propagation. Bot users are extremely active in conversation threads, while human users are active propagators of information, disseminating posts between Telegram channels through the forwarding mechanism.
Bots have been in the spotlight for many social media studies, because they have been observed participating in the manipulation of information and opinions on social media. These studies analyzed the activity and influence of bots in a variety of contexts: elections, protests, health communication, and so forth. These analyses first require identifying bot accounts so that social media users can be separated into classes. In this work, we propose an ensemble method for bot detection, designing a multi-platform bot detection architecture that addresses several problems along the bot detection pipeline: it handles incomplete data input, requires minimal feature engineering, uses classifiers optimized for each data field, and eliminates the need for a threshold value to determine the classification. With these design decisions, we generalize our bot detection framework across Twitter, Reddit, and Instagram. We also perform feature importance analysis, observing that the entropy of names and the number of interactions (retweets/shares) are important factors in bot determination. Finally, we apply our multi-platform bot detector to the 2020 US presidential election to identify and analyze bot activity across multiple social media platforms, showcasing how the online discourse of bots differs across platforms.
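The abstract names the entropy of account names as an important feature. One common reading is the Shannon entropy of the character distribution in a screen name, so that random-looking, auto-generated handles score higher than ordinary names. The minimal sketch below assumes that reading; the paper's exact feature definition may differ, and it shows a single feature rather than the ensemble detector. `name_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter


def name_entropy(name: str) -> float:
    """Shannon entropy, in bits per character, of the characters in `name`.

    Hypothetical feature sketch; assumes entropy over character frequencies.
    """
    if not name:
        return 0.0
    n = len(name)
    # Sum p * log2(1/p) over distinct characters, with p = count / length.
    return sum((c / n) * math.log2(n / c) for c in Counter(name).values())
```

For example, `name_entropy("x7q9k2m4")` is 3.0 bits because all eight characters are distinct, while a name with repeated letters such as `"annabelle"` scores lower. In a detector, a value like this would be one input column among many rather than a decision rule on its own.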
Social media platforms are a key ground for information consumption and dissemination. Key figures like politicians, celebrities, and activists have leveraged their wide user bases for strategic communication. Strategic communications, or StratCom, is the deliberate act of information creation and distribution. Key figures use its techniques to establish their brands and amplify their messages. Automated scripts are used on top of personal touches to perform these tasks effectively. The combination of automated and manual online posting creates a Cyborg social media profile, a hybrid between bot and human. In this study, we establish a quantitative definition of a Cyborg account: an account that is detected as a bot in one time window and identified as a human in another. This definition uses frequent changes in bot classification labels and large differences in bot likelihood scores to identify Cyborgs. We perform a large-scale analysis of over 3.1 million Twitter users collected from two key events, the 2020 Coronavirus pandemic and the 2020 US Elections. We extract Cyborgs from the two datasets and employ tools from network science, natural language processing, and manual annotation to characterize Cyborg accounts. Our analyses show that Cyborg accounts are constructed for strategic communication, have a strong duality in their bot/human classification, and are tactically positioned in the social media network, which helps these accounts promote their desired content. Cyborgs are also found to have long online lives, indicating either their ability to evade bot detectors or the graciousness of platforms in allowing their operations.
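The Cyborg definition (bot in one time window, human in another, with frequent label changes and large score differences) can be expressed as a simple rule over an account's sequence of per-window bot likelihood scores. The sketch below is illustrative only: the bot threshold of 0.5, the minimum number of label flips, and the minimum score gap are hypothetical parameters, not values from the study, and the function names `label_flips` and `is_cyborg` are likewise hypothetical.

```python
from typing import Sequence


def label_flips(scores: Sequence[float], bot_threshold: float = 0.5) -> int:
    """Count how often the bot/human label changes between consecutive windows."""
    labels = [s >= bot_threshold for s in scores]
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


def is_cyborg(
    scores: Sequence[float],
    bot_threshold: float = 0.5,  # hypothetical: score >= threshold means "bot"
    min_flips: int = 1,  # hypothetical: required number of label changes
    min_score_gap: float = 0.4,  # hypothetical: required max-min score spread
) -> bool:
    """Flag an account whose per-window bot scores place it on both sides of
    the threshold, with enough label changes and a large enough score spread."""
    if len(scores) < 2:
        return False  # a single window cannot show bot/human duality
    flips = label_flips(scores, bot_threshold)
    gap = max(scores) - min(scores)
    # At least one flip guarantees the account was labelled both bot and human.
    return flips >= max(1, min_flips) and gap >= min_score_gap
```

An account scored `[0.9, 0.1, 0.8]` across three windows flips label twice with a 0.8 score gap and is flagged, whereas `[0.52, 0.48]` flips once but its scores barely straddle the threshold, so the gap requirement treats it as classifier noise rather than a Cyborg.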
In an attempt to mimic the complex paths through which unreliable content spreads between search engines and social media, we explore the impact of incorporating both webgraph and large-scale social media contexts into website credibility classification and discovery systems. We further explore the use on social media of what we define as dredge words: terms or phrases for which unreliable domains rank highly. Through comprehensive graph neural network ablations, we demonstrate that curriculum-based heterogeneous graph models that leverage context from both webgraphs and social media data outperform homogeneous and single-mode approaches. We further demonstrate that incorporating dredge words into our model strongly associates unreliable websites with social media and online commerce platforms. Finally, we show that our heterogeneous model greatly outperforms competing systems in the top-k identification of unlabeled unreliable websites. We demonstrate the strong unreliability signals present in the diverse paths that users follow to uncover unreliable content, and we release a novel dataset of dredge words.
Increasingly, source data are available in electronic text form, such as email, blogs, news articles, and web pages. For scientific use, these data have to be converted into a form that can be statistically analyzed, which can be an arduous manual procedure. We present here a semi-automated approach that reduces coding time and enables the extraction from texts of a) ontologically classified networks, b) node attributes, and c) meta-data. We show that this approach makes it possible to combine network analysis and standard statistical analysis in reasoning about the material in the text, and to reason both about content and about the environment that produced the text. We illustrate this approach using data from two corpora: Enron email and data on the political elite. Two software tools are used: AutoMap and ORA.
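Extracting a network from text can be illustrated with a generic windowed co-occurrence approach: concepts that appear within a few words of each other are linked, and the link weight counts how often that happens. This is a minimal sketch under assumptions (lowercase alphabetic tokens, a caller-supplied stop-word list, a fixed window size), not AutoMap's implementation, and it omits the ontological classification of concepts into node types. `cooccurrence_network` is a hypothetical name.

```python
import re
from collections import Counter


def cooccurrence_network(text, window=3, stopwords=frozenset()):
    """Undirected, weighted concept network from sliding-window co-occurrence.

    Hypothetical sketch: two concepts are linked each time they appear within
    `window` tokens of each other (window=2 links adjacent tokens only).
    Edge keys are sorted (concept_a, concept_b) tuples; values are counts.
    """
    # Crude tokenization: lowercase alphabetic runs, minus stop words.
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stopwords]
    edges = Counter()
    for i, left in enumerate(tokens):
        # Pair each token with the tokens that follow it inside the window.
        for right in tokens[i + 1 : i + window]:
            if left != right:  # ignore self-loops
                edges[tuple(sorted((left, right)))] += 1
    return edges
```

The returned `Counter` is effectively a weighted edge list, so it can be written out and loaded into a network analysis tool; widening `window` links more distant concepts at the cost of noisier edges.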
network data that can be used for applications ranging from command and control, to military intelligence, to basic social science research. This project reviews several methods available to extract email network data and compares them in terms of data quality and convenience of collection. In general, it is preferable to obtain email data directly from the central SMTP email server. In situations where this is not possible, alternative approaches presented here can be useful. These techniques for analyzing email data have been automated in the Organizational Risk Analyzer (ORA) software
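Once messages have been collected, for example as raw RFC 5322 text exported from a mail store, a sender-to-recipient network can be built from the headers alone. The sketch below uses only Python's standard `email` package; the function name `email_edges` is hypothetical, and this is not how ORA implements the step. It treats every To, Cc, and Bcc recipient as a directed edge from the sender, weighted by message count. Bcc recipients appear only if the stored copy still carries that header, since it is normally stripped from delivered messages.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parseaddr


def email_edges(raw_messages):
    """Directed, weighted sender -> recipient edges from raw message headers.

    Hypothetical sketch: addresses are lowercased so the same mailbox is not
    split into several nodes; self-addressed copies are dropped.
    """
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        sender = parseaddr(msg.get("From", ""))[1].lower()
        if not sender:
            continue  # skip messages without a parseable sender
        # Collect all recipient header values (a header may occur more than once).
        fields = msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
        for _, addr in getaddresses(fields):
            addr = addr.lower()
            if addr and addr != sender:
                edges[(sender, addr)] += 1
    return edges
```

Each key is a `(sender, recipient)` pair and each value counts messages, which is the form most network tools accept as a weighted edge list.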