A Seed-Guided Latent Dirichlet Allocation Approach to Predict the Personality of Online Users Using the PEN Model
Abstract
1. Introduction
2. Personality Model
3. Related Work
3.1. Overview of the Preliminary Study
3.2. Affective Computing
3.3. Dataless Topic Modeling
4. Problem Formulation and Methodology
4.1. Problem Formulation
4.2. Proposed Methodology
4.2.1. Data Cleansing and Linguistic Marker Identification
4.2.2. Topic Modeling
Algorithm 1: Topic Modeling with SLDA
4.2.3. Cross Validation Criteria
- (1) Any training instances labeled as Psychoticism by SLDA must be correlated with the Conscientiousness or Agreeableness scores provided in myPersonality. Both traits appear to be correlated with antagonistic characteristics and with Psychoticism [14]. Texts labeled as Psychoticism may also correlate with Neuroticism because of their negative content;
- (2) Any training instances labeled as Extraversion or Neuroticism by SLDA must be directly correlated with the Extraversion or Neuroticism scores, respectively, provided in myPersonality;
- (3)
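The criteria above amount to checking that a trait label assigned by SLDA co-varies with the corresponding myPersonality score. The source does not name the correlation statistic, so the minimal sketch below assumes Pearson's r between a 0/1 indicator (the instance was or was not labeled with the trait) and the continuous trait score, which is the point-biserial correlation. The function names and the example scores are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; when xs is 0/1 this is the point-biserial correlation."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def label_score_correlation(slda_labels, trait_scores, trait):
    """Correlate 'SLDA labeled this instance as `trait`' with the user's score for that trait.

    slda_labels: one trait name per training instance (hypothetical input format).
    trait_scores: the matching questionnaire scores for the same trait.
    """
    indicator = [1 if label == trait else 0 for label in slda_labels]
    return pearson_r(indicator, trait_scores)


# Hypothetical example for criterion (2): Extraversion labels vs. Extraversion scores.
labels = ["Extraversion", "Extraversion", "Neuroticism", "Psychoticism"]
extraversion_scores = [4.5, 4.0, 2.0, 1.5]
r = label_score_correlation(labels, extraversion_scores, "Extraversion")
print(f"r = {r:.3f}")
```

A criterion could then be accepted when r exceeds a threshold chosen by the analyst; the source does not state one.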
5. Findings of the Study
5.1. Performance Comparison
5.2. Intrinsic Evaluation
5.2.1. Descriptive Statistics
5.2.2. Cosine Similarity
5.2.3. Seeking Ground Truth through Trait Correlation
5.2.4. Word Analysis
5.2.5. t-SNE Visualization
5.3. Extrinsic Evaluation
5.3.1. Evaluation Metrics
5.3.2. Machine Learning Classification
5.3.3. Confusion Matrix
6. Threats to Validity
7. Limitations and Future Directions
8. Conclusions and Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Mehta, Y.; Majumder, N.; Gelbukh, A.; Cambria, E. Recent trends in deep learning based personality detection. Artif. Intell. Rev. 2020, 53, 2313–2339.
- Boduszek, D.; McLaughlin, C.; Hyland, P. Criminal attitudes of ex-prisoners: The role of personality, anti-social friends and recidivism. Int. J. Crim. 2011, 9, 1–10.
- Kamaluddin, M.R.; Shariff, N.S.M.; Othman, A.; Ismail, K.H.; Saat, G.A.M. Linking psychological traits with criminal behaviour: A review. ASEAN J. Psychiatry 2015, 16, 13–25.
- Wang, Z.; Wu, C.; Zhe, W.; Niu, X.; Wang, X. SMOTETomek-based resampling for personality recognition. IEEE Access 2019, 7, 129678–129689.
- Zha, D.; Li, C. Multi-label dataless text classification with topic modeling. Knowl. Inf. Syst. 2019, 61, 137–160.
- Wang, D.; Thint, M.; Al-Rubaie, A. Semi-supervised latent Dirichlet allocation and its application for document classification. In Proceedings of the 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, Macau, China, 4–7 December 2012.
- Ferner, C.; Havas, C.; Birnbacher, E.; Wegenkittl, S.; Resch, B. Automated seeded latent Dirichlet allocation for social media based event detection and mapping. Information 2020, 11, 376.
- Jin, Y.; Bhatia, A.; Wanvarie, D. Seed word selection for weakly-supervised text classification with unsupervised error estimation. arXiv 2021, arXiv:2104.09765.
- Kherwa, P.; Bansal, P. Topic Modeling: A Comprehensive Review. EAI Endorsed Trans. Scalable Inf. Syst. 2020, 7, e2.
- Toubia, O.; Iyengar, G.; Bunnell, R.; Lemaire, A. Extracting features of entertainment products: A guided latent Dirichlet allocation approach informed by the psychology of media consumption. J. Mark. Res. 2018, 56, 18–36.
- Li, C.; Xing, J.; Sun, A.; Ma, Z. Effective document labeling with very few seed words: A topic model approach. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM’16), Indianapolis, IN, USA, 24–28 October 2016.
- Li, X.; Li, C.; Chi, J.; Ouyang, J.; Li, C. Dataless text classification: A topic modelling approach with document manifold. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM’18), Torino, Italy, 22–26 October 2018.
- Li, C.; Chen, S.; Xing, J.; Sun, A.; Ma, Z. Seed-guided topic model for document filtering and classification. ACM Trans. Inf. Syst. 2019, 37, 1–37.
- Lynam, D.R.; Miller, J.D. On the ubiquity and importance of antagonism. In Handbook of Antagonism; Elsevier: Amsterdam, The Netherlands, 2019; pp. 1–24.
- Ghafari, S.M.; Beheshti, A.; Joshi, A.; Paris, C.; Yakhchi, S.; Jolfaei, A.; Orgun, M.A. A dynamic deep trust prediction approach for online social networks. In Proceedings of the 18th International Conference on Advances in Mobile Computing & Multimedia, Chiang Mai, Thailand, 30 November–2 December 2020; pp. 11–19.
- De Meo, P.; Musial-Gabrys, K.; Rosaci, D.; Sarnè, G.M.L.; Aroyo, L. Using centrality measures to predict helpfulness-based reputation in trust networks. ACM Trans. Internet Technol. 2017, 17, 8.
- Alkhamees, M.; Alsaleem, S.; Al-Qurishi, M.; Al-Rubaian, M.; Hussain, A. User trustworthiness in online social networks: A systematic review. Appl. Soft Comput. 2021, 103, 107159.
- Argamon, S.; Dhawle, S.; Koppel, M.; Pennebaker, J.W. Lexical predictors of personality type. In Proceedings of the 2005 Joint Annual Meeting of the Interface and the Classification Society of North America, St. Louis, MO, USA, 8–12 June 2005; pp. 1–16.
- Park, G.; Schwartz, H.A.; Eichstaedt, J.C.; Kern, M.L.; Kosinski, M.; Stillwell, D.J.; Ungar, L.H.; Seligman, M.E. Automatic personality assessment through social media language. J. Pers. Soc. Psychol. 2015, 108, 934–952.
- Ruch, W.; Wagner, L.; Heintz, S. Humor, the PEN model of personality, and subjective well-being: Support for differential relationships with eight comic styles. Riv. Ital. di Studi sull’Umorismo 2018, 1, 31–44.
- Sáez, Y.; Navarro, C.; Mochón, M.A.; Isasi, P. A system for personality and happiness detection. Int. J. Interact. Multimed. Artif. Intell. 2014, 2, 7.
- Sagadevan, S.; Malim, N.H.A.H.; Husin, M.H. Sentiment valences for automatic personality detection of online social networks users using three factor model. Procedia Comput. Sci. 2015, 72, 201–208.
- Mohammadi, G.; Vinciarelli, A. Automatic personality perception: Prediction of trait attribution based on prosodic features extended abstract. In Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), Xi’an, China, 21–24 September 2015; pp. 484–490.
- Finn, E. Swearing: The good, the bad & the ugly. ORTESOL J. 2017, 34, 17–26.
- Nielsen, F.A. A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. arXiv 2011, arXiv:1103.2903.
- Hoekstra, R.; Vugteveen, J.; Warrens, M.J.; Kruyen, P.M. An empirical analysis of alleged misunderstandings of coefficient alpha. Int. J. Soc. Res. Methodol. 2019, 22, 351–364.
- Oberlander, J.; Nowson, S. Whose thumb is it anyway? Classifying author personality from weblog text. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, Sydney, Australia, 17–18 July 2006; pp. 627–634.
- Celli, F.; Pianesi, F.; Stillwell, D.; Kosinski, M. Workshop on computational personality recognition: Shared task. In Proceedings of the International AAAI Conference on Web and Social Media, Cambridge, MA, USA, 8–11 July 2013; Volume 7.
- Iacobelli, F.; Gill, A.J.; Nowson, S.; Oberlander, J. Large scale personality classification of bloggers. In Affective Computing and Intelligent Interaction; Springer: Berlin/Heidelberg, Germany, 2011; pp. 568–577.
- Junior, R.A.P.; Inkpen, D. Using cognitive computing to get insights on personality traits from twitter messages. In Advances in Artificial Intelligence; Mouhoub, M., Langlais, P., Eds.; Canadian AI 2017. Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2017; Volume 10233.
- Sharma, S. Predicting Employability from User Personality Using Ensemble Modelling. Master’s Thesis, Thapar University, Patiala, India, 2015.
- Kunte, A.V.; Panicker, S. Using textual data for personality prediction: A machine learning approach. In Proceedings of the 2019 4th International Conference on Information Systems and Computer Networks (ISCON), Mathura, India, 21–22 November 2019; pp. 529–533.
- Saini, M.; Sharan, A. Ensemble learning to find deceptive reviews using personality traits and reviews specific features. J. Digit. Inf. Manag. 2017, 12, 84–94.
- Levitan, S.I.; Levitan, Y.; An, G.; Levine, M.; Levitan, R.; Rosenberg, A.; Hirschberg, J. Identifying individual differences in gender, ethnicity, and personality from dialogue for deception detection. In Proceedings of the Second Workshop on Computational Approaches to Deception Detection, San Diego, CA, USA, 12–17 June 2016.
- Agarwal, B. Personality detection from text: A review. Int. J. Comput. Syst. 2014, 1, 1–4.
- Mulay, P.; Joshi, R.R.; Misra, A.; Raje, R.R. Detection of personality traits of sarcastic people (PTSP): A social-IoT based approach. In Intelligent Systems Reference Library; Springer International Publishing: Cham, Switzerland, 2019; pp. 237–261.
- Liu, Y.; Wang, J.; Jiang, Y. PT-LDA: A latent variable model to predict personality traits of social network users. Neurocomputing 2016, 210, 155–163.
- Moreno, D.R.J.; Gomez, J.C.; Almanza-Ojeda, D.-L.; Ibarra-Manzano, M.-A. Prediction of personality traits in twitter users with latent features. In Proceedings of the 2019 International Conference on Electronics, Communications and Computers (CONIELECOMP), Cholula, Mexico, 27 February–1 March 2019; pp. 176–181.
- Kwantes, P.J.; Derbentseva, N.; Lam, Q.; Vartanian, O.; Marmurek, H.H. Assessing the Big Five personality traits with latent semantic analysis. Pers. Individ. Differ. 2016, 102, 229–233.
- Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
- Chen, X.; Xia, Y.; Jin, P.; Carroll, J. Dataless text classification with descriptive LDA. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; Volume 29.
- Vendrow, J.; Haddock, J.; Rebrova, E.; Needell, D. On a guided nonnegative matrix factorization. arXiv 2021, arXiv:2010.11365v2.
- Jagarlamudi, J.; Daume, H.; Udupa, R. Incorporating lexical priors into topic models. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, 23–27 April 2012; pp. 204–213.
- Fard, M.M.; Thonet, T.; Gaussier, E. Seed-guided deep document clustering. In Lecture Notes in Computer Science; Springer Science and Business: Cham, Switzerland, 2020; pp. 3–16.
- Li, C.; Chen, S.; Qi, Y. Filtering and classifying relevant short text with a few seed words. Data Inf. Manag. 2019, 3, 165–186.
- Kosinski, M.; Stillwell, D.; Graepel, T. Private traits and attributes are predictable from digital records of human behavior. Proc. Natl. Acad. Sci. USA 2013, 110, 5802–5805.
- Go, A.; Bhayani, R.; Huang, L. Twitter Sentiment Classification Using Distant Supervision; CS224N Project Report; Stanford University: Stanford, CA, USA, 2009; pp. 1–12.
- Sagadevan, S. Comparison of Machine Learning Algorithms for Personality Detection in Online Social Networking. Ph.D. Thesis, Universiti Sains Malaysia, Penang, Malaysia, 2017.
- Porter, M.F. An algorithm for suffix stripping. Program 1980, 14, 130–137.
- Li, N.; Chow, C.-Y.; Zhang, J.-D. Seeded-BTM: Enabling biterm topic model with seeds for product aspect mining. In Proceedings of the 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), Zhangjiajie, China, 10–12 August 2019; pp. 2751–2758.
- Anoop, V.; Asharaf, S. A topic modeling guided approach for semantic knowledge discovery in e-commerce. Int. J. Interact. Multimed. Artif. Intell. 2017, 4, 40.
- Scholte, R.H.; De Bruyn, E.E. Comparison of the Giant Three and the Big Five in early adolescents. Pers. Individ. Differ. 2003, 36, 1353–1371.
- Dodds, P.S.; Clark, E.M.; Desu, S.; Frank, M.R.; Reagan, A.; Williams, J.R.; Mitchell, L.; Harris, K.D.; Kloumann, I.M.; Bagrow, J.; et al. Human language reveals a universal positivity bias. Proc. Natl. Acad. Sci. USA 2015, 112, 2389–2394.
- Rocha, A.; Goldenstein, S.K. Multiclass from binary: Expanding one-versus-all, one-versus-one and ECOC-based approaches. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 289–302.
- Tijare, P.; Rani, P.J. Exploring popular topic models. J. Phys. Conf. Ser. 2020, 1706, 012171.
- Ray, S.K.; Ahmad, A.; Kumar, C.A. Review and implementation of topic modeling in Hindi. Appl. Artif. Intell. 2019, 33, 979–1007.
- Albalawi, R.; Yeap, T.H.; Benyoucef, M. Using topic modeling methods for short-text data: A comparative analysis. Front. Artif. Intell. 2020, 3, 42.
- Towne, W.B.; Rose, C.P.; Herbsleb, J.D. Measuring similarity similarly: LDA and human perception. ACM Trans. Intell. Syst. Technol. 2016, 8, 7.
- Röder, M.; Both, A.; Hinneburg, A. Exploring the space of topic coherence measures. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, Shanghai, China, 2–6 February 2015; pp. 399–408.
- Monaghan, P.; Chang, Y.-N.; Welbourne, S.; Brysbaert, M. Exploring the relations between word frequency, language exposure, and bilingualism in a computational model of reading. J. Mem. Lang. 2017, 93, 1–21.
- Watanabe, K.; Zhou, Y. Theory-driven analysis of large corpora: Semisupervised topic classification of the UN speeches. Soc. Sci. Comput. Rev. 2020.
- Kobak, D.; Berens, P. The art of using t-SNE for single-cell transcriptomics. Nat. Commun. 2019, 10, 5416.
- Phan, X.-H.; Nguyen, L.; Horiguchi, S. Learning to classify short and sparse text & web with hidden topics from large-scale data collections. In Proceedings of the 17th International Conference on World Wide Web (WWW’08), Beijing, China, 21–25 April 2008.
- Resch, B.; Usländer, F.; Havas, C. Combining machine-learning topic models and spatiotemporal analysis of social media data for disaster footprint and damage assessment. Cartogr. Geogr. Inf. Sci. 2018, 45, 362–376.
- Andrzejewski, D.; Zhu, X.; Craven, M.; Recht, B. A framework for incorporating general domain knowledge into latent Dirichlet allocation using first-order logic. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Barcelona, Spain, 16–22 July 2011.
- Platt, J.C. Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines; Technical Report MSR-TR-98-14; Microsoft: Redmond, WA, USA, 1998.
- Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
- Van, T.P.; Thanh, T.M. Vietnamese news classification based on BoW with keywords extraction and neural network. In Proceedings of the 2017 21st Asia Pacific Symposium on Intelligent and Evolutionary Systems (IES), Hanoi, Vietnam, 15–17 November 2017; pp. 43–48.
- Chen, S.; Shen, B.; Wang, X.; Yoo, S.-J. A strong machine learning classifier and decision stumps based hybrid adaboost classification algorithm for cognitive radios. Sensors 2019, 19, 5077.
- Zadeh, P.; Hosseini, R.; Sra, S. Geometric mean metric learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML), New York, NY, USA, 20–22 June 2016; pp. 2464–2471.
- Luque, A.; Carrasco, A.; Martín, A.; de las Heras, A. The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognit. 2019, 91, 216–231.
- Powers, D.M. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness & correlation. J. Mach. Learn. Technol. 2011, 2, 37–63.
- Livieris, I.; Kiriakidou, N.; Stavroyiannis, S.; Pintelas, P. An Advanced CNN-LSTM model for cryptocurrency forecasting. Electronics 2021, 10, 287.
- Mustafa, M.; Zeng, F.; Ghulam, H.; Arslan, H.M. Urdu documents clustering with unsupervised and semi-supervised probabilistic topic modeling. Information 2020, 11, 518.
- Salem, H.; Shams, M.Y.; Elzeki, O.M.; Elfattah, M.A.; Al-Amri, J.F.; Elnazer, S. Fine-tuning fuzzy KNN classifier based on uncertainty membership for the medical diagnosis of diabetes. Appl. Sci. 2022, 12, 950.
- Shaukat, K.; Luo, S.; Chen, S.; Liu, D. Cyber threat detection using machine learning techniques: A performance evaluation perspective. In Proceedings of the 2020 International Conference on Cyber Warfare and Security (ICCWS), Islamabad, Pakistan, 20–21 October 2020; pp. 1–6.
- Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
- Adi, G.Y.N.N.; Harley, M.; Ong, V.; Suhartono, D.; Andangsari, W. Automatic personality recognition in Bahasa Indonesia: A semi-supervised approach. ICIC Express Lett. 2019, 13, 797–805.
- Markovikj, D.; Gievska, S.; Kosinski, M.; Stillwell, D. Mining facebook data for predictive personality modeling. In Proceedings of the International AAAI Conference on Web and Social Media, Cambridge, MA, USA, 8–11 July 2013.
- Kamble, K.S.; Sengupta, J. Ensemble machine learning-based affective computing for emotion recognition using dual-decomposed EEG signals. IEEE Sens. J. 2021, 22, 2496–2507.
- Dupré, D.; Krumhuber, E.G.; Küster, D.; McKeown, G.J. A performance comparison of eight commercially available automatic classifiers for facial affect recognition. PLoS ONE 2020, 15, e0231968.
- Abro, S.; Shaikh, S.; Hussain, Z.; Ali, Z.; Khan, S.; Mujtaba, G. Automatic hate speech detection using machine learning: A comparative study. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 8.
- Alam, F.; Riccardi, G. Comparative study of speaker personality traits recognition in conversational and broadcast news speech. In Proceedings of Interspeech 2013, Lyon, France, 25–29 August 2013.
- Rennie, J.D.M.; Shih, L.; Teevan, J.; Karger, D.R. Tackling the poor assumptions of naive Bayes text classifiers. In Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), Washington, DC, USA, 21–24 August 2003.
- Brownlee, J. Deep Learning for Natural Language Processing: Develop Deep Learning Models for Your Natural Language Problems; Machine Learning Mastery: San Francisco, CA, USA, 2017; Available online: https://www.technocourses.com/wp-content/uploads/2020/09/nlp.pdf (accessed on 28 October 2017).
- Cao, H.; Li, X.-L.; Woon, Y.-K.; Ng, S.-K. SPO: Structure preserving oversampling for imbalanced time series classification. In Proceedings of the 2011 IEEE 11th International Conference on Data Mining, Vancouver, BC, Canada, 11–14 December 2011; pp. 1008–1013.
- Tang, Y.; Zhang, Y.-Q.; Chawla, N.V.; Krasser, S. SVMs modeling for highly imbalanced classification. IEEE Trans. Syst. Man Cybern. Part B 2009, 39, 281–288.
- Solé, X.; Ramisa, A.; Torras, C. Evaluation of random forests on large-scale classification problems using a bag-of-visual-words representation. In Proceedings of the Catalan Conference on Artificial Intelligence, Frontiers in Artificial Intelligence and Applications; IOS Press: Barcelona, Spain, 2014; pp. 273–276.
- Mairesse, F.; Walker, M. Words mark the nerds: Computational models of personality recognition through language. In Proceedings of the Annual Meeting of the Cognitive Science Society, Vancouver, BC, Canada, 26–29 July 2006; Volume 28.
- McGrath, J.; Saha, S.; Chant, D.; Welham, J. Schizophrenia: A concise overview of incidence, prevalence, and mortality. Epidemiol. Rev. 2008, 30, 67–76.
- Amirhosseini, M.H.; Kazemian, H. Machine learning approach to personality type prediction based on the Myers–Briggs type indicator®. Multimodal Technol. Interact. 2020, 4, 9.
- Madisetty, S.; Desarkar, M.S. A neural network-based ensemble approach for spam detection in twitter. IEEE Trans. Comput. Soc. Syst. 2018, 5, 973–984.
Trait | Characteristics |
---|---|
Extraversion | Sociable, lively, active, assertive, sensation seeking, carefree, dominant, surgent, and venturesome. |
Neuroticism | Anxious, depressed, guilt feelings, low self-esteem, tense, irrational, shy, moody, and emotional. |
Psychoticism | Aggressive, cold, egocentric, impersonal, impulsive, antisocial, unempathetic, creative, and tough-minded. |
Psychoticism | Extraversion | Neuroticism |
---|---|---|
Ass, Asshole, Assfucking, Cum, Bullshit, Wtf, Damn, Dick, Catastrophic, Fuck, Fucktard, Fuking, Piss, Shit, Bastard, Bitch, Cock, Cocksucker, Cunt, Nigger, Niggas, Mofo, Penis, Goddamnit, Motherfucker | Like, Good, Love, Happy, Fun, Great, Better, Lol, Please, Nice, Hope, Best, Awesome, Thank, Feeling, Pretty, Wish, Amazing, Cool, Wonderful, Wow, Beautiful, Care, Luck, Kind, Super, Funny, Yeah, Enjoy, Win, Hahaha, Glad, Peace, Excited | Bad, Stupid, Suck, Crap, Sad, Bore, Mad, Hurt, Kill, Stuck, Poor, Dead, Annoy, Sore, Sigh, Slap, Grrr, Worst, Disappoint, Fear, Weak, Weird, Fool, Difficult, Doubt, Upset, Idiot, Dumb, Lame, Hate, Shame, Afraid, Disgust, Sick, Arghhh, Foolish, Anxious, Hopeless |
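To make the role of the seed lexicon concrete, the sketch below counts seed-word hits per PEN trait in a preprocessed text and returns the trait with the most hits. This is a naive lexicon-matching baseline for illustration only, not the SLDA labeling procedure. It assumes a hand-picked, lower-cased subset of the seed words in the table above and exact whitespace-token matching with no stemming; `SEED_SUBSET`, `seed_hits`, and `lexicon_label` are hypothetical names.

```python
# Hypothetical, lower-cased subset of the seed words in the table above.
SEED_SUBSET = {
    "Psychoticism": {"ass", "asshole", "bullshit", "wtf", "damn", "fuck", "shit", "bitch", "bastard"},
    "Extraversion": {"like", "good", "love", "happy", "fun", "great", "lol", "yeah", "wonderful", "thank"},
    "Neuroticism": {"bad", "stupid", "sad", "hurt", "kill", "upset", "hate", "afraid", "anxious", "worst"},
}


def seed_hits(text, seeds=SEED_SUBSET):
    """Count exact (whitespace-token, case-insensitive) seed matches per trait."""
    tokens = text.lower().split()
    return {trait: sum(tok in words for tok in tokens) for trait, words in seeds.items()}


def lexicon_label(text, seeds=SEED_SUBSET):
    """Return the trait with strictly the most seed hits, or None if there are no hits or a tie."""
    hits = seed_hits(text, seeds)
    best = max(hits.values())
    winners = [trait for trait, n in hits.items() if n == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


for text in ["photovia fuck yeah skinny bitch people really",
             "really upset louisville concert cancelled scared happen wnashville",
             "happy thanksgiving facebook friends family thankful wonderful"]:
    print(lexicon_label(text), seed_hits(text))
```

With exact matching, inflected forms such as "fucking" or "thankful" do not match the seeds "fuck" and "thank", which is one reason stemming is commonly applied before lexicon matching.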
Notation | Description |
---|---|
D | Total number of documents in each dataset |
T | Total number of topics |
V | The vocabulary of attributes |
S | The vocabulary of seed words |
a | A regular attribute in the document |
s | A seed word in the document |
Θ_d | The topic distribution of document d |
Φ_t | The word distribution of topic t |
δ_{a,t} | The probability of attribute a being a latent feature for category t |
β, γ | Dirichlet priors |
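The notation above describes a seed-guided LDA. As a hedged illustration only, the sketch below implements one simple way of seeding LDA: during collapsed Gibbs sampling, tokens that are seed words are pinned to their seed topic and never resampled, while all other tokens follow the standard LDA conditional. This is not necessarily the SLDA variant used in the paper (other seeding schemes bias the priors instead), and it does not model δ_{a,t}. The names `seeded_lda_theta`, `alpha` (document-topic prior), and `beta` (topic-word prior) are assumptions; the source does not say how its β and γ map onto these.

```python
import random
from collections import defaultdict


def seeded_lda_theta(docs, seed_topic, num_topics, alpha=0.1, beta=0.01,
                     iterations=200, rng_seed=0):
    """Collapsed Gibbs sampling for LDA with seed-word tokens pinned to their topic.

    docs: list of token lists.
    seed_topic: dict mapping a seed word to its topic index.
    Returns theta, where theta[d][k] estimates the topic distribution of document d.
    """
    rng = random.Random(rng_seed)
    vocab_size = len({w for doc in docs for w in doc})
    K = num_topics
    n_dk = [[0] * K for _ in docs]               # tokens of document d assigned to topic k
    n_kw = [defaultdict(int) for _ in range(K)]  # times word w is assigned to topic k
    n_k = [0] * K                                # total tokens assigned to topic k
    z = []                                       # current topic of every token

    # Initialise: seed words start (and stay) in their seed topic; others are random.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = seed_topic[w] if w in seed_topic else rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                if w in seed_topic:
                    continue  # seed tokens are never resampled
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Standard collapsed Gibbs conditional p(z = k | all other assignments).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    return [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
            for d, doc in enumerate(docs)]


TRAITS = ["Extraversion", "Neuroticism", "Psychoticism"]
seeds = {"love": 0, "happy": 0, "fun": 0, "sad": 1, "hurt": 1, "bad": 1,
         "shit": 2, "damn": 2, "wtf": 2}
docs = [["love", "happy", "fun", "party"],
        ["sad", "hurt", "bad", "monday"],
        ["shit", "damn", "wtf", "traffic"]]
theta = seeded_lda_theta(docs, seeds, num_topics=3)
labels = [TRAITS[max(range(3), key=lambda k: row[k])] for row in theta]
print(labels)  # ['Extraversion', 'Neuroticism', 'Psychoticism']
```

Pinning seed tokens guarantees that each seed word contributes only to its trait's topic, so a document dominated by one trait's seeds is labeled with that trait regardless of where its non-seed words are sampled.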
Non-Seeded Topic Model

Model | Distribution | myPersonality Perplexity | myPersonality Coherence | Sentiment140 Perplexity | Sentiment140 Coherence |
---|---|---|---|---|---|
LDA | Multiclass | 9.85 | 0.4976 | 7.14 | 0.4621 |
LDA | One vs. All | 12.65 | 0.4643 | 12.65 | 0.4465 |
NMF | Multiclass | 10.56 | 0.4839 | 7.35 | 0.4328 |
NMF | One vs. All | 11.86 | 0.4601 | 13.81 | 0.4483 |
LSA | Multiclass | 15.61 | 0.4543 | 13.34 | 0.4254 |
LSA | One vs. All | 18.96 | 0.4471 | 14.46 | 0.4136 |

Seed-Guided Topic Model

Number of Seed Words | Model | Distribution | myPersonality Perplexity | myPersonality Coherence | Sentiment140 Perplexity | Sentiment140 Coherence |
---|---|---|---|---|---|---|
50 | SLDA | Multiclass | −3.21 | 0.5112 | −3.43 | 0.5274 |
50 | SLDA | One vs. All | −3.23 | 0.5287 | −3.54 | 0.5443 |
50 | GNMF | Multiclass | −3.23 | 0.5087 | −3.46 | 0.5254 |
50 | GNMF | One vs. All | −3.27 | 0.5293 | −3.23 | 0.5467 |
40 | SLDA | Multiclass | −3.13 | 0.5441 | −3.20 | 0.5751 |
40 | SLDA | One vs. All | −3.25 | 0.5824 | −3.46 | 0.6164 |
40 | GNMF | Multiclass | −3.20 | 0.5465 | −3.27 | 0.5673 |
40 | GNMF | One vs. All | −3.17 | 0.5831 | −3.49 | 0.5877 |
30 | SLDA | Multiclass | −2.87 | 0.6331 | −2.93 | 0.6775 |
30 | SLDA | One vs. All | −2.31 | 0.6539 | −3.05 | 0.6643 |
30 | GNMF | Multiclass | −2.88 | 0.6231 | −2.99 | 0.6621 |
30 | GNMF | One vs. All | −2.32 | 0.6321 | −3.09 | 0.6712 |
20 | SLDA | Multiclass | −2.78 | 0.7293 | −2.85 | 0.7824 |
20 | SLDA | One vs. All | −2.03 | 0.7739 | −2.27 | 0.7412 |
20 | GNMF | Multiclass | −2.98 | 0.6854 | −3.01 | 0.7061 |
20 | GNMF | One vs. All | 2.29 | 0.6935 | −3.05 | 0.7276 |
10 | SLDA | Multiclass | −3.12 | 0.6634 | −2.99 | 0.7012 |
10 | SLDA | One vs. All | −2.78 | 0.6645 | −3.01 | 0.6943 |
10 | GNMF | Multiclass | −3.17 | 0.6212 | −3.02 | 0.6273 |
10 | GNMF | One vs. All | −2.76 | 0.6572 | −2.98 | 0.6632 |
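Perplexity is conventionally defined as exp of the negative average per-token log-likelihood, so for a proper probability model it is always at least 1. The negative values reported for the seed-guided models therefore look like a per-token log-likelihood (or a bound on it) rather than perplexity itself, although the source does not say how they were computed. The minimal sketch below computes both quantities from a document-topic matrix θ and topic-word distributions φ, with p(w | d) = Σ_t θ_d[t] φ_t[w]; the function names and toy inputs are hypothetical, and in practice perplexity is usually evaluated on held-out documents.

```python
import math


def per_token_log_likelihood(docs, theta, phi):
    """Average log p(w | d), where p(w | d) = sum_t theta[d][t] * phi[t][w]."""
    total, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(theta[d][t] * phi[t].get(w, 0.0) for t in range(len(phi)))
            total += math.log(p)  # raises ValueError if the model gives w zero probability
            n_tokens += 1
    return total / n_tokens


def perplexity(docs, theta, phi):
    """exp(-average log-likelihood); equals V for a model that is uniform over V words."""
    return math.exp(-per_token_log_likelihood(docs, theta, phi))


# Toy check: two topics that are both uniform over four words.
docs = [["a", "b"], ["c", "d"]]
theta = [[0.5, 0.5], [0.5, 0.5]]
phi = [{w: 0.25 for w in "abcd"}, {w: 0.25 for w in "abcd"}]
print(per_token_log_likelihood(docs, theta, phi), perplexity(docs, theta, phi))
```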
Distribution | myPersonality (Intra) | Sentiment140 (Intra) |
---|---|---|
Multiclass | 0.832 | 0.771 |
One vs. All | 0.827 | 0.764 |
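The table reports intra-class (within-trait) cosine similarity. The source does not define how pairwise scores are aggregated, so the sketch below assumes the mean cosine similarity over all pairs of document vectors that share a trait label. The vector representation (here, toy topic distributions) and the helper names are also assumptions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def intra_class_similarity(vectors, labels):
    """Mean cosine similarity over all pairs of vectors that share a label."""
    groups = {}
    for vec, label in zip(vectors, labels):
        groups.setdefault(label, []).append(vec)
    sims = [cosine(a, b) for group in groups.values() for a, b in combinations(group, 2)]
    if not sims:
        raise ValueError("no class has at least two members")
    return sum(sims) / len(sims)


# Toy topic-distribution vectors for four documents with two trait labels.
vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = ["Extraversion", "Extraversion", "Neuroticism", "Neuroticism"]
print(round(intra_class_similarity(vectors, labels), 3))
```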
Num | Instance | PEN Trait |
---|---|---|
1 | “photovia fuck yeah skinny bitch people really” | Psychoticism |
2 | “goin kill alicia gave fucking sickness ughhh wtf” | Psychoticism |
3 | “really upset louisville concert cancelled scared happen wnashville” | Neuroticism |
4 | “well had midwife evil evil woman gave anti jab hurt like hell baby think” | Neuroticism |
5 | “need someone pr experience volunteer help interested helping save world” | Extraversion |
6 | “happy thanksgiving facebook friends family thankful wonderful” | Extraversion |
Num | Instance |
---|---|
1 | “fucking assholes poor little girl rip khyra” |
2 | “wut hummm waitin cum power like bf” |
3 | “swearbot shit piss cunt cocksucker motherfucker tits fart turd twat blink said best” |
4 | “photovia fuck yeah skinny bitch people really” |
5 | “goin kill alicia gave fucking sickness ughhh wtf” |
6 | “fucking assholes poor little girl rip khyra” |
Distribution | myPersonality Word | Probability | Sentiment140 Word | Probability |
---|---|---|---|---|
Multiclass | Amazing | 0.040 | Amazing | 0.054 |
Multiclass | Sad | 0.040 | Annoy | 0.039 |
Multiclass | Motherfucker | 0.046 | Hell | 0.041 |
One vs. All | Asshole | 0.042 | Stupid | 0.052 |
One vs. All | Fuck | 0.051 | Asshole | 0.044 |
One vs. All | Miss | 0.062 | Hurt | 0.051 |
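The probabilities above are per-topic word probabilities. In collapsed Gibbs-sampled LDA these are commonly estimated from assignment counts as φ_t(w) = (n_tw + β) / (n_t + Vβ), where n_tw is the number of tokens of word w assigned to topic t and n_t is the topic's total. The sketch below ranks a topic's words this way; the counts, the smoothing value, and the `top_words` helper are hypothetical, and the paper may compute its probabilities differently.

```python
def top_words(topic_word_counts, vocab_size, beta=0.01, n=3):
    """Return the n most probable words of one topic with smoothed probabilities.

    topic_word_counts: dict word -> number of tokens of that word assigned to the topic.
    vocab_size: V, the size of the full vocabulary (unseen words share the remaining mass).
    """
    total = sum(topic_word_counts.values())
    denom = total + vocab_size * beta
    probs = {w: (c + beta) / denom for w, c in topic_word_counts.items()}
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:n]


counts = {"amazing": 8, "happy": 5, "sad": 1, "hell": 1}
for word, p in top_words(counts, vocab_size=1000, n=2):
    print(f"{word}: {p:.3f}")
```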
myPersonality (Multiclass)

Language Model | ML Classifier | Recall | Precision | F1 | AUC | GM | Runtime |
---|---|---|---|---|---|---|---|
Unigram | SMO | 0.979 | 0.979 | 0.979 | 0.970 | 0.968 | 4.45 s |
Unigram | NB | 0.739 | 0.897 | 0.810 | 0.764 | 0.819 | 2.05 s |
Unigram | C4.5 | 0.967 | 0.965 | 0.966 | 0.960 | 0.965 | 92.54 s |
Unigram | KNN | 0.939 | 0.932 | 0.935 | 0.930 | 0.930 | 0.01 s |
Unigram | RF | 0.979 | 0.961 | 0.970 | 0.965 | 0.966 | 71.40 s |
Unigram | Ada | 0.968 | 0.962 | 0.965 | 0.954 | 0.964 | 9.91 s |
Bigram | SMO | 0.899 | 0.883 | 0.888 | 0.889 | 0.890 | 0.28 s |
Bigram | NB | 0.891 | 0.895 | 0.893 | 0.885 | 0.890 | 0.28 s |
Bigram | C4.5 | 0.888 | 0.887 | 0.857 | 0.884 | 0.886 | 0.53 s |
Bigram | KNN | 0.897 | 0.882 | 0.887 | 0.888 | 0.889 | 0.28 s |
Bigram | RF | 0.895 | 0.875 | 0.885 | 0.887 | 0.887 | 99.33 s |
Bigram | Ada | 0.893 | 0.895 | 0.894 | 0.888 | 0.892 | 0.24 s |
Trigram | SMO | 0.930 | 0.920 | 0.925 | 0.921 | 0.924 | 0.05 s |
Trigram | NB | 0.929 | 0.918 | 0.923 | 0.920 | 0.921 | 0.05 s |
Trigram | C4.5 | 0.918 | 0.914 | 0.916 | 0.911 | 0.915 | 0.05 s |
Trigram | KNN | 0.930 | 0.920 | 0.916 | 0.924 | 0.923 | 0.05 s |
Trigram | RF | 0.930 | 0.920 | 0.916 | 0.924 | 0.922 | 8.46 s |
Trigram | Ada | 0.920 | 0.920 | 0.920 | 0.916 | 0.919 | 0.04 s |
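The Recall, Precision, F1, AUC, and GM columns can be computed per class from a binary confusion matrix and classifier scores. The source does not define GM; the minimal sketch below assumes the common definition for imbalanced data, the geometric mean of recall (sensitivity) and specificity. AUC uses the rank (Mann–Whitney) formulation: the probability that a randomly chosen positive instance scores higher than a randomly chosen negative one, with ties counted as one half. All inputs are toy values, and the helper names are hypothetical.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Recall, precision, F1 and GM from the four cells of a binary confusion matrix."""
    recall = tp / (tp + fn)               # sensitivity, true-positive rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)          # true-negative rate
    gm = math.sqrt(recall * specificity)  # assumed definition of GM
    return {"recall": recall, "precision": precision, "f1": f1, "gm": gm}


def auc(scores, labels):
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly, ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(binary_metrics(tp=8, fp=2, fn=2, tn=8))
print(auc([0.9, 0.3, 0.8, 0.1], [1, 1, 0, 0]))
```

Because F1 is the harmonic mean of precision and recall, it always lies between those two values. For a multiclass classifier the per-class results are then averaged (for example, macro or weighted averaging); the tables do not state which averaging was used.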
Sentiment140 (Multiclass)

Language Model | ML Classifier | Recall | Precision | F1 | AUC | GM | Runtime |
---|---|---|---|---|---|---|---|
Unigram | SMO | 0.995 | 0.995 | 0.995 | 0.989 | 0.989 | 172.43 s |
Unigram | NB | 0.841 | 0.939 | 0.887 | 0.814 | 0.863 | 24.68 s |
Unigram | C4.5 | 0.991 | 0.991 | 0.991 | 0.981 | 0.984 | 3379.83 s |
Unigram | KNN | 0.967 | 0.966 | 0.966 | 0.979 | 0.956 | 15.04 s |
Unigram | RF | 0.986 | 0.986 | 0.986 | 0.983 | 0.978 | 2589.97 s |
Unigram | Ada | 0.990 | 0.990 | 0.990 | 0.984 | 0.982 | 194.9 s |
Bigram | SMO | 0.959 | 0.958 | 0.958 | 0.954 | 0.987 | 654.59 s |
Bigram | NB | 0.284 | 0.815 | 0.421 | 0.376 | 0.264 | 64.33 s |
Bigram | C4.5 | 0.952 | 0.951 | 0.951 | 0.952 | 0.944 | 12,409.02 s |
Bigram | KNN | 0.954 | 0.946 | 0.946 | 0.953 | 0.941 | 27.45 s |
Bigram | RF | 0.948 | 0.946 | 0.947 | 0.948 | 0.940 | 5800.23 s |
Bigram | Ada | 0.954 | 0.951 | 0.952 | 0.944 | 0.949 | 157.20 s |
Trigram | SMO | 0.947 | 0.947 | 0.947 | 0.946 | 0.941 | 2.34 s |
Trigram | NB | 0.947 | 0.935 | 0.929 | 0.937 | 0.939 | 26.01 s |
Trigram | C4.5 | 0.947 | 0.947 | 0.947 | 0.939 | 0.944 | 690.05 s |
Trigram | KNN | 0.947 | 0.929 | 0.938 | 0.934 | 0.940 | 0.05 s |
Trigram | RF | 0.942 | 0.942 | 0.942 | 0.921 | 0.941 | 795.85 s |
Trigram | Ada | 0.945 | 0.945 | 0.945 | 0.922 | 0.942 | 126.08 s |
myPersonality (One vs. All)

Language Model | ML Classifier | Recall | Precision | F1 | AUC | GM | Runtime |
---|---|---|---|---|---|---|---|
Unigram | SMO | 0.999 | 0.999 | 0.995 | 0.997 | 0.998 | 1.07 s |
Unigram | NB | 0.822 | 0.958 | 0.885 | 0.794 | 0.781 | 1.47 s |
Unigram | C4.5 | 0.992 | 0.999 | 0.999 | 0.991 | 0.995 | 11.41 s |
Unigram | KNN | 0.992 | 0.993 | 0.992 | 0.992 | 0.995 | 0.05 s |
Unigram | RF | 0.996 | 0.996 | 0.996 | 0.985 | 0.994 | 35.66 s |
Unigram | Ada | 0.945 | 0.945 | 0.945 | 0.945 | 0.939 | 0.26 s |
Bigram | SMO | 0.969 | 0.964 | 0.966 | 0.966 | 0.955 | 4.01 s |
Bigram | NB | 0.942 | 0.939 | 0.940 | 0.938 | 0.922 | 6.16 s |
Bigram | C4.5 | 0.965 | 0.963 | 0.964 | 0.961 | 0.919 | 36.94 s |
Bigram | KNN | 0.946 | 0.947 | 0.946 | 0.947 | 0.931 | 0.09 s |
Bigram | RF | 0.940 | 0.964 | 0.952 | 0.931 | 0.936 | 141.61 s |
Bigram | Ada | 0.964 | 0.959 | 0.961 | 0.953 | 0.961 | 6.67 s |
Trigram | SMO | 0.998 | 0.998 | 0.998 | 0.996 | 0.997 | 0.04 s |
Trigram | NB | 0.998 | 0.998 | 0.998 | 0.995 | 0.995 | 0.19 s |
Trigram | C4.5 | 0.989 | 0.989 | 0.988 | 0.983 | 0.985 | 1.04 s |
Trigram | KNN | 0.998 | 0.998 | 0.998 | 0.996 | 0.997 | 0.04 s |
Trigram | RF | 0.998 | 0.998 | 0.998 | 0.996 | 0.997 | 42.95 s |
Trigram | Ada | 0.998 | 0.998 | 0.998 | 0.995 | 0.997 | 0.46 s |
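The "One vs. All" setting is assumed here to mean splitting the three-trait labeling into three binary trait-versus-rest problems. The sketch below builds those binary label vectors and computes a macro-averaged recall (the unweighted mean of the per-trait one-vs-all recalls) from multiclass predictions. The labels are toy values, and macro averaging is itself an assumption, since the tables do not state how per-class scores were combined.

```python
TRAITS = ("Extraversion", "Neuroticism", "Psychoticism")


def one_vs_all(labels, classes=TRAITS):
    """Map each class to a 0/1 vector marking which instances belong to it."""
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def macro_recall(y_true, y_pred, classes=TRAITS):
    """Mean of the per-class (one-vs-all) recalls, skipping classes with no true instances."""
    recalls = []
    for c in classes:
        positives = sum(1 for t in y_true if t == c)
        if positives == 0:
            continue
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / positives)
    return sum(recalls) / len(recalls)


y_true = ["Extraversion", "Extraversion", "Neuroticism", "Psychoticism"]
y_pred = ["Extraversion", "Neuroticism", "Neuroticism", "Psychoticism"]
print(one_vs_all(y_true)["Extraversion"])  # [1, 1, 0, 0]
print(macro_recall(y_true, y_pred))
```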
Sentiment140 (One vs. All)

Language Model | ML Classifier | Recall | Precision | F1 | AUC | GM | Runtime |
---|---|---|---|---|---|---|---|
Unigram | SMO | 0.996 | 0.996 | 0.996 | 0.989 | 0.996 | 295.83 s |
Unigram | NB | 0.950 | 0.960 | 0.955 | 0.948 | 0.948 | 48.56 s |
Unigram | C4.5 | 0.992 | 0.992 | 0.992 | 0.973 | 0.991 | 3345.64 s |
Unigram | KNN | 0.989 | 0.990 | 0.989 | 0.968 | 0.987 | 0.05 s |
Unigram | RF | 0.989 | 0.989 | 0.989 | 0.967 | 0.987 | 2970.03 s |
Unigram | Ada | 0.994 | 0.994 | 0.993 | 0.990 | 0.993 | 1865.02 s |
Bigram | SMO | 0.958 | 0.955 | 0.956 | 0.826 | 0.958 | 546.02 s |
Bigram | NB | 0.105 | 0.850 | 0.187 | 0.461 | 0.126 | 120.02 s |
Bigram | C4.5 | 0.941 | 0.938 | 0.939 | 0.874 | 0.925 | 15,467.56 s |
Bigram | KNN | 0.942 | 0.937 | 0.939 | 0.816 | 0.921 | 965.27 s |
Bigram | RF | 0.940 | 0.938 | 0.939 | 0.856 | 0.918 | 5634.67 s |
Bigram | Ada | 0.944 | 0.944 | 0.944 | 0.861 | 0.922 | 1259.64 s |
Trigram | SMO | 0.949 | 0.921 | 0.935 | 0.933 | 0.941 | 24 s |
Trigram | NB | 0.950 | 0.952 | 0.951 | 0.940 | 0.939 | 29.02 s |
Trigram | C4.5 | 0.950 | 0.952 | 0.951 | 0.940 | 0.942 | 128.65 s |
Trigram | KNN | 0.950 | 0.946 | 0.948 | 0.939 | 0.942 | 347.28 s |
Trigram | RF | 0.905 | 0.903 | 0.904 | 0.941 | 0.912 | 504.24 s |
Trigram | Ada | 0.947 | 0.947 | 0.947 | 0.939 | 0.941 | 630.32 s |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sagadevan, S.; Malim, N.H.A.H.; Husin, M.H. A Seed-Guided Latent Dirichlet Allocation Approach to Predict the Personality of Online Users Using the PEN Model. Algorithms 2022, 15, 87. https://doi.org/10.3390/a15030087