DOI: 10.1145/3289600.3291034

Neural Demographic Prediction using Search Query

Published: 30 January 2019

Abstract

Demographics of online users, such as age and gender, play an important role in personalized web applications. However, it is difficult to obtain this information directly. Fortunately, search queries cover many online users, and queries from users with different demographics usually differ in content and writing style. Search queries can therefore provide useful clues for demographic prediction. In this paper, we study the prediction of users' demographics from their search queries and propose a neural approach for this task. Because search queries can be very noisy and many of them are uninformative, we do not simply combine all of a user's queries into one representation. Instead, we propose a hierarchical user representation with attention (HURA) model that learns informative user representations from search queries. HURA first learns representations of search queries from their words using a word encoder, which consists of a CNN and a word-level attention network that selects important words. It then learns representations of users from the representations of their search queries using a query encoder, which contains a CNN that captures the local contexts of search queries and a query-level attention network that selects the queries most informative for demographic prediction. Experiments on two real-world datasets show that our approach effectively improves the performance of search-query-based age and gender prediction and consistently outperforms many baseline methods.
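
The two-level structure described in the abstract can be illustrated with a small forward-pass sketch. It is not the authors' implementation: the abstract does not give the attention formula, filter sizes, or dimensions, so the additive (tanh) attention, the window of 3, the toy vocabulary, the random untrained weights, and all function names (`conv1d`, `attention_pool`, `make_encoder`, `hura_forward`) are assumptions made for illustration only.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def conv1d(seq, filters, biases, window=3):
    """Zero-padded 1-D convolution with ReLU over a sequence of vectors.

    Assumes an odd window so the output has one vector per input position.
    """
    dim = len(seq[0])
    pad = window // 2
    zero = [0.0] * dim
    padded = [zero] * pad + list(seq) + [zero] * pad
    out = []
    for t in range(len(seq)):
        # Concatenate the vectors inside the window centred on position t.
        ctx = [x for vec in padded[t:t + window] for x in vec]
        out.append([max(0.0, sum(w * x for w, x in zip(f, ctx)) + b)
                    for f, b in zip(filters, biases)])
    return out


def attention_pool(hidden, proj, query_vec):
    """Additive attention (an assumed form): weight each position, then sum."""
    scores = []
    for h in hidden:
        u = [math.tanh(sum(p * x for p, x in zip(row, h))) for row in proj]
        scores.append(sum(q * x for q, x in zip(query_vec, u)))
    alphas = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(a * h[i] for a, h in zip(alphas, hidden)) for i in range(dim)]
    return pooled, alphas


def make_encoder(rng, in_dim, n_filters, window=3, att_dim=4):
    """Random, untrained parameters for one CNN + attention encoder."""
    def rand(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]
    return {"filters": [rand(window * in_dim) for _ in range(n_filters)],
            "biases": [0.0] * n_filters,
            "proj": [rand(n_filters) for _ in range(att_dim)],
            "query": rand(att_dim),
            "window": window}


def encode(seq, enc):
    """CNN over the sequence, then attention pooling to a single vector."""
    hidden = conv1d(seq, enc["filters"], enc["biases"], enc["window"])
    return attention_pool(hidden, enc["proj"], enc["query"])


def hura_forward(user_queries, embed, word_enc, query_enc, out_w):
    """Words -> query vectors -> user vector -> class probabilities."""
    query_vecs = []
    for query in user_queries:
        vec, _ = encode([embed[w] for w in query], word_enc)
        query_vecs.append(vec)
    user_vec, query_alphas = encode(query_vecs, query_enc)
    logits = [sum(w * x for w, x in zip(row, user_vec)) for row in out_w]
    return softmax(logits), query_alphas


# Toy usage with a hypothetical vocabulary and two output classes.
rng = random.Random(0)
vocab = ["cheap", "flights", "toy", "store", "retirement", "plan"]
embed = {w: [rng.uniform(-1, 1) for _ in range(5)] for w in vocab}
word_enc = make_encoder(rng, in_dim=5, n_filters=6)
query_enc = make_encoder(rng, in_dim=6, n_filters=6)
out_w = [[rng.uniform(-0.5, 0.5) for _ in range(6)] for _ in range(2)]
queries = [["cheap", "flights"], ["toy", "store"], ["retirement", "plan"]]
probs, query_alphas = hura_forward(queries, embed, word_enc, query_enc, out_w)
print("class probabilities:", probs)
print("query attention weights:", query_alphas)
```

The same `encode` routine is reused at both levels, which mirrors the hierarchy in the abstract: only the input changes, from word embeddings at the word level to query vectors at the user level. In a trained model, the query attention weights would indicate which queries contributed most to the prediction; here they are arbitrary because the weights are random.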



Published In

WSDM '19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining
January 2019
874 pages
ISBN: 9781450359405
DOI: 10.1145/3289600

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. demographic prediction
  2. search query
  3. user modeling

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • National Key Research and Development Program of China

Conference

WSDM '19

Acceptance Rates

WSDM '19 Paper Acceptance Rate: 84 of 511 submissions, 16%
Overall Acceptance Rate: 498 of 2,863 submissions, 17%