DOI:10.1145/3308558.3313596
research-article

SpeedReader: Reader Mode Made Fast and Private

Published: 13 May 2019

Abstract

Most popular web browsers include “reader modes” that improve the user experience by removing page elements that are not useful. Reader modes reformat the page to hide elements that are not related to the page's main content. Such page elements include site navigation, advertising-related videos and images, and most JavaScript. The intended end result is that users can enjoy the content they are interested in, without distraction.
In this work, we consider whether the “reader mode” can be widened to also provide performance and privacy improvements. Instead of using it as a post-render feature to clean up the clutter on a page, we propose SpeedReader as an alternative multistep pipeline that is part of the rendering pipeline. Once the tool decides, during the initial phase of a page load, that a page is suitable for reader mode use, it directly applies document tree translation before the page is rendered. Based on our measurements, we believe that SpeedReader can be continuously enabled in order to drastically improve end-user experience, especially on slow mobile connections. Combined with our approach to predicting which pages should be rendered in reader mode with 91% accuracy, SpeedReader achieves average speedups and bandwidth reductions of up to 27× and 84×, respectively. We further find that our novel “reader mode” approach brings with it significant privacy improvements to users. On transformed pages, our approach effectively removes all commonly recognized trackers, issues 115 fewer requests to third parties, and interacts with 64 fewer trackers on average.
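The two-stage idea described above (decide early whether a page is suitable, then transform the document tree before rendering) can be illustrated with a minimal sketch. This is not the SpeedReader implementation: the abstract does not describe the classifier's features or the transformation rules, so the tag list `STRIPPED_TAGS`, the threshold `MIN_PARAGRAPH_CHARS`, and the function name `speedreader_sketch` are all hypothetical choices made only for illustration. The sketch also assumes reasonably well-balanced markup.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical choices for illustration only; the abstract does not state
# which elements are removed or how "readable" pages are detected.
STRIPPED_TAGS = {"script", "iframe", "nav", "aside", "footer", "form"}
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}
MIN_PARAGRAPH_CHARS = 200  # assumed threshold for "suitable for reader mode"


class _Simplifier(HTMLParser):
    """Counts paragraph text and rebuilds the markup without STRIPPED_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0        # > 0 while inside a stripped subtree
        self.p_depth = 0           # > 0 while inside a <p> element
        self.paragraph_chars = 0   # crude "is this an article?" signal

    def handle_starttag(self, tag, attrs):
        if self.skip_depth or tag in STRIPPED_TAGS:
            if tag not in VOID_TAGS:   # void tags have no closing tag
                self.skip_depth += 1
            return
        if tag == "p":
            self.p_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
            return
        if tag == "p" and self.p_depth:
            self.p_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth:
            return
        if self.p_depth:
            self.paragraph_chars += len(data.strip())
        self.out.append(escape(data, quote=False))


def speedreader_sketch(html_text):
    """Return (markup, transformed) for the fetched HTML, before rendering."""
    parser = _Simplifier()
    parser.feed(html_text)
    parser.close()
    if parser.paragraph_chars < MIN_PARAGRAPH_CHARS:
        return html_text, False    # judged unsuitable: leave the page untouched
    return "".join(parser.out), True
```

Because the stripped subtrees never reach the renderer in this sketch, the scripts and frames they reference would not be fetched or executed, which is the intuition behind the performance and privacy gains the abstract reports.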

Published In

WWW '19: The World Wide Web Conference
May 2019
3620 pages
ISBN:9781450366748
DOI:10.1145/3308558
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Ad Blocking
  2. Boilerplate Removal
  3. Reader Mode
  4. Web Document Classification
  5. Web Performance

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '19
WWW '19: The Web Conference
May 13 - 17, 2019
San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

