SC 2016: Salt Lake City, UT, USA
- John West, Cherri M. Pancake:
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2016, Salt Lake City, UT, USA, November 13-18, 2016. IEEE Computer Society 2016, ISBN 978-1-4673-8815-3
ACM Gordon Bell finalist I
- Peter E. Vincent, Freddie D. Witherden, Brian C. Vermeire, Jin Seok Park, Arvind Iyer:
Towards green aviation with Python at petascale. 1-11
- Jean-Luc Fattebert, Daniel Osei-Kuffuor, Erik W. Draeger, Tadashi Ogitsu, William D. Krauss:
Modeling dilute solutions using first-principles molecular dynamics: computing more than a million atoms with over a million cores. 12-22
- Takayuki Muranushi, Hideyuki Hotta, Junichiro Makino, Seiya Nishizawa, Hirofumi Tomita, Keigo Nitadori, Masaki Iwasawa, Natsuki Hosono, Yutaka Maruyama, Hikaru Inoue, Hisashi Yashiro, Yoshifumi Nakamura:
Simulations of below-ground dynamics of fungi: 1.184 pflops attained by automated generation and autotuning of temporal blocking codes. 23-33
ACM Gordon Bell finalist II
- Jian Zhang, Chunbao Zhou, Yangang Wang, Lili Ju, Qiang Du, Xuebin Chi, Dongsheng Xu, Dexun Chen, Yong Liu, Zhao Liu:
Extreme-scale phase field simulations of coarsening dynamics on the Sunway TaihuLight supercomputer. 34-45
- Fangli Qiao, Wei Zhao, Xunqiang Yin, Xiaomeng Huang, Xin Liu, Qi Shu, Guansuo Wang, Zhenya Song, Xinfang Li, Haixing Liu, Guangwen Yang, Yeli Yuan:
A highly effective global surface wave numerical simulation with ultra-high resolution. 46-56
- Chao Yang, Wei Xue, Haohuan Fu, Hongtao You, Xinliang Wang, Yulong Ao, Fangfang Liu, Lin Gan, Ping Xu, Lanning Wang, Guangwen Yang, Weimin Zheng:
10M-core scalable fully-implicit solver for nonhydrostatic atmospheric dynamics. 57-68
Molecular dynamics simulation
- Markus Höhnerbach, Ahmed E. Ismail, Paolo Bientinesi:
The vectorization of the Tersoff multi-body potential: an exercise in performance portability. 69-81
- W. Michael Brown, Andrey Semin, Michael Hebenstreit, Sergey Khvostov, Karthik Raman, Steven J. Plimpton:
Increasing molecular dynamics simulation rates with an 8-fold increase in electrical power efficiency. 82-95
- A. Pozdneev, Valéry Weber, Teodoro Laino, Constantine Bekas, Alessandro Curioni:
Enhanced MPSM3 for applications to quantum biological simulations. 96-106
State-of-the-practice: advanced applications development
- Sandra Wienke, Julian Miller, Martin Schulz, Matthias S. Müller:
Development effort estimation in HPC. 107-118
- Ahmed E. Helal, Paul Sathre, Wu-chun Feng:
MetaMorph: a library framework for interoperable kernels on multi- and many-core clusters. 119-129
- Jun Sawada, Filipp Akopyan, Andrew S. Cassidy, Brian Taba, Michael V. DeBole, Pallab Datta, Rodrigo Alvarez-Icaza, Arnon Amir, John V. Arthur, Alexander Andreopoulos, Rathinakumar Appuswamy, Heinz Baier, Davis Barch, David J. Berg, Carmelo di Nolfo, Steven K. Esser, Myron Flickner, Thomas A. Horvath, Bryan L. Jackson, Jeff Kusnitz, Scott Lekuch, Michael Mastro, Timothy Melano, Paul A. Merolla, Steven E. Millman, Tapan K. Nayak, Norm Pass, Hartmut E. Penner, William P. Risk, Kai Schleupen, Benjamin G. Shaw, Hayley Wu, Brian Giera, Adam T. Moody, T. Nathan Mundhenk, Brian Van Essen, Eric X. Wang, David P. Widemann, Qing Wu, William E. Murphy, Jamie K. Infantolino, James A. Ross, Dale R. Shires, Manuel M. Vindiola, Raju Namburu, Dharmendra S. Modha:
TrueNorth ecosystem for brain-inspired computing: scalable systems, software, and applications. 130-141
Systems and networks I
- Jens Domke, Torsten Hoefler:
Scheduling-aware routing for supercomputers. 142-153
- Nikhil Jain, Abhinav Bhatele, Sam White, Todd Gamblin, Laxmikant V. Kalé:
Evaluating HPC networks via simulation of parallel workloads. 154-165
- Ke Wen, Payman Samadi, Sébastien Rumley, Christine P. Chen, Yiwen Shen, Meisam Bahadori, Keren Bergman, Jeremiah J. Wilke:
Flexfly: enabling a reconfigurable dragonfly through silicon photonics. 166-177
Numerical algorithms I
- James Kestyn, Vasileios Kalantzis, Eric Polizzi, Yousef Saad:
PFEAST: a high performance sparse eigenvalue solver using distributed-memory linear solvers. 178-189
- Pierre Jolivet, Pierre-Henri Tournier:
Block iterative methods and recycling for improved scalability of linear solvers. 190-203
- Paul R. Eller, William Gropp:
Scalable non-blocking preconditioned conjugate gradient methods. 204-215
Resilience and error handling
- Ignacio Laguna, Martin Schulz:
Pinpointing scale-dependent integer overflow bugs in large-scale parallel applications. 216-227
- Qingrui Liu, Changhee Jung, Dongyoon Lee, Devesh Tiwari:
Compiler-directed lightweight checkpointing for fine-grained guaranteed soft error recovery. 228-239
- Guanpeng Li, Karthik Pattabiraman, Chen-Yong Cher, Pradip Bose:
Understanding error propagation in GPGPU applications. 240-251
Scientific data management and visualization
- Markus Mäsker, Lars Nagel, Tim Süß, André Brinkmann, Lennart Sorth:
Simulation and performance analysis of the ECMWF tape library system. 252-263
- Martin Burtscher, Hari Mukka, Annie Yang, Farbod Hesaaraki:
Real-time synthesis of compression algorithms for scientific data. 264-275
- Matthew Larsen, Cyrus Harrison, James Kress, David Pugmire, Jeremy S. Meredith, Hank Childs:
Performance modeling of in situ rendering. 276-287
Topics in distributed computing
- Engin Arslan, Kemal Guner, Tevfik Kosar:
HARP: predictive transfer optimization based on historical analysis and real-time probing. 288-299
- Feng Yan, Yuxiong He, Olatunji Ruwase, Evgenia Smirni:
SERF: efficient scheduling for fast deep neural network serving via judicious parallelism. 300-311
Resilience
- George Bosilca, Aurélien Bouteiller, Amina Guermouche, Thomas Hérault, Yves Robert, Pierre Sens, Jack J. Dongarra:
Failure detection and propagation in HPC systems. 312-322
- Scott Levy, Kurt B. Ferreira, Patrick G. Bridges:
Improving application resilience to memory errors with lightweight compression. 323-334
- Xiang Ni, Laxmikant V. Kalé:
FlipBack: automatic targeted protection against silent data corruption. 335-346
Tensor and graph algorithms
- Scott Sallinen, Keita Iwabuchi, Suraj Poudel, Maya B. Gokhale, Matei Ripeanu, Roger A. Pearce:
Graph colouring as a challenge problem for dynamic graph processing on distributed systems. 347-358
- Shaden Smith, Jongsoo Park, George Karypis:
An exploration of optimization algorithms for high performance tensor completion. 359-371
- Md. Maksudul Alam, Maleq Khan, Anil Vullikanti, Madhav V. Marathe:
An efficient and scalable algorithmic method for generating large-scale random graphs. 372-383
Performance measurement and analysis
- Oscar H. Mondragon, Patrick G. Bridges, Scott Levy, Kurt B. Ferreira, Patrick M. Widener:
Understanding performance interference in next-generation HPC systems. 384-395
- Maria Dimakopoulou, Stéphane Eranian, Nectarios Koziris, Nicholas Bambos:
Reliable and efficient performance monitoring in Linux. 396-408
- Hamid Reza Zohouri, Naoya Maruyama, Aaron Smith, Motohiko Matsuda, Satoshi Matsuoka:
Evaluating and optimizing OpenCL kernels for high performance computing with FPGAs. 409-420
Systems and networks II
- Jason Lee, Zhou Tong, Karthik Achalkar, Xin Yuan, Michael Lang:
Enhancing InfiniBand with OpenFlow-style SDN capability. 421-432
- Mingzhe Li, Khaled Hamidouche, Xiaoyi Lu, Hari Subramoni, Jie Zhang, Dhabaleswar K. Panda:
Designing MPI library with on-demand paging (ODP) of InfiniBand: challenges and benefits. 433-443
- Nikola Rajovic, Alejandro Rico, Filippo Mantovani, Daniel Ruiz, Josep Oriol Vilarrubi, Constantino Gómez, Luna Backes, Diego Nieto, Harald Servat, Xavier Martorell, Jesús Labarta, Eduard Ayguadé, Chris Adeniyi-Jones, Said Derradji, Hervé Gloaguen, Piero Lanucara, Nico Sanna, Jean-François Méhaut, Kevin Pouget, Brice Videau, Eric Boyer, Momme Allalen, Axel Auweter, David Brayford, Daniele Tafani, Volker Weinberg, Dirk Brömmel, René Halver, Jan H. Meinke, Ramón Beivide, Mariano Benito, Enrique Vallejo, Mateo Valero, Alex Ramírez:
The Mont-Blanc prototype: an alternative approach for HPC systems. 444-455
Compilation for enhanced parallelism
- Martin Kong, Louis-Noël Pouchet, P. Sadayappan, Vivek Sarkar:
PIPES: a language and compiler for task-based programming on distributed-memory clusters. 456-467
- Samyam Rajbhandari, Jinsung Kim, Sriram Krishnamoorthy, Louis-Noël Pouchet, Fabrice Rastello, Robert J. Harrison, P. Sadayappan:
A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment. 468-479
- Anand Venkat, Mahdi Soltan Mohammadi, Jongsoo Park, Hongbo Rong, Rajkishore Barik, Michelle Mills Strout, Mary W. Hall:
Automating wavefront parallelization for sparse matrix computations. 480-491
Fluid dynamics
- Anshu Dubey, Hajime Fujita, Daniel T. Graves, Andrew A. Chien, Devesh Tiwari:
Granularity and the cost of error recovery in resilient AMR scientific applications. 492-501
- William M. Tang, Bei Wang, Stéphane Ethier, Grzegorz Kwasniewski, Torsten Hoefler, Khaled Z. Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker, Carlos Rosales-Fernandez, Timothy J. Williams:
Extreme scale plasma turbulence simulations on top supercomputers worldwide. 502-513
- Arash Bakhtiari, Dhairya Malhotra, Amir Raoofy, Miriam Mehl, Hans-Joachim Bungartz, George Biros:
A parallel arbitrary-order accurate AMR algorithm for the scalar advection-diffusion equation. 514-525
Performance tools
- Thomas Grass, César Allande, Adrià Armejach, Alejandro Rico, Eduard Ayguadé, Jesús Labarta, Mateo Valero, Marc Casas, Miquel Moretó:
MUSA: a multi-level simulation approach for next-generation HPC machines. 526-537
- Tanzima Z. Islam, Jayaraman J. Thiagarajan, Abhinav Bhatele, Martin Schulz, Todd Gamblin:
A machine learning framework for performance coverage analysis of proxy applications. 538-549
- David Böhme, Todd Gamblin, David Beckingsale, Peer-Timo Bremer, Alfredo Giménez, Matthew P. LeGendre, Olga Pearce, Martin Schulz:
Caliper: performance introspection for HPC software stacks. 550-560
Storage systems
- Narges Shahidi, Mohammad Arjomand, Myoungsoo Jung, Mahmut T. Kandemir, Chita R. Das, Anand Sivasubramaniam:
Exploring the potentials of parallel garbage collection in SSDs for enterprise storage systems. 561-572
- Pierre Matri, Alexandru Costan, Gabriel Antoniu, Jesús Montes, María S. Pérez:
Týr: blob storage meets built-in transactions. 573-584
- Jay F. Lofstead, Ivo Jimenez, Carlos Maltzahn, Quincey Koziol, John Bent, Eric Barton:
DAOS and friends: a proposal for an exascale storage system. 585-596
Accelerator programming tools
- Junghyun Kim, Yong-Jun Lee, Jung-Ho Park, Jaejin Lee:
Translating OpenMP device constructs to OpenCL using unnecessary data transfer elimination. 597-608
- Tobias Gysi, Jeremia Bär, Torsten Hoefler:
dCUDA: hardware supported overlap of computation and communication. 609-620
- Mohamed Wahib, Naoya Maruyama, Takayuki Aoki:
Daino: a high-level framework for parallel and efficient AMR on GPUs. 621-632
Memory and power
- Chao Li, Yi Yang, Min Feng, Srimat T. Chakradhar, Huiyang Zhou:
Optimizing memory efficiency for deep convolutional neural networks on GPUs. 633-644
- Leonardo Bautista-Gomez, Ferad Zyulkyarov, Osman S. Unsal, Simon McIntosh-Smith:
Unprotected computing: a large-scale study of DRAM raw error rate on a supercomputer. 645-655
- Sean Wallace, Xu Yang, Venkatram Vishwanath, William E. Allcock, Susan Coghlan, Michael E. Papka, Zhiling Lan:
A data driven scheduling approach for power management on HPC systems. 656-666
Numerical algorithms II
- Jieyang Chen, Li Tan, Panruo Wu, Dingwen Tao, Hongbo Li, Xin Liang, Sihuan Li, Rong Ge, Laxmi N. Bhuyan, Zizhong Chen:
GreenLA: green linear algebra software for GPU-accelerated heterogeneous computing. 667-677
- Duane Merrill, Michael Garland:
Merge-based parallel sparse matrix-vector multiplication. 678-689
- Jianyu Huang, Tyler M. Smith, Greg M. Henry, Robert A. van de Geijn:
Strassen's algorithm reloaded. 690-701
Data analytics
- Preeti Malakar, Venkatram Vishwanath, Christopher Knight, Todd S. Munson, Michael E. Papka:
Optimal execution of co-analysis for large-scale molecular dynamics simulations. 702-715
- Ehab Abdelhamid, Ibrahim Abdelaziz, Panos Kalnis, Zuhair Khayyat, Fuad T. Jamour:
Scalemine: scalable parallel frequent subgraph mining in a single large graph. 716-727
- Dmitriy Morozov, Tom Peterka:
Efficient Delaunay tessellation through K-D tree decomposition. 728-738
Performance analysis of network systems
- Maxime Martinasso, Grzegorz Kwasniewski, Sadaf R. Alam, Thomas C. Schulthess, Torsten Hoefler:
A PCIe congestion-aware performance model for densely populated accelerator servers. 739-749
- Xu Yang, John Jenkins, Misbah Mubarak, Robert B. Ross, Zhiling Lan:
Watch out for the bully!: job interference study on dragonfly network. 750-760
- Sangeetha Abdu Jyothi, Ankit Singla, Brighten Godfrey, Alexandra Kolla:
Measuring and understanding throughput of network topologies. 761-772
Combinatorial and multigrid algorithms
- Arif M. Khan, Alex Pothen, Md. Mostofa Ali Patwary, Mahantesh Halappanavar, Nadathur Rajagopalan Satish, Narayanan Sundaram, Pradeep Dubey:
Designing scalable b-Matching algorithms on distributed memory multiprocessors by approximation. 773-783
- Sriram P. Chockalingam, Sharma V. Thankachan, Srinivas Aluru:
A parallel algorithm for finding all pairs k-mismatch maximal common substrings. 784-794
- Michael A. Clark, Bálint Joó, Alexei Strelchenko, Michael Cheng, Arjun Singh Gambhir, Richard C. Brower:
Accelerating lattice QCD multigrid on GPUs using fine-grained parallelization. 795-806
File systems and I/O
- Teng Wang, Kathryn M. Mohror, Adam Moody, Kento Sato, Weikuan Yu:
An ephemeral burst-buffer file system for scientific applications. 807-818
- Yang Liu, Raghul Gunasekaran, Xiaosong Ma, Sudharshan S. Vazhkudai:
Server-side log data analytics for I/O workload characterization and coordination on large shared storage systems. 819-829
- Pradeep Kumar, H. Howie Huang:
G-store: high-performance graph store for trillion-edge processing. 830-841
Inverse problems and quantum circuits
- Andreas Mang, Amir Gholami, George Biros:
Distributed-memory large deformation diffeomorphic 3D image registration. 842-853
- Aleksandar Zlateski, Kisuk Lee, H. Sebastian Seung:
ZNNi: maximizing the inference throughput of 3D convolutional networks on CPUs and GPUs. 854-865
- Thomas Häner, Damian S. Steiger, Mikhail Smelyanskiy, Matthias Troyer:
High performance emulation of quantum circuits. 866-874
Manycore architectures
- Shanjiang Tang, Bingsheng He, Shuhao Zhang, Zhaojie Niu:
Elastic multi-resource fairness: balancing fairness and efficiency in coupled CPU-GPU architectures. 875-886
- Cheng-Chieh Huang, Vijay Nagarajan, Arpit Joshi:
DCA: a DRAM-cache-aware DRAM controller. 887-897
- Zhen Lin, Lars Nyland, Huiyang Zhou:
Enabling efficient preemption for SIMT architectures with lightweight context switching. 898-908
State-of-the-practice: system characterization and design
- Edgar A. León, Ian Karlin, Abhinav Bhatele, Steven H. Langer, Chris Chambreau, Louis H. Howell, Trent D'Hooge, Matthew L. Leininger:
Characterizing parallel scientific applications on commodity clusters: an empirical study of a tapered fat-tree. 909-920
- Utkarsh Ayachit, Andrew C. Bauer, Earl P. N. Duque, Greg Eisenhauer, Nicola J. Ferrier, Junmin Gu, Kenneth E. Jansen, Burlen Loring, Zarija Lukic, Suresh Menon, Dmitriy Morozov, Patrick O'Leary, Reetesh Ranjan, Michel E. Rasquin, Christopher P. Stone, Venkatram Vishwanath, Gunther H. Weber, Brad Whitlock, Matthew Wolf, K. John Wu, E. Wes Bethel:
Performance analysis, design considerations, and applications of extreme-scale in situ infrastructures. 921-932
Task-oriented runtimes
- Michael LeBeane, Brandon Potter, Abhisek Pan, Alexandru Dutu, Vinay Agarwala, Wonchan Lee, Deepak Majeti, Bibek Ghimire, Eric Van Tassell, Samuel Wasmundt, Brad Benton, Maurício Breternitz, Michael L. Chu, Mithuna Thottethodi, Lizy K. John, Steven K. Reinhardt:
Extended task queuing: active messages for heterogeneous systems. 933-944
- Tan Nguyen, Didem Unat, Weiqun Zhang, Ann S. Almgren, Muhammed Nufail Farooqi, John Shalf:
Perilla: metadata-based optimizations of an asynchronous runtime for adaptive mesh refinement. 945-956
Accelerating science
- Daniel Roten, Yifeng Cui, Kim B. Olsen, Steven M. Day, Kyle Withers, William H. Savran, Peng Wang, Dawei Mu:
High-frequency nonlinear earthquake simulations on petascale heterogeneous supercomputers. 957-968
- Haohuan Fu, Junfeng Liao, Wei Xue, Lanning Wang, Dexun Chen, Long Gu, Jinxiu Xu, Nan Ding, Xinliang Wang, Conghui He, Shizhen Xu, Yishuang Liang, Jiarui Fang, Yuanchao Xu, Weijie Zheng, Jingheng Xu, Zhen Zheng, Wanjing Wei, Xu Ji, He Zhang, Bingwei Chen, Kaiwei Li, Xiaomeng Huang, Wenguang Chen, Guangwen Yang:
Refactoring and optimizing the Community Atmosphere Model (CAM) on the Sunway TaihuLight supercomputer. 969-980
- Alexander Heinecke, Greg Henry, Maxwell Hutchinson, Hans Pabst:
LIBXSMM: accelerating small matrix multiplications by runtime code generation. 981-991
Clouds & job scheduling
- Supreeth Shastri, Amr Rizk, David E. Irwin:
Transient guarantees: maximizing the value of idle cloud capacity. 992-1002
- Wei Wang, Baochun Li, Ben Liang, Jun Li:
Multi-resource fair sharing for datacenter jobs with placement constraints. 1003-1014
- Christopher Zimmer, Saurabh Gupta, Scott Atchley, Sudharshan S. Vazhkudai, Carl Albing:
A multi-faceted approach to job placement for improved performance on extreme-scale systems. 1015-1025