Hegde P and de Veciana G. (2024). Optimal Aggregation via Overlay Trees: Delay-MSE Tradeoffs under Failures. Proceedings of the ACM on Measurement and Analysis of Computing Systems. 8:3. (1-37). Online publication date: 10-Dec-2024.

https://doi.org/10.1145/3700423

Di Girolamo S, De Sensi D, Taranov K, Malesevic M, Besta M, Schneider T, Kistler S and Hoefler T. (2022). Building Blocks for Network-Accelerated Distributed File Systems. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis. 10.1109/SC41404.2022.00015. 978-1-6654-5444-5. (1-14).

https://ieeexplore.ieee.org/document/10046100/

Traff J. (2022). Fast(er) Construction of Round-optimal $n$-Block Broadcast Schedules. 2022 IEEE International Conference on Cluster Computing (CLUSTER). 10.1109/CLUSTER51413.2022.00028. 978-1-6654-9856-2. (142-151).

https://ieeexplore.ieee.org/document/9912694/

Margolin A and Barak A. (2020). Tree‐based fault‐tolerant collective operations for MPI. Concurrency and Computation: Practice and Experience. 10.1002/cpe.5826. 33:14. Online publication date: 25-Jul-2021.

https://onlinelibrary.wiley.com/doi/10.1002/cpe.5826

Küttler M, Planeta M, Bierbaum J, Weinhold C, Härtig H, Barak A and Hoefler T. Corrected trees for reliable group communication. Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming. (287-299).

https://doi.org/10.1145/3293883.3295721

Gerstenberger R, Besta M and Hoefler T. (2018). Enabling highly scalable remote memory access programming with MPI-3 one sided. Communications of the ACM. 61:10. (106-113). Online publication date: 26-Sep-2018.

https://doi.org/10.1145/3264413

Kuban R, Rotta R and Nolte J. (2018). Help Your Busy Neighbors: Dynamic Multicasts over Static Topologies. Euro-Par 2017: Parallel Processing Workshops. 10.1007/978-3-319-75178-8_51. (636-647).

http://link.springer.com/10.1007/978-3-319-75178-8_51

Ramos S and Hoefler T. (2016). Cache Line Aware Algorithm Design for Cache-Coherent Architectures. IEEE Transactions on Parallel and Distributed Systems. 27:10. (2824-2837). Online publication date: 1-Oct-2016.

https://doi.org/10.1109/TPDS.2016.2516540

Marendic P, Lemeire J, Vucinic D and Schelkens P. (2016). A novel MPI reduction algorithm resilient to imbalances in process arrival times. The Journal of Supercomputing. 72:5. (1973-2013). Online publication date: 1-May-2016.

https://doi.org/10.1007/s11227-016-1707-x

Gallopoulos E, Philippe B and Sameh A. (2016). Parallel Programming Paradigms. Parallelism in Matrix Computations. 10.1007/978-94-017-7188-7_1. (3-16).

https://link.springer.com/10.1007/978-94-017-7188-7_1

Peng I, Markidis S and Laure E. The Cost of Synchronizing Imbalanced Processes in Message Passing Systems. Proceedings of the 2015 IEEE International Conference on Cluster Computing. (408-417).

https://doi.org/10.1109/CLUSTER.2015.63

Ramos S and Hoefler T. Cache Line Aware Optimizations for ccNUMA Systems. Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing. (85-88).

https://doi.org/10.1145/2749246.2749256

Nürnberger S, Rotta R, Drescher G, Danner D and Nolte J. (2015). Diamond Rings: Acknowledged Event Propagation in Many-Core Processors. Euro-Par 2015: Parallel Processing Workshops. 10.1007/978-3-319-27308-2_58. (722-733).

http://link.springer.com/10.1007/978-3-319-27308-2_58

Lin H, Lin T and Wu C. (2014). Constructing application-layer multicast trees for minimum-delay message distribution. Information Sciences. 10.1016/j.ins.2014.03.130. 279. (433-445). Online publication date: 1-Sep-2014.

https://linkinghub.elsevier.com/retrieve/pii/S0020025514004277

Hoefler T and Moor D. (2014). Energy, Memory, and Runtime Tradeoffs for Implementing Collective Communication Operations. Supercomputing Frontiers and Innovations: an International Journal. 1:2. (58-75). Online publication date: 9-Jul-2014.

https://doi.org/10.14529/jsfi140204

Gerstenberger R, Besta M and Hoefler T. (2014). Enabling highly-scalable remote memory access programming with MPI-3 One Sided. Scientific Programming. 22:2. (75-91). Online publication date: 1-Apr-2014.

https://doi.org/10.1155/2014/571902

Gerstenberger R, Besta M and Hoefler T. Enabling highly-scalable remote memory access programming with MPI-3 one sided. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. (1-12).

https://doi.org/10.1145/2503210.2503286

Ramos S and Hoefler T. Modeling communication in cache-coherent SMP systems. Proceedings of the 22nd international symposium on High-performance parallel and distributed computing. (97-108).

https://doi.org/10.1145/2493123.2462916

Hoefler T and Schneider T. Optimization principles for collective neighborhood communications. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. (1-10).

/doi/10.5555/2388996.2389129

Hu K, Ding Y, Zhang X and Jiang S. (2012). A Scalable Infrastructure for Online Performance Analysis on CFD Application. Chinese Journal of Aeronautics. 10.1016/S1000-9361(11)60418-4. 25:4. (546-558). Online publication date: 1-Aug-2012.

https://linkinghub.elsevier.com/retrieve/pii/S1000936111604184

Lin H, Lin T, Wu C and Yang H. Construction of Application-Layer Multicast Trees for Message Distribution. Proceedings of the International Symposium on Parallel and Distributed Processing with Applications. (109-114).

https://doi.org/10.1109/ISPA.2010.44

Khuller S, Kim Y and Wan Y. (2010). Broadcasting on Networks of Workstations. Algorithmica. 57:4. (848-868). Online publication date: 1-Aug-2010.

/doi/10.5555/3118226.3118461

Khuller S, Kim Y and Wan Y. (2008). Broadcasting on Networks of Workstations. Algorithmica. 10.1007/s00453-008-9249-0. 57:4. (848-868). Online publication date: 1-Aug-2010.

http://link.springer.com/10.1007/s00453-008-9249-0

Castaldo A and Whaley R. (2010). Scaling LAPACK panel operations using parallel cache assignment. ACM SIGPLAN Notices. 45:5. (223-232). Online publication date: 1-May-2010.

https://doi.org/10.1145/1837853.1693484

Liu L and Li Z. (2010). Improving parallelism and locality with asynchronous algorithms. ACM SIGPLAN Notices. 45:5. (213-222). Online publication date: 1-May-2010.

https://doi.org/10.1145/1837853.1693483

Zhang E, Jiang Y and Shen X. (2010). Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs? ACM SIGPLAN Notices. 45:5. (203-212). Online publication date: 1-May-2010.

https://doi.org/10.1145/1837853.1693482

Hoefler T, Siebert C and Lumsdaine A. (2010). Scalable communication protocols for dynamic sparse data exchange. ACM SIGPLAN Notices. 45:5. (159-168). Online publication date: 1-May-2010.

https://doi.org/10.1145/1837853.1693476

Hofmeyr S, Iancu C and Blagojević F. (2010). Load balancing on speed. ACM SIGPLAN Notices. 45:5. (147-158). Online publication date: 1-May-2010.

https://doi.org/10.1145/1837853.1693475

Hoefler T, Siebert C and Lumsdaine A. Scalable communication protocols for dynamic sparse data exchange. Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. (159-168).

https://doi.org/10.1145/1693453.1693476

Hoefler T, Schneider T and Lumsdaine A. (2009). LogGP in theory and practice – An in-depth analysis of modern interconnection networks and benchmarking methods for collective operations. Simulation Modelling Practice and Theory. 10.1016/j.simpat.2009.06.007. 17:9. (1511-1521). Online publication date: 1-Oct-2009.

http://linkinghub.elsevier.com/retrieve/pii/S1569190X09000811

Liu P, Kuo M and Wang D. (2007). An Approximation Algorithm and Dynamic Programming for Reduction in Heterogeneous Environments. Algorithmica. 10.1007/s00453-007-9113-7. 53:3. (425-453). Online publication date: 1-Mar-2009.

http://link.springer.com/10.1007/s00453-007-9113-7

Drozdowski M. (2009). Scheduling with Communication Delays. Scheduling for Parallel Processing. 10.1007/978-1-84882-310-5_6. (209-299).

http://link.springer.com/10.1007/978-1-84882-310-5_6

Bonald T, Massoulié L, Mathieu F, Perino D and Twigg A. (2008). Epidemic live streaming. ACM SIGMETRICS Performance Evaluation Review. 36:1. (325-336). Online publication date: 12-Jun-2008.

https://doi.org/10.1145/1384529.1375494

Bonald T, Massoulié L, Mathieu F, Perino D and Twigg A. Epidemic live streaming. Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems. (325-336).

https://doi.org/10.1145/1375457.1375494

Arge L, Goodrich M, Nelson M and Sitchinava N. Fundamental parallel algorithms for private-cache chip multiprocessors. Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures. (197-206).

https://doi.org/10.1145/1378533.1378573

Santos E, Rickman J, Muthukrishnan G and Feng S. (2008). Efficient algorithms for parallelizing Monte Carlo simulations for 2D Ising spin models. The Journal of Supercomputing. 44:3. (274-290). Online publication date: 1-Jun-2008.

https://doi.org/10.1007/s11227-007-0163-z

Adler M, Gong Y and Rosenberg A. (2008). On “Exploiting” Node-Heterogeneous Clusters Optimally. Theory of Computing Systems. 42:4. (465-487). Online publication date: 17-Mar-2008.

https://doi.org/10.1007/s00224-007-9001-1

Shan H, Strohmaier E, Qiang J, Bailey D and Yelick K. Performance modeling and optimization of a high energy colliding beam simulation code. Proceedings of the 2006 ACM/IEEE conference on Supercomputing. (97-es).

https://doi.org/10.1145/1188455.1188557

Shan H, Strohmaier E, Qiang J, Bailey D and Yelick K. (2006). Performance Modeling and Optimization of a High Energy Colliding Beam Simulation Code. ACM/IEEE SC 2006 Conference (SC'06). 10.1109/SC.2006.48. 0-7695-2700-0. (48-48).

http://ieeexplore.ieee.org/document/4090222/

Gai A and Viennot L. (2006). Incentive, Resilience and Load Balancing in Multicasting through Clustered de Bruijn Overlay Network. 2006 14th IEEE International Conference on Networks. 10.1109/ICON.2006.302673. 0-7803-9746-0. (1-6).

http://ieeexplore.ieee.org/document/4087744/

Li J, Gu N and Jia W. (2006). An Efficient Fibonacci Series Based Hierarchical Application-Layer Multicast Protocol. Mobile Ad-hoc and Sensor Networks. 10.1007/11943952_12. (131-142).

http://link.springer.com/10.1007/11943952_12

Rosenberg A. Changing Challenges for Collaborative Algorithmics. Handbook of Nature-Inspired and Innovative Computing. 10.1007/0-387-27705-6_1. (1-44).

http://link.springer.com/10.1007/0-387-27705-6_1

Khuller S, Kim Y and Wan Y. Broadcasting on networks of workstations. Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures. (279-288).

https://doi.org/10.1145/1073970.1074017

Fraigniaud P, Mans B and Rosenberg A. (2005). Efficient trigger-broadcasting in heterogeneous clusters. Journal of Parallel and Distributed Computing. 65:5. (628-642). Online publication date: 1-May-2005.

https://doi.org/10.1016/j.jpdc.2004.12.003

Rosenberg A and Yurkewych M. (2005). Guidelines for Scheduling Some Common Computation-Dags for Internet-Based Computing. IEEE Transactions on Computers. 54:4. (428-438). Online publication date: 1-Apr-2005.

https://doi.org/10.1109/TC.2005.65

Santos E. (2004). Optimal and efficient parallel tridiagonal solvers using direct methods. The Journal of Supercomputing. 30:2. (97-115). Online publication date: 1-Nov-2004.

https://doi.org/10.1023/B:SUPE.0000040615.60545.c6

Rosenberg A. (2004). On Scheduling Mesh-Structured Computations for Internet-Based Computing. IEEE Transactions on Computers. 53:9. (1176-1186). Online publication date: 1-Sep-2004.

https://doi.org/10.1109/TC.2004.64

Ooshita F, Matsumae S, Masuzawa T and Tokura N. (2004). Scheduling for broadcast operation in heterogeneous parallel computing environments. Systems and Computers in Japan. 10.1002/scj.10533. 35:5. (44-54). Online publication date: 1-May-2004.

https://onlinelibrary.wiley.com/doi/10.1002/scj.10533

Khuller S and Kim Y. On broadcasting in heterogenous networks. Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms. (1011-1020).

/doi/10.5555/982792.982944

Khuller S, Kim Y and Woeginger G. (2004). Approximation Schemes for Broadcasting in Heterogenous Networks. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. 10.1007/978-3-540-27821-4_15. (163-170).

http://link.springer.com/10.1007/978-3-540-27821-4_15

Liu P, Wang D and Guo Y. (2004). An Approximation Algorithm for Broadcast Scheduling in Heterogeneous Clusters. Real-Time and Embedded Computing Systems and Applications. 10.1007/978-3-540-24686-2_3. (38-52).

http://link.springer.com/10.1007/978-3-540-24686-2_3

Ramachandran V, Grayson B and Dahlin M. (2003). Emulations between QSM, BSP and LogP. Journal of Parallel and Distributed Computing. 63:12. (1175-1192). Online publication date: 1-Dec-2003.

https://doi.org/10.1016/j.jpdc.2003.04.001

Roth P, Arnold D and Miller B. MRNet. Proceedings of the 2003 ACM/IEEE conference on Supercomputing.

https://doi.org/10.1145/1048935.1050172

Tam A and Wang C. (2003). Contention-Aware Communication Schedule for High-Speed Communication. Cluster Computing. 6:4. (339-353). Online publication date: 1-Oct-2003.

https://doi.org/10.1023/A:1025765910100

Bernaschi M, Iannello G and Lauria M. (2003). Efficient implementation of reduce-scatter in MPI. Journal of Systems Architecture: the EUROMICRO Journal. 49:3. (89-108). Online publication date: 1-Aug-2003.

https://doi.org/10.1016/S1383-7621(03)00059-6

Kohout J and George A. (2003). A high-performance communication service for parallel computing on distributed DSP systems. Parallel Computing. 29:7. (851-878). Online publication date: 1-Jul-2003.

https://doi.org/10.1016/S0167-8191(03)00061-9

Adler M, Gong Y and Rosenberg A. Optimal sharing of bags of tasks in heterogeneous clusters. Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures. (1-10).

https://doi.org/10.1145/777412.777414

Santos E. (2003). Parallel Complexity of Matrix Multiplication. The Journal of Supercomputing. 25:2. (155-175). Online publication date: 1-Jun-2003.

https://doi.org/10.1023/A:1023996628662

Adler M, Gong Y and Rosenberg A. Asymptotically Optimal Worksharing in HNOWs. Proceedings of the 36th annual symposium on Simulation.

/doi/10.5555/786111.786246

Adler M, Gong Y and Rosenberg A. Asymptotically optimal worksharing in HNOWs: how long is "sufficiently long?" 36th Annual Simulation Symposium (ANSS-36 2003). 10.1109/SIMSYM.2003.1192796. 0-7695-1911-3. (39-46).

http://ieeexplore.ieee.org/document/1192796/

Rosenberg A. To BSP or not to BSP in heterogeneous NOWs. International Parallel and Distributed Processing Symposium (IPDPS 2003). 10.1109/IPDPS.2003.1213308. 0-7695-1926-1. (14).

http://ieeexplore.ieee.org/document/1213308/

Tipparaju V, Nieplocha J and Panda D. Fast collective operations using shared and remote memory access protocols on clusters. International Parallel and Distributed Processing Symposium (IPDPS 2003). 10.1109/IPDPS.2003.1213188. 0-7695-1926-1. (10).

http://ieeexplore.ieee.org/document/1213188/

Vorakosit T and Uthayopas P. (2003). Generating an Efficient Dynamics Multicast Tree under Grid Environment. Recent Advances in Parallel Virtual Machine and Message Passing Interface. 10.1007/978-3-540-39924-7_85. (636-643).

http://link.springer.com/10.1007/978-3-540-39924-7_85

Santos E, Feng S and Rickman J. Efficient Parallel Algorithms for 2-Dimensional Ising Spin Models. Proceedings of the 16th International Parallel and Distributed Processing Symposium.

/doi/10.5555/645610.661726

Santos E. (2002). Optimal and Efficient Algorithms for Summing and Prefix Summing on Parallel Machines. Journal of Parallel and Distributed Computing. 62:4. (517-543). Online publication date: 1-Apr-2002.

https://doi.org/10.1006/jpdc.2000.1698

Bernaschi M, Iannello G and Lauria M. Efficient implementation of reduce-scatter in MPI. Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing. (301-308).

/doi/10.5555/1895489.1895529

Santos E, Feng S and Rickman J. (2002). Efficient parallel algorithms for 2-dimensional Ising spin models. Proceedings 16th International Parallel and Distributed Processing Symposium. IPDPS 2002. 10.1109/IPDPS.2002.1016660. 0-7695-1573-8. (8 pp).

http://ieeexplore.ieee.org/document/1016660/

Bernaschi M, Iannello G and Lauria M. Efficient implementation of reduce-scatter in MPI. 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing. 10.1109/EMPDP.2002.994296. 0-7695-1444-8. (301-308).

http://ieeexplore.ieee.org/document/994296/

Kielmann T, Bal H, Gorlatch S, Verstoep K and Hofman R. (2001). Network performance-aware collective communication for clustered wide-area systems. Parallel Computing. 27:11. (1431-1456). Online publication date: 1-Oct-2001.

https://doi.org/10.1016/S0167-8191(01)00098-9

Jain S. (2001). Branch and bound on the network model. Theoretical Computer Science. 255:1-2. (107-123). Online publication date: 28-Mar-2001.

https://doi.org/10.1016/S0304-3975(99)00158-9

Cappello F, Fraigniaud P, Mans B and Rosenberg A. HiHCoHP-Toward a realistic communication model for hierarchical hyperclusters of heterogeneous processors. IEEE International Symposium on Parallel and Distributed Processing. 10.1109/IPDPS.2001.924978. 0-7695-0990-8. (6).

http://ieeexplore.ieee.org/document/924978/

Bernaschi M and Richelli G. MPI collective communication operations on large shared memory systems. Ninth Euromicro Workshop on Parallel and Distributed Processing. 10.1109/EMPDP.2001.905038. 0-7695-0987-8. (159-164).

http://ieeexplore.ieee.org/document/905038/

Rosenberg A. (2001). Sharing partitionable workloads in heterogeneous NOWs: greedier is not better. Proceedings 2001 IEEE International Conference on Cluster Computing. 10.1109/CLUSTR.2001.959961. 0-7695-1116-3. (124-131).

https://ieeexplore.ieee.org/document/959961/

Ramachandran V. (2001). Parallel Algorithm Design with Coarse-Grained Synchronization. Computational Science - ICCS 2001. 10.1007/3-540-45718-6_67. (619-627).

http://link.springer.com/10.1007/3-540-45718-6_67

Löwe W and Liebrich A. (2001). VizzScheduler - A Framework for the Visualization of Scheduling Algorithms. Euro-Par 2001 Parallel Processing. 10.1007/3-540-44681-8_10. (62-66).

http://link.springer.com/10.1007/3-540-44681-8_10

Liu P and Sheng T. Broadcast scheduling optimization for heterogeneous cluster systems. Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures. (129-136).

https://doi.org/10.1145/341800.341816

Löwe W and Zimmermann W. (2000). Scheduling balanced task-graphs to LogP-machines. Parallel Computing. 26:9. (1083-1108). Online publication date: 1-Jul-2000.

https://doi.org/10.1016/S0167-8191(00)00030-2

Verriet J. (2000). Scheduling outtrees of height one in the LogP model. Parallel Computing. 26:9. (1065-1082). Online publication date: 1-Jul-2000.

https://doi.org/10.1016/S0167-8191(00)00029-6

Domani T, Kolodner E and Petrank E. (2000). A generational on-the-fly garbage collector for Java. ACM SIGPLAN Notices. 35:5. (274-284). Online publication date: 1-May-2000.

https://doi.org/10.1145/358438.349336

Cannarozzi D, Plezbert M and Cytron R. (2000). Contaminated garbage collection. ACM SIGPLAN Notices. 35:5. (264-273). Online publication date: 1-May-2000.

https://doi.org/10.1145/358438.349334

Fähndrich M, Rehof J and Das M. (2000). Scalable context-sensitive flow analysis using instantiation constraints. ACM SIGPLAN Notices. 35:5. (253-263). Online publication date: 1-May-2000.

https://doi.org/10.1145/358438.349332

Wan Z and Hudak P. (2000). Functional reactive programming from first principles. ACM SIGPLAN Notices. 35:5. (242-252). Online publication date: 1-May-2000.

https://doi.org/10.1145/358438.349331

Bar-Noy A, Kipnis S and Schieber B. (2000). Optimal multiple message broadcasting in telephone-like communication systems. Discrete Applied Mathematics. 100:1-2. (1-15). Online publication date: 15-Mar-2000.

https://doi.org/10.1016/S0166-218X(99)00155-9

Kielmann T, Hofman R, Bal H, Plaat A and Bhoedjang R. (1999). MagPIe. ACM SIGPLAN Notices. 34:8. (131-140). Online publication date: 1-Aug-1999.

https://doi.org/10.1145/329366.301116

Gao L, Rosenberg A and Sitaraman R. (1999). Optimal Clustering of Tree-Sweep Computations for High-Latency Parallel Environments. IEEE Transactions on Parallel and Distributed Systems. 10:8. (813-824). Online publication date: 1-Aug-1999.

https://doi.org/10.1109/71.790599

Golin M and Schuster A. (1999). Optimal point-to-point broadcast algorithms via lopsided trees. Discrete Applied Mathematics. 93:2-3. (233-263). Online publication date: 1-Jul-1999.

https://doi.org/10.1016/S0166-218X(99)00107-9

Kielmann T, Hofman R, Bal H, Plaat A and Bhoedjang R. MagPIe. Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming. (131-140).

https://doi.org/10.1145/301104.301116

Bar-Noy A and Ho C. (1999). Broadcasting Multiple Messages in the Multiport Model. IEEE Transactions on Parallel and Distributed Systems. 10:5. (500-508). Online publication date: 1-May-1999.

https://doi.org/10.1109/71.770196

Bernaschi M, Iannello G and Lauria M. (1999). Experimental results about MPI collective communication operations. High-Performance Computing and Networking. 10.1007/BFb0100638. (774-783).

http://link.springer.com/10.1007/BFb0100638

Löwe W and Zimmermann W. (1999). Scheduling Iterative Programs onto LogP-Machine. Euro-Par’99 Parallel Processing. 10.1007/3-540-48311-X_43. (332-339).

http://link.springer.com/10.1007/3-540-48311-X_43

Canonico R, Cristaldi R and Iannello G. (1999). A Scalable Flow Control Algorithm for the Fast Messages Communication Library. Network-Based Parallel Computing. Communication, Architecture, and Applications. 10.1007/10704826_6. (77-90).

http://link.springer.com/10.1007/10704826_6

Kutten S and Peleg D. (1999). Fault-Local Distributed Mending. Journal of Algorithms. 30:1. (144-165). Online publication date: 1-Jan-1999.

https://doi.org/10.1006/jagm.1998.0972

Li X, Shou B and Zheng S. (1998). Research on the optimal parallel algorithms of broadcast-class problems. Journal of Computer Science and Technology. 10.1007/BF02948504. 13:5. (455-463). Online publication date: 1-Sep-1998.

http://link.springer.com/10.1007/BF02948504

Hambrusch S and Khokhar A. (1998). Scalable S-To-P Broadcasting on Message-Passing MPPs. IEEE Transactions on Parallel and Distributed Systems. 9:8. (758-768). Online publication date: 1-Aug-1998.

https://doi.org/10.1109/71.706048

Bar-Noy A, Guha S, Naor J and Schieber B. Multicasting in heterogeneous networks. Proceedings of the thirtieth annual ACM symposium on Theory of computing. (448-453).

https://doi.org/10.1145/276698.276857

Eisenbiegler J, Löwe W and Zimmermann W. (1998). BSP, LogP, and oblivious programs. Euro-Par’98 Parallel Processing. 10.1007/BFb0057942. (865-874).

http://link.springer.com/10.1007/BFb0057942

Iannello G. (1997). Efficient Algorithms for the Reduce-Scatter Operation in LogGP. IEEE Transactions on Parallel and Distributed Systems. 8:9. (970-982). Online publication date: 1-Sep-1997.

https://doi.org/10.1109/71.615442

Gibbons P, Matias Y and Ramachandran V. Can shared-memory model serve as a bridging model for parallel computation? Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures. (72-83).

https://doi.org/10.1145/258492.258500

Goodrich M. Randomized fully-scalable BSP techniques for multi-searching and convex hull construction. Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms. (767-776).

/doi/10.5555/314161.314442

Löwe W, Zimmermann W and Eisenbiegler J. (1997). On linear schedules of task graphs for generalized logp-machines. Euro-Par'97 Parallel Processing. 10.1007/BFb0002832. (895-904).

http://link.springer.com/10.1007/BFb0002832

Santos E. (1997). Optimal parallel algorithms for solving tridiagonal linear systems. Euro-Par'97 Parallel Processing. 10.1007/BFb0002802. (700-709).

http://link.springer.com/10.1007/BFb0002802

Bilardi G, Codenotti B, Del Corso G, Pinotti C and Resta G. (1997). Broadcast and associative operations on fat-trees. Euro-Par'97 Parallel Processing. 10.1007/BFb0002734. (196-207).

http://link.springer.com/10.1007/BFb0002734

Culler D, Karp R, Patterson D, Sahay A, Santos E, Schauser K, Subramonian R and von Eicken T. (1996). LogP. Communications of the ACM. 39:11. (78-85). Online publication date: 1-Nov-1996.

https://doi.org/10.1145/240455.240477

Suciu D. Implementation and Analysis of a Parallel Collection Query Language. Proceedings of the 22th International Conference on Very Large Data Bases. (366-377).

/doi/10.5555/645922.673488

JáJá J and Ryu K. (1996). The Block Distributed Memory Model. IEEE Transactions on Parallel and Distributed Systems. 7:8. (830-840). Online publication date: 1-Aug-1996.

https://doi.org/10.1109/71.532114

Dusseau A, Culler D, Schauser K and Martin R. (1996). Fast Parallel Sorting Under LogP. IEEE Transactions on Parallel and Distributed Systems. 7:8. (791-805). Online publication date: 1-Aug-1996.

https://doi.org/10.1109/71.532111

Bilardi G, Herley K, Pietracaprina A, Pucci G and Spirakis P. BSP vs LogP. Proceedings of the eighth annual ACM symposium on Parallel Algorithms and Architectures. (25-32).

https://doi.org/10.1145/237502.237504

Bar-Noy A and Ho C. Broadcasting Multiple Messages in the Multiport Model. Proceedings of the 10th International Parallel Processing Symposium. (781-788).

/doi/10.5555/645606.660864

Bruck J, De Coster L, Dewulf N, Ho C and Lauwereins R. (1996). On the Design and Implementation of Broadcast and Global Combine Operations Using the Postal Model. IEEE Transactions on Parallel and Distributed Systems. 7:3. (256-265). Online publication date: 1-Mar-1996.

https://doi.org/10.1109/71.491579

Li Z, Mills P and Reif J. (1996). Models and Resource Metrics for Parallel and Distributed Computation. Parallel Algorithms and Applications. 10.1080/10637199608915543. 8:1. (35-59). Online publication date: 1-Jan-1996.

http://www.tandfonline.com/doi/abs/10.1080/10637199608915543

Kutten S. (1996). Scalable fault tolerance. SOFSEM'96: Theory and Practice of Informatics. 10.1007/BFb0037411. (286-306).

http://link.springer.com/10.1007/BFb0037411

Löwe W, Eisenbiegler J and Zimmermann W. (1996). Optimization of parallel programs on machines with expensive communication. Euro-Par'96 Parallel Processing. 10.1007/BFb0024754. (602-610).

http://link.springer.com/10.1007/BFb0024754

Zimmermann W, Löwe W and Gottlieb J. (1996). On Design and Implementation of Parallel Algorithms for Solving Inverse Problems. Parameter Identification and Inverse Problems in Hydrology, Geology and Ecology. 10.1007/978-94-009-1704-0_20. (283-297).

http://www.springerlink.com/index/10.1007/978-94-009-1704-0_20

Bernaschi M, Papetti F and Iannello G. (1996). Efficient collective communication operations for parallel industrial codes. High-Performance Computing and Networking. 10.1007/3-540-61142-8_620. (729-735).

http://link.springer.com/10.1007/3-540-61142-8_620

Kwon O and Chwa K. (1995). Multiple message broadcasting in communication networks. Networks. 10.1002/net.3230260409. 26:4. (253-261). Online publication date: 1-Dec-1995.

https://onlinelibrary.wiley.com/doi/10.1002/net.3230260409

Bar-Noy A, Bruck J, Ho C, Kipnis S and Schieber B. (1995). Computing Global Combine Operations in the Multiport Postal Model. IEEE Transactions on Parallel and Distributed Systems. 6:8. (896-900). Online publication date: 1-Aug-1995.

https://doi.org/10.1109/71.406965

Alexandrov A, Ionescu M, Schauser K and Scheiman C. LogGP. Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures. (95-105).

https://doi.org/10.1145/215399.215427

Löwe W and Zimmermann W. Upper time bounds for executing PRAM-programs on the LogP-machine. Proceedings of the 9th international conference on Supercomputing. (41-50).

https://doi.org/10.1145/224538.224543

Snyder L. (1995). Experimental validation of models of parallel computation. Computer Science Today. 10.1007/BFb0015238. (78-100).

http://link.springer.com/10.1007/BFb0015238

Jain S. (1995). Branch and bound on the network model. Foundations of Software Technology and Theoretical Computer Science. 10.1007/3-540-60692-0_37. (11-21).

http://link.springer.com/10.1007/3-540-60692-0_37

Goodrich M. (1993). Parallel algorithms column 1. ACM SIGACT News. 24:4. (16-21). Online publication date: 1-Dec-1993.

https://doi.org/10.1145/164996.165002

Cordy J and Graham T. (1987). Design of an interpretive environment for Turing. ACM SIGPLAN Notices. 22:7. (199-204). Online publication date: 1-Jul-1987.

https://doi.org/10.1145/960114.29671

Davidson J and Gresh J. (1987). Cint: a RISC interpreter for the C programming language. ACM SIGPLAN Notices. 22:7. (189-198). Online publication date: 1-Jul-1987.

https://doi.org/10.1145/960114.29670

Offutt A and King K. (1987). A Fortran 77 interpreter for mutation analysis. ACM SIGPLAN Notices. 22:7. (177-188). Online publication date: 1-Jul-1987.

https://doi.org/10.1145/960114.29669

Johnson G. (1987). GL—a denotational testbed with continuations and partial continuations as first-class objects. ACM SIGPLAN Notices. 22:7. (165-176). Online publication date: 1-Jul-1987.

https://doi.org/10.1145/960114.29668

Koskimies K and Paakki J. (1987). TOOLS: a unifying approach to object-oriented language interpretation. ACM SIGPLAN Notices. 22:7. (153-164). Online publication date: 1-Jul-1987.

https://doi.org/10.1145/960114.29667