Marc Snir


Research Output

2019

Aluminum: An Asynchronous, GPU-Aware Communication Library Optimized for Large-Scale Training of Deep Neural Networks on HPC Systems

Dryden, N., Maruyama, N., Moon, T., Benson, T., Yoo, A., Snir, M. & Van Essen, B., Feb 8 2019, Proceedings of MLHPC 2018: Machine Learning in HPC Environments, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis. Institute of Electrical and Electronics Engineers Inc., p. 1-13 13 p. 8638639. (Proceedings of MLHPC 2018: Machine Learning in HPC Environments, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Automatic generation of benchmarks for I/O-intensive parallel applications

Hao, M., Zhang, W., Zhang, Y., Snir, M. & Yang, L. T., Feb 2019, In : Journal of Parallel and Distributed Computing. 124, p. 1-13 13 p.

Research output: Contribution to journal › Article

Channel and filter parallelism for large-scale CNN training

Dryden, N., Maruyama, N., Moon, T., Benson, T., Snir, M. & Van Essen, B., Nov 17 2019, Proceedings of SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Computer Society, a10. (International Conference for High Performance Computing, Networking, Storage and Analysis, SC).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Characterizing and Understanding HPC Job Failures over the 2K-Day Life of IBM BlueGene/Q System

Di, S., Guo, H., Pershey, E., Snir, M. & Cappello, F., Jun 2019, Proceedings - 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2019. Institute of Electrical and Electronics Engineers Inc., p. 473-484 12 p. 8809553. (Proceedings - 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2019).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Exploring Properties and Correlations of Fatal Events in a Large-Scale HPC System

Di, S., Guo, H., Gupta, R., Pershey, E. R., Snir, M. & Cappello, F., Feb 1 2019, In : IEEE Transactions on Parallel and Distributed Systems. 30, 2, p. 361-374 14 p., 8436427.

Research output: Contribution to journal › Article

Exploring the feasibility of lossy compression for PDE simulations

Calhoun, J., Cappello, F., Olson, L. N., Snir, M. & Gropp, W. D., Mar 1 2019, In : International Journal of High Performance Computing Applications. 33, 2, p. 397-410 14 p.

Research output: Contribution to journal › Article

Gluon-async: A bulk-asynchronous system for distributed and heterogeneous graph analytics

Dathathri, R., Gill, G., Hoang, L., Jatala, V., Pingali, K., Nandivada, V. K., Dang, H. V. & Snir, M., Sep 2019, Proceedings - 2019 28th International Conference on Parallel Architectures and Compilation Techniques, PACT 2019. Institute of Electrical and Electronics Engineers Inc., p. 15-28 14 p. 8891625. (Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT; vol. 2019-September).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Guest Editorial: Special Issue on Network and Parallel Computing for Emerging Architectures and Applications

Zhang, F., Zhai, J., Snir, M., Jin, H., Kasahara, H. & Valero, M., Jun 15 2019, In : International Journal of Parallel Programming. 47, 3, p. 343-344 2 p.

Research output: Contribution to journal › Editorial

Open Access

Improving strong-scaling of CNN training by exploiting finer-grained parallelism

Dryden, N., Maruyama, N., Benson, T., Moon, T., Snir, M. & Van Essen, B., May 2019, Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019. Institute of Electrical and Electronics Engineers Inc., p. 210-220 11 p. 8820780. (Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Optimizing I/O performance of HPC applications with autotuning

Behzad, B., Byna, S., Prabhat & Snir, M., Mar 2019, In : ACM Transactions on Parallel Computing. 5, 4, 15.

Research output: Contribution to journal › Article

2018

A lightweight communication runtime for distributed graph analytics

Dang, H. V., Dathathri, R., Gill, G., Brooks, A., Dryden, N., Lenharth, A., Hoang, L., Pingali, K. & Snir, M., Aug 3 2018, Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018. Institute of Electrical and Electronics Engineers Inc., p. 980-989 10 p. 8425251. (Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Argobots: A Lightweight Low-Level Threading and Tasking Framework

Seo, S., Amer, A., Balaji, P., Bordage, C., Bosilca, G., Brooks, A., Carns, P., Castello, A., Genet, D., Herault, T., Iwasaki, S., Jindal, P., Kale, L. V., Krishnamoorthy, S., Lifflander, J., Lu, H., Meneses, E., Snir, M., Sun, Y., Taura, K. & Beckman, P., Mar 1 2018, In : IEEE Transactions on Parallel and Distributed Systems. 29, 3, p. 512-526 15 p., 8082139.

Research output: Contribution to journal › Article

FULT: Fast user-level thread scheduling using bit-vectors

Dang, H. V. & Snir, M., Aug 13 2018, Proceedings of the 47th International Conference on Parallel Processing, ICPP 2018. Association for Computing Machinery, a71. (ACM International Conference Proceeding Series).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Gluon: A communication-optimizing substrate for distributed heterogeneous graph analytics

Dathathri, R., Gill, G., Hoang, L., Dang, H. V., Brooks, A., Dryden, N., Snir, M. & Pingali, K., Jun 11 2018, PLDI 2018 - Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation. Foster, J. S. & Grossman, D. (eds.). Association for Computing Machinery, p. 752-768 17 p. (Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Gluon: A communication-optimizing substrate for distributed heterogeneous graph analytics

Dathathri, R., Gill, G., Hoang, L., Dang, H. V., Brooks, A., Dryden, N., Snir, M. & Pingali, K., Jun 11 2018, In : ACM SIGPLAN Notices. 53, 4, p. 752-768 17 p.

Research output: Contribution to journal › Article

Open Access

Neural Network Based Silent Error Detector

Wang, C., Dryden, N., Cappello, F. & Snir, M., Oct 29 2018, Proceedings - 2018 IEEE International Conference on Cluster Computing, CLUSTER 2018. Institute of Electrical and Electronics Engineers Inc., p. 168-178 11 p. 8514878. (Proceedings - IEEE International Conference on Cluster Computing, ICCC; vol. 2018-September).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Preface

Zhang, F., Zhai, J., Snir, M., Jin, H., Kasahara, H. & Valero, M., Jan 1 2018, In : Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 11276 LNCS, p. V

Research output: Contribution to journal › Editorial

Technical perspective: The future of MPI

Snir, M., Oct 2018, In : Communications of the ACM. 61, 10, 1 p.

Research output: Contribution to journal › Comment/debate

2017

Eliminating contention bottlenecks in multithreaded MPI

Dang, H. V., Snir, M. & Gropp, W., Nov 2017, In : Parallel Computing. 69, p. 1-23 23 p.

Research output: Contribution to journal › Article

LOGAIDER: A Tool for Mining Potential Correlations of HPC Log Events

Di, S., Gupta, R., Snir, M., Pershey, E. & Cappello, F., Jul 10 2017, Proceedings - 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017. Institute of Electrical and Electronics Engineers Inc., p. 442-451 10 p. 7973730. (Proceedings - 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Predicting HPC parallel program performance based on LLVM compiler

Zhang, W., Hao, M. & Snir, M., Jun 1 2017, In : Cluster Computing. 20, 2, p. 1179-1192 14 p.

Research output: Contribution to journal › Article

The informal guide to ACM fellow nominations: Recommendations for a successful nomination process

Snir, M., Jul 2017, In : Communications of the ACM. 60, 7, p. 32-34 3 p.

Research output: Contribution to journal › Review article

Towards a more complete understanding of SDC propagation

Calhoun, J., Snir, M., Olson, L. N. & Gropp, W. D., Jun 26 2017, HPDC 2017 - Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing. Association for Computing Machinery, Inc, p. 131-142 12 p. (HPDC 2017 - Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2016

Damaris: Addressing performance variability in data management for post-petascale simulations

Dorier, M., Antoniu, G., Cappello, F., Snir, M., Sisneros, R., Yildiz, O., Ibrahim, S., Peterka, T. & Orf, L., Dec 2016, In : ACM Transactions on Parallel Computing. 3, 3

Research output: Contribution to journal › Article

Overcoming the power wall by exploiting inexactness and emerging COTS architectural features: Trading precision for improving application quality

Fagan, M., Schlachter, J., Yoshii, K., Leyffer, S., Palem, K., Snir, M., Wild, S. M. & Enz, C., Jul 2 2016, Proceedings - 29th IEEE International System on Chip Conference, SOCC 2016. Bhatia, K., Alioto, M., Zhao, D., Marshall, A. & Sridhar, R. (eds.). IEEE Computer Society, p. 241-246 6 p. 7905477. (International System on Chip Conference; vol. 0).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Reducing Waste in Extreme Scale Systems through Introspective Analysis

Bautista-Gomez, L., Gainaru, A., Perarnau, S., Tiwari, D., Gupta, S., Engelmann, C., Cappello, F. & Snir, M., Jul 18 2016, Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016. Institute of Electrical and Electronics Engineers Inc., p. 212-221 10 p. 7516017. (Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Towards millions of communicating threads

Dang, H. V., Snir, M. & Gropp, W., Sep 25 2016, Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016. Association for Computing Machinery, p. 1-14 14 p. (ACM International Conference Proceeding Series; vol. 25-28-September-2016).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Universal parallel computing research center at Illinois: Making parallel programming synonymous with programming

Snir, M., May 24 2016, 2009 IEEE Hot Chips 21 Symposium, HCS 2009. Institute of Electrical and Electronics Engineers Inc., 7478357

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2015

A general space-filling curve algorithm for partitioning 2D meshes

Sasidharan, A., Dennis, J. M. & Snir, M., Nov 23 2015, Proceedings - 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security and 2015 IEEE 12th International Conference on Embedded Software and Systems, HPCC-CSS-ICESS 2015. Institute of Electrical and Electronics Engineers Inc., p. 875-879 5 p. 7336274. (Proceedings - 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security and 2015 IEEE 12th International Conference on Embedded Software and Systems, HPCC-CSS-ICESS 2015).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Design of a Multithreaded Barnes-Hut Algorithm for Multicore Clusters

Zhang, J., Behzad, B. & Snir, M., Jul 1 2015, In : IEEE Transactions on Parallel and Distributed Systems. 26, 7, p. 1861-1873 13 p., 6837521.

Research output: Contribution to journal › Article

Distributed monitoring and management of exascale systems in the Argo project

Perarnau, S., Thakur, R., Iskra, K., Raffenetti, K., Cappello, F., Gupta, R., Beckman, P., Snir, M., Hoffmann, H., Schulz, M. & Rountree, B., 2015, Distributed Applications and Interoperable Systems - 15th IFIP WG 6.1 International Conference, DAIS 2015 Held as Part of the 10th International Federated Conference on Distributed Computing Techniques, DisCoTec 2015, Proceedings. Bessani, A. & Bouchenak, S. (eds.). Springer-Verlag Berlin Heidelberg, p. 173-178 6 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 9038).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Dynamic model-driven parallel I/O performance tuning

Behzad, B., Byna, S., Wild, S. M., Prabhat & Snir, M., Oct 26 2015, Proceedings - 2015 IEEE International Conference on Cluster Computing, CLUSTER 2015. Institute of Electrical and Electronics Engineers Inc., p. 184-193 10 p. 7307584. (Proceedings - IEEE International Conference on Cluster Computing, ICCC; vol. 2015-October).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Pattern-driven parallel I/O tuning

Behzad, B., Byna, S., Prabhat & Snir, M., Nov 15 2015, Proceedings of PDSW 2015: 10th Parallel Data Storage Workshop - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis. Association for Computing Machinery, Inc, p. 43-48 6 p. (Proceedings of PDSW 2015: 10th Parallel Data Storage Workshop - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

PPL: An abstract runtime system for hybrid parallel programming

Brooks, A., Dang, H. V., Dryden, N. & Snir, M., Nov 15 2015, Proceedings of ESPM2 2015: 1st International Workshop on Extreme Scale Programming Models and Middleware - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis. Association for Computing Machinery, Inc, p. 2-9 8 p. (Proceedings of ESPM2 2015: 1st International Workshop on Extreme Scale Programming Models and Middleware - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Scheduling the I/O of HPC Applications under Congestion

Gainaru, A., Aupy, G., Benoit, A., Cappello, F., Robert, Y. & Snir, M., Jul 17 2015, Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium, IPDPS 2015. Institute of Electrical and Electronics Engineers Inc., p. 1013-1022 10 p. 7161586. (Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium, IPDPS 2015).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Towards a more fault resilient multigrid solver

Calhoun, J., Olson, L., Snir, M. & Gropp, W. D., Jan 1 2015, In : Simulation Series. 47, 4, p. 1-8 8 p.

Research output: Contribution to journal › Conference article

Understanding the propagation of error due to a silent data corruption in a sparse matrix vector multiply

Calhoun, J., Snir, M., Olson, L. & Garzaran, M. J., Oct 26 2015, Proceedings - 2015 IEEE International Conference on Cluster Computing, CLUSTER 2015. Institute of Electrical and Electronics Engineers Inc., p. 541-542 2 p. 7307650. (Proceedings - IEEE International Conference on Cluster Computing, ICCC; vol. 2015-October).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2014

Addressing failures in exascale computing

Snir, M., Wisniewski, R. W., Abraham, J. A., Adve, S. V., Bagchi, S., Balaji, P., Belak, J., Bose, P., Cappello, F., Carlson, B., Chien, A. A., Coteus, P., Debardeleben, N. A., Diniz, P. C., Engelmann, C., Erez, M., Fazzari, S., Geist, A., Gupta, R., Johnson, F., Krishnamoorthy, S., Leyffer, S., Liberty, D., Mitra, S., Munson, T., Schreiber, R., Stearley, J. & Hensbergen, E. V., May 2014, In : International Journal of High Performance Computing Applications. 28, 2, p. 129-173 45 p.

Research output: Contribution to journal › Article

Automatic generation of I/O kernels for HPC applications

Behzad, B., Dang, H. V., Hariri, F., Zhang, W. & Snir, M., Jan 20 2014, Proceedings of PDSW 2014: 9th Parallel Data Storage Workshop - Held in Conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis. Institute of Electrical and Electronics Engineers Inc., p. 31-36 6 p. 7016280. (Proceedings of PDSW 2014: 9th Parallel Data Storage Workshop - Held in Conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Enabling communication concurrency through flexible MPI endpoints

Dinan, J., Grant, R. E., Balaji, P., Goodell, D., Miller, D., Snir, M. & Thakur, R., Nov 20 2014, In : International Journal of High Performance Computing Applications. 28, 4, p. 390-405 16 p.

Research output: Contribution to journal › Article

Improved MPI collectives for MPI processes in shared address spaces

Li, S., Hoefler, T., Hu, C. & Snir, M., Nov 15 2014, In : Cluster Computing. 17, 4, p. 1139-1155 17 p.

Research output: Contribution to journal › Article

Improving parallel I/O autotuning with performance modeling

Behzad, B., Byna, S., Wild, S. M., Prabhat & Snir, M., Jan 1 2014, HPDC 2014 - Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. Association for Computing Machinery, p. 253-256 4 p. (HPDC 2014 - Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

The future of supercomputing

Snir, M., Jan 1 2014, ICS 2014 - Proceedings of the 28th ACM International Conference on Supercomputing. Association for Computing Machinery, p. 261-262 2 p. (Proceedings of the International Conference on Supercomputing).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Toward exascale resilience: 2014 update

Cappello, F., Geist, A., Gropp, W., Kale, S., Kramer, B. & Snir, M., Jan 1 2014, In : Supercomputing Frontiers and Innovations. 1, 1, p. 4-27 24 p.

Research output: Contribution to journal › Article

2013

Enabling MPI interoperability through flexible communication endpoints

Dinan, J., Balaji, P., Goodell, D., Miller, D., Snir, M. & Thakur, R., Jan 1 2013, Proceedings of the 20th European MPI Users' Group Meeting, EuroMPI 2013. Association for Computing Machinery, p. 13-18 6 p. (ACM International Conference Proceeding Series).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Failure prediction for HPC systems and applications: Current situation and open issues

Gainaru, A., Cappello, F., Snir, M. & Kramer, W., Aug 1 2013, In : International Journal of High Performance Computing Applications. 27, 3, p. 273-282 10 p.

Research output: Contribution to journal › Article

NUMA-aware shared-memory collective communication for MPI

Li, S., Hoefler, T. & Snir, M., Jul 17 2013, p. 85-96. 12 p.

Research output: Contribution to conference › Paper

Programming for exascale computers

Gropp, W. & Snir, M., Nov 2013, In : Computing in Science and Engineering. 15, 6, p. 27-35 9 p., 6636318.

Research output: Contribution to journal › Article