Conference contribution

  • 2015

    Detecting and correcting data corruption in stencil applications through multivariate interpolation

    Gomez, L. A. B. & Cappello, F., Oct 26 2015, Proceedings - 2015 IEEE International Conference on Cluster Computing, CLUSTER 2015. Institute of Electrical and Electronics Engineers Inc., p. 595-602 8 p. 7307657. (Proceedings - IEEE International Conference on Cluster Computing, ICCC; vol. 2015-October).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Detecting silent data corruption for extreme-scale MPI applications

    Bautista-Gomez, L. & Cappello, F., Sep 21 2015, Proceedings of the 22nd European MPI Users' Group Meeting, EuroMPI 2015. Association for Computing Machinery, a12. (ACM International Conference Proceeding Series; vol. 21-23-September-2015).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Open Access
  • Distributed monitoring and management of exascale systems in the Argo project

    Perarnau, S., Thakur, R., Iskra, K., Raffenetti, K., Cappello, F., Gupta, R., Beckman, P., Snir, M., Hoffmann, H., Schulz, M. & Rountree, B., 2015, Distributed Applications and Interoperable Systems - 15th IFIP WG 6.1 International Conference, DAIS 2015 Held as Part of the 10th International Federated Conference on Distributed Computing Techniques, DisCoTec 2015, Proceedings. Bessani, A. & Bouchenak, S. (eds.). Springer, p. 173-178 6 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 9038).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Exploiting spatial smoothness in HPC applications to detect silent data corruption

    Bautista-Gomez, L. & Cappello, F., Nov 23 2015, Proceedings - 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security and 2015 IEEE 12th International Conference on Embedded Software and Systems, HPCC-CSS-ICESS 2015. Institute of Electrical and Electronics Engineers Inc., p. 128-133 6 p. 7336154. (Proceedings - 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security and 2015 IEEE 12th International Conference on Embedded Software and Systems, HPCC-CSS-ICESS 2015).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Fault-tolerant protocol for hybrid task-parallel message-passing applications

    Martsinkevich, T., Subasi, O., Unsal, O., Labarta, J. & Cappello, F., Oct 26 2015, Proceedings - 2015 IEEE International Conference on Cluster Computing, CLUSTER 2015. Institute of Electrical and Electronics Engineers Inc., p. 563-570 8 p. 7307653. (Proceedings - IEEE International Conference on Cluster Computing, ICCC; vol. 2015-October).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Lightweight silent data corruption detection based on runtime data analysis for HPC applications

    Berrocal, E., Bautista-Gomez, L., Di, S., Lan, Z. & Cappello, F., Jun 15 2015, HPDC 2015 - Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing. Association for Computing Machinery, p. 275-278 4 p. (HPDC 2015 - Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Open Access
  • Scheduling the I/O of HPC Applications under Congestion

    Gainaru, A., Aupy, G., Benoit, A., Cappello, F., Robert, Y. & Snir, M., Jul 17 2015, Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium, IPDPS 2015. Institute of Electrical and Electronics Engineers Inc., p. 1013-1022 10 p. 7161586. (Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium, IPDPS 2015).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Open Access
  • 2014

    Detecting silent data corruption through data dynamic monitoring for scientific applications

    Bautista Gomez, L. & Cappello, F., 2014, PPoPP 2014 - Proceedings of the 2014 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. p. 381-382 2 p. (Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • GPGPUs: How to combine high computational power with high reliability

    Gomez, L. B., Cappello, F., Carro, L., Debardeleben, N., Fang, B., Gurumurthi, S., Pattabiraman, K., Rech, P. & Reorda, M. S., 2014, Proceedings - Design, Automation and Test in Europe, DATE 2014. Institute of Electrical and Electronics Engineers Inc., 6800555. (Proceedings - Design, Automation and Test in Europe, DATE).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • HPDC 2014 chairs' message

    Plale, B., Cappello, F., Ripeanu, M. & Xu, D., 2014, HPDC 2014 - Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. Association for Computing Machinery, p. iii (HPDC 2014 - Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Optimization of multi-level checkpoint model for large scale HPC applications

    Di, S., Bouguerra, M. S., Bautista-Gomez, L. & Cappello, F., 2014, Proceedings - IEEE 28th International Parallel and Distributed Processing Symposium, IPDPS 2014. IEEE Computer Society, p. 1181-1190 10 p. 6877346. (Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • POSTER: Energy-performance tradeoffs in multilevel checkpoint strategies

    Gomez, L. A. B., Balaprakash, P., Bouguerra, M. S., Wild, S. M., Cappello, F. & Hovland, P. D., Nov 26 2014, 2014 IEEE International Conference on Cluster Computing, CLUSTER 2014. Institute of Electrical and Electronics Engineers Inc., p. 278-279 2 p. 6968749. (2014 IEEE International Conference on Cluster Computing, CLUSTER 2014).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • 2013

    AI-Ckpt: Leveraging memory access patterns for adaptive asynchronous incremental checkpointing

    Nicolae, B. & Cappello, F., 2013, HPDC 2013 - Proceedings of the 22nd ACM International Symposium on High-Performance Parallel and Distributed Computing. Association for Computing Machinery, p. 155-166 12 p. (HPDC 2013 - Proceedings of the 22nd ACM International Symposium on High-Performance Parallel and Distributed Computing).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Open Access
  • Characterizing cloud applications on a Google data center

    Di, S., Kondo, D. & Cappello, F., 2013, Proceedings: International Conference on Parallel Processing - The 42nd Annual Conference, ICPP 2013. Institute of Electrical and Electronics Engineers Inc., p. 468-473 6 p. 6687380. (Proceedings of the International Conference on Parallel Processing).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Improving floating point compression through binary masks

    Gomez, L. A. B. & Cappello, F., 2013, Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013. IEEE Computer Society, p. 326-331 6 p. 6691591. (Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Multi-criteria checkpointing strategies: Response-time versus resource utilization

    Bouteiller, A., Cappello, F., Dongarra, J., Guermouche, A., Hérault, T. & Robert, Y., 2013, Euro-Par 2013 Parallel Processing - 19th International Conference, Proceedings. p. 420-431 12 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 8097 LNCS).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Open Access
  • Optimization of cloud task processing with checkpoint-restart mechanism

    Di, S., Robert, Y., Vivien, F., Kondo, D., Wang, C. L. & Cappello, F., 2013, Proceedings of SC 2013: The International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Computer Society, 64. (International Conference for High Performance Computing, Networking, Storage and Analysis, SC).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • SPBC: Leveraging the characteristics of MPI HPC applications for scalable checkpointing

    Ropars, T., Martsinkevich, T. V., Guermouche, A., Schiper, A. & Cappello, F., 2013, Proceedings of SC 2013: The International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Computer Society, 8. (International Conference for High Performance Computing, Networking, Storage and Analysis, SC).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Towards an energy estimator for fault tolerance protocols

    Diouri, M. E. M., Glück, O., Lefèvre, L. & Cappello, F., 2013, PPoPP 2013 - Proceedings of the 2013 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. p. 313-314 2 p. (Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • 2012

    A hybrid local storage transfer scheme for live migration of I/O intensive workloads

    Nicolae, B. & Cappello, F., 2012, HPDC '12 - Proceedings of the 21st ACM Symposium on High-Performance Parallel and Distributed Computing. p. 85-96 12 p. (HPDC '12 - Proceedings of the 21st ACM Symposium on High-Performance Parallel and Distributed Computing).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Damaris: How to efficiently leverage multicore parallelism to achieve scalable, jitter-free I/O

    Dorier, M., Antoniu, G., Cappello, F., Snir, M. & Orf, L., 2012, Proceedings - 2012 IEEE International Conference on Cluster Computing, CLUSTER 2012. IEEE Computer Society, p. 155-163 9 p. 6337776. (Proceedings - 2012 IEEE International Conference on Cluster Computing, CLUSTER 2012).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Open Access
  • Energy considerations in checkpointing and fault tolerance protocols

    Diouri, M. E. M., Glück, O., Lefèvre, L. & Cappello, F., 2012, 2012 IEEE/IFIP 42nd International Conference on Dependable Systems and Networks Workshops, DSN-W 2012. 6264670. (Proceedings of the International Conference on Dependable Systems and Networks).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Fault prediction under the microscope: A closer look into HPC systems

    Gainaru, A., Cappello, F., Snir, M. & Kramer, W., 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012. 6468487. (International Conference for High Performance Computing, Networking, Storage and Analysis, SC).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Hierarchical clustering strategies for fault tolerance in large scale HPC systems

    Bautista-Gomez, L., Ropars, T., Maruyama, N., Cappello, F. & Matsuoka, S., 2012, Proceedings - 2012 IEEE International Conference on Cluster Computing, CLUSTER 2012. IEEE Computer Society, p. 355-363 9 p. 6337798. (Proceedings - 2012 IEEE International Conference on Cluster Computing, CLUSTER 2012).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • HydEE: Failure containment without event logging for large scale send-deterministic MPI applications

    Guermouche, A., Ropars, T., Snir, M. & Cappello, F., 2012, Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012. p. 1216-1227 12 p. 6267924. (Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Open Access
  • Scalable Reed-Solomon-based reliable local storage for HPC applications on IaaS clouds

    Gomez, L. B., Nicolae, B., Maruyama, N., Cappello, F. & Matsuoka, S., 2012, Parallel Processing - 18th International Conference, Euro-Par 2012, Proceedings. p. 313-324 12 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 7484 LNCS).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Open Access
  • Taming of the shrew: Modeling the normal and faulty behaviour of large-scale HPC systems

    Gainaru, A., Cappello, F. & Kramer, W., 2012, Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012. p. 1168-1179 12 p. 6267920. (Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • 2011

    Adaptive event prediction strategy with dynamic time window for large-scale HPC systems

    Gainaru, A., Cappello, F., Fullop, J., Trausan-Matu, S. & Kramer, W., 2011, Managing Large-Scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques, SLAML'11. 4. (Managing Large-Scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques, SLAML'11).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Open Access
  • BlobCR: Efficient checkpoint-restart for HPC applications on IaaS clouds using virtual disk image snapshots

    Nicolae, B. & Cappello, F., 2011, Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis. 34. (Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Comparing archival policies for Blue Waters

    Cappello, F., Jacquelin, M., Marchal, L., Robert, Y. & Snir, M., 2011, 18th International Conference on High Performance Computing, HiPC 2011. IEEE Computer Society, 6152428. (18th International Conference on High Performance Computing, HiPC 2011).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Open Access
  • Event log mining tool for large scale HPC systems

    Gainaru, A., Cappello, F., Trausan-Matu, S. & Kramer, B., 2011, Euro-Par 2011 Parallel Processing - 17th International Conference, Proceedings. PART 1 ed. p. 52-64 13 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 6852 LNCS, no. PART 1).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • FTI: High performance fault tolerance interface for hybrid systems

    Bautista-Gomez, L., Komatitsch, D., Maruyama, N., Tsuboi, S., Cappello, F. & Matsuoka, S., 2011, Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis. 32. (Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Modeling and tolerating heterogeneous failures in large parallel systems

    Heien, E., Kondo, D., Gainaru, A., Lapine, D., Kramer, B. & Cappello, F., 2011, Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis. 45. (Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • On the use of cluster-based partial message logging to improve fault tolerance for MPI HPC applications

    Ropars, T., Guermouche, A., Uçar, B., Meneses, E., Kalé, L. V. & Cappello, F., 2011, Euro-Par 2011 Parallel Processing - 17th International Conference, Proceedings. PART 1 ed. Springer, p. 567-578 12 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 6852 LNCS, no. PART 1).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Open Access
  • Optimizing multi-deployment on clouds by means of self-adaptive prefetching

    Nicolae, B., Cappello, F. & Antoniu, G., 2011, Euro-Par 2011 Parallel Processing - 17th International Conference, Proceedings. PART 1 ed. p. 503-513 11 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 6852 LNCS, no. PART 1).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Open Access
  • Uncoordinated checkpointing without domino effect for send-deterministic MPI applications

    Guermouche, A., Ropars, T., Brunet, E., Snir, M. & Cappello, F., 2011, Proceedings - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011. IEEE Computer Society, p. 989-1000 12 p. 6012907. (Proceedings - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Open Access
  • 2010

    Checkpointing vs. migration for post-petascale supercomputers

    Cappello, F., Casanova, H. & Robert, Y., 2010, Proceedings - 39th International Conference on Parallel Processing, ICPP 2010. p. 168-177 10 p. 5599161. (Proceedings of the International Conference on Parallel Processing).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Distributed diskless checkpoint for large scale systems

    Gomez, L. B., Maruyama, N., Cappello, F. & Matsuoka, S., 2010, CCGrid 2010 - 10th IEEE/ACM International Conference on Cluster, Cloud, and Grid Computing. p. 63-72 10 p. 5493491. (CCGrid 2010 - 10th IEEE/ACM International Conference on Cluster, Cloud, and Grid Computing).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Low-overhead diskless checkpoint for hybrid computing systems

    Bautista Gomez, L., Nukada, A., Maruyama, N., Cappello, F. & Matsuoka, S., 2010, 17th International Conference on High Performance Computing, HiPC 2010. IEEE Computer Society, 5713163. (17th International Conference on High Performance Computing, HiPC 2010).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Message from the program committee co-chairs and steering committee chair NCA 2010

    Cappello, F., Schwefel, H. P. & Avresky, D. R., 2010, 2010 Ninth IEEE International Symposium on Network Computing and Applications. IEEE Computer Society, p. ix 1 p. (Proceedings - 2010 9th IEEE International Symposium on Network Computing and Applications, NCA 2010).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • On communication determinism in parallel HPC applications

    Cappello, F., Guermouche, A. & Snir, M., 2010, 2010 Proceedings of 19th International Conference on Computer Communications and Networks, ICCCN 2010. Institute of Electrical and Electronics Engineers Inc., 5560143. (Proceedings - International Conference on Computer Communications and Networks, ICCCN).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Open Access
  • Planning large data transfers in institutional grids

    Bouabache, F., Herault, T., Peyronnet, S. & Cappello, F., 2010, CCGrid 2010 - 10th IEEE/ACM International Conference on Cluster, Cloud, and Grid Computing. p. 547-552 6 p. 5493438. (CCGrid 2010 - 10th IEEE/ACM International Conference on Cluster, Cloud, and Grid Computing).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • 2009

    An information brokering service provider (IBSP) for virtual clusters

    Podesta, R., Iniesta, V., Rezmerita, A. & Cappello, F., 2009, On the Move to Meaningful Internet Systems: OTM 2009 - Confederated International Conferences, CoopIS, DOA, IS, and ODBASE 2009, Proceedings. PART 1 ed. p. 165-182 18 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 5870 LNCS, no. PART 1).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • BLAST application with data-aware desktop grid middleware

    He, H., Fedak, G., Tang, B. & Cappello, F., 2009, 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, CCGRID 2009. p. 284-291 8 p. 5071883. (2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, CCGRID 2009).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Cost-benefit analysis of cloud computing versus desktop grids

    Kondo, D., Javadi, B., Malecot, P., Cappello, F. & Anderson, D. P., 2009, IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium. 5160911. (IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Emulation platform for high accuracy failure injection in grids

    Herault, T., Jan, M., Largillier, T., Peyronnet, S., Quetier, B. & Cappello, F., 2009, High Speed and Large Scale Scientific Computing. IOS Press BV, p. 127-140 14 p. (Advances in Parallel Computing; vol. 18).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • High accuracy failure injection in parallel and distributed systems using virtualization

    Hérault, T., Largillier, T., Peyronnet, S., Quétier, B., Cappello, F. & Jan, M., 2009, Proceedings of the 6th ACM Conference on Computing Frontiers, CF 2009. p. 193-196 4 p. (Proceedings of the 6th ACM Conference on Computing Frontiers, CF 2009).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Message from the Program Co-Chairs

    Cappello, F. & Wang, C. L., 2009, 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, CCGRID 2009. IEEE Computer Society, p. xiv (2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, CCGRID 2009).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • MPI applications on grids: A topology aware approach

    Coti, C., Herault, T. & Cappello, F., 2009, Euro-Par 2009 Parallel Processing - 15th International Euro-Par Conference, Proceedings. p. 466-477 12 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 5704 LNCS).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • 2008

    A distributed and replicated service for checkpoint storage

    Bouabache, F., Herault, T., Fedak, G. & Cappello, F., 2008, Making Grids Work - Proceedings of the CoreGRID Workshop on Programming Models Grid and P2P System Architecture Grid Systems, Tools and Environments. Springer, p. 295-306 12 p. (Making Grids Work - Proceedings of the CoreGRID Workshop on Programming Models Grid and P2P System Architecture Grid Systems, Tools and Environments).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution