This paper describes a measurement-based performability model based on error and resource usage data collected on a multiprocessor system. A method for identifying the model structure is introduced and the resulting model is validated against real data. Model development from the collection of raw data to the estimation of the expected reward is described. Both normal and error behavior of the system are characterized. The measured data show that the holding times in key operational and error states are not simple exponentials and that a semi-Markov process is necessary to model the system behavior. A reward function, based on the service rate and the error rate in each state, is then defined in order to estimate the performability of the system and to depict the cost of different types of errors.
ASJC Scopus subject areas
- Theoretical Computer Science
- Hardware and Architecture
- Computational Theory and Mathematics