Master's

Work details

Title: Wear-out Analysis of Error Correction Techniques in Phase-Change Memories

Supervisors: Guido Araújo and Rodolfo Azevedo.

Place of development: Computer Systems Laboratory (LSC) - University of Campinas (UNICAMP) - Brazil

You can find a PDF copy of my dissertation (in Portuguese) here, in the digital library of UNICAMP. You may also find it interesting to look at this Technical Report, which contains much of the work done in my master's.

Finally, you may want to read this paper published at Design, Automation & Test in Europe (DATE) 2014.


About the software used:
  • SPEC CPU2006 v1.2 and v1.0.
  • A cache simulator developed using PIN Tool.
  • PCM lifetime simulator written in Python.
  • Analytical probabilistic model calculator written in Python.

About the infrastructure:

  • A cluster integrating a dozen computers with Intel Xeon, AMD Opteron, and Intel Quad Core processors, totaling more than 100 cores.
  • Condor software for queue and schedule management.
  • Ubuntu 10.04.2 LTS.
The cache and PCM lifetime simulators were initially developed by my co-supervisor; I extended them, adding features needed for my work.

SPEC execution

One of the challenges of this work was evaluating bit-flip probability with the cache simulator, since instrumenting SPEC2006 proved to be complicated. SPEC2006 can be set up to run a PIN Tool during its execution, and it can also parallelize the execution of its benchmarks. However, it does not parallelize the inputs of a benchmark: each input is executed one after the other, which in my case would have taken weeks to finish.

To speed up each benchmark execution I created two bash scripts. Given a local folder in your user area, the script illuminati.sh copies the essential files needed to run SPEC2006 locally. The files to be copied are listed in illuminati.csv.
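The copy step can be sketched roughly as follows. This is not the actual illuminati.sh; it assumes a hypothetical CSV format of one relative path per line under the SPEC installation root.

```shell
#!/bin/bash
# Minimal sketch of a copy step like illuminati.sh (hypothetical CSV format:
# one relative path per line, resolved against the SPEC installation root).
SPEC_ROOT="$1"   # e.g. the SPEC2006 installation directory
DEST="$2"        # local working folder in your user area
while IFS= read -r relpath; do
    mkdir -p "$DEST/$(dirname "$relpath")"     # recreate the directory layout
    cp -r "$SPEC_ROOT/$relpath" "$DEST/$relpath"
done < illuminati.csv
```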

After that, given a PIN Tool -- like this one, an example available in the PIN Tools kit -- the script run_pcm_spec.sh (for SPEC2006 v1.0; run_pcm_spec-v1.2.sh for SPEC2006 v1.2) must be executed to create a Condor submit file for every benchmark input. Each of these files runs one benchmark input as a single Condor process. A file that submits all those Condor files is created as well.
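The generation step can be illustrated with a rough sketch like the one below. The benchmark names, inputs, paths, and the pin command line are all illustrative, not what run_pcm_spec.sh actually emits; only the Condor submit-file keywords (universe, executable, arguments, output, error, log, queue) are standard.

```python
# Sketch: emit one Condor submit file per (benchmark, input) pair, plus a
# helper script that submits them all. Paths and job lists are illustrative.
import os

PIN = "/opt/pin/pin"                        # hypothetical PIN location
TOOL = "/opt/pin/source/tools/cache_sim.so" # hypothetical PIN Tool

jobs = {  # hypothetical benchmark -> inputs mapping
    "401.bzip2": ["input.program", "chicken.jpg", "liberty.jpg"],
    "429.mcf": ["inp.in"],
}

def write_submit_files(outdir):
    os.makedirs(outdir, exist_ok=True)
    submit_files = []
    for bench, inputs in jobs.items():
        for i, inp in enumerate(inputs):
            name = f"{bench}.{i}"
            path = os.path.join(outdir, name + ".submit")
            with open(path, "w") as f:
                f.write(
                    "universe = vanilla\n"
                    f"executable = {PIN}\n"
                    f"arguments = -t {TOOL} -- ./{bench} {inp}\n"
                    f"output = {name}.out\n"
                    f"error = {name}.err\n"
                    f"log = {name}.log\n"
                    "queue\n"
                )
            submit_files.append(path)
    # one script that submits every job to Condor
    with open(os.path.join(outdir, "submit_all.sh"), "w") as f:
        f.writelines(f"condor_submit {p}\n" for p in submit_files)
    return submit_files
```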

Some details: the example PIN Tool was modified to write its output file to a specified folder, which is a required parameter. The whole SPEC2006 suite must already have been compiled. The run_pcm_spec scripts will report errors for one or two benchmarks, because those benchmarks' compiled binaries have names different from the benchmark names. To solve the issue, go to the build (or run) folder created after compiling SPEC and make a copy of each such binary, naming it after the benchmark. If you are using version 1.0, the available patches have to be applied.

Analytical probabilistic models

First, I would like to talk about the motivation behind modeling the Error Correction Techniques (ECTs). The cluster used to simulate each ECT in this work has more than 100 cores, and each core performed ten simulations -- so each ECT was simulated more than 1000 times. However, the same result (with less than 0.5% error) was achieved by running the analytical probabilistic model calculator on a single notebook core, and it took only a few minutes. =)
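To give a flavor of why the analytical route is so much cheaper, here is a toy calculation in the same spirit, not the dissertation's actual model: if an ECP-style technique corrects up to n failed cells per 512-bit line, the probability that a line is still usable for a given per-cell failure probability p is just a binomial tail, evaluated instantly instead of averaged over thousands of simulations.

```python
# Illustrative analytical calculation (not the dissertation's exact model):
# if a technique corrects up to n failed cells per 512-bit line, line
# survival probability is the binomial CDF P(X <= n), X ~ Binomial(512, p).
from math import comb

def line_survival(p, bits=512, n=6):
    return sum(comb(bits, k) * p**k * (1 - p)**(bits - k)
               for k in range(n + 1))

print(line_survival(1e-3))  # very close to 1 for small p
```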

The analytical probabilistic model calculator is available here.

The file contains rough code -- uncommented and not very organized -- that I used to compute the analytical models of ECP and SECDED. It also contains my attempts to compute the models of SAFER, DRM, and FREE-p.

The propagefailure.py program calculates the probability of a given configuration of the generalized birthday problem (McKinney, E. H., The American Mathematical Monthly, Vol. 73, No. 4, Apr. 1966). It is used to pre-compute the page failure probabilities of one of the ECP models.
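The generalized birthday problem asks for the probability that, when n items fall into d equally likely bins, some bin receives at least r items. propagefailure.py computes this exactly; the Monte Carlo sketch below (function name and structure are mine, not from that file) just illustrates the quantity being computed.

```python
# Monte Carlo sketch of the generalized birthday problem: probability that
# throwing n items into d equally likely bins leaves some bin with >= r items.
# (propagefailure.py computes this exactly; this is only an illustration.)
import random
from collections import Counter

def p_collision(n, d, r, trials=20000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        counts = Counter(rng.randrange(d) for _ in range(n))
        if max(counts.values()) >= r:
            hits += 1
    return hits / trials

# Classic birthday problem (n=23, d=365, r=2): roughly 0.5.
print(p_collision(23, 365, 2))
```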

The pcmStatSim.py program receives a model configuration, like those in the data folder, and runs the selected model located in the mod folder. The output is given as a profile. Some undefined outputs may appear and make no sense; they were a kind of debugging I did.

I still have to organize everything I used (I really would like to do that), but only in the near future. By the end of the year, everything posted here will likely be commented and clearer -- do not believe that, I said the same a year ago and nothing has happened so far. =P