Project Stage 2: Benchmarks

     For stage 2 of the project I built my package on the systems I plan to test on and then ran a standardized test a number of times to get a performance benchmark that I can use to measure the impact of my code changes. Since I am going to be testing on both x86_64 and AArch64, I needed to repeat the build process I followed in stage 1 to have ssdeep on all required systems. First up was Aarchie. After using wget to retrieve the tarball from ssdeep's website, I used tar -xvf to unpack it into my project directory. From there a simple ./configure followed by make -j4 (-j4 allows 4 jobs to run concurrently) resulted in a complete build with the ssdeep executable ready to test. My smoke test was simply calling ./ssdeep on the tarball, which returned:

ssdeep,1.1--blocksize:hash:hash,filename
12288:IvvRytd0lLS4iXBpfBof3msXd2B1mOfw68Tsd:IvZvlLS40n02st2/mOt8od,"/home/cmcmanus/proj/ssdeep-2.14.1.tar.gz"

     My Aarchie build passed the smoke test, so it was time to move on to Yaggi, the new x86_64 machine. I am using Yaggi rather than xerxes because xerxes would limit my ability to use the perf profiling tool. I ran through the same process as above and my smoke test results were:

ssdeep,1.1--blocksize:hash:hash,filename
12288:IvvRytd0lLS4iXBpfBof3msXd2B1mOfw68Tsd:IvZvlLS40n02st2/mOt8od,"/home/cmcmanus/proj/ssdeep-2.14.1.tar.gz"

     Both builds produced the same hash for the same file, which confirmed that they behave the same way, so I could move on and start my benchmarks. The first step was to establish a standard set of data so that all my tests operate on the same input, which will allow me to meaningfully compare my results. To get ssdeep to run long enough for fluctuations in performance to be noticeable, I decided to use 1 GB of mixed data (text, images and video). The mixed data will show me whether ssdeep handles different types of data in different functions when I profile it. ssdeep also supports recursive hashing, so I broke my data down into a multi-folder structure to maximize its complexity and expose any potential areas of improvement.
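     A quick way to double-check that a data set like this really is about 1 GB and really is mixed is to walk the folder tree and total the bytes per file extension. The following is only a minimal sketch: the function name inventory and the idea of grouping by extension are my own assumptions for illustration, not part of ssdeep or of the steps described above.

```python
import os
from collections import defaultdict


def inventory(root):
    """Walk root recursively; return (total_bytes, bytes_per_extension)."""
    per_ext = defaultdict(int)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so nothing is counted twice
            size = os.path.getsize(path)
            # Files with no extension are grouped under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            per_ext[ext] += size
            total += size
    return total, dict(per_ext)
```

     Calling inventory on the top-level data folder and dividing the total by 1e9 gives the size in GB, and the per-extension totals show how the data is split between text, images and video.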

     With my data and builds set up, it was time to begin running my benchmark tests. There were, however, a few things to keep in mind before I ran the first command. The first was the potential for resources to be taken by another user while I was running my benchmarks. To reduce this risk I ran the who command before I ran my test to see how much traffic was on the machine. If more than two other users were logged in, I switched to the other machine and checked there, and on the odd occasion where both were busy I waited an hour and checked both again. Once the traffic was low I used the top command to make sure that those users were not running tests of their own. Checking both of these conditions does not entirely rule out resource fluctuations during my testing, but it minimizes the chance of a conflict with another student. The other factor I needed to consider was the potential for caching to affect performance. To combat this I ran my test 3 times before recording data, allowing any caching to settle so that I got more consistent timing.
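     The "no more than two other users" rule can be written down as a small check on the text that who prints. This is a rough sketch rather than the exact procedure I followed: the helper names other_users, machine_is_quiet and current_who_output are hypothetical, and it assumes the login name is the first field on each line of who output.

```python
import subprocess


def other_users(who_output, me):
    """Return the set of login names in `who` output, excluding `me`."""
    names = set()
    for line in who_output.splitlines():
        fields = line.split()
        # Assumption: the first whitespace-separated field is the login name.
        if fields and fields[0] != me:
            names.add(fields[0])
    return names


def machine_is_quiet(who_output, me, max_others=2):
    """True when no more than max_others other users are logged in."""
    return len(other_users(who_output, me)) <= max_others


def current_who_output():
    """Run the real `who` command and return what it printed."""
    result = subprocess.run(["who"], capture_output=True, text=True, check=True)
    return result.stdout
```

     A user with several open sessions is counted once. A check like this only covers the who step; looking at top to see whether those users are actually running anything is still a manual judgement.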

     Having accounted for these outside factors and protected the measurements from them as much as possible, I began running tests using elapsed time as the measure of performance. To capture the timing data I used perf stat, which reports how long the program ran and how many cycles it used. Below is a list of results:

     Aarchie:

  1. 20.902915061 seconds time elapsed
  2. 20.862901730 seconds time elapsed
  3. 20.899303422 seconds time elapsed
  4. 20.882091638 seconds time elapsed
  5. 20.873999518 seconds time elapsed
  6. 20.869303300 seconds time elapsed
  7. 20.880638535 seconds time elapsed
  8. 20.887912284 seconds time elapsed
  9. 20.869638112 seconds time elapsed
  10. 20.894045743 seconds time elapsed
  11. 20.885457030 seconds time elapsed
  12. 20.882969729 seconds time elapsed
  13. 20.863210255 seconds time elapsed
  14. 20.890889080 seconds time elapsed
  15. 20.850227428 seconds time elapsed
     Average: 20.879700191 seconds time elapsed

     Yaggi:
  1. 8.051408066 seconds time elapsed
  2. 8.054790075 seconds time elapsed
  3. 8.054916242 seconds time elapsed
  4. 8.029347008 seconds time elapsed
  5. 8.067165928 seconds time elapsed
  6. 8.049327826 seconds time elapsed
  7. 8.057928442 seconds time elapsed
  8. 8.049270394 seconds time elapsed
  9. 8.051785455 seconds time elapsed
  10. 8.087810967 seconds time elapsed
  11. 8.054713720 seconds time elapsed
  12. 8.047773721 seconds time elapsed
  13. 8.073283661 seconds time elapsed
  14. 8.048059428 seconds time elapsed
  15. 8.050254100 seconds time elapsed
     Average: 8.0551890022 seconds time elapsed
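     The elapsed times listed above are the "seconds time elapsed" lines from perf stat. Collecting them by hand gets tedious, so here is a minimal sketch of how the warm-up runs and timed runs could be scripted. The function names are hypothetical, and the default of 15 timed runs is an assumption taken from the length of the lists above. It relies on perf stat writing its report to stderr.

```python
import re
import subprocess

# Matches lines such as "8.051408066 seconds time elapsed".
ELAPSED_RE = re.compile(r"([0-9]+\.[0-9]+)\s+seconds time elapsed")


def parse_elapsed(perf_output):
    """Extract the 'seconds time elapsed' value from perf stat output."""
    match = ELAPSED_RE.search(perf_output)
    if match is None:
        raise ValueError("no 'seconds time elapsed' line found")
    return float(match.group(1))


def benchmark(command, warmups=3, runs=15):
    """Run command under perf stat; discard warm-ups, return timed runs."""
    times = []
    for i in range(warmups + runs):
        result = subprocess.run(
            ["perf", "stat", "--"] + command,
            stdout=subprocess.DEVNULL,  # the hashes themselves are not needed
            stderr=subprocess.PIPE,     # perf stat writes its report here
            text=True,
            check=True,
        )
        if i >= warmups:  # the first `warmups` runs only settle the cache
            times.append(parse_elapsed(result.stderr))
    return times
```

     A call might look like benchmark(["./ssdeep", "-r", "testdata"]), where testdata is a placeholder name for the data folder and -r is ssdeep's recursive mode.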

     While running my tests I did get a few data points that fell relatively far from most of the other results. In these cases I ran top again to see if I could identify the cause and, as expected, another user had begun their own tests. When calculating the averages I therefore excluded every time that differed from the median by more than 0.2 seconds, as these were obvious outliers that would skew the result. This still left some variation from run to run, but perf stat breaks the timing into user and system time, and most of the variation came from system time, which suggests to me that it is caused by other system processes running at the same time as my test.
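     The outlier rule is easy to express in code. This sketch assumes the 0.2 cutoff is measured in seconds from the median. It uses the Yaggi times listed above plus one made-up slow run of 8.9 seconds, which is purely hypothetical and only there to show the filter dropping something.

```python
from statistics import mean, median


def filter_outliers(times, threshold=0.2):
    """Keep only the times within `threshold` seconds of the median."""
    mid = median(times)
    return [t for t in times if abs(t - mid) <= threshold]


# Yaggi results from the list above.
yaggi = [
    8.051408066, 8.054790075, 8.054916242, 8.029347008, 8.067165928,
    8.049327826, 8.057928442, 8.049270394, 8.051785455, 8.087810967,
    8.054713720, 8.047773721, 8.073283661, 8.048059428, 8.050254100,
]

raw = yaggi + [8.9]  # hypothetical outlier, not a measured value
kept = filter_outliers(raw)
print(f"kept {len(kept)} of {len(raw)} runs, mean {mean(kept):.9f} s")
```

     The median is used as the reference point rather than the mean because a single very slow run pulls the mean toward itself but barely moves the median.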

     These benchmarks have given me a good indication that there is room for improvement in the AArch64 code, so that is where I will start my testing process. I will keep the x86_64 version in mind when testing my changes, though, to ensure I don't slow it down while I speed AArch64 up.
