
Project Stage 3: Optimization

     Finally, the time has come to implement my optimization plan and see whether my theories hold true. As detailed in that plan, I first looked at compiler flags to see if there were any significant gains to be had while still maintaining full functionality. My first step was to read through the Makefile to understand where I needed to make my changes and test the timings. This proved to be somewhat of a challenge, as the Makefile for ssdeep is very extensive, with a variety of conditional checks and error handling. Having read through the first chunk, I quickly realized that I was not equipped to fully understand everything the Makefile was doing. I decided instead to open the Makefile in less and search for "flags", which led me to a section that looked promising:

CFLAGS = -g -O2
CPP = gcc -E
CPPFLAGS = -I/usr/local/include
CXX = g++
CXXCPP = g++ -E
CXXDEPMODE = depmode=gcc3
CXXFLAGS = -g -O2

     My first test is simply to kick the optimization up to -O3 to see if the extra optimization passes yield a measurable speedup.
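     A minimal sketch of that test, assuming a rebuild from the top of the source tree (testfiles/ is a stand-in for whatever input set is being hashed):

make clean
make CFLAGS="-g -O3" CXXFLAGS="-g -O3" -j4
time ./ssdeep -r testfiles/ > /dev/null

Overriding the variables on the make command line avoids editing the generated Makefile at all, since command-line assignments take precedence over assignments inside the Makefile.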

Lab 5: Algorithm Selection

     For this lab I was tasked with comparing the performance of three different volume-scaling algorithms under various conditions. Instead of true volume scaling, this lab creates an array of 10 million integers, which is then scaled by 0.75 and stored in a new array. The three approaches are:

- Multiply each value by the float 0.75
- Pre-calculate all possible values multiplied by the scaling factor (0.75) and run a look-up for each value
- Convert the scaling factor into an integer by multiplying it by a binary number representing a fixed-point value of 1, then scale with integer math and a bit shift

To isolate the scaling loop I will run each method once without scaling and then once again with the scaling loop included, taking the difference as a representation of the time spent in that algorithm.

Method 1:
Method 1 is the naive implementation that was provided to us initially in the lab. It simply runs through each value, multiplies it, and stores the new value in the output array. The slowdown here comes from performing a floating-point multiplication on every single value.
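     To make the three approaches concrete, here is a minimal sketch of each scaling function, assuming 16-bit signed samples and an 8-bit fixed-point shift (the function names and sample type are my own choices, not the lab's exact code):

#include <stdint.h>
#include <stddef.h>

#define SCALE 0.75f

/* Method 1: naive float multiply on every value. */
void scale_float(const int16_t *in, int16_t *out, size_t n) {
        for (size_t i = 0; i < n; i++)
                out[i] = (int16_t)(in[i] * SCALE);
}

/* Method 2: precompute every possible scaled value once, then
   replace the per-value multiply with a table lookup. */
void scale_lookup(const int16_t *in, int16_t *out, size_t n) {
        static int16_t table[65536];
        for (int32_t v = -32768; v <= 32767; v++)
                table[(uint16_t)v] = (int16_t)(v * SCALE);
        for (size_t i = 0; i < n; i++)
                out[i] = table[(uint16_t)in[i]];
}

/* Method 3: turn 0.75 into the fixed-point integer 192 (0.75 * 256),
   multiply in integer math, then shift right 8 bits to scale back. */
void scale_fixed(const int16_t *in, int16_t *out, size_t n) {
        const int32_t factor = (int32_t)(SCALE * 256);
        for (size_t i = 0; i < n; i++)
                out[i] = (int16_t)((in[i] * factor) >> 8);
}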

Lab 4: Assembler

     When first starting this lab I was excited to dive into assembler, as it gives a granular understanding of all the steps involved in processes that are taken for granted in higher-level languages. So after reading through the basic instructions used in assembler, I began the task of writing a program that loops from 0 - 30, printing "Loop: X" with X being the current loop count.
     I immediately realized the significant difference between assembler and every other language I have coded in before. Instead of simply declaring variables, doing calculations however I want, and dumping the results to the screen, it's important to line up all of the data I want before executing any command, and this lab really highlights that process. So we know the end result is a string printed 30 times; step 1 is getting a string to print. I created a .data section to hold both my string and the length of that string. This is important because, unlike in C, I can't simply replace X with the current loop value.
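     For contrast, the entire program in C is only a few lines (my own illustration, not part of the lab); printf quietly handles the counter-to-text conversion and data placement that the assembler version has to arrange by hand in its .data section:

#include <stdio.h>

int main(void) {
        /* printf converts the counter to digits for us; in assembler,
           that conversion and the write syscall are done by hand. */
        for (int i = 0; i < 30; i++)
                printf("Loop: %d\n", i);
        return 0;
}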

Project Stage 2: Benchmarks

     For stage 2 of the project I built my package on the systems I plan to test on, and then ran a standardized test a number of times to get a performance benchmark I can use to measure the impact of my code changes. Since I am going to be testing on both x86_64 and AArch64, I needed to mimic the build process I followed in stage 1 to have ssdeep on all required systems. First up was Aarchie: after using wget to retrieve the tarball from ssdeep's website, I used tar -xvf to unpack it into my project directory. From there a simple ./configure followed by make -j4 (-j4 to allow 4 concurrent jobs to run) resulted in a complete build with the ssdeep executable ready to test. My smoke test was simply calling ./ssdeep on the tarball, and I was returned:

ssdeep,1.1--blocksize:hash:hash,filename
12288:IvvRytd0lLS4iXBpfBof3msXd2B1mOfw68Tsd:IvZvlLS40n02st2/mOt8od,"/home/cmcmanus/proj/ssdeep-2.14.1.tar.gz"

     Thus my Aarchie build passed my smoke test and it was ready for benchmarking.
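     As a rough sketch of what a standardized, repeatable test can look like (testfiles/ is a stand-in for the actual input set): run the same hash job several times and record each timing, e.g.

for i in 1 2 3 4 5; do time ./ssdeep -r testfiles/ > /dev/null; done

discarding the first run if file caching makes it an outlier.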

Project Stage 1: Planning

     For the first stage of my final project I had to select a hash, checksum, or compression package to try to optimize. The package also had to be written in a language that compiles down to machine code, like C, C++, or assembler. Each student needed to select a different package to tackle. To keep as many possibilities for optimization open as possible, I decided to look on both Aarchie and Xerxes for packages available on both.
     I began my search by running dnf search for hash, checksum, and compression, which gave me a ton of results. After sifting through these I first thought to focus on zpaq, but discovered its archiving and journaling processes were pretty complex and difficult to read through, so I moved on. Next I was going to work with SpookyHash, but after building and profiling it I discovered one of my classmates had beaten me to the punch.
     Thus I landed on ssdeep, a program used to take input files and create a context triggered piecewise hash, or fuzzy hash. It can also compare signatures for similarity between files.

Lab 3: Compiled C Code

     In this lab I investigated the transformation of my code from source C code to C compiler output, to learn exactly what the compiler is doing when I call it. My initial program was a simple Hello World program:

#include <stdio.h>

int main() {
        printf("Hello World!");
}

     My first compilation was a simple gcc with most of the optimization and fancy tricks stripped out. This was done with the -O0 and -fno-builtin flags: -O0 meaning do not use any optimizations when compiling, and -fno-builtin meaning do not use any built-in function optimizations. So: gcc -g -O0 -fno-builtin -o hello1 hello.c
     The compiler then returns an Executable and Linkable Format file (ELF for short) named hello1, which I used to examine what it had done to my code. The first thing I noticed, after using objdump to look at the assembly code that was output, was that my simple four-line program had ballooned out to 199 lines, which led me to wonder how much bigger my ELF file was than my source.
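     (The listing itself comes from objdump; a minimal invocation such as objdump -d hello1 disassembles the executable sections and produces a comparable view, and ls -l hello1 hello.c answers the size question.)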

Lab 2B: Building glibc

      For part B of Lab 2 I was tasked with building my own version of glibc and then introducing a bug into it, to ensure I am using my build instead of the installed version. Glibc is GNU's implementation of the standard C library, which underpins nearly every program on a Linux system, so it is very important that my build and tests don't affect the system's version of glibc, or my classmates will not be happy with me.
     The first step is to get the source code for glibc. Since I used wget and tar for part A, I will use git this time to get a feel for both methods. Running git clone git://sourceware.org/git/glibc.git gave me the source code to build from, but before building there is an important step for glibc, which is to create a separate directory to build in, as glibc uses two parallel trees for source and build code. This is nice because if I mess up my build I can just clear out that directory and still have the source code to start my build again without having to re-download it.
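     A minimal sketch of that layout, with the directory names as my own choices:

mkdir glibc-build
cd glibc-build
../glibc/configure --prefix=/usr
make -j4

make install is never run, so the system copy stays untouched; the build directory also gains a testrun.sh script that runs a given program against the freshly built library instead of the installed one.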