A look at Mach-O

1 Comment

Yesterday (2010-11-06), when I took a break from studying, I started looking at the Mach-O Format used by Apple’s Mac OS X.  It is a format for executable,  similar in purpose to the PE Format (Portable Executable) that is used by Microsoft Windows. That said both are not limited just to runnable executable but could also be a dynamic library (Usually denoted by dll extension for/on Windows). I started working on a simple C program, were you provide it the path to a Mach-O binary and it would print on the information. I am aware there is objdump and a few other tools which will do this but I was after learning the format, not after a way to decode it, and none of them had a nice modular/library interface to being able to do such a thing anyway.

The binary I was looking at was “Mach-O fat file with 2 architectures” but more on this later. I was able to use 7-zip to extract one of the architecture contained in it out. The format is documented on the Apple Developer site, and the main structure is as follows:

struct mach_header macho_header_struct
{
    uint32_t magic;
    cpu_type_t cputype;
    cpu_subtype_t cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
};

 

Ignoring the error checking, the actual workload looked something like this:

struct mach_header header;
bufferSize = fread (&header,1,sizeof(header), file);
printf("mach object header\n"); 
printf("magic number\t%08x\n", header.magic); 
printf("cputype\t\t%08x\t%s\n", header.cputype, cputype_to_string(false, header.cputype)); 
printf("subcputype\t%08x\n",  header.cpusubtype); 
printf("filetype\t%08x\t%s\n",  header.filetype, filetype_to_string(header.filetype)); 
printf("ncmds\t\t%08x\n",  header.ncmds); 
printf("sizeofcmds\t%08x\n",  header.sizeofcmds); 
printf("flags\t\t%08x\n",  header.flags); 
 

Once I had that working, the next step I choose to take was to look into the “fat” file again. Since I had the above in place I realised that if it ias a “fat” executable it has a different magic number and it matches the magic number defined in the documentation. Instead of a mach_header there is a fat header which has the magic number and the number of architectures. Straight after that structure in the file is an array of structures containing the information about where to find the mach_header for each architecture. It would seem the Apple marketing term for a “fat” executable is a Universal binary as it can be used on more then one platform.

This diagram shows the layout of data for an fat executable, it shows the fields in the fat_header and fat_arch structures. Something I found interesting was the offset, the offset is from the start of the file and not from the offset location or at the least the start of the fat_arch structure. After the fat_arch should be the first mach_header structure. Something that may be interesting to follow up is if the mach_header comes straight after the last fat_arch structure or is there padding/zeroed space or some other undocumented metadata that I haven’t came across in my studies so far. Since the location is based from the offset it would suggest there no requirement for it to be straight after.

macho_header

I then was able to seek to the offset of one of the fat_arch and then the code above I had previously written for just reading the mach_header and printing it out, worked. The next step from here is to decode the load commands into the appropriate structures, a problem here is there are 37 structures. I can see two ways of dealing with this, the first way is create each different structure read it, then I can also print out information while I work but this requires doing it for all 37 structures. The second option is generically allocate it, since the command structure all have 2 fields the type and the size, so the size could be used to allocate the amount of memory it requires, then it can just read in the structure as bytes, delaying the structure till later where it can be casted on use. Another aspect I will need to think about soon is and appropriate API and how to go about making the current stuff more modular, so for example for a fat binary it will give the array of fat_arch structures, then that can be used to then get a mach_header, followed by a load_commands or maybe even just get_commands, that is just some food for my thoughts.

Advertisements

Fibonacci in x86 Assembly

Leave a comment

On the 31st of October 2009, sometime in the afternoon, the topic of Fibonacci came up, on the computer science channel for the University of Adelaide. I think the weeks coding activity for first years was to write a recursive Fibonacci algorithm. That said, future post will be why recursive Fibonacci is the wrong way of thinking about the problem. I was in my third year and for fun I decided that instead I would write an iterative one, in p in C. After that was done was still not happy with that as being an end point so I decided it was time to learn a little bit of X86 assembly. So the logical step here was to translate the algorithm I wrote to x86 assembly which goes a little as follows:

fib.x86.asm

        .text
.globl _fib

# Fibonacci in x86 assembly by Sean Donno <darkdonno@gmail.com> 
#
# Paramter n is in  0x8(esp) store in ebx
# Return via eax (last)
# ecx = n  (loaded from 0x(esp)
# ebx = secondlast
# eax = last  (and the value returned)
 
_fib:
    pushl %ebp
    movl  %esp, %ebp
    pushl %ebx
    pushl %ecx
    # Saved Registers
 
    movl 8(%ebp), %ecx          # ecx = n
    movl $0, %ebx               # ebx = secondlast = 1
    movl $1, %eax               # eax = last = 0
    jmp    fib2
fib1:
    add  %ebx, %eax             # last = last + secondlast
    neg  %ebx
    add  %eax, %ebx             # secondlast = secondlast + last
    dec  %ecx                   # n = n  1
fib2:
    cmp  $0, %ecx
    jne  fib1                   # if n != 0 goto fib1
 
    # Restore registers and return
    popl %ecx
    popl %ebx
    popl %ebp 
    ret

Code available on Google Docs at https://docs.google.com/Doc?docid=0AXW7elIWGgADZGR6ZHJkMl8xMzd6ZHBxNmhk&hl=en, I really wish WordPress had support this JavaScript syntax highlighter so it provides a a plain version, and can do line numbers, and colouring on the fly, it might and I just haven’t looked hard enough. At the time of writing this i am using hosting at wordpress.com so to the best of my knowledge installing any plugins are not possible.

The comments make more sense when you can compare it to the C version, a while later I followed this up with a version for DLX, for those who haven’t gone to the University of Adelaide (UofA), DLX is a a RISC processor designed by Hennessy and Patterson, who worked on the MIPS architecture, and its a slightly slimmer version of MIPS, that is taught in the Computer Systems course at UofA.  For some comparison the DLX version is 10 instructions, but passes arguments via registers instead of a stack, which is what 7 of the instructions is the x86 version does. There are 15  lines of comments in the traditional ‘function’ commenting style used in the course, which outnumbers the actual lines of code.

SEGP – Fourth Milestone – 1

1 Comment

This post documents the work done for the fourth milestone on the Software Engineering Group Project done at the University of Adelaide in association with ASTC as follow up to the Summer Project I worked on. 

The first part of the follow up work involved improving the tools that I had written during summer for collecting results such that more information could be extracted and formulated. Another thing that was done was the addition of using oprofile to determine which blocks the QEMU spends the most time in for a given test program. This means its easier to determine where time is spent the most and can point people into the right direction of what instructions and blocks to look at getting translatable next. The final thing was the addition of new instructions, which were reverse subtract and exclusive or. The addressing modes for using ASR and LSL were looked at and implementation provided, along with a implementation for the specialised zero case but none of the tests we run use that so those cases are disabled. The big one was the push instruction, out of two variations that were implemented only one works to date.

The plan was to remove the memcpy of LLVM Blocks by allowing qemu to jump out to the location were LLVM generates the native code for a block and then returning to the epilogue for that block, with the hope to remove the locking/blocking. For extended detail on this I will write the post at a later date. (*IF NOT PUT IT HERE AND SEPERATE THIS)

The first task to do was to first of all calculating the total time spent in the critical section

This was added in ext-blk-cmpl.c

#include <time.h>
	uint64_t ext_blk_time_spent_processing = 0;

 

After the lock is successful in ext_blk_cmpl_replace_block then the current time is recorded.

	struct timeval tv;
	gettimeofday(&tv, NULL);
	uint64_t start_time = (tv.tv_sec * 1000000) + tv.tv_usec;

After the lock is released update the total time spent by adding on the difference in current time with start time along with a print out of the time so far.

	
	gettimeofday(&tv, NULL);
	ext_blk_time_spent_processing += (tv.tv_sec * 1000000) + tv.tv_usec - start_time;
	printf("Acc Time spent blocked processing the jobs: %llu\n",ext_blk_time_spent_processing);

This has a higher precision, i first did it with just time which is second precision and it just read as zero. The result on my machine from that is the final was 36673 microseconds which is 0.036673 seconds. That is a lot smaller then expected,

Acc Time spent blocked processing the jobs: 36673

Research – Exploring the TranslationBlock structure and epilogue.The structure has a field unit8_t *tc_ptr, this is a pointer to the translated code. (The Address in the code cache). What we are after is weather or not it is read from and stored elsewhere, such as in the epilogue of another block. An quick check with ack-grep for tc_ptr reveals 43 lines in which tc_ptr appears. 28 are pointer references (->). 15 out of the 43 of them were in the ext-blk-cmpl which is created by our work so easily to handle, leaving 28 references in translate-all.c, exec-all.h, exec_c, cpu-exec.c, two of which are in comments.

 

So we have 28 occurrences of tc_ptr, and of those there are 13 that use pointer referencing. Three of them are uint8_t *tc_ptr so eliminating that brings us to 25, eliminating the two .tc_ptr which are in comments is down to 23.Remove another 2, one for a function with argument tc_ptr and unsigned long tc_ptr; and one is doing the disassembly of the code to the log file so that brings us to 20 possible usages of tc_ptr that may result in a problem. For the exact command I did, the result after wards is not exactly but i just modified it so it was blog friendly.

ack-grep tc_ptr | grep "\.[ch]" | grep -v ext-blk-cmpl | grep -v \*tc_ptr | grep -v "]\.tc" | grep -v "log_disas" | grep -v "unsigned long tc_ptr" | wc -l 

translate-all.c: 102, 173, 174, 185
exec-all.h: 242, 245, 265
exec.c: 735,875, 876, 1179, 1234, 1235, 1243, 1244, 1246
cpu-exec.c: 109, 581, 607, 614

 

  • translate-all.c
    • 102 – Located in cpu_gen_code, it is read into gen_code_buf, which is passed to tcg_gen_code so this is SAFE, as this is before TCG code is generated, and is only used there.
    • 173/174 – Located in cpu_restore_state and tc_ptr is read from and checked against searched_pc  (Potential problem comment says finds opc index corresponding to search pc)
    • 185 – Located in cpu_restore_state –> tcg_gen_code_search_pc is called with the tc_ptr and searched_pc – tc_ptr (Specially here is the problem)
      • This is the only location where tcg_gen_code_search_pc is called from, this calls tcg_gen_code_common.
  • exec-all.h
    • 242/245 – Insidetb_set_jmp_target, calls tb_set_jmp_target1 with the tc_ptr.
      • This is located in an #elif defined(__arm__) and requires USE_DIRECT_JUMP, which is also defined when arm is used
      • Now  following it down the rabbit hole: static inline void tb_set_jmp_target1(unsigned long jmp_addr, unsigned long addr)
      • Called from
        • tb_reset_jump in exec.c where by it is used to “reset the jump entry ‘n’ of a TB so taht it is not chained to another TB
          • the address is tc_ptr + tb->tb_next_offset[n]
        • tb_add_jump 264  where it is used to patch a native jump address. (Assuming it is used to chain)
      • tb_jmp_offset is 4 elements and is the offset of jump instruction.
    • 265 – Located in tb_add_jump, this is used to patch the native jump address
      • Called in cpu-exec.c line 593
  • exec.c
    • 735 – Used in calculation for tb_set_jmp_target see 242/245 in exec-all.h (above)
    • 875/876 – Located inside tb_gen_code this is where the tc_ptr in the TranslationBlock is first set so no problem.
    • 1179 – Located in tb_free, it frees a translation block, if its the last one used.
    • 1234/1235/1234/1244/1246 – Located in a function called tb_find_pc – Similar problem as the search_pc. This returns the translation block of a given PC. PROBLEM it uses the tc_ptr the thing for searching.
  • cpu-exec.c
    • 109 – Located in cpu_exec_nocache, tcg_qemu_tb_exec is called on the translated code. This executes the code after it is generated, so safe.
    • 581 – Used for the trace debug statement output
    • 607/614 – This reads in the tc_ptr from the translation block and executes it.

 

Summary of problem functions

  • tb_find_pc – It uses the the Translated Code Pointer to find a Translation Block. Located in exec.c.
  • cpu_restore_state – It uses the Translated Code Pointer. Requires more details, and research.
  • Chaining
    • tb_reset_jump – Called/Defined in exec.c
    • tb_add_jump – Called in cpu-exec.c and defined in exec-all.h

Distributed Music Player

Leave a comment

Well last Monday, was the last day for the game jam for the Game Development Club, I had hoped to finish off one of my entries but things took a detour instead. I had gone into university to submit my preferences for working as a prac supervisor/demonstrator. I was initially up in Innova 21 (the new building on campus). Waiting for Wilson to finish with his meeting as he recently got back from China, so there was some catch-up about how that was, and then we met with Lennie for lunch and more about Wilson’s adventures in China was discussed. Now the detour happened when after lunch we ended up in the Computer Graphics lab, as Lennie was working on improving his Distributed Music Player that he first wrote in 2007.

The problem was he was discussion the program and I decided to contribute to explain how I would do it and what he should do however, he turned around and suggested I look at it/ do it. So pretty much for the last three days that is what I have been doing. The work I did was created a new python class that was responsible for holding a number of clients,  able to load the ip list, and able to perform the play, stop and loadFile command on all the clients. I then started using multiprocessing library in Python 2.6 and later to do multi threading however slight problem with that as it uses pickle (serialisation) and network sockets do not like serialisation so rather then try to find a way to get that working decided to just use the threading library which is available in 2.5. Long story short I now have python 2.5 (what i had originally), python 2.6 and python 2.7.

 

That night I wasted most the night learning how to use Glade and PyGTK, at the end of the night I eventually had a basic GUI up and running as seen below. Glade is a Visual editor for creating user interfaces for GTK. and GTK is a user interface API/ library. However in hindsight I could of ported the existing Client and Host classes to C# and then developed the interface in Visual Studio in about an hour. However on the plus side I did learn a new tool, which could help in the future.

dmpcc2

Day two, I went into uni again, hooked up a slider to be the volume bar, and rehauled the interface to have a toolbar and status bar, and moved the buttons to the status bar, introduced a context menu for right clicking on clients, didn’t get that working, had issues passing what client was ‘selected. Added the loading orchrestra files since I didn’t want to make a dialog that had a list of items to select, and this way it was now playable. That night, I added a menubar and an about screen, and disconnect button.

dmpgoing

Day three, added the logout button to the menu and created a “question” dialog, which is a dialog that has a "Message or Question”, and an Text Entry area so you can enter. This was then hooked up as the “Connect to a client”, dialog when you right click on the client area somewhere there is not a client. Added Solo and Mute support before going to bed, I was only planning on adding it to the gui and the templates however I started writing comments for the steps to implement solo and mute but i ended up doing the code instead. I actually touched the java code for the node to perform a little cleanup, I removed some unused methods, and variables and changed all the fields to private followed by running a reformatter over the code but only indentation was changed .

 

Hopefully I will post a follow up later to night.

Wave Sleep

Leave a comment

I was trying to get to sleep last night (22nd-May-2010), and started thinking about projects I would like to make, which lead onto the hosting and project management. During the week I had read the Google I/O talk about Programming on/with Wave. The idea is a robot that will check a subversion repository, and will post to a wave of the changes, etc, so kind of like the emails that you can get as a post-commit script. Little bit of thinking I realised , it may be possible to handle the waving as a post commit hook script. I am hoping to get a chance to look up some more information about Google Wave to see if this is possible .The idea would be if the option has ‘standard layout’, meaning if it has trunk, branches, and tags. It will “do a ‘reply/ threadiness ” of where the thing was branched and continue on there. Granted at this stage I am really not that familiar with Wave or the terminology so bare with me. Since I was in bed trying to sleep I didn’t have access to a computer or the internet to check if someone had already implemented a system like this.

As I was typing this up I decided to have a quick search and sure enough http://code.google.com/p/svn-py-robot/

“A google wave robot to add the log message of a svn commit to a wave blip.”

Ideally I then thought, since I have been using Git lately would be nice to do it for that as well. So will look into that as well. Likewise really why stop there, when you could use wave to post issues and feature requests, and also use it as a forum. So ideally it would be the backend for a website for your project.

I never completed this post so will followed up at a later date.

Space Invader Clone Progress Update – Day 3

Leave a comment

Day Three

Continuing from where I left off. I forgot to document what i did on the second day.

I started the day, separating the current source file into separate units, and header files, so once for Home/About, and then one for the Game, and updating the Visual Studio Solution and GNU Makefile to match. Next I finally wrote the lines that would be displayed in the About and added the code to render them to the screen.

As I was writing the framework/skeleton for the Game file, I realized i had a limitation in my current design that was it wasn’t flexible enough for more actions on the States, since there was just 2 arrays, 1 for the Events Handling, and 1 for the Rendering, so I re factored this up to be an Array of a State Structure which contains pointers to the event and rendering functions, and added a initialization and deinitialization callback. Following that instead of having all the different functions like DrawMenu, DrawAbout, EventMenu, being seen by the rest, I exposed a SetupMenu, SetupAbout which takes a state structure pointer, and then that is responsible for setting the callback handler to those methods. This made the header for the Home [Menu and About], just 2 functions instead of 4 (it would of been 6 functions with the Init/Deinit). Few other little things in the design was changed to lock things down further.

The whole time I forgot to actually commit the C source and header for the actually Game State, my bad. Final thing for the day was implement the rendering and displaying of the Score and Lives, followed by fixing some memory leaks of un-freed memory.

Sorry no screen shots for this, the game screen is not much to look at.

imdbLinker – SVG/HTML

Leave a comment

After a lesson in Computer Science about Chess and Chess Playing computers, I decided to make a text based chess game written in ANSI C. Started off simple, defined the data sturcture to hold the chess board and where the pieces are, then a function to setup the board to the default peices, followed by another function that can draw the chess board to the terminal. Also started user input to choose a move.

Day Two, on the bus, to university i was thinking how to handle weather the move the player wants to do is valid. I then considered a mathmatical way using Gradients. For peices to move Left or Right the gradient is zero. For them to move to up or down the change in X is 0. For them to move diagonal, the gradient is zero. Haven’t sorted out Horses/Knights yet.

Older Entries