15th of January

Forgot the power pack for my laptop, so I quickly booted it up, transferred the SPEC benchmark to it, and committed any outstanding work.

Talked to Brad and mentioned pointer switching as a way to hot swap the code: generate the code in a separate memory location, then automatically change a jump to point at that location, and presto, execution would now be in the new code for that loop. First it needs to be explored whether this is actually the case, that is, whether a scenario occurs where it keeps running the same block and never gets a chance to replace it.

I looked over the results of the h264ref benchmark briefly; the times were:

	Running test case: 1 h264ref.arm qemu-0.10.6 results in Real: 9:41.46 | User: 576.08 | Sys: 4.91
	Running test case: 1 h264ref.arm llvm-org results in Real: 9:56.49 | User: 591.69 | Sys: 4.73
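To compare the two runs, the timing lines can be turned into numbers. This is only a rough sketch in Python; the function names (parse_result, to_seconds, slowdown_percent) are made up for illustration and are not part of the benchmark harness, and it assumes the result lines always have exactly the shape shown above.

```python
import re

# Assumed shape of a result line, taken from the two lines above.
RESULT_RE = re.compile(
    r"Running test case: (?P<case>\d+) (?P<binary>\S+) (?P<engine>\S+) results in "
    r"Real: (?P<real>[\d:.]+) \| User: (?P<user>[\d.]+) \| Sys: (?P<sys>[\d.]+)"
)


def to_seconds(clock):
    """Convert 'M:SS.ss' (or 'H:MM:SS.ss') into seconds."""
    total = 0.0
    for part in clock.split(":"):
        total = total * 60 + float(part)
    return total


def parse_result(line):
    """Pull the engine name and the three times out of one result line."""
    m = RESULT_RE.search(line)
    if m is None:
        raise ValueError("not a result line: %r" % line)
    return {
        "engine": m["engine"],
        "real": to_seconds(m["real"]),
        "user": float(m["user"]),
        "sys": float(m["sys"]),
    }


def slowdown_percent(baseline, candidate):
    """How much slower the candidate is than the baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


if __name__ == "__main__":
    qemu = parse_result("Running test case: 1 h264ref.arm qemu-0.10.6 results in "
                        "Real: 9:41.46 | User: 576.08 | Sys: 4.91")
    llvm = parse_result("Running test case: 1 h264ref.arm llvm-org results in "
                        "Real: 9:56.49 | User: 591.69 | Sys: 4.73")
    print("real: %.2f%%" % slowdown_percent(qemu["real"], llvm["real"]))
    print("user: %.2f%%" % slowdown_percent(qemu["user"], llvm["user"]))
```

For the two lines above, this works out to llvm-org being roughly 2.6% slower in real time and 2.7% slower in user time than qemu-0.10.6.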

I will run the instruction usage script over it to find the most used ARM instructions.
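As a reminder of what that script boils down to, here is a minimal sketch in Python. It is not the actual instruction usage script; it assumes the log contains disassembly lines of the form "address: 8 hex digits of encoding, then the mnemonic", which is a guess at the input format, and count_mnemonics is a made-up name.

```python
import re
from collections import Counter

# Assumed line shape: "0x40095540:  e92d4800      push {fp, lr}"
ASM_RE = re.compile(
    r"^0x[0-9a-fA-F]+:\s+[0-9a-fA-F]{8}\s+(?P<mnemonic>[a-z][a-z0-9.]*)"
)


def count_mnemonics(lines):
    """Count how often each mnemonic appears in the disassembly lines."""
    counts = Counter()
    for line in lines:
        m = ASM_RE.match(line.strip())
        if m:  # anything that is not a disassembly line is skipped
            counts[m["mnemonic"]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "IN: main",
        "0x40095540:  e92d4800      push\t{fp, lr}",
        "0x40095544:  e28db004      add\tfp, sp, #4",
        "0x40095548:  e28dd008      add\tsp, sp, #8",
    ]
    for mnemonic, n in count_mnemonics(sample).most_common():
        print(mnemonic, n)
```

Note that this counts how often an instruction appears in the translated blocks, not how often it is executed; a block that runs in a hot loop still only counts once.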

I proceeded to extend the instrument script to help identify blocks that have been replaced. It really needs more testing, and it needs to handle the concurrency case whereby the replacement statement gets printed in the middle of the output of an instruction.

Next I added a simple way to link blocks together in the script, to get the previous block. So an OUT block can get the IN block before it, while an IN block gets the OUT block that ran before it. This needs some improvement, since it is one way. I wish to extend it so you can traverse just the INs or just the OUTs, and also go from an IN to its own OUT and from an OUT to its own IN, but not from an IN to another OUT that does not belong to it.
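The linking I have in mind could look roughly like the Python sketch below. It is not the instrument script itself: the Block class, link_blocks and next_of_kind are names invented here, and it assumes the log prints each IN block immediately before the OUT block generated from it.

```python
class Block:
    """One IN (ARM) or OUT (generated host code) block from the log."""

    def __init__(self, kind, address):
        self.kind = kind        # "IN" or "OUT"
        self.address = address
        self.prev = None        # block printed just before this one
        self.next = None        # block printed just after this one
        self.pair = None        # IN <-> the OUT generated from it


def link_blocks(blocks):
    """Link blocks in log order, and pair each OUT with the IN it came from.

    Assumption: an OUT belongs to the most recent IN that has no OUT yet.
    """
    prev = None
    last_in = None
    for block in blocks:
        block.prev = prev
        if prev is not None:
            prev.next = block
        if block.kind == "IN":
            last_in = block
        elif last_in is not None and last_in.pair is None:
            block.pair = last_in
            last_in.pair = block
        prev = block
    return blocks


def next_of_kind(block):
    """Walk forward to the next block of the same kind (IN to IN, OUT to OUT)."""
    node = block.next
    while node is not None and node.kind != block.kind:
        node = node.next
    return node
```

With both prev/next and pair available, an IN block can reach its own OUT through pair, while prev still gives the OUT that ran before it, so the two can no longer be confused.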

Got the analysis working: it goes through the IN blocks and checks whether each one uses only instructions that are implemented, followed by checking whether the output block is at the address of a block that was replaced. This is where I needed the double link: at first I was running the address comparison on the IN block, but the LLVM stuff replaces OUT blocks. Next I used the link, but forgot that it points to the OUT block before the IN block, not the one that was generated from it.

Ideas: expand it to compare blocks by size and by the first and last address used.

The current list of things to do was: look into how the QEMU cache works (is it just a farm of pointers, or is it a graph structure?), and get QEMU to output the code generated by LLVM. This is related to the above, since if you could access the cache system you could print out all the blocks in there.

The format of the generated-block line, for example "(00011) generated block for 0x40095540 @ 0x60943cc0; 17:7", is: a count or index into the ring buffer, then the address of the ARM code @ the address of the AMD64 code. I am still not sure about the remaining two numbers; my current guess is that the first is the size from the top of the translation block to the epilogue, and the other is the size of the external translated ‘block’.
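For the script, the line can be picked apart with a regular expression. A small Python sketch, with the caveat that the field names are my own and that the last two numbers are simply called first_size and second_size, and read as decimal, because their exact meaning is still a guess:

```python
import re

# Matches e.g. "(00011) generated block for 0x40095540 @ 0x60943cc0; 17:7"
GENERATED_RE = re.compile(
    r"\((?P<index>\d+)\) generated block for "
    r"(?P<guest>0x[0-9a-fA-F]+) @ (?P<host>0x[0-9a-fA-F]+); "
    r"(?P<first>\d+):(?P<second>\d+)"
)


def parse_generated(line):
    """Return the fields of a 'generated block' line, or None if it does not match."""
    m = GENERATED_RE.search(line)
    if m is None:
        return None
    return {
        "index": int(m["index"]),            # count or index into the ring buffer
        "guest_addr": int(m["guest"], 16),   # address of the ARM code
        "host_addr": int(m["host"], 16),     # address of the AMD64 code
        "first_size": int(m["first"]),       # meaning uncertain, see text
        "second_size": int(m["second"]),     # meaning uncertain, see text
    }


if __name__ == "__main__":
    info = parse_generated(
        "(00011) generated block for 0x40095540 @ 0x60943cc0; 17:7"
    )
    print(info)
```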

Successfully managed to get what I am quite certain is the LLVM version of a block. The thing that was defeating me was that I was changing code but nothing took effect, since I was running Andrew’s version of it rather than my own. I have looked into getting it to print, on replacement, the TCG version before it is overwritten and then the LLVM version. I currently have some of the numbers off, so it seems to print more than necessary; however, some of the fields make more sense now, after I started typing this up and reading into some other things for the exact details.

Brad stopped by and suggested looking at the noop slide, to see what calculations and magic numbers are used for it, along with other variables in the area. Granted, I had tried that, but I am quite sure I was just using the wrong combination and using them incorrectly. His main point was to try to get a look at the big picture, and not just one small block. Which is what I am heading towards: once I can print LLVM vs. TCG blocks with their sizes, I can produce stats such as, out of all the replacements, how many are better, worse, and about even.
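The stats themselves should be simple once the sizes are available. A rough Python sketch, under two assumptions of mine: that a smaller generated block counts as "better" (size is only a stand-in for speed), and that anything within 5% of the TCG size counts as "about even". The names classify and summarise are made up for this sketch.

```python
from collections import Counter


def classify(tcg_size, llvm_size, tolerance=0.05):
    """Label one replacement by comparing the LLVM block size to the TCG one."""
    if tcg_size <= 0:
        raise ValueError("TCG size must be positive")
    change = (llvm_size - tcg_size) / tcg_size
    if change < -tolerance:
        return "better"   # LLVM block is noticeably smaller
    if change > tolerance:
        return "worse"    # LLVM block is noticeably larger
    return "even"


def summarise(pairs, tolerance=0.05):
    """Count better/worse/even over (tcg_size, llvm_size) pairs."""
    return Counter(classify(t, l, tolerance) for t, l in pairs)


if __name__ == "__main__":
    # Made-up sizes, purely to show the call.
    print(summarise([(100, 80), (100, 120), (100, 102)]))
```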

 

End of week 2.
