Abesq: mspgcc

Monday, 18 February 2008

Switch/Case Headaches in MSP430 Assembly

by Travis Goodspeed <travis at utk.edu>
at the Extreme Measurement Communications Center
of the Oak Ridge National Laboratory

While polishing off my rewrite of msp430static, my function identifier ran into a bug which was the result of an improperly-handled switch/case statement. This short article is intended to show a practical example of the mixing of code and data in von Neumann machines, as well as what a headache variable-length instructions can be.

This article will concern the meaning of the following slice of object code and that which follows it, found within the Msp430TimerP$1$Event$fired method of TinyOS 2.x Blink example. You can find the associated executable and disassembly at http://frob.us/~travis/public/blog/misc/switchcase/.

Consider the following fragment of code:

    4124:       10 4f 28 41     br      16680(r15)              ;
    4128:       38 41           pop     r8              ;
    412a:       68 41           mov.b   @r1,    r8      ;
    412c:       78 41           pop.b   r8              ;
    412e:       88 41 98 41     mov     r1,     16792(r8);
    4132:       a8 41 b8 41     mov     @r1,    16824(r8);
    4136:       ...

What does this code accomplish? What is the meaning of the POP statement at 0x4128? Try it yourself before reading ahead.

The answer is simple. There is no POP instruction, neither at 0x4128 nor anywhere else in the code above! 0x4128 is the first entry of a jump table, which continues past the end of the excerpt. The instruction at 0x4124 uses the indexed addressing mode: `BR 16680(r15)' is a branch to the address contained within the word at 16680+r15. 16680--as you can find with a calculator or by reading the second word of the object code--is 0x4128, the address of our supposed POP instruction.

It's easy to reconstruct the table by reading the object code, correcting for endianness. The fragment shown above is {4138, 4168, 4178, 4188, 4198, 41a8, 41b8, ...}. Note not only that the disassembler is unable to recognize that the table is not code, but also that the disassembler is unable to determine where words begin and end. Continuing the code, we find that the list terminates in the following manner:

    4136: c8 41 1f 42  mov.b r1, 16927(r8);
    413a: 82 01        .word 0x0182; ????
    413c: 8f 10        swpb r15  ;

The word at 413a is not properly disassembled because it is neither an element in the list nor an instruction. Rather, it is the second word of a 4-byte instruction. This instruction is "1f 42 82 01" or "0x421f 0x0182", depending upon your choice of notation. The MSPGCC project's handy python disassembler reveals that the instruction is "mov &386, R15" where 386=0x0182.
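The table reconstruction described above can be sketched in a few lines of Python. This is only an illustration, not part of msp430static: the name `read_jump_table` and its stopping rule are assumptions. The rule stops reading once the read position reaches the lowest target seen so far, which presumes that the case bodies begin immediately after the table, as they do in this excerpt. The bytes are copied from the two listings.

```python
import struct

TABLE_BASE = 0x4128  # the address named by BR 16680(r15)

# Object code from 0x4128 through 0x413b, copied from the listings.
RAW = bytes.fromhex(
    "38 41 68 41 78 41 88 41 98 41 a8 41 b8 41 c8 41"  # jump table
    " 1f 42 82 01"                                      # first real instruction
)

def read_jump_table(data, base):
    """Read little-endian words until the read position reaches the
    lowest branch target seen so far (hypothetical stopping rule)."""
    entries = []
    addr = base
    while addr + 2 <= base + len(data):
        if entries and addr >= min(entries):
            break  # we have walked into the first case body
        (word,) = struct.unpack_from("<H", data, addr - base)
        entries.append(word)
        addr += 2
    return entries

table = read_jump_table(RAW, TABLE_BASE)
print([hex(t) for t in table])

# The two words that follow the table form one 4-byte instruction.
opcode, operand = struct.unpack_from("<HH", RAW, 2 * len(table))
print(hex(opcode), hex(operand))
```

A table whose targets lie before it, or which is followed by more data rather than code, would defeat this rule, so treat it as a starting point only.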

Wednesday, 23 January 2008

Static Analysis of MSP430 Firmware in Perl

by Travis Goodspeed <travis at utk.edu>
at the Extreme Measurement Communications Center
of the Oak Ridge National Laboratory

That which follows is an adaptation of notes which I made during the course of writing msp430static, a sort of poor man's IDA Pro for static analysis of MSP430 firmware without source code.

What Functions Look Like

A call to strcpy, such as the one which follows, is accomplished by populating r15 with the destination address and r14 with the source address, then calling the function at its hex address. In the following example, foo is the target (r15) and babe is the source (r14). See my article on IAR's MSP430 calling conventions for references to calling convention documentation for various compilers, as each compiler seems to do something different on this platform.

strcpy(foo,babe);                                                                                                                                                     
   1154:       3e 40 7a 02     mov     #634,   r14     ;#0x027a                                                                                                        
   1158:       0f 44           mov     r4,     r15     ;                                                                                                               
   115a:       b0 12 a4 11     call    #4516           ;#0x11a4

In the unstripped binary, we'll find the code for strcpy at the address (0x11a4) called above:

000011a4 <strcpy>:
   11a4:       0d 4f           mov     r15,    r13     ;                                                                                                               
   11a6:       0c 4f           mov     r15,    r12     ;                                                                                                               
   11a8:       6f 4e           mov.b   @r14,   r15     ;                                                                                                               
   11aa:       cd 4f 00 00     mov.b   r15,    0(r13)  ;                                                                                                               
   11ae:       4f 93           cmp.b   #0,     r15     ;r3 As==00                                                                                                      
   11b0:       07 24           jz      $+16            ;abs 0x11c0                                                                                                     
   11b2:       1e 53           inc     r14             ;                                                                                                               
   11b4:       1d 53           inc     r13             ;                                                                                                               
   11b6:       6f 4e           mov.b   @r14,   r15     ;                                                                                                               
   11b8:       cd 4f 00 00     mov.b   r15,    0(r13)  ;                                                                                                               
   11bc:       4f 93           cmp.b   #0,     r15     ;r3 As==00                                                                                                      
   11be:       f9 23           jnz     $-12            ;abs 0x11b2                                                                                                     
   11c0:       0f 4c           mov     r12,    r15     ;                                                                                                               
   11c2:       30 41           ret

The stripped binary has the function at the same address, but has no function label. In fact, there isn't even a note (in msp430-objdump) that the address is the beginning of a function.

   11a4:       0d 4f           mov     r15,    r13     ;                                                                                                               
   11a6:       0c 4f           mov     r15,    r12     ;                                                                                                               
   11a8:       6f 4e           mov.b   @r14,   r15     ;                                                                                                               
   11aa:       cd 4f 00 00     mov.b   r15,    0(r13)  ;                                                                                                               
   11ae:       4f 93           cmp.b   #0,     r15     ;r3 As==00                                                                                                      
   11b0:       07 24           jz      $+16            ;abs 0x11c0                                                                                                     
   11b2:       1e 53           inc     r14             ;                                                                                                               
   11b4:       1d 53           inc     r13             ;                                                                                                               
   11b6:       6f 4e           mov.b   @r14,   r15     ;                                                                                                               
   11b8:       cd 4f 00 00     mov.b   r15,    0(r13)  ;                                                                                                               
   11bc:       4f 93           cmp.b   #0,     r15     ;r3 As==00                                                                                                      
   11be:       f9 23           jnz     $-12            ;abs 0x11b2                                                                                                     
   11c0:       0f 4c           mov     r12,    r15     ;                                                                                                               
   11c2:       30 41           ret

It's easy enough to detect the presence of this code in a stripped executable by looking for "mov r15, r12" or "0x0c 0x4f" and comparing the bytes that follow. I can't stress enough the importance of endian-awareness: the second column is composed of bytes, not words. As a word, "mov r15,r12" is 0x4f0c. When in doubt, double-check yourself with the Single Line MSP430 Assembler.
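The byte-order point can be checked mechanically. The sketch below packs the instruction word little-endian, as it is stored in the listing (bytes 0c 4f), and scans a buffer at even offsets. The helper name `find_word` is hypothetical, and the buffer is simply the strcpy bytes from the listing above, assumed to be loaded at 0x11a4.

```python
import struct

MOV_R15_R12 = 0x4f0c                      # the instruction as a 16-bit word
NEEDLE = struct.pack("<H", MOV_R15_R12)   # little-endian bytes, as stored

BASE = 0x11a4
IMAGE = bytes.fromhex(
    "0d4f 0c4f 6f4e cd4f 0000 4f93 0724 1e53"
    " 1d53 6f4e cd4f 0000 4f93 f923 0f4c 3041"
)

def find_word(image, base, needle):
    """Return the addresses at which the two needle bytes occur.
    MSP430 instructions are word-aligned, so only even offsets are tried."""
    return [base + i for i in range(0, len(image) - 1, 2)
            if image[i:i + 2] == needle]

hits = find_word(IMAGE, BASE, NEEDLE)
print([hex(h) for h in hits])
```

Searching for the bytes in word order (4f 0c) would find nothing here, which is exactly the mistake the paragraph above warns against.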

Note that calling conventions vary considerably across the many MSP430 compilers and even among versions of the same compiler, depending upon optimization options and inlining. Don't expect all calls to look like this: check for yourself.

Before looking at a decompilation of the above, notice that a reasonably large string of bytes {6f 4e, cd 4f 00 00, 4f 93} appears twice. This duplication might be removed by another optimizer, but it shows that something in the code is sufficiently intrinsic to the function to appear twice in one function. Perhaps it will remain consistent across compilers? In point of fact, this expanse of code copies a byte from the address contained within r14 to the address contained within r13. The final word compares the byte that was copied to zero. In the first usage, the function jumps to the end in the event that the byte is zero. In the second usage, which follows the incrementing of both r14 and r13, the jump is backward if the byte is not zero. A rough approximation in pseudo-C follows:

char* strcpy(char* dest, char* src){
char *a, *b, c;
a=dest;       //mov r15, r13
b=dest;       //mov r15, r12

c=*src;       //mov.b @r14, r15
*a=c;         //mov.b r15, 0(r13)
if(c==0)      //cmp.b #0, r15
 goto ret;    //jz $+16

do{
 src++;       //inc r14
 a++;         //inc r13
 c=*src;      //mov.b @r14, r15
 *a=c;        //mov.b r15, 0(r13)
}while(c!=0); //cmp.b #0, r15
              //jnz $-12

ret:
 return b;    //mov r12, r15
}             //ret

In the decompilation, I refer to r15 as both dest and c, as its purpose changes completely. Variables are passed as dest=r15 and src=r14, as GCC allocates parameters in the order r15, r14, r13, r12. The result, for strcpy() the destination address, is returned in r15.

It is apparent that this could be written a bit more compactly by merging the first and second stanzas. Also, the use of the indirect auto-increment addressing mode (As=11, of the form @Rn+) could eliminate the "inc r14" line. The instructions might also be reordered, and any number of register combinations might be used to hold intermediate values. It's not possible to detect all the ways in which strcpy() might be implemented, but it shouldn't be too difficult to detect the different ways in which it will be implemented. After all, it's far easier to fix an overflow vulnerability than to hide it; is it not?

Testing my theory, I disassembled the same program, this time compiled with IAR's compiler (V4.09A/W32). Grepping for cmp.b yielded a single line, at address 0xF86E, in the codeblock which follows.

   f864:       0f 4c           mov     r12,    r15     ;                                                                                                               
   f866:       0e 4c           mov     r12,    r14     ;                                                                                                               
   f868:       1c 53           inc     r12             ;                                                                                                               
   f86a:       fe 4d 00 00     mov.b   @r13+,  0(r14)  ;                                                                                                               
   f86e:       ce 93 00 00     cmp.b   #0,     0(r14)  ;r3 As==00                                                                                                      
   f872:       f9 23           jnz     $-12            ;abs 0xf866                                                                                                     
   f874:       0c 4f           mov     r15,    r12     ;                                                                                                               
   f876:       30 41           ret

Those that have read my article on the register usage of IAR will note that the ABI is different in the code sample above. IAR fixed the register allocation order in October of 2007, and it now allocates registers in the order r12, r13, r14, r15.

This code is heavily--but imperfectly--optimized, so it's a bit difficult to decompile by hand. It all becomes clear when you realize that r12 is post-incremented and the original value is loaded into r14, the destination address for each character. Unlike GCC, IAR uses the indirect auto-increment addressing mode (@r13+), but on the very next line we see that this necessitates another RAM access: the byte just written must be read back from 0(r14) for the comparison. Perhaps the cache will take care of it, but this means that IAR makes three memory accesses--one write and two reads--for every two that GCC makes. I'd recommend hand optimization for this function, if my stronger recommendation weren't to scrap it as a troublemaker.

The decompiled code follows,

char* strcpy(char* dest, char* src){
char *a,
    *b=dest;      //mov r12, r15
do{
 a=dest;          //mov r12, r14
 dest++;          //inc r12
 *a=*(src++);     //mov.b @r13+, 0(r14)
}while(0!=*a);    //cmp.b #0, 0(r14)
                  //jnz $-12
return b;         //mov r15, r12
                  //ret
}

I'd be willing to bet that the original is quite a bit denser in C, but this ought to be easy enough to understand.

So how do we do a generalized search for this, one which will recognize most implementations by most compilers? I propose a pattern that looks for the following:

The use of two registers, source pointer SRC and destination pointer DEST.
A mov.b instruction with SRC as the source. (Call the destination FOO)
A mov.b instruction with DEST as the destination. (Call the source BAR.)
A cmp.b instruction involving the immediate zero and the register SRC, DEST, FOO, or BAR.

I don't demand an inc for SRC as it might be auto-incremented (@Rn+), and I don't demand one for DEST as it might be copied from another variable in an IAR post-increment (cvar++).
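The four rules above can be sketched as a loose matcher. This is an illustration under stated assumptions, not the msp430static implementation: instructions are assumed to arrive already parsed as (mnemonic, source, destination) tuples using objdump's operand spelling, and the names `looks_like_strcpy`, `GCC_STRCPY`, and `IAR_STRCPY` are hypothetical. The two sample lists are transcribed from the listings earlier in this article.

```python
import re

REG = re.compile(r"r\d+")

def regs(operand):
    """Registers named in an operand such as '@r14', '0(r13)', or 'r15'."""
    return set(REG.findall(operand))

def is_memory(operand):
    """True for indirect (@Rn, @Rn+) and indexed (x(Rn)) operands."""
    return operand.startswith("@") or "(" in operand

def looks_like_strcpy(instructions):
    # Rule 2: a mov.b that loads through a pointer (SRC); its target is FOO.
    loads = [(s, d) for op, s, d in instructions if op == "mov.b" and is_memory(s)]
    # Rule 3: a mov.b that stores through a pointer (DEST); its source is BAR.
    stores = [(s, d) for op, s, d in instructions if op == "mov.b" and is_memory(d)]
    # Rule 4: a cmp.b against the immediate zero.
    zero_cmps = [d for op, s, d in instructions if op == "cmp.b" and s == "#0"]
    for load_src, foo in loads:
        for bar, store_dst in stores:
            src_ptr, dest_ptr = regs(load_src), regs(store_dst)
            # Rule 1: two distinct pointer registers.
            if not src_ptr or not dest_ptr or src_ptr == dest_ptr:
                continue
            involved = src_ptr | dest_ptr | regs(foo) | regs(bar)
            if any(regs(c) & involved for c in zero_cmps):
                return True
    return False

GCC_STRCPY = [
    ("mov", "r15", "r13"), ("mov", "r15", "r12"),
    ("mov.b", "@r14", "r15"), ("mov.b", "r15", "0(r13)"),
    ("cmp.b", "#0", "r15"), ("jz", "", "$+16"),
    ("inc", "", "r14"), ("inc", "", "r13"),
    ("mov.b", "@r14", "r15"), ("mov.b", "r15", "0(r13)"),
    ("cmp.b", "#0", "r15"), ("jnz", "", "$-12"),
    ("mov", "r12", "r15"), ("ret", "", ""),
]
IAR_STRCPY = [
    ("mov", "r12", "r15"), ("mov", "r12", "r14"), ("inc", "", "r12"),
    ("mov.b", "@r13+", "0(r14)"), ("cmp.b", "#0", "0(r14)"),
    ("jnz", "", "$-12"), ("mov", "r15", "r12"), ("ret", "", ""),
]

print(looks_like_strcpy(GCC_STRCPY), looks_like_strcpy(IAR_STRCPY))
```

In the IAR case a single mov.b serves as both the load and the store, which is why the matcher pairs every load with every store rather than demanding two separate instructions.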

It's possible to add more rules which describe the preceding examples. For example, both of these examples move their first parameter to a temporary register and, later, move it back. Both follow the cmp.b with a jnz. I advise against making any matching pattern too strict, as it'll result in false negatives. Keeping things loose might result in false positives, but those false positives will be fertile ground for exploits of their own, even if they aren't strcpy().

It's also worth noting that a ruleset that's complex is easy to sneak by, either intentionally or accidentally. Suppose this pattern were modified to exclude strncpy(). The following strcpy() implementation would skate by, undetected.

char *strcpy(char *dest, char *src){
 return strncpy(dest,src,0x1000);
}

By keeping rules loose--but perhaps prioritized--it's easy to catch such actions. After all, what byte-wise copying until reaching zero is not suspicious?

Recognizing Functions from Perl

Now that the hand analysis is complete, it's time to bring perl into the mix. Instructions are recognized as one of two types: code and IVT entries. I ignore the .data section for now, but a little tweaking of the regular expressions would make it match. I make the assumption that every function begins after a 'ret' and ends with a 'ret'. This isn't strictly true, but it suffices for this article and ought only to miss the first function in memory, assuming everything is built with C.

The first step is to recognize individual lines. I used the following regular expressions in an early revision:

Match an instruction:
   #    11b6:       6f 4e           mov.b   @r14,   r15     ;
   #    11b8:       cd 4f 00 00     mov.b   r15,    0(r13)  ;
   #    1111:       22222222222     33333   44444444444444  555555
   /\s(....):\s(.. .. .. ..)\s\t([^ \t]{2,5})\s(.*);?(.*)/
Match an IVT entry:
   #    fffe:       00 11           interrupt service routine at 0x1100
   /[\s\t]*(....):[\s\t]*(.. ..)[\s\t]*interrupt service routine at 0x(....)/

Although I don't strictly need to parse so much detail to recognize strcpy(), it will be helpful when I add features.

Once lines are recognized, they are loaded into a list of strings, indexed by the integer (not hex-string) value of the first field. I make a list of strings, rather than objects, because most comparisons can be performed by regular expressions. This is fine for a 16-bit microcontroller, but might be prohibitively expensive for something larger.
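For readers who prefer Python, here is a loose counterpart to the instruction-matching step and the address-indexed list. It is a sketch, not a translation of the Perl above: the pattern, the names `parse_line` and `load_listing`, and the choice to store dictionaries rather than raw strings are all assumptions made for illustration. IVT entries are not handled.

```python
import re

INSN = re.compile(
    r"^\s*([0-9a-f]{4}):"                 # 1: address
    r"\s+((?:[0-9a-f]{2} )*[0-9a-f]{2})"  # 2: object-code bytes
    r"\s+(\S+)"                           # 3: mnemonic
    r"\s*([^;]*?)\s*"                     # 4: operands
    r"(?:;(.*))?$"                        # 5: comment, if any
)

def parse_line(line):
    """Split one disassembly line into fields, or return None."""
    m = INSN.match(line)
    if m is None:
        return None
    addr, raw, mnemonic, operands, comment = m.groups()
    return {
        "addr": int(addr, 16),            # integer, not hex string
        "bytes": bytes.fromhex(raw),
        "mnemonic": mnemonic,
        "operands": operands,
        "comment": (comment or "").strip(),
    }

def load_listing(text):
    """Index every recognized line by its integer address."""
    listing = {}
    for line in text.splitlines():
        insn = parse_line(line)
        if insn is not None:
            listing[insn["addr"]] = insn
    return listing

SAMPLE = """\
000011a4 <strcpy>:
   11b6:       6f 4e           mov.b   @r14,   r15     ;
   11b8:       cd 4f 00 00     mov.b   r15,    0(r13)  ;
   11bc:       4f 93           cmp.b   #0,     r15     ;r3 As==00
   11c2:       30 41           ret
"""
listing = load_listing(SAMPLE)
print(sorted(hex(a) for a in listing))
```

The function label line is skipped because it does not begin with a four-digit address followed by a colon.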

Routines are recognized--as I've previously stated--by assuming that they reside between ret statements. This assumption makes things quite easy to implement, but results in the loss of the first function as well as the concatenation of functions--such as main()--which do not return. In the following example main [118E to 11A0] and strcpy [11A4 to 11C2] are combined into a single listing:

   118e:       31 40 00 0a     mov     #2560,  r1      ;#0x0a00
   1192:       04 41           mov     r1,     r4      ;
   1194:       92 43 00 02     mov     #1,     &0x0200 ;r3 As==01
   1198:       b0 12 40 11     call    #4416           ;#0x1140
   119c:       b0 12 68 11     call    #4456           ;#0x1168
   11a0:       30 40 c4 11     br      #0x11c4         ;
   11a4:       0d 4f           mov     r15,    r13     ;
   11a6:       0c 4f           mov     r15,    r12     ;
   11a8:       6f 4e           mov.b   @r14,   r15     ;
   11aa:       cd 4f 00 00     mov.b   r15,    0(r13)  ;
   11ae:       4f 93           cmp.b   #0,     r15     ;r3 As==00
   11b0:       07 24           jz      $+16            ;abs 0x11c0
   11b2:       1e 53           inc     r14             ;
   11b4:       1d 53           inc     r13             ;
   11b6:       6f 4e           mov.b   @r14,   r15     ;
   11b8:       cd 4f 00 00     mov.b   r15,    0(r13)  ;
   11bc:       4f 93           cmp.b   #0,     r15     ;r3 As==00
   11be:       f9 23           jnz     $-12            ;abs 0x11b2
   11c0:       0f 4c           mov     r12,    r15     ;
   11c2:       30 41           ret

This happens because main() returns not by "ret" but by branching to 0x11C4, which is __stop_progExec__ in the firmware being analyzed. An alternate method would be to look for call targets, assuming that 0x11A4 is the beginning of a function because some other instruction calls it.

By searching by call targets, my script correctly recognizes the second function of the preceding example, but it no longer recognizes main(), which in GCC is called by "BR #addr" and not "CALL #addr". A quick check on a small GCC program shows that absolute jumps are only used for main() and non-user functions. Thus, by looking for "CALL #addr" and "BR #addr", it is possible to find the entry points of most if not all functions.
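A minimal sketch of the call-target approach follows, in Python rather than Perl. The regular expression and the name `entry_points` are assumptions for illustration; the sample lines are taken from the listings in this article. Only immediate targets (`#addr`) are collected, so indirect calls and indexed branches are ignored.

```python
import re

# Matches "call #4416" as well as "br #0x11c4"; the operand may be
# printed in decimal or in hex, depending on the instruction.
TARGET = re.compile(r"\b(call|br)\s+#(0x[0-9a-f]+|\d+)")

def entry_points(lines):
    """Collect the immediate targets of CALL and BR instructions."""
    targets = set()
    for line in lines:
        m = TARGET.search(line)
        if m is None:
            continue
        text = m.group(2)
        base = 16 if text.startswith("0x") else 10
        targets.add(int(text, base))
    return sorted(targets)

LINES = [
    "   115a:       b0 12 a4 11     call    #4516           ;#0x11a4",
    "   1198:       b0 12 40 11     call    #4416           ;#0x1140",
    "   119c:       b0 12 68 11     call    #4456           ;#0x1168",
    "   11a0:       30 40 c4 11     br      #0x11c4         ;",
    "   11a8:       6f 4e           mov.b   @r14,   r15     ;",
]
points = entry_points(LINES)
print([hex(p) for p in points])
```

Each address found this way marks the start of a candidate function, and the function can then be taken to run until the next entry point or the next ret.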

Once functions have been identified, it isn't very difficult to add an output mode for Graphviz. The following image is a call tree in which main() calls two functions which call strcpy(). Dangerous functions and calls are labeled in red. The two islands on the right--which prevent this from being a Tree in the graph theory sense--exist in assembly as infinite loops.
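Emitting Graphviz input takes very little code. The sketch below is a guess at the general shape, not the tool's actual output: the function `to_dot`, the graph name, and the caller names `copy_a` and `copy_b` are hypothetical. It writes plain DOT text, which can be rendered with the `dot` command.

```python
def to_dot(calls, dangerous):
    """Render (caller, callee) pairs as DOT text. Dangerous functions,
    and the calls that reach them, are colored red."""
    lines = ["digraph calls {"]
    for name in sorted(dangerous):
        lines.append('  "%s" [color=red];' % name)
    for caller, callee in calls:
        attr = " [color=red]" if callee in dangerous else ""
        lines.append('  "%s" -> "%s"%s;' % (caller, callee, attr))
    lines.append("}")
    return "\n".join(lines)

# Hypothetical call list: main() calls two functions which call strcpy().
CALLS = [
    ("main", "copy_a"),
    ("main", "copy_b"),
    ("copy_a", "strcpy"),
    ("copy_b", "strcpy"),
]
dot = to_dot(CALLS, dangerous={"strcpy"})
print(dot)
```

Saved to a file such as calls.dot, the output could be rendered with `dot -Tpng calls.dot -o calls.png`.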

Further, it's also useful to produce memory maps which detail memory usage. These can be produced from the database by dumping to a graphics programming language. My first revision published to LaTeX/PSTricks. This looks beautiful, but rendering everything as vector art quickly makes a complex memory map unmanageable. My solution was a rewrite that prints raw postscript. Both are shown below.

Conclusion

I've named the tool msp430static, and I intend to publish a revision as soon as I clean up the code. It's a decent hack at this point, but a hack isn't maintainable and I shudder to think of how I'll comprehend these few hundred lines of perl in three months' time without much revision.

My redesign will feature an SQL backend, such that the input file needn't be reparsed for each minor revision. This will also allow for scripting in languages other than perl. A single command will stock a database of a defined schema, a second will analyze the database, and others will produce output or analysis. I intend to do most analysis in a self-contained perl script, but clients may be written in a variety of languages as appropriate. I'm undecided as to whether I'll make the tool architecture-agnostic in this revision. It's possible, but perhaps that's more appropriate for a later revision. Potential clients include a modified msp430simu and a GTK# GUI.

I don't intend to make a public release of the present version, but I'll send individual copies by email upon request.

Wednesday, 02 January 2008

Tracing with MSP430simu, LaTeX, and PowerPoint

by Travis Goodspeed <travis at utk.edu>
at the Extreme Measurement Communications Center
of the Oak Ridge National Laboratory

I have need of a decent MSP430 simulator, and I can't seem to find any documentation for msp430simu, which is an auxiliary part of the MSPGCC project. What follows are some unorganized observations that I've made regarding the code and my use of it to render LaTeX for my upcoming presentation at TIDC '08. Forgive me if it isn't terribly coherent or well organized: It's better than nothing and it ought to save you a lot of time if you find yourself in the same position as I find myself.

The simulator runs code targeted toward a msp430x135, built with mspgcc. IAR could likely be used as well; see TI EZ430 in Linux with IAR Kickstart for details.

Caveat lector--I had no prior experience with Python, and I wrote this code as a quick hack to generate my slides, not as something to release or maintain.

Be sure to download and review the msp430simu code. This article will make little sense without it. I expect my readers to get their hands dirty!

cvs -d:pserver:anonymous@mspgcc.cvs.sourceforge.net:/cvsroot/mspgcc login
cvs -z3 -d:pserver:anonymous@mspgcc.cvs.sourceforge.net:/cvsroot/mspgcc co msp430simu

core.disassemble

After grabbing the source, the Makefile was sufficiently self-explanatory to get an example project up and running. I wished to demonstrate a string copy by dumping the result to LaTeX, so I threw in a dump to disassemble the code and spit it to a file. This worked well, except that my code would behave unpredictably. Adding a single branch would change the execution time from 23 ticks to a timeout after a few thousand ticks!

My mistake was in forgetting to copy PC before calling core.disassemble, which--God only knows why--advances the PC to the next address. Thus, whenever I disassembled an address, I'd accidentally advance the PC twice after executing a single instruction!

The repaired code, cited below, copies PC and decodes the copy. This may be called without damaging the execution, and does not alter the simulation's results.

    def printme(self,step,FILE):
      #Decode a copy of the pc, so the PC itself isn't advanced.                                                             
      pc=core.PC(self,int(self.PC));
      name, args, execfu, cycles=self.disassemble(pc);
      
      #FILE I/O goes here

RAM

Variables within RAM can be read between instruction executions by calling core.memory.get(). By calling this between instructions, it's possible to watch variables. In my case, rather than stepping through hundreds of slides to get to the fun stuff, I can instead only print a slide when the watched variables change. How cool is that?

The only difficult part here is that you have to know the address of the object you wish to view. Luckily, with optimizations disabled, GCC begins them at the start of ram--0x0200 for the msp430x135. Just as in the heap of an architecture with memory to spare for malloc(), global variables begin at the bottom and grow upward while the stack begins at the top of RAM and grows downward. My globals follow:

int r=0xbeef;
char *foo="Hello world.";
const char *bar="Hey.";

Of course, I'm liable to screw this up if I predict the compiler's actions, so I double-check variable addresses with gdb. In the case of int r=0xBEEF being the first global, I find that:

(gdb) x/h 0x200
0x200 <_r>:     0xbeef
(gdb)

Note that the default value--0xBEEF in this instance--is not yet present when the program begins executing. (An early revision of this article erroneously stated that this was never set. This mistake was a result of a bug in my code.) The value--like all of RAM--is initialized to 0x0000. It is only loaded with its specified value during the resetvector function, which is generated by the compiler.

Following _r are two strings. Rather, two pointers to strings, which belong to the two global strings that I instantiated:

(gdb) x/xh 0x200
0x200 <_r>:     0xbeef
(gdb) 
0x202 <foo>:    0x1170
(gdb) 
0x204 <bar>:    0x117d
(gdb) x/s 0x1170
0x1170 <test_puts+48>:   "Hello world."
(gdb) x/s 0x117d
0x117d <test_puts+61>:   "Hey."
(gdb)

Note the common C fallacy that I accidentally committed. Not only my const char* but also my char* are RAM pointers to ROM strings. The values will be loaded by the resetvector, but it will make tracing more difficult when I add string value dumping later.

Thus, I change my C code to

#define bar "Hey."
int r=0xbeef;
char foo[]="Hello world.";

And I now get in GDB:

(gdb) x/xh 0x200
0x200 <_r>:     0xbeef
(gdb) x/s 0x202
0x202 <foo>:     "Hello world."
(gdb)

Now the string foo exists in RAM at 0x202. That is to say that &(foo[0])==0x202; earlier, only &foo==0x202, while the characters themselves sat in ROM at 0x1170. This is much easier to find by address when watching variables.

core.memory.get(addr, bytemode=0)

Grabbing an integer is easy, just make a function like the following:

    def getint(self, addr, bytemode=0):
        return self.memory.get(addr, bytemode);

To grab a string--which in my examples is a character array rather than a pointer to a character--read and concatenate a series of integers, one byte at a time.

    def getstr(self, addr):
        #Read bytes from addr up to, but not including, the terminating NUL.
        s='';
        c=self.getint(addr,1);
        while(c!=0):
            s+=chr(c);
            addr+=1;
            c=self.getint(addr,1);
        return s;

Note that bytemode is set to 1 so as to receive single bytes rather than full (16-bit) words. Printing self.getstr(0x202) while the simulation runs strcpy(foo,bar) gives me the following:

...
getstr()=Hello w
getstr()=Hello w
getstr()=Hello w
getstr()=Hello w
getstr()=Hello wo
getstr()=Hello wo
getstr()=Hello wo
getstr()=Hello wo
getstr()=Hello wor
getstr()=Hello wor
getstr()=Hello wor
getstr()=Hello wor
getstr()=Hello worl
...

This works, but it's instruction-accurate. Few people have the patience to sit through a 90-minute lecture on machine-language. No one has the patience to sit through such a lecture when it takes four slides to copy a byte. To keep my audience awake, my code only prints a slide when something interesting happens, so the result is not a frame-by-frame record of execution but just slices of time at which watched variables change.
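The change-only filtering can be sketched without the simulator at all. In the example below, `FakeMemory` is a hypothetical stand-in that models nothing but a `get(addr, bytemode=0)` method of the same shape as the one used above, and `changed_frames`, `poke`, and `idle` are invented names. Each "step" stands for one simulated instruction.

```python
class FakeMemory:
    """Stand-in for the simulator's memory; only get() is modeled."""
    def __init__(self, size=0x10000):
        self.cells = bytearray(size)

    def get(self, addr, bytemode=0):
        if bytemode:
            return self.cells[addr]
        # Words are little-endian on the MSP430.
        return self.cells[addr] | (self.cells[addr + 1] << 8)

def read_string(memory, addr):
    """Read bytes up to, but not including, the terminating NUL."""
    chars = []
    c = memory.get(addr, 1)
    while c != 0:
        chars.append(chr(c))
        addr += 1
        c = memory.get(addr, 1)
    return "".join(chars)

def changed_frames(memory, addr, steps):
    """Run every step, but record the watched string only when it differs
    from the previous recording."""
    frames = []
    last = None
    for step in steps:
        step()
        value = read_string(memory, addr)
        if value != last:
            frames.append(value)
            last = value
    return frames

memory = FakeMemory()

def poke(addr, byte):
    """Build a step that writes one byte, as a store instruction would."""
    def step():
        memory.cells[addr] = byte
    return step

def idle():
    """A step that touches no watched memory."""

STEPS = [poke(0x202, ord("H")), idle, idle,
         poke(0x203, ord("e")), idle, poke(0x204, ord("y"))]
frames = changed_frames(memory, 0x202, STEPS)
print(frames)
```

Six steps yield three frames, one per change to the watched string.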

For use in LaTeX, it's necessary to sanitize the string output, particularly if it is to later be corrupted. As this is for a conference presentation--rather than a paper--I use Beamer to generate a PDF slideshow, pdf2oo to generate an OpenOffice Impress presentation, and OpenOffice to export to PowerPoint.
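One way to sanitize such strings is to escape LaTeX's special characters and render anything unprintable as a visible hex code. The function below is a sketch under that assumption; the name `latex_escape` and the `\xNN` rendering of unprintable bytes are illustrative choices, not the method used for the slides.

```python
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "%": r"\%",
    "_": r"\_",
    "^": r"\textasciicircum{}",
    "~": r"\textasciitilde{}",
}

def latex_escape(text):
    """Escape text for LaTeX. Characters outside printable ASCII, which a
    corrupted string may well contain, are shown as a literal \\xNN."""
    out = []
    for ch in text:
        if ch in LATEX_SPECIALS:
            out.append(LATEX_SPECIALS[ch])
        elif 32 <= ord(ch) < 127:
            out.append(ch)
        else:
            out.append(r"\textbackslash{}x%02x" % ord(ch))
    return "".join(out)

print(latex_escape("Hello world."))
print(latex_escape("50%_off\x07"))
```

Working one character at a time avoids the classic mistake of escaping the backslashes that an earlier replacement has just inserted.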

Continued

This article is continued as MSP430simu and LaTeX, part 2.

Friday, 30 November 2007

TI EZ430 in Linux with IAR Kickstart

by Travis Goodspeed [travis at utk.edu]
at the Extreme Measurement Communications Center
of the Oak Ridge National Laboratory

What follows are instructions for running the free version of IAR's C compiler for the MSP430 with Texas Instruments' EZ430 development tool in Linux under Wine. This will not work for Mac OS X until msp430-gdbproxy is made available for that platform. Also, this might not work with the full version of the compiler.

These instructions assume that you've installed wine, mspgcc, and msp430-gdbproxy. The assumption is also made that you've purchased the EZ430-F2013 development tool from Texas Instruments.

IAR Embedded Workbench

First, download slac050q.zip from the EZ430-F2013 page. Unzip it to get FET_R510.exe. Running wine FET_R510.exe installs the compiler to your C: drive under wine.

Next, you must find the executable and run it.


karen% find ~/.wine/drive_c -name icc\*.exe
/home/travis/.wine/drive_c/Program Files/IAR Systems/Embedded Workbench 4.0/430/bin/icc430.exe
karen% wine "C:\Program Files\IAR Systems\Embedded Workbench 4.0\430\bin\icc430.exe"
  IAR MSP430 C/C++ Compiler V4.09A/W32  [Kickstart]
  Copyright 1996-2007 IAR Systems. All rights reserved.

Available command line options:
--char_is_signed
               'Plain' char is treated as signed char
--core {430|430X}
               The processor core
                  430       (default)
                  430X    
--data_model {small|medium|large}
               Select data model (only for 430X core)
                  small      Small model
                        16 bit registers. __data16 default. (default)
                  medium     Medium model
                        20 bit registers. __data16 default. __data20 allowed.
                  large      Large model
                        20 bit registers. __data20 default. __data16 allowed.
--debug
-r              Insert debug info in object file
--dependencies=[i][m] file|directory
               List file dependencies
                  i     Include filename only (default)
                  m     Makefile style
--diagnostics_tables file|directory
               Dump diagnostic message tables to file
--diag_error tag,tag,...
               Treat the list of tags as error diagnostics
--diag_remark tag,tag,...
               Treat the list of tags as remark diagnostics
--diag_suppress tag,tag,...
               Suppress the list of tags as diagnostics
--diag_warning tag,tag,...
               Treat the list of tags as warning diagnostics
--discard_unused_publics
               Discard unused public functions and variables (experimental)
--dlib_config pathname
               Specify DLib library configuration file
--double {32|64}
               The size of the double floating point type
                  32     32 bits (default)
                  64     64 bits
--ec++          Embedded C++
--eec++         Extended EC++ (EC++ with templates/namespaces/mutable/casts)
--enable_multibytes
               Enable multibyte support
--error_limit limit
               Stop after this many errors (0 = no limit)
--header_context
               Adds include file context to diagnostics
--library_module
               Make a library module
--lock_r4       Exclude register R4 from use by the compiler
--lock_r5       Exclude register R5 from use by the compiler
--mfc           Enable multiple file compilation (experimental)
--migration_preprocessor_extensions
               Enable IAR migration preprocessor extensions
--misrac        Enable MISRA C diagnostics (not available)
--misrac_verbose
               Enable verbose MISRA C messages (not available)
--module_name name
               Set module name
--no_code_motion
               Disable code motion
--no_cse        Disable common sub-expression elimination
--no_fragments  Do not generate section fragments
--no_inline     Disable function inlining
--no_path_in_file_macros
               Strip path from __FILE__ and __BASE_FILE__ macros
--no_tbaa       Disable type based alias analysis
--no_typedefs_in_diagnostics
               Don't use typedefs when printing types
--no_unroll     Disable loop unrolling
--no_warnings   Disable generation of warnings
--no_wrap_diagnostics
               Don't wrap long lines in diagnostic messages
--omit_types    Omit function/variable type info in object output
--only_stdout   Use stdout only (no console output on stderr)
--output file|path
-o file|path    Specify object file
--pic           Generate position independent code
--preinclude filename
               Include file before normal source
--preprocess=[c][n][l] file|directory
               Preprocessor output
                  c     Include comments
                  n     Preprocess only
                  l     Include #line directives
--public_equ symbol[=value]
               Define public assembler symbol (EQU)
--reduce_stack_usage
               Reduce usage of stack at the cost of larger and slower code
--regvar_r4     Allow register R4 to be used as a global register variable
--regvar_r5     Allow register R5 to be used as a global register variable
--remarks       Enable generation of remarks
--require_prototypes
               Require prototypes for all called or public functions
--save_reg20    Save 20-bit registers in interrupt functions
--silent        Silent operation
--strict_ansi   Strict ANSI rules
--warnings_affect_exit_code
               Warnings affect exit code
--warnings_are_errors
               All warnings are errors
-D symbol[=value]
               Define macro (same as #define symbol [value])
-e              Enable IAR C/C++ language extensions
-f file         Read command line options from file
-I directory    Add #include search directory
-l[c|C|D|E|a|A|b|B][N][H] file|directory
               Output list file
                  c     C source listing
                  C        with assembly code
                  D        with pure assembly code
                  E        with non-sequential assembly code
                  a     Assembler file
                  A        with C source
                  b     Basic assembler file
                  B        with C source
                  N     Don't include diagnostics
                  H     Include header file source lines
-O[n|l|m|h|hs|hz]
               Select level of optimization:
                  n   No optimizations
                  l   Low optimizations (default)
                  m   Medium optimizations
                  h   High optimizations
                  hz  High optimizations, tuned for small code size
                  hs  High optimizations, tuned for high speed
                      (-O without argument) The same setting as -Oh
-s{0-9}         Optimize for speed:
                  0-2   Debug
                  3     Low
                  4-6   Medium
                  7-9   High
-z{0-9}         Optimize for size:
                  0-2   Debug
                  3     Low (default)
                  4-6   Medium
                  7-9   High
karen%

The usage information is valuable but too long to scroll through, so pipe it to a text file for later reference. Also, make some symlinks to get at the include files and the documentation more easily:


karen% sudo ln -s /home/travis/.wine/drive_c/Program\ Files/IAR\ Systems/Embedded\ Workbench\ 4.0 /opt/IAR
karen% ls /opt/IAR/430/doc/
EW430_AssemblerReference.pdf  HelpMISRAC.chm              embOS_IAR_plugin.pdf
EW430_CompilerReference.pdf   IAR_Systems.jpg             ew430.htm
EW430_MigrationGuide.pdf      MSP-FET430 Users Guide.pdf  htm.gif
EW430_UserGuide.pdf           a430.htm                    icc430.htm
EW_MisraCReference.pdf        a430_msg.htm                icc430_msg.htm
Help430Compiler.chm           appnotes                    migration.htm
Help430Contents.ENU.chm       clib.pdf                    pdf.gif
Help430IDE1.chm               cs430.htm                   readme.htm
Help430IDE2.chm               embOSRelease.htm            uC-OS-II-KA-CSPY-UserGuide.pdf
karen%

Make scripts for both the compiler and the assembler. I'm uninterested in the IDE.


#!/bin/sh
#/usr/local/bin/a430
# "$@" forwards each argument intact, even one that contains spaces.
wine "C:\Program Files\IAR Systems\Embedded Workbench 4.0\430\bin\a430.exe" "$@"

#!/bin/sh
#/usr/local/bin/icc430
# "$@" forwards each argument intact, even one that contains spaces.
wine "C:\Program Files\IAR Systems\Embedded Workbench 4.0\430\bin\icc430.exe" "$@"

The compiler's options are very different from those of GCC, and you must remember (or update your script) to include the IAR include directory if you intend to use its headers. A test compile of the LED blinker from slac080b.zip follows.


karen% icc430 -I "Z:\opt\IAR\430\inc" msp430x20x3_1.c --output blink.exe

  IAR MSP430 C/C++ Compiler V4.09A/W32  [Kickstart]
  Copyright 1996-2007 IAR Systems. All rights reserved.

34 bytes of CODE memory
 0 bytes of DATA memory (+ 4 bytes shared)

Errors: none
Warnings: none
karen%

Now that the compiler is working, you'll need a linker. I use the following script:


#!/bin/sh
opts="-f Z:\opt\IAR\430\config\lnk430F2013.xcl -Fintel-standard  Z:\opt\IAR\430\LIB\CLIB\cl430f.r43 -s __program_start "
xlink="C:\Program Files\IAR Systems\Embedded Workbench 4.0\common\bin\xlink.exe"
# $opts is left unquoted on purpose so that it splits into separate arguments.
wine "$xlink" "$@" $opts

msp430-objcopy aout.a43 aout.exe

The format switch, -Fintel-standard, writes the output file in the Intel hex (ihex) format, which msp430-objcopy can handle. This lets us load the executable onto the board using msp430-gdb and the rest of the GNU tools. Also note that you'll need to uncomment lines 76 and 77 of /opt/IAR/430/config/lnk430F2013.xcl to define the stack and heap sizes. This script is called as xlink msp430x20x3_1.r43.

The following is a functional, if inelegant, Makefile:


ALL=msp430x20x3_1.exe

all: $(ALL)

# Recipe lines must begin with a tab character, not spaces.
msp430x20x3_1.r43: msp430x20x3_1.c
	icc430 -I "Z:\opt\IAR\430\inc" msp430x20x3_1.c
msp430x20x3_1.exe: msp430x20x3_1.r43
	xlink msp430x20x3_1.r43
	cp aout.exe msp430x20x3_1.exe

GDB

Assuming that msp430-gdb and the USB-FET drivers have been properly installed, the GDB proxy server can be started as follows:


karen% msp430-gdbproxy msp430 --spy-bi-wire /dev/ttyUSB0

Remote proxy for GDB, v0.7.1, Copyright (C) 1999 Quality Quorum Inc.
MSP430 adaption Copyright (C) 2002 Chris Liechti and Steve Underwood

GDBproxy comes with ABSOLUTELY NO WARRANTY; for details
use `--warranty' option. This is Open Source software. You are
welcome to redistribute it under certain conditions. Use the
'--copying' option for details.

debug: MSP430_Initialize()
debug: MSP430_Configure()
debug: MSP430_VCC(3000)
debug: MSP430_Identify()
info:      msp430: Target device is a 'MSP430F20x3' (type 52)
debug: MSP430_Configure()
notice:    msp430-gdbproxy: waiting on TCP port 2000

Your ~/.gdbinit file should contain the following:


set remoteaddresssize 16
set remotetimeout 999999
target remote localhost:2000
monitor interface spy-bi-wire

msp430-gdb runs with no options. Use load foo.exe to load an executable that has been made by msp430-objcopy.


karen% msp430-gdb
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "--host=i686-pc-linux-gnu --target=msp430".notice:    msp430-gdbproxy: connected
debug: MSP430_Registers(READ)
0x0000f800 in ?? ()

(gdb) load aout.exe
Loading section .sec1, size 0x38 lma 0xf800
debug: MSP430_Memory(WRITE)
Loading section .sec2, size 0x2 lma 0xfffe
debug: MSP430_Memory(WRITE)
Start address 0xf800, load size 58
Transfer rate: 464 bits in <1 sec.

Note that without "monitor interface spy-bi-wire" in .gdbinit and "--spy-bi-wire" to msp430-gdbproxy, load will still work but many debugging functions will not. Also note that the run command seems to have issues with spy-bi-wire; use continue instead.

You should now be able to play around with the MSP430. Grab the msp430f2013 datasheet and family guide if you'll be doing anything fancy.

Thursday, 02 August 2007

On the IAR MSP430 C Compiler's Inefficient Register Utilization

Recently, I've been digging into the documentation of Texas Instruments' MSP430 micro-controller family. After covering the CPU itself, I continued into the documentation[1] for the mspgcc project, a port of GCC to the MSP430. After realizing that the ABI used for mspgcc had never been defined in the chip's documentation[3], I dug up the manual[2] for IAR's compiler and compared the two.

I quickly discovered that IAR's compiler wastes registers when passing 16-bit parameters to a C function. Under its ABI, the first 16-bit parameter is placed in R12 and the second in R14. R13 and R15 remain unused, as they are reserved for the high words of 32-bit parameters. GCC follows the much more logical route of assigning only a single register to each 16-bit value, so that R15 holds the first parameter, R14 the second, R13 the third, and R12 the fourth. This allows it to accept four parameters by register, while IAR's compiler pushes the third and fourth onto the stack and leaves two call-clobbered registers unused!

To demonstrate this, I compiled a simple C program containing only a function foo(), which returns the sum of its four inputs, and a main() function that calls foo(). It was compiled to assembly language using mspgcc 3.2.3 and IAR MSP430 C/C++ Compiler V3.42A/W32.
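The source of that test program is not reproduced in this post. A minimal sketch of what foo() might have looked like follows, assuming plain int parameters (int is 16 bits wide on the MSP430); the parameter names are hypothetical.

```c
/* Hypothetical reconstruction of the test function described above.
 * Each parameter is a 16-bit int on the MSP430, so each one fits in
 * a single register. */
int foo(int a, int b, int c, int d)
{
    /* Three additions, matching the three ADD.W instructions that
     * both compilers emit for this function. */
    return a + b + c + d;
}
```

A main() that calls something like foo(1, 2, 3, 4) is enough to make each compiler emit its parameter-passing code; the argument values here are arbitrary.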

In both compilers, four assembly instructions add the values and return the result in the register of the first parameter: R12 for IAR and R15 for GCC. The table below lists the assembly generated by each compiler, with instructions converted from the GCC format (lowercase, .W omitted) to the IAR format for clear comparison. By virtue of its more efficient register usage, GCC avoids both having to PUSH.W two parameters onto the stack and having to use the indexed addressing mode, X(SP), within the function.

IAR Compiler                                   GCC Compiler
foo:                                           foo:
    ADD.W R14, R12     //1 cycle, 1 word           ADD.W R14, R15  //1c,1w
    ADD.W 0x2(SP), R12 //3c,2w                     ADD.W R13, R15  //1c,1w
    ADD.W 0x4(SP), R12 //3c,2w                     ADD.W R12, R15  //1c,1w
    RET                                            RET

Pages 3-72 and 3-73 of the MSP430 Family Guide[3] detail the full cost of these additions, which increase not only the runtime but also the storage requirements of the function. According to those pages, "ADD.W r14,r15" takes 1 cycle and 1 word of memory while "ADD.W 0x2(SP), R12" takes 3 cycles and 2 words of memory. Additionally, each of the two PUSH.W statements required to call foo() in the IAR compiler takes 3 cycles, which are unnecessary in GCC.
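The arithmetic can be tallied with a small sketch like the one below. It uses only the per-instruction figures quoted above; the function and constant names are made up for illustration. It leaves out RET, CALL, any stack cleanup in the caller, and the code words occupied by the PUSH.W instructions, since the post gives no figures for those.

```c
/* Per-instruction costs as quoted in the text from the MSP430 Family Guide. */
enum {
    REG_ADD_CYCLES = 1, REG_ADD_WORDS = 1,  /* ADD.W Rs, Rd         */
    IDX_ADD_CYCLES = 3, IDX_ADD_WORDS = 2,  /* ADD.W X(SP), Rd      */
    PUSH_CYCLES    = 3                      /* PUSH.W in the caller */
};

struct cost {
    int cycles;
    int words;
};

/* Total cost of the additions in foo() for a given mix of operand sources. */
struct cost add_cost(int reg_adds, int stack_adds)
{
    struct cost c;
    c.cycles = reg_adds * REG_ADD_CYCLES + stack_adds * IDX_ADD_CYCLES;
    c.words  = reg_adds * REG_ADD_WORDS  + stack_adds * IDX_ADD_WORDS;
    return c;
}

/* Extra cycles the caller spends pushing stack-passed parameters. */
int push_cycles(int pushed_params)
{
    return pushed_params * PUSH_CYCLES;
}
```

With these figures, add_cost(1, 2) gives 7 cycles and 5 words for the IAR version of foo(), add_cost(3, 0) gives 3 cycles and 3 words for the GCC version, and push_cycles(2) adds another 6 cycles to each IAR call.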

Texas Instruments' Code Composer Essentials (TICCE) does not suffer from IAR's inefficiency; rather, it uses an ABI similar to, but incompatible with, that of GCC. TICCE allocates register R12 for the first parameter, then R13, R14, and R15. The result is returned in R12. GCC uses the registers in the opposite order and returns in R15. See the User's Guide[4] for more details.
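The three conventions can be summarized as a lookup table. The sketch below encodes only what this article states about 16-bit parameters; the names are hypothetical, and treating any parameter beyond the fourth as stack-passed is an assumption that the article does not cover.

```c
enum abi { ABI_IAR, ABI_GCC, ABI_TICCE, ABI_COUNT };

/* Returns the register number (12..15) that holds the n-th 16-bit
 * parameter (n counts from 0), or -1 if it is passed on the stack.
 * Parameters past the fourth are assumed to go on the stack. */
int param_register(enum abi abi, int n)
{
    static const int table[ABI_COUNT][4] = {
        [ABI_IAR]   = { 12, 14, -1, -1 },  /* R13 and R15 left unused  */
        [ABI_GCC]   = { 15, 14, 13, 12 },  /* result returned in R15   */
        [ABI_TICCE] = { 12, 13, 14, 15 },  /* result returned in R12   */
    };

    if (abi < 0 || abi >= ABI_COUNT || n < 0 || n >= 4)
        return -1;
    return table[abi][n];
}
```

For example, param_register(ABI_IAR, 2) returns -1 because IAR pushes the third parameter, while param_register(ABI_GCC, 2) returns 13.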

What's the reasoning behind IAR's design? It makes functions of two 32-bit values easily compatible with those of two 16-bit values, but this compatibility breaks as soon as the third parameter comes into play, which is pushed onto the stack as a single word. If such compatibility were essential, the trick could be maintained by using R13 for the third parameter and R15 for the fourth.

Sources:
[1] mspgcc manual
[2] IAR manuals
[3] Texas Instruments' MSP430 Family Guide
[4] MSP430 Optimizing C/C++ Compiler User's Guide (SLAU132)