The Unterminated String

Embedded Things and Software Stuff

The Greedy C Runtime

Posted at — Apr 29, 2017

I occasionally browse Hackaday in the hope that some project will guilt, or rather inspire, me into creating something interesting. A post which got my attention was the 1 kB Challenge. This was a competition run at the end of 2016, with the key stipulation being:

Projects must use 1 kB or less of code, including any initialized data tables, bootloaders, and executable code.

The competition is now long over and I didn’t write so much as a single line of code for it. However, it got me thinking about the MSP430G2 Launchpad, a small developer board I used to experiment with. If there has ever been a device to make me conscious of program size it was this one.

The current revision of the Launchpad comes with an MSP430G2553 MCU. Within its family this is a fairly “high end” part, but its 16 kB of flash memory and 512 B of RAM aren’t exactly generous, especially now that several vendors offer low cost, low power ARM Cortex-M0 parts.

I recalled that even a basic C program used a significant amount of the MSP430G2553’s limited resources. Mostly for nostalgia’s sake I thought I would try to understand why.

The Code

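The program I decided to test with is the MSP430 equivalent of “hello world”. For those unfamiliar with the Launchpad, it stops the watchdog timer, configures the pins driving the red and green LEDs (P1.0 and P1.6) as outputs, sets both pins high, and then enters low power mode 4 (LPM4).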

The complete source can be seen below. It, along with any supporting files, can be found in the GitHub repository linked at the bottom of this article.

#include <msp430.h>

#define RED         BIT0
#define GREEN       BIT6

int main(void) 
{
    WDTCTL = WDTPW | WDTHOLD;

    P1DIR |= RED | GREEN;
    P1OUT |= RED | GREEN;

    while (1)
    {
        __bis_SR_register(LPM4_bits);
    }

    return 0;
}

Default / Non Optimised Build

ELF Size

When compiled with msp430-elf-gcc (without providing any fancy arguments) a binary with the following footprint is produced:

$ msp430-elf-size output/default.elf 

   text	   data	    bss	    dec	    hex	filename
    694	     16	     22	    732	    2dc output/default.elf

This basic program produced a whopping 732 B output, as indicated by “dec”. This value includes the program’s machine instructions (“text”), initialised data (“data”) and uninitialized data (“bss”). This is well past the halfway point of the 1 kB challenge.

The majority of the space is taken up by machine instructions (“text”). Certain that a near-empty main() function could not be responsible for generating so much output, I started probing the binary to see what it contained.

Main Disassembly

Below is the disassembly of main(). This was generated by using msp430-elf-objdump to disassemble the contents of the compiled binary. With the C source being included inline I won’t bother going into the specifics of the assembly instructions. For those interested, the assembly mnemonics and their descriptions can be found in the MSP430x2xx Family User’s Guide.

0000c142 <main>:
#define RED         BIT0
#define GREEN       BIT6 

int main(void) 
{
    WDTCTL = WDTPW | WDTHOLD;
    c142:	b2 40 80 5a 	mov	#23168,	&0x0120	;#0x5a80
    c146:	20 01 

0000c148 <.Loc.36.1>:

    P1DIR |= RED | GREEN;               
    c148:	5c 42 22 00 	mov.b	&0x0022,r12	;0x0022
    c14c:	7c d0 41 00 	bis.b	#65,	r12	;#0x0041
    c150:	3c f0 ff 00 	and	#255,	r12	;#0x00ff
    c154:	c2 4c 22 00 	mov.b	r12,	&0x0022	;

0000c158 <.Loc.37.1>:
    P1OUT |= RED | GREEN;
    c158:	5c 42 21 00 	mov.b	&0x0021,r12	;0x0021
    c15c:	7c d0 41 00 	bis.b	#65,	r12	;#0x0041
    c160:	3c f0 ff 00 	and	#255,	r12	;#0x00ff
    c164:	c2 4c 21 00 	mov.b	r12,	&0x0021	;

0000c168 <.L2>:

    while (1)
    {
        __bis_SR_register(LPM4_bits);
    c168:	32 d0 f0 00 	bis	#240,	r2	;#0x00f0

0000c16c <.Loc.42.1>:
    }
    c16c:	30 40 68 c1 	br	#0xc168		;

The important takeaway is that the assembled subroutine for main() is only 46 bytes long. This means the vast majority of the binary’s instructions are coming from elsewhere.

Everything Else

In the table below are the sizes, in bytes, of the other subroutines present in the disassembly of the ELF file. Alongside each is a description of what I think the subroutine is attempting to do. Please take the descriptions with a pinch of salt as I didn’t devote enough time to do them justice.

| Name | Size (Bytes) | Description |
|---|---|---|
| __msp430_resetvec_hook | 2 | On reset jump to __crt0_start |
| __crt0_start | 4 | Loads stack address into R1 |
| __crt0_init_bss | 14 | Call memset on bss |
| __crt0_movedata | 20 | Copy data section from ROM to RAM (unknown 4 bytes?) |
| __crt0_call_init_then_main | 10 | Calls the initialization code (call___do_global_ctors_aux), then main |
| _msp430_run_init_array | 14 | Loads the start and end address of an array of subroutines to call before calling into _msp430_run_array. The array is 0 length. |
| _msp430_run_preinit_array | 14 | Loads the start and end address of an array of subroutines to call before calling into _msp430_run_array. The array is 0 length. |
| _msp430_run_fini_array | 16 | Loads the start and end address of an array of subroutines to call before calling into _msp430_run_array. The array is 0 length. |
| _msp430_run_array | 14 | Would call each subroutine from an array of their addresses. All callers to this have arrays of length 0. |
| _msp430_run_done | 6 | Return instruction for _msp430_run_array. Contains 3 ret instructions? |
| deregister_tm_clones | 30 | See register_tm_clones |
| register_tm_clones | 46 | Appears to relate to transactional memory, which apparently is to make threading easier. Seems unlikely this would be required on a MSP430, unless it can benefit interrupts? |
| __do_global_dtors_aux | 78 | Attempts to iterate over an empty array of function pointers (__DTOR_LIST__). |
| call___do_global_dtors_aux | 44 | Calls register_tm_clones after a lot of value checking. |
| __mspabi_func_epilog* | 16 | Fall-through instructions to pop r4-r10 before returning. Defined in the EABI with the intention of reducing code size. |
| __mspabi_srli* | 74 | Fall-through subroutines to logical shift an int right. Right shifts through carry and clears carry. |
| __mspabi_srll* | 106 | Fall-through subroutines to logical shift a long right. Right shifts through carry and clears carry. |
| memmove | 64 | Included for __crt0_movedata |
| memset | 22 | Included for __crt0_init_bss |
| __do_global_ctors_aux | 26 | Tries to call various functions from the array __CTOR_LIST__. The array itself is empty. Handler code for C++ constructors? |
| call___do_global_ctors_aux | 18 | Calls: call___do_global_dtors_aux, __do_global_ctors_aux, _msp430_run_preinit_array, _msp430_run_init_array. |
| __msp430_fini | 10 | Calls _msp430_run_fini_array then __do_global_dtors_aux |

Somewhat unsurprisingly, these subroutines suggest the C language runtime library is responsible for using the rest of the memory. The runtime provides various supporting functions for the C language. For example, managing the stack is not something which you actively need to think about when writing C, but it happens in the background nonetheless.

Shrinking It

Attempt 1 - Optimise for Size

My first approach to reducing the binary size was using GCC’s optimization option intended for that specific purpose. This is enabled by passing the -Os switch to GCC.

The outcome of this was a meager saving of 20 bytes, with the entirety of this saving coming from GCC optimising main().

Attempt 2 - Minrt

The MSP430 GCC toolchain has an additional switch to reduce binary size, -minrt. I stumbled across this option in some documentation written by one of the developers. The snippet from the MSP430 GCC manpage states that -minrt will:

Enable the use of a minimum runtime environment - no static initializers or constructors. This is intended for memory-constrained devices. The compiler includes special symbols in some objects that tell the linker and runtime which code fragments are required.

Enabling this option strips away several subroutines in their entirety. As documented in the table above there were several redundant subroutines which operated on zero length arrays. The binary produced with -minrt enabled contains a total of 58 bytes of machine instructions (“text”).

Of these 58 bytes, the pre-main “minimal runtime” is only 12 bytes long:

0000c000 <__crt0_start>:
    c000:   31 40 00 04     mov #1024,  r1
0000c004 <__crt0_call_just_main>:
    c004:   0c 43           clr r12
    c006:   b0 12 0a c0     call    #49162

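It ensures that the stack pointer (R1) is set to 1024 (0x0400), that R12 is cleared, and that the routine at 49162 (0xc00a), which immediately follows the call instruction, is called.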

Attempt 3 - Combining Minrt and Os

The options -minrt and -Os can be specified simultaneously. This has the effect of producing a binary with a minimal runtime and a size optimised main() subroutine.

This results in a binary with a mere 38 bytes of machine instructions. This is a significant reduction from the original binary and at least offers the chance of squeezing something interesting out of 1 kB.

Source Code

This GitHub repository contains the source code, Makefile and disassembly used for this post. It additionally contains a handful of the relevant files taken from the MSP430 GCC source code which provide some of the runtime subroutines that have been referenced here.

The Various Sources of Runtime Code

I tried to track down as many of the files as possible which had input to the compiled binary.

The files I found were obtained from three sources:

The header files and linker scripts for the MSP430 MCUs, e.g. msp430g2553.h, are not shipped in the MSP430 GCC source code but are provided in the executable installer version. This list of instructions suggests that, if building GCC from source, these files should be obtained separately from the “msp430-gcc-support-files.zip” package.

The source code for the various runtime subroutines is split between the newlib and libgcc directories of the GCC source code.

Newlib is an implementation of the C standard runtime library, e.g. the code which implements the functions found in string.h or stdlib.h, etc. According to its wiki page it was written with a focus on embedded systems which do not have an operating system (aka “bare metal”). I found that this porting guide provides relevant information on its workings.

The additional subroutines pulled in from libgcc on the other hand appear to be for the handling of miscellaneous aspects of the C runtime. In particular code from the crtstuff.c file ends up in the non-minrt binary. This file is documented as providing:

Specialized bits of code needed to support construction and destruction of file-scope objects in C++ code

I don’t have any knowledge of C++ but this file contains functions referencing ctor and dtor which I assume are written for this purpose.

Summary of Subroutine Sizes

Below is a listing of the names of the subroutines and their various sizes, in bytes, for different optimisation levels.

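| Name | Default | -Os | minrt | minrt and -Os |
|---|---|---|---|---|
| __reset_vector | 2 | 2 | 2 | 2 |
| __crt0_start | 4 | 4 | 4 | 4 |
| __crt0_call_just_main | | | 6 | 6 |
| __crt0_init_bss | 14 | 14 | | |
| __crt0_movedata | 20 | 20 | | |
| __crt0_call_init_then_main | 10 | 10 | | |
| _msp430_run_init_array | 14 | 14 | | |
| _msp430_run_preinit_array | 14 | 14 | | |
| _msp430_run_fini_array | 16 | 16 | | |
| _msp430_run_array | 14 | 14 | | |
| _msp430_run_done | 6 | 6 | | |
| deregister_tm_clones | 30 | 30 | | |
| register_tm_clones | 46 | 46 | | |
| __do_global_dtors_aux | 78 | 78 | | |
| call___do_global_dtors_aux | 44 | 44 | | |
| main | 46 | 26 | 46 | 26 |
| __mspabi_func_epilog* | 16 | 16 | | |
| __mspabi_srli* | 74 | 74 | | |
| __mspabi_srll* | 106 | 106 | | |
| memmove | 64 | 64 | | |
| memset | 22 | 22 | | |
| __do_global_ctors_aux | 26 | 26 | | |
| call___do_global_ctors_aux | 18 | 18 | | |
| __msp430_fini | 10 | 10 | | |
| Totals | 694 | 674 | 58 | 38 |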

Resources

The following resources were helpful in understanding the disassembly:

Note: The installed GCC version and GCC source code version were mismatched as this is what I had on hand. I don’t believe there were any significant changes to the files of interest to me between the versions. Both versions of GCC were obtained from TI’s Website.

Misc Notes

gc-sections

Another option which can help reduce binary size is: -Wl,--gc-sections

This consists of two parts:

-Wl, - Passes the option following the comma to the linker (i.e. to msp430-elf-ld).

--gc-sections - This is a linker option which per the man page will:

“Enable garbage collection of unused input sections”.

The inclusion of -Wl,--gc-sections in the invocation of GCC reduces the size of “text” in both the default and the size optimised builds by 6 bytes. The subroutines which saw the saving, _msp430_run_done and memset, are not present when -minrt is enabled.

Interestingly, this saving came from removing redundant return from subroutine (ret) instructions present within these two subroutines. For instance, the unoptimised build produced the following subroutine, which clearly has two unnecessary ret instructions:

0000c076 <_msp430_run_done>:
    c076:    30 41        ret			

0000c078 <L0>:
    c078:    30 41        ret			
    c07a:    30 41        ret