Follow Techotopia on Twitter

On-line Guides
All Guides
eBook Store
iOS / Android
Linux for Beginners
Office Productivity
Linux Installation
Linux Security
Linux Utilities
Linux Virtualization
Linux Kernel
System/Network Admin
Programming
Scripting Languages
Development Tools
Web Development
GUI Toolkits/Desktop
Databases
Mail Systems
openSolaris
Eclipse Documentation
Techotopia.com
Virtuatopia.com
Answertopia.com

How To Guides
Virtualization
General System Admin
Linux Security
Linux Filesystems
Web Servers
Graphics & Desktop
PC Hardware
Windows
Problem Solutions
Privacy Policy

  




 

 

4.17. Hardware Models and Configurations

Earlier we discussed the standard option -b which chooses among different installed compilers for completely different target machines, such as VAX vs. 68000 vs. 80386.

In addition, each of these target machine types can have its own special options, starting with -m, to choose among various hardware models or configurations--for example, 68010 vs 68020, floating coprocessor or none. A single installed version of the compiler can compile for any model or configuration, according to the options specified.

Some configurations of the compiler also support additional special options, usually for compatibility with other compilers on the same platform.

These options are defined by the macro TARGET_SWITCHES in the machine description. The default for the options is also defined by that macro, which enables you to change the defaults.

4.17.1. IBM RS/6000 and PowerPC Options

These -m options are defined for the IBM RS/6000 and PowerPC:

-mpower, -mno-power, -mpower2, -mno-power2, -mpowerpc, -mno-powerpc, -mpowerpc-gpopt, -mno-powerpc-gpopt, -mpowerpc-gfxopt, -mno-powerpc-gfxopt, -mpowerpc64, -mno-powerpc64

GCC supports two related instruction set architectures for the RS/6000 and PowerPC. The POWER instruction set are those instructions supported by the rios chip set used in the original RS/6000 systems and the PowerPC instruction set is the architecture of the Motorola MPC5xx, MPC6xx, MPC8xx microprocessors, and the IBM 4xx microprocessors.

Neither architecture is a subset of the other. However there is a large common subset of instructions supported by both. An MQ register is included in processors supporting the POWER architecture.

You use these options to specify which instructions are available on the processor you are using. The default value of these options is determined when configuring GCC. Specifying the -mcpu=cpu_type overrides the specification of these options. We recommend you use the -mcpu=cpu_type option rather than the options listed above.

The -mpower option allows GCC to generate instructions that are found only in the POWER architecture and to use the MQ register. Specifying -mpower2 implies -power and also allows GCC to generate instructions that are present in the POWER2 architecture but not the original POWER architecture.

The -mpowerpc option allows GCC to generate instructions that are found only in the 32-bit subset of the PowerPC architecture. Specifying -mpowerpc-gpopt implies -mpowerpc and also allows GCC to use the optional PowerPC architecture instructions in the General Purpose group, including floating-point square root. Specifying -mpowerpc-gfxopt implies -mpowerpc and also allows GCC to use the optional PowerPC architecture instructions in the Graphics group, including floating-point select.

The -mpowerpc64 option allows GCC to generate the additional 64-bit instructions that are found in the full PowerPC64 architecture and to treat GPRs as 64-bit, doubleword quantities. GCC defaults to -mno-powerpc64.

If you specify both -mno-power and -mno-powerpc, GCC will use only the instructions in the common subset of both architectures plus some special AIX common-mode calls, and will not use the MQ register. Specifying both -mpower and -mpowerpc permits GCC to use any instruction from either architecture and to allow use of the MQ register; specify this for the Motorola MPC601.

-mnew-mnemonics, -mold-mnemonics

Select which mnemonics to use in the generated assembler code. With -mnew-mnemonics, GCC uses the assembler mnemonics defined for the PowerPC architecture. With -mold-mnemonics it uses the assembler mnemonics defined for the POWER architecture. Instructions defined in only one architecture have only one mnemonic; GCC uses that mnemonic irrespective of which of these options is specified.

GCC defaults to the mnemonics appropriate for the architecture in use. Specifying -mcpu=cpu_type sometimes overrides the value of these option. Unless you are building a cross-compiler, you should normally not specify either -mnew-mnemonics or -mold-mnemonics, but should instead accept the default.

-mcpu=cpu_type

Set architecture type, register usage, choice of mnemonics, and instruction scheduling parameters for machine type cpu_type. Supported values for cpu_type are 401, 403, 405, 405fp, 440, 440fp, 505, 601, 602, 603, 603e, 604, 604e, 620, 630, 740, 7400, 7450, 750, 801, 821, 823, 860, 970, common, ec603e, G3, G4, G5, power, power2, power3, power4, power5, powerpc, powerpc64, rios, rios1, rios2, rsc, and rs64a.

-mcpu=common selects a completely generic processor. Code generated under this option will run on any POWER or PowerPC processor. GCC will use only the instructions in the common subset of both architectures, and will not use the MQ register. GCC assumes a generic processor model for scheduling purposes.

-mcpu=power, -mcpu=power2, -mcpu=powerpc, and -mcpu=powerpc64 specify generic POWER, POWER2, pure 32-bit PowerPC (i.e., not MPC601), and 64-bit PowerPC architecture machine types, with an appropriate, generic processor model assumed for scheduling purposes.

The other options specify a specific processor. Code generated under those options will run best on that processor, and may not run at all on others.

The -mcpu options automatically enable or disable the following options: -maltivec, -mhard-float, -mmfcrf, -mmultiple, -mnew-mnemonics, -mpower, -mpower2, -mpowerpc64, -mpowerpc-gpopt, -mpowerpc-gfxopt, -mstring. The particular options set for any particular CPU will vary between compiler versions, depending on what setting seems to produce optimal code for that CPU; it doesn't necessarily reflect the actual hardware's capabilities. If you wish to set an individual option to a particular value, you may specify it after the -mcpu option, like -mcpu=970 -mno-altivec.

On AIX, the -maltivec and -mpowerpc64 options are not enabled or disabled by the -mcpu option at present, since AIX does not have full support for these options. You may still enable or disable them individually if you're sure it'll work in your environment.

-mtune=cpu_type

Set the instruction scheduling parameters for machine type cpu_type, but do not set the architecture type, register usage, or choice of mnemonics, as -mcpu=cpu_type would. The same values for cpu_type are used for -mtune as for -mcpu. If both are specified, the code generated will use the architecture, registers, and mnemonics set by -mcpu, but the scheduling parameters set by -mtune.

-maltivec, -mno-altivec

These switches enable or disable the use of built-in functions that allow access to the AltiVec instruction set. You may also need to set -mabi=altivec to adjust the current ABI with AltiVec ABI enhancements.

-mabi=spe

Extend the current ABI with SPE ABI extensions. This does not change the default ABI, instead it adds the SPE ABI extensions to the current ABI.

-mabi=no-spe

Disable Booke SPE ABI extensions for the current ABI.

-misel=yes/no, -misel

This switch enables or disables the generation of ISEL instructions.

-mspe=yes/no, -mspe

This switch enables or disables the generation of SPE simd instructions.

-mfloat-gprs=yes/no, -mfloat-gprs

This switch enables or disables the generation of floating point operations on the general purpose registers for architectures that support it. This option is currently only available on the MPC8540.

-mfull-toc, -mno-fp-in-toc, -mno-sum-in-toc, -mminimal-toc

Modify generation of the TOC (Table Of Contents), which is created for every executable file. The -mfull-toc option is selected by default. In that case, GCC will allocate at least one TOC entry for each unique non-automatic variable reference in your program. GCC will also place floating-point constants in the TOC. However, only 16,384 entries are available in the TOC.

If you receive a linker error message that saying you have overflowed the available TOC space, you can reduce the amount of TOC space used with the -mno-fp-in-toc and -mno-sum-in-toc options. -mno-fp-in-toc prevents GCC from putting floating-point constants in the TOC and -mno-sum-in-toc forces GCC to generate code to calculate the sum of an address and a constant at run-time instead of putting that sum into the TOC. You may specify one or both of these options. Each causes GCC to produce very slightly slower and larger code at the expense of conserving TOC space.

If you still run out of space in the TOC even when you specify both of these options, specify -mminimal-toc instead. This option causes GCC to make only one TOC entry for every file. When you specify this option, GCC will produce code that is slower and larger but which uses extremely little TOC space. You may wish to use this option only on files that contain less frequently executed code.

-maix64, -maix32

Enable 64-bit AIX ABI and calling convention: 64-bit pointers, 64-bit long type, and the infrastructure needed to support them. Specifying -maix64 implies -mpowerpc64 and -mpowerpc, while -maix32 disables the 64-bit ABI and implies -mno-powerpc64. GCC defaults to -maix32.

-mxl-call, -mno-xl-call

On AIX, pass floating-point arguments to prototyped functions beyond the register save area (RSA) on the stack in addition to argument FPRs. The AIX calling convention was extended but not initially documented to handle an obscure K&R C case of calling a function that takes the address of its arguments with fewer arguments than declared. AIX XL compilers access floating point arguments which do not fit in the RSA from the stack when a subroutine is compiled without optimization. Because always storing floating-point arguments on the stack is inefficient and rarely needed, this option is not enabled by default and only is necessary when calling subroutines compiled by AIX XL compilers without optimization.

-mpe

Support IBM RS/6000 SP Parallel Environment (PE). Link an application written to use message passing with special startup code to enable the application to run. The system must have PE installed in the standard location (/usr/lpp/ppe.poe/), or the specs file must be overridden with the -specs= option to specify the appropriate directory location. The Parallel Environment does not support threads, so the -mpe option and the -pthread option are incompatible.

-malign-natural, -malign-power

On Darwin and 64-bit PowerPC GNU/Linux, the option -malign-natural overrides the ABI-defined alignment of larger types, such as floating-point doubles, on their natural size-based boundary. The option -malign-power instructs GCC to follow the ABI-specified alignment rules. GCC defaults to the standard alignment defined in the ABI.

-msoft-float, -mhard-float

Generate code that does not use (uses) the floating-point register set. Software floating point emulation is provided if you use the -msoft-float option, and pass the option to GCC when linking.

-mmultiple, -mno-multiple

Generate code that uses (does not use) the load multiple word instructions and the store multiple word instructions. These instructions are generated by default on POWER systems, and not generated on PowerPC systems. Do not use -mmultiple on little endian PowerPC systems, since those instructions do not work when the processor is in little endian mode. The exceptions are PPC740 and PPC750 which permit the instructions usage in little endian mode.

-mstring, -mno-string

Generate code that uses (does not use) the load string instructions and the store string word instructions to save multiple registers and do small block moves. These instructions are generated by default on POWER systems, and not generated on PowerPC systems. Do not use -mstring on little endian PowerPC systems, since those instructions do not work when the processor is in little endian mode. The exceptions are PPC740 and PPC750 which permit the instructions usage in little endian mode.

-mupdate, -mno-update

Generate code that uses (does not use) the load or store instructions that update the base register to the address of the calculated memory location. These instructions are generated by default. If you use -mno-update, there is a small window between the time that the stack pointer is updated and the address of the previous frame is stored, which means code that walks the stack frame across interrupts or signals may get corrupted data.

-mfused-madd, -mno-fused-madd

Generate code that uses (does not use) the floating point multiply and accumulate instructions. These instructions are generated by default if hardware floating is used.

-mno-bit-align, -mbit-align

On System V.4 and embedded PowerPC systems do not (do) force structures and unions that contain bit-fields to be aligned to the base type of the bit-field.

For example, by default a structure containing nothing but 8 unsigned bit-fields of length 1 would be aligned to a 4 byte boundary and have a size of 4 bytes. By using -mno-bit-align, the structure would be aligned to a 1 byte boundary and be one byte in size.

-mno-strict-align, -mstrict-align

On System V.4 and embedded PowerPC systems do not (do) assume that unaligned memory references will be handled by the system.

-mrelocatable, -mno-relocatable

On embedded PowerPC systems generate code that allows (does not allow) the program to be relocated to a different address at runtime. If you use -mrelocatable on any module, all objects linked together must be compiled with -mrelocatable or -mrelocatable-lib.

-mrelocatable-lib, -mno-relocatable-lib

On embedded PowerPC systems generate code that allows (does not allow) the program to be relocated to a different address at runtime. Modules compiled with -mrelocatable-lib can be linked with either modules compiled without -mrelocatable and -mrelocatable-lib or with modules compiled with the -mrelocatable options.

-mno-toc, -mtoc

On System V.4 and embedded PowerPC systems do not (do) assume that register 2 contains a pointer to a global area pointing to the addresses used in the program.

-mlittle, -mlittle-endian

On System V.4 and embedded PowerPC systems compile code for the processor in little endian mode. The -mlittle-endian option is the same as -mlittle.

-mbig, -mbig-endian

On System V.4 and embedded PowerPC systems compile code for the processor in big endian mode. The -mbig-endian option is the same as -mbig.

-mdynamic-no-pic

On Darwin systems, compile code so that it is not relocatable, but that its external references are relocatable. The resulting code is suitable for applications, but not shared libraries.

-mprioritize-restricted-insns=priority

This option controls the priority that is assigned to dispatch-slot restricted instructions during the second scheduling pass. The argument priority takes the value 0/1/2 to assign no/highest/second-highest priority to dispatch slot restricted instructions.

-msched-costly-dep=dependence_type

This option controls which dependences are considered costly by the target during instruction scheduling. The argument dependence_type takes one of the following values: no: no dependence is costly, all: all dependences are costly, true_store_to_load: a true dependence from store to load is costly, store_to_load: any dependence from store to load is costly, number: any dependence which latency >= number is costly.

-minsert-sched-nops=scheme

This option controls which nop insertion scheme will be used during the second scheduling pass. The argument scheme takes one of the following values: no: Don't insert nops. pad: Pad with nops any dispatch group which has vacant issue slots, according to the scheduler's grouping. regroup_exact: Insert nops to force costly dependent insns into separate groups. Insert exactly as many nops as needed to force an insn to a new group, according to the estimated processor grouping. number: Insert nops to force costly dependent insns into separate groups. Insert number nops to force an insn to a new group.

-mcall-sysv

On System V.4 and embedded PowerPC systems compile code using calling conventions that adheres to the March 1995 draft of the System V Application Binary Interface, PowerPC processor supplement. This is the default unless you configured GCC using powerpc-*-eabiaix.

-mcall-sysv-eabi

Specify both -mcall-sysv and -meabi options.

-mcall-sysv-noeabi

Specify both -mcall-sysv and -mno-eabi options.

-mcall-linux

On System V.4 and embedded PowerPC systems compile code for the Linux-based GNU system.

-maix-struct-return

Return all structures in memory (as specified by the AIX ABI).

-msvr4-struct-return

Return structures smaller than 8 bytes in registers (as specified by the SVR4 ABI).

-mabi=altivec

Extend the current ABI with AltiVec ABI extensions. This does not change the default ABI, instead it adds the AltiVec ABI extensions to the current ABI.

-mabi=no-altivec

Disable AltiVec ABI extensions for the current ABI.

-mprototype, -mno-prototype

On System V.4 and embedded PowerPC systems assume that all calls to variable argument functions are properly prototyped. Otherwise, the compiler must insert an instruction before every non prototyped call to set or clear bit 6 of the condition code register (CR) to indicate whether floating point values were passed in the floating point registers in case the function takes a variable arguments. With -mprototype, only calls to prototyped variable argument functions will set or clear the bit.

-msim

On embedded PowerPC systems, assume that the startup module is called sim-crt0.o and that the standard C libraries are libsim.a and libc.a. This is the default for powerpc-*-eabisim. configurations.

-mmvme

On embedded PowerPC systems, assume that the startup module is called crt0.o and the standard C libraries are libmvme.a and libc.a.

-mads

On embedded PowerPC systems, assume that the startup module is called crt0.o and the standard C libraries are libads.a and libc.a.

-myellowknife

On embedded PowerPC systems, assume that the startup module is called crt0.o and the standard C libraries are libyk.a and libc.a.

-mvxworks

On System V.4 and embedded PowerPC systems, specify that you are compiling for a VxWorks system.

-mwindiss

Specify that you are compiling for the WindISS simulation environment.

-memb

On embedded PowerPC systems, set the PPC_EMB bit in the ELF flags header to indicate that eabi extended relocations are used.

-meabi, -mno-eabi

On System V.4 and embedded PowerPC systems do (do not) adhere to the Embedded Applications Binary Interface (eabi) which is a set of modifications to the System V.4 specifications. Selecting -meabi means that the stack is aligned to an 8 byte boundary, a function __eabi is called to from main to set up the eabi environment, and the -msdata option can use both r2 and r13 to point to two separate small data areas. Selecting -mno-eabi means that the stack is aligned to a 16 byte boundary, do not call an initialization function from main, and the -msdata option will only use r13 to point to a single small data area. The -meabi option is on by default if you configured GCC using one of the powerpc*-*-eabi* options.

-msdata=eabi

On System V.4 and embedded PowerPC systems, put small initialized const global and static data in the .sdata2 section, which is pointed to by register r2. Put small initialized non-const global and static data in the .sdata section, which is pointed to by register r13. Put small uninitialized global and static data in the .sbss section, which is adjacent to the .sdata section. The -msdata=eabi option is incompatible with the -mrelocatable option. The -msdata=eabi option also sets the -memb option.

-msdata=sysv

On System V.4 and embedded PowerPC systems, put small global and static data in the .sdata section, which is pointed to by register r13. Put small uninitialized global and static data in the .sbss section, which is adjacent to the .sdata section. The -msdata=sysv option is incompatible with the -mrelocatable option.

-msdata=default, -msdata

On System V.4 and embedded PowerPC systems, if -meabi is used, compile code the same as -msdata=eabi, otherwise compile code the same as -msdata=sysv.

-msdata-data

On System V.4 and embedded PowerPC systems, put small global and static data in the .sdata section. Put small uninitialized global and static data in the .sbss section. Do not use register r13 to address small data however. This is the default behavior unless other -msdata options are used.

-msdata=none, -mno-sdata

On embedded PowerPC systems, put all initialized global and static data in the .data section, and all uninitialized data in the .bss section.

-G num

On embedded PowerPC systems, put global and static items less than or equal to num bytes into the small data or bss sections instead of the normal data or bss section. By default, num is 8. The -G num switch is also passed to the linker. All modules should be compiled with the same -G num value.

-mregnames, -mno-regnames

On System V.4 and embedded PowerPC systems do (do not) emit register names in the assembly language output using symbolic forms.

-mlongcall, -mno-longcall

Default to making all function calls via pointers, so that functions which reside further than 64 megabytes (67,108,864 bytes) from the current location can be called. This setting can be overridden by the shortcall function attribute, or by #pragma longcall(0).

Some linkers are capable of detecting out-of-range calls and generating glue code on the fly. On these systems, long calls are unnecessary and generate slower code. As of this writing, the AIX linker can do this, as can the GNU linker for PowerPC/64. It is planned to add this feature to the GNU linker for 32-bit PowerPC systems as well.

On Mach-O (Darwin) systems, this option directs the compiler emit to the glue for every direct call, and the Darwin linker decides whether to use or discard it.

In the future, we may cause GCC to ignore all longcall specifications when the linker is known to generate glue.

-pthread

Adds support for multithreading with the pthreads library. This option sets flags for both the preprocessor and linker.

4.17.2. Darwin Options

These options are defined for all architectures running the Darwin operating system. They are useful for compatibility with other Mac OS compilers.

-all_load

Loads all members of static archive libraries. See man ld(1) for more information.

-arch_errors_fatal

Cause the errors having to do with files that have the wrong architecture to be fatal.

-bind_at_load

Causes the output file to be marked such that the dynamic linker will bind all undefined references when the file is loaded or launched.

-bundle

Produce a Mach-o bundle format file. See man ld(1) for more information.

-bundle_loader executable

This specifies the executable that will be loading the build output file being linked. See man ld(1) for more information.

-allowable_client client_name, -arch_only

-client_name, -compatibility_version, -current_version, -dependency-file, -dylib_file, -dylinker_install_name, -dynamic, -dynamiclib, -exported_symbols_list, -filelist, -flat_namespace, -force_cpusubtype_ALL, -force_flat_namespace, -headerpad_max_install_names, -image_base, -init, -install_name, -keep_private_externs, -multi_module, -multiply_defined, -multiply_defined_unused, -noall_load, -nofixprebinding, -nomultidefs, -noprebind, -noseglinkedit, -pagezero_size, -prebind, -prebind_all_twolevel_modules, -private_bundle, -read_only_relocs, -sectalign, -sectobjectsymbols, -whyload, -seg1addr, -sectcreate, -sectobjectsymbols, -sectorder, -seg_addr_table, -seg_addr_table_filename, -seglinkedit, -segprot, -segs_read_only_addr, -segs_read_write_addr, -single_module, -static, -sub_library, -sub_umbrella, -twolevel_namespace, -umbrella, -undefined, -unexported_symbols_list, -weak_reference_mismatches, -whatsloaded

These options are available for Darwin linker. Darwin linker man page describes them in detail.

4.17.3. Intel 386 and AMD x86-64 Options

These -m options are defined for the i386 and x86-64 family of computers:

-mtune=cpu-type

Tune to cpu-type everything applicable about the generated code, except for the ABI and the set of available instructions. The choices for cpu-type are:

i386

Original Intel's i386 CPU.

i486

Intel's i486 CPU. (No scheduling is implemented for this chip.)

i586, pentium

Intel Pentium CPU with no MMX support.

pentium-mmx

Intel PentiumMMX CPU based on Pentium core with MMX instruction set support.

i686, pentiumpro

Intel PentiumPro CPU.

pentium2

Intel Pentium2 CPU based on PentiumPro core with MMX instruction set support.

pentium3, pentium3m

Intel Pentium3 CPU based on PentiumPro core with MMX and SSE instruction set support.

pentium-m

Low power version of Intel Pentium3 CPU with MMX, SSE and SSE2 instruction set support. Used by Centrino notebooks.

pentium4, pentium4m

Intel Pentium4 CPU with MMX, SSE and SSE2 instruction set support.

prescott

Improved version of Intel Pentium4 CPU with MMX, SSE, SSE2 and SSE3 instruction set support.

nocona

Improved version of Intel Pentium4 CPU with 64-bit extensions, MMX, SSE, SSE2 and SSE3 instruction set support.

k6

AMD K6 CPU with MMX instruction set support.

k6-2, k6-3

Improved versions of AMD K6 CPU with MMX and 3dNOW! instruction set support.

athlon, athlon-tbird

AMD Athlon CPU with MMX, 3dNOW!, enhanced 3dNOW! and SSE prefetch instructions support.

athlon-4, athlon-xp, athlon-mp

Improved AMD Athlon CPU with MMX, 3dNOW!, enhanced 3dNOW! and full SSE instruction set support.

k8, opteron, athlon64, athlon-fx

AMD K8 core based CPUs with x86-64 instruction set support. (This supersets MMX, SSE, SSE2, 3dNOW!, enhanced 3dNOW! and 64-bit instruction set extensions.)

winchip-c6

IDT Winchip C6 CPU, dealt in same way as i486 with additional MMX instruction set support.

winchip2

IDT Winchip2 CPU, dealt in same way as i486 with additional MMX and 3dNOW! instruction set support.

c3

Via C3 CPU with MMX and 3dNOW! instruction set support. (No scheduling is implemented for this chip.)

c3-2

Via C3-2 CPU with MMX and SSE instruction set support. (No scheduling is implemented for this chip.)

While picking a specific cpu-type will schedule things appropriately for that particular chip, the compiler will not generate any code that does not run on the i386 without the -march=cpu-type option being used.

-march=cpu-type

Generate instructions for the machine type cpu-type. The choices for cpu-type are the same as for -mtune. Moreover, specifying -march=cpu-type implies -mtune=cpu-type.

-mcpu=cpu-type

A deprecated synonym for -mtune.

-m386, -m486, -mpentium, -mpentiumpro

These options are synonyms for -mtune=i386, -mtune=i486, -mtune=pentium, and -mtune=pentiumpro respectively. These synonyms are deprecated.

-mfpmath=unit

Generate floating point arithmetics for selected unit unit. The choices for unit are:

387

Use the standard 387 floating point coprocessor present majority of chips and emulated otherwise. Code compiled with this option will run almost everywhere. The temporary results are computed in 80bit precision instead of precision specified by the type resulting in slightly different results compared to most of other chips. See -ffloat-store for more detailed description.

This is the default choice for i386 compiler.

sse

Use scalar floating point instructions present in the SSE instruction set. This instruction set is supported by Pentium3 and newer chips, in the AMD line by Athlon-4, Athlon-xp and Athlon-mp chips. The earlier version of SSE instruction set supports only single precision arithmetics, thus the double and extended precision arithmetics is still done using 387. Later version, present only in Pentium4 and the future AMD x86-64 chips supports double precision arithmetics too.

For i387 you need to use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For x86-64 compiler, these extensions are enabled by default.

The resulting code should be considerably faster in the majority of cases and avoid the numerical instability problems of 387 code, but may break some existing code that expects temporaries to be 80bit.

This is the default choice for the x86-64 compiler.

sse,387

Attempt to utilize both instruction sets at once. This effectively double the amount of available registers and on chips with separate execution units for 387 and SSE the execution resources too. Use this option with care, as it is still experimental, because the GCC register allocator does not model separate functional units well resulting in instable performance.

-masm=dialect

Output asm instructions using selected dialect. Supported choices are intel or att (the default one).

-mieee-fp, -mno-ieee-fp

Control whether or not the compiler uses IEEE floating point comparisons. These handle correctly the case where the result of a comparison is unordered.

-msoft-float

Generate output containing library calls for floating point. Warning: the requisite libraries are not part of GCC. Normally the facilities of the machine's usual C compiler are used, but this can't be done directly in cross-compilation. You must make your own arrangements to provide suitable library functions for cross-compilation.

On machines where a function returns floating point results in the 80387 register stack, some floating point opcodes may be emitted even if -msoft-float is used.

-mno-fp-ret-in-387

Do not use the FPU registers for return values of functions.

The usual calling convention has functions return values of types float and double in an FPU register, even if there is no FPU. The idea is that the operating system should emulate an FPU.

The option -mno-fp-ret-in-387 causes such values to be returned in ordinary CPU registers instead.

-mno-fancy-math-387

Some 387 emulators do not support the sin, cos and sqrt instructions for the 387. Specify this option to avoid generating those instructions. This option is the default on FreeBSD, OpenBSD and NetBSD. This option is overridden when -march indicates that the target cpu will always have an FPU and so the instruction will not need emulation. As of revision 2.6.1, these instructions are not generated unless you also use the -funsafe-math-optimizations switch.

-malign-double, -mno-align-double

Control whether GCC aligns double, long double, and long long variables on a two word boundary or a one word boundary. Aligning double variables on a two word boundary will produce code that runs somewhat faster on a Pentium at the expense of more memory.

Warning: if you use the -malign-double switch, structures containing the above types will be aligned differently than the published application binary interface specifications for the 386 and will not be binary compatible with structures in code compiled without that switch.

-m96bit-long-double, -m128bit-long-double

These switches control the size of long double type. The i386 application binary interface specifies the size to be 96 bits, so -m96bit-long-double is the default in 32 bit mode.

Modern architectures (Pentium and newer) would prefer long double to be aligned to an 8 or 16 byte boundary. In arrays or structures conforming to the ABI, this would not be possible. So specifying a -m128bit-long-double will align long double to a 16 byte boundary by padding the long double with an additional 32 bit zero.

In the x86-64 compiler, -m128bit-long-double is the default choice as its ABI specifies that long double is to be aligned on 16 byte boundary.

Notice that neither of these options enable any extra precision over the x87 standard of 80 bits for a long double.

Warning: if you override the default value for your target ABI, the structures and arrays containing long double variables will change their size as well as function calling convention for function taking long double will be modified. Hence they will not be binary compatible with arrays or structures in code compiled without that switch.

-msvr3-shlib, -mno-svr3-shlib

Control whether GCC places uninitialized local variables into the bss or data segments. -msvr3-shlib places them into bss. These options are meaningful only on System V Release 3.

-mrtd

Use a different function-calling convention, in which functions that take a fixed number of arguments return with the ret num instruction, which pops their arguments while returning. This saves one instruction in the caller since there is no need to pop the arguments there.

You can specify that an individual function is called with this calling sequence with the function attribute stdcall. You can also override the -mrtd option by using the function attribute cdecl. Section 6.25 Declaring Attributes of Functions.

Warning: this calling convention is incompatible with the one normally used on Unix, so you cannot use it if you need to call libraries compiled with the Unix compiler.

Also, you must provide function prototypes for all functions that take variable numbers of arguments (including printf); otherwise incorrect code will be generated for calls to those functions.

In addition, seriously incorrect code will result if you call a function with too many arguments. (Normally, extra arguments are harmlessly ignored.)

-mregparm=num

Control how many registers are used to pass integer arguments. By default, no registers are used to pass arguments, and at most 3 registers can be used. You can control this behavior for a specific function by using the function attribute regparm. Section 6.25 Declaring Attributes of Functions.

Warning: if you use this switch, and num is nonzero, then you must build all modules with the same value, including any libraries. This includes the system libraries and startup modules.

-mpreferred-stack-boundary=num

Attempt to keep the stack boundary aligned to a 2 raised to num byte boundary. If -mpreferred-stack-boundary is not specified, the default is 4 (16 bytes or 128 bits), except when optimizing for code size (-Os), in which case the default is the minimum correct alignment (4 bytes for x86, and 8 bytes for x86-64).

On Pentium and PentiumPro, double and long double values should be aligned to an 8 byte boundary (see -malign-double) or suffer significant run time performance penalties. On Pentium III, the Streaming SIMD Extension (SSE) data type __m128 suffers similar penalties if it is not 16 byte aligned.

To ensure proper alignment of this values on the stack, the stack boundary must be as aligned as that required by any value stored on the stack. Further, every function must be generated such that it keeps the stack aligned. Thus calling a function compiled with a higher preferred stack boundary from a function compiled with a lower preferred stack boundary will most likely misalign the stack. It is recommended that libraries that use callbacks always use the default setting.

This extra alignment does consume extra stack space, and generally increases code size. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to -mpreferred-stack-boundary=2.

-mmmx, -mno-mmx, -msse, -mno-sse, -msse2, -mno-sse2, -msse3, -mno-sse3, -m3dnow, -mno-3dnow

These switches enable or disable the use of built-in functions that allow direct access to the MMX, SSE, SSE2, SSE3 and 3Dnow extensions of the instruction set.

Section 6.46.1 X86 Built-in Functions, for details of the functions enabled and disabled by these switches.

To have SSE/SSE2 instructions generated automatically from floating-point code, see -mfpmath=sse.

-mpush-args, -mno-push-args

Use PUSH operations to store outgoing parameters. This method is shorter and usually equally fast as method using SUB/MOV operations and is enabled by default. In some cases disabling it may improve performance because of improved scheduling and reduced dependencies.

-maccumulate-outgoing-args

If enabled, the maximum amount of space required for outgoing arguments will be computed in the function prologue. This is faster on most modern CPUs because of reduced dependencies, improved scheduling and reduced stack usage when preferred stack boundary is not equal to 2. The drawback is a notable increase in code size. This switch implies -mno-push-args.

-mthreads

Support thread-safe exception handling on Mingw32. Code that relies on thread-safe exception handling must compile and link all code with the -mthreads option. When compiling, -mthreads defines -D_MT; when linking, it links in a special thread helper library -lmingwthrd which cleans up per thread exception handling data.

-mno-align-stringops

Do not align destination of inlined string operations. This switch reduces code size and improves performance in case the destination is already aligned, but GCC doesn't know about it.

-minline-all-stringops

By default GCC inlines string operations only when destination is known to be aligned at least to 4 byte boundary. This enables more inlining, increase code size, but may improve performance of code that depends on fast memcpy, strlen and memset for short lengths.

-momit-leaf-frame-pointer

Don't keep the frame pointer in a register for leaf functions. This avoids the instructions to save, set up and restore frame pointers and makes an extra register available in leaf functions. The option -fomit-frame-pointer removes the frame pointer for all functions which might make debugging harder.

-mtls-direct-seg-refs, -mno-tls-direct-seg-refs

Controls whether TLS variables may be accessed with offsets from the TLS segment register (%gs for 32-bit, %fs for 64-bit), or whether the thread base pointer must be added. Whether or not this is legal depends on the operating system, and whether it maps the segment to cover the entire TLS area.

For systems that use GNU libc, the default is on.

These -m switches are supported in addition to the above on AMD x86-64 processors in 64-bit environments.

-m32, -m64

Generate code for a 32-bit or 64-bit environment. The 32-bit environment sets int, long and pointer to 32 bits and generates code that runs on any i386 system. The 64-bit environment sets int to 32 bits and long and pointer to 64 bits and generates code for AMD's x86-64 architecture.

-mno-red-zone

Do not use a so called red zone for x86-64 code. The red zone is mandated by the x86-64 ABI, it is a 128-byte area beyond the location of the stack pointer that will not be modified by signal or interrupt handlers and therefore can be used for temporary data without adjusting the stack pointer. The flag -mno-red-zone disables this red zone.

-mcmodel=small

Generate code for the small code model: the program and its symbols must be linked in the lower 2 GB of the address space. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model.

-mcmodel=kernel

Generate code for the kernel code model. The kernel runs in the negative 2 GB of the address space. This model has to be used for Linux kernel code.

-mcmodel=medium

Generate code for the medium model: The program is linked in the lower 2 GB of the address space but symbols can be located anywhere in the address space. Programs can be statically or dynamically linked, but building of shared libraries are not supported with the medium model.

-mcmodel=large

Generate code for the large model: This model makes no assumptions about addresses and sizes of sections. Currently GCC does not implement this model.

4.17.4. IA-64 Options

These are the -m options defined for the Intel IA-64 architecture.

-mbig-endian

Generate code for a big endian target. This is the default for HP-UX.

-mlittle-endian

Generate code for a little endian target. This is the default for AIX5 and GNU/Linux.

-mgnu-as, -mno-gnu-as

Generate (or don't) code for the GNU assembler. This is the default.

-mgnu-ld, -mno-gnu-ld

Generate (or don't) code for the GNU linker. This is the default.

-mno-pic

Generate code that does not use a global pointer register. The result is not position independent code, and violates the IA-64 ABI.

-mvolatile-asm-stop, -mno-volatile-asm-stop

Generate (or don't) a stop bit immediately before and after volatile asm statements.

-mb-step

Generate code that works around Itanium B step errata.

-mregister-names, -mno-register-names

Generate (or don't) in, loc, and out register names for the stacked registers. This may make assembler output more readable.

-mno-sdata, -msdata

Disable (or enable) optimizations that use the small data section. This may be useful for working around optimizer bugs.

-mconstant-gp

Generate code that uses a single constant global pointer value. This is useful when compiling kernel code.

-mauto-pic

Generate code that is self-relocatable. This implies -mconstant-gp. This is useful when compiling firmware code.

-minline-float-divide-min-latency

Generate code for inline divides of floating point values using the minimum latency algorithm.

-minline-float-divide-max-throughput

Generate code for inline divides of floating point values using the maximum throughput algorithm.

-minline-int-divide-min-latency

Generate code for inline divides of integer values using the minimum latency algorithm.

-minline-int-divide-max-throughput

Generate code for inline divides of integer values using the maximum throughput algorithm.

-mno-dwarf2-asm, -mdwarf2-asm

Don't (or do) generate assembler code for the DWARF2 line number debugging info. This may be useful when not using the GNU assembler.

-mfixed-range=register-range

Generate code treating the given register range as fixed registers. A fixed register is one that the register allocator can not use. This is useful when compiling kernel code. A register range is specified as two registers separated by a dash. Multiple register ranges can be specified separated by a comma.

-mearly-stop-bits, -mno-early-stop-bits

Allow stop bits to be placed earlier than immediately preceding the instruction that triggered the stop bit. This can improve instruction scheduling, but does not always do so.

4.17.5. S/390 and zSeries Options

These are the -m options defined for the S/390 and zSeries architecture.

-mhard-float, -msoft-float

Use (do not use) the hardware floating-point instructions and registers for floating-point operations. When -msoft-float is specified, functions in libgcc.a will be used to perform floating-point operations. When -mhard-float is specified, the compiler generates IEEE floating-point instructions. This is the default.

-mbackchain, -mno-backchain, -mkernel-backchain

In order to provide a backchain the address of the caller's frame is stored within the callee's stack frame. A backchain may be needed to allow debugging using tools that do not understand DWARF-2 call frame information. For -mno-backchain no backchain is maintained at all which is the default. If one of the other options is present the backchain pointer is placed either on top of the stack frame (-mkernel-backchain) or on the bottom (-mbackchain). Beside the different backchain location -mkernel-backchain also changes stack frame layout breaking the ABI. This option is intended to be used for code which internally needs a backchain but has to get by with a limited stack size e.g. the linux kernel. Internal unwinding code not using DWARF-2 info has to be able to locate the return address of a function. That will be eased be the fact that the return address of a function is placed two words below the backchain pointer.

-msmall-exec, -mno-small-exec

Generate (or do not generate) code using the bras instruction to do subroutine calls. This only works reliably if the total executable size does not exceed 64k. The default is to use the basr instruction instead, which does not have this limitation.

-m64, -m31

When -m31 is specified, generate code compliant to the GNU/Linux for S/390 ABI. When -m64 is specified, generate code compliant to the GNU/Linux for zSeries ABI. This allows GCC in particular to generate 64-bit instructions. For the s390 targets, the default is -m31, while the s390x targets default to -m64.

-mzarch, -mesa

When -mzarch is specified, generate code using the instructions available on z/Architecture. When -mesa is specified, generate code using the instructions available on ESA/390. Note that -mesa is not possible with -m64. When generating code compliant to the GNU/Linux for S/390 ABI, the default is -mesa. When generating code compliant to the GNU/Linux for zSeries ABI, the default is -mzarch.

-mmvcle, -mno-mvcle

Generate (or do not generate) code using the mvcle instruction to perform block moves. When -mno-mvcle is specified, use a mvc loop instead. This is the default.

-mdebug, -mno-debug

Print (or do not print) additional debug information when compiling. The default is to not print debug information.

-march=cpu-type

Generate code that will run on cpu-type, which is the name of a system representing a certain processor type. Possible values for cpu-type are g5, g6, z900, and z990. When generating code using the instructions available on z/Architecture, the default is -march=z900. Otherwise, the default is -march=g5.

-mtune=cpu-type

Tune to cpu-type everything applicable about the generated code, except for the ABI and the set of available instructions. The list of cpu-type values is the same as for -march. The default is the value used for -march.

-mfused-madd, -mno-fused-madd

Generate code that uses (does not use) the floating point multiply and accumulate instructions. These instructions are generated by default if hardware floating point is used.

-mwarn-framesize=framesize

Emit a warning if the current function exceeds the given frame size. Because this is a compile time check it doesn't need to be a real problem when the program runs. It is intended to identify functions which most probably cause a stack overflow. It is useful to be used in an environment with limited stack size e.g. the linux kernel.

-mwarn-dynamicstack

Emit a warning if the function calls alloca or uses dynamically sized arrays. This is generally a bad idea with a limited stack size.

-mstack-guard=stack-guard, -mstack-size=stack-size

These arguments always have to be used in conjunction. If they are present the s390 back end emits additional instructions in the function prologue which trigger a trap if the stack size is stack-guard bytes above the stack-size (remember that the stack on s390 grows downward). These options are intended to be used to help debugging stack overflow problems. The addtionally emitted code cause only little overhead and hence can also be used in production like systems without greater performance degradation. The given values have to be exact powers of 2 and stack-size has to be greater than stack-guard. In order to be effecient the extra code makes the assumption that the stack starts at an address aligned to the value given by stack-size. So don't expect this to work correctly with a 8k stack size and an initial stack pointer like 0xffffefff.

 
 
  Published under the terms of the GNU General Public License Design by Interspire