GCC Myths and Facts
by Joao Seabra, in Editorials - Sat, Feb 15th 2003 00:00 PDT
Since my good old Pentium 166 days, I've liked to search for the best
optimizations possible so programs can take the maximum advantage of
hardware/CPU cycles. If I have a nice piece of hardware, why not run
it at its full power, using every little feature? Shouldn't we all
try to get the best results from the money invested in our machines?
Copyright notice: All reader-contributed material on freshmeat.net
is the property and responsibility of its author; for reprint rights, please contact the author
directly.
This article is written for the average desktop Linux user and with
the x86 architecture and C/C++ in mind, but some of its content can be
applied to all architectures and languages.
GCC 3 Improvements
GCC 3 is the biggest step forward since GCC 2 and represents more than
ten years of work and two of hard development. It has major benefits
over its predecessor, including:
Target Improvements
- A new x86 backend, generating much-improved code.
- Support for a generic i386-elf target.
- A new option to emit x86 assembly code using an Intel-style
syntax.
- Better code generated for floating point-to-integer conversions,
leading to better performance by many 3D applications.
Language Improvements
- A new C++ ABI. On the IA-64 platform, GCC is capable of
interoperating with other IA-64 compilers.
- A significant reduction in the size of symbol and debugging
information (thanks to the new ABI).
- A new C++ support library and many C++ bugfixes, vastly
improving conformance to the ISO C++ standard.
- A new inliner for C++.
- A rewritten C preprocessor, integrated into the C, C++, and
Objective C compilers, with many improvements, including ISO C99
support and improvements to dependency generation.
General Optimizations
- Infrastructure for profile-driven optimizations.
- Support for data prefetching.
- Support for SSE, SSE2, 3DNOW!, and MMX instructions.
- A basic block reordering pass.
- New tail call and sibling call elimination optimizations.
Why do some programmers and users fail to take advantage of these
amazing new features? I admit that some of them are still
"experimental", but not all of them. Perhaps the PGCC (Pentium
compiler group) project gave rise to several misunderstandings which
persist today. (PGCC offered several Pentium-specific
optimizations. I looked at it when it first started, but benchmarks
showed that the improvement was only about 2%-5% over GCC 2.7.2.3.)
We should clear the air about the GCC misconceptions. Let's start
with the most loved and hated optimization: -Ox.
Myths
I use -O69 because it is faster than -O3.
This is wrong!
The highest optimization is -O3.
From the GCC 3.2.1 manual:
-O3 Optimize yet more. -O3 turns on all optimizations
specified by -O2 and also turns on the
-finline-functions and -frename-registers options.
The most skeptical can verify this in gcc/toplev.c:
  /* Scan to see what optimization level has been specified.
     That will determine the default value of many flags. */
  -snip-
  if (optimize >= 3)
    {
      flag_inline_functions = 1;
      flag_rename_registers = 1;
    }
If you are using GCC, there's no point in using anything higher than
3.
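To make the point concrete, here is a toy model of the check quoted above. It is only a sketch and not GCC source: the struct opt_flags and the function flags_for_level are hypothetical names invented for this illustration, and only the two -O3 flags from the excerpt are modeled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two flags set in the excerpt above. */
struct opt_flags {
    bool inline_functions;   /* models flag_inline_functions */
    bool rename_registers;   /* models flag_rename_registers */
};

/* Mirrors the "optimize >= 3" test: every level from 3 upward
   produces exactly the same flags. */
struct opt_flags flags_for_level(int optimize)
{
    struct opt_flags f = { false, false };

    if (optimize >= 3) {
        f.inline_functions = true;
        f.rename_registers = true;
    }
    return f;
}
```

In this model, flags_for_level(69) and flags_for_level(3) return identical values, which is all that -O69 can ever achieve.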
-O2 turns on loop unrolling.
In the GCC manpage, it's clearly written that:
-O2 turns on all optional optimizations except for loop unrolling [...]
Skeptics: check toplev.c.
So when you use -O2, which optimizations are you using?
The -O2 flag turns on the following flags:
- -O1, which turns on:
  - defer pop (see -fno-defer-pop)
  - -fthread-jumps
  - -fdelayed-branch (on, but specific machines may handle it differently)
  - -fomit-frame-pointer (only on if the machine can debug
    without a frame pointer; otherwise, you need to specify it)
  - guess-branch-prob (see -fno-guess-branch-prob)
  - cprop-registers (see -fno-cprop-registers)
- -foptimize-sibling-calls
- -fcse-follow-jumps
- -fcse-skip-blocks
- -fgcse
- -fexpensive-optimizations
- -fstrength-reduce
- -frerun-cse-after-loop
- -frerun-loop-opt
- -fcaller-saves
- -fforce-mem
- peephole2 (a machine-dependent option; see -fno-peephole2)
- -fschedule-insns (if supported by the target machine)
- -fregmove
- -fstrict-aliasing
- -fdelete-null-pointer-checks
- -freorder-blocks
There's no point in using -O2 -fstrength-reduce, etc., since -O2
already implies all of them.
Facts
The truth about -O*
This leaves us with -O3, which is the same as -O2 and:
- -finline-functions
- -frename-registers
-finline-functions is useful in some cases (mainly with C++) because it
lets you set the size limit for inlined functions (600 by default) with
-finline-limit. Unfortunately, if you set a high limit, compilation
will probably fail with an error complaining about lack of
memory. This option needs a huge amount of memory, takes more
time to compile, and makes the binary bigger. Sometimes, you can see a
gain, and sometimes, you can't.
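As a rough illustration of the kind of code this option targets, the sketch below has a small helper called once per loop iteration. The names square and sum_of_squares are made up for the example, and whether a given compiler version really inlines the call depends on its heuristics.

```c
#include <stddef.h>

/* A small helper: the kind of function the inliner may decide to
   expand at each call site instead of emitting a call. */
static int square(int x)
{
    return x * x;
}

/* Calls square() once per element. If the call is inlined, the
   loop body contains the multiplication directly. */
int sum_of_squares(const int *values, size_t count)
{
    int total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += square(values[i]);
    return total;
}
```

Comparing the assembly from gcc -O2 -S and gcc -O3 -S on a file like this is one way to see whether the call to square survives in your compiler.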
Rename-registers attempts to avoid false dependencies in scheduled code
by making use of registers left over after register allocation. This
optimization will most benefit processors with lots of registers. It
can, however, make debugging impossible, since variables will no
longer stay in a "home register". Since i386 is not a register-rich
architecture, I don't think this will have much impact.
A higher -O does not always mean improved performance. -O3 increases
the code size and may introduce cache penalties and become slower than
-O2. However, -O2 is almost always faster than -O.
-march and -mcpu
With GCC 3, you can specify the type of processor you're using with
-march or -mcpu. Although they seem the same, they're not, since one
specifies the architecture, and the other the CPU. The available options
are:
- pentium
- pentium-mmx
- pentiumpro
- pentium2
- pentium3
- pentium4
- athlon
- athlon-tbird
- athlon-4
- athlon-xp
- athlon-mp
-march implies -mcpu, so when you use -march, there's no need
to use -mcpu.
-mcpu generates code tuned for the specified CPU, but it does
not alter the ABI or the set of available instructions, so you can
still run the resulting binary on other CPUs.
When you use -march, you generate code for the specified machine type,
and its additional instructions will be used (it turns on flags like
mmx/3dnow, etc.), which means that
you probably cannot run the binary on other machine types.
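One way to check what a given -march setting actually enabled is to test the compiler's predefined macros. The sketch below assumes a GCC release that defines __MMX__, __SSE__, and __SSE2__ when the matching instruction sets are enabled; a compiler that does not define them simply takes the last branch. The function name enabled_simd is made up for the example.

```c
/* Reports the most capable instruction-set extension that the
   compiler flags enabled for this translation unit. The macro names
   are an assumption about the compiler in use. */
const char *enabled_simd(void)
{
#if defined(__SSE2__)
    return "sse2";
#elif defined(__SSE__)
    return "sse";
#elif defined(__MMX__)
    return "mmx";
#else
    return "none";
#endif
}
```

Building the same file once with -mcpu=pentium3 and once with -march=pentium3 and comparing what enabled_simd returns should show the difference between tuning for a CPU and generating code for it.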
Conclusion
Fine-tune your Makefile, remove those redundant options, and take a
look at the GCC manpage. I bet you will save yourself a lot of
time. There's probably a bug somewhere that can be smashed by turning
off some of GCC's default flags.
This article discusses only a few of GCC's features, but I won't
broaden its scope. I just want to try to clarify some of the myths
and misunderstandings. There's a lot left to say, but nothing that
can't be found in the Fine Manual, HOWTOs, or around the Internet. If
you have patience, a look at the GCC sources can be very rewarding.
When you're coding a program, you'll inevitably run into bugs.
Occasionally, you'll find one that's GCC's fault. When you do, stop
to think about the time and effort that's gone into the compiler
project and all that it's given you. You might think twice before
simply flaming GCC.
Interesting Links
Author's bio:
Joao Seabra would like to thank:
- Mom :)
- Nuno Sucena
- Nuno Subtil
- Tony Gonçalves
- Centro de Informática da Associação
Académica de Coimbra
Comments
[»]
Optimization - does O3 always generate faster code than O2?
by hzmonte - Sep 16th 2006 16:27:33
Is it possible that code generated using the O2 option runs faster than
that using O3, for example? Is it possible that an optimization
technique used by O3 is counter-productive for a particular algorithm?
And is there a more detailed explanation (preferably with examples) of
each optimization technique used by gcc than that in the GCC manual?
The article says: "In the GCC manpage, it's clearly written that: -O2
turns on all optional optimizations except for loop unrolling [...]"
(In the 4.1.1 manual, the exact wording is: "The compiler does not
perform loop unrolling or function inlining when you specify -O2."
which is even more confusing.) True, but -O2 turns on all flags that are
turned on by -O. And -O turns on -floop-optimize which
"optionally" does loop unrolling. Therefore, I guess the
conclusion (at least for gcc 3) is
1. -O2 does not mandate loop unrolling;
2. with -O or -O2, loop unrolling may or may not be turned on.
However, based on the 4.1.1 wording, there is simply no loop unrolling
under -O2, period. It somehow implies that if there is any loop unrolling
optionally turned on by -O, -O2 would disable it. That is strange.
And how does -floop-optimize2 work?
GCC 4.1.1 manual:
-fprofile-use
Enable profile feedback directed optimizations, and optimizations
generally profitable only with profile feedback available.
The following options are enabled: -fbranch-probabilities, -fvpt,
-funroll-loops, -fpeel-loops, -ftracer.
-funroll-loops
Unroll loops whose number of iterations can be determined at compile time
or upon entry to the loop. -funroll-loops implies -frerun-cse-after-loop.
It also turns on complete loop peeling (i.e. complete removal of loops
with small constant number of iterations). This option makes code larger,
and may or may not make it run faster.
Enabled with -fprofile-use.
-floop-optimize
Perform loop optimizations: move constant expressions out of loops,
simplify exit test conditions and optionally do strength-reduction and
loop unrolling as well.
Enabled at levels -O, -O2, -O3, -Os.
-floop-optimize2
Perform loop optimizations using the new loop optimizer. The optimizations
(loop unrolling, peeling and unswitching, loop invariant motion) are
enabled by separate flags.
That is, -O turns on -floop-optimize which optionally does loop unrolling.
On the other hand, -fprofile-use enables -funroll-loops. And none of the
-Ox flags turns on -fprofile-use. Also, none of the -Ox flags turns on
-floop-optimize2. And it appears that the manual says that once
-floop-optimize2 is turned on, loop unrolling is enabled by a separate
flag, presumably -funroll-loops, and implies that -floop-optimize2 would
"disable" -floop-optimize because -floop-optimize2 would force
the loop optimization techniques be individually turned on. It follows
that if I do this:
gcc -O -fprofile-use myprog.c or
gcc -O -floop-optimize2 myprog.c
No loop optimization is performed because any loop optimization that would
otherwise be turned on by -O is turned off by -floop-optimize2. If I want
to do loop unrolling using the so-called "new loop optimizer",
and also benefit from the other optimizations (except loop optimization) offered
by -O, then I need to do this:
gcc -O -floop-optimize2 -funroll-loops myprog.c
This would do:
1. optimization (except loop optimization) offered by -O
2. loop unrolling offered by the "new loop optimizer"
but would not do any other loop optimization.
Is my understanding correct?
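For readers unsure what the transformation under discussion does, here is a hand-written sketch of a loop and an equivalent version unrolled by four. It shows the transformation only; it says nothing about which flags trigger it in a given GCC release, and the function names are made up for the example.

```c
#include <stddef.h>

/* Straightforward loop: one addition and one loop test per element. */
long sum_simple(const int *values, size_t count)
{
    long total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* The same computation unrolled by hand: four additions per loop
   test, followed by a short loop for the 0 to 3 leftover elements. */
long sum_unrolled(const int *values, size_t count)
{
    long total = 0;
    size_t i = 0;

    for (; i + 4 <= count; i += 4)
        total += (long)values[i] + values[i + 1]
               + values[i + 2] + values[i + 3];
    for (; i < count; i++)
        total += values[i];
    return total;
}
```

The unrolled version is larger, which matches the manual's remark that unrolling "makes code larger, and may or may not make it run faster".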
[»]
Re: Optimization - does O3 always generate faster code than O2?
by hzmonte - Sep 16th 2006 17:02:58
> Therefore, I guess the conclusion (at least for gcc 3) is
> 1. -O2 does not mandate loop unrolling;
> 2. with -O or -O2, loop unrolling may or may not be turned on.
To clarify, what gcc 3 means may be:
-O2 does not perform loop unrolling unless it is already performed by -O.
Therefore, loop unrolling may or may not be performed under -O or -O2.
And it seems -O3 does not turn on loop unrolling either (unless it is
already performed by -O).
Is my understanding correct?
With the wording in the 4.1.1 Manual, I have no clue what it means. In
particular, it says "The compiler does not perform loop unrolling or
function inlining when you specify -O2." It does not say "-O2 does not
perform loop unrolling"; it says "the compiler does not perform loop
unrolling". So it seems -O2 will turn off any loop unrolling that is
enabled by -O!
[»]
gcc 3.2 is much improved, lagging only a little behind icc
by Ben Li - Apr 30th 2003 01:28:00
I just ran an informal test on a 1.8 GHz P4 Xeon with gcc 3.2 (the Red Hat
8.0 default) and icc 7.0,
compiling openssl 0.96b.
icc flags:
icc -O3 -ip -tpp7 -xW
gcc flags:
gcc -O3 --march=pentium4 -pipe -fomit-frame-pointer
In general, both are much faster than gcc 2.96, and icc is about 1% faster
than gcc 3.2, but in some cases gcc 3.2 is faster than icc.
I haven't tested the profile-feedback optimization features yet.
[»]
Think about when optimizing and what
by Riko - Mar 30th 2003 04:30:24
First of all most people don't even know what means optimizing. Otherwise
you can't explain why many (ahem) programmers use things like VBasic. Ok
this applies only for Win stuff, still they are lots of people.
And in general we should think what means optimizing. For example if you do
scientific calculations, you are writing some 3d game it can be critical...
but if you are just writing a chat program or a mail client, speed is not
so important (it should be just decent). Gui slows down, because users
tend to be slower than CPU... and if you got a 56k... ok you can have the
best code, but you can't overdo that limit...
So... let's think about it, maybe it is obvious...
from the other side I get Eclipse and it takes 45 seconds on my slackware
8.1 linux box (that means a quick OS) with 1700MHz CPU ... it is not
really acceptable...
Think about it....
[»]
Re: Think about when optimizing and what
by Michael Nenov - Apr 18th 2003 04:56:29
> from the other side I get Eclipse and
> it takes 45 seconds on my slackware 8.1
> linux box (that means a quick OS) with
> 1700MHz CPU ... it is not really
> accettable...
> Think about it....
And what about running OpenOffice?! I think it takes a lot of work to make
it start and run so slowly!!! Maybe some optimization is needed, but I do
not believe that the compiler can help in this case!
-- //^^__Morth__^^\\
[»]
Re: Think about when optimizing and what
by Tom - Aug 4th 2003 23:36:23
> And what about running OpenOffice ?! I
> think it is a lot of work to make it
> start and run so slow !!! May me some
> optimization is needed, but I do not
> believe that the compiler can help in
> this case!
>
I haven't tried this yet (I intend to), but with huge apps like
OpenOffice, you'd probably get more benefit with -Os, which optimizes for
size. Most of loading OpenOffice is loading its huge binary and its
custom GUI code.
[»]
Re: Think about when optimizing and what
by eye/midiclub - Nov 2nd 2003 06:50:47
> Most of
> loading OpenOffice is loading it's huge
> binary and it's custom GUI code.
You'd be surprised how tiny the GUI code probably is. Just look at
FLTK.
The key concept for OpenOffice is modularization, where it should simply
not load parts which are not immediately required.
However, you can help it by roping, which is traditional on SUNs. There
was some tool for Linux, which disassembles the files, does callgraphs,
and then assembles the code back again, but the resulting code loads and
even runs up to 30% faster! In the Windows world, only DigitalMars C++ and
VC++ 7.1 are able to do this, as far as I'm aware.
-eye
[»]
Re: Think about when optimizing and what
by that weasel - May 17th 2004 13:08:17
> First of all most people don't even know
> what means optimizing. Otherwise you
> can't explain why many (ahem)
> programmers use things like VBasic.
Second of all you don't even know what means engrish!
If you are going to take the time to bash something at least take the to
properly formulate your sentences. This way your opponents don't have
flame fodder sitting all over the place!
Ack!! What a horrible blanket statement. There are tons of successful
projects developed in Visual Basic. It a quite stable environement to
develop and debug from. There is very little you can't do with VB, and
the longer I use it the more I realize that you can do basically anything
with it, given the right skills. Since you can call just about any API
call directly from VB its is just as efficient as other languages. The
only overhead you might have is a few K of the VB runtime environment,
that it likely already installed and being actively used on your Window
machine. It certainly not the best tool for every job, but no language
is. Its also far better than alot of other languages/IDEs out there.
[»]
Re: Think about when optimizing and what
by that weasel - May 17th 2004 13:10:58
>
> % First of all most people don't even
> know
> % what means optimizing. Otherwise you
> % can't explain why many (ahem)
> % programmers use things like VBasic.
>
>
> Second of all you don't even know what
> means engrish!
> If you are going to take the time to
> bash something at least take the to
> properly formulate your sentences. This
> way your opponents don't have flame
> fodder sitting all over the place!
>
> Ack!! What a horrible blanket statement.
> There are tons of successful projects
> developed in Visual Basic. It a quite
> stable environement to develop and debug
> from. There is very little you can't do
> with VB, and the longer I use it the
> more I realize that you can do basically
> anything with it, given the right
> skills. Since you can call just about
> any API call directly from VB its is
> just as efficient as other languages.
> The only overhead you might have is a
> few K of the VB runtime environment,
> that it likely already installed and
> being actively used on your Window
> machine. It certainly not the best tool
> for every job, but no language is. Its
> also far better than alot of other
> languages/IDEs out there.
Ack I just stuck my foot in my mouth, by improperly 'formulating' a
sentence about properly formulating sentences....hahah
*feels like a fool*
close enough!
[»]
Re: Think about when optimizing and what
by oliverthered - Dec 4th 2004 23:41:24
> For example if you do
> scientic calculations, you are writing
> some 3d game it can be critical... but
> if you are just writing a chat program
> or a mail client, speed is not so
> important
What if the person running the scientific calculations
is also talking to someone else over the internet
using a chat client?
1: All applications should be optimized.
2: If you're running a scientific application (or povray) or
anything else that needs high throughput or a lot of
CPU time, profile the code and recompile using the
profile; this will help more than -O?
3: Optimization is a trade-off with features,
stability, and release time. Sometimes you want a
fast turnaround with a simple toolkit, even if it runs
twice as slow; that's kind of what you get with
Basic/VB. What you don't get is software that you'd
want to use in a server or critical environment, and
that's the trade-off a lot of people made but regretted.
Personally, I'd go with Delphi for fast-turnaround RAD
in the days of VB.
[»]
auto select gcc options
by pixelbeat - Mar 26th 2003 07:01:29
I've written a script that picks the optimal gcc options
for x86 hardware. Also, it only works on Linux, but
this combination covers a significant percentage
of gcc users, so here
you go
[»]
GCC myths
by jcduque - Mar 5th 2003 09:15:40
You may check out the busybox Makefile. busybox
uses
-Os -march=i386 -fomit-frame-pointer \
-Wall -mpreferred-stack-boundary=2 \
-malign-functions=0 -malign-jumps=0 \
-Wshadow
although the -W options do not really optimize your
code.
[»]
Re: GCC myths
by eye/midiclub - Nov 2nd 2003 06:44:29
> -Os -march=i386 -fomit-frame-pointer \
> -Wall -mpreferred-stack-boundary=2 \
I don't like these options. First, the best generic optimisation target for
modern architectures is the Pentium - and you can really assume everyone has
at least an old Pentium. Trying to optimise code (by hand or by genetic
mutation) for Pentium 2 and 3, as well as for Athlon, almost never shows
more than 2% speed increase, provided that Pentium code was really
optimal, and taking into account that GCC doesn't effectively use MMX and
SSE, except for special vector intrinsics, which are used very rarely in
the code.
Also, beginning with Pentium there are large penalties for alignment of 2.
At least alignments of 4 have to be used. For values larger than 4 bytes
(like doubles), Pentium 2 has even larger optimal alignments.
So, there is a set of options to fairly satisfy everything starting with
an old Pentium, but the one you quote is not there at all.
What I'd like to know: is there any way to specify one platform setting
for one piece of source, and one for the other? Otherwise I'd need to write a
custom tool. For really dense performance-oriented code, multiple versions
should be compiled, and selected at run-time.
-eye/midiclub
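Since each source file is compiled by its own gcc invocation, different files can be given different -march values, and the program can pick between the resulting versions through a function pointer. The sketch below shows only the selection step, under loud assumptions: everything lives in one file here, sum_tuned is a placeholder for code that would be built separately with a more specific -march, and cpu_supports_tuned_build is a hypothetical probe that a real program would implement by querying the CPU.

```c
#include <stddef.h>

typedef long (*sum_fn)(const int *values, size_t count);

/* Baseline version; in a real build this file might be compiled
   with -march=pentium. */
static long sum_generic(const int *values, size_t count)
{
    long total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Placeholder for a version that would live in a separate source
   file compiled with a more specific -march setting. */
static long sum_tuned(const int *values, size_t count)
{
    return sum_generic(values, count);
}

/* Hypothetical probe: always reports "not supported" in this sketch. */
static int cpu_supports_tuned_build(void)
{
    return 0;
}

/* Picks an implementation once, at run time. */
sum_fn select_sum(void)
{
    return cpu_supports_tuned_build() ? sum_tuned : sum_generic;
}
```

A caller would store the result of select_sum() once at startup and call through that pointer in the hot path.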
[»]
quick compilation (using tinycc)
by Basile Starynkevitch - Mar 4th 2003 00:39:30
Sometimes a very small compilation time matters (even at the expense of
a bit slower execution time). For that, please consider using TinyCC, an
opensource C99 compiler for Linux/x86. See tinycc.org for details.
TinyCC compiles C code 5-10 times faster than GCC, but
the resulting generated code runs about 30% slower.
Using TinyCC might be interesting in applications which generate C
code and dynamically load it. Alternatively, dynamic code generating
programs (or metaprograms) might consider using the GNU lightning
library (and my Qish
runtime & GC might help too).
-- Basile STARYNKEVITCH
[»]
Re: quick compilation (using tinycc)
by eye/midiclub - Nov 2nd 2003 06:53:35
> Using TinyCC might be interesting in
> applications which generates C code and
> dynamically loads it. Alternatively,
> dynamic code generating programs (or
> metaprograms) might consider using the
> GNU lightning library (and my Qish
> runtime & GC might help too).
As is Tick C, which generates code which optimises itself at run-time. It
makes sense when some parameters are fixed at execution time and hold for
a while. It has a crappy optimiser, but nonetheless can show speeds a
multiple of GCC's.
-eye
[»]
a sad observation (flame on?)
by The Evil Twin - Feb 18th 2003 01:17:06
Your mileage may vary, but...
My experience with GCC 3.2 is that, for fairly well-written (read:
optimized, not neat) integer/mem code compiled with some sensible
optimization options, the generated code is slightly slower and
larger compared to GCC 2.96 output.
I haven't had a chance to look at the output too carefully, but I've
noticed a number of examples (a filesystem driver, a memory allocator and
garbage collector, etc). Some code is, admittedly, smaller and faster, but
this does not justify the impact on other fronts.
I'm glad there's some new stuff that will surely benefit modern
architectures, 3D graphics and floating point conversions, but my
impression is that there is a slight decline in the quality of produced code,
and THIS comes at a price of slower compilation. Humm.
[»]
thin article, butt presuming I am convinced.
by somebody-else - Feb 18th 2003 23:13:43
How? How do I install a new gcc version? There appear to be many
dependencies that run circular and are not happy with a new(er)
version of gcc. The "how to" URL is dated 3 years back and is thin with
respect to the "install section".
(covert request for assistance...)
gcc in .rpm and/or src.rpm won't install and complains about
libc.so.c & libstdc++ & glibc & none of them are happy.
Sure, it is possible to upgrade the OS, but how about gcc?
Usually failed dependencies are easily resolved, but not so with gcc.
How about a gcc how-to-upgrade article addressing failed
dependencies... -clueless seeking clues...
The good news is finding out more about gcc as it fails to obey....
( smile )
[»]
Re: thin article, butt presuming I am convinced.
by gurensan - Feb 28th 2003 16:19:31
> How? how to install a new gcc version?
> there appear to be many dependendancies
> that run circular and are not happy with
> a new(er) version of gcc. the "how to"
> url is dated 3 years and is thin with
> respect to the "install section".
> (covert request for assistance...)
>
> gcc in .rpm &/or src.rpm wont install
> and complain about
> libc.so.c & libstdc++ & glibc & none of
> them are happy.
>
> sure, it is possible to upgrade the OS,
> but how about gcc?
>
> usually failed dependancies are easily
> resolved but not so with gcc.
>
> How about a gcc how to upgrade article
> addressing failed
> dependencies... -clueless seeking
> clues...
>
> The good news is finding out more about
> gcc as it fails to obey....
> ( smile )
>
You are referring to rpms, which are not the domain of
the GCC maintainers. See whoever made your distro
about this. The GCC people are only responsible for
making sure the thing compiles on the test systems,
and only then in a full release. If you try beta (or
alpha) code, you're asking for it.
[»]
Re: a sad observation (flame on?)
by ajs - Mar 3rd 2003 04:27:56
> Your mileage may vary, but...
>
> My experience with GCC 3.2 is that, for
> a fairly well-written (read: optimized,
> not neat) integer/mem code compiled with
> some sensible optimization options, the
> code generated that is slightly slower
> and larger compared to GCC 2.96 output.
>
> I haven't had a chance to look at the
> output too carefully, but I've noticed a
> number of examples (a filesystem driver,
> a memory allocator and garbage
> collector, etc). Some code is,
> admittably, smaller and faster, but this
> does not justify the impact on other
> fronts.
>
> I'm glad there's some new stuff that
> would sure benefit modern architectures,
> 3D graphics and floating point
> conversions, but my impression that
> there is a slight decline in the quality
> of produced code, and THIS comes at a
> price of slower compilation. Humm.
>
>
My experience and viewpoint are quite the opposite.
I have seen gcc 3.2.1 outperform gcc 2.95.3 on
video compression applications by 3-5%, on an
Athlon running Linux and also on Cygwin. Pretty
impressive. (gcc 2.96 is buggy btw, beware -- pull
down mplayer if you doubt this.)
(If anyone can give some performance numbers on
commercial compilers (esp. Intel's) vs gcc on Linux/x86
systems, please post!)
I'm quite willing to trade compile time for improved
application performance, and with the speed and cost of
modern machines, I can't imagine who wouldn't.
Looking forward to gcc3-built Linux distributions.
-- ajs
[»]
Re: a sad observation (flame on?)
by sibn - Mar 4th 2003 06:27:47
> I have seen gcc 3.2.1 outperform gcc
> 2.95.3 on
> video compression applications by 3-5%,
> on an
> Athlon running Linux and also on Cygwin.
> Pretty
> impressive. (gcc 2.96 is buggy btw,
> beware -- pull
> down mplayer if you doubt this.)
>
This is getting awfully tired. gcc 2.96 was buggy in its original
tarball, to be sure. It had somewhere in the region of 350 patches issued
that corrected this problem. Mplayer had bad code in it, which prevented
it from compiling with 2.96-300(ish), and this was never formally admitted
by the mplayer developers.
They silently fixed the problem, and their spin stuck: people still think
to this day that 2.96 was a bad compiler (and like I said, 2.96-0 WAS, but
we don't live in that era any more). This is the same type of
fudmongering often seen by diehard Windows fans who say they tried Linux
but it was too complex, and immature.
For them, it may have been.... 8 years ago when they tried it. They
continue to refer back to this experience though, as if it reflects any
measure of reality. They labor under the delusion that GNU never evolves,
and that the state of the system is the same as it was 8 years ago.
This is so obviously false it is becoming difficult to find linux users
who believe that it's a difficult operating system to install and use.
The only people who truly believe this are the ones who haven't touched it
in 5 years-- just like the people who tried gcc 2.96, or read the bad press
it got when it was new.
At the time gcc 3 was released, 2.96 was better. gcc 3.1 may well be
better than 2.96, but at the time that gcc 3.0 was released, gcc 2.96 was
more mature, better tested, had wider deployment, and was more reliable.
It was also the most standards-compliant gcc to date.
-- umm... i guess this is my signature. =)
[»]
Re: a sad observation (flame on?)
by Eric Hodel - May 17th 2003 21:51:17
>
> > (gcc 2.96 is buggy btw, beware -- pull
> > down mplayer if you doubt this.)
> >
>
> This is getting awfully tired. gcc 2.96
> was buggy in its original tarball, to be
> sure.
I don't see a GCC 2.96 tarball on gcc.gnu.org. It never was a release, so
of course it started out as buggy!
[»]
Nice Article
by Shlomi Fish - Feb 15th 2003 15:42:51
A very good article, that clarified a few
things to me. Keep up the good work! In one of my
projects I used to set up the optimization flag as
-O3 in the makefile. When compiling for testing
and debugging, I use -g without any optimization
flags, so gdb will be happy.
Note that I recently encountered a bug that was
present only when compiling with -O2. Apparently,
without it, a variable I declared was initialized to
NULL, which was the value I implicitly expected it
to have. With -O2 it was initialized to random
bytes and so was not NULL.
So it is useful to run some automated tests on the
final compiled executable.
[»]
Re: Nice Article
by piman - Feb 17th 2003 23:39:52
> Note that I recently encountered a bug
> that was
> present only when compiling with -O2.
> Apparently,
> without it, a variable I declared was
> initialized to
> NULL, which was the value I implicitly
> expected it
> to have. With -O2 it was initialized to
> random
> bytes and so was not NULL.
This isn't a bug. C variables aren't guaranteed to contain any particular
value (this may be different in C99? I'm not sure, but I doubt it). Not
having to make sure it's initialized to zero saves time; ergo, an
optimization. :)
GCC probably turns it on by default to deal with compiling code that
doesn't assign initial values, so it doesn't blatantly crash. IMO this is
a bad idea, since it encourages C programmers to think variables are NULL
by default.
[»]
Re: Nice Article
by Dan Maas - Feb 24th 2003 14:52:33
> This isn't a bug. C variables aren't
> guaranteed to contain any particular
> value (this may be different in C99? I'm
> not sure, but I doubt it). Not having to
> make sure it's initialized to zero saves
> time; ergo, an optimization. :)
Global and static variables ARE null (or zero) by default.
  int foo;
  void func()
  {
      static int bar;
      int baz;
  }
"foo" and "bar" are guaranteed to be initialized to zero. "baz" is not,
it'll be undefined.
("initialized to zero" just means the compiler/linker puts them in the
executable's "bss" segment, which is mapped with zeros before execution
begins. This is actually more efficient than explicitly initializing with
"int foo = 0;" since they won't take up space in the binary on disk, as
"int foo = 0" will!)
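The rule described in the comment above can be checked with a small sketch. The variable foo comes from that comment; the function names next_call_number and read_initialized_local are made up for this example. The local variable is initialized explicitly because reading an uninitialized automatic variable gives no dependable result.

```c
int foo;                      /* file scope: guaranteed to start at 0 */

/* Returns 1 on the first call, 2 on the second, and so on. */
int next_call_number(void)
{
    static int calls;         /* static storage: starts at 0 and
                                 keeps its value between calls */
    return ++calls;
}

/* An automatic variable has no guaranteed starting value, so it is
   set explicitly before it is read. */
int read_initialized_local(void)
{
    int baz = 0;
    return baz;
}
```

Code that relies on an automatic variable happening to be zero may work without optimization and fail with -O2, which is the kind of surprise reported earlier in this thread.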
[»]
pt
by omegax - Feb 15th 2003 12:47:30
Do you study in Coimbra?! :) Don't you have a version of this article in Portuguese?
-- :: omegax ::
[»]
Re: pt
by João Seabra - CI-AAC - Feb 15th 2003 13:08:55
> Do you study in Coimbra?! :)
Yes.
> Don't you have a
> version of this article in Portuguese?
No, I don't.
-- CI-Associação Académica de Coimbra
[»]
Re: pt
by omegax - Feb 18th 2003 01:38:54
>
> % Do you study in Coimbra?! :)
>
>
> Yes.
>
> % Don't you have a
> % version of this article in
> Portuguese?
>
>
> No, I don't.
>
If you ever think about writing a Portuguese version, let me know! Take care.
-- :: omegax ::
More comments
by 0x0d0a - Feb 15th 2003 12:21:54
First of all, excellent article, Joao. This is the most to-the-point,
useful, and opinion-free editorial I've seen on Freshmeat.
A few more suggestions as regards optimization -- I've done a bit of
benchmarking with different options. Most specific tweaks you can try
(above -O3) make very little difference on average code, with the
exception of a few options.
First, -fomit-frame-pointer can provide a small boost (admittedly, not as
much as I'd expect), at least on x86. The drawback is that you will not
be able to get backtraces from core dumps or dying apps. This might be
worth using if you have a program that's *almost* fast enough, but not
quite, like an emulator or movie player, and you're not doing development
on it (and don't care about sending in bug reports).
Second, -ffast-math *can* be very helpful, though most programs will not
see much of a benefit, since usually you don't see a ton of floating-point
operations in most software. This *can*, as per the gcc man page, break
correct software, but I've yet to run into a package that it causes
problems with.
Third, -fstrict-aliasing produces a speedup of around 10% in snes9x.
While strict ANSI code should not be broken by it, it's relatively easy
for someone to write code that *does* break with -fstrict-aliasing. I
haven't seen many problems with it.
Fourth, -DNDEBUG isn't technically a compiler flag, but defines the NDEBUG
macro, which makes the preprocessor compile assert() calls out entirely. Decent for production
builds. Most developers avoid having assert()s in inner loops *anyway*,
so this is unlikely to provide a huge speedup. Also, while code with side
effects should not be placed in assert() statements, this is easy to do --
and code of this nature will break with -DNDEBUG on. For most software,
very minor benefits.
Fifth, -DG_DISABLE_ASSERT is a similar flag to -DNDEBUG, but applies to
g_assert() in the glib package (used by gnome and gtk software). Again,
for most software, very minor or nonexistent benefits.
Sixth, there are a few new arch types in gcc 3.2. If you used to use
-march=i686 but have a pentium 2, you should now be using
-march=pentium2.
Seventh, while the real-world benefits appear to be minimal, I've written
some simple tests to see if the optimizer rips out branches that should
obviously be dead code. gcc does not do so without
-fexpensive-optimizations. OTOH, while I feel that
-fexpensive-optimizations generates more appealing machine language, I
haven't seen any huge performance benefits granted by it.
And just for the heck of it, (while this isn't really
optimization-related), always compile with -pipe and -Wall. -Wall *will*
help you find bugs, and -pipe will speed up compilation (in some packages,
by a lot).
Three thoughts
by imipak - Feb 15th 2003 09:36:38
First, the maximum optimization on modern GCCs is, indeed, -O3. This has
not always been the case. Higher optimizations have existed in (much)
earlier versions, usually undocumented.
I believe the highest optimization ever recognized was a massive -O6.
We're talking 10 years or so ago, here. At some point, -O6 and -O5
vanished, leaving the KotH to -O4, which itself shortly vanished.
These were never official optimizations, to the best of my knowledge, and
use of them (even on those GCCs that supported them) was usually
considered kamikaze coding.
The second point I want to raise is with architecture support. The number
of architectures supported in GCC is declining. I took a look at the pages
for GCC, and it's scary. I want to take this opportunity to borrow the
Cluebat from UserFriendly and swat anyone who has contributed to this
decline.
If GCC/Glibc2 are to become universally acceptable, they must FIRST be
universally usable. We are far from that state. Glibc2 has gone through
numerous revisions and clean-ups, and doesn't even run on a tenth of the
systems Glibc1 did.
Sorry, but that ain't progress, in my books. Nobody is going to adopt an
environment they can't use. That's the bottom line. It doesn't matter that
the latest GCC and Glibc are brilliant (although Glibc needs better pointer
handling). What matters is that if the only choice of development
environment for a platform is proprietary, then Joe and Jane Average
Programmer will believe that proprietary programming is what works.
People don't listen to what others say, they listen to what others do.
Last, but by no means least, more languages need to be moved into GCC.
Guile, Smalltalk, perhaps ELisp - these are all candidates. (Elisp? Yeah!
All you need is support for an Elisp bytecode target, and you've got an
Elisp bytecode compiler in GCC.)
It would be cool if GPLed compilers for Cobol and Algol could make their
way into GCC, too. Why??? Because Cobol is still heavily used, and Algol
makes for a great teaching language.
Re: Don't spread Gcc more thin
by Bryan Henderson - Feb 15th 2003 11:12:35
I would rather have a Gcc developer spend his time making it work better
for IA32 than making it work on some other architecture.
Similarly, I'd rather have Gcc get really good at a few good languages
than be so-so in a dozen of them.
People have a little bit of choice over what architecture and language
they use. If all of them have mediocre compilers, that's not much of a
choice. If one of them has a superlative compiler, it's something to
think about switching to.
I believe Gcc development resources are limited.
Re: Don't spread Gcc more thin
by Trizt - Feb 15th 2003 16:54:20
> I would rather have a Gcc developer
> spend his time making it work better for
> IA32 than making it work on some other
> architecture.
Here I can't agree with you at all. GCC is an important tool for many
operating systems, which run on a lot of different CPUs. I do think that
PPC and Sparc should be supported as much as x86 CPUs; of course old CPUs
such as the 8088 or mc68k won't get as much attention.
> Similarly, I'd rather have Gcc get
> really good at a few good languages than
> be so-so in a dozen of them.
I do agree here... the main manpower should be aimed at the languages
that are supported now (IMHO C and C++, don't care much for the rest).
-- //Trizt
Re: Three thoughts
by Shlomi Fish - Feb 15th 2003 16:01:18
> The second point I want to raise is with
> architecture support. The number of
> architectures supported in GCC is
> declining. I took a look at the pages
> for GCC, and it's scary. I want to take
> this opportunity to borrow the Cluebat
> from UserFriendly and swat anyone who
> has contributed to this decline.
>
> If GCC/Glibc2 are to become universally
> acceptable, they must FIRST be
> universally usable. We are far from that
> state. Glibc2 has gone through numerous
> revisions and clean-ups, and doesn't
> even run on a tenth of the systems
> Glibc1 did.
>
> Sorry, but that ain't progress, in my
> books. Nobody is going to adopt an
> environment they can't use. That's the
> bottom line. It doesn't matter that the
> latest GCC and Glibc are brilliant
> (although Glibc needs better pointer
> handling). What matters is that if the
> only choice of development environment
> for a platform is proprietary, then Joe
> and Jane Average Programmer will believe
> that proprietary programming is what
> works.
>
> People don't listen to what others say,
> they listen to what others do.
>
That's interesting information. I'll have to check it
out myself to verify this is the case. The question
is how much interest is there in porting Glibc
and/or gcc to these architectures. I know Cygnus
got a lot of income from doing just that.
> Last, but by no means least, more
> languages need to be moved into GCC.
> Guile, Smalltalk, perhaps ELisp - these
> are all candidates. (Elisp? Yeah! All
> you need is support for an Elisp
> bytecode target, and you've got an Elisp
> bytecode compiler in GCC.)
>
Guile is a Scheme _Interpreter_. Interpreters are
much simpler to write and maintain than
compiler front-ends, especially for symbolic
high-level languages such as Scheme, Perl,
Python, etc. There are Scheme compilers out
there, but I don't think the GNU people wish to
pursue this direction because guile is not intended
to run very quickly as it is. Not more than perl or
python, in any case.
Same goes for Elisp. Better keep the code clean
than start hacking on a useless GCC front-end. I
don't know too much about Smalltalk.
> It would be cool if GPLed compilers for
> Cobol and Algol could make their way
> into GCC, too. Why??? Because Cobol is
> still heavily used,
Right. There is Tiny COBOL or whatever, but I
don't know if it's as flexible as gcc. Do the various
COBOL implementations adhere to some kind of
common standard?
Of course, if you ask me, from what I've heard and
know of COBOL, it is so limited and brain-dead,
that it would be a good idea to re-implement all
this aging COBOL code that can be found around
in something more sensible. (from C to Java to Perl
and friends) I heard a statistic that claimed that
most of the code in Israel is in COBOL, and I was
quite surprised to hear that.
But of course, you still need COBOL compilers.
> and Algol makes for
> a great teaching language.
>
Algol is a very old language. Last I heard it was
superseded by Pascal as far as learning is
concerned. I believe one can find better languages
to teach programming today than Pascal as well.
Aren't there some Algol interpreters around?
Re: Three thoughts
by unknown_lamer - Feb 26th 2003 15:20:34
> Guile is a Scheme _Interpreter_.
> Interpreters are
> much more simple to write and maintain
> than
> compiler front-ends, especially for
> symbolic
> high-level languages such as Scheme,
> Perl,
> Python, etc. There are Scheme compilers
> out
> there, but I don't think the GNU people
> wish to
> pursue this direction because guile is
> not intended
> to run very quickly as it is. Not more
> than perl or
> python, in any case.
>
> Same goes for Elisp. Better keep the
> code clean
> than start hacking on a useless GCC
> front-end. I
> don't know too much about Smalltalk.
Actually, Guile is eventually (hopefully soon) going to compile to
bytecode and probably to machine code using a GCC frontend. Guile does
need to run fast because Emacs will eventually be ported to run on Guile
instead of using Elisp (There will be an Elisp translator so you will be
able to use either one).
Rename-registers
by jimfaulkner - Feb 15th 2003 07:12:00
I thought that register renaming benefits
register-starved architectures the most?
On a processor with 8 general purpose registers
(x86), functions are more likely to use the same
register for their variables, so register renaming would
be more beneficial.
On a processor with 32 general purpose registers
(sparc), the compiler is much less likely to run out of
registers for holding variables, so register renaming
does not do so much good.
At least that's what my compiler design professor told
me.
Re: Rename-registers
by GladeSoft - Feb 19th 2003 00:15:10
> I thought that register renaming
> benefits
> register-starved architectures the most?
That's what I thought when I read that. I'm almost sure your compiler
professor is correct. Relaxing strict register assignments is much more
likely to produce better x86 code... assuming it's properly implemented.
I haven't taken a look at GCC's optimizer since pre 2.95. Anybody care to
do a quick analysis?
What about -Os?
by QuoteMstr - Feb 15th 2003 06:45:50
Since most software isn't cpu-bound, and since memory and disk are also
limited resources, why not try -Os?
`-Os'
Optimize for size. `-Os' enables all `-O2' optimizations that do
not typically increase code size. It also performs further
optimizations designed to reduce code size.
Code compiled with this option would run just as fast (in wall-clock time,
since it isn't cpu-bound), but reduces memory consumption, leaving more
space for disk caches.
Re: What about -Os?
by Fredrik Mellström - Aug 15th 2003 10:44:43
> Since most software isn't cpu-bound, and
> since memory and disk are also limited
> resources, why not try -Os?
Well, perhaps because regardless of what the documentation says, -Os and
-O2 do exactly the same thing? ;-)
(At least on gcc 3.2.3 and gcc 2.95.3; I don't have any other versions
around I could try.)
Optimizations in general
by submissions - Feb 15th 2003 04:24:47
When I was writing ncc, I thought about the basic levels of optimization.
These seem to be:
0) No optimization. Compiler just produces correct code. -O(-1)
1) Decent optimization. Compiler is a little smarter. This is the default
gcc output -O1
2) Good optimization. Compiler does some basic jump prediction, inlining
and architecture hacks. At this level we can talk about a decent compiler.
This is not gcc -O2 yet!
3) Extreme optimizations: Here the compiler tries to be very smart. By
looking at the output assembly, one would not be able to understand the
structure of the program. There are two subcategories
a) Black magic stuff. Move blocks of code around, things disappear and
reappear elsewhere, etc. This is usually gcc -O2/O3
b) Extreme architecture hacks. This is the Intel C compiler's main
advantage.
- This is the upper limit -
- Here some visionaries dream of extreme features -
4) Infinite compiler intelligence which approximates "The
programmer". This is a utopia for all compiler developers.
So what we really need from our compiler is to get to level 2. From then
on we can optimize it manually. For the paranoid there is always
assembly.
I'd like to add that
1) Many interactive programs do not need much optimization. Just good
design.
2) In many of the programs that do need optimization, it's easy for the
programmer to abstract the heavy loops and spend some more time with human
optimizations on them.
For example in quake, Carmack has written a special copy_to_screen
function which is based on using the cache efficiently.
Anyway, this is a nice article because it sums up the huge gcc manual. Now
that's optimization!
Re: Optimizations in general
by Schneelocke - Feb 15th 2003 07:21:27
> 4) Infinite compiler intelligence which
> approximates "The programmer".
> This is a utopia for all compiler
> developers.
A utopia indeed. From what I recall, this - generating the best code
possible under all circumstances - is provably impossible. And of course,
even defining just what is the "best code" is hard enough in itself. :)
Re: utopia
by Bryan Henderson - Feb 15th 2003 10:56:59
If it's utopia, then it doesn't just approximate "the programmer," it
matches it.
And it does more than match it. Much of the optimization that compilers
do today exceeds the capability of a typical human programmer, and we'd
want to keep that.
Re: utopia
by submissions - Feb 15th 2003 13:44:45
>
> And it does more than match it. Much of
> the optimization that compilers do today
> exceeds the capability of a typical
> human programmer, and we'd want to keep
> that.
>
An example from Stroustrup:
int f (int n)
{
    if (n==1) return 1;
    return n * f(n-1);
}
A super intelligent compiler would replace f(3) with the value 6.
That's the utopia.
On the other hand I do agree that compilers today produce better
*assembly* than the typical human programmer.
Re: utopia
by Ed Avis - Feb 15th 2003 14:54:40
Have a look at C-Mix as a
candidate for your 'super intelligent compiler'. It should handle the
factorial example you give.
However it's still true that no compiler can optimize perfectly.
-- Ed Avis
Re: utopia
by Leon Brooks - Feb 15th 2003 18:32:52
> A super intelligent compiler would replace f(3) to the
> value 6.
>
> That's the utopia.
DEC ForTran did this in 1980. Friend was
benchmarking it and got unreasonably good results, made
it spit out the asm, which was a single instruction:
print-fixed-string-and-exit with the correct end result.
Re: utopia
by renoo - Feb 18th 2003 00:46:46
>
> An example from Stroustrup:
>
> int f (int n)
> {
> if (n==1) return 1;
> return n * f(n-1);
> }
>
> A super intelligent compiler would
> replace f(3) to the value 6.
>
> That's the utopia.
>
No, that's the future. In fact, computing a factorial at compile time can
already be done in C++ using templates.
template <int n>
struct facto
{
    static const int value = facto<n - 1>::value * n;
};
template <>
struct facto<1>
{
    static const int value = 1;
};
I think the compiler can replace f(3) by 6 by copy propagation
techniques.
Utopia is perhaps something like this:
int fibo(int n)
{
if (n
Re: utopia
by submissions - Feb 18th 2003 01:46:43
>
> I think the compiler can replace f(3) by
> 6 by copy propagation techniques.
>
> Utopia is perhaps something like this:
Yes. That was merely an example of "extreme features". One can think of
100 similar examples (not necessarily related to maths), which a compiler
"could" optimize "if" it had an extremely complex optimization
algorithm.
The fibo example proves that even propagation techniques are not always
possible (what if the entire program is about find_prime_no (10^10)? This
is a constant that could be computed at compile time, but unfortunately
it will take several years for the compiler to compute it)
Re: utopia
by TheMeld - Feb 25th 2003 17:02:35
> The fibo example proves that even
> propagation techiques are not always
> possible (what if the entire program is
> about find_prime_no (10^10)? This is a
> constant that could be computed at
> compile time, but unfortunatelly it will
> take several years for the compiler to
> compute it)
This is trivially dealt with. Any system to do recursive constant
propagation like this generally has (implicitly or explicitly) a limit to
the depth of evaluation. Explicit can be in the sense of rules like "only
do recursive function constant evaluation up to a depth N". Implicit can
be in the sense of the compiler using recursive functions to do the
expansion and stack overflowing itself when it gets too deep. The latter
of course would trigger a bug report by someone and probably get
translated to the former.
The one (unlikely?) possibility is if the compiler's evaluation technique
uses tail recursion or some other method that won't trigger a stack
overflow, in which case it will, as you said, run for several years.
Personally I would file that behavior as either a bug that can be fixed
with the explicit recursion limit, or under the general heading of GIGO
(Garbage In ...)
Re: utopia
by Rafael 'Dido' Sevilla - Feb 18th 2003 22:37:01
> int f (int n)
> {
> if (n==1) return 1;
> return n * f(n-1);
> }
>
> A super intelligent compiler would
> replace f(3) to the value 6.
Well, you could wind up getting a compiler that goes on a vain attempt at
solving the halting problem if you tried to do this. What would your
compiler do if it got f(-1), an expression that never halts? It wouldn't
be able to figure out that it's doing infinite recursion (if you found an
algorithm to determine this in the fullest generality, you would have
solved the halting problem, and proved Alan Turing and Kurt Goedel
wrong).
Finding the fastest code to perform a certain task is clearly an
undecidable problem (this is worse than NP-complete, as it can be
mathematically shown that an algorithm simply doesn't exist). The best I
think that can be accomplished is a few heuristics that conform to some
simple facts we know about how to speed code up.
Re: utopia
by Victor Bogado - Feb 25th 2003 11:22:28
>
> % int f (int n)
> % {
> % if (n==1) return 1;
> % return n * f(n-1);
> % }
> %
> % A super intelligent compiler would
> % replace f(3) to the value 6.
>
>
> Well, you could wind up getting a
> compiler that goes on a vain attempt at
> solving the halting problem if you tried
> to do this. What would your compiler do
> if it got f(-1), an expression that
> never halts? It wouldn't be able to
> figure out that it's doing infinite
> recursion (if you found an algorithm to
> determine this in the fullest
> generality, you would have solved the
> halting problem, and proved Alan Turing
> and Kurt Goedel wrong).
>
> Finding the fastest code to perform a
> certain task is clearly an undecidable
> problem (this is worse than NP-complete,
> as it can be mathematically shown that
> an algorithm simply doesn't exist). The
> best I think that can be accomplished is
> a few heuristics that conform to some
> simple facts we know about how to speed
> code up.
This algorithm always stops even if you ask for f(-1). Remember that
the C int is not an infinite set like the mathematical Z: when the counter gets to
the value MININT it turns into MAXINT and goes down from there till
it reaches 1. The simple code below shows this:
---
#include <stdio.h>
#include <values.h>   /* MININT, MAXINT (glibc) */
int main()
{
    /* Signed overflow is undefined in ISO C; this only shows the wraparound seen here. */
    printf ("%d %d %d %d\n", MININT -1, MININT, MAXINT, MAXINT+1);
    return 0;
}
---
On Red Hat Linux 8.0 on an Intel Pentium 4 CPU this shows:
2147483647 -2147483648 2147483647 -2147483648
Re: utopia
by Rafael 'Dido' Sevilla - Feb 25th 2003 19:28:15
> This algorithm always stops even if you
> ask f(-1), you must remember that the C
> int is not an infinit set as the math Z.
> when the counter gets to the value
> MININT it will turn into MAXINT and goes
> down from there tilll it gets in 1. The
> simple code below will show this :
Aye, but you still get wrong code. :P But I'm not just talking
about the factorial function, of course. Let's generalize the situation.
What if you had a function which for certain unspecified (and probably
unknown) inputs goes into an infinite loop through some complex
contortions? A compiler presented with code with an application of this
function to one of those unspecified constants that induces infinite loops
would also loop forever attempting to fold the constant. Remember that
there's no algorithm capable of finding infinite loops in their fullest
generality (the halting problem again).
Also, for a pure functional language (with no side effects, static
binding, and so forth), such a strategy of folding constants from function
applications might be feasible (but see above for caveats), but for an
imperative language which depends on side effects, the algorithm explodes
in complexity. If the value of a function happens to depend on side
effects outside of its scope, what do you do? Throw away all of the work
you made before figuring out that you have to run the program in its
entirety? Jeez, if you're going to all this trouble, don't bother
compiling your program. Write an interpreter. :)
Re: utopia
by Victor Bogado - Feb 26th 2003 03:25:30
>
> Aye, but you still get wrong code. :P
> But I'm not just talking about the
> factorial function, of course. Let's
> generalize the situation. What if you
> had a function which for certain
> unspecified (and probably unknown)
> inputs goes into an infinite loop
> through some complex contortions? A
> compiler presented with code with an
> application of this function to one of
> those unspecified constants that induces
> infinite loops would also loop forever
> attempting to fold the constant.
> Remember that there's no algorithm
> capable of finding infinite loops in
> their fullest generality (the halting
> problem again).
>
> Also, for a pure functional language
> (with no side effects, static binding,
> and so forth), such a strategy of
> folding constants from function
> applications might be feasible (but see
> above for caveats), but for an
> imperative language which depends on
> side effects, the algorithm explodes in
> complexity. If the value of a function
> happens to depend on side effects
> outside of its scope, what do you do?
> Throw away all of the work you made
> before figuring out that you have to run
> the program in its entirety? Jeez, if
> you're going to all this trouble, don't
> bother compiling your program. Write an
> interpreter. :)
You are completely right; I was just pointing out a fact that some people
don't see. Many problems in programs, including security-related ones, come
from unexpected side effects like this one. Or maybe I was simply being a
tight a**. :-)
My suggestion is simple: why not create a modifier for functions that would
hint to the compiler that the function depends only on its parameters and
nothing else? The compiler would then be able to mark calls to such
functions that have static parameters, and when the linker sees those
calls it would simply call the function and replace the call entirely with
the result.
This could open up a lot of security problems (think about it: a compiler
calling compiled code), and I'm not sure it should ever be
implemented.
Re: utopia
by David Cheatham - May 3rd 2003 08:49:08
> My sugestion is simple, why not create a
> modifier for functions that would hint
> the compiler that this function depends
> only on it's parameters and nothing
> else. The compiler would then be able to
> mark call's to this functions that have
> static parameters and then when the
> liker see those calls it would simply
> call the
> function and replace the call entirely
> with the result.
>
>
>
gcc already has this. Run info gcc and search for 'pure', which declares that a
function doesn't do anything except return a result based on global
variables and passed params. gcc also has an even stricter one called 'const',
which declares that it doesn't even look at global variables.
While I doubt gcc actually does this, it would be perfectly legal to just
evaluate a function marked 'const' during a *compile* and sub in the value
of it.
Re: utopia
by firecode - Mar 22nd 2003 14:17:38
> Well, you could wind up getting a
> compiler that goes on a vain attempt at
> solving the halting problem if you tried
> to do this. What would your compiler do
> if it got f(-1), an expression that
> never halts? It wouldn't be able to
> figure out that it's doing infinite
> recursion (if you found an algorithm to
> determine this in the fullest
> generality, you would have solved the
> halting problem, and proved Alan Turing
> and Kurt Goedel wrong).
>
> Finding the fastest code to perform a
> certain task is clearly an undecidable
> problem (this is worse than NP-complete,
> as it can be mathematically shown that
> an algorithm simply doesn't exist). The
> best I think that can be accomplished is
> a few heuristics that conform to some
> simple facts we know about how to speed
> code up.
You are correct but only in theory. However, in practice
NP-completeness and Turing's results have very little meaning (IMHO). One
can usually come up with algorithms where probability of failure can be
made to be small enough.
For example many pattern recognition (PR) problems are NP-complete and/or
mathematically ill-conditioned (I think), but one can usually make
probability of failure small enough.
If one has more a priori problem-specific information than is available
in many PR problems, then it's possible to have guaranteed
bounds for the error.
For example in one practical case: c*10^-n probability for failure and
computational requirements: O(n). Now take n = 1000. (c
Re: Optimizations in general
by Jancs - Feb 21st 2003 01:27:29
So, in general, if I am a regular user who owns a dual Celeron box, compiles
programs for his own use and rarely bothers with bug-hunting, would these
CFLAGS be OK (gcc 3.2.1):
-O2 -march=pentium2 -mcpu=pentium2 -fomit-frame-pointer ?
I don't know about -pipe, but I have often noticed that -Wall is used by
default.
Does it make sense to use such optimizations if most of the system is
built with -O2 -march=i386 -mcpu=i686?
Re: Optimizations in general
by submissions - Feb 21st 2003 05:14:43
> So, in common, if i am regular user who
> posess dual celeron box, compiles
> programs for it's own use and rarely
> bothers about bug-hunt, would be such
> cflags ok (gcc 3.2.1):
> -O2 -march=pentium2 -mcpu=pentium2
> -fomit-frame-pointer ?
> I do nto know about -pipe, but i often
> noticed that -Wall is used by default.
>
> Does it have sense to use such
> optimizations if the most part of system
> is built with -O2 -march=i386
> -mcpu=i686?
>
If I was distributing a program that could benefit from optimization, I'd
make ./configure set the best values for each system.
I never change ./configure defaults.
IMHO these options should concern the developers, provided autoconf can
determine/set architecture flags correctly.
Re: Optimizations in general
by Jancs - Feb 21st 2003 06:42:34
> % Does it have sense to use such
> % optimizations if the most part of
> system
> % is built with -O2 -march=i386
> % -mcpu=i686?
>
> If I was distributing a program that
> could benefit from optimization, I'd
> make ./configure set the best values for
> each system.
>
> I never change ./configure defaults.
To compile the programs, I use Slackware's build framework, and it
contains the mentioned CFLAGS set to i386/i686. As I watched the compilation
output on screen, configure takes them as defaults (maybe I am
wrong?)
Re: Optimizations in general
by submissions - Feb 21st 2003 14:43:13
>
> % % Does it have sense to use such
> % % optimizations if the most part of
> % system
> % % is built with -O2 -march=i386
> % % -mcpu=i686?
> %
> % If I was distributing a program that
> % could benefit from optimization, I'd
> % make ./configure set the best values
> for
> % each system.
> %
> % (AS A USER) I never change ./configure defaults.
>
>
> To compile the programs, i use the
> building frame of slackware, and it
> contains mentioned cflags set to
> i386/i686. As i watched the compilation
> process dump on screen, configure takes
> them as defaults (may be i am wrong?)
I don't know. Probably yes, it overrides the defaults to better values.
It's apparent from this discussion that maybe gcc should provide a
-mselect-best-arch-for-current-system option, where it would check the CPU
of the machine and set the best optimization flags for it.
Then everybody would be happy.
Frame Pointer
by Nils O. Selåsdal - Feb 15th 2003 04:12:57
Let's not forget -fomit-frame-pointer. This
frees a register in the CPU, which in itself is good, and therefore also
lets the compiler do more optimizations. Maybe not the biggest point on
Alphas with lots of internal registers, but nice for x86. Note, it also
makes debugging impossible on x86, but normal users won't need that.
Re: Frame Pointer
by Michael Sweet - Feb 15th 2003 06:29:40
IIRC, -fomit-frame-pointer prevents libsafe and other tools from working,
since they need the frame pointer to compute the upper bounds of the stack
in the current function.
Re: Frame Pointer
by rainy - Mar 7th 2003 22:10:15
But do note that code size increases quite a bit (because stack references
via ESP are one byte larger) when omitting the frame pointer. If you are
compiling something small and computationally-intensive like gzip, this
may help. But if you are compiling something big and not that
computationally-intensive, such as the kernel or mozilla, it is often
better to preserve the frame pointer and use -Os to reduce code size even
more. You may even gain speed because the code fits more nicely in the
cache.
some points still missing
by Freek - Feb 15th 2003 01:47:23
"-mcpu generates code tuned for the specified CPU[...] so you can
still run the resulting binary on other CPUs (it turns on flags like
mmx/3dnow, etc.)"
The phrase in parentheses is NOT true (afaik): gcc will >schedule<
instructions according to the specified -mcpu, but will not use
instructions that are unavailable on generic i386/Pentium processors
(I'm not sure whether the 386 is still the "reference"); as such, MMX or
3DNow! instructions are not used.
As an example, I can compile with -mcpu=i686 and still won't see cmov in
the assembly output; code built with -march=i686 may run on my K6, but
only if the compiler happened to see no need for cmov, for example.
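One way to see the difference is through the compiler's predefined macros. The sketch below is an illustration only: the function and enum names are made up, and the macro names (__i686__, __tune_i686__, __MMX__, __SSE__) are what GCC is understood to define on x86; the exact set varies by version and can be checked with "gcc -dM -E - </dev/null" plus the flags in question.

```c
/* Bits reported by build_target_flags(). */
enum {
    ARCH_I686  = 1, /* i686 instructions (such as cmov) may be emitted */
    TUNED_I686 = 2, /* scheduling is tuned for i686 only               */
    ARCH_MMX   = 4, /* MMX instructions may be emitted                 */
    ARCH_SSE   = 8  /* SSE instructions may be emitted                 */
};

/* Hypothetical helper: report what this translation unit was built for. */
unsigned build_target_flags(void)
{
    unsigned flags = 0;
#ifdef __i686__
    flags |= ARCH_I686;
#endif
#ifdef __tune_i686__
    flags |= TUNED_I686;
#endif
#ifdef __MMX__
    flags |= ARCH_MMX;
#endif
#ifdef __SSE__
    flags |= ARCH_SSE;
#endif
    return flags;
}
```

Built with tuning for i686 only, one would expect just TUNED_I686 to be set; built with -march=i686, ARCH_I686 should appear as well, meaning the binary is no longer guaranteed to run on older CPUs.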
The article does a great job of informing about common misconceptions.
But I miss a rough (or more detailed) explanation of what the specific
optimizations do.
E.g. -fomit-frame-pointer will give you another GP register to use (%ebp)
at the cost of debugging no longer being available (at least on x86).
This register normally points to the stack frame of the current function,
but maintaining it costs ~2 more instructions per function call, so
omitting it is a good option to specify.
And there are lots of other options: -fstrict-aliasing,
-fstrength-reduce, ...
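As an example of what one of these does, strength reduction replaces an expensive operation inside a loop with a cheaper one. The sketch below shows the transformation by hand; the function names are made up for this example, and the second function is only an approximation of what the compiler might produce from the first.

```c
#include <stddef.h>

/* Before: the index is recomputed with a multiplication on every
   iteration. */
long sum_strided_mul(const long *a, size_t n, size_t stride)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i * stride];
    return sum;
}

/* After: the multiplication is replaced by a running addition. */
long sum_strided_add(const long *a, size_t n, size_t stride)
{
    long sum = 0;
    size_t idx = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[idx];
        idx += stride;
    }
    return sum;
}
```

Both functions return the same result; the second form simply avoids one multiply per iteration.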
And please - developers - don't turn on -g (debugging) by default...
Thank you.
Re: some points still missing
by Gerhard Häring - Feb 15th 2003 05:44:16
> And please - developers - don't turn on
> -g (debugging) by default...
Why not? If executable size is a problem, you can 'strip' them.
Re: some points still missing
by Lostguy - Feb 15th 2003 07:20:02
> Why not? If executable size is a
> problem, you can 'strip' them.
What about leaving the default build without '-g' and creating a debug
rule for people who will do debugging? Most users don't even know what
strip is. They only need a working and fast app.
Re: -g by default
by Bryan Henderson - Feb 15th 2003 11:04:50
Developers turn on -g by default as a means of setting policy for their
users -- the ones not sophisticated or interested enough to do strips or
use non-default configuration options. They want copies of their programs
in the field to have the debugging symbols in them so they can solve
problems with them easily. They determine that this debuggability is more
important than the resource savings of omitting -g.
Re: some points still missing
by ButterBrain - Feb 15th 2003 11:06:30
> They only need a working and fast app.
. . . except when it breaks and they need to send in debugging
information. This is why gcc should support saving the symbol table for a
stripped binary. It would still need some kind of loader hack, though.
Re: some points still missing
by FTC - Feb 15th 2003 12:12:24
>
> % Why not? If executable size is a
> % problem, you can 'strip' them.
>
> What about leave the default build
> without '-g' and create a debug rule for
> people who will do debugging ? Most
> users don't even know what's strip.
> They only need a working and fast
> app.
>
100% true. Even as a developer, I've only used -g while debugging an app.
Never for production releases.
-- Bye!!!
FTC.
Re: some points still missing
by Kevin - Feb 15th 2003 12:20:32
>
> 100% true. Even as a developer, I've
> only used -g while debuging an app.
> Never for production releases.
>
How do you debug core files? It is very time consuming without
symbols.
-Kevin
Re: some points still missing
by Nils O. Selåsdal - Feb 16th 2003 03:09:57
>
> %
> % 100% true. Even as a developer, I've
> % only used -g while debuging an app.
> % Never for production releases.
> %
>
>
> How do you debug core files? It is very
> time consuming without symbols.
>
> -Kevin
Do you think my father, my boss, and most other normal users will ever
need or want to do that? They would rather see some speed. (Yes, I know
-g doesn't have any speed penalty, but -fomit-frame-pointer should also
be the "default".)
Re: some points still missing
by Greg A. Woods - Feb 15th 2003 14:39:32
>
> % Why not? If executable size is a
> % problem, you can 'strip' them.
>
> What about leave the default build
> without '-g' and create a debug rule for
> people who will do debugging ? Most
> users don't even know what's strip.
> They only need a working and fast
> app.
>
Indeed! Why not leave '-g' enabled by default, especially for simple
languages like C?
Users can indeed always strip a binary if it seems too big for them.
However users can never unstrip a binary.
The key word here is "working", and many applications need a lot of help
with that part. Using '-g' allows the user to give (with appropriate
recipes supplied by the developer) much better feedback when something
fails catastrophically, as things all too often do. Developers can't
always get direct access to the core dump, and sometimes providing an
exactly matching binary with debugging symbols is not possible either.
As for "fast", well, in most modern applications that comes about through
good design, not compiler optimisation. (And -g does not necessarily
slow anything down on the average modern system.)
-- Greg A. Woods
Re: some points still missing
by Kevin - Feb 15th 2003 12:19:13
> And please - developers - don't turn on
> -g (debugging) by default...
> Thank you.
Why? The debugging symbols are only used by a debugger. They take up no
memory when you run the binary. Symbols only consume some disk space. As
a developer it is so much easier for me to have the symbols.
-Kevin
Re: some points still missing
by Ed Avis - Feb 15th 2003 14:58:32
I kinda like the system Windows uses where the debugging symbols are in a
separate file alongside the executable. So compiling a program generates
both foo.exe and foo.sym (IIRC). You can choose to install the symbols
alongside the executable, or not.
-- Ed Avis
Re: some points still missing
by reduz - Feb 15th 2003 23:26:39
> I kinda like the system Windows uses
> where the debugging symbols are in a
> separate file alongside the executable.
> So compiling a program generates both
> foo.exe and foo.sym (IIRC). You can
> choose to install the symbols alongside
> the executable, or not.
Not only that, it makes linking faster, and that's
without mentioning Microsoft's awesome incremental
compiler, WHICH BINUTILS LACKS HORRIBLY, THUS
MAKING MY PROJECTS TAKE MINUTES TO LINK
AND USE SEVERAL DOZEN MEGABYTES!!
But as someone said about the world of open source:
"Welcome to the world of half-implemented features."
Re: some points still missing
by claudio martella - Feb 16th 2003 06:42:56
>
> Not only that, it makes linking faster,
> and that
> without mentioning microsoft's awesome
> incremental
> compiler WHICH BINUTILS LACKS HORRIBLY,
> THUS
> MAKING MY PROJECTS TAKE MINUTES
> LINKING..
> AND USING SEVERAL DOZENS OF MEGABYTES!!
>
>
> but as someone said about the world of
> open source
> "Welcome to the world of half
> implemented features"
Why don't you use your beautiful closed windows programming environment in
order to implement the other half of the unimplemented features instead of
complaining about the job of volunteers?
Re: some points still missing
by MinnaKirai - Feb 19th 2003 10:37:21
Incremental compiling is protected under US patent 5,586,328. Open-Source
implementations will be illegal until 2017.
Re: some points still missing
by olsner - Mar 16th 2003 09:01:20
> Incremental compiling is protected under
> US patent 5,586,328. Open-Source
> implementations will be illegal until
> 2017.
"Open-Source implementations available for distribution in the
US ...", you mean?
Re: some points still missing
by Paul Wise - Apr 10th 2003 01:22:53
> "Open-Source implementations available
> for distribution in the US ...", you
> mean?
No, he means "non-licensed implementations existing in the USA".
-- bye, pabs
Re: some points still missing
by Tom - Oct 2nd 2004 04:31:15
> Incremental compiling is protected under
> US patent 5,586,328. Open-Source
> implementations will be illegal until
> 2017.
Only if the patent holds up in court, which seems unlikely, given that the
technique is many years older than that.
Re: some points still missing
by Bernd - Mar 22nd 2003 18:03:19
You can do this by:
- Compiling with -g
- Copying the binary: cp -pv progname progname.debug
- Stripping the debug information from progname: strip progname
- Installing progname and progname.debug (or just keeping the latter).
Then you have the stripped binary and the debug binary,
and if you need to debug or analyze a core file, just use
the debug binary, for example with gdb:
gdb progname.debug core
Re: some points still missing
by David Cheatham - May 3rd 2003 08:55:11
> You can do this by:
> - Compile with -g
> - copying the binary: cp -pv progname
> progname.debug
>
> - striping the debug information from
> progname
> - install progname and progname.debug(or
> keep it).
>
> Then you have the stripped binary and
> the debug binary
> and if you need to debug or analyze a
> core file, just use
> the debug binary for example with gdb to
> analyze the core file:
>
> gdb progname.debug core
Or you can just ship stripped *known* binaries, and then ask users (who
don't compile themselves) to email the core to you, where you have the
unstripped binary.
If they compile themselves, of course, it should probably default to debug
builds.