main and command line arguments
The C language is closely related to UNIX. Although the first UNIX on the PDP-7 machine was written in assembly, when porting it to the PDP-11 the authors decided to rewrite UNIX in a higher-level language. The language that was available to the UNIX authors (B) was deemed unsuitable, so Dennis Ritchie created a new one, calling it C (as C follows B). The book The C Programming Language (1978) by Brian Kernighan and Dennis Ritchie, often called just K&R, is highly regarded and even today is considered a model of technical writing.
C was standardized in 1989 by the American National Standards Institute; this version is known as C89 or simply ANSI C. The latest version is C18, which is just a clarification of the much more extensive revision C11 (from 2011).
K&R has a second edition that was updated for ANSI C, but unfortunately not for the newer versions.
C++ started its life as “C with classes” – adding object-oriented features to the C language. It even used the C compiler: code was transpiled to C and only then compiled to machine code. The name can be read as “next after C”. C++ was first standardized in 1998; the latest version is C++17, with numerous changes and additions with respect to the previous versions C++11 and C++14.
Nowadays C++ is a huge language with many features not present in C (in particular namespaces, templates, references and function overloading). While ANSI C is quite compact (the full specification of the language and the standard library is 550 pages, K&R is less than 300 pages), the standard for C++ is over 1400 pages (and often references the C standard). Mastering the subtleties of C++ takes years. Writing a standard-compliant C++ compiler is a huge undertaking.
Both languages are closely connected and the most often used compilers for C and C++ are created together. Although technically a C++ compiler is not a C compiler, it comes close.
Both languages are considered mature for demanding jobs: they have excellent compilers and the performance of the generated code is usually comparable to human-tuned assembly. Compared to other languages there are also drawbacks: both languages are considered verbose (C more than C++) and quite low-level. For example, both languages force the programmer to manage memory, whereas newer languages (e.g., Java) do it themselves. Programmers' errors in memory management often lead to security vulnerabilities.
Even though they are higher level than assembly, code written in C/C++ is often not portable between operating systems. Even if a program is portable, it needs to be recompiled for each platform. Other languages do not have these problems: a Java program can often be run on Windows and Linux without any changes.
The most often used compilers are:
g++ (GNU Compiler Collection, free software)
clang++ (from the LLVM project, free software)
icpc (Intel C++ Compiler, proprietary)
cl (in MS Visual C++, proprietary)
GCC is a collection of compilers from the GNU Project. A working compiler was one of the first goals of the Project. Today GCC is a modern compiler, often the most standard-compliant one. It has frontends for multiple languages: C, C++, Objective-C, Fortran, Ada and Go. The Linux kernel is usually compiled with GCC (other compilers may cause bugs or just fail the compilation altogether).
LLVM is a novel compiler infrastructure that simplifies writing compilers. The project creates its own C/C++ compiler: clang. LLVM simplifies using the compiler inside other tools (e.g., IDEs) and tries to be compatible with GCC (e.g., it usually uses the same command line options).
A compiler should not be confused with an IDE (Integrated Development Environment), a program that helps in developing code by integrating many tools used during development (compiler, code editor, debugger, profiler, version control etc.). The team creating an IDE is often not connected to the people behind the compiler, and some IDEs even allow the use of different compilers.
The most well regarded IDEs for C++ are:
In an IDE one absolutely wants a working integration with the debugger – the tool that lets you trace program execution line by line and watch the values of variables in memory. If you cannot do that in your IDE, you either need to learn how or change the IDE.
Other very useful features:
To be a successful programmer one needs to learn:
All of these are important. Unfortunately, tools are usually not covered in courses, and the last one probably comes only with experience.
C source files usually have the extension .c while header files have the extension
C++ source files have the extension .cxx. Headers in the standard library have no extensions while user headers usually also use
The gcc compiler looks at the extension when deciding how to compile the file.
Building a C program is carried out in four stages: preprocessing, compilation, assembly and linking.
In fact there are eight stages (see https://en.cppreference.com/w/c/language/translation_phases), but we will concentrate on these four.
One needs to know what they do to understand error messages and correct the code.
gcc lets you stop the process after each stage and inspect the resulting files.
The preprocessor is a simple macro processor that works with text. Its commands are called directives and start with # (always at the beginning of a line). The most often used directives are:
#define – defines macros
#ifdef, #ifndef – allow hiding a part of the code if a macro is (not) defined
#include – puts the contents of another file (usually a header file) in place of this line (the system keeps multiple headers in
One can stop the build and see the file after preprocessing using the -E option for gcc. Even the simplest C/C++ programs usually have thousands of lines after preprocessing. One can skip preprocessing altogether by using files with the extensions
This is the main step: translating the source code from C/C++ into assembly. Most optimizations take place at this step, and many options are available to influence the code that is produced. The most important ones are:
-std – version of the language (
-O – optimization level (from 0 to 3)
-mtune – indicates the target processor to tune the program for
-masm=intel – emits assembly in the Intel syntax, not AT&T
One can stop after compilation using the option -S. The corresponding file is saved with the extension .s. One can skip compilation by using files with this extension.
Translating the assembly into machine code. There are very few options, as the assembler usually just takes assembly instructions and rewrites them in machine code. One can stop after this step using the option -c. The machine code for the program will be stored in a so-called object file with extension
Libraries are added to the link using the option -llibname (usually with no space). The linker will search for a file liblibname.so in the system library directory (usually
Our basic programs often live in just one file, but this will change. C/C++ allow (even promote) separate compilation: each source file should be compiled into its own object file and these object files should be linked together. This way, if only one source file is changed, only that file has to be recompiled and the program relinked. This can drastically decrease the compilation time.
It may be a surprise, but executable files have a specific format they have to obey, otherwise the operating system won’t be able to run them. Under Linux the standard format for executables is called ELF. Under Windows the executable format is called PE32+. There are numerous tools that are able to get data and info from binaries. Many of them are included in GNU Binutils:
nm – a tool that reads symbols (e.g., function names) from a binary
objdump – displays information from object files and executables
strip – discards unnecessary data from executables (e.g., debug info)
Programs written in C tend to have multiple files with different dependencies. Say we have four source files
main.c and two header files
c.h. The first header is included in both
b.c, the second in
main.c uses both headers. If any of the headers is changed, the files including it should probably be recompiled. The compiler does not know beforehand which files to recompile. This is a task for a tool that helps with building programs; the basic one is make.
make’s usefulness is by no means limited to building programs. It is very useful whenever some files (target files) are created from other files (source files) by some commands and have to be recreated if the source files change.
The run of make is governed by a file that is usually named Makefile. The file contains rules that describe dependencies between files and the commands that can create them. A typical rule looks like this:
target : source1 source2
	command1 to create target from sources
	command2 to create target from sources
Commands must be preceded by a single good old TAB character (not 8 spaces etc.). Both sources and recipes are optional, although a rule with neither is not very useful.
When make encounters a rule, it checks whether a file named target exists and whether it is not older than the sources by looking at the modification times. If target does not exist or is too old, the command (or commands) are run, presumably recreating the target from the sources, but this is not checked in any way.
A Makefile is declarative: apart from the choice of the default target, the order of rules does not matter, and make works out a suitable order in which to recreate the targets. It is also smart enough to recreate intermediate targets.
If there are no commands, then the rule just marks a dependency between files. The file will have to be recreated by a different rule (e.g., by a pattern rule).
Pattern rules are used very often. They give a template for a rule.
With separate compilation we may want to compile each .c file to a .o file. If we have multiple source files, it would be tedious and error-prone to list all the rules. We can write a pattern rule like this:
%.o : %.c
	gcc -c $< -o $@
This way, if make encounters a need for an .o file, it will know how to create it from the corresponding source file.
If there are no source files in a rule, then the commands in the recipe are run if the target file does not exist. If the target file exists, no command is run.
If a rule has no prerequisites and the recipe does not create the target, it is considered a phony rule. These are often used to write a sequence of commands that can be easily run when needed. Some phony target names are conventional:
all – build all the targets
clean – delete all the targets leaving just the source files
edit : main.o kbd.o command.o display.o \
       insert.o search.o files.o utils.o
	cc -o edit main.o kbd.o command.o display.o \
	           insert.o search.o files.o utils.o

main.o : main.c defs.h
	cc -c main.c
kbd.o : kbd.c defs.h command.h
	cc -c kbd.c
command.o : command.c defs.h command.h
	cc -c command.c
display.o : display.c defs.h buffer.h
	cc -c display.c
insert.o : insert.c defs.h buffer.h
	cc -c insert.c
search.o : search.c defs.h buffer.h
	cc -c search.c
files.o : files.c defs.h buffer.h command.h
	cc -c files.c
utils.o : utils.c defs.h
	cc -c utils.c
clean :
	rm edit main.o kbd.o command.o display.o \
	   insert.o search.o files.o utils.o
We can give the name of the target to (re)build when invoking make. When no target is indicated on the command line, make tries to build the first target encountered in the Makefile.
Writing a Makefile is not difficult but tedious. Multiple tools were created that generate a Makefile that is later used for building the project.
The venerable GNU Autotools are a collection of programs and macros that create a Makefile from a skeleton file Makefile.in. The idea is that, as systems differ from one another, the tool can find all the necessary libraries, create a suitable Makefile and write variables to a header file to be used by the code. The final machine need not have Autotools installed; the user just runs the ./configure script, then runs make and finally
CMake is a relatively new tool that is often considered better than Autotools. One needs to have it installed on the final machine, and the presence of a file CMakeLists.txt is a sign that a project is built with CMake. CMake prefers out-of-tree builds and is usually run like this: