Pointers in C


Pointers

Pointers are at the heart of C. They are one of the main reasons why many people choose to work in C instead of a higher-level programming language. They provide fast and efficient access to memory and its contents, without layer after layer of abstraction in between. In some higher-level languages it is not even possible to work with pointer-like structures, and you have to perform some kind of "hack" to get similar functionality. Some languages consider pointers too dangerous to expose at all. C is one of the languages that lets you manipulate them freely, since as the programmer, you are in control.

Computers have memory addresses, and programs have virtual memory. Virtual memory is an abstraction: through this mechanism the program and the programmer have the illusion of working directly with all the memory available on the computer. When you declare variables of different datatypes in C, or in any programming language, you are reserving some space in the virtual memory available to the program. Every variable, and even the code you write, has its own memory address (the virtual address does not have the same value as the physical address, but each object occupies its own distinct address for as long as it exists).

The & operator, as we have seen, gives us the virtual memory address of its operand. (A long time ago, in a land far, far away, programmers and the software they wrote used to work directly with physical memory. You can still do it today, but you will probably need somewhat deeper knowledge, since the rule now is to work through layer upon layer of abstraction, starting with your own operating system.)

The * operator does two jobs depending on context: it declares a pointer, and it dereferences one, giving access to whatever is stored at the address the pointer holds.
When you apply * in a declaration, you are declaring a pointer, which here is initialized with an address obtained through the & operator:


int x = 30;
int *y = &x;


Now the variable y contains the memory address of x. When you apply * in any context other than the declaration, you get what is stored at that address:


int p = *y;
printf("%d\n", p);
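To see both halves of this at once, here is a minimal sketch that prints the address held by y next to the address of x. The function name pointer_matches_address is made up for this illustration, and the addresses printed will differ from run to run and from machine to machine.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 when y holds the address of x
   and dereferencing y yields the value of x. */
int pointer_matches_address(void) {
    int x = 30;
    int *y = &x;

    /* %p expects a void *, hence the casts. */
    printf("&x = %p\n", (void *)&x);
    printf(" y = %p\n", (void *)y);
    printf("*y = %d\n", *y);

    return y == &x && *y == x;
}
```

The two addresses printed are the same, because y stores exactly what &x evaluates to.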


When you initialize a variable from another variable, you do not get direct access to the original, only a copy of its value:


int a = 15;
int b = a;
a = 25;
printf("a: %d, b: %d", a, b);


If what you really want is to access the original variable directly, all you have to do is declare a pointer to it. Any change made to the variable, or made through the dereferenced pointer, is then visible through both:


int a = 15;
int *b = &a;
a = 25;
printf("a: %d, b: %d \n", a, *b);
*b = 35;
printf("a: %d, b: %d \n", a, *b);



You may ask why I wrote *b = 35 instead of b = 35 before the second printf(). Well, if I had left out the *, b would be pointing to memory address 35, which our program never set up, so dereferencing it would be undefined behavior and at best give a dirty bogus value. Ugh! But since b is a pointer to a, writing *b = 35 says that the memory address of a now contains 35. Then we print a, and we can see that we can access what is inside a through the pointer b.

You must take care where you point a pointer. Handling pointers can be confusing for beginners, who may crash their applications or get undesirable results. Take the code below:


int *p;
p = 30;


In the example above, we declare a pointer and, on the next line, assign it the value 30 (most compilers at least warn about assigning an integer to a pointer without a cast). But what exactly is stored at memory address 30? Depending on your luck you may or may not get a crash once you dereference p, because you created a pointer to an unknown location. You cannot even name that location directly, only reach it through indirection, and it could be anything, including a vital part of your program.

A pointer can be just declared (if it is a local variable, its value is then indeterminate):


int *p;


Initialized to null:


int *p = 0;


Initialized with the memory address of another variable:


int a = 7;
int *p = &a;


Passing pointers

Pointers can be passed as arguments to functions. Strictly speaking, C always passes arguments by value, but when the value is a pointer, the function receives the address of the original variable and can access that variable directly, which is what we mean here by passing by reference. You can use that property to your advantage: instead of returning a value, you can write it straight into the caller's variable through the pointer. It is also beneficial because arguments passed by value are copies. If you are passing something small like an integer, the overhead is small, but if you are passing a big data structure like a large struct, copying it on every call can cost a lot of time and memory. Instead, you can pass a pointer, which contains only the address of that big data structure, and work from there, like sending a friend a hyperlink instead of the whole website.


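#include <stdio.h>

int f(int *p) {
    printf("Received: %d \n", *p);
    return 0;
}

int main(void) {
    int x = 12;
    int *p = &x;
    f(&x);   // pass the address of x directly
    f(p);    // pass a pointer that already holds it

    return 0;
}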
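To make the cost of copying concrete, here is a rough sketch that compares passing a large struct by value with passing a pointer to it. The names struct big_record, SAMPLE_COUNT, sum_by_value and sum_by_pointer are invented for this illustration. The -> operator used below reads a member through a pointer.

```c
#include <stddef.h>

enum { SAMPLE_COUNT = 1024 };   /* hypothetical size, chosen for illustration */

/* Hypothetical large structure: 1024 ints on every copy. */
struct big_record {
    int samples[SAMPLE_COUNT];
};

/* By value: the whole struct is copied into r on every call. */
long sum_by_value(struct big_record r) {
    long total = 0;
    for (size_t i = 0; i < SAMPLE_COUNT; ++i)
        total += r.samples[i];
    return total;
}

/* By pointer: only the address is copied. */
long sum_by_pointer(struct big_record *r) {
    long total = 0;
    for (size_t i = 0; i < SAMPLE_COUNT; ++i)
        total += r->samples[i];
    return total;
}
```

Both functions return the same sum. The difference is only in how much data travels on each call: sizeof(struct big_record) bytes for the first, sizeof(struct big_record *) bytes for the second.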



Constant pointers:

Pointers can point to different places over the course of your code, but you can also declare a pointer that is locked to one specific location. With the const modifier placed after the *, the pointer will only ever point to the place it was initialized with:


int v = 55;
int *const p = &v;

printf("p: %d \n", *p);


Note that you may declare more than one constant pointer on one line, but each must be initialized right there and each needs its own *const. That is why I am not fond of writing the * next to the datatype. This is a situation where it can cause some confusion:


int* const p = &v, const q = &x;


It may look like we are declaring two pointers, but the * belongs only to p, so q is not a pointer, and the compiler rejects the line. My favorite style is this:


int *const p = &v, *const q = &x;


Now pointing to another memory address, as in the example below, is an error the compiler will reject:


int v = 55;
int x = 90;
int *const p = &v;
p = &x; // error: p is const, so it cannot point anywhere else

printf("p: %d \n", *p);
return;


But you can still change the value stored in that memory area, even though your pointer cannot point anywhere else anymore:


int v = 55;
int x = 90;
int *const p = &v;
printf("p: %d \n", *p);

*p = x;
printf("p: %d \n", *p);

return;



Pointers to Constants:

We have seen constant pointers right above, which can only point to one specific memory address, while the value at that address can still be modified. By placing the const keyword before the type instead, you specify that the pointed-to value cannot be modified through your pointer, while the pointer itself may be redirected. Now the code below is valid:


int v = 55;
int x = 90;
const int *p = &v;
p = &x;

printf("p: %d \n", *p);
return;



But trying to change the pointed-to value through indirection will generate a compile error:


*p = 50; //error


If you want to modify v now, you can only do it directly through v, as long as v is not itself declared const.
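As a small sketch of that last point, the function below (read_after_direct_write is a made-up name) changes v directly and then reads it back through a pointer to const. The pointer cannot be used to write, but it still sees every change made through v itself.

```c
/* Hypothetical helper: modifies v directly, then reads it through p. */
int read_after_direct_write(void) {
    int v = 55;
    const int *p = &v;

    v = 60;          /* fine: v itself is not const */
    /* *p = 60;         would not compile: p points to const int */

    return *p;       /* reads the updated value of v */
}
```

Calling read_after_direct_write() returns 60.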

Constant Pointers to Constants:

Constant pointers to constants combine the restrictions of constant pointers and pointers to constants. A pointer declared like this cannot point anywhere else and cannot be used to modify the value at the address it points to:


int v = 55;
int x = 90;
const int *const p = &v;


Now both statements would be invalid:


*p = 50;
p = &x;


This is a very important technique. Imagine you are passing a pointer to a function, but you do not want any values or addresses to be altered. Since pointers are inexpensive, it is better to pass them as arguments instead of a big data structure, because passing by value creates a copy every time you call the function. Now you can pass by reference without worrying too much about an unwary programmer changing an important value and having the change ripple through all the code.
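Here is a minimal sketch of such a read-only parameter. The struct settings type and the area function are hypothetical, invented only to show where the two const qualifiers go.

```c
/* Hypothetical structure used only for this example. */
struct settings {
    int width;
    int height;
};

/* s is a constant pointer to a constant struct:
   the function can read through it and nothing else. */
int area(const struct settings *const s) {
    /* s->width = 0;   would not compile: the pointee is const */
    /* s = 0;          would not compile: the pointer is const */
    return s->width * s->height;
}
```

A caller with struct settings cfg = { 3, 4 }; gets 12 from area(&cfg), and cfg is guaranteed to be untouched by the call.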

Passing by value and by reference with pointers

We have seen there are two ways to pass arguments to functions: by value and by reference. We showed an example of passing by reference with arrays, but you can also do it for any variable by passing a pointer to it.


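#include <stdio.h>

void sum(int x);
void sum_by_reference(int *x);

int main(void)
{
    int x = 10;
    sum(x);
    printf("by value: %d\n", x);
    sum_by_reference(&x);
    printf("by reference: %d\n", x);

    return 0;
}

// receives a copy, so the caller's x is untouched
void sum(int x) {
    x = x + x;
}

// receives the address, so the caller's x is modified
void sum_by_reference(int *x) {
    *x += 10;
}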
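The classic illustration of the same difference is a swap function. This is a sketch with made-up names (swap_copies and swap_ints): the first version only swaps its own copies, while the second swaps the caller's variables through their addresses.

```c
/* Swaps only the local copies; the caller sees no change. */
void swap_copies(int a, int b) {
    int tmp = a;
    a = b;
    b = tmp;
}

/* Swaps the caller's variables through their addresses. */
void swap_ints(int *a, int *b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

With int a = 1, b = 2;, calling swap_copies(a, b) leaves them as they were, while swap_ints(&a, &b) leaves a holding 2 and b holding 1.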



NULL pointers

Pointers can be initialized with the NULL macro (a #define from the standard headers) or with the value 0. This means they do not point anywhere, so they cannot modify a critical part of your program, but be careful never to dereference a NULL pointer. On most systems, trying to dereference it causes a fault and your program is terminated. But let's say you are using another type of machine: on some of those it is possible to access memory address 0. The compiler will not place variables there, but allowing access to that address would hide the symptoms of a disease, making your program harder to debug and analyse.
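A common defensive habit that follows from this is to test a pointer against NULL before dereferencing it. The sketch below uses a hypothetical helper, value_or, that falls back to a default when it receives a null pointer.

```c
#include <stddef.h>

/* Hypothetical helper: returns *p, or fallback when p is NULL. */
int value_or(const int *p, int fallback) {
    if (p == NULL)
        return fallback;   /* never dereference a null pointer */
    return *p;
}
```

With int x = 5;, value_or(&x, -1) gives 5 and value_or(NULL, -1) gives -1 instead of crashing.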

Pointers to pointers

Since we are talking about indirection, multiple levels of indirection are also possible. We can make a pointer point to another pointer; after all, pointers are variables too and occupy a memory address of their own while your code executes.


int i = 15;
int *p1 = &i;
int **p2 = &p1;
printf("%d %p %p \n", **p2, (void *)*p2, (void *)p2);
printf("%d %p \n", *p1, (void *)p1);
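One practical use for a pointer to a pointer is letting a function change where the caller's pointer points. This is only a sketch, and point_to_larger is a name invented for it.

```c
/* Hypothetical helper: makes the caller's pointer (*target)
   point at whichever of a and b holds the larger value. */
void point_to_larger(int **target, int *a, int *b) {
    *target = (*a >= *b) ? a : b;
}
```

Given int a = 3, b = 9; and int *p = 0;, the call point_to_larger(&p, &a, &b) leaves p pointing at b, so *p is 9.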



Function pointers

Functions also occupy a memory address, like everything you write in your code. Seen that way, it is no surprise you can also have a pointer to a function. The best way to start studying them is to show a declaration:


int f(int x) {
return ++x;
}
int main()
{

int x = 7;
int(*pf)(int x) = &f;

int r = pf(7);
printf("Calling via pointer: %d \n", r);
return 0;

}



Here we have a function f with a parameter of type int called x. Right below it, in main, our function pointer is declared.
It is important to remember that the function must be declared before the pointer declaration, since the compiler has to check that the pointer's type is compatible with the function it points to. If the definition comes later, at least provide a function prototype so the compiler is not left lost.
Notice that we wrote *pf within parentheses. Because of operator precedence, if you leave them out, the compiler will take it as the declaration of a function returning a pointer:


int *pf(int x);


A declaration like the one above is a function prototype, not a function pointer, and it stays that way even if you try to assign &f to it. You have to explicitly specify *pf as a pointer with the parentheses.
You can use pointers to functions in situations where you must pass functions to other functions. Those are called callback functions, because the function that receives the pointer calls back into your code and control then returns to it. This is very common for asynchronous tasks in higher-level programming languages. You may now call your function as:


pf(7) ;


And your function f will be called with 7 as its argument. You may very well pass the pointer to another function too:


int inc(int x)
{
return ++x;
}

int f(int (*fn)(int a), int b) //receives a pointer to a function that takes one int (fn) and the int to pass to it (b)
{
return fn(b); //calls the function via pointer fn with b as argument
}



int main()
{
int(*fn)(int x) = inc; //declared pointer to inc function
printf("%d\n", f(fn, 3)); //call function f, passing the pointer to inc and 3 as arguments
return 0;
}



In C you may use them in situations where you need to write more generic code, as in the stdlib function qsort, which we will see soon when studying that library:


void qsort(void *base, size_t nmemb, size_t size, int ( *compare)(const void *, const void *));
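As a quick preview of how that callback is used, here is a minimal sketch that sorts an array of int in ascending order. The names compare_ints and sort_ints are my own for this example; only qsort itself comes from stdlib.h.

```c
#include <stdlib.h>

/* Comparison callback in the shape qsort expects:
   negative, zero or positive for less, equal or greater. */
static int compare_ints(const void *lhs, const void *rhs) {
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b);   /* avoids the overflow risk of a - b */
}

/* Hypothetical wrapper: sorts count ints in ascending order. */
void sort_ints(int *values, size_t count) {
    qsort(values, count, sizeof values[0], compare_ints);
}
```

qsort knows nothing about int. It only moves blocks of size bytes around and asks the callback which of two elements comes first, which is what makes it generic.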


Pointers: a few notes

If a pointer is not initialized, it can be pointing to any memory area. So always try to initialize your pointers, at least with NULL or 0:


int *p = 0;
int *q = NULL;


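Compare two pointers to check whether they refer to the same memory area, and dereference them to compare the values stored there:


p1 == p2;   // same address?
*p1 == *p2; // same value?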


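In languages like Java, where you cannot take the address of a variable, this difference is a little harder to see, since with primitive values a comparison only tells you whether the values match: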


int a = 14;
int b = a;
int c = 14;
b == a; // true
c == a; // also true, although c is a separate variable


In C, thanks to pointers, you have great control over what you return, pass, and access throughout your program.

Pointers arithmetic

You are allowed to perform arithmetic on pointers, but it is not like adding or subtracting ordinary numbers. When you add an integer to a pointer, or subtract one from it, what you get is another pointer, moved by that many elements of the pointed-to type rather than by that many bytes.


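#include <stdio.h>

void pointer_arithmetic(int x, int y, int z);

int main(void) {
    int x = 100, y = 150, z = 200;
    pointer_arithmetic(100, 150, 200);
    // undefined behavior: &x - 3 is outside x, shown only to illustrate the danger
    int *l = &x - 3;
    *l = 17;
    printf("main function address: %p and value: %d\n", (void *)l, *l);
    printf("y: %d\n", y);
    return 0;
}

void pointer_arithmetic(int x, int y, int z) {
    int *p = &x;

    printf("x: %d\n", *p);
    // stepping past x is undefined behavior; the output depends on compiler and platform
    p++;
    printf("pointer p is now pointing to: %p and contains: %d \n", (void *)p, *p);
    p--;
    printf("pointer p is now pointing to: %p and contains: %d \n", (void *)p, *p);
    p = p + 2;
    printf("pointer p is now pointing to: %p and contains: %d \n", (void *)p, *p);
}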



The first thing to note is how x, y and z were declared in sequence, and on my machine so were the addresses they occupied on the stack when passed to the function. The C standard does not guarantee this layout, and stepping a pointer outside the object it points to is undefined behavior, so the result depends on your compiler and platform. Second, look how, once you know the address of a variable, you can easily write a new value there whenever you want. So imagine what happens if you make a mistake or, even worse, allow a user to write beyond the bounds of the variable they are supposed to access. Back in the day, buffer overflow attacks could do it through basic functions like strcpy, which we will also see soon, but imagine if you asked the user to input the size the buffer was supposed to be. Or you make a mistake like forgetting to dereference a stray pointer somewhere in a huge enterprise codebase that would take a lifetime to know 100%. If one of the thousands of users gets hold of this pointer instead of the value it should represent and does some simple operation on it, who knows what they can find out.

Now you see why some languages consider pointers too dangerous to expose. I myself believe C has the better approach, allowing you to be in total control of what you do, but it all depends on the situation and the problem you need to solve. Try to add or subtract different values and see what can go wrong. I got two errors, a compile error and a runtime failure, which can happen if you access critical parts of your program, or areas not previously allocated and such, but at least now you are in control. I use this technique for varargs in my small home projects, for example, so I do not need to bother with va_list, va_start, va_arg and va_end, although those macros are the right way to do it in any production software or any code that will be altered by someone other than you. The trick relies on every argument sitting on the stack right after the first one, which held on my setup but is not guaranteed (on many 64-bit platforms the first arguments travel in registers instead). If you are curious, look how easily you can deal with varargs without the macros:


#include <stdio.h>

void vararg(int narg, ...);

int main(void)
{
    vararg(6, 5, 4, 12, 6, 10, 7);
    return 0;
}

// not portable: assumes the arguments sit on the stack right after narg
void vararg(int narg, ...) {
    int *beginstack = &narg;
    int *e1 = &narg + 1;
    int *e2 = &narg + 2;
    int *e3 = &narg + 3;
    int *e4 = &narg + 4;
    int *e5 = &narg + 5;
    int *e6 = &narg + 6;
    printf("begin stack: %d\n", *beginstack);
    printf("vararg element 1: %d\n", *e1);
    printf("vararg element 2: %d\n", *e2);
    printf("vararg element 3: %d\n", *e3);
    printf("vararg element 4: %d\n", *e4);
    printf("vararg element 5: %d\n", *e5);
    printf("vararg element 6: %d\n", *e6);
}


Of course you can iterate with a loop using the first parameter (narg). Kind of a hack, but it works on my setup, and you do not even need to include "stdarg.h" anymore.
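For comparison, here is a sketch of the macro-based route, the one recommended above for production code. The function name sum_ints is invented for this example. It assumes every variadic argument is an int and that the first parameter says how many follow.

```c
#include <stdarg.h>

/* Hypothetical helper: sums the count ints that follow count. */
int sum_ints(int count, ...) {
    va_list args;
    int total = 0;

    va_start(args, count);              /* start after the last named parameter */
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);     /* fetch the next argument as an int */
    va_end(args);                       /* always pair va_start with va_end */

    return total;
}
```

Calling sum_ints(6, 5, 4, 12, 6, 10, 7) returns 44, and it does so no matter how the platform actually passes the arguments.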

Pointers arithmetic and arrays

We have not seen arrays in depth yet, but since we took a peek in previous threads, let's take another and see briefly how arrays and pointers work together. With simple arithmetic, it is possible to iterate through all of an array's elements, since they are stored one after another, and the name of the array converts to a pointer to its first element:


#include <stdio.h>

void pointer_arithmetic(int *p);

int main(void)
{
    int a[5] = { 13, 5, 64, 28, 100 };
    pointer_arithmetic(a); // a converts to a pointer to its first element
    return 0;
}

void pointer_arithmetic(int *p) {
    //first element
    printf("p is pointing to %d\n", *p);
    p++;
    //second element
    printf("p is pointing to %d\n", *p);
    p++;
    //third element
    printf("p is pointing to %d\n", *p);
    p++;
    //fourth element
    printf("p is pointing to %d\n", *p);
    p++;
    //fifth element
    printf("p is pointing to %d\n", *p);
    p++; //now one past the end: fine to hold, not to dereference
}



Note how we iterate just by incrementing the pointer. Or maybe, with a simple loop:


#include <stdio.h>

void pointer_arithmetic(int *p, int size);

int main(void)
{
    int a[5] = { 13, 5, 64, 28, 100 };
    int a_size = sizeof(a) / sizeof(a[0]);
    pointer_arithmetic(a, a_size);
    return 0;
}

void pointer_arithmetic(int *p, int size) {
    for (int i = 0; i < size; ++i) {
        printf("iteration %d ", i);
        printf("p is pointing to %d\n", *p);
        p++;
    }
}



Not much to explain above, except the declaration int a_size = sizeof(a) / sizeof(a[0]);. Here sizeof(a) gives the full size of array a in bytes. Since an int takes 4 bytes on most current platforms and our array has 5 elements, its size will be 20 bytes. Then we divide by the size of the first element, and the result is the number of elements in the array. We can pass this number to our function and iterate in the for loop.
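One caveat about the sizeof trick: it only works where the array itself is in scope. Once the array has been passed to a function, the parameter is a plain pointer and sizeof reports the size of that pointer. The two function names in this sketch are made up for the illustration.

```c
#include <stddef.h>

/* Inside a function the parameter is a pointer,
   so sizeof gives the size of a pointer, not of the array. */
size_t size_seen_by_callee(int *p) {
    return sizeof p;
}

/* Where the array is declared, sizeof sees the whole array. */
size_t count_in_scope(void) {
    int a[5] = { 13, 5, 64, 28, 100 };
    return sizeof(a) / sizeof(a[0]);
}
```

count_in_scope() returns 5, while size_seen_by_callee(a) returns sizeof(int *) however long the array is. That is why the element count has to be computed in the caller and passed along as a separate argument.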

Remember we said in the last thread that function parameters can be declared either as pointers or as arrays?


change_by_reference(char n[]);
change_by_reference(char *n);
change_by_reference(char n[2]);


That is because they all start out pointing to the same place. You would have no problem passing the array as an argument to the function, or the address of its first element, or a pointer, and so on, with minor or even no adjustments to the code that manipulates the array.

Although an array is not really a pointer, because it cannot do everything a pointer can, like pointing somewhere else, it can be manipulated through pointers as if it were one.
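As a last sketch of that idea, the function below walks an array using nothing but a pair of pointers: one to the first element and one just past the last. The name sum_range is hypothetical.

```c
/* Hypothetical helper: sums the ints in the range [begin, end). */
int sum_range(const int *begin, const int *end) {
    int total = 0;
    for (const int *p = begin; p != end; ++p)
        total += *p;    /* end itself is never dereferenced */
    return total;
}
```

For int a[5] = { 13, 5, 64, 28, 100 };, the call sum_range(a, a + 5) returns 210. Note also that a[2] and *(a + 2) name the same element, and that (a + 5) - a is 5, the number of elements between the two pointers.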
Phew, that's it. I tried to condense my adventures with pointers a little, but there are lots of cool things you can do with them that I did not mention here. Hope it helps someone out there.