back to the basics – pointers & references – C/C++/C#/Java

While discussing .NET recently with a small group fresh out of college, it suddenly occurred to me that they were missing something fundamental, so I went back to the basics. What I say here will hopefully be insanely obvious… except maybe the links at the end of the article.

Still, considering that I’ve seen quite a number of people, sometimes even experienced programmers, miss the point, I’m going to put it down anyway, once and for all. [Let me know if there are any links on the web that explain these things in a simple way.]

To start with, a question in Java…

static void f(String p)
{
    p = "def";
}

public static void main(String[] args)
{
    String s = "abc";
    f(s);
    System.out.println(s);
}

Some of them were pretty sure it would be “abc” that gets printed, and others were wondering if it was some kind of trick question 🙂 but they got it right. I had wanted to know what has to be done to actually get “def” printed.

But slightly jarred, I first went straightaway to C…

#include <stdio.h>

void f(const char *p)
{
    p = "def"; // not strcpy
}

int main(void)
{
    const char *s = "abc";
    f(s);
    printf("%s\n", s);
    return 0;
}

After a while they said that “abc” gets printed, but not without some confusion. This does indeed print “abc”, so to actually get it changed, we’d need to pass the pointer ‘s’ itself by reference, that is, pass its address…

#include <stdio.h>

void f(const char **p)
{
    *p = "def"; // not strcpy
}

int main(void)
{
    const char *s = "abc";
    f(&s);
    printf("%s\n", s);
    return 0;
}

Then to C++: let’s say we use a string class such as std::string from the standard library…

#include <iostream>
#include <string>

void f(std::string p)
{
    p = "def";
}

int main()
{
    std::string s = "abc";
    f(s);
    std::cout << s << std::endl;
}

This would still print “abc”, and to make it print “def” we’d need to modify it…

#include <iostream>
#include <string>

void f(std::string& p)
{
    p = "def";
}

int main()
{
    std::string s = "abc";
    f(s);
    std::cout << s << std::endl;
}


Ok so far so good, now coming back to the original question – in Java, what if we *wanted* the value to change within the function?

One of them came up with the idea of wrapping it in a class, and then passing an object of that class, so that the value gets changed inside. An equivalent of such a class already exists in Java: the StringBuilder class.

static void f(StringBuilder s)
{
    s.replace(0, 3, "def"); // hardcoded start index, end index (exclusive), string
}

public static void main(String[] args)
{
    StringBuilder p = new StringBuilder("abc");
    f(p);
    System.out.print(p.toString()); // This prints "def"
}
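
The wrapper-class idea mentioned above can also be written by hand. This is only a minimal sketch: the names HolderDemo and StringHolder are made up for illustration and are not part of any library.

```java
class HolderDemo {
    // Hypothetical mutable wrapper around a String (illustrative name only).
    static class StringHolder {
        String value;

        StringHolder(String value) {
            this.value = value;
        }
    }

    // The reference to the holder is copied into h, but both copies refer
    // to the same holder object, so changing its field is visible outside.
    static void f(StringHolder h) {
        h.value = "def";
    }

    static String run() {
        StringHolder s = new StringHolder("abc");
        f(s);
        return s.value;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints def
    }
}
```

Note that f never reassigns h itself; it only changes what h refers to, which is why the caller sees the new value.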

Another relatively little-known way, which IMO is almost a hack but is included here for the record, is to use an array of length 1.

static void f(String[] s)
{
    s[0] = "def";
}

public static void main(String[] args)
{
    String[] p = { "abc" };
    f(p);
    System.out.print(p[0]); // This prints "def"
}


Ok coming to the main point of this post… consider the following in Java:

class A
{
    public String s;
}

A a1, a2;

The above code in Java roughly translates to the following code in C++

class A
{
public:
    std::string s;
};

A *a1, *a2;

[Note: Java has no equivalent for the C++ declaration A a1, which creates the object itself rather than a pointer to it.]

In both, we need to allocate memory as follows.

a1 = new A(); a2 = new A();

Therefore in Java

a1.s = "string a1";
a2.s = "string a2";

would in C++ really mean…

a1->s = "string a1";
a2->s = "string a2";

Now the most important thing is that in either case:

a1 = a2;

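would mean that the object a1 originally referred to is no longer reachable through a1, and that a1 and a2 now refer to the same object. In C++ this would be a memory leak (unless another pointer to the old object was kept), and in Java the old object would become unreferenced and would eventually be garbage collected.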

Therefore if we then do

a1.s = "some other string";

and then print a2.s, most of them seemed to have absolutely no doubt that it would still print “string a2”. In fact they seemed to think I was hopelessly mistaken when I insisted that it was actually “some other string” that would get printed. I encouraged them to doubt my words and actually try it out in a program for themselves. [Still, to explain, I drew two rectangles for the two blocks of memory, and two arrows representing a1 and a2 pointing to each of them, and showed how the assignment means that both arrows now point to the same block of memory, the one a2 was pointing to.]
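
For anyone who wants to try it out, here is a small self-contained sketch of the experiment. The class name AliasDemo and the helper method run are only there to make it runnable; the class A and its field are the ones from the declaration above.

```java
class AliasDemo {
    static class A {
        String s;
    }

    static String run() {
        A a1 = new A();
        A a2 = new A();
        a1.s = "string a1";
        a2.s = "string a2";

        a1 = a2;                    // both variables now refer to one object
        a1.s = "some other string"; // changes that single shared object

        return a2.s;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints some other string
    }
}
```

The object that originally held "string a1" is unreachable after the assignment and is left for the garbage collector.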

One thing we know about Java is that objects are always handled by reference. So when an object is passed as a parameter to a function (ok, “method” to be precise), the method gets a reference to the same object, and so if the method changes the object, the caller sees the change. But another thing to note is that the *reference is passed by value*. So if anything has to change, it would only work if the function changes whatever the reference is referring to, and not by changing the reference itself.

Unless of course the reference itself is passed by reference, which Java does not support – the C/C++ equivalent of which is a pointer to a pointer (**), as in the char** example above.
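
The difference between changing what a reference refers to and changing the reference itself can be sketched as follows. The method names mutate and reassign, and the class name ReferenceByValueDemo, are invented for this illustration.

```java
class ReferenceByValueDemo {
    static class A {
        String s;

        A(String s) {
            this.s = s;
        }
    }

    // Changes the object that the caller's reference also refers to.
    static void mutate(A p) {
        p.s = "changed";
    }

    // Only redirects the local copy of the reference; the caller's
    // variable still refers to the original object.
    static void reassign(A p) {
        p = new A("replaced");
    }

    static String afterMutate() {
        A a = new A("original");
        mutate(a);
        return a.s;
    }

    static String afterReassign() {
        A a = new A("original");
        reassign(a);
        return a.s;
    }

    public static void main(String[] args) {
        System.out.println(afterMutate());   // prints changed
        System.out.println(afterReassign()); // prints original
    }
}
```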

And now coming to C#… the declaration

class A
{
public string s;
}

A a1, a2;

means exactly the same as it does in Java.

Here too we allocate and initialize memory…

a1 = new A(); a2 = new A();

Here C# has taken these ideas further with some extra keywords like out and ref, which basically make these things a lot more explicit in the syntax itself. But I guess if the above is clear then these keywords would only seem natural.

Types and parameter passing primer is an article that goes further into this, specific to .NET and C#. Check out the links in the beginning of the article.

Even if you don’t plan to ever code in C++, this is a very useful article for academic reference; it gives an idea of how references in Java/C# or even Visual Basic work: Smart Pointers – What, Why, Which? I would recommend that any beginner programmer try out at least a very elementary implementation of a smart pointer class in C++ herself, maybe for just a specific class instead of making it a template. And then, depending on time and implementation, check out the STL implementation as well.

One more thing about different parameter types here is memory allocation and deallocation. This of course matters only in the good old days of C++, where explicit new and delete had to be done. In the good new days of Java/C# one need not worry (too much) about these things… except if one runs into null pointer exceptions 😉 but I believe at least an idea of this is still relevant.

Programming languages are a bit like religions, and these basic engineering principles are like the underlying spirit common to any of them 😉 So I recall here a pretty neat table long ago from my COM programming days…

There are 3 parameter types: in (input), out (output) and in/out (reference). Depending on the parameter type, the responsibility for allocating and freeing the memory resource lies with the caller of a method or within the method that’s called (referred to here as ‘callee’).

| Parameter type | Allocated by | Freed by |
|---|---|---|
| in | caller | caller |
| out | callee | caller |
| in/out (ref) | caller/callee | caller/callee |

With more and more advanced abstractions (like references, garbage collection, etc.) evolving in programming languages, the developer can focus more on what to do rather than how to do it. This works fine most of the time; however, as Joel says in his excellent article, [abstractions are leaky](http://www.joelonsoftware.com/articles/LeakyAbstractions.html), and when something goes wrong, having at least a rough idea of what’s going on under the hood helps to identify the problem.

For example, check out [memory leaks in Java](http://www-106.ibm.com/developerworks/java/library/j-leaks/); I don’t see how it would’ve been any different in C#. Quoting from that article:

> The problem with this font manager class was that while the code put the font vector into the hashtable when the form was created, no provision was ever made to remove the vector when the form was deleted. Therefore, this static hashtable, which essentially existed for the life of the application itself, was never removing the keys that referenced each form. Consequently, the form and all of its associated classes were left dangling in memory.
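
The kind of leak described there can be sketched roughly as below. This is not the code from that article: the class FontManagerLeakSketch, the Form stand-in and all method names are assumptions made up purely to illustrate a static map that keeps objects reachable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FontManagerLeakSketch {
    // Hypothetical stand-in for a UI form.
    static class Form {
        final String name;

        Form(String name) {
            this.name = name;
        }
    }

    // A static map stays reachable for as long as the class is loaded,
    // so every key and value stored in it stays reachable too.
    static final Map<Form, List<String>> FONTS_BY_FORM = new HashMap<>();

    static void formCreated(Form form) {
        List<String> fonts = new ArrayList<>();
        fonts.add("default font");
        FONTS_BY_FORM.put(form, fonts);
    }

    // The cleanup step whose absence causes the leak.
    static void formDeleted(Form form) {
        FONTS_BY_FORM.remove(form);
    }

    // Creates forms and never unregisters them: the map keeps them alive.
    static int entriesWithoutCleanup(int forms) {
        FONTS_BY_FORM.clear();
        for (int i = 0; i < forms; i++) {
            formCreated(new Form("form " + i));
        }
        return FONTS_BY_FORM.size();
    }

    // Same loop, but each form is unregistered when it is done.
    static int entriesWithCleanup(int forms) {
        FONTS_BY_FORM.clear();
        for (int i = 0; i < forms; i++) {
            Form form = new Form("form " + i);
            formCreated(form);
            formDeleted(form);
        }
        return FONTS_BY_FORM.size();
    }

    public static void main(String[] args) {
        System.out.println(entriesWithoutCleanup(3)); // prints 3
        System.out.println(entriesWithCleanup(3));    // prints 0
    }
}
```

The garbage collector can only reclaim objects that nothing refers to any more, so a forgotten entry in a long-lived collection behaves much like a forgotten delete in C++.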

But this is a pretty debatable topic, as the article [Respecting Abstraction](http://www.codinghorror.com/blog/archives/000277.html) discusses; it is summarized in the statement “It’s amazing how far [down the rabbit hole](http://www-2.cs.cmu.edu/People/rgs/alice-I.html) you can go following the many abstractions that we routinely rely on today.” 🙂
