Googol string (google codejam problem) solution using recursion

Problem link

A simple but slow (around O(n)) approach would be to generate the whole string and then find the nth character. Tough O(n) seems good, but this problem could be solved in much better time (with worst case around O(log(n)) and best being O(1), and the probability (on random input) of being near to the best case is many times greater that being near worst).

If you start generating the string manually, you will notice that patterns are recurring. Let us take two sub-strings, str1 as “001” (first 3 characters of the string), str2 as “011” (last 3 characters of a string of length 7, which is “001 0 011”). Now let’s generate the string in terms of str1 and str2:

str1 (0) str2 (0) str1 (1) str2 (0) str1 (0) str2 (1) str1 (1) str2 and so on…

Now we can easily correctly guess the characters at positions 0, 1, 2, 4, 5, 6, 8, 9, 10 and so on (they either belong to str1 or str2).

Now to notice how strings are reccurring, this whole string can be seen in terms of substrings str1 and str2. Apart from str1 and str2 reccuring directly in the example string, if we only write the new terms written as digits in brackets in the above example) and omit the sub-strings str1 and str2,  another recurring of pattern can be seen. The new string after omitting the said characters looks same as the original string.

001 0 011, that is, str1 (0) str2 and so on…

Now in terms of programming, we can break down the problem. Notice that at any step (or iteration), we know characters at all the positions apart from positions 3, 6, 9 and so on (with base 0). In other terms, we know all the positions where modulo 4 of the character to be found is 0, 1 or 2. If the modulo is 3, we have to further recurse one more level (note that it has 1/4 chance that at any level we have to goto the next level as well, 3/4 being that we will find the result at the same level).

Also note that, getting modulo 0, 1 or 2 means we have the character either from str1 or str2. To differentiate, we will have to find out which sub-string we actually have to refer to. This can be done by noticing that the string is divided into patternse each of 4 character  (as str1 or str2 has a character next to it which is not used in current iteration, making groups of 4 characters each). So we can now find out which sub-string it particularly is, by finding whether modulo 2 of the position of the character to be found divided by 4 is 0 or 1.

Here is the code:

#include <string>
#include <fstream>

long long int a, b;
std::string str1 = "001", str2 = "011";

char res()
{
    if(a < 3)
    {
        if((b % 2) == 0)
            return str1[a];
        return str2[a];
    }
    a = b % 4;
    b = b / 4;
    return res();
}

int main()
{
    std::ifstream in("/home/nishantbhardwaj2002/Downloads/A-large-practice.in");
    std::ofstream out("/home/nishantbhardwaj2002/Downloads/codejam_sol2.txt");

    int t;
    in >> t;

    for(int i = 1; i <= t; i++)
    {
        in >> b;
        b--;
        a = b % 4;
        b = b / 4;
        out << "Case #" << i << ": " << res() << std::endl;
    }

    in.close();
    out.close();

    return 0;
}
Advertisements

Linkers

When we compile a c program, the compiler generates the object file for it. Tough the object files are in machine language, but we still can’t execute them directly. Reason is that, our program generally, consists of more than one object files and includes libraries. And compiler compiles each file separately, generating object file for each of them. Some variables and functions, may be defined in one of the file and used in others. So it is not possible to run these object files as they don’t know much about the variables or the actual implementation of the functions defined in other files. Consider a simple example, where we have defined a variable in a file (file1.c) and we want to use this variable in another file (file2.c).

file1.c

int var = 20;

file2.c

#include <stdio.h>
extern int var;
int global_var = 10;
int main()
{
    int local_var = 5;
    printf("%d, %d and %d\n", local_var, global_var, var);
    return 0;
}

We may choose to use the variables defined in one file in other files. To do this, we define the variable in only one of them. But we can’t use them directly in other files as using them directly will create compile time error. This is because all files are compiled separately and compiler doesn’t know whether we have defined something in other file or not. Here we have to do something to tell the compiler that we have defined this stuff in some other file and it should trust us and not create compile time error for this. We can do so, by using extern keyword. This is called declaration. Later, we will link all these files together.

So, in file2.c, we have defined the global_var but we have declared the var (using extern). It is file1.c that defines var. Now file2.c doesn’t need to define it, it only declares it, which means, we tell the compiler (using extern), that we have defined this in some other file, which we will link together later. The compiler trusts us and successfully compiles the code.

Note that if we don’t define it in any file (say, we forgot to define var in file1.c) but still declare and use it in some files, we will get an error when we will link these files together, which may say “undefined reference to var”. The same will happen if we try to execute file2.c alone (there is no file that defines var):

gcc file2.c
/tmp/ccgbKG3u.o: In function `main’:
file2.c:(.text+0x11): undefined reference to `var’
collect2: error: ld returned 1 exit status

Now, when we compile file2.c, and the compiler generates object file corresponding to it. In the object file of file2.c, symbols corresponding to the variables that have been defined in this same file (e.g. global_var) will get resolved, but not to the other ones (like var), which are assumed to be defined in another file, will remain unresolved. And this is the reason we can’t execute object files, as some of the symbols are still unresolved (i.e. we haven’t created a link between the variables, which are same in all the files). The same happens with functions as well, which we can define in one file and declare and use in another file.

We have linkers to do the job of linking several object files together and they resolve the unresolved symbols and produce an executable, which can run. So, when object files are linked (i.e. their symbols gets resolved) in this linking phase, which happens before the program is loaded into memory, it is called static linking.

To execute this program, we will have to compile and link both the files and then execute the final result. Tough we can do all this separately (compiling, linking etc) but it is more easy to use the following command, which does all this and places the final result in the file named final_program, and finally to run the program, we can run this final_program:

gcc file1.c file2.c -o final_program
./final_program
5, 10 and 20

We also have something called libraries, which are basically collection of objects files. Instead of dealing with many objects separately, we deal with a single library. Libraries are of two type, static and dynamic (shared). Static libraries are linked in static phase along with other object files. The other ones (dynamic/shared libraries) are linked later, and this type of linking is known as dynamic linking.

In dynamic linking, we resolve the unresolved symbols when the program is running. This is done for dynamic (shared) libraries. Such libraries are not linked in the static linking phase. Their symbols will remain unresolved in the executable. But when we finally run the program, their symbols will get resolved. This can be done by calling some function in the program to load and link the library (to resolve the unresolved symbols).

Shared libraries have an advantage that the same library is not included with every program. Libraries can be shared, i.e. if the library already exists in memory, we don’t need to load it separately, hence saving space. Also, if we change anything in shared library (say, a newer version comes), we won’t need to compile the program again. When we will run the program, the new versions of library will be linked. This is not the case with static libraries, as they are already linked and resolved in the executable, which not only increses the size, but if we change or update the library later, we will have to compile and link the files again, creating a new executable.

Generic programming and generic pointers

Wikipedia gives a very straight-forward definition of generic programming. It says, “In the simplest definition, generic programming is a style of programming in which algorithms are written in terms of types to-be-specified-later that are then instantiated when needed for specific types which are provided as parameters”.

Suppose we want to implement stack using a class in c++. But we want to use it for multiple data types, say integers, floats, characters etc. We can definitely make multiple classes, one using int as data type, another using float and so on. But this isn’t the best way to do so.

Generic programming helps us in solving this problem. It simply means, that we don’t have to write code for every data type. We write code for an arbitrary type, like in c++ we use templates to do so.

template<typename T>
class myClass
{
    //myClass, that deals with an arbitrary type T
};

void main() 
{
    // myClass for ints.
    myClass myClassForInts;
    // myClass for floats
    myClass myClassForFloats
}

Here, we don’t write classes for any specific type. We use template-classes. When we create an instance of myClass later, we tell the compiler which type we want to use. Compiler uses that type to create a class from of the template. We may think of it as, replacing the typename with int or float from the template-class and compiling that class. Here the compiler does this instead of us writing codes for multiple classes using different data types. Suppose we want to use it for ints and floats, the compiler will make two classes, one for ints and another for floats. This way we are saved from writing multiple classes, using generic programming.

Also note that we also have generic pointers in c++. Generic pointers are pointers pointing to void and void is not a data type. We just mean that we don’t want to specify the data-type of the object we want to point to. That’s why we can point to multiple data types using the same pointer.

int main()
{
    int iint = 10;
    float ffloat = 1.2;
    void* ptr;
    ptr = &iint;
    cout << * (int *) ptr << endl;
    ptr = &ffloat;
    cout << * (float *) ptr << endl;
}

Note that, here we printed the values of both int and float data-type using the same pointer. This would not have been possible if we had used pointer of int  or float instead of void.

However, using void pointers is generally not suggested and they don’t really provide much generic behavior.

Firstly, whenever we are dereferencing the pointer, we will have to specify the object we are pointing to, just like what we did while printing values in the code. This is because the pointer is pointing to void, and void is not a type, it is absence of a type. So, we have to tell what are we pointing to as the pointer just contains the starting address, to actually get the value of the object, we must know object’s size, and hence its data type. Secondly, we can’t use pointer arithmetic on generic pointers because of the same reason that we don’t know whether ptr++ will now point to memory 4 bits ahead as in case of ints, or 8 bits ahead as in case of floats.