Saturday, March 16, 2013

Interview by Samsung: (Part-1)

Samsung had visited our college for campus on 12th August 2010 [I don't know for sure, but my batchmate Harish Thanvi who's now a Software Engineer in Samsung SEL, says so ;) ] and on 16th I had posted this blog-post. This blog-post is about the first question the tech-interviewer asked me. I had planned to discuss more questions, but then I forgot what all they asked me :P

So here's the original post [which may help the students who will be appearing for campus interviews]:

Many people have been asking about the Interview Questions by Samsung Team. Though the Written Technical Test was pretty easy their Interview Questions were tricky based only on the fundamental knowledge of C/C++ and Programming Skills. I have decided to write a series of blogs on the same.This question though is an easy one, it demands very specific details, and if you are honest with yourself, you'll realize, there's a lot you did not know. And even if you knew, its hard to remember during the interview. Fortunately, I did!

One Question that was asked was:
Write a program to reverse a string with all its duplicates removed. Only the last instance of a character has to appear. Also, the following conditions are to be satisfied: (Assume only Capital Letters).
  1. Minimum Time Complexity
  2. Minimum Space Complexity
  3. Minimum Lines of Code
For example: If the input string is “AVISHAL” the output should be “LAHSIV”. When the input is "JTVAKAVISHAAAL” the output should be “LAHSIVKTJ”.

Writing this program shouldn’t be a tough task. However since all the conditions are to be satisfied, we need a line of approach.

Approach:

Calculation for Minimum Time Complexity: Its beyond doubt that the string has to be processed, i.e. , it has to be traversed. Thus the minimum time complexity should be O(n) where ‘n’ is the number of characters in the string. Also, there has to be comparisons. It would be great if we can have minimum comparisons. But to maintain O(n) complexity, the maximum comparisons should be k*n where k is a constant. If we can have an algorithm with k=1, we can be sure that the algorithm has the best time complexity.

Calculation for Minimum Space Complexity: On similar lines we can deduce that since we require only n characters to store the String, Space Complexity should also be O(n). Also, the string used to store the reverse of the original String cannot be more than 26 letters. We will also need a set of index-value pairs for the process of traversal and comparisons. Since using array(index-value pairs) takes additional time every time to evaluate address of ith term each iteration(arr[i]), we will stick with pointers.


Minimum Lines of Code: We can think about it later after developing the algorithm and implementing in C/C++ Program. Though we can keep some rules handy:
  1. Use of sentinels in loops.
  2. Proper use of post and pre-increament operators
  3. Proper use of Event Controlled and Count Controlled Loops.
There's a catch.. which I is discussed in the Post-analysis.

    Developing the Algorithm: Now, since we have finished our preliminary analysis, we can think of the algorithm. Its clear that only the last instances of the characters have to be used, so we can place a pointer at the end of the string and start traversing towards the end. Also we want only 1 comparison for each character. In order to develop the logic we need to take advantage of the fact that the Output string will contain no more than 26 letters. Also, only 26 letters are involved. Hence we can use flags for the characters which have already being encountered. We can set the count for each character. If the character has already occurred we need not process it. Thus we require 26 flags! We can instead use a mapping table or an index table which is nothing but the familiar array. Array’s indices will correspond to the alphabets. For example if arr[0] is 0 it means A has not occured. If it is 1, A has already occurred and hence we continue with the next character.

    Hence we require an array of 26 integers. But it is going to consume 4*8*26 bits of memory(assuming it to be a 4 bit compiler). This is greater than the size of Output String which is not more than 8*26 bits. Hence we need to see if we can further reduce the array size. We see that the we need not track the count of occurrence of the character. We need only two values 0 & 1. Hence a boolean array of 26 values (only 26 bits!) will suffice.



    Writing the Program: With the basic knowledge of pointers the program can be written as simply as:
    #include 
    using namespace std; 
    int main( )
    {
        char *str="JTVAKAVISHAAAL"; //Program can be modified to get Console Input
        char *s1=str;
        bool arr[26] = {0};
        while( *++s1!='\0' ); 
        while( s1--!=(str-1) )
        {
         if( arr[*s1-'A']==0 )
         {
          arr[*s1-'A']=1;
          printf("%c",*s1);//Program can be modified to store the Output string instead of priting at Console.
         }
        }
         getchar( );
    return 0;   
    }


    POST PROGRAM ANALYSIS: 
    At this point the program seems to satisfy the Time/Space Criteria. Doesn’t it? However, there is still a catch. The String can be of arbitrary length, but there are only 26 letters. Hence it may happen that the String can be of 1000 characters however all the 26 letters occur atleast once in the initial 500 or 250 characters! We still might be traversing the whole string. Hence we need to use a “SENTINEL” to avoid further traversal and unnecessary comparisons. To do this we can use a variable ‘count’ to keep track of the number of non-repeating alphabets encountered during the traversal. Also we should can use short int to save memory! 

    But we also need to utilize the facility provided by ‘&&’ operator. Suppose we have A && B. Where A and B are two conditions. A is evaluated first. If the condition is false, there is no need to evaluate B. B is evaluated only when A is TRUE. Here we are assuming that count becomes 26 before the end of the string is reached. Hence, we test for count first and then for end of string. However, if the string is of short length, this wont be much of a use. However, based on the assumptions the modified program will look as follows:
      

    #include 
    using namespace std; 
    int main( )
    {
        char *str="JTVAKAVISHAAAL"; //Program can be modified to get Console Input
        char *s1=str; int count=0;
        bool arr[26] = {0};
        while( *++s1!='\0' ); 
        while( count<26 amp="" s1--="" span="" str-1="">
        {
         if( arr[*s1-'A']==0 )
         {
          arr[*s1-'A']=1; count++;
          printf("%c",*s1);//Program can be modified to store the Output string instead of priting at Console.
         }
        }
         getchar( );
    return 0;   
    }


    Also, you see I have used a code for strlen() function, instead of including a header file like #include. This is because the pre-processor usually copy-pastes the whole lines of code in the header file into the current file where the funtion/program is used. Hence using our own strlen() assures that we have lesser number of lines of code.

    As far as the criteria of minimum lines of codes is concerned, we can assume that this program meets the criteria. Writing the statements will reduce the number of lines, but not “Statements” and will also affect the readability of the code. Also it was given that the input string has only capital letters. Other characters and White Spaces have not been considered. The program can be modified accordingly if required. 
    It is also to be noted that the beauty of this program lies in its simplicity. No use of Stack or any Data Structure is required as suggest by some of my friends. A few also wanted to keep comparing the characters will result in a nested loop making the complexity to be O(n^2). With some practise people can actually develop the algorithm with a similar thought process in minutes. With practise the steps become inherent part of a programmer’s logic.

    Other O(n), O(n*logn) and O(n^2) algorithms are also possible. After I had presented this algorithm, the interviewer asked me to mention some other non-efficient algorithms for the same problem. Try a few on your own. Also, if you come up with a better algorithm in terms of Time/Space Complexity, do let me know.


    samsung-logo

    Later I got offer from-

    1. DRDO - directly by the CEO and MD of Brahmos Missile Program.
    2. Microsoft's Evangelism Division
    3. 6th months intern in Microsoft Research


    But I joined Samsung India Electronics - Software Engineering Lab on 15th June 2011 and the rest is history :)

    2 comments:

    1. sir i m also come up with a solution where we do not need any extra boolean array

      #include
      #include
      int main()
      {
      char a[100];
      scanf("%s",a);
      int l=strlen(a);
      int bit,i,count=0;
      int bitmask=0;
      for( i=l-1;i>=0;i--)
      {
      bit=a[i]-'a';
      bit=1<<bit;
      if(bit&bitmask)
      continue;
      printf("%c",a[i]);
      bitmask|=bit;
      count++;
      if(count==26)
      break;


      }
      }

      ReplyDelete
      Replies
      1. I am glad to see that you have come up with a good and more importantly a solution of your own.

        However:
        1. an integer will take 32 bits (4 bytes) but boolean array of 26 bit will take only 26 bits.

        2. Using strlen() will increase runtime time complexity, compilation time, space complexity and power complexity.

        3. The variable 'Count' needs to count only till 26, so yoi can use char or atleast short int.

        4. You are using an extra variable 'l' to calculate the length of the string which is not required. You just need to go to the end of the string. You are wasting 32 bits of memory.

        Delete