How to Solve a Problem 3: Strings - Making Anagrams

 Let us take on another problem. This time, let us do HackerRank's "Making Anagrams" problem. 

This is the important part...

Given two strings,  and , that may or may not be of the same length, determine the minimum number of character deletions required to make  and  anagrams. Any characters can be deleted from either of the strings.

For example, if  and , we can delete  from string  and  from string  so that both remaining strings are  and  which are anagrams.

Also note the output:

Print a single integer denoting the number of characters you must delete to make the two strings anagrams of each other.

There are a couple different ways to do this, but let's think about the problem given. Often, the data itself is a clue. 

1) We are only solving for anagrams, so the position order does not matter.  

2) There are only lower case characters, so at most, we're looking at 26 possibilities, but some of them may be duplicates. 

3) They are only looking for a number, so we don't need to derive the letters, so the value can be calculated mathematically after some computations. 

We can try a couple different approaches, and see which one pans out. In the future, you have seen something like this and can skip the dead-end approaches. 

Keep in mind, the first things you try are often wrong. 

Idea 1: Looking at the data, abc vs cde... can we just test the substring of one into the other? 

(Maybe) Sort the strings, then search for one in the other. If that doesn't work, take successively smaller substrings and try them one by one. 

Initially, this seems to make sense, but this is their overly simplistic data fooling you. Think about it... Anagram means the characters are NOT in order, so sorting it makes no sense. And trying to find one substring in the other doesn't actually help as you can't guarantee there aren't extra characters between them. 

In other words, this would work for abc and cde, but not for showman and woman. 

Woman sorted would be amnow, and showman sorted would be ahmnosw. See how this would never work right? The answer is obviously 2, but you can't get to 2 with this approach.  

We need another approach. 

Idea 2: How about... we just check each character in one string against each character in the other string, and look for the common ones? 

Excellent idea, given there are only 26 characters. Bucket sort each string, then loop through to count the number of commonalities. 

Technically, we are using counting sort. We are using array size of 26 as we just need to worry about lowercase letters. Load each array with the string, then loop through both arrays together and look for positive values in both, and somehow, figure out what to add to the "common" count, and from there, derive the number of letters to be dropped. 

Let's try this idea on the two cases we have so far. 

[1,1,1,0,0,0...] // abc

[0,0,1,1,1,0...] // cde

So you can see they only lap on "c", length of 1. So length of abc (3) plus length of cde (3) minus 2*common (1)=4 which is correct. 

If you test this on showman, and woman, you'll find that they overlap on 5 different letters, so common = 5. Letters dropped are 7+5-(10)=2 which is correct. 

In other words, if we can figure out how many letters are common (including duplicates!) we can figure out how many letters were dropped. 

So how do we figure out the letters in common? We did say earlier...

...look for positive values in both, and somehow, figure out what to add to the "common" count

Let's see, what are the possibilities when both numbers are positive? 

a>b

a=b

a<b

Each representing number of that letter in the string. 

So we'd want the SMALLER of the value, as that's the actual "overlap", if you will. 

So if a>b, then we'd want b, else we'd want a. And if they are equal, then it doesn't matter which one we take! 

So the program logic would look something like this:

if (tArr[i3]>0 && bArr[i3]>0) {
    if (tArr[i3]==bArr[i3]) {
        tcommon+=tArr[i3];
    } else {
        if ((tArr[i3]-bArr[i3])>0) {
            tcommon+=bArr[i3]
        } else {
            tcommon+=tArr[i3]
        }
    }
 }

This is without the loop. But the loop's easy, right? 

AND you need to calculate the number of characters dropped. 



Comments

Popular posts from this blog

Yet another take on recursion

Problem Solving for Programmers: a Neglected Topic?

How to Solve a Problem, with Examples