preload
Jun 14
Programming

Programming

I recently heard about a Car Talk puzzler.  I don’t listen to Car Talk as much as I used to.  Anyway, you can read about the puzzler and the fellow who solved it here.  This is an excerpt.

During Christmas week on the popular National Public Radio show Car Talk, the weekly puzzler required listeners to find the longest English word that remains a valid English word as you remove its letters one at a time, but without rearranging any of the letters. For example: sprite, spit, pit, it, I. There are many such words, but, as Barr discovered, only one with 11 letters.

So, only one word with eleven letters: complecting.  Maybe.  I’m not convinced, but finding out is easy.

Recursion

Okay, so how would you know?  Well, suppose you had a list of valid words.  The statement of the problem suggests a simple recursive algorithm to find a solution.

  • Put the words in a heap, ordered longest to shortest.
  • Iterate until the heap is empty.
    • Take the next word $w$ from the heap, and perform $\mbox{search}(w)$

The procedure $\mbox{search}(w)$ works as follows.  It returns true if it finds a solution.

  • If $w$ is the empty string, then return true.
  • For each letter $l$ of $w$, do the following.
    • Drop $l$ from $w$, generating the candidate word $v$.
    • If $v$ is in the dictionary, then execute $r=\mbox{search}(v)$.
      • If $r$ is true, then write out $w$ and return true.
  • The word $w$ is a dead end; drop it from the dictionary and return false.

Dropping the words from the dictionary, or otherwise marking them as dead ends in some other way is necessary to avoid re-searching them later.  The time complexity is roughly $O(n)$, where $n$ is the size of the dictionary to search.  If the average length of a word is $k$, then we can be more specific, and say the complexity is $O(kn)$, but $k$ is very small with respect to $n$, and does not vary much with the data size, so it can be safely ignored.

Iteration

The problem was explained to me in the reverse fashion.  Given a letter, generate words by adding a single letter at a time.  Every word must be in the dictionary.  Find the longest word you can generate from a single letter.  This suggests a different algorithm.

  • Put the words in a heap, ordered shortest to longest.
  • Mark the empty string as reachable and set $long$ to the empty string.
  • Iterate through the heap, considering each word $w$.
    • For each position $i$ of $w$, do the following.
      • Drop the $i$th letter from $w$, generating the candidate word $v$.
        • If $v$ is marked as reachable, then mark $w$ as reachable by omitting $i$, and set $long=w$.
  • While $long$ is not the empty string, do the following.
    • Write $long$.
    • Omit the character $i$ by which $long$ was reached, and set $long$ equal to the new word.

We can improve this by checking against the length of the longest word reached.  If we ever have $|w|>|longest|+1$, then we can stop the loop immediately, since we cannot reach any further words.  We can’t do that with the prior algorithm.

So… What’s the Longest Word?

The solution given was complecting, at 11 letters.  When I download the aspell dictionaries, and run the algorithm (the iterative one), I find the following derivation.

-> a
-> ah
-> ach
-> bach
-> banch
-> banchi
-> branchi
-> branchia
-> branchiae
-> branchiate
-> abranchiate
-> abranchiates

Ooh.  12 letters.  But, it’s a plural.  Does that count?  Well, complecting is a participle, so it seems it should.  Time to send a note to Tom and Ray?

So… Which Algorithm?

Well, I coded up the iterative one in Java, since originally I was asked how I would code it in Java.  Again, the question suggesting the solution.  I suspect there are a lot of long words that are not reachable, so it seemed better to use the trick to terminate the search when the current word is longer than the longest found (so far) plus one.  Plus, it isn’t recursive, and I don’t have to modify the data structure by deleting an item from the heap.

The Java

So… here’s the Java source.  Enjoy!

/**
 * Given some iterable object that provides strings, find the longest
 * string in the collection that can be reached from the empty string
 * by adding one character at a time such that every intermediate step
 * is also in the collection.
 *
 
 * For example, if the collection is {@code {@literal {}a,
 * at, cat, chat, chart, cart, car{@literal }}},
 * then the longst word is {@code chart}, and it's derivation is:
 * {@code a => at => cat => cart => chart}.  If there are multiple
 * longest words, then only one is detected.
 *
 
 * Note that if single letters are not in the collection, then no words
 * can be derived, and only the empty derivation is returned.
 *
 * @param strings    An iterable providing the strings.
 * @return    The derivation of the longest, from the shortest word to the
 *             longest.  If there is no longest word, then the empty
 *             derivation is returned.
 */
public List search(Iterable strings) {
    if (strings == null) {
        throw new NullPointerException("The string iterator is null.");
    }
 
    // Read each string from the iterable.  The strings assumed to be
    // single words, and the words are not assumed to occur in any
    // particular order.  We put the words we read into a heap, ordered
    // by length.  Our "heap" is a sorted map.
    //
    // Every entry in the map has an associated integer value, which
    // tells us the position of the last inserted character.  This can
    // be used to "unwind" the derivation of the word.  If there is not
    // (yet) any way to reach the word, then the word is mapped to -1.
    TreeMap heap =
        new TreeMap(new Comparator() {
            public int compare(String first, String second) {
                // Okay, we sort first by length, and then alphabetically.
                // This is necessary because of the way TreeMap tests
                // equality; it uses the comparator.  Thus we must only
                // return zero if the two are actually equal.  So, we
                // first order by length, and then for items of the same
                // length, we order using the natural comparator for
                // strings.
                int order = first.length() - second.length();
                if (order == 0) {
                    order = first.compareToIgnoreCase(second);
                }
                return order;
            }
        });
    for (String str : strings) {
        heap.put(str,-1);
    } // Add all strings to the heap.
 
    // Prime the pump by adding the empty string as "reachable."
    String longest = "";
    int length = 0;
    heap.put("", 0);
 
    // At this point the heap is constructed.  We now iterate over the
    // heap looking for connected words.  If we find one, we mark it.
    for (String str : heap.keySet()) {
        // Watch for a case where we skip a length.  If we do, we're done.
        if (str.length() > length+1) {
            break;
        }
        length = str.length();
 
        // Now we have a string, we need to find out if the string is
        // reachable from a previously-reached string.  We drop each
        // character, and then check if the resulting string is
        // present in the heap and is marked.
        for (int index = 0; index < str.length(); index++) {
            // Omit the character at position index.
            String test = str.substring(0,index) + str.substring(index+1);
            // See if the resulting test string is in the heap.
            Integer pos = heap.get(test);
 
            // Now there are three cases:
            // (1) The test string is not in the heap.
            // (2) The test string is in the heap, but not reachable.
            if (pos == null || pos < 0) {
                continue;
            }
            // (3) The test string is in the heap, and is reachable.
            // In this last case we know we can reach the current string
            // from the test string by adding the character at position
            // index, so put this in the heap.
            heap.put(str, index);
            // Note that whenever this happens, we might have just found
            // (one of) the longest words, so save it.
            longest = str;
            // We don't need to test any other positions.
            break;
        } // Construct new words by omitting each character.
    } // Search the heap for words.
 
    // Now we have (hopefully) found a long word.  We need to generate
    // the derivation and return it.
    List derivation = new LinkedList();
    while (longest.length() > 0) {
        // Get the insertion position from the heap.
        int pos = heap.get(longest);
        // Add the word to the derivation.
        derivation.add(0, longest);
        // Generate the prior string in the derivation.
        longest = longest.substring(0, pos) + longest.substring(pos+1);
    } // Build the derivation.
 
    // Done!  Return the derivation.
    return derivation;
}

The code isn’t optimal, but it is commented. I assume you are all capable of putting this in a class and feeding it a dictionary. Have fun!

  • Share/Bookmark

One Response to “Car Talk and the Longest Word”

  1. longest english word….Pneumonoultramicroscopicsilicovolcanoconiosis is the longest word!…. Pneumonoultramicroscopicsilicovolcanocon Definition! – LALATE…. | Latest Information Says:

    [...] Okay, so how would you know?  Well, suppose you had a list of valid words.  The statement of the problemRead more at http://stacyprowell.com/blog/2009/06/14/car-talk-and-the-longest-word/ [...]

Leave a Reply