Design Search Autocomplete System - Leetcode Solution

Companies:

LeetCode: Design Search Autocomplete System Leetcode Solution

Difficulty: Hard

Problem statement:

Design a search autocomplete system for a search engine. Users may input a sentence (at least one word and end with a special character '#'). For each character they type except '#', you need to return the top 3 historical hot sentences that have prefix the same as the part of the sentence already typed. Here are the specific rules:

The hot degree for a sentence is defined as the number of times a user typed the exactly same sentence before.
The returned top 3 hot sentences should be sorted by hot degree (The first is the hottest). If several sentences have the same hot degree, use ASCII-code order (smaller one appears first).
If less than 3 hot sentences exist, then just return as many as you can.
When the input is a special character, it means the sentence ends, and in this case, you need to return an empty list.

Implement the AutocompleteSystem class:

AutocompleteSystem(string[] sentences, int[] times) initializes the object with the sentences and times arrays. The mapping between a sentence and its times is the same as the mapping in a user-based system.
List<string> Input(char c) this indicates that the user typed the character c.

Constraints:

n == sentences.length
n == times.length
1 <= n <= 100
1 <= sentences[i].length <= 100
1 <= times[i] <= 50
c is a lowercase English letter, a hash '#', or space ' '.
Each tested sentence will be a sequence of characters c that end with the character '#'.
Each tested sentence will have a length in the range [1, 200].
The words in each input sentence are separated by single spaces.
At most 5000 calls will be made to input.

Solution:

The problem statement states that we need to return the top 3 historical hot sentences that have the same prefix as the sentence typed so far. Therefore, we can use a trie data structure to store the sentences and their frequency. A trie node will represent a character in the sentence and hold a frequency count of how many times the sentence was typed.

We will also maintain a string buffer that will hold the characters typed so far. As the user types characters, we will traverse the trie starting from the root node and follow the path for the characters typed so far. Once we reach the last node of the path, we will perform a DFS to find the top 3 hot sentences that have the same prefix as the sentence typed so far.

To perform a DFS for the trie, we will maintain a priority queue that will hold the top 3 hot sentences found so far. We will also maintain a stack that will hold the nodes to be visited. We will start with the last node of the path and push it onto the stack. We will also maintain a string that will hold the sentence built so far.

For each node popped from the stack, we will append the character represented by the node to the sentence string and check if the node represents the end of a sentence. If it does, we will check if the sentence has a higher frequency than the top 3 hot sentences found so far. If it does, we will add the sentence to the priority queue and remove the lowest hot sentence from the queue. We will continue the DFS for all the child nodes of the current node.

At the end of the DFS, we will return the top 3 hot sentences found so far.

Pseudo code:

class AutocompleteSystem { private TrieNode root; private StringBuilder sb;

public AutocompleteSystem(String[] sentences, int[] times) {
    root = new TrieNode();
    sb = new StringBuilder();
    for (int i = 0; i < sentences.length; i++) {
        insert(sentences[i], times[i]);
    }
}

private void insert(String sentence, int time) {
    TrieNode node = root;
    for (char c : sentence.toCharArray()) {
        int index = c == ' ' ? 26 : c - 'a';
        if (node.children[index] == null) {
            node.children[index] = new TrieNode();
        }
        node = node.children[index];
    }
    node.frequency += time;
}

public List<String> input(char c) {
    if (c == '#') {
        String sentence = sb.toString();
        insert(sentence, 1);
        sb.setLength(0);
        return new ArrayList<>();
    } else {
        sb.append(c);
        TrieNode node = root;
        for (char ch : sb.toString().toCharArray()) {
            int index = ch == ' ' ? 26 : ch - 'a';
            if (node.children[index] == null) return new ArrayList<>();
            node = node.children[index];
        }
        List<HotSentence> hotSentences = new ArrayList<>();
        dfs(node, sb.toString(), hotSentences);
        Collections.sort(hotSentences, (h1, h2) -> {
            if (h1.frequency != h2.frequency) {
                return h2.frequency - h1.frequency;
            } else {
                return h1.sentence.compareTo(h2.sentence);
            }
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(3, hotSentences.size()); i++) {
            result.add(hotSentences.get(i).sentence);
        }
        return result;
    }
}

private void dfs(TrieNode node, String sentence, List<HotSentence> hotSentences) {
    if (node.frequency > 0) {
        hotSentences.add(new HotSentence(sentence, node.frequency));
        if (hotSentences.size() > 3) {
            Collections.sort(hotSentences, (h1, h2) -> {
                if (h1.frequency != h2.frequency) {
                    return h2.frequency - h1.frequency;
                } else {
                    return h1.sentence.compareTo(h2.sentence);
                }
            });
            hotSentences.remove(3);
        }
    }
    for (int i = 0; i < 27; i++) {
        if (node.children[i] != null) {
            char ch = (char) (i == 26 ? ' ' : 'a' + i);
            dfs(node.children[i], sentence + ch, hotSentences);
        }
    }
}

}

class TrieNode { TrieNode[] children; int frequency;

public TrieNode() {
    children = new TrieNode[27];
}

}

class HotSentence { String sentence; int frequency;

public HotSentence(String sentence, int frequency) {
    this.sentence = sentence;
    this.frequency = frequency;
}

}

Time Complexity:

The time complexity of the constructor is O(n^2) where n is the number of sentences. This is because we need to insert each sentence in the trie and the insertion operation takes O(n) time in the worst case where n is the length of the sentence.
The time complexity of the input method is O(klogk) where k is the number of hot sentences found so far. The DFS operation takes O(27^m) time in the worst case where m is the length of the sentence typed so far. However, since we are only returning the top 3 hot sentences, we only need to visit a few nodes. Therefore, the actual time complexity is much less than O(27^m). Sorting the hot sentences takes O(klogk) time. Therefore, the total time complexity of the input method is O(klogk) where k is the number of hot sentences found so far.

Space Complexity:

The space complexity of the solution is O(nm) where n is the number of sentences and m is the length of the longest sentence. This is because we need to store each sentence in the trie and the storage required for each sentence is proportional to its length. We also need to store the hot sentences found so far which takes O(km) space where k is the number of hot sentences found so far.

Similar Problems

Design Search Autocomplete System - Leetcode Solution

Companies:

Design Search Autocomplete System Solution Code