Utf 8 Validation - Leetcode Solution

Companies:

Unknown

LeetCode: Utf 8 Validation Leetcode Solution

Difficulty: Medium

The problem of Utf 8 validation on LeetCode is to determine whether an array of integers representing a sequence of bytes is a valid UTF-8 encoding.

UTF-8 is a variable-length character encoding used for electronic communication. It uses one to four bytes to represent each character in the Unicode standard. The first byte of each character determines the number of bytes needed to represent it. The following bytes are marked by a prefix of "10".

In this problem, we are given an array of integers representing a sequence of bytes. We need to determine if the sequence is a valid UTF-8 encoding.

To solve this problem, we can use a loop to process each byte in the array. For each byte, we check the number of leading ones in the binary representation. If there are no leading ones, then it is a one-byte character and we move to the next byte. If there are two leading ones, then it is a two-byte character. We need to check that the following byte has a "10" prefix. If it does not, then the sequence is invalid. We continue this process for three and four byte characters.

We also need to check that the sequence ends with a valid character. If it ends with a prefix "10" or "110", then it is invalid. If it ends with a three-byte character we need to ensure that there are two following bytes with the "10" prefix. If it ends with a four-byte character we need to ensure that there are three following bytes with the "10" prefix.

In code, the solution to this problem would look like this:

public boolean validUtf8(int[] data) {
    int i = 0;
    while (i < data.length) {
        int numOnes = countOnes(data[i]);
        if (numOnes == 0) {
            i++;
        } else if (numOnes == 2) {
            if (i+1>=data.length || countOnes(data[i+1])!=1) 
                return false;
            i += 2;
        } else if (numOnes == 3) {
            if (i+2>=data.length || countOnes(data[i+1])!=1 || countOnes(data[i+2])!=1) 
                return false;
            i += 3;
        } else if (numOnes == 4) {
            if (i+3>=data.length || countOnes(data[i+1])!=1 || countOnes(data[i+2])!=1 || countOnes(data[i+3])!=1) 
                return false;
            i += 4;
        } else {
            return false;
        }
    }
    return true;
}

private int countOnes(int num) {
    if ((num & 128) == 0) {
        return 0;
    } else if ((num & 64) == 0) {
        return 1;
    } else if ((num & 32) == 0) {
        return 2;
    } else if ((num & 16) == 0) {
        return 3;
    } else {
        return 4;
    }
}

The countOnes method is used to determine the number of leading ones in the binary representation of the byte. The validUtf8 method processes each byte using this helper method and checks for the "10" prefix for multi-byte characters. If the sequence is valid, the method returns true. Otherwise, it returns false.

In the worst case, the solution has a time complexity of O(n) where n is the length of the input array. This is because we need to examine every byte in the array at least once. The space complexity is O(1) because we only need to store a few variables to keep track of the data processing.

Similar Problems

Utf 8 Validation - Leetcode Solution

Companies:

Utf 8 Validation Solution Code