Solution For Utf 8 Validation
The problem of Utf 8 validation on LeetCode is to determine whether an array of integers representing a sequence of bytes is a valid UTF-8 encoding.
UTF-8 is a variable-length character encoding used for electronic communication. It uses one to four bytes to represent each character in the Unicode standard. The first byte of each character determines the number of bytes needed to represent it. The following bytes are marked by a prefix of “10”.
In this problem, we are given an array of integers representing a sequence of bytes. We need to determine if the sequence is a valid UTF-8 encoding.
To solve this problem, we can use a loop to process each byte in the array. For each byte, we check the number of leading ones in the binary representation. If there are no leading ones, then it is a one-byte character and we move to the next byte. If there are two leading ones, then it is a two-byte character. We need to check that the following byte has a “10” prefix. If it does not, then the sequence is invalid. We continue this process for three and four byte characters.
We also need to check that the sequence ends with a valid character. If it ends with a prefix “10” or “110”, then it is invalid. If it ends with a three-byte character we need to ensure that there are two following bytes with the “10” prefix. If it ends with a four-byte character we need to ensure that there are three following bytes with the “10” prefix.
In code, the solution to this problem would look like this:
“`
public boolean validUtf8(int[] data) {
int i = 0;
while (i < data.length) {
int numOnes = countOnes(data[i]);
if (numOnes == 0) {
i++;
} else if (numOnes == 2) {
if (i+1>=data.length || countOnes(data[i+1])!=1)
return false;
i += 2;
} else if (numOnes == 3) {
if (i+2>=data.length || countOnes(data[i+1])!=1 || countOnes(data[i+2])!=1)
return false;
i += 3;
} else if (numOnes == 4) {
if (i+3>=data.length || countOnes(data[i+1])!=1 || countOnes(data[i+2])!=1 || countOnes(data[i+3])!=1)
return false;
i += 4;
} else {
return false;
}
}
return true;
}
private int countOnes(int num) {
if ((num & 128) == 0) {
return 0;
} else if ((num & 64) == 0) {
return 1;
} else if ((num & 32) == 0) {
return 2;
} else if ((num & 16) == 0) {
return 3;
} else {
return 4;
}
}
“`
The countOnes method is used to determine the number of leading ones in the binary representation of the byte. The validUtf8 method processes each byte using this helper method and checks for the “10” prefix for multi-byte characters. If the sequence is valid, the method returns true. Otherwise, it returns false.
In the worst case, the solution has a time complexity of O(n) where n is the length of the input array. This is because we need to examine every byte in the array at least once. The space complexity is O(1) because we only need to store a few variables to keep track of the data processing.
Step by Step Implementation For Utf 8 Validation
public class Solution { public boolean validUtf8(int[] data) { // Check if the data is valid UTF-8 if (data == null || data.length == 0) { return false; } // Number of bytes in the current UTF-8 character int numBytes = 0; // For each integer in the data array, for (int i = 0; i < data.length; i++) { // Get the binary representation of the integer String binRep = Integer.toBinaryString(data[i]); // If this is the start of a new UTF-8 character, if (numBytes == 0) { // Check that the first bit is 0 if (binRep.charAt(0) == '0') { continue; } // Determine how many bytes the UTF-8 character consists of int j = 1; while (j < binRep.length() && binRep.charAt(j) == '1') { j++; } numBytes = j; // Check that the UTF-8 character is valid if (numBytes == 1 || numBytes > 4) { return false; } } else { // If this is not the start of a new UTF-8 character, // Check that the first bit is 1 if (binRep.charAt(0) != '1') { return false; } } // Decrement the number of bytes left in the current UTF-8 character numBytes--; } // Check that we have processed all bytes in the data array return numBytes == 0; } }
def validUtf8(data): # Number of bytes in the current UTF-8 character n_bytes = 0 # Mask to check if the most significant bit (8th bit from the left) is set or not mask1 = 1 << 7 # Mask to check if the second most significant bit is set or not mask2 = 1 << 6 for num in data: # Get the number of set most significant bits in the byte if # this is the starting byte of an UTF-8 character. mask = 1 << 7 if n_bytes == 0: while mask & num: n_bytes += 1 mask >>= 1 # 1 byte characters if n_bytes == 0: continue # Invalid scenarios according to the rules of the problem. if n_bytes == 1 or n_bytes > 4: return False else: # If this byte is a part of an existing UTF-8 character, then we # simply have to look at the second most significant bit and we # make use of the masks we defined before to do so. If it is # valid, then simply reduce the number of bytes to process by 1. if not (num & mask1 and not (num & mask2)): return False n_bytes -= 1 # This is for the case where we might not have the complete data for # a particular UTF-8 character. return n_bytes == 0
/** * @param {number[]} data * @return {boolean} */ var validUtf8 = function(data) { // your code here };
class Solution { public: bool validUtf8(vector& data) { int count = 0; for (int i = 0; i < data.size(); i++) { if (count == 0) { if ((data[i] >> 5) == 0b110) count = 1; else if ((data[i] >> 4) == 0b1110) count = 2; else if ((data[i] >> 3) == 0b11110) count = 3; else if ((data[i] >> 7)) return false; } else { if ((data[i] >> 6) != 0b10) return false; count--; } } return count == 0; } };
using System; using System.Collections.Generic; public class GFG { // function to validate UTF8 encoding public static bool validateUTF8(int data) { int no_of_bytes = 0; int mask1 = 1 << 7; int mask2 = 1 << 6; // Number of bits to be checked while ((mask1 & data) != 0) { no_of_bytes++; mask1 >>= 1; } // Check that data did not have the form: // 110xxxxx 10xxxxxx if (no_of_bytes == 1 || no_of_bytes > 4) return false; // Handle cases where no_of_bytes > 1 for (int i = no_of_bytes - 1; i > 0; i--) { if ((mask1 & data) != 0 || (mask2 & data) == 0) return false; data >>= 8; } return true; } // Driver code public static void Main() { int data = 0b11100010; bool res = validateUTF8(data); if(res) Console.WriteLine("Yes"); else Console.WriteLine("No"); } }