This program recovers the original text from garbled text (mojibake). It relies on the observation that decoding bytes with the wrong encoding usually pads the text with extra characters, so among all candidate decodings the correct text should be the shortest.
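The length principle can be illustrated with a minimal sketch. The class and method names (`MojibakeLengthDemo`, `garble`, `recover`) are hypothetical and not part of the original program. ISO-8859-1 is used as the wrong encoding only because it maps every byte to exactly one character, so the round trip is lossless.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeLengthDemo {
    // Simulate the mistake: UTF-8 bytes are read back as ISO-8859-1.
    public static String garble(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Undo the mistake: recover the raw bytes, then decode them as UTF-8.
    public static String recover(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Chinese characters (U+4E2D, U+6587), written as escapes so the
        // source file compiles regardless of its own encoding.
        String original = "\u4e2d\u6587";
        String garbled = garble(original);
        String recovered = recover(garbled);

        System.out.println(original.length());           // 2
        System.out.println(garbled.length());            // 6: each character took three UTF-8 bytes
        System.out.println(recovered.equals(original));  // true
    }
}
```

The garbled form is three times as long as the original here, which is why picking the shortest candidate tends to select the correct decoding.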
The code is as follows:
package com.hongyuan.test;
import java.io.UnsupportedEncodingException;
/*
* This program recovers the original text from garbled text. It relies on the observation that decoding with the wrong encoding usually pads the text with extra characters,
* so the correct text should be the shortest candidate.
*
* If the program fails to recover the correct text in your tests, the likely reasons are:
* 1. The program can only undo a single round of wrong decoding; it cannot recover text that has been garbled several times over.
* 2. A wrong encoding sometimes turns characters into invisible ones, so the garbled text you copied may be incomplete. With bytes missing, the text cannot be restored.
* 3. The original text uses a large character set and the wrong encoding uses a smaller one. Characters outside the smaller set are lost, so the correct text cannot be reconstructed.
* 4. Congratulations, you hit the jackpot: some characters are encoded identically in every encoding, or the wrong encoding adds no extra characters, so there is nothing the program can do. (This is rare.)
*
* Note: the garbled text in this program was obtained by switching the Baidu homepage (UTF-8) to GBK, which visibly garbles it. Feel free to test with other garbled text, and reply if you have any questions.
*/
public class CharSetTest {
    public static final String[] CHARSET_NAMES=new String[]{"ISO8859-1","GBK","UTF-8"};

    public static void main(String[] args) throws UnsupportedEncodingException {
        //Garbled string
        String str="中文版文版文版文版";
        int strLength=Integer.MAX_VALUE;//Shortest length found so far
        String newStr="";//String recovered from the garbled string
        String srcCharSet="";//Encoding the garbled string was wrongly decoded with
        String targetCharSet="";//Correct encoding of the original text
        //Iterate over all encoding pairs and keep the one that yields the shortest string
        for(int i=0;i<CHARSET_NAMES.length;i++){
            for(int j=0;j<CHARSET_NAMES.length;j++){
                //Re-encode with the suspected wrong encoding, then decode with the candidate correct one
                String temp=new String(str.getBytes(CHARSET_NAMES[i]),CHARSET_NAMES[j]);
                //System.out.println(temp);
                if(temp.length()<=strLength){
                    strLength=temp.length();
                    newStr=temp;
                    srcCharSet=CHARSET_NAMES[i];
                    targetCharSet=CHARSET_NAMES[j];
                }
            }
        }
        //Print the detected encoding pair and the recovered text
        System.out.println(srcCharSet+"-->"+targetCharSet+":"+newStr);
    }
}
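Failure reasons 1 and 3 from the comment above can be reproduced with a small sketch. Everything in it (the class `LimitationsDemo` and the helpers `garbleOnce`, `undoOnce`, `lossyRoundTrip`) is a hypothetical illustration rather than part of the original program, and the UTF-8 / ISO-8859-1 pair is an assumption chosen because its behavior is easy to verify.

```java
import java.nio.charset.StandardCharsets;

public class LimitationsDemo {
    // One round of garbling: UTF-8 bytes read back as ISO-8859-1.
    public static String garbleOnce(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // One round of repair: the inverse of garbleOnce.
    public static String undoOnce(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // Encode with a charset too small for the text, then decode again.
    // String.getBytes replaces every unmappable character with '?'.
    public static String lossyRoundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters, U+4E2D and U+6587

        // Reason 1: text garbled twice is not restored by a single repair pass.
        String twice = garbleOnce(garbleOnce(original));
        System.out.println(undoOnce(twice).equals(original));           // false
        System.out.println(undoOnce(undoOnce(twice)).equals(original)); // true

        // Reason 3: a smaller character set discards the information for good.
        System.out.println(lossyRoundTrip(original));                   // ??
    }
}
```

In the second case no combination of encodings can bring the text back, because the bytes that distinguished the characters no longer exist.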