Occasionally the data contains escaped characters such as &quot; (which a browser displays as "). These escape sequences take one of two forms:
Starting with &#, followed by a string of digits, and ending with ;
Starting with &, followed by a string of letters, and ending with ;
For example, the double quote can be written as &quot; or, equivalently, as &#34;.
When a browser encounters these escape sequences, it converts them back to the original characters. But how do we recognize them in code? The implementation of org.apache.commons.lang.StringEscapeUtils.unescapeHtml answers this well.
In the first case above, the middle part is a number. That number is a Unicode code point, so it is converted directly to a char.
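The numeric case can be sketched roughly as follows. This is a minimal illustration, not the Commons Lang source: the class name NumericEntitySketch and the method decodeNumeric are hypothetical, and the hexadecimal form (&#x...;) is an addition that the text above does not mention.

```java
// Hypothetical helper for illustration only; not the Commons Lang implementation.
class NumericEntitySketch {

    /**
     * Decodes a single numeric entity such as "&#34;" (decimal) or "&#x22;" (hex).
     * Returns the input unchanged if it is not a well-formed numeric entity.
     */
    static String decodeNumeric(String entity) {
        if (!entity.startsWith("&#") || !entity.endsWith(";") || entity.length() < 4) {
            return entity;
        }
        // Everything between "&#" and ";" is the number.
        String body = entity.substring(2, entity.length() - 1);
        try {
            int codePoint;
            if (body.startsWith("x") || body.startsWith("X")) {
                codePoint = Integer.parseInt(body.substring(1), 16); // hex form (assumption)
            } else {
                codePoint = Integer.parseInt(body, 10);              // decimal form
            }
            if (!Character.isValidCodePoint(codePoint)) {
                return entity;
            }
            // The number is the Unicode code point of the character.
            return new String(Character.toChars(codePoint));
        } catch (NumberFormatException e) {
            return entity; // not a number after all: leave the text untouched
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeNumeric("&#34;"));  // prints "
        System.out.println(decodeNumeric("&#x41;")); // prints A
    }
}
```

Returning the input unchanged on malformed text is a design choice in this sketch, so that ordinary text that merely looks like an entity is not damaged.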
In the second case, the middle part is a name, so the only option is a mapping table: look up the code point that corresponds to the name, then convert it to a char. The code makes this clear at a glance.
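The table lookup described above might look something like this. It is a simplified sketch under stated assumptions: the class NamedEntitySketch, its TABLE, and the methods lookup and unescape are hypothetical names, and the table holds only five entries, whereas a real HTML 4.0 table is far larger.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of entity unescaping; not the Commons Lang implementation.
class NamedEntitySketch {

    // Tiny illustrative mapping table: entity name -> Unicode code point.
    private static final Map<String, Integer> TABLE = new HashMap<>();
    static {
        TABLE.put("quot", 34);
        TABLE.put("amp", 38);
        TABLE.put("lt", 60);
        TABLE.put("gt", 62);
        TABLE.put("nbsp", 160);
    }

    /** Resolves the text between '&' and ';' to a code point, or null if unknown. */
    static Integer lookup(String name) {
        if (name.startsWith("#")) {
            // First case: the number itself is the code point.
            try {
                int codePoint = Integer.parseInt(name.substring(1));
                return Character.isValidCodePoint(codePoint) ? codePoint : null;
            } catch (NumberFormatException e) {
                return null;
            }
        }
        // Second case: the name has to be looked up in the mapping table.
        return TABLE.get(name);
    }

    /** Replaces every recognized entity in the text; everything else is copied as-is. */
    static String unescape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            int semi = (c == '&') ? text.indexOf(';', i + 1) : -1;
            Integer codePoint = (semi == -1) ? null : lookup(text.substring(i + 1, semi));
            if (codePoint == null) {
                out.append(c); // not an entity we know: keep the character literally
                i++;
            } else {
                out.appendCodePoint(codePoint);
                i = semi + 1;  // skip past the whole entity
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("&lt;b&gt; &quot;x&quot; &amp; &#65; &bogus;"));
    }
}
```

Unknown names such as &bogus; are left in the output untouched, which keeps the sketch from silently dropping text it does not understand.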
See how HTML40 is defined