The problem arises because I am trying to serialize an object, store it as a string in an XML file, then read the string back from the XML file and deserialize the object (a Hadoop MapReduce program passes the object to the TaskTracker via JobConf).
Regarding serialization:
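When I stored the serialized data directly in the XML, parsing failed because the data contained illegal characters. After reading the XML rules in detail, I found that the characters < > ' " & have special meaning in markup and cannot all appear literally as PCDATA (< and & are never allowed in character data; the others must be escaped in certain contexts, such as inside attribute values). To use these characters, they must be replaced with the predefined entities: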
Entity reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '
Therefore, the most direct approach is to handle it yourself: convert the special characters into entity references when writing to the XML file, and convert them back when reading. Another option is to store the string as CDATA instead of PCDATA (I have not tested this).
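To make the do-it-yourself approach concrete, here is a minimal sketch of escaping and unescaping the five characters. The class name XmlEscape and its method names are made up for illustration and are not part of any library. Note that a real XML parser normally resolves these entity references itself when reading, so the unescape step is only needed if the file is read as plain text.

```java
public final class XmlEscape {
    private XmlEscape() {}

    // Replace each of the five special characters with its predefined entity.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Reverse of escape(). "&amp;" is handled last so that an escaped
    // "&lt;" (stored as "&amp;lt;") is restored to "&lt;" and not to "<".
    public static String unescape(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&amp;", "&");
    }
}
```

This only covers the five characters listed above. Serialized binary data can also contain control characters that XML does not allow at all, which entity references cannot fix.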
Using entity references is too cumbersome, and I cannot control the XML format my application uses, so I had to find another method: Base64 encoding. This is an encoding commonly used to transmit binary data over the network. Its output consists only of letters, digits, +, / and =, so none of the illegal characters can appear in the encoded text. Most importantly, an encoder and a decoder are available in the JDK: BASE64Encoder and BASE64Decoder, located in the sun.misc package. With them, the byte stream produced by serialization can be converted into a string, which can then be stored in an XML file.
However, these two classes are not part of Sun's public API, and the compiler issues warnings when they are used.
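As a rough sketch of the whole round trip, the example below serializes an object to bytes and encodes them as Base64 text. It deliberately swaps in java.util.Base64 (part of the public API since Java 8) instead of the sun.misc classes, which avoids the warnings; on older JDKs this class is not available. The class name Base64Serializer and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

public final class Base64Serializer {
    private Base64Serializer() {}

    // Serialize the object to bytes, then encode the bytes as Base64 text.
    public static String toBase64(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Decode the Base64 text back to bytes, then deserialize the object.
    public static Object fromBase64(String text)
            throws IOException, ClassNotFoundException {
        byte[] data = Base64.getDecoder().decode(text);
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```

In the Hadoop case, the encoded string could then be passed along with something like conf.set("my.object.key", encoded) on the job side and conf.get("my.object.key") on the task side, where the key name is only a placeholder.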