Using regular expressions in ASP to strip Word formatting from content pasted into a back-end editor and convert it to plain text

When our company's customers add or modify content through the website's back-end editor, they often copy it straight from a Word document into the editor and submit it. As a result, the content display page ends up with a mix of inconsistent styles. Sometimes plain text is also needed for excerpts. Both cases require stripping the Word formatting. Asking customers to paste into Notepad first and then copy from there into the editor rarely changes their habits, so we fix it on our side instead. I found some regular expressions for clearing Word formatting by searching Baidu, but the results were not satisfactory, so I wrote my own ASP function, which meets our needs. The function is as follows:
function cleanWord(html)
    dim regEx
    set regEx = New RegExp
    regEx.IgnoreCase = True
    regEx.Global = True
    regEx.Pattern = "<[^>]*>" 'Remove every tag, i.e. everything between < and >
    html = regEx.Replace(html, "")
    regEx.Pattern = "\{[^}]*\}" 'Remove everything between { and } (leftover CSS rule bodies)
    html = regEx.Replace(html, "")
    'Remove /* ... */ comments (note: this also matches any other text between two slashes)
    regEx.Pattern = "/[^/]*/"
    html = regEx.Replace(html, "")
    html = Replace(html, "table.MsoNormalTable", "") 'Remove the leftover selector text that the patterns miss
    cleanWord = html
    set regEx = nothing
end function
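
To try the function outside of an ASP page, here is a minimal sketch that can be saved as a .vbs file and run with cscript on Windows. It repeats the same four steps as cleanWord, with the string literals quoted and the braces escaped. The sample string is invented for illustration and is not real Word output. ByVal is an addition of this sketch, so that the caller's variable is left unchanged, and Trim is applied only to tidy the spaces left behind.

```vbscript
Option Explicit

' Same steps as cleanWord in the article. ByVal is an assumption added
' here so the caller's string is not modified in place.
Function cleanWord(ByVal html)
    Dim regEx
    Set regEx = New RegExp
    regEx.IgnoreCase = True
    regEx.Global = True

    regEx.Pattern = "<[^>]*>"      ' strip every tag
    html = regEx.Replace(html, "")

    regEx.Pattern = "\{[^}]*\}"    ' strip leftover CSS rule bodies
    html = regEx.Replace(html, "")

    regEx.Pattern = "/[^/]*/"      ' strip /* ... */ comments
    html = regEx.Replace(html, "")

    ' strip the selector text that the patterns above leave behind
    html = Replace(html, "table.MsoNormalTable", "")

    cleanWord = html
    Set regEx = Nothing
End Function

' Made-up sample that imitates a fragment pasted from Word.
Dim sample
sample = "<style>/* Style Definitions */ table.MsoNormalTable " & _
         "{mso-style-name:""Table Normal"";}</style>" & _
         "<p class=""MsoNormal""><span style=""font-size:12pt"">Hello</span> world</p>"

' Trim removes the spaces that remain where the style block used to be.
WScript.Echo Trim(cleanWord(sample))   ' prints: Hello world
```

In an ASP page the same call would typically wrap the submitted form field before it is saved or used as an excerpt. The field name depends on your own form.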