Strip everything except the text from a page's HTML. A PHP version was posted earlier; this is the same function in ASP.
ASP function code:
<%
' Removes <script> and <style> blocks and all remaining tags, returning plain text.
Function RemoveHTML(str)
    Dim objRegExp, strHTML
    If IsNull(str) Then
        str = ""
    End If
    strHTML = str
    ' Remove line breaks so that .*? can match blocks spanning several lines
    ' (in VBScript's RegExp, "." does not match a newline).
    strHTML = Replace(Replace(strHTML, vbCr, ""), vbLf, "")
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    objRegExp.Pattern = "<script[^>]*?>.*?</script>"
    strHTML = objRegExp.Replace(strHTML, "")
    objRegExp.Pattern = "<style[^>]*?>.*?</style>"
    strHTML = objRegExp.Replace(strHTML, "")
    objRegExp.Pattern = "<.+?>"
    strHTML = objRegExp.Replace(strHTML, "")
    Set objRegExp = Nothing
    RemoveHTML = strHTML
End Function
%>
PHP code:
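<?php
$search = array(
    '@<script[^>]*?>.*?</script>@si', // Remove JavaScript blocks
    '@<style[^>]*?>.*?</style>@si',   // Remove CSS blocks
    '@<!--.*?-->@s',                  // Remove HTML comments (before tags, so tags inside comments go too)
    '@<[/!]*?[^<>]*?>@si',            // Remove HTML tags
    '@([\r\n])[\s]+@',                // Collapse whitespace following a line break
    '@&(quot|#34);@i',                // Replace HTML entities
    '@&(amp|#38);@i',
    '@&(lt|#60);@i',
    '@&(gt|#62);@i',
    '@&(nbsp|#160);@i',
    '@&(iexcl|#161);@i',
    '@&(cent|#162);@i',
    '@&(pound|#163);@i',
    '@&(copy|#169);@i',
);
$replace = array(
    '',
    '',
    '',
    '',
    '\1',
    '"',
    '&',
    '<',
    '>',
    ' ',
    "\u{00A1}",
    "\u{00A2}",
    "\u{00A3}",
    "\u{00A9}",
);
// $document is the string to process. Here it is fetched from a URL;
// any HTML string works the same way.
$document = file_get_contents('http://www.sina.com.cn');
$out = preg_replace($search, $replace, $document);
// Decode remaining numeric entities (&#NNN;). The original used the /e
// modifier, which was removed in PHP 7; preg_replace_callback replaces it.
$out = preg_replace_callback('@&#(\d+);@', function ($m) {
    return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
}, $out);
echo $out;
?>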
Save the PHP code as get.php and run it.
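The PHP script works on one fetched page. If the same cleanup is needed on arbitrary strings, it could be wrapped in a function along these lines. This is a minimal sketch, not the author's code: the function name strip_page_html is hypothetical, and it swaps the hand-written entity list for PHP's built-in html_entity_decode(), assuming UTF-8 input.

```php
<?php
// Hypothetical helper (not from the original post): the same regex-based
// cleanup, wrapped in a function so it can be applied to any HTML string.
function strip_page_html(string $html): string
{
    // Drop <script> and <style> blocks together with their contents.
    $html = preg_replace('@<(script|style)[^>]*?>.*?</\1>@si', '', $html);
    // Drop HTML comments.
    $html = preg_replace('@<!--.*?-->@s', '', $html);
    // Drop every remaining tag, keeping the text between tags.
    $html = preg_replace('@<[/!]*?[^<>]*?>@s', '', $html);
    // Decode named and numeric entities (&amp;, &lt;, &#169;, ...), assuming UTF-8.
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of ASCII whitespace into single spaces and trim the ends.
    return trim(preg_replace('@\s+@', ' ', $text));
}

echo strip_page_html('<p>Fish &amp; <b>chips</b></p><script>alert(1);</script>');
// prints: Fish & chips
```

Like the original, this is a regex approach and can be fooled by unusual markup (for example a ">" inside an attribute value); for untrusted or messy HTML, a real parser such as PHP's DOMDocument is the sturdier choice.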