A lightweight HTML parsing implementation. This code does not download anything from the Internet and does not execute any scripts; it only parses.
Parsing is implemented through MSHTML's IMarkupServices interface. To use this code, you need to add a reference to mshtml.
Since the IPersistStreamInit interface is not defined in .NET, you have to declare it yourself. The interface definition is:

    using System;
    using System.Runtime.InteropServices;

    // COM declaration of IPersistStreamInit (methods must stay in vtable order).
    [ComVisible(true), ComImport(),
     Guid("7FD52380-4E07-101B-AE2D-08002B2EC713"),
     InterfaceTypeAttribute(ComInterfaceType.InterfaceIsIUnknown)]
    public interface IPersistStreamInit
    {
        void GetClassID([In, Out] ref Guid pClassID);

        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int IsDirty();

        // UCOMIStream is marked obsolete from .NET 2.0 on;
        // System.Runtime.InteropServices.ComTypes.IStream replaces it.
        void Load([In, MarshalAs(UnmanagedType.Interface)] UCOMIStream pstm);

        void Save([In, MarshalAs(UnmanagedType.Interface)] UCOMIStream pstm,
                  [In, MarshalAs(UnmanagedType.I4)] int fClearDirty);

        // The native parameter is a pointer to a 64-bit size, so it is an out value.
        void GetSizeMax(out long pcbSize);

        void InitNew();
    }

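The following content is the program code:

    // Also requires: using mshtml; and compiling with /unsafe.
    unsafe IHTMLDocument2 Parse(string s)
    {
        IHTMLDocument2 pDocument = new HTMLDocumentClass();
        if (pDocument != null)
        {
            // Initialize the empty document before using its markup services.
            IPersistStreamInit pPersist = pDocument as IPersistStreamInit;
            pPersist.InitNew();
            pPersist = null;

            IMarkupServices ms = pDocument as IMarkupServices;
            if (ms != null)
            {
                IMarkupContainer pMC = null;
                IMarkupPointer pStart, pEnd;
                ms.CreateMarkupPointer(out pStart);
                ms.CreateMarkupPointer(out pEnd);

                // Copy the string into unmanaged memory as null-terminated UTF-16.
                IntPtr pSource = Marshal.StringToHGlobalUni(s);
                try
                {
                    // Pass a reference to the first wide character of the buffer.
                    ms.ParseString(ref *(ushort*)pSource.ToPointer(), 0, out pMC, pStart, pEnd);
                }
                finally
                {
                    // Memory from StringToHGlobalUni is freed with FreeHGlobal;
                    // Marshal.Release is only for COM interface pointers.
                    Marshal.FreeHGlobal(pSource);
                }

                if (pMC != null)
                {
                    return pMC as IHTMLDocument2;
                }
            }
        }
        return null;
    }
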
There was one snag when writing the code. The first parameter of IMarkupServices::ParseString is declared as ref ushort. Obviously, to pass in the HTML source, this ushort has to be the first wide character of the string, so unsafe code is used here to get around the compiler.
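To see why a reference to a single ushort is enough to hand over a whole string, here is a minimal sketch that does not need mshtml at all. `RefUshortDemo`, `CountWideChars`, and `Measure` are hypothetical names made up for this illustration; `CountWideChars` only stands in for a callee that, like ParseString, is declared with a `ref ushort` but reads on until the null terminator. It must be compiled with unsafe code enabled (`/unsafe`).

```csharp
using System;
using System.Runtime.InteropServices;

public static class RefUshortDemo
{
    // Hypothetical stand-in for a callee declared with "ref ushort":
    // it receives one ushort by reference but treats it as the start
    // of a null-terminated UTF-16 buffer.
    static unsafe int CountWideChars(ref ushort first)
    {
        fixed (ushort* p = &first)
        {
            int n = 0;
            while (p[n] != 0)
            {
                n++;
            }
            return n;
        }
    }

    // Copies the string to unmanaged memory, passes a reference to its
    // first wide character, and always frees the buffer afterwards.
    public static unsafe int Measure(string html)
    {
        IntPtr pSource = Marshal.StringToHGlobalUni(html);
        try
        {
            return CountWideChars(ref *(ushort*)pSource.ToPointer());
        }
        finally
        {
            Marshal.FreeHGlobal(pSource);
        }
    }
}
```

Calling `RefUshortDemo.Measure(s)` returns the number of UTF-16 code units the callee could see through that single reference, which is the same buffer layout the Parse function above relies on.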