Java: a concrete implementation that uses htmlparser to extract the content you need from HTML

Author: Eve Cole  Updated: 2025-03-13 11:00:04

Over the past two days I needed to scrape some information from someone else's web page for a task. In the end I used htmlparser to parse the HTML.

Let's go through the code:

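First, note which packages need to be imported. The htmlparser classes all live under org.htmlparser:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.htmlparser.Node;
    import org.htmlparser.Parser;
    import org.htmlparser.filters.AndFilter;
    import org.htmlparser.filters.HasAttributeFilter;
    import org.htmlparser.filters.TagNameFilter;
    import org.htmlparser.nodes.TagNode;
    import org.htmlparser.util.NodeList;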

    List<Mp3> mp3List = new ArrayList<Mp3>();
    try {
        // Initialize the parser (org.htmlparser.Parser). The constructor has several
        // overloads; here it receives HTML text that was fetched in advance.
        // A URL string or a URLConnection can be passed instead.
        Parser parser = new Parser(htmlStr);
        parser.setEncoding("utf-8"); // set the character encoding

        // Filter for the <div> whose id is "songlistwrapper".
        AndFilter filter = new AndFilter(
                new TagNameFilter("div"),
                new HasAttributeFilter("id", "songlistwrapper"));
        NodeList nodes = parser.parse(filter); // nodes accepted by the filter

        // Walk down from the <div> to the <ul>. The fixed index 1 at each level
        // matches the layout of the page this was written for.
        Node node = nodes.elementAt(0);
        NodeList nodesChild = node.getChildren();
        Node[] nodesArr = nodesChild.toNodeArray();
        NodeList nodesChild2 = nodesArr[1].getChildren();
        Node[] nodesArr2 = nodesChild2.toNodeArray();
        Node nodeUl = nodesArr2[1];
        Node[] nodesLi = nodeUl.getChildren().toNodeArray(); // parse nodesLi as needed

        // Compile the pattern once instead of on every iteration.
        Pattern pattern = Pattern.compile(
                "[\\s\\w\\-]+\\{'songitem':\\{'sid':'([\\d]+)','sname':'([\\s\\S]*)','author':'([\\s\\S]*)'\\}\\}");

        for (int i = 2; i < nodesLi.length; i++) {
            // System.out.println(nodesLi[i].toHtml());
            Node tempNode = nodesLi[i];
            TagNode tagNode = new TagNode(); // attributes are read through a TagNode
            tagNode.setText(tempNode.toHtml());
            // claStr looks like: bb-dotimg clearfix song-item-hook
            // {'songitem':{'sid':'113275822','sname':'My requirements are not high','author':'Huang Bo'}}
            String claStr = tagNode.getAttribute("class");
            if (claStr == null) {
                continue; // not an element with a class attribute (for example a text node)
            }
            claStr = claStr.replaceAll(" ", ""); // drop spaces so the pattern needs no whitespace handling
            if (claStr.indexOf("\\?") == -1) {
                Matcher matcher = pattern.matcher(claStr);
                if (matcher.find()) {
                    Mp3 mp3 = new Mp3();
                    mp3.setSid(matcher.group(1));
                    mp3.setSname(matcher.group(2));
                    mp3.setAuthor(matcher.group(3));
                    mp3List.add(mp3);
                    // for (int j = 1; j <= matcher.groupCount(); j++) {
                    //     System.out.print(" " + j + "--->" + matcher.group(j));
                    // }
                }
                // System.out.println(matcher.find());
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }

The above is the parsing code from my project.
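The parsing code relies on an Mp3 class that the article does not show. Here is a minimal sketch of what it might look like, assuming it is a plain bean with three string fields. Only the three setter names are taken from the code above; the getters and toString are additions for illustration.

```java
// Hypothetical value holder for one parsed song entry.
final class Mp3 {
    private String sid;    // song id, e.g. "113275822"
    private String sname;  // song title
    private String author; // performer

    public String getSid() { return sid; }
    public void setSid(String sid) { this.sid = sid; }

    public String getSname() { return sname; }
    public void setSname(String sname) { this.sname = sname; }

    public String getAuthor() { return author; }
    public void setAuthor(String author) { this.author = author; }

    @Override
    public String toString() {
        return "Mp3{sid=" + sid + ", sname=" + sname + ", author=" + author + "}";
    }
}
```

With a class like this in place, each successful regex match becomes one Mp3 instance in mp3List.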

// claStr is bb-dotimg clearfix song-item-hook {'songitem':{'sid':'113275822','sname':'My requirements are not high','author':'Huang Bo'}}

This is the content parsed from the web page.
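The regular-expression step can be tried on its own, without htmlparser, against a class value like the one shown above. The sketch below is a variation rather than the exact code from the project: the class name SongItemExtractor and the extract method are made up for illustration, it uses [^']* instead of [\s\S]* so that a group cannot run past its closing quote, and it removes whitespace only around the punctuation so that titles and names keep their internal spaces.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls sid, sname and author out of the class attribute text.
final class SongItemExtractor {
    // Three capture groups, in the same order as in the article: sid, sname, author.
    private static final Pattern SONG_ITEM = Pattern.compile(
            "\\{'songitem':\\{'sid':'(\\d+)','sname':'([^']*)','author':'([^']*)'\\}\\}");

    /** Returns {sid, sname, author}, or an empty Optional when the text does not match. */
    static Optional<String[]> extract(String classAttribute) {
        // Remove whitespace around { } : , only; spaces inside the quoted values are kept.
        String normalized = classAttribute.replaceAll("\\s*([{}:,])\\s*", "$1");
        Matcher m = SONG_ITEM.matcher(normalized);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group(1), m.group(2), m.group(3) });
    }

    public static void main(String[] args) {
        String sample = "bb-dotimg clearfix song-item-hook "
                + "{ 'songitem': { 'sid': '113275822', "
                + "'sname': 'My requirements are not high', 'author': 'Huang Bo' } }";
        extract(sample).ifPresent(f ->
                System.out.println(f[0] + " | " + f[1] + " | " + f[2]));
    }
}
```

A value that contains an apostrophe would end its group early with this pattern, so real data may need a proper escape rule or a JSON parser instead.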