Building a Hadoop Development Environment with Maven

Author: Eve Cole  Updated: 2025-05-23 18:00:04

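I won't cover how to use Maven itself; there is plenty of material online, and it has changed little over the years. This article only describes how to set up a Hadoop development environment.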

1. First, create a project

mvn archetype:generate -DgroupId=my.hadoopstudy -DartifactId=hadoopstudy -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

2. Next, add the Hadoop dependencies hadoop-common, hadoop-client, and hadoop-hdfs to pom.xml. After the additions, pom.xml looks like this:

<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://maven.apache.org/POM/4.0.0"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>my.hadoopstudy</groupId>
    <artifactId>hadoopstudy</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>hadoopstudy</name>
    <url>http://maven.apache.org</url>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.5.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.5.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.5.1</version>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

3. Testing

3.1 First, test HDFS access. This assumes you already have the Hadoop cluster set up in the previous Hadoop article. The class code is as follows:

package my.hadoopstudy.dfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.io.InputStream;
import java.net.URI;

public class Test {
    public static void main(String[] args) throws Exception {
        String uri = "hdfs://9.111.254.189:9000/";
        Configuration config = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), config);

        // List all files and directories under /user/fkong/ on HDFS
        FileStatus[] statuses = fs.listStatus(new Path("/user/fkong"));
        for (FileStatus status : statuses) {
            System.out.println(status);
        }

        // Create a file in /user/fkong on HDFS and write one line of text to it
        FSDataOutputStream os = fs.create(new Path("/user/fkong/test.log"));
        os.write("Hello World!".getBytes());
        os.flush();
        os.close();

        // Print the contents of the file just written under /user/fkong on HDFS
        InputStream is = fs.open(new Path("/user/fkong/test.log"));
        IOUtils.copyBytes(is, System.out, 1024, true);
    }
}
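The Test class hands FileSystem.get a URI, and Hadoop uses that URI's scheme and authority to decide which file system to talk to. As a small illustration, the following plain-Java sketch (no Hadoop classes involved; HdfsUriParts and describe are hypothetical names chosen for this example) shows how java.net.URI splits the NameNode address used above:

```java
import java.net.URI;

// Plain-Java illustration; HdfsUriParts is a hypothetical helper, not part of Hadoop.
class HdfsUriParts {
    // Returns "scheme|host|port" for a file system URI string.
    static String describe(String uri) {
        URI u = URI.create(uri);
        return u.getScheme() + "|" + u.getHost() + "|" + u.getPort();
    }

    public static void main(String[] args) {
        // The NameNode address used in the Test class.
        System.out.println(describe("hdfs://9.111.254.189:9000/")); // hdfs|9.111.254.189|9000
    }
}
```

If the connection fails, checking these three parts against the cluster's configured NameNode address is a reasonable first step.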

3.2 Testing a MapReduce job

The test code is fairly simple:

package my.hadoopstudy.mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import java.io.IOException;

public class EventCount {

    // Emits (event name, 1) for every line; the event name is the text before the first space
    public static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text event = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            int idx = value.toString().indexOf(" ");
            if (idx > 0) {
                String e = value.toString().substring(0, idx);
                event.set(e);
                context.write(event, one);
            }
        }
    }

    // Sums the counts for each event name
    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: EventCount <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "event count");
        job.setJarByClass(EventCount.class);
        job.setMapperClass(MyMapper.class);
        job.setCombinerClass(MyReducer.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
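The mapper's key rule is easy to try without a cluster. The sketch below repeats only that rule in plain Java; EventKey and eventName are hypothetical names used for illustration, and returning null stands in for "the mapper emits nothing for this line":

```java
// Plain-Java illustration of the mapper's key rule; EventKey is a hypothetical helper name.
class EventKey {
    // Returns the text before the first space, or null when the mapper would skip the line.
    static String eventName(String line) {
        int idx = line.indexOf(" ");
        return idx > 0 ? line.substring(0, idx) : null;
    }

    public static void main(String[] args) {
        System.out.println(eventName("job_new ..."));    // job_new
        System.out.println(eventName("job_finish"));     // null: no space, so nothing is emitted
        System.out.println(eventName(" leading space")); // null: index 0 fails the idx > 0 check
    }
}
```

Note that a line consisting of a bare event name with no trailing space is silently dropped by this rule, which may or may not be what you want for your own logs.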

Run "mvn package" to build the JAR hadoopstudy-1.0-SNAPSHOT.jar (Maven writes it to the target directory), then copy the JAR file to the Hadoop installation directory.

Suppose we want to analyze the event records in several log files and count how many times each event occurs. For that we need to create the following directory and files:

/tmp/input/event.log.1
/tmp/input/event.log.2
/tmp/input/event.log.3

Since this is only an example, every file can have the same contents. Suppose the contents are as follows:

job_new ...
job_new ...
job_finish ...
job_new ...
job_finish ...
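If you prefer to generate these files rather than type them, a minimal sketch along the following lines should do. SampleLogs and write are hypothetical names, and the literal "..." is kept as a stand-in for the rest of each log line, which the article does not show:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that writes the three identical sample log files.
class SampleLogs {
    // The "..." is a placeholder for the remainder of each log line.
    static final List<String> LINES = Arrays.asList(
            "job_new ...", "job_new ...", "job_finish ...", "job_new ...", "job_finish ...");

    // Writes event.log.1 through event.log.3 into dir, creating dir if needed.
    static void write(Path dir) throws IOException {
        Files.createDirectories(dir);
        for (int i = 1; i <= 3; i++) {
            Files.write(dir.resolve("event.log." + i), LINES);
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the directory used in this article; pass another path to override.
        write(Paths.get(args.length > 0 ? args[0] : "/tmp/input"));
    }
}
```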

Next, copy these files to HDFS:

$ bin/hdfs dfs -put /tmp/input /user/fkong/input

Run the MapReduce job:

$ bin/hadoop jar hadoopstudy-1.0-SNAPSHOT.jar my.hadoopstudy.mapreduce.EventCount /user/fkong/input /user/fkong/output

View the results:

$ bin/hdfs dfs -cat /user/fkong/output/part-r-00000
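To cross-check the job's output, you can recount the same data locally. The sketch below is plain Java with no Hadoop dependency; LocalEventCount is a hypothetical name. It assumes each of the three input files holds exactly the five sample lines shown earlier, with "..." as a literal placeholder for the rest of each line:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local recount of the sample input; not part of the MapReduce job.
class LocalEventCount {
    // Counts event names using the job's rule: key = text before the first space.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>(); // keeps keys in sorted order
        for (String line : lines) {
            int idx = line.indexOf(" ");
            if (idx > 0) {
                totals.merge(line.substring(0, idx), 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> oneFile = Arrays.asList(
                "job_new ...", "job_new ...", "job_finish ...", "job_new ...", "job_finish ...");
        List<String> all = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            all.addAll(oneFile); // three identical input files
        }
        // Prints key, a tab, then the count, one pair per line.
        count(all).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

Under those assumptions the local totals are 6 for job_finish and 9 for job_new, so part-r-00000 should list the same two keys with the same counts, each key separated from its count by a tab.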
