Java access to HDFS: configuration notes for the Hadoop Distributed File System

Author: Eve Cole  Updated: 2025-03-26 19:32:01

Configuration files

In the files below, replace m103 with the address of your HDFS service.
To access files on HDFS from a Java client, the configuration file that matters most is hadoop-0.20.2/conf/core-site.xml. It cost me a lot of trouble at first: I could not connect to HDFS, and files could be neither created nor read.

    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>...</value>
        <description>... directories.</description>
      </property>
      <!-- file system properties -->
      <property>
        <name>fs.default.name</name>
        <value>hdfs://linux-zzk-113:9000</value>
      </property>
    </configuration>

Configuration item hadoop.tmp.dir: on the NameNode, this is the directory where metadata is stored; on a DataNode, it is the directory where the node's file data is stored.
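Both roles come from the same setting because, in Hadoop 0.20.x, the default values of dfs.name.dir and dfs.data.dir are ${hadoop.tmp.dir}/dfs/name and ${hadoop.tmp.dir}/dfs/data. The sketch below only imitates that ${...} substitution with the standard library. The class name and the path /data/hadoop/tmp are hypothetical, and this is not Hadoop's own Configuration code.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: imitates how ${name} references in Hadoop config values are expanded. */
final class TmpDirExpansion {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${key} in value with props.get(key); unknown keys are left untouched. */
    static String expand(String value, Map<String, String> props) {
        Matcher m = VAR.matcher(value);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = props.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    /** Expands a value against a single hadoop.tmp.dir setting. */
    static String expandWithTmpDir(String value, String tmpDir) {
        return expand(value, Collections.singletonMap("hadoop.tmp.dir", tmpDir));
    }

    public static void main(String[] args) {
        String tmpDir = "/data/hadoop/tmp"; // hypothetical value of hadoop.tmp.dir
        // Default of dfs.name.dir in Hadoop 0.20.x (NameNode metadata)
        System.out.println(expandWithTmpDir("${hadoop.tmp.dir}/dfs/name", tmpDir));
        // Default of dfs.data.dir in Hadoop 0.20.x (DataNode block data)
        System.out.println(expandWithTmpDir("${hadoop.tmp.dir}/dfs/data", tmpDir));
    }
}
```

Setting dfs.name.dir or dfs.data.dir explicitly overrides these derived locations.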

Configuration item fs.default.name: the NameNode's address and port number. The default value is file:///. The Java API must use the URL configured here to connect to HDFS, and DataNodes use the same URL to reach the NameNode.
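A quick way to see what the client will try to connect to is to take the fs.default.name value apart with java.net.URI. This is a minimal sketch using only the standard library; the class and method names are hypothetical, and it does not contact any cluster.

```java
import java.net.URI;

/** Illustrative only: splits an fs.default.name value into its parts. */
final class DefaultFsAddress {

    /** Returns "host:port" for an hdfs:// URI, or throws if the scheme is not hdfs. */
    static String nameNodeAddress(String fsDefaultName) {
        URI uri = URI.create(fsDefaultName);
        if (!"hdfs".equals(uri.getScheme())) {
            throw new IllegalArgumentException("not an hdfs:// URI: " + fsDefaultName);
        }
        return uri.getHost() + ":" + uri.getPort();
    }

    /** True for the default value file:///, which means the local file system. */
    static boolean isLocalFileSystem(String fsDefaultName) {
        return "file".equals(URI.create(fsDefaultName).getScheme());
    }

    public static void main(String[] args) {
        System.out.println(nameNodeAddress("hdfs://linux-zzk-113:9000")); // linux-zzk-113:9000
        System.out.println(isLocalFileSystem("file:///"));                // true
    }
}
```

If the scheme, host, or port here does not match what the NameNode actually listens on, the client cannot connect, which matches the symptom described at the start of this section.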

hdfs-site.xml

    <?xml version="1.0"?>
    <configuration>
      <!-- ... -->
      <property>
        <name>dfs.namenode.servicerpc-address</name>
        <value>m103:8022</value>
      </property>
      <property>
        <name>dfs.https.address</name>
        <value>m103:50470</value>
      </property>
      <!-- ... -->
      <property>
        <name>dfs.namenode.http-address</name>
        <value>m103:50070</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
      <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
      </property>
      <property>
        <name>dfs.client.use.datanode.hostname</name>
        <value>false</value>
      </property>
      <property>
        <name>fs.permissions.umask-mode</name>
        <value>022</value>
      </property>
      <!-- ... -->
      <property>
        <name>dfs.block.local-path-access.user</name>
        <value>cloudera-scm</value>
      </property>
      <property>
        <name>dfs.client.read.shortcircuit</name>
        <value>false</value>
      </property>
      <property>
        <name>dfs.domain.socket.path</name>
        <value>/var/run/hdfs-sockets/dn</value>
      </property>
      <property>
        <name>dfs.client.read.shortcircuit.skip.checksum</name>
        <value>false</value>
      </property>
      <!-- ... -->
      <property>
        <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>fs.http.impl</name>
        <value>com.scistor.datavision.fs.httpFilessystem</value>
      </property>
    </configuration>
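To make the dfs.blocksize and dfs.replication values above concrete: 134217728 bytes is 128 MiB, and each block is stored 3 times. The sketch below works out how many blocks a file occupies and roughly how much raw disk it consumes. The class name is hypothetical, and the estimate ignores checksum files and other metadata.

```java
/** Illustrative arithmetic for dfs.blocksize = 134217728 and dfs.replication = 3. */
final class BlockMath {
    static final long BLOCK_SIZE = 134_217_728L; // 128 MiB, from hdfs-site.xml above
    static final int REPLICATION = 3;            // from hdfs-site.xml above

    /** Number of HDFS blocks needed for a file of the given size (ceiling division). */
    static long blockCount(long fileBytes) {
        return (fileBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    /** Approximate raw bytes used across the cluster: every byte is stored REPLICATION times. */
    static long rawBytes(long fileBytes) {
        return fileBytes * REPLICATION;
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024L * 1024L;
        System.out.println(blockCount(oneGiB));            // 8
        System.out.println(rawBytes(oneGiB));              // 3221225472
        System.out.println(blockCount(300L * 1024 * 1024)); // 3 (128 + 128 + 44 MiB)
    }
}
```

The last block of a file is not padded to the full block size, so raw usage scales with the file's actual length rather than with the block count.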

mapred-site.xml

    <?xml version="1.0"?>
    <configuration>
      <!-- ... -->
      <property> <name>...</name> <value>120</value> </property>
      <property> <name>mapreduce.output.fileoutputformat.compress</name> <value>true</value> </property>
      <property> <name>mapreduce.output.fileoutputformat.compress.type</name> <value>...</value> </property>
      <property> <name>mapreduce.output.fileoutputformat.compress.codec</name> <value>org.apache.hadoop.io.compress.SnappyCodec</value> </property>
      <property> <name>mapreduce.map.output.compress.codec</name> <value>org.apache.hadoop...</value> </property>
      <property> <name>mapreduce.map.output.compress</name> <value>true</value> </property>
      <property> <name>zlib.compress.level</name> <value>DEFAULT_COMPRESSION</value> </property>
      <property> <name>mapreduce.task.io.sort.factor</name> <value>64</value> </property>
      <property> <name>mapreduce.map.sort.spill.percent</name> <value>0.8</value> </property>
      <property> <name>mapreduce.reduce.shuffle.parallelcopies</name> <value>10</value> </property>
      <property> <name>mapreduce.task.timeout</name> <value>400000</value> </property>
      <property> <name>mapreduce.client.submit.file.replication</name> <value>1</value> </property>
      <property> <name>mapreduce.job.reduces</name> <value>24</value> </property>
      <property> <name>mapreduce.task.io.sort.mb</name> <value>256</value> </property>
      <property> <name>mapreduce.map.speculative</name> <value>false</value> </property>
      <property> <name>mapreduce.reduce.speculative</name> <value>false</value> </property>
      <property> <name>mapreduce.job.reduce.slowstart.completedmaps</name> <value>0.8</value> </property>
      <property> <name>mapreduce.jobhistory.address</name> <value>m103:10020</value> </property>
      <property> <name>mapreduce.jobhistory.webapp.address</name> <value>m103:19888</value> </property>
      <property> <name>mapreduce.jobhistory.webapp.https.address</name> <value>m103:19890</value> </property>
      <property> <name>mapreduce.jobhistory.admin.address</name> <value>m103:10033</value> </property>
      <property> <name>mapreduce.framework.name</name> <value>yarn</value> </property>
      <property> <name>yarn.app.mapreduce.am.staging-dir</name> <value>/user</value> </property>
      <property> <name>mapreduce.am.max-attempts</name> <value>2</value> </property>
      <property> <name>yarn.app.mapreduce.am.resource.mb</name> <value>2048</value> </property>
      <property> <name>yarn.app.mapreduce.am.resource.cpu-vcores</name> <value>...</value> </property>
      <property> <name>mapreduce.job.ubertask.enable</name> <value>...</value> </property>
      <property> <name>yarn.app.mapreduce.am.command-opts</name> <value>-Djava.net.preferIPv4Stack=true -Xmx1717986918</value> </property>
      <property> <name>mapreduce.map.java.opts</name> <value>-Djava.net.preferIPv4Stack=true -Xmx1717986918</value> </property>
      <property> <name>mapreduce.reduce.java.opts</name> <value>-Djava.net.preferIPv4Stack=true -Xmx2576980378</value> </property>
      <property> <name>yarn.app.mapreduce.am.admin.user.env</name> <value>LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native:$JAVA_LIBRARY_PATH</value> </property>
      <property> <name>mapreduce.map.memory.mb</name> <value>...</value> </property>
      <property> <name>mapreduce.map.cpu.vcores</name> <value>1</value> </property>
      <property> <name>mapreduce.reduce.memory.mb</name> <value>3072</value> </property>
      <property> <name>mapreduce.reduce.cpu.vcores</name> <value>1</value> </property>
      <property> <name>mapreduce.application.classpath</name> <value>$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH,$CDH_HCAT_HOME/share/hcatalog/*,$CDH_HIVE_HOME/lib/*,/etc/hive/conf,/opt/cloudera/parcels/CDH/lib/udps/*</value> </property>
      <property> <name>mapreduce.admin.user.env</name> <value>LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native:$JAVA_LIBRARY_PATH</value> </property>
      <property> <name>mapreduce.shuffle.max.connections</name> <value>80</value> </property>
    </configuration>
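The -Xmx values in mapred-site.xml look arbitrary, but they appear to be 80% of the matching container size in bytes: 2048 MiB for the ApplicationMaster and 3072 MiB for reducers. The 0.8 ratio is inferred from the numbers and is not stated in the source. The sketch below reproduces the arithmetic; the class name is hypothetical.

```java
/** Illustrative arithmetic: JVM heap as a fraction of the YARN container size. */
final class HeapSizing {

    /** Heap size in bytes for a container of containerMiB mebibytes, rounded to the nearest byte. */
    static long heapBytes(int containerMiB, double ratio) {
        return Math.round(containerMiB * 1024L * 1024L * ratio);
    }

    public static void main(String[] args) {
        // yarn.app.mapreduce.am.resource.mb = 2048
        System.out.println("-Xmx" + heapBytes(2048, 0.8)); // -Xmx1717986918
        // mapreduce.reduce.memory.mb = 3072
        System.out.println("-Xmx" + heapBytes(3072, 0.8)); // -Xmx2576980378
    }
}
```

Keeping the heap below the container size leaves room for non-heap JVM memory; a heap equal to or larger than the container risks the container being killed for exceeding its memory limit.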

Accessing HDFS files and directories with the Java API

    package com.demo.hdfs;

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.util.Progressable;

    /**
     * @author zhangzk
     */
    public class FileCopyToHdfs {

        public static void main(String[] args) throws Exception {
            try {
                // uploadToHdfs();
                // deleteFromHdfs();
                // getDirectoryFromHdfs();
                appendToHdfs();
                readFromHdfs();
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                // Printed whether or not an exception occurred.
                System.out.println("SUCCESS");
            }
        }

        /** Upload a local file to HDFS. */
        private static void uploadToHdfs() throws FileNotFoundException, IOException {
            String localSrc = "d://qq.txt";
            String dst = "hdfs://192.168.0.113:9000/user/zhangzk/qq.txt";
            InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(dst), conf);
            // The callback prints a dot each time Hadoop reports progress.
            OutputStream out = fs.create(new Path(dst), new Progressable() {
                public void progress() {
                    System.out.print(".");
                }
            });
            // The final argument 'true' closes both streams when the copy finishes.
            IOUtils.copyBytes(in, out, 4096, true);
        }

        /** Read a file from HDFS into a local file. */
        private static void readFromHdfs() throws FileNotFoundException, IOException {
            String dst = "hdfs://192.168.0.113:9000/user/zhangzk/qq.txt";
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(dst), conf);
            FSDataInputStream hdfsInStream = fs.open(new Path(dst));
            OutputStream out = new FileOutputStream("d:/qq-hdfs.txt");
            byte[] ioBuffer = new byte[1024];
            int readLen = hdfsInStream.read(ioBuffer);
            while (-1 != readLen) {
                out.write(ioBuffer, 0, readLen);
                readLen = hdfsInStream.read(ioBuffer);
            }
            out.close();
            hdfsInStream.close();
            fs.close();
        }

        /**
         * Append content to the end of a file on HDFS.
         * Note: updating a file this way requires append support to be enabled
         * in hdfs-site.xml (in Hadoop 0.20.x the property is dfs.support.append,
         * set to true).
         */
        private static void appendToHdfs() throws FileNotFoundException, IOException {
            // The path is garbled in the source; assumed to be the file that readFromHdfs() reads.
            String dst = "hdfs://192.168.0.113:9000/user/zhangzk/qq.txt";
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(dst), conf);
            FSDataOutputStream out = fs.append(new Path(dst));
            byte[] data = "zhangzk add by hdfs java api".getBytes();
            // Write the bytes once; a loop on a length that never changes would not terminate.
            out.write(data, 0, data.length);
            out.close();
            fs.close();
        }

        /** Delete a file from HDFS. */
        private static void deleteFromHdfs() throws FileNotFoundException, IOException {
            String dst = "hdfs://192.168.0.113:9000/user/zhangzk/qq-bak.txt";
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(dst), conf);
            // Marks the path for deletion; it is removed when the file system is closed.
            fs.deleteOnExit(new Path(dst));
            fs.close();
        }

        /** List the files and directories under an HDFS directory. */
        private static void getDirectoryFromHdfs() throws FileNotFoundException, IOException {
            String dst = "hdfs://192.168.0.113:9000/user/zhangzk";
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(dst), conf);
            FileStatus[] fileList = fs.listStatus(new Path(dst));
            for (int i = 0; i < fileList.length; i++) {
                System.out.println("name:" + fileList[i].getPath().getName()
                        + "\t\tsize:" + fileList[i].getLen());
            }
            fs.close();
        }
    }
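The read loop in readFromHdfs() is ordinary java.io stream copying, so it can be tried without a cluster. The sketch below runs the same loop on in-memory streams and uses try-with-resources so the streams are closed even if the copy fails. The class and method names are hypothetical. Because FSDataInputStream is an InputStream, the same copy method should also accept the stream returned by fs.open(...), though that is not exercised here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Illustrative only: the buffered copy loop from readFromHdfs(), on in-memory streams. */
final class StreamCopy {

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) { // read() returns -1 at end of stream
            out.write(buffer, 0, n);          // write only the bytes actually read
            total += n;
        }
        return total;
    }

    /** Copies text through the loop with a deliberately small buffer and returns the result. */
    static String roundTrip(String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            copy(in, out, 4);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("zhangzk add by hdfs java api"));
    }
}
```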

NOTE: According to the original author, the append operation has not been supported since Hadoop 0.21. For more on append, see a document on JavaEye.