This article assumes that the Hadoop environment is on a remote machine (such as a Linux server) and that the Hadoop version is 2.5.2.
Note: This article is mainly based on "eclipse/intellij idea remote debugging hadoop 2.6.0" and has been adjusted from it.
Although the operating system used in this article is Win7 64-bit, all of the software is 32-bit, because I like to install 32-bit software (such as a 32-bit JDK and 32-bit Eclipse) on 64-bit Win7.
Software versions:
Operating system: win7 64-bit
eclipse: eclipse-jee-mars-2-win32
java: 1.8.0_77 32-bit
hadoop:2.5.2
1. Install hadoop
1. Pick a directory on Win7 and unpack hadoop-2.5.2.tar.gz into it, for example D:/app/hadoop-2.5.2/
2. Configure environment variables
HADOOP_HOME = D:/app/hadoop-2.5.2/
(You can verify the variable with echo %HADOOP_HOME% in a newly opened cmd window.)
2. Install hadoop eclipse plugin
1. Download hadoop-eclipse-plugin
hadoop-eclipse-plugin is a Hadoop plugin for Eclipse that lets you browse HDFS directories and view file contents directly in the IDE. Its source code is hosted on github; the official address is https://github.com/winghc/hadoop2x-eclipse-plugin. Download hadoop-eclipse-plugin-2.6.0.jar from the release folder.
2. Download the hadoop plug-in package for Windows 32-bit platform (hadoop.dll, winutils.exe)
Since our software environment is 32-bit, we need the 32-bit hadoop.dll and winutils.exe. A download can be found by searching Baidu for "hadoop.dll 32".
For example, download this: http://xiazai.VeVB.COM/201607/yuanma/eclipse-hadoop(VeVB.COM).rar
Copy winutils.exe to the $HADOOP_HOME/bin directory and copy hadoop.dll to the C:/Windows/SysWOW64 directory. (Note: we copy it there because our operating system is 64-bit while the software is 32-bit; if your operating system is 32-bit, copy it to the C:/Windows/System32 directory instead.)
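If Eclipse still reports that it cannot locate winutils.exe even with HADOOP_HOME set (a common symptom when Eclipse was started before the variable was defined), a widely used workaround is to set the hadoop.home.dir system property from code before any Hadoop class is used; Hadoop's Shell utility checks this property before falling back to the HADOOP_HOME environment variable. A minimal sketch, assuming the unpack directory from step 1:

public class HadoopHomeFix {
    public static void main(String[] args) {
        // Must run before the first Hadoop class is touched, e.g. at the top of main().
        System.setProperty("hadoop.home.dir", "D:/app/hadoop-2.5.2");
        // ... continue with normal job setup from here ...
    }
}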
3. Configure hadoop-eclipse-plugin plugin
Copy hadoop-eclipse-plugin-2.6.0.jar into Eclipse's plugins directory and start Eclipse, then open Window -> Preferences -> Hadoop Map/Reduce and specify the root directory of Hadoop on Win7 (i.e. $HADOOP_HOME).
Switch to the Map/Reduce view:
Window -> Show View -> Other... -> Map/Reduce Locations
Then add a new Location in the Map/Reduce Locations panel below
Configure as follows
Location name is just a label; call it whatever you like.
Map/Reduce(V2) Master Host is the IP address of the Hadoop master in the virtual machine. The Port below it corresponds to the port specified by the dfs.datanode.ipc.address property in hdfs-site.xml.
The DFS Master Port corresponds to the port specified by fs.defaultFS in core-site.xml (see the sample server-side entries after this list).
The user name at the bottom should match the user that runs Hadoop in the virtual machine. I installed and run Hadoop as the user hadoop, so I fill in hadoop here; if you installed it as root, change it to root accordingly.
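For reference, the Host and Port values come from the server-side configuration files. A sample of the relevant entries, reusing the example address 192.168.1.6:9000 from later in this article (in Hadoop 2.5.2 the default dfs.datanode.ipc.address is 0.0.0.0:50020):

<!-- core-site.xml on the Hadoop server -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://192.168.1.6:9000</value>
</property>

<!-- hdfs-site.xml on the Hadoop server -->
<property>
  <name>dfs.datanode.ipc.address</name>
  <value>0.0.0.0:50020</value>
</property>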
Once these parameters are specified, click Finish and Eclipse knows how to connect to Hadoop. If everything goes well, you can see the HDFS directories and files in the Project Explorer panel.
You can try right-clicking a file and selecting Delete. Usually the first attempt fails and a long error message appears; the gist of it is that permissions are insufficient. The reason is that the current Win7 login user is not the user that runs Hadoop in the virtual machine. There are several solutions: for example, you can create a new hadoop administrator user on Win7, log in to Win7 as that user, and then develop with Eclipse. But that is too much trouble; the easiest way is:
Add the following to hdfs-site.xml:
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
In short, this completely turns off Hadoop's permission checking (fine at the learning stage, but do not do this in production). Finally, restart Hadoop, go back to Eclipse, and repeat the delete operation; this time it should succeed.
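If you would rather not disable permission checking, another common approach is to make the Eclipse-side client act as the server's Hadoop user. In Hadoop 2.x the client honors the HADOOP_USER_NAME environment variable or system property when determining the current user, so a minimal sketch (assuming the server-side user is hadoop, as above) looks like:

public class RunAsHadoopUser {
    public static void main(String[] args) throws Exception {
        // Must be set before the first HDFS access: Hadoop's UserGroupInformation
        // reads HADOOP_USER_NAME when it resolves the client-side user.
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        // ... continue with normal job setup, e.g. WordCount.main(args) ...
    }
}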
Note: If you cannot connect, first try telnet 192.168.1.6 9000 (replace the IP and port with your own Hadoop server's) to make sure the port is reachable.
If telnet fails, the value of fs.defaultFS in core-site.xml may be the problem. For example, if it is configured as localhost:9000, consider replacing localhost with the host name or IP.
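For example, the corrected entry could look like this, reusing this article's example address (substitute your own host name or IP):

<!-- core-site.xml: use an address reachable from outside, not localhost -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://192.168.1.6:9000</value>
</property>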
3. Write wordcount examples
1. Create a new project and select Map/Reduce Project
Just click through the wizard with Next, then create a new class WordCount.java with the following code:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper: emits (word, 1) for every token in the input.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as combiner): sums the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // All arguments except the last are input paths; the last is the output path.
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Then create a log4j.properties in the src directory with the following content (so that the various outputs can be checked conveniently while running):
log4j.rootLogger=INFO, stdout
#log4j.logger.org.springframework=INFO
#log4j.logger.org.apache.activemq=INFO
#log4j.logger.org.apache.activemq.spring=WARN
#log4j.logger.org.apache.activemq.store.journal=INFO
#log4j.logger.org.activeio.journal=INFO
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} | %-5.5p | %-16.16t | %-32.32c{1} | %-32.32C %4L | %m%n

The final directory structure is as follows:
2. Configure the running parameters
Because WordCount reads an input file, counts the words in it, and writes the result to another folder, it takes two parameters. Referring to the figure above, enter the following in Program arguments:
hdfs://192.168.1.6:9000/user/nub1.txt
hdfs://192.168.1.6:9000/user/output
Note that if /user/nub1.txt does not exist, upload it manually first (using the right-click menu of the DFS Locations tool in Eclipse), and /user/output must not exist beforehand; otherwise the program will run to the end, find that the target directory exists, and report an error. Both steps can also be done from code, as sketched below.
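If you prefer to do this preparation from code instead of the DFS Locations view, here is a small sketch using the standard FileSystem API. The local path D:/tmp/nub1.txt is a hypothetical example; the HDFS address and paths are the ones used above:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrepareJobInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.1.6:9000"), conf);
        // Upload a local file to /user/nub1.txt (overwrites an existing copy).
        fs.copyFromLocalFile(new Path("D:/tmp/nub1.txt"), new Path("/user/nub1.txt"));
        // Remove the output directory so the job does not fail on an existing target.
        fs.delete(new Path("/user/output"), true); // true = recursive
        fs.close();
    }
}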
OK, now just run it.
The above is all the content of this article. I hope it is helpful to everyone's learning.