Sunday, November 25, 2012

Hadoop Single Node Set Up


With this post I am hoping to share the procedure to set up Apache Hadoop on a single node. Hadoop is used for dealing with big data sets, deployed on low-cost commodity hardware. It is a map-reduce framework that maps segments of a job among the nodes of a cluster for execution. Though we will not see the real power of Hadoop by running it on a single node, it is the first step towards a multi-node cluster. A single node set up is useful for getting familiar with the operations and for debugging applications for accuracy, but the performance may be far lower than what is achievable.

I am sharing the steps to be followed on a Linux system, as Linux is supported as both a development and a production platform for Hadoop. Win32 is supported only as a development platform, and the equivalent commands for the given Linux commands need to be used there. The Hadoop documentation covers the set-up in brief; here I am sharing detailed guidance for the process, with the following line-up.

  • Pre-requisites
  • Hadoop Configurations
  • Running the Single Node Cluster
  • Running a Map-reduce job

Pre-requisites

Java 1.6.X

Java needs to be installed on your node in order to run Hadoop, as it is a Java-based framework. We can check the Java installation of the node with,
pushpalanka@pushpalanka-laptop:~$ java -version
java version "1.6.0_23"
Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
Java HotSpot(TM) Server VM (build 19.0-b09, mixed mode)

If it does not give the intended output, we need to install Java by running,
pushpalanka@pushpalanka-laptop:~$ sudo apt-get install sun-java6-jdk
or
or by running a downloaded Java binary installation package. Then set the environment variable JAVA_HOME in the relevant configuration file (.bashrc for Linux bash shell users).
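For example (a sketch; the JDK path below is an assumption, point it at your own installation), the following two lines can go at the end of ~/.bashrc, after which 'source ~/.bashrc' makes them take effect in the current shell.
export JAVA_HOME=/usr/lib/jvm/java-6-sun   # adjust to the location of your JDK
export PATH=$PATH:$JAVA_HOME/bin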

Create a Hadoop User

At this step we create a user account dedicated to the Hadoop installation. This is not a "must do" step, but it is recommended for security reasons and for ease of managing the nodes. So we create a group called 'hadoop' and add a new user called 'hpuser' to that group (the names can be of our choice) with the following commands.
pushpalanka@pushpalanka-laptop:~$ sudo addgroup hadoop
pushpalanka@pushpalanka-laptop:~$ sudo adduser --ingroup hadoop hpuser
pushpalanka@pushpalanka-laptop:~$ su hpuser
hpuser@pushpalanka-laptop:~$

From now on we will work as hpuser, which the last command switches us to.
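Note that a few of the later commands use sudo as hpuser. If your distribution does not grant that by default, the new user can be added to the administrative group from an account that already has sudo rights; the group name 'sudo' used below is the Ubuntu default and is an assumption here.
pushpalanka@pushpalanka-laptop:~$ sudo adduser hpuser sudo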

Enable SSH Access

Hadoop requires SSH (Secure Shell) access to the machines it uses as nodes, in order to create a secured channel to exchange data. So even on a single node, the localhost needs to allow SSH access for the hpuser for Hadoop operations. Refer to this documentation if you need more details on SSH.
We can install SSH with the following command.
hpuser@pushpalanka-laptop:~$ sudo apt-get install ssh

Now let's try to SSH to the localhost without a pass-phrase.
hpuser@pushpalanka-laptop:~$ ssh localhost
Linux pushpalanka-laptop 2.6.32-33-generic #72-Ubuntu SMP Fri Jul 29 21:08:37 UTC 2011 i686 GNU/Linux
Ubuntu 10.04.4 LTS

Welcome to Ubuntu!
................................................

If it does not give something similar to the above, we have to enable SSH access to the localhost as follows.
Generate an RSA key pair without a password. (We skip the password only because otherwise we would be prompted for it every time Hadoop communicates with the node.)
hpuser@pushpalanka-laptop:~$ ssh-keygen -t rsa -P ""

Then we need to concatenate the generated public key to the authorized keys list of the localhost. It is done as follows. Afterwards make sure 'ssh localhost' is successful.
hpuser@pushpalanka-laptop:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
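If 'ssh localhost' still prompts for a password after this, the usual cause is too-open permissions on the key files; a quick fix (a sketch, assuming the default ~/.ssh layout) is:
hpuser@pushpalanka-laptop:~$ chmod 700 $HOME/.ssh
hpuser@pushpalanka-laptop:~$ chmod 600 $HOME/.ssh/authorized_keys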

Now we are ready to move on to Hadoop. :) Use the latest stable release from the Hadoop site; I used hadoop-1.0.3.
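As a sketch (the mirror URL is an assumption and may change; any Apache mirror carrying hadoop-1.0.3.tar.gz will do), downloading and extracting the release into the hpuser home directory looks like:
hpuser@pushpalanka-laptop:~$ wget http://archive.apache.org/dist/hadoop/core/hadoop-1.0.3/hadoop-1.0.3.tar.gz
hpuser@pushpalanka-laptop:~$ tar -xzf hadoop-1.0.3.tar.gz   # creates the hadoop-1.0.3 directory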

Hadoop Configurations

Set Hadoop Home

As we set JAVA_HOME at the beginning of this post, we need to set HADOOP_HOME too. Open the same file (.bashrc) and add the following two lines at the end.
export HADOOP_HOME=<absolute path to the extracted hadoop distribution>
export PATH=$PATH:$HADOOP_HOME/bin
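After saving the file, reload it and check that the hadoop script is picked up from the new PATH; 'hadoop version' should report the release (1.0.3 here) if the paths are correct.
hpuser@pushpalanka-laptop:~$ source ~/.bashrc
hpuser@pushpalanka-laptop:~$ hadoop version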

Disable IPv6

As there will not be any practical need for IPv6 addressing inside a Hadoop cluster, we disable it to be less error-prone. Open the file HADOOP_HOME/conf/hadoop-env.sh and add the following line.
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
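Alternatively (or additionally), IPv6 can be disabled at the operating-system level. On Ubuntu this is typically done by adding the following lines to /etc/sysctl.conf and rebooting; this is a general Linux setting rather than anything Hadoop-specific.
# disable IPv6 system-wide
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1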

Set Paths and Configuration

In the same hadoop-env.sh file, add JAVA_HOME too.
export JAVA_HOME=<path to the java installation>

Then we need to create a directory to be used as the storage for HDFS (Hadoop Distributed File System). Make a directory at a place of your choice and make sure the owner of the directory is 'hpuser'. I'll refer to this as 'temp_directory'. We can do it by,
hpuser@pushpalanka-laptop:~$ sudo chown hpuser:hadoop <absolute path to temp_directory>
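For example, assuming /home/hpuser/hdfs-tmp is chosen as the temp_directory (the path is only an illustration), the full sequence would be:
hpuser@pushpalanka-laptop:~$ mkdir -p /home/hpuser/hdfs-tmp
hpuser@pushpalanka-laptop:~$ sudo chown hpuser:hadoop /home/hpuser/hdfs-tmp
hpuser@pushpalanka-laptop:~$ chmod 750 /home/hpuser/hdfs-tmp   # keep it private to hpuser and the hadoop group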

Now we add the following property segments to the relevant files, inside the <configuration>......</configuration> tags.
conf/core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>path to temp_directory</value>
  <description>Location for HDFS.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation. </description>
</property>

conf/mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. </description>
</property>

conf/hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default number of block replications.
   </description>
</property>

The value above needs to be decided based on the priority given to speed, space and fault-tolerance factors.

Format HDFS

This operation is needed every time we create a new Hadoop cluster. If we do this operation on a running cluster, all the data will be lost. What this basically does is create the Hadoop Distributed File System over the local file system of the cluster.
hpuser@pushpalanka-laptop:~$ <HADOOP_HOME>/bin/hadoop namenode -format
12/09/20 14:39:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = pushpalanka-laptop/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.3
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
************************************************************/
...
12/09/20 14:39:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at pushpalanka-laptop/127.0.1.1
************************************************************/

If the output is something like the above, HDFS has been formatted successfully. Now we are almost done and ready to see the action.

Running the Single Node Cluster

hpuser@pushpalanka-laptop:~/hadoop-1.0.3/bin$ ./start-all.sh

and you will see,
starting namenode, logging to /home/hpuser/hadoop-1.0.3/libexec/../logs/hadoop-hpuser-namenode-pushpalanka-laptop.out
localhost: starting datanode, logging to /home/hpuser/hadoop-1.0.3/libexec/../logs/hadoop-hpuser-datanode-pushpalanka-laptop.out
localhost: starting secondarynamenode, logging to /home/hpuser/hadoop-1.0.3/libexec/../logs/hadoop-hpuser-secondarynamenode-pushpalanka-laptop.out
starting jobtracker, logging to /home/hpuser/hadoop-1.0.3/libexec/../logs/hadoop-hpuser-jobtracker-pushpalanka-laptop.out
localhost: starting tasktracker, logging to /home/hpuser/hadoop-1.0.3/libexec/../logs/hadoop-hpuser-tasktracker-pushpalanka-laptop.out

which starts the HDFS and MapReduce daemons. We can observe these Hadoop processes using the jps tool that ships with Java.
hpuser@pushpalanka-laptop:~/hadoop-1.0.3/bin$ jps
5767 NameNode
6619 Jps
6242 JobTracker
6440 TaskTracker
6155 SecondaryNameNode
5958 DataNode
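All of these daemons should appear in the list (plus Jps itself). If one of them is missing, its log file under HADOOP_HOME/logs usually explains why; for instance (the exact file name carries the user and host names, so the wildcard is just a convenience):
hpuser@pushpalanka-laptop:~/hadoop-1.0.3/bin$ tail -n 50 ../logs/hadoop-hpuser-datanode-*.log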

Also we can check the ports Hadoop is configured to listen on,
hpuser@pushpalanka-laptop:~/hadoop-1.0.3/bin$ sudo netstat -plten | grep java
[sudo] password for hpuser: 
tcp        0      0 0.0.0.0:33793           0.0.0.0:*               LISTEN      1001       175415      8164/java       
....      
tcp        0      0 0.0.0.0:60506           0.0.0.0:*               LISTEN      1001       174566      7767/java       
tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN      1001       176269      7962/java

There are several web interfaces available to observe internal behaviour, job completion, memory consumption etc., as follows.

  • http://localhost:50070/ – web UI of the NameNode
  • http://localhost:50030/ – web UI of the JobTracker
  • http://localhost:50060/ – web UI of the TaskTracker

At the moment these will not show much information, as we have not submitted any job for execution whose progress we could observe.

We can stop the cluster at any time with the following command.
hpuser@pushpalanka-laptop:~/hadoop-1.0.3/bin$ ./stop-all.sh 
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode

Running a Map-reduce Job

Let's try to get some work done using Hadoop. We can use the word count example distributed with Hadoop, as explained at the Hadoop wiki. In brief, it takes some text files as input, counts the number of repetitions of distinct words (mapping each line to a mapper) and outputs the reduced result as a text file.
First create a folder and copy into it some .txt files containing several tens of thousands of words. We are going to count the repetitions of the distinct words in those.
hpuser@pushpalanka-laptop:~/tempHadoop$ ls -l
total 1392
-rw-r--r-- 1 hpuser hadoop 1423810 2012-09-21 02:10 pg5000.txt

Then restart the cluster, using the start-all.sh script as done before, and copy the sample input file to HDFS.
hpuser@pushpalanka-laptop:~/hadoop-1.0.3$ bin/hadoop dfs -copyFromLocal /home/hpuser/tempHadoop/ /user/hpuser/testHadoop

Let's check whether it has been copied to HDFS correctly,
hpuser@pushpalanka-laptop:~/hadoop-1.0.3$ bin/hadoop dfs -ls /user/hpuser/testHadoop
Found 1 items
-rw-r--r--   1 hpuser supergroup    1423810 2012-09-21 02:17 /user/hpuser/testHadoop/pg5000.txt

Now the inputs are ready. Let's run the map-reduce job. For this we use a jar distributed with Hadoop which does the needful; you can later refer to it to learn how things are done.
hpuser@pushpalanka-laptop:~/hadoop-1.0.3$ bin/hadoop jar hadoop*examples*.jar wordcount /user/hpuser/testHadoop /user/hpuser/testHadoop-output
12/09/21 02:24:34 INFO input.FileInputFormat: Total input paths to process : 1
12/09/21 02:24:34 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/09/21 02:24:34 WARN snappy.LoadSnappy: Snappy native library not loaded
12/09/21 02:24:34 INFO mapred.JobClient: Running job: job_201209210216_0003
12/09/21 02:24:35 INFO mapred.JobClient:  map 0% reduce 0%
12/09/21 02:24:51 INFO mapred.JobClient:  map 100% reduce 0%
12/09/21 02:25:06 INFO mapred.JobClient:  map 100% reduce 100%
12/09/21 02:25:11 INFO mapred.JobClient: Job complete: job_201209210216_0003
12/09/21 02:25:11 INFO mapred.JobClient: Counters: 29
12/09/21 02:25:11 INFO mapred.JobClient:   Job Counters 
12/09/21 02:25:11 INFO mapred.JobClient:     Launched reduce tasks=1
12/09/21 02:25:11 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=17930
12/09/21 02:25:11 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/09/21 02:25:11 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/09/21 02:25:11 INFO mapred.JobClient:     Launched map tasks=1
12/09/21 02:25:11 INFO mapred.JobClient:     Data-local map tasks=1
12/09/21 02:25:11 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14153
12/09/21 02:25:11 INFO mapred.JobClient:   File Output Format Counters 
12/09/21 02:25:11 INFO mapred.JobClient:     Bytes Written=337639
12/09/21 02:25:11 INFO mapred.JobClient:   FileSystemCounters
12/09/21 02:25:11 INFO mapred.JobClient:     FILE_BYTES_READ=466814
12/09/21 02:25:11 INFO mapred.JobClient:     HDFS_BYTES_READ=1423931
12/09/21 02:25:11 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=976811
12/09/21 02:25:11 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=337639
12/09/21 02:25:11 INFO mapred.JobClient:   File Input Format Counters 
12/09/21 02:25:11 INFO mapred.JobClient:     Bytes Read=1423810
12/09/21 02:25:11 INFO mapred.JobClient:   Map-Reduce Framework
12/09/21 02:25:11 INFO mapred.JobClient:     Map output materialized bytes=466814
12/09/21 02:25:11 INFO mapred.JobClient:     Map input records=32121
12/09/21 02:25:11 INFO mapred.JobClient:     Reduce shuffle bytes=466814
12/09/21 02:25:11 INFO mapred.JobClient:     Spilled Records=65930
12/09/21 02:25:11 INFO mapred.JobClient:     Map output bytes=2387668
12/09/21 02:25:11 INFO mapred.JobClient:     CPU time spent (ms)=9850
12/09/21 02:25:11 INFO mapred.JobClient:     Total committed heap usage (bytes)=167575552
12/09/21 02:25:11 INFO mapred.JobClient:     Combine input records=251352
12/09/21 02:25:11 INFO mapred.JobClient:     SPLIT_RAW_BYTES=121
12/09/21 02:25:11 INFO mapred.JobClient:     Reduce input records=32965
12/09/21 02:25:11 INFO mapred.JobClient:     Reduce input groups=32965
12/09/21 02:25:11 INFO mapred.JobClient:     Combine output records=32965
12/09/21 02:25:11 INFO mapred.JobClient:     Physical memory (bytes) snapshot=237834240
12/09/21 02:25:11 INFO mapred.JobClient:     Reduce output records=32965
12/09/21 02:25:11 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=778846208
12/09/21 02:25:11 INFO mapred.JobClient:     Map output records=251352

While the job is running, if we try out the previously mentioned web interfaces we can observe the progress and the utilization of resources in a summarized way. As given in the command, the output file that carries the word counts is written to '/user/hpuser/testHadoop-output'.
hpuser@pushpalanka-laptop:~/hadoop-1.0.3$ bin/hadoop dfs -ls /user/hpuser/testHadoop-output
Found 3 items
-rw-r--r--   1 hpuser supergroup          0 2012-09-21 02:25 /user/hpuser/testHadoop-output/_SUCCESS
drwxr-xr-x   - hpuser supergroup          0 2012-09-21 02:24 /user/hpuser/testHadoop-output/_logs
-rw-r--r--   1 hpuser supergroup     337639 2012-09-21 02:25 /user/hpuser/testHadoop-output/part-r-00000

To see what is inside the file, let's get it copied to the local file system.
hpuser@pushpalanka-laptop:~/hadoop-1.0.3$ bin/hadoop dfs -getmerge /user/hpuser/testHadoop-output /home/hpuser/tempHadoop/out 
12/09/21 02:38:10 INFO util.NativeCodeLoader: Loaded the native-hadoop library

Here the getmerge option is used to merge the output in case there are several files; then the HDFS location of the output folder and the desired place for the output in the local file system are given. Now you can browse to the given output folder and open the result file, which will look something like the following, depending on your input file.
"1      1
"1,"    1
"35"    1
"58,"   1
"AS".   1
"Apple  1
"Abs    1
"Ah!    1

Now we have completed setting up a single node cluster with Apache Hadoop and running a map-reduce job. In a next post I'll share how to set up a multi-node cluster.
Cheers!

Original Source

Hadoop Single Node Set-up - Guchex
By Pushpalanka Jayawardhana

Comments:

  3. I have recently heard of hadoop. Wanted to pursue career in this booming new field.. Ur blog is very educative. Pls continue with good work..

  4. Awesome blog Pushpalanka, keep it up! One minor fix, when running the hadoop example I think it should be "./bin/hadoop ..." (notice the {dot}{slash}), but that's a minor fix. Great job once again.

    1. Thanks Shariq... :) Requirement of . depends on the folder we already in, as far as I know. It

  5. I have installed hadoop previously. your blog is simple and understandable. In between I have few exceptions while running the map reduce job.
    12/09/21 02:24:34 WARN snappy.LoadSnappy: Snappy native library not loaded
    How to solve this warning. And what is the reason for this warning.
    thanks in advance

    1. Thanks and sorry I couldn't help you earlier. I haven't come around this error. But seems this may help for this kind of error, http://stackoverflow.com/questions/10878038/warn-snappy-loadsnappy-snappy-native-library-not-loaded

  7. Excellent blog......very informative and helpful! Thanks a lot.

  8. it's a nice blog, very helpful for us and thank's for sharing. we are providing Hadoop online training

  10. Hi ,

    Can you explain me this syntax ,
    Why are we using jar soon after bin/hadoop
    What does hadoop*examples*.jar means..?
    Do wordcount is name of the job, or we asking hadoop to count no of words..?

    bin/hadoop jar hadoop*examples*.jar wordcount /user/hpuser/testHadoop /user/hpuser/testHadoop-output

    1. Hi,

      This is the syntax,
      bin/hadoop jar hadoop-*-examples.jar wordcount [-m <#maps>] [-r <#reducers>] as provided in the Hadoop wiki(http://wiki.apache.org/hadoop/WordCount). Here the maps and reducers are not defined. We are executing the wordcount sample which is packed into a jar, with the provided options(Those parameters must have been read within the program and dealt internally). 'wordcount' is the class name of the implementation.

      Following link will help in understanding this code and command,
      [1] - http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Source+Code

  13. Hi Pushpa ,

    Need your help i am trying to setup a single node hadoop machine in fedora
    i had done installing the Java and hadoop but when i am trying to set the path in /.bashrc

    i am getting below error

    /usr/bin/which: no hadoop in (/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/home/trivenigk/.local/bin:/home/trivenigk/bin)

    my settings in the /.bashrc are like this

    export Java_Home=/opt/jdk1.7
    export Hadoop_Home=/home/trivenigk/hadoop
    export PATH=$PATH:$Java_Home/bin:$Hadoop_Home/bin

    1. Hi Later,

      Can you double check whether the hadoop is the extracted folder of your downloaded pack. Normally this folder name ends up with version number if you do not rename it.

  14. Hello Pushpa,
    I have configured the Single node hadoop setup.
    it is showing me following status when I check for the namenode
    Cluster Summary
    7 files and directories, 0 blocks = 7 total. Heap Size is 31.57 MB / 966.69 MB (3%)
    Configured Capacity : 0 KB
    DFS Used : 0 KB
    Non DFS Used : 0 KB
    DFS Remaining : 0 KB
    DFS Used% : 100 %
    DFS Remaining% : 0 %
    Live Nodes : 0
    Dead Nodes : 0
    Decommissioning Nodes : 0
    Number of Under-Replicated Blocks : 0

    There are no datanodes in the cluster

    and for the port 50030 it shows nodes count to 0
    so can u please help me out with this?

    1. Hi Vyankatesh,

      What is the jps output you get, while running Hadoop?

  15. hi puspha

    i am getting a following error when i run the wordcount problem can you tell me the solution

    13/09/17 07:13:59 WARN snappy.LoadSnappy: Snappy native library not loaded

  17. Hi Pushpa,
    The problem was solved by editing the hosts file.
    Thanks for your reply!

  18. hi I have a small project for the implementation of an architecture a website similar to youtube
    where I have same worries. If you can help me understanding the architecture of this project i will
    be glad so if that is ok with you I will send to you the project file. and thank you

  22. Pushp your blog are appreciable. I cant find issue when install hadoop virtual machine 32 bit in my laptop.So please help me and I want to more information regards to <a href="http:intellipaat.com/:>hadoop tutorial</a> and single and multi node cluster

  26. Hi Madam,

    The article was too good and simple.I am trying the same example, When i am starting the node i am getting the below eeror. please help me in below error.

    hduser@pyro-HP-ProBook-4440s:~$ sh /usr/local/hadoop/sbin/start-all.sh
    This script is Deprecated. Instead use start-dfs.sh and start-mapred.sh
    /usr/local/hadoop/sbin/start-all.sh: 28: .: 3: Too many open files

    Regards,
    Rama

  29. Hi, thanks for this tutorial for setting up single node cluster.
    I have successfully set up a single noded cluster. jps shows all the process are running. But however the namenode administration shows zero live nodes. and to my surprise i have also made directories in HDFS and it can be accessed via this administration.

    Please help as to why it is still showing live nodes 0
    Thanks in advance.

  30. Hi Pushpalanka,

    I will be very much thankful to you, if you can provide answers to my questions.

    1) After untar the hadoop 2.2.0, it shows many folders inside

    there are many bin folders inside many subfolders of hadoop main folder, please specify the exact path

    In my setup I am using the following path:-

    export PATH=$PATH:$HADOOP_INSTALL/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/bin
    export PATH=$PATH:$HADOOP_INSTALL/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/sbin

    Is this ok?

    Also you have given this path ~/hadoop/etc/hadoop/core-site.xml for making changes, but in actual I am using the following path :-
    /home/adminuser/hdfs/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml

    Is this ok?

    Likewise I am modifying the configuration settings for other xml files:-
    /home/adminuser/hdfs/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/hadoop-env.sh
    /home/adminuser/hdfs/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/yarn-site.xml
    /home/adminuser/hdfs/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/mapred-site.xml
    /home/adminuser/hdfs/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/hdfs-site.xml

    likewise in .bashrc file I have set the environment varibales like this:-

    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
    export HADOOP_INSTALL=/home/adminuser/hdfs
    export PATH=$PATH:$HADOOP_INSTALL/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/bin
    export PATH=$PATH:$HADOOP_INSTALL/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/sbin
    export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
    export HADOOP_COMMON_HOME=$HADOOP_INSTALL
    export HADOOP_HDFS_HOME=$HADOOP_INSTALL
    export YARN_HOME=$HADOOP_INSTALL
    #export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
    #export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/hadoop-tools/hadoop-di$"
    #HADOOP VARIABLES END

    after doing all this when I execute hadoop version cmd then I get the following error message:-

    Error: Could not find or load main class org.apache.hadoop.util.VersionInfo

    when I execute hadoop namenode -format cmd then I get the following error message:-

    Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode

    when I type java -version then the output is:-

    java version "1.7.0_51"
    OpenJDK Runtime Environment (IcedTea 2.4.4) (7u51-2.4.4-0ubuntu0.13.10.1)
    OpenJDK Client VM (build 24.45-b08, mixed mode, sharing)

    Please help me in properly configuring hadoop, I am new to Ubuntu 13.10

    Regards,
    Varun

    1. Could you able to resolve it, I am getting same error

  31. How to add the following property segments in the relevant file inside the <configuration>......</configuration> tags?
    conf/core-site.xml
    <property>
      <name>hadoop.tmp.dir</name>
      <value>path to temp_directory</value>
      <description>Location for HDFS.</description>
    </property>

    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:54310</value>
      <description>The name of the default file system. A URI whose
      scheme and authority determine the FileSystem implementation.</description>
    </property>