Install Hadoop on Ubuntu | explained

Parmanand
3 min read · Oct 7, 2020

To install Hadoop on Ubuntu, follow these steps:

1. Install Java (version 8) on your machine and find out its installation path.

For example : /usr/lib/jvm/java-8-openjdk-amd64/
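
A minimal way to do this on Ubuntu (assuming the openjdk-8-jdk package is available) is:

$sudo apt-get update
$sudo apt-get install openjdk-8-jdk
$readlink -f $(which java)

The last command prints the full path of the java binary (e.g. /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java); the JVM directory you need is the part before /jre/bin/java.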

2. Download hadoop-3.1.3.tar.gz from http://mirrors.estointernet.in/apache/hadoop/common/hadoop-3.1.3
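
Assuming the tarball sits directly under that directory on the mirror, it can be fetched with wget:

$wget http://mirrors.estointernet.in/apache/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz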

3. Install pdsh on Ubuntu (a tool that lets you issue the same command on multiple hosts at once)

sudo apt-get install pdsh

4. Create a separate user for Hadoop ecosystem

$sudo adduser hadoop

5. Modify sudoers file to add hadoop user

$su root
$visudo
#add below line in the file
hadoop ALL=(ALL:ALL) ALL

6. Generate keys for passwordless ssh connection

$su hadoop
$ssh-keygen -t rsa
$cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$chmod 0600 ~/.ssh/authorized_keys

7. Check ssh connection

$ssh localhost
$exit

If it shows the error "ssh: connect to host localhost port 22: Connection refused", follow https://askubuntu.com/questions/856771/how-to-start-listening-on-port-22
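
In most cases this error simply means the SSH server is not installed or not running; on a standard Ubuntu setup the fix is usually:

$sudo apt-get install openssh-server
$sudo service ssh start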

8. Working with Hadoop files

→ Uncompress the downloaded Hadoop tarball

→ Move the extracted hadoop folder to /home/hadoop (the hadoop user's home directory)
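
A minimal sketch, assuming the tarball is in the current directory and the folder is renamed to match the HADOOP_HOME path used in the next step:

$tar -xzf hadoop-3.1.3.tar.gz
$mv hadoop-3.1.3 /home/hadoop/hadoop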

9. Now set up the environment variables for Hadoop. Edit the ~/.bashrc file and append the following lines.

→ This file is executed whenever you open a new terminal, so after adding the lines below, reopen your terminal or use the source command.

export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export PDSH_RCMD_TYPE=ssh
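
To apply the changes in the current shell and verify them:

$source ~/.bashrc
$echo $HADOOP_HOME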

10. Modify hadoop-env.sh

$cd $HADOOP_HOME/etc/hadoop
$vi hadoop-env.sh
--Add following lines in this file
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
export HDFS_NAMENODE_USER="hadoop"
export HDFS_DATANODE_USER="hadoop"
export HDFS_SECONDARYNAMENODE_USER="hadoop"
export YARN_RESOURCEMANAGER_USER="hadoop"
export YARN_NODEMANAGER_USER="hadoop"

11. Modify core-site.xml

$cd $HADOOP_HOME/etc/hadoop
$vi core-site.xml
--Add following lines inside the <configuration></configuration> tag
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

12. Modify hdfs-site.xml

This file contains the NameNode and DataNode paths on your local file system, i.e. the directories where HDFS stores its metadata and data blocks.

$cd $HADOOP_HOME/etc/hadoop
$vi hdfs-site.xml
--Add following lines in between the <configuration></configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/datanode</value>
</property>
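
These directories should exist before the daemons start; assuming you are logged in as the hadoop user, create them with:

$mkdir -p /home/hadoop/namenode /home/hadoop/datanode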

13. Modify mapred-site.xml

$cd $HADOOP_HOME/etc/hadoop
$vi mapred-site.xml
--Add following lines in between the <configuration></configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

14. Modify yarn-site.xml

$cd $HADOOP_HOME/etc/hadoop
$vi yarn-site.xml
--Add following lines in between the <configuration></configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
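
Before starting HDFS for the very first time, the NameNode storage directory normally has to be formatted (this assumes the environment variables from step 9 are loaded in your shell):

$hdfs namenode -format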

15. Now it's time to start all Hadoop services

$cd $HADOOP_HOME/sbin/
$sudo ./start-all.sh
or
$sudo ./start-dfs.sh
$sudo ./start-yarn.sh
--list all running processes
$jps

16. Check out Web UI of NameNode

http://localhost:9870

17. Check the web UIs of the other Hadoop components as well:
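
For Hadoop 3.x the default web UI ports are typically: NameNode on 9870, DataNode on 9864, ResourceManager on 8088, NodeManager on 8042, and the MapReduce JobHistory Server on 19888.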

18. Stop all Hadoop Services

$cd $HADOOP_HOME/sbin
$sudo ./stop-all.sh
