To install Hadoop on Ubuntu, follow the steps below:
1. Install Java (version 8) on your machine and note its installation path.
For example: /usr/lib/jvm/java-8-openjdk-amd64/
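A minimal way to do this from the command line (assuming Ubuntu's default repositories provide the openjdk-8-jdk package):
$sudo apt-get update
$sudo apt-get install openjdk-8-jdk
$readlink -f $(which java)    #prints a path like /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java; drop the trailing /jre/bin/java to get the JVM directory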
2. Download hadoop-3.1.3.tar.gz from http://mirrors.estointernet.in/apache/hadoop/common/hadoop-3.1.3
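If you prefer the command line, the same archive can be fetched with wget (the exact mirror path may change over time):
$wget http://mirrors.estointernet.in/apache/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz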
3. Install pdsh on Ubuntu (a tool that lets you issue the same command on multiple hosts at once):
sudo apt-get install pdsh
4. Create a separate user for the Hadoop ecosystem
$sudo adduser hadoop
5. Modify sudoers file to add hadoop user
$su root
$visudo    #add the line below to the file
hadoop ALL=(ALL:ALL) ALL
6. Generate keys for passwordless ssh connection
$su hadoop
$ssh-keygen -t rsa
$cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$chmod 0600 ~/.ssh/authorized_keys
7. Check ssh connection
$ssh localhost
$exit
If it shows the error "ssh: connect to host localhost port 22: Connection refused", please follow https://askubuntu.com/questions/856771/how-to-start-listening-on-port-22
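A common cause of that error is that no SSH server is installed or running; assuming the stock openssh-server package is acceptable, one fix is:
$sudo apt-get install openssh-server
$sudo systemctl start ssh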
8. Working with Hadoop files
→ Uncompress the downloaded Hadoop archive
→ Move the extracted Hadoop folder to /home/hadoop (the hadoop user's home directory), as in the sketch below
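A sketch of these two steps, assuming the archive sits in the current directory and the target directory matches the HADOOP_HOME used in the next step:
$tar -xzf hadoop-3.1.3.tar.gz
$mv hadoop-3.1.3 /home/hadoop/hadoop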
9. Next, set up the environment variables for Hadoop. Edit the ~/.bashrc file and append the following lines.
→ This file is executed whenever you open a new terminal, so after adding the lines below either reopen your terminal or run the source command shown after them.
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export PDSH_RCMD_TYPE=ssh
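After appending these lines, reload the file in your current shell:
$source ~/.bashrc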
10. Modify hadoop-env.sh
$cd $HADOOP_HOME/etc/hadoop
$vi hadoop-env.sh--Add the following lines in this file
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
export HDFS_NAMENODE_USER="hadoop"
export HDFS_DATANODE_USER="hadoop"
export HDFS_SECONDARYNAMENODE_USER="hadoop"
export YARN_RESOURCEMANAGER_USER="hadoop"
export YARN_NODEMANAGER_USER="hadoop"
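As a quick sanity check that PATH and JAVA_HOME are being picked up, the hadoop command itself should now run:
$hadoop version    #should report Hadoop 3.1.3 and its build information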
11. Modify core-site.xml
$cd $HADOOP_HOME/etc/hadoop
$vi core-site.xml
Add the following lines inside the <configuration></configuration> tags
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
12. Modify hdfs-site.xml
It contains the NameNode and DataNode directory paths on your local file system, i.e. where the NameNode keeps its metadata and where the DataNode stores data blocks.
$cd $HADOOP_HOME/etc/hadoop
$vi hdfs-site.xml--Add the following lines between the <configuration></configuration> tags
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/datanode</value>
</property>
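The NameNode and DataNode directories referenced above should exist and be writable by the hadoop user; Hadoop can create them itself, but creating them up front avoids permission surprises:
$mkdir -p /home/hadoop/namenode /home/hadoop/datanode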
13. Modify mapred-site.xml
$cd $HADOOP_HOME/etc/hadoop
$vi mapred-site.xml--Add the following lines between the <configuration></configuration> tags
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
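Note: the official Hadoop 3.x single-node guide also sets a MapReduce classpath in this file; if MapReduce jobs later fail with errors such as "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster", adding a property along these lines (values taken from that guide) may help:
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>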
14. Modify yarn-site.xml
$cd $HADOOP_HOME/etc/hadoop
$vi yarn-site.xml--Add the following lines between the <configuration></configuration> tags
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
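Before starting HDFS for the first time, format the NameNode; this initializes the dfs.name.dir directory configured above (run it as the hadoop user):
$hdfs namenode -format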
15. Now it's time to start all the Hadoop services. Run these commands as the hadoop user, for which passwordless SSH was configured in step 6.
$cd $HADOOP_HOME/sbin/
$./start-all.sh
or
$./start-dfs.sh
$./start-yarn.sh
List all the running Java processes:
$jps
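If everything started correctly, jps should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. A quick way to confirm HDFS is actually usable (the directory name /test here is just an example):
$hdfs dfs -mkdir /test
$hdfs dfs -ls /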
16. Check out the web UI of the NameNode (for Hadoop 3.x it is served at http://localhost:9870 by default).
17. Check the web UIs of the other components as well, for example the YARN ResourceManager at http://localhost:8088.
18. Stop all Hadoop Services
$cd $HADOOP_HOME/sbin
$./stop-all.sh