# Hadoop Environment Setup

**Repository Path**: whitejavadog/hadoophuanjingdajian

## Basic Information

- **Project Name**: hadoop环境搭建
- **Description**: Hadoop environment setup: 1. standalone mode 2. pseudo-distributed mode 3. fully distributed mode
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2017-05-06
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Hadoop Environment Setup

### Building multiple VMware virtual machines for a cluster, load balancing, and a master-slave database

Requirements: CentOS 6.5 64-bit, VMware, a hadoop-2.x tar.gz package.

- Install the CentOS virtual machines.

- Network configuration (desktop edition)

1. Use NAT mode.
2. Open the VMware network settings: click Edit and select Virtual Network Editor:

![Open the Virtual Network Editor](https://git.oschina.net/uploads/images/2017/0506/121358_efea7e67_642486.png)

The following window appears:

![Virtual Network Editor window](https://git.oschina.net/uploads/images/2017/0506/121430_c43d4e74_642486.png)

Click "Change Settings" to obtain administrator privileges:

![Change Settings](https://git.oschina.net/uploads/images/2017/0506/121503_525c2e4d_642486.png)

(By default VMware's VMnet8 uses NAT mode and VMnet1 uses host-only mode.) Select VMnet8:

![Select VMnet8](https://git.oschina.net/uploads/images/2017/0506/121529_1c68914e_642486.png)

Click "NAT Settings" to see the gateway address; write it down:

![NAT Settings showing the gateway](https://git.oschina.net/uploads/images/2017/0506/121544_f24e116b_642486.png)

Back in the Virtual Network Editor, uncheck the "Use local DHCP service…" option.

In the desktop CentOS 6.x system, open Network Connections:

![Network Connections](https://git.oschina.net/uploads/images/2017/0506/121606_1a1782a0_642486.png)

Double-click "System eth0":

![System eth0](https://git.oschina.net/uploads/images/2017/0506/121617_fdfc6498_642486.png)

The following window appears; edit the eth0 interface:

![Editing System eth0](https://git.oschina.net/uploads/images/2017/0506/121647_8408e4a1_642486.png)

Select IPv4 Settings and set:

- Method: Manual
- Address: 192.168.17.10 (the static IP address you assign)
- Netmask: 255.255.255.0 (the subnet mask)
- Gateway: 192.168.17.2 (the gateway you saw in VMware)

![IPv4 Settings](https://git.oschina.net/uploads/images/2017/0506/121700_b25edbef_642486.png)

- Network configuration (command-line edition)

First list all network interfaces with ifconfig:

```
# ifconfig
```

![ifconfig output](https://git.oschina.net/uploads/images/2017/0506/121722_8f0f9438_642486.png)

The first interface is normally eth0. Now configure it:

```
# vi /etc/sysconfig/network-scripts/ifcfg-eth0
```

The finished file looks like this:

![ifcfg-eth0 after editing](https://git.oschina.net/uploads/images/2017/0506/121735_cf154d9d_642486.png)

IPADDR is the IP address, GATEWAY is the gateway, and NETMASK is the subnet mask. The gateway is again 192.168.17.2, the one seen on the host side. Also add ONBOOT=yes, which brings the interface up at boot. BOOTPROTO=static is required as well; it means the address is configured manually (dhcp would mean the IP is obtained automatically). Edit the file with vi as shown in the screenshot, then save and exit.

Next set the DNS server:

```
# vi /etc/resolv.conf
```

and add:

```
nameserver 192.168.129.2
```

![resolv.conf](https://git.oschina.net/uploads/images/2017/0506/121750_5986a531_642486.png)

Restart the network with `service network restart`, then run `ping www.baidu.com`. If the ping succeeds, the configuration works.

Note: when connecting to the CentOS VM with Xshell, the VM could not be pinged at first. It turned out that the host's VMnet8 adapter was set to obtain an IP address automatically via DHCP; after switching it to "Use the following IP address" and giving it an address in the VMware gateway's subnet, the VM could be pinged.

![Host VMnet8 static IP settings](https://git.oschina.net/uploads/images/2017/0506/121810_912d962e_642486.png)

- Cloning the host

Postscript: a cloned machine cannot reach the network, because after cloning VMware automatically renames the NIC to eth1 while our existing configuration is for eth0.

Fix: run `ifconfig -a` to list the NICs. If the interface shows up as eth1, write down its MAC address (the HWaddr value).

![ifconfig -a showing eth1](https://git.oschina.net/uploads/images/2017/0506/121842_04d2b73a_642486.png)

```
# cd /etc/sysconfig/network-scripts
```

Rename the old eth0 file to eth1:

```
# mv ifcfg-eth0 ifcfg-eth1
```

Then edit its DEVICE and IPADDR:

```
# vi ifcfg-eth1
```

The result looks like this:

![ifcfg-eth1 after editing](https://git.oschina.net/uploads/images/2017/0506/121854_540f0059_642486.png)

Change eth0 on the first line to eth1, change IPADDR, and change the MAC address on the third line to the one you just recorded. Save, then run `service network restart` and the change takes effect.
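For reference, a minimal sketch of what the edited ifcfg-eth1 could look like on CentOS 6. The IPADDR and HWADDR values below are placeholders for illustration; substitute the static IP you chose for the clone and the MAC address recorded from `ifconfig -a`:

```
DEVICE=eth1
HWADDR=00:0C:29:XX:XX:XX   # placeholder: the MAC address recorded from ifconfig -a
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.17.11       # placeholder: the clone's static IP
NETMASK=255.255.255.0
GATEWAY=192.168.17.2
```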
- Passwordless SSH login

```
Host A (hadoop): 192.168.17.136
Host B (hadoop): 192.168.17.135
```

Host A should be able to log in to both host A and host B without a password.

First make sure the firewall is disabled on every host.

Edit the local sshd configuration file:

```
# vi /etc/ssh/sshd_config
```

Find the following lines and remove the leading "#" comment marker:

```
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
```

![sshd_config](https://git.oschina.net/uploads/images/2017/0506/121951_04362678_642486.png)

Generate the local RSA key pair (run this command on every machine so that each generates its own key pair):

```
# ssh-keygen -t rsa
```

![ssh-keygen](https://git.oschina.net/uploads/images/2017/0506/122006_39bf29ac_642486.png)

This command generates a key pair for the hadoop user on this host. When asked for the save path, press Enter to accept the default; when prompted for a passphrase, press Enter again to leave it empty. The generated key pair, id_rsa and id_rsa.pub, is stored in /home/hadoop/.ssh by default. Then copy the contents of id_rsa.pub into the /home/hadoop/.ssh/authorized_keys file on every machine (including the local one). If a machine already has an authorized_keys file, append the contents of id_rsa.pub to the end of it; if it does not, simply copy the file over.

Open the .ssh directory; it contains the key files:

```
# cd .ssh
# ls
```

![Contents of .ssh](https://git.oschina.net/uploads/images/2017/0506/122028_cf59bb3a_642486.png)

- Private key file: id_rsa
- Public key file: id_rsa.pub

Put the contents of each machine's public key file id_rsa.pub into the authorized_keys of every machine (make sure every machine's RSA key is included):

![authorized_keys](https://git.oschina.net/uploads/images/2017/0506/122107_4e87f373_642486.png)

Fix the permissions:

```
# chmod 700 ~/.ssh
# chmod 600 ~/.ssh/authorized_keys
```

Test:

```
# ssh 192.168.17.136
```

![ssh to 192.168.17.136](https://git.oschina.net/uploads/images/2017/0506/122246_d3cb59cf_642486.png)

```
# ssh 192.168.17.135
```

![ssh to 192.168.17.135](https://git.oschina.net/uploads/images/2017/0506/122307_1aa88dde_642486.png)
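If you prefer not to append to authorized_keys by hand, the same key distribution can also be done with `ssh-copy-id` (part of openssh-clients on CentOS 6). This is an alternative sketch, not part of the original walkthrough; the IPs are simply the two hosts used above:

```
# run on host A as the hadoop user, after ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@192.168.17.136   # host A itself
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@192.168.17.135   # host B
```

ssh-copy-id appends the public key to the remote authorized_keys file, which is equivalent to the manual copy described above; repeat it from every machine that needs passwordless access.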
- Hadoop installation and configuration

Environment: CentOS 6.4 64-bit, JDK 1.7 64-bit.

Hadoop has three running modes:

- Standalone mode: no daemons at all; everything runs in a single JVM.
- Pseudo-distributed mode: the daemons all run on the local machine.
- Fully distributed mode: the daemons run across a cluster.

Cluster layout: one master and two slaves, with hostnames node1, node2, and node3.

| Hostname | IP            | Role                  |
|----------|---------------|-----------------------|
| node1    | 192.168.17.30 | NameNode, JobTracker  |
| node2    | 192.168.17.31 | DataNode, TaskTracker |
| node3    | 192.168.17.32 | DataNode, TaskTracker |

Installation:

1. Set the hostnames and the hosts file.

Set the hostname on each of the three machines: `hostname node1` (node2 and node3 accordingly).

Edit the hosts file:

```
# vim /etc/hosts
```

and add the following on all three machines:

```
192.168.17.30 node1
192.168.17.31 node2
192.168.17.32 node3
```

Then make sure the machines can ping each other.

2. Create a hadoop user on every machine:

```
# useradd hadoop    # create the hadoop user
# passwd hadoop     # set its password
```

3. Set up passwordless SSH login (see the previous section).

4. Install Hadoop.

Place the Hadoop tarball under the hadoop user's home directory and unpack it:

```
# cd /home/hadoop
```

![Home directory](https://git.oschina.net/uploads/images/2017/0506/122623_6114255b_642486.png)

```
# mkdir Hadoop    # create a directory to hold the hadoop tarball
```

![Hadoop directory](https://git.oschina.net/uploads/images/2017/0506/122641_e6c2b875_642486.png)

```
# tar zxvf hadoop-2.5.1_x64.tar.gz
# mv hadoop-2.5.1_x64 hadoop
```

Edit the environment variables:

```
# vim /etc/profile
```

![/etc/profile](https://git.oschina.net/uploads/images/2017/0506/122709_3973e1ca_642486.png)

Make the changes take effect:

```
# source /etc/profile
```

5. Configure Hadoop.

```
# cd Hadoop
```

![Hadoop directory listing](https://git.oschina.net/uploads/images/2017/0506/122744_4b5e7bcc_642486.png)

```
# cd etc
```

![etc directory listing](https://git.oschina.net/uploads/images/2017/0506/122757_73ad3d08_642486.png)

```
# cd hadoop
```

![Configuration directory listing](https://git.oschina.net/uploads/images/2017/0506/122809_d3c1e0de_642486.png)

1) Edit core-site.xml:

```
# vim core-site.xml
```

```
<configuration>
  <property><name>hadoop.tmp.dir</name><value>/home/hadoop/Hadoop/Hadoop/tmp</value><description>A base for other temporary directories.</description></property>
  <property><name>fs.defaultFS</name><value>hdfs://node1:9000</value></property>
  <property><name>io.file.buffer.size</name><value>4096</value></property>
</configuration>
```

2) Edit hdfs-site.xml:

```
# vim hdfs-site.xml
```

```
<configuration>
  <property><name>dfs.nameservices</name><value>hadoop-cluster1</value></property>
  <property><name>dfs.namenode.secondary.http-address</name><value>node1:50090</value></property>
  <property><name>dfs.namenode.name.dir</name><value>file:/home/hadoop/dfs/name</value></property>
  <property><name>dfs.datanode.data.dir</name><value>file:/home/hadoop/dfs/data</value></property>
  <property><name>dfs.replication</name><value>2</value></property>
  <property><name>dfs.webhdfs.enabled</name><value>true</value></property>
</configuration>
```

3) Edit mapred-site.xml:

```
# vim mapred-site.xml
```

```
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  <property><name>mapreduce.jobtracker.http.address</name><value>nameNode:50030</value></property>
  <property><name>mapreduce.jobhistory.address</name><value>nameNode:10020</value></property>
  <property><name>mapreduce.jobhistory.webapp.address</name><value>nameNode:19888</value></property>
</configuration>
```

4) Edit yarn-site.xml:

```
# vim yarn-site.xml
```

```
<configuration>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.resourcemanager.address</name><value>nameNode:8032</value></property>
  <property><name>yarn.resourcemanager.scheduler.address</name><value>nameNode:8030</value></property>
  <property><name>yarn.resourcemanager.resource-tracker.address</name><value>nameNode:8031</value></property>
  <property><name>yarn.resourcemanager.admin.address</name><value>nameNode:8033</value></property>
  <property><name>yarn.resourcemanager.webapp.address</name><value>nameNode:8088</value></property>
</configuration>
```

5) Add the slave nodes to the slaves file:

```
# vim slaves
```

![slaves file](https://git.oschina.net/uploads/images/2017/0506/123148_bc7c9c86_642486.png)

6) Set JAVA_HOME.

Add the JAVA_HOME setting to both hadoop-env.sh and yarn-env.sh:

```
# vim hadoop-env.sh
# vim yarn-env.sh
```

![hadoop-env.sh](https://git.oschina.net/uploads/images/2017/0506/131554_a2bb2d1e_642486.png)

![yarn-env.sh](https://git.oschina.net/uploads/images/2017/0506/131601_534b0df1_642486.png)
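For reference, a minimal sketch of the environment-variable lines involved in steps 4 and 6 (the screenshots above show the originals). The JDK path /usr/java/jdk1.7.0_79 and the Hadoop path /home/hadoop/Hadoop/hadoop are assumptions for illustration; use the actual paths on your machines:

```
# appended to /etc/profile (step 4) -- paths are illustrative
export JAVA_HOME=/usr/java/jdk1.7.0_79
export HADOOP_HOME=/home/hadoop/Hadoop/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# added to hadoop-env.sh and yarn-env.sh (step 6)
export JAVA_HOME=/usr/java/jdk1.7.0_79
```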
Format the filesystem:

```
# ./hdfs namenode -format
```

Output:

```
17/02/19 03:37:03 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = node1/192.168.17.30
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.5.1
STARTUP_MSG:   classpath = /home/hadoop/hadoop/hadoop/etc/hadoop:/ .........
STARTUP_MSG:   build = Unknown -r Unknown; compiled by 'root' on 2014-10-20T05:53Z
STARTUP_MSG:   java = 1.7.0_79
************************************************************/
17/02/19 03:37:03 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
17/02/19 03:37:03 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-4f84cd26-dfdf-48ad-b7c8-5a18bf10e906
17/02/19 03:37:05 INFO namenode.FSNamesystem: fsLock is fair:true
17/02/19 03:37:05 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
17/02/19 03:37:05 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
17/02/19 03:37:05 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
17/02/19 03:37:05 INFO blockmanagement.BlockManager: The block deletion will start around 2017 Feb 19 03:37:05
17/02/19 03:37:05 INFO util.GSet: Computing capacity for map BlocksMap
17/02/19 03:37:05 INFO util.GSet: VM type       = 64-bit
17/02/19 03:37:05 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
17/02/19 03:37:05 INFO util.GSet: capacity      = 2^21 = 2097152 entries
17/02/19 03:37:05 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
17/02/19 03:37:05 INFO blockmanagement.BlockManager: defaultReplication         = 2
17/02/19 03:37:05 INFO blockmanagement.BlockManager: maxReplication             = 512
17/02/19 03:37:05 INFO blockmanagement.BlockManager: minReplication             = 1
17/02/19 03:37:05 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
17/02/19 03:37:05 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
17/02/19 03:37:05 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
17/02/19 03:37:05 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
17/02/19 03:37:05 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
17/02/19 03:37:05 INFO namenode.FSNamesystem: fsOwner             = hadoop (auth:SIMPLE)
17/02/19 03:37:05 INFO namenode.FSNamesystem: supergroup          = supergroup
17/02/19 03:37:05 INFO namenode.FSNamesystem: isPermissionEnabled = true
17/02/19 03:37:05 INFO namenode.FSNamesystem: Determined nameservice ID: hadoop-cluster1
17/02/19 03:37:05 INFO namenode.FSNamesystem: HA Enabled: false
17/02/19 03:37:05 INFO namenode.FSNamesystem: Append Enabled: true
17/02/19 03:37:06 INFO util.GSet: Computing capacity for map INodeMap
17/02/19 03:37:06 INFO util.GSet: VM type       = 64-bit
17/02/19 03:37:06 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
17/02/19 03:37:06 INFO util.GSet: capacity      = 2^20 = 1048576 entries
17/02/19 03:37:06 INFO namenode.NameNode: Caching file names occuring more than 10 times
17/02/19 03:37:06 INFO util.GSet: Computing capacity for map cachedBlocks
17/02/19 03:37:06 INFO util.GSet: VM type       = 64-bit
17/02/19 03:37:06 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
17/02/19 03:37:06 INFO util.GSet: capacity      = 2^18 = 262144 entries
17/02/19 03:37:06 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.990000128746033
17/02/19 03:37:06 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
17/02/19 03:37:06 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 3000
17/02/19 03:37:06 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
17/02/19 03:37:06 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
17/02/19 03:37:06 INFO util.GSet: Computing capacity for map NameNodeRetryCache
17/02/19 03:37:06 INFO util.GSet: VM type       = 64-bit
17/02/19 03:37:06 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
17/02/19 03:37:06 INFO util.GSet: capacity      = 2^15 = 32768 entries
17/02/19 03:37:06 INFO namenode.NNConf: ACLs enabled? false
17/02/19 03:37:06 INFO namenode.NNConf: XAttrs enabled? true
17/02/19 03:37:06 INFO namenode.NNConf: Maximum size of an xattr: 16384
17/02/19 03:37:06 INFO namenode.FSImage: Allocated new BlockPoolId: BP-649073677-192.168.17.30-1487504226438
17/02/19 03:37:06 INFO common.Storage: Storage directory /home/hadoop/hadoop/hadoop/dfs/name has been successfully formatted.
17/02/19 03:37:07 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
17/02/19 03:37:07 INFO util.ExitUtil: Exiting with status 0
17/02/19 03:37:07 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node1/192.168.17.30
************************************************************/
```

Start the daemons:

```
[root@nameNode sbin]# ./start-dfs.sh
[root@nameNode sbin]# ./start-yarn.sh
```

![start-dfs.sh output](https://git.oschina.net/uploads/images/2017/0506/131626_cc5168cd_642486.png)

![start-yarn.sh output](https://git.oschina.net/uploads/images/2017/0506/131633_9e705cc9_642486.png)

Stop them:

```
[root@nameNode sbin]# ./stop-dfs.sh
[root@nameNode sbin]# ./stop-yarn.sh
```

Check the running processes:

```
# jps
```

![jps output](https://git.oschina.net/uploads/images/2017/0506/131700_d773c051_642486.png)

Access the web UIs in a browser:

```
http://192.168.17.30:50070/
http://192.168.17.30:8088/
```

![HDFS web UI](https://git.oschina.net/uploads/images/2017/0506/131707_e9edfee0_642486.png)

![YARN web UI](https://git.oschina.net/uploads/images/2017/0506/131713_f7918005_642486.png)

Copy the Hadoop installation to the other nodes:

```
# scp -r hadoop hadoop@node2:/home/hadoop/Hadoop
# scp -r hadoop hadoop@node3:/home/hadoop/Hadoop
```

Check the Hadoop build information (native library): go to the Hadoop installation directory and run

```
# cd hadoop/lib/native
# file libhadoop.so.1.0.0
```

![file libhadoop.so.1.0.0 output](https://git.oschina.net/uploads/images/2017/0506/131722_fab48a0d_642486.png)

If files cannot be created (permission problems), fix ownership and permissions:

```
# chown -R hadoop:hadoop /usr/hadoop/tmp
# sudo chmod -R a+w /usr/local/Hadoop
```

### secondarynamenode fails to start during startup

Start each daemon individually:

```
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
hadoop-daemon.sh start secondarynamenode
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
mr-jobhistory-daemon.sh start historyserver
```
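Once the daemons are up, a quick smoke test can confirm that HDFS is usable from node1. This is a suggested check, not part of the original walkthrough; run it as the hadoop user and adjust the paths as you like:

```
# report the live DataNodes and cluster capacity
hdfs dfsadmin -report

# create a directory, upload a local file, and list it back
hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -put /etc/hosts /user/hadoop/
hdfs dfs -ls /user/hadoop
```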