Hadoop as a File Server POC <3>: Nginx File Caching

Installing Nginx

  1. Download Nginx

     $ cd /app/nginx
     $ wget http://nginx.org/download/nginx-1.7.11.tar.gz
    
  2. Build Nginx

     $ cd /app/nginx
     $ tar zvxf nginx-1.7.11.tar.gz
     $ cd nginx-1.7.11
     $ ./configure
    
  3. Install Nginx

     $ make && make install
    

Configuring the Reverse Proxy

  1. Edit the configuration file

         $ cd /usr/local/nginx
         $ vi ./conf/nginx.conf
    
  2. Add the following inside server > location /

         add_header 'Access-Control-Allow-Origin' '*';
         add_header 'Access-Control-Allow-Credentials' 'true';
         add_header 'Access-Control-Allow-Methods' 'GET';
         proxy_pass http://192.168.55.64:8002/HDFSWeb/;
    
  3. Save the configuration

Configuring File Caching

  1. Edit the configuration file

         $ cd /usr/local/nginx
         $ vi ./conf/nginx.conf
    
  2. Add the following to the http block (note that proxy_cache_path is only valid at the http level)

       ##cache##
       proxy_connect_timeout 5;
       proxy_read_timeout 60;
       proxy_send_timeout 5;
       proxy_buffer_size 16k;
       proxy_buffers 4 64k;
       proxy_busy_buffers_size 128k;
       proxy_temp_file_write_size 128k;
       proxy_temp_path /home/temp_dir;
       proxy_cache_path /home/cache levels=1:2 keys_zone=cache_one:200m inactive=1d max_size=30g;
       ##end##
    
  3. Add the following inside server

       proxy_redirect off;
       proxy_set_header Host $host;
       proxy_cache cache_one;
       proxy_cache_valid 200 302 1h;
       proxy_cache_valid 301 1d;
       proxy_cache_valid any 1m;
       expires 30d;
    
  4. Save the configuration (a consolidated sketch of the resulting nginx.conf follows this list)
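
For reference, here is a minimal sketch of how the pieces above fit together in nginx.conf. The proxy and cache directives are the ones from the steps above; the listen port and server_name are assumptions for illustration, not taken from the original:

     http {
         ##cache##
         proxy_connect_timeout 5;
         proxy_read_timeout 60;
         proxy_send_timeout 5;
         proxy_buffer_size 16k;
         proxy_buffers 4 64k;
         proxy_busy_buffers_size 128k;
         proxy_temp_file_write_size 128k;
         proxy_temp_path /home/temp_dir;
         proxy_cache_path /home/cache levels=1:2 keys_zone=cache_one:200m inactive=1d max_size=30g;
         ##end##

         server {
             listen 80;               # assumed port
             server_name localhost;   # assumed name

             proxy_redirect off;
             proxy_set_header Host $host;
             proxy_cache cache_one;
             proxy_cache_valid 200 302 1h;
             proxy_cache_valid 301 1d;
             proxy_cache_valid any 1m;
             expires 30d;

             location / {
                 add_header 'Access-Control-Allow-Origin' '*';
                 add_header 'Access-Control-Allow-Credentials' 'true';
                 add_header 'Access-Control-Allow-Methods' 'GET';
                 proxy_pass http://192.168.55.64:8002/HDFSWeb/;
             }
         }
     }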

Starting Nginx

     $ /usr/local/nginx/sbin/nginx
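
If nginx is already running when nginx.conf changes, the configuration can be checked and reloaded without a full restart (standard nginx options, not shown in the original):

     $ /usr/local/nginx/sbin/nginx -t
     $ /usr/local/nginx/sbin/nginx -s reload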

Testing

  1. Click a file to download it
    The download link goes through the reverse proxy address

  2. After the download completes, check the cache directory

     $ cd /home/cache
    

    Inspect the cached files

     -rw-------. 1 nobody nobody 8212493 Mar 31 17:27 91aa96dab8d300dc81295f78552a4a0a
    

Viewing the cached file content with vi

Open Questions from the POC

  1. Is Hadoop a good fit for a distributed file server?
    As a distributed file system, HDFS uses a large default block size (64 MB in Hadoop 1.x, 128 MB in 2.x). A small file does not physically occupy a full block on disk, but every file and block consumes NameNode memory, so serving large numbers of small files can put heavy pressure on the cluster. It is also unclear whether changing the default block size would affect other applications running on the same Hadoop cluster.
  2. Performance testing as a distributed file server.
    The performance tests have not been completed yet.

Hadoop as a File Server POC <2>: Service Development

Eclipse Setup

  1. Create a new Maven project
    Choose the webapp archetype
    Configure the project

  2. Configure pom.xml

     <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
       <modelVersion>4.0.0</modelVersion>
       <groupId>cn.com.git</groupId>
       <artifactId>HDFSWeb</artifactId>
       <packaging>war</packaging>
       <version>0.0.1-SNAPSHOT</version>
       <name>HDFSWeb Maven Webapp</name>
       <url>http://maven.apache.org</url>
       <dependencies>
         <dependency>
             <groupId>org.apache.hadoop</groupId>
             <artifactId>hadoop-common</artifactId>
             <version>2.6.0</version>
         </dependency>
          <dependency>
             <groupId>org.apache.hadoop</groupId>
             <artifactId>hadoop-hdfs</artifactId>
             <version>2.6.0</version>
         </dependency>
       </dependencies>
       <build>
         <finalName>HDFSWeb</finalName>
       </build>
     </project>
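
     If servlet classes are compiled in this project (as in the download servlet sketch later), a provided-scope servlet API dependency would also be needed; this is an assumption, not part of the original pom:

         <dependency>
             <groupId>javax.servlet</groupId>
             <artifactId>servlet-api</artifactId>
             <version>2.5</version>
             <scope>provided</scope>
         </dependency>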
    

Code Development

  1. Write HDFSUtil
    Create the HDFSUtil class
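
    The method snippets below assume a surrounding class skeleton roughly like this (the HDFS URI, logger, and constructor are illustrative assumptions; only the method bodies appear in the original):

         import java.io.*;
         import java.net.URI;

         import org.apache.commons.logging.Log;
         import org.apache.commons.logging.LogFactory;
         import org.apache.hadoop.conf.Configuration;
         import org.apache.hadoop.fs.*;
         import org.apache.hadoop.io.IOUtils;
         import org.apache.hadoop.util.Progressable;

         public class HDFSUtil {
             // NameNode address of the POC cluster; assumed to match fs.defaultFS in core-site.xml
             private static final String HDFS = "hdfs://nimbus:8020/";
             private static final Log logger = LogFactory.getLog(HDFSUtil.class);
             private Configuration conf = new Configuration();

             // the upload / download / deleteFile / list / ls methods shown below go here
         }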

    Upload a file

         /**
          * Upload a file to HDFS from an input stream
          * 
          * @param is
          * @param remoteName
          * @return
          * @throws Exception
          */
         public int upload(InputStream is, String remoteName) throws Exception {
             FileSystem fs = null;
             try {
                 fs = FileSystem.get(URI.create(HDFS), conf);
                 OutputStream outStream = fs.create(new Path(remoteName), new Progressable() {
                     public void progress() {
                         System.out.print('.');
                     }
                 });
                 logger.info("开始上传: " + remoteName);
                 IOUtils.copyBytes(is, outStream, 4096000, true);
                 logger.info("上传结束!");
                 is.close();
                 return 0;
             } catch (IOException e) {
                 is.close();
                 e.printStackTrace();
                 return -1;
             }
         }
    
         /**
          * Upload a local file to HDFS
          * 
          * @param localName
          * @param remoteName
          * @return
          * @throws Exception
          */
         public int upload(String localName, String remoteName) throws Exception {
             InputStream is = new BufferedInputStream(new FileInputStream(localName));
             return upload(is, remoteName);
         }
    

    Download a file

         /**
              * Download a file from HDFS and return its input stream
              * 
              * @param hadoopFile
              * @return
              * @throws Exception
              */
             public FSDataInputStream download(String hadoopFile) throws Exception {
                 FSDataInputStream iStream = null;
                 FileSystem fs = null;
                 fs = FileSystem.get(URI.create(HDFS), conf);
                 Path path = new Path(hadoopFile);
                 iStream = fs.open(path);
                 return iStream;
             }
    

    Delete a file

             /**
              * Delete a file from HDFS
              * 
              * @param hadoopFile
              * @return
              */
             public boolean deleteFile(String hadoopFile) {
                 try {
                     FileSystem fs = FileSystem.get(URI.create(HDFS), conf);
                     Path path = new Path(hadoopFile);
                     fs.delete(path, true);
                     fs.close();
                 } catch (Exception e) {
                     e.printStackTrace();
                     return false;
                 }
                 return true;
             }
    

    List the files in a directory

             /**
              * List the files in a directory
              * 
              * @param folder
              * @return array of FileStatus objects for the directory
              * @throws Exception
              */
             public FileStatus[] list(String folder) throws Exception {
                 Path path = new Path(folder);
                 FileSystem fs = FileSystem.get(URI.create(HDFS), conf);
                 return fs.listStatus(path);
             }
    
             /**
              * List the files in a directory and print them
              * 
              * @param folder
              * @throws Exception
              */
             public void ls(String folder) throws Exception {
                 Path path = new Path(folder);
                 FileSystem fs = FileSystem.get(URI.create(HDFS), conf);
                 FileStatus[] list = fs.listStatus(path);
                 for (FileStatus f : list) {
                     System.out.printf("name: %s, folder: %s, size: %d\n", f.getPath(), f.isDir(), f.getLen());
                 }
             }    
    
  2. Front-end pages
    The WebContent directory

    The web page
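
The original shows the pages only as screenshots; to make the flow concrete, here is a minimal sketch of a download servlet wired to the HDFSUtil class above. The class name, URL mapping, and request parameter are assumptions for illustration; the servlet API is provided by WebLogic at runtime:

     import java.io.IOException;
     import java.io.OutputStream;

     import javax.servlet.http.HttpServlet;
     import javax.servlet.http.HttpServletRequest;
     import javax.servlet.http.HttpServletResponse;

     import org.apache.hadoop.fs.FSDataInputStream;
     import org.apache.hadoop.io.IOUtils;

     // Hypothetical servlet, mapped in web.xml to e.g. /HDFSWeb/download?file=/poc/test.pdf
     public class DownloadServlet extends HttpServlet {
         @Override
         protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
             String hadoopFile = req.getParameter("file");
             resp.setContentType("application/octet-stream");
             FSDataInputStream in = null;
             OutputStream out = resp.getOutputStream();
             try {
                 // open the HDFS file and stream it straight into the HTTP response
                 in = new HDFSUtil().download(hadoopFile);
                 IOUtils.copyBytes(in, out, 4096, false);
             } catch (Exception e) {
                 resp.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR, e.getMessage());
             } finally {
                 IOUtils.closeStream(in);
             }
         }
     }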

Web Deployment

  1. Package the application as a war

     Eclipse>File>Export>Web>war file>HDFSWeb.war
    
  2. Deploy to WebLogic 11g

     WebLogic Console>Deployments>New>Upload war>Finish>Activate Changes>Start HDFSWeb
    
  3. Check the deployment result
    The web page

Testing HDFS Upload and Download

Test
Test upload
Test download
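
As a concrete counterpart to these tests, a small driver using the HDFSUtil sketch above (the local and HDFS paths are made up for illustration) exercises upload, listing, and download:

     public class HDFSUtilTest {
         public static void main(String[] args) throws Exception {
             HDFSUtil util = new HDFSUtil();

             // upload a local file to HDFS
             util.upload("/tmp/test.pdf", "/poc/test.pdf");

             // list the target directory
             util.ls("/poc");

             // download it back to a local file
             org.apache.hadoop.fs.FSDataInputStream in = util.download("/poc/test.pdf");
             java.io.OutputStream out = new java.io.FileOutputStream("/tmp/test-downloaded.pdf");
             org.apache.hadoop.io.IOUtils.copyBytes(in, out, 4096, true);
         }
     }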

Hadoop as a File Server POC <1>: Environment Preparation

Hardware Preparation

Hostname     IP              Hardware                  Operating System
nimbus       192.168.55.173  4 CPU 2.93GHz, 4GB RAM    Linux x86_64
supervisor1  192.168.55.174  4 CPU 2.93GHz, 4GB RAM    Linux x86_64
supervisor2  192.168.55.175  4 CPU 2.93GHz, 4GB RAM    Linux x86_64

Software Preparation

Three virtual machines: one serves as the Hadoop NameNode, the other two as Hadoop DataNodes.

Basic Software Installation

All three machines need the JDK and Hadoop installed.

Installing JDK 1.7.0_15

  1. Download the JDK

     $ wget http://download.oracle.com/otn-pub/java/jdk/8u31-b13/jdk-7u15-linux-x64.tar.gz
    
  2. Unpack the JDK

     $ tar zvxf jdk-7u15-linux-x64.tar.gz
    
  3. Set environment variables

     $ export JAVA_HOME=/your_jdk_unzip_dir/jdk1.7.0_15
     $ export PATH=$JAVA_HOME/bin:$PATH
    
  4. Verify the JDK version

     $ java -version
    

Installing Hadoop 2.6.0

  1. Download Hadoop

     $ wget http://apache.fayea.com/hadoop/common/stable/hadoop-2.6.0.tar.gz
    
  2. Switch to root and add a hadoop user

     $ groupadd hadoop
     $ useradd -g hadoop hadoop
     $ passwd hadoop   # set a password for the user (optional)
    
  3. Install SSH

     $ rpm -qa | grep ssh              # check whether the SSH packages are installed
     $ yum install openssh-server      # install the SSH server
     $ chkconfig --list sshd           # check whether sshd is set to start on boot
     $ chkconfig --level 2345 sshd on  # enable it at boot if it is not
     $ service sshd restart            # restart the service
    
  4. Configure passwordless SSH login

     nimbus$ ssh-keygen -t rsa
     nimbus$ ssh-copy-id hadoop@nimbus
     nimbus$ ssh-copy-id hadoop@supervisor1
     nimbus$ ssh-copy-id hadoop@supervisor2
    

    Then use scp to push the public key file to the slave nodes

     nimbus$ scp .ssh/id_rsa.pub hadoop@supervisor1:/home/hadoop/id_rsa_01.pub
     nimbus$ scp .ssh/id_rsa.pub hadoop@supervisor2:/home/hadoop/id_rsa_01.pub
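
    If the scp route is used instead of ssh-copy-id, the copied key still has to be appended to authorized_keys on each slave; a sketch of the remaining commands (not in the original):

     supervisor1$ cat ~/id_rsa_01.pub >> ~/.ssh/authorized_keys
     supervisor1$ chmod 600 ~/.ssh/authorized_keys
     supervisor2$ cat ~/id_rsa_01.pub >> ~/.ssh/authorized_keys
     supervisor2$ chmod 600 ~/.ssh/authorized_keys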
    

    Test the passwordless login

     $ ssh nimbus
     $ ssh supervisor1
     $ ssh supervisor2
    
  5. Install Hadoop

     $ tar vxzf hadoop-2.6.0.tar.gz
    

    Edit the Hadoop configuration files.
    slaves

     $ vi etc/hadoop/slaves
     supervisor1
     supervisor2
    

    core-site.xml

     $ vi etc/hadoop/core-site.xml
     <configuration>
         <property>
             <name>fs.defaultFS</name>
             <value>hdfs://nimbus:8020</value>
         </property>
         <property>
             <name>io.file.buffer.size</name>
             <value>131072</value>
         </property>
         <property>
             <name>hadoop.tmp.dir</name>
             <value>file:/app/hadoop/tmp</value>
             <description>A base for other temporary directories.</description>
         </property>
         <property>
             <name>hadoop.proxyuser.u0.hosts</name>
             <value>*</value>
         </property>
         <property>
             <name>hadoop.proxyuser.u0.groups</name>
             <value>*</value>
         </property>
     </configuration>
    

    hdfs-site.xml

     $ vi etc/hadoop/hdfs-site.xml
     <configuration>
         <property>
             <name>dfs.namenode.secondary.http-address</name>
             <value>nimbus:9001</value>
         </property>
         <property>
             <name>dfs.namenode.name.dir</name>
             <value>file:/app/hadoop/dfs/name</value>
         </property>
         <property>
             <name>dfs.datanode.data.dir</name>
             <value>file:/app/hadoop/dfs/data</value>
         </property>
         <property>
             <name>dfs.replication</name>
             <value>3</value>
         </property>
         <property>
             <name>dfs.webhdfs.enabled</name>
             <value>true</value>
         </property>
     </configuration>
    

    mapred-site.xml

     $ vi etc/hadoop/mapred-site.xml
     <configuration>
         <property>
             <name>mapreduce.framework.name</name>
             <value>yarn</value>
         </property>
         <property>
             <name>mapreduce.jobhistory.address</name>
             <value>nimbus:10020</value>
         </property>
         <property>
             <name>mapreduce.jobhistory.webapp.address</name>
             <value>nimbus:19888</value>
         </property>
         <property>
             <name>dfs.permissions</name>
             <value>false</value>
         </property>
     </configuration>
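
    In addition (not shown in the original), etc/hadoop/hadoop-env.sh usually needs JAVA_HOME set explicitly so the daemons can find the JDK:

     $ vi etc/hadoop/hadoop-env.sh
     export JAVA_HOME=/your_jdk_unzip_dir/jdk1.7.0_15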
    

    Sync the hadoop directory to supervisor1 and supervisor2

     $ scp -r hadoop/* hadoop@supervisor1:/app/hadoop/
     $ scp -r hadoop/* hadoop@supervisor2:/app/hadoop/
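
    Before the first start, the NameNode has to be formatted (this step is assumed; it is not shown in the original):

     $ ./bin/hdfs namenode -format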
    
  6. Start Hadoop

     $ ./sbin/start-all.sh
     This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
     15/03/30 17:50:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
     Starting namenodes on [nimbus]
     nimbus: starting namenode, logging to /app/hadoop/logs/hadoop-hadoop-namenode-nimbus.out
     supervisor2: starting datanode, logging to /app/hadoop/logs/hadoop-hadoop-datanode-supervisor2.out
     supervisor1: starting datanode, logging to /app/hadoop/logs/hadoop-hadoop-datanode-supervisor1.out
     Starting secondary namenodes [nimbus]
     nimbus: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-hadoop-secondarynamenode-nimbus.out
     15/03/30 17:50:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
     starting yarn daemons
     starting resourcemanager, logging to /app/hadoop/logs/yarn-hadoop-resourcemanager-nimbus.out
     supervisor2: starting nodemanager, logging to /app/hadoop/logs/yarn-hadoop-nodemanager-supervisor2.out
     supervisor1: starting nodemanager, logging to /app/hadoop/logs/yarn-hadoop-nodemanager-supervisor1.out
    

    Check the running processes

     nimbus$ jps
     10775 Jps
     10146 NameNode
     10305 SecondaryNameNode
     10464 ResourceManager
    
     supervisor1$ jps
     4308 Jps
     4094 NodeManager
     3993 DataNode
    
     supervisor2$ jps
     4308 Jps
     4094 NodeManager
     3993 DataNode    
    

    Open the web UIs
    http://192.168.55.173:50070/

    http://192.168.55.173:8088/
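
    To confirm HDFS itself is writable, a quick smoke test can be run (commands assumed, not part of the original write-up):

     $ ./bin/hdfs dfs -mkdir -p /poc
     $ ./bin/hdfs dfs -put etc/hadoop/core-site.xml /poc/
     $ ./bin/hdfs dfs -ls /poc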