1. hive (container)
Hive is open-source data warehouse software that provides SQL-like querying over large datasets.
It is mainly used for working with large volumes of data and was developed as part of the Hadoop ecosystem.
1-1. How Hive works
Hive runs on top of Hadoop's MapReduce framework.
When a query is executed, Hive compiles the HiveQL into MapReduce jobs and runs them as distributed work across the cluster.
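As a quick illustration, you can ask Hive to print the execution plan for a query and see the MapReduce stages it will submit; the table and column names below are made up for the example, so substitute any existing table:
hive -e "EXPLAIN SELECT host, SUM(bytes) FROM access_log GROUP BY host;"
With hive.execution.engine=mr, the plan contains one or more Map Reduce stages, which are then submitted to the cluster as jobs.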
1-2. hdfs mkdir (create the warehouse directory)
hdfs dfs -mkdir -p /user/hive/warehouse
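If you follow the Apache Hive getting-started guide, the HDFS scratch directory /tmp is usually created as well, and both directories are made group-writable; a minimal sketch:
hdfs dfs -mkdir -p /tmp
hdfs dfs -chmod g+w /tmp
hdfs dfs -chmod g+w /user/hive/warehouse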
2. Dockerfile [additions to the Hadoop Dockerfile]
# Base image
FROM openjdk:8
# Install bash and required packages
RUN apt-get update && apt-get install -y bash wget tar
# Set environment variables
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 \
HADOOP_HOME=/data/sy0218/hadoop-3.3.5 \
HADOOP_COMMON_HOME=/data/sy0218/hadoop-3.3.5 \
HADOOP_MAPRED_HOME=/data/sy0218/hadoop-3.3.5 \
HADOOP_HDFS_HOME=/data/sy0218/hadoop-3.3.5 \
HADOOP_YARN_HOME=/data/sy0218/hadoop-3.3.5 \
HADOOP_CONF_DIR=/data/sy0218/hadoop-3.3.5/etc/hadoop \
HADOOP_LOG_DIR=/logs/hadoop \
HADOOP_PID_DIR=/var/run/hadoop/hdfs \
HADOOP_COMMON_LIB_NATIVE_DIR=/data/sy0218/hadoop-3.3.5/lib/native \
HADOOP_OPTS="-Djava.library.path=/data/sy0218/hadoop-3.3.5/lib/native" \
HIVE_HOME=/data/sy0218/apache-hive-3.1.3-bin \
HIVE_AUX_JARS_PATH=/data/sy0218/apache-hive-3.1.3-bin/aux
ENV PATH=$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$HIVE_HOME/bin:$HIVE_AUX_JARS_PATH/bin:$PATH
# Create /usr/lib/jvm and copy the JDK into it
RUN mkdir -p /usr/lib/jvm && cp -r /usr/local/openjdk-8 /usr/lib/jvm/java-8-openjdk-amd64
# Install Hadoop and copy configuration files
RUN mkdir -p /data/sy0218
RUN mkdir -p /data/download_tar
RUN mkdir -p /hadoop/data1
RUN mkdir -p /hadoop/data2
RUN mkdir -p /hadoop/hdfs
RUN mkdir -p /hadoop/hdfs_work
RUN mkdir -p /hadoop/jn
RUN mkdir -p /hadoop/data
# Copy the tarballs into the download_tar directory
COPY hadoop-3.3.5.tar.gz /data/download_tar/hadoop-3.3.5.tar.gz
COPY apache-hive-3.1.3-bin.tar.gz /data/download_tar/apache-hive-3.1.3-bin.tar.gz
# Extract the Hadoop tarball to the desired path
RUN tar xzvf /data/download_tar/hadoop-3.3.5.tar.gz -C /data/sy0218/
# Hadoop configuration files
COPY core-site.xml /data/sy0218/hadoop-3.3.5/etc/hadoop/
COPY hdfs-site.xml /data/sy0218/hadoop-3.3.5/etc/hadoop/
COPY mapred-site.xml /data/sy0218/hadoop-3.3.5/etc/hadoop/
COPY yarn-site.xml /data/sy0218/hadoop-3.3.5/etc/hadoop/
COPY workers /data/sy0218/hadoop-3.3.5/etc/hadoop/
COPY hadoop-env.sh /data/sy0218/hadoop-3.3.5/etc/hadoop/
COPY hadoop-config.sh /data/sy0218/hadoop-3.3.5/libexec/
# Extract the Hive tarball to the desired path
RUN tar xzvf /data/download_tar/apache-hive-3.1.3-bin.tar.gz -C /data/sy0218/
# Copy the configuration file and JDBC JAR to the desired paths
COPY postgresql-42.2.11.jar /data/sy0218/apache-hive-3.1.3-bin/lib/
COPY hive-site.xml /data/sy0218/apache-hive-3.1.3-bin/conf/
# Default command to run inside the container
CMD tail -f /dev/null
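To build and start this image by hand instead of through the script in 2-3, a minimal sketch (the image name and tag are examples; the volume mounts mirror the docker run command used later in hadoop_auto_fun.sh):
docker build -t hadoop_hive:3.3.5 /data/hadoop_docker
docker run -d --name hadoop_hive --network host \
  -v /root/.ssh:/root/.ssh \
  -v /hadoop:/hadoop \
  -v /var/run/hadoop/hdfs:/var/run/hadoop/hdfs \
  hadoop_hive:3.3.5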
2-1. hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://192.168.56.10:5432/sy0218</value>
<description>metadata is stored in a PostgreSQL server</description>
</property>
<property>
<name>hive.metastore.db.type</name>
<value>postgres</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.postgresql.Driver</value>
<description>PostgreSQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>!hive0218</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location warehouse</description>
</property>
<property>
<name>hive.execution.engine</name>
<value>mr</value>
</property>
<property>
<name>hive.server2.transport.mode</name>
<value>binary</value>
</property>
<property>
<name>hive.server2.webui.port</name>
<value>0</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
<property>
<name>hive.vectorized.execution.enabled</name>
<value>true</value>
</property>
<property>
<name>mapreduce.input.fileinputformat.split.maxsize</name>
<value>64000000</value>
</property>
<property>
<name>hive.exec.max.dynamic.partitions</name>
<value>3000</value>
</property>
<property>
<name>hive.exec.max.dynamic.partitions.pernode</name>
<value>100</value>
</property>
</configuration>
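The metastore role and database named in ConnectionURL have to exist on the PostgreSQL server before Hive can connect. If they do not, a sketch of creating them with psql, assuming you can log in as a superuser on 192.168.56.10 (the postgres superuser name is an assumption):
psql -h 192.168.56.10 -p 5432 -U postgres -d postgres <<'SQL'
CREATE ROLE hive LOGIN PASSWORD '!hive0218';
CREATE DATABASE sy0218 OWNER hive;
SQL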
2-2. Other required files and tarballs
/data/hadoop_docker >> hadoop_hive Docker build directory
apache-hive-3.1.3-bin.tar.gz
core-site.xml
dockerfile
fair-scheduler.xml
hadoop-3.3.5.tar.gz
hadoop-config.sh
hadoop-env.sh
hdfs-site.xml
hive-site.xml
mapred-site.xml
masters
postgresql-42.2.11.jar
workers
yarn-site.xml
2-3. Container rebuild script (required arguments: image name, image tag, Docker build directory)
- Before rebuilding, stopping the Hadoop cluster with /data/sy0218/hadoop-3.3.5/sbin/stop-all.sh is mandatory!!
/data/work/hadoop_auto_fun.sh
#!/usr/bin/bash
# Check argument count
if [ "$#" -ne 3 ]; then
echo "Usage: $0 [hadoop_hive_image_name] [image_tag] [docker_image_dir]"
exit 1
fi
check_sc_home="/data/check"
hadoop_hive_image_name=$1
docker_image_tag=$2
docker_image_dir=$3
echo "[`date`] Time_Stamp : hadoop auto run Start...."; echo "";
#####################################################################################################
echo "[`date`] Time_Stamp : hadoop 필요 디렉토리 생성 Start...."
${check_sc_home}/all_command.sh "rm -rf /hadoop"
for dir_name in /hadoop/data1 /hadoop/data2 /hadoop/hdfs /hadoop/hdfs_work /hadoop/jn /hadoop/data
do
${check_sc_home}/all_command.sh "mkdir -p ${dir_name}"
done
echo "[`date`] Time_Stamp : hadoop 필요 디렉토리 생성 End...."; echo "";
#####################################################################################################
#####################################################################################################
echo "[`date`] Time_Stamp : hadoop_logs_files rm Start...."
${check_sc_home}/all_command.sh "rm -rf /logs/hadoop/*"
${check_sc_home}/all_command.sh "rm -rf /var/run/hadoop/hdfs"
echo "[`date`] Time_Stamp : hadoop_logs_files rm End...."; echo "";
#####################################################################################################
#####################################################################################################
echo "[`date`] Time_Stamp : hadoop docker_container_stop Start...."
${check_sc_home}/all_command.sh "docker stop ${hadoop_hive_image_name}"
echo "[`date`] Time_Stamp : hadoop docker_container_stop End...."; echo "";
#####################################################################################################
#####################################################################################################
echo "[`date`] Time_Stamp : hadoop docker_container_rm Start...."
${check_sc_home}/all_command.sh "docker rm ${hadoop_hive_image_name}"
echo "[`date`] Time_Stamp : hadoop docker_container_rm End...."; echo "";
#####################################################################################################
#####################################################################################################
echo "[`date`] Time_Stamp : hadoop docker_container_image_rm Start...."
${check_sc_home}/all_command.sh "docker rmi ${hadoop_hive_image_name}:${docker_image_tag}"
echo "[`date`] Time_Stamp : hadoop docker_container_image_rm End...."; echo "";
#####################################################################################################
#####################################################################################################
echo "[`date`] Time_Stamp : hadoop docker_build Start...."
${check_sc_home}/all_command.sh "docker build -t ${hadoop_hive_image_name}:${docker_image_tag} ${docker_image_dir}"
echo "[`date`] Time_Stamp : hadoop docker_build End...."; echo "";
#####################################################################################################
#####################################################################################################
echo "[`date`] Time_Stamp : hadoop docker_run Start...."
${check_sc_home}/all_command.sh "docker run -d --name ${hadoop_hive_image_name} --network host -v /root/.ssh:/root/.ssh -v /hadoop:/hadoop -v /var/run/hadoop/hdfs:/var/run/hadoop/hdfs ${hadoop_hive_image_name}:${docker_image_tag}"
echo "[`date`] Time_Stamp : hadoop docker_run End...."; echo "";
#####################################################################################################
#####################################################################################################
echo "[`date`] Time_Stamp : hadoop bin_file cp Start...."
${check_sc_home}/all_command.sh "docker cp ${hadoop_hive_image_name}:/data/sy0218/ /data/"
echo "[`date`] Time_Stamp : hadoop bin_file cp End...."; echo "";
#####################################################################################################
echo "[`date`] Time_Stamp : hadoop auto run End...."
2-4. Restarting Hadoop
1) hdfs zkfc -formatZK (master1)
2) start-dfs.sh (master1)
3) hdfs namenode -format (master1)
4) stop-dfs.sh (master1)
5) start-all.sh (master1)
6) hdfs namenode -bootstrapStandby (master2)
7) Restart the cluster and check the NameNode HA state:
/data/sy0218/hadoop-3.3.5/sbin/stop-all.sh
/data/sy0218/hadoop-3.3.5/sbin/start-all.sh
hdfs haadmin -getServiceState namenode1
hdfs haadmin -getServiceState namenode2
jps: shows the name and PID of every running Java process
/data/check/jps_check.sh
#!/usr/bin/bash
for host_name in kube-control1 kube-node1 kube-data1 kube-data2 kube-data3
do
echo "------------jps on ${host_name}------------"
ssh ${host_name} "jps"
echo "-------------------------------------------"; echo"";
done
******************************** Output ********************************
------------jps on kube-control1------------
9873 ResourceManager
9093 NameNode
10663 Jps
9534 DFSZKFailoverController
9342 JournalNode
-------------------------------------------
------------jps on kube-node1------------
5980 JournalNode
6605 Jps
5870 NameNode
6110 DFSZKFailoverController
6239 ResourceManager
-------------------------------------------
------------jps on kube-data1------------
5250 JournalNode
5380 NodeManager
5125 DataNode
5658 Jps
-------------------------------------------
------------jps on kube-data2------------
4616 NodeManager
4874 Jps
4478 DataNode
-------------------------------------------
------------jps on kube-data3------------
4629 NodeManager
4935 Jps
4491 DataNode
-------------------------------------------
/data/check/all_command.sh (run a specified command on all nodes)
#!/usr/bin/bash
# Check argument count
if [ "$#" -ne 1 ]; then
echo "Usage: $0 <command>"
exit 1
fi
command=$1
for host_name in kube-control1 kube-node1 kube-data1 kube-data2 kube-data3
do
echo "------command ${host_name}------------"
ssh ${host_name} "${command}"
echo "--------------------------------------"; echo"";
done
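Example usage; any shell command can be passed as the single quoted argument (the commands here are only examples):
/data/check/all_command.sh "df -h /hadoop"
/data/check/all_command.sh "docker ps --format '{{.Names}}'"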
docker_container_restart.sh (restart Docker containers)
#!/usr/bin/bash
echo "[`date`] Time_Stamp : restart docker container Start...."; echo "";
echo "[`date`] Time_Stamp : restart zookeeper Start...."
for host_name in kube-control1 kube-node1 kube-data1 kube-data2 kube-data3
do
containers_id=$(ssh ${host_name} "docker ps -a | grep zookeeper | awk '{print \$1}'")
if [ -n "${containers_id}" ]; then
echo "------------Containers restart ${host_name}------------"
ssh ${host_name} "docker start ${containers_id}"
echo "-------------------------------------------------------"; echo"";
else
echo "------------Containers restart ${host_name}------------"
echo "no zeekeeper server"
echo "-------------------------------------------------------"; echo"";
fi
done
echo "[`date`] Time_Stamp : restart zookeeper End...."; echo "";
echo "[`date`] Time_Stamp : restart postgresql Start...."
for host_name in kube-control1 kube-node1 kube-data1 kube-data2 kube-data3
do
containers_id=$(ssh ${host_name} "docker ps -a | grep postsql | awk '{print \$1}'")
if [ -n "${containers_id}" ]; then
echo "------------Containers restart ${host_name}------------"
ssh ${host_name} "docker start ${containers_id}"
echo "-------------------------------------------------------"; echo"";
else
echo "------------Containers restart ${host_name}------------"
echo "no postsql server"
echo "-------------------------------------------------------"; echo"";
fi
done
echo "[`date`] Time_Stamp : restart postgresql End...."; echo "";
echo "[`date`] Time_Stamp : restart hadoop_hive Start...."
for host_name in kube-control1 kube-node1 kube-data1 kube-data2 kube-data3
do
containers_id=$(ssh ${host_name} "docker ps -a | grep hadoop_hive | awk '{print \$1}'")
if [ -n "${containers_id}" ]; then
echo "------------Containers restart ${host_name}------------"
ssh ${host_name} "docker start ${containers_id}"
echo "-------------------------------------------------------"; echo"";
else
echo "------------Containers restart ${host_name}------------"
echo "no hadoop_hive server"
echo "-------------------------------------------------------"; echo"";
fi
done
echo "[`date`] Time_Stamp : restart hadoop_hive End...."; echo "";
echo "[`date`] Time_Stamp : restart docker container End...."
docker_container.sh (list running Docker containers)
#!/usr/bin/bash
for host_name in kube-control1 kube-node1 kube-data1 kube-data2 kube-data3
do
echo "------Containers on ${host_name}------------"
ssh ${host_name} 'docker ps --format "{{.Names}}"'
echo "--------------------------------------------"; echo"";
done
zookeeper_check.sh (check ZooKeeper status)
#!/usr/bin/bash
for host_name in kube-control1 kube-node1 kube-data1
do
echo "------------zookeeper type ${host_name}------------"
ssh -t ${host_name} "docker exec -it zookeeper /data/sy0218/apache-zookeeper-3.7.2-bin/bin/zkServer.sh status"
echo "---------------------------------------------------"; echo"";
done
3. Starting Hive
1. Initialize the Apache Hive metastore database
/data/sy0218/apache-hive-3.1.3-bin/bin/schematool -initSchema -dbType postgres
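To confirm the schema was created, schematool can also print the schema version stored in the metastore, using the same connection settings from hive-site.xml:
/data/sy0218/apache-hive-3.1.3-bin/bin/schematool -info -dbType postgres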
2. Run HiveServer2 in the background, writing its log to the specified file
mkdir -p /hive/log
nohup hive --service hiveserver2 >> /hive/log/hiveserver2.log 2>&1 &
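Once HiveServer2 is up (it can take a little while to start accepting connections), you can connect with Beeline over the default binary-mode port 10000; the user name here is only illustrative, since hive.server2.enable.doAs is false:
beeline -u jdbc:hive2://localhost:10000 -n hive -e "show databases;"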
3-1. hive_test
hive -e "show databases;"'데이터 엔지니어( 실습 정리 )' 카테고리의 다른 글
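A slightly fuller smoke test that exercises the metastore, the HDFS warehouse directory, and the MapReduce execution path; the database and table names are made up for the example:
hive -e "
CREATE DATABASE IF NOT EXISTS test_db;
CREATE TABLE IF NOT EXISTS test_db.t1 (id INT, name STRING);
INSERT INTO test_db.t1 VALUES (1, 'a'), (2, 'b');
SELECT name, COUNT(*) FROM test_db.t1 GROUP BY name;
"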