Docker Container Practice [Hadoop/Hive]: Dynamic Execution
Update 2024-07-03
- Changed the scripts to run containers from a Docker registry
- Revised the proc.sh script (procedural >>> object-oriented) for reusability
- component_proc.py(system_download.txt
The downside of the existing container approach: extra overhead... :(
Let's fix that by installing everything dynamically on the local environment with Ansible!
0. Shared configuration resource for the distributed applications ( /data/work/system_download.txt )
[server_ip]|192.168.56.10|192.168.56.11|192.168.56.12
[zookeeper_ip]|192.168.56.10|192.168.56.11|192.168.56.12
-----------------------[zoo.cfg-start]-----------------------
[tickTime=]|2000
[initLimit=]|11
[syncLimit=]|5
[dataDir=]|/data/sy0218/apache-zookeeper-3.7.2-bin/data
[clientPort=]|2181
-----------------------[zoo.cfg-end]-----------------------
[postgresql_data_directory]|/pgdb/pg_data
-----------------------[postgresql.conf-start]-----------------------
[data_directory]|'/pgdb/pg_data'
[listen_addresses]|'*'
[port]|5432
-----------------------[postgresql.conf-end]-----------------------
[hadoop_ip]|192.168.56.10|192.168.56.11|192.168.56.12
[need_dir]|/hadoop/hdfs_work|/hadoop/hdfs|/hadoop/data1|/hadoop/data2|/hadoop/jn|/hadoop/data
-----------------------[core-site.xml-start]-----------------------
[fs.default.name]|hdfs://192.168.56.10:9000
[fs.defaultFS]|hdfs://my-hadoop-cluster
[hadoop.tmp.dir]|file:///hadoop/hdfs_work/hadoop-root
[ha.zookeeper.quorum]|192.168.56.10:2181,192.168.56.11:2181,192.168.56.12:2181
-----------------------[core-site.xml-end]-----------------------
-----------------------[hdfs-site.xml-start]-----------------------
[dfs.namenode.name.dir]|file:///hadoop/hdfs/nn
[dfs.datanode.data.dir]|file:///hadoop/data1,file:///hadoop/data2
[dfs.journalnode.edits.dir]|/hadoop/jn
[dfs.namenode.rpc-address.my-hadoop-cluster.namenode1]|192.168.56.10:8020
[dfs.namenode.rpc-address.my-hadoop-cluster.namenode2]|192.168.56.11:8020
[dfs.namenode.http-address.my-hadoop-cluster.namenode1]|192.168.56.10:50070
[dfs.namenode.http-address.my-hadoop-cluster.namenode2]|192.168.56.11:50070
[dfs.namenode.shared.edits.dir]|qjournal://192.168.56.10:8485;192.168.56.11:8485;192.168.56.12:8485/my-hadoop-cluster
[dfs.name.dir]|/hadoop/data/name
[dfs.data.dir]|/hadoop/data/data
-----------------------[hdfs-site.xml-end]-----------------------
-----------------------[mapred-site.xml-start]-----------------------
[mapreduce.framework.name]|yarn
-----------------------[mapred-site.xml-end]-----------------------
-----------------------[yarn-site.xml-start]-----------------------
[yarn.resourcemanager.hostname.rm1]|192.168.56.10
[yarn.resourcemanager.hostname.rm2]|192.168.56.11
[yarn.resourcemanager.webapp.address.rm1]|192.168.56.10:8088
[yarn.resourcemanager.webapp.address.rm2]|192.168.56.11:8088
[yarn.resourcemanager.zk-address]|192.168.56.10:2181,192.168.56.11:2181,192.168.56.12:2181
[yarn.nodemanager.resource.memory-mb]|8192
-----------------------[yarn-site.xml-end]-----------------------
-----------------------[hadoop-env.sh-start]-----------------------
[export JAVA_HOME=]|/usr/lib/jvm/java-8-openjdk-amd64
[export HADOOP_HOME=]|/data/sy0218/hadoop-3.3.5
-----------------------[hadoop-env.sh-end]-----------------------
-----------------------[workers-start]-----------------------
192.168.56.10
192.168.56.11
192.168.56.12
-----------------------[workers-end]-----------------------
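The file is a plain pipe-delimited key/value format, with [xxx-start]/[xxx-end] markers fencing off each target config file. A minimal sketch of how a key or a whole section can be read back with grep/awk (the same pattern entrypoint.sh below relies on):
# list the Hadoop node IPs from the [hadoop_ip] line
grep hadoop_ip /data/work/system_download.txt | awk -F '|' '{for(i=2; i<=NF; i++) print $i}'
# dump everything between the core-site.xml markers
awk '/\[core-site.xml-start\]/{flag=1; next} /\[core-site.xml-end\]/{flag=0} flag' /data/work/system_download.txt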
1. Inventory ( /data/work/hadoop_3.3.5_auto_ansible/hosts.ini )
An inventory is the list of target systems that Ansible manages.
[servers]
192.168.56.10
192.168.56.11
192.168.56.12
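Before running anything, it can help to confirm that Ansible can actually reach the three hosts in the inventory. This quick check is not part of the original flow and assumes key-based SSH to the nodes is already set up:
ansible -i /data/work/hadoop_3.3.5_auto_ansible/hosts.ini servers -m ping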
2. Playbook variables file ( /data/work/hadoop_3.3.5_auto_ansible/main.yml )
hadoop_tar_path: "/data/download_tar"
hadoop_tar_filename: "hadoop-3.3.5.tar.gz"
work_dir: "/data/sy0218"
play_book_dir: "/data/work/hadoop_3.3.5_auto_ansible"
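These variables are loaded by every play via vars_files. If one of them needs to change for a single run, ansible-playbook's --extra-vars (-e) takes precedence over vars_files; a sketch (the alternate path shown is purely hypothetical):
ansible-playbook -i /data/work/hadoop_3.3.5_auto_ansible/hosts.ini \
    /data/work/hadoop_3.3.5_auto_ansible/hadoop_deploy.yml \
    -e "work_dir=/data/sy0219"   # hypothetical alternative install directory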
3. entrypoint.sh for dynamic Hadoop configuration ( /data/work/hadoop_3.3.5_auto_ansible/entrypoint.sh )
(0) Download the Hadoop tarball with wget
wget https://archive.apache.org/dist/hadoop/core/hadoop-3.3.5/hadoop-3.3.5.tar.gz
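Optionally, the downloaded tarball can be sanity-checked before it is pushed to the nodes; listing it through tar catches a corrupt download here instead of during the unarchive task later (not something the original flow requires):
# a truncated or corrupt download fails this listing
tar -tzf hadoop-3.3.5.tar.gz > /dev/null && echo "tarball OK"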
(1) Collecting the Hadoop configuration files
The base Hadoop configuration files come from the earlier post "Hadoop cluster operation (Docker container) practice [Hadoop]" (sy02229.tistory.com).
(2) entrypoint.sh
#!/bin/bash
# entrypoint.sh
# Reads the shared config file (/data/work/system_download.txt), rewrites the
# Hadoop configuration templates in the playbook directory, and pushes them to
# every node in the cluster.
system_file="/data/work/system_download.txt"
conf_dir=$1   # playbook directory holding the config templates (play_book_dir)
work_dir=$2   # Hadoop install directory on each node (work_dir)

# node IPs and required directories, parsed from the shared config file
ip_array=($(grep hadoop_ip ${system_file} | awk -F '|' '{for(i=2; i<=NF; i++) print $i}'))
len_ip_array=${#ip_array[@]}
hadoop_need_dir=($(grep need_dir ${system_file} | awk -F '|' '{for(i=2; i<=NF; i++) print $i}'))
len_need_dir=${#hadoop_need_dir[@]}
# core-site.xml: for each key in the section, replace the <value> line that
# follows the matching <name> line
for core_config_low in $(awk '/\[core-site.xml-start\]/{flag=1; next} /\[core-site.xml-end\]/{flag=0} flag' ${system_file});
do
    file_name=$(find ${conf_dir} -type f -name "*core-site.xml*")
    core_site_name=$(echo ${core_config_low} | awk -F '|' '{print $1}' | sed 's/[][]//g')
    core_site_value=$(echo ${core_config_low} | awk -F '|' '{print $2}')
    sed -i "/<name>${core_site_name}<\/name>/!b;n;c<value>${core_site_value}</value>" ${file_name}
done

# hdfs-site.xml
for hdfs_site_config_low in $(awk '/\[hdfs-site.xml-start\]/{flag=1; next} /\[hdfs-site.xml-end\]/{flag=0} flag' ${system_file});
do
    file_name=$(find ${conf_dir} -type f -name "*hdfs-site.xml*")
    hdfs_site_name=$(echo ${hdfs_site_config_low} | awk -F '|' '{print $1}' | sed 's/[][]//g')
    hdfs_site_value=$(echo ${hdfs_site_config_low} | awk -F '|' '{print $2}')
    sed -i "/<name>${hdfs_site_name}<\/name>/!b;n;c<value>${hdfs_site_value}</value>" ${file_name}
done

# mapred-site.xml
for mapred_site_config_low in $(awk '/\[mapred-site.xml-start\]/{flag=1; next} /\[mapred-site.xml-end\]/{flag=0} flag' ${system_file});
do
    file_name=$(find ${conf_dir} -type f -name "*mapred-site.xml*")
    mapred_site_name=$(echo ${mapred_site_config_low} | awk -F '|' '{print $1}' | sed 's/[][]//g')
    mapred_site_value=$(echo ${mapred_site_config_low} | awk -F '|' '{print $2}')
    sed -i "/<name>${mapred_site_name}<\/name>/!b;n;c<value>${mapred_site_value}</value>" ${file_name}
done

# yarn-site.xml
for yarn_site_config_low in $(awk '/\[yarn-site.xml-start\]/{flag=1; next} /\[yarn-site.xml-end\]/{flag=0} flag' ${system_file});
do
    file_name=$(find ${conf_dir} -type f -name "*yarn-site.xml*")
    yarn_site_name=$(echo ${yarn_site_config_low} | awk -F '|' '{print $1}' | sed 's/[][]//g')
    yarn_site_value=$(echo ${yarn_site_config_low} | awk -F '|' '{print $2}')
    sed -i "/<name>${yarn_site_name}<\/name>/!b;n;c<value>${yarn_site_value}</value>" ${file_name}
done

# hadoop-env.sh: these lines contain spaces, so read the section line by line
hadoop_env_config=$(awk '/\[hadoop-env.sh-start\]/{flag=1; next} /\[hadoop-env.sh-end\]/{flag=0} flag' ${system_file})
while IFS= read -r hadoop_env_config_low;
do
    file_name=$(find ${conf_dir} -type f -name "*hadoop-env.sh*")
    hadoop_env_name=$(echo $hadoop_env_config_low | awk -F '|' '{print $1}' | sed 's/[][]//g')
    hadoop_env_value=$(echo $hadoop_env_config_low | awk -F '|' '{print $2}')
    sed -i "s|^${hadoop_env_name}.*$|${hadoop_env_name}${hadoop_env_value}|" ${file_name}
done <<< "$hadoop_env_config"
# workers: regenerate the file from the [workers] section
work_file_name=$(find ${conf_dir} -type f -name "*workers*")
truncate -s 0 $work_file_name
for workers_low in $(awk '/\[workers-start\]/{flag=1; next} /\[workers-end\]/{flag=0} flag' ${system_file});
do
    echo $workers_low >> $work_file_name
done
# recreate the required directories on every Hadoop node
for ((i=0; i<len_ip_array; i++)); do
    current_ip=${ip_array[$i]}
    for ((j=0; j<len_need_dir; j++)); do
        current_dir=${hadoop_need_dir[$j]}
        echo "Recreating required directory ${current_dir} on ${current_ip}"
        ssh ${current_ip} "rm -rf ${current_dir}"
        ssh ${current_ip} "mkdir -p ${current_dir}"
    done
done
# scp the dynamically configured Hadoop files to every node in the cluster
for cp_file in $(ls ${conf_dir});
do
    # skip the playbook's own files; everything else is a config file to deploy
    if [[ "$cp_file" != "entrypoint.sh" && "$cp_file" != "hadoop-3.3.5.tar.gz" && "$cp_file" != *.yml && "$cp_file" != *.ini ]]; then
        if [[ "$cp_file" == "fair-scheduler.xml" ]]; then
            # fair-scheduler.xml: copy into the Hadoop conf directory itself
            local_path=$(find ${work_dir}/*hadoop*/etc/hadoop -name hadoop -type d)
        else
            # otherwise overwrite the matching file under the Hadoop install directory
            local_path=$(find ${work_dir}/ -name ${cp_file} -type f ! -path "*/sample-conf/*")
        fi
        for ((i=0; i<len_ip_array; i++));
        do
            current_ip=${ip_array[$i]}
            scp ${conf_dir}/${cp_file} root@${current_ip}:${local_path}
        done
    fi
done
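For reference, the playbook's last task runs this script on localhost, passing the playbook directory and the work directory as the two positional arguments; it can also be run by hand the same way:
bash /data/work/hadoop_3.3.5_auto_ansible/entrypoint.sh \
    /data/work/hadoop_3.3.5_auto_ansible /data/sy0218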
4. Playbook YAML file ( /data/work/hadoop_3.3.5_auto_ansible/hadoop_deploy.yml )
---
- name: Create hadoop_tar directory
  hosts: servers
  become: yes
  vars_files:
    - /data/work/hadoop_3.3.5_auto_ansible/main.yml
  tasks:
    - name: Create hadoop_tar directory
      file:
        path: "{{ hadoop_tar_path }}"
        state: directory
    - name: Create work directory
      file:
        path: "{{ work_dir }}"
        state: directory

- name: Copy hadoop_tar to servers
  hosts: localhost
  become: yes
  vars_files:
    - /data/work/hadoop_3.3.5_auto_ansible/main.yml
  tasks:
    - name: Copy hadoop_tar to servers
      copy:
        src: "{{ play_book_dir }}/{{ hadoop_tar_filename }}"
        dest: "{{ hadoop_tar_path }}/{{ hadoop_tar_filename }}"
        mode: "0644"
      delegate_to: "{{ item }}"
      with_items:
        - '192.168.56.10'
        - '192.168.56.11'
        - '192.168.56.12'

- name: Extract hadoop_tar
  hosts: servers
  become: yes
  vars_files:
    - /data/work/hadoop_3.3.5_auto_ansible/main.yml
  tasks:
    - name: Extract the hadoop tarball
      unarchive:
        src: "{{ hadoop_tar_path }}/{{ hadoop_tar_filename }}"
        dest: "{{ work_dir }}"
        remote_src: yes

- name: entrypoint_sh start
  hosts: localhost
  become: yes
  vars_files:
    - /data/work/hadoop_3.3.5_auto_ansible/main.yml
  tasks:
    - name: entry_point_sh start
      shell: "{{ play_book_dir }}/entrypoint.sh {{ play_book_dir }} {{ work_dir }}"
Command to run:
ansible-playbook -i /data/work/hadoop_3.3.5_auto_ansible/hosts.ini /data/work/hadoop_3.3.5_auto_ansible/hadoop_deploy.yml
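It can be worth validating the playbook before the real run; --syntax-check parses it without touching the hosts, and --list-tasks shows what would execute (both are standard ansible-playbook options, not part of the original write-up):
ansible-playbook -i /data/work/hadoop_3.3.5_auto_ansible/hosts.ini \
    /data/work/hadoop_3.3.5_auto_ansible/hadoop_deploy.yml --syntax-check
ansible-playbook -i /data/work/hadoop_3.3.5_auto_ansible/hosts.ini \
    /data/work/hadoop_3.3.5_auto_ansible/hadoop_deploy.yml --list-tasks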
5. Starting Hadoop
hdfs zkfc -formatZK (master1): formats the ZooKeeper Failover Controller (ZKFC) data for HDFS
- initializes the ZKFC state and writes fresh ZKFC data into ZooKeeper
start-dfs.sh (master1): script that starts the HDFS daemons
/data/sy0218/hadoop-3.3.5/sbin/start-dfs.sh
hdfs namenode -format (master1): formats the HDFS NameNode
stop-dfs.sh (master1): script that stops the HDFS daemons
/data/sy0218/hadoop-3.3.5/sbin/stop-dfs.sh
/data/sy0218/hadoop-3.3.5/sbin/stop-all.sh
start-all.sh (master1): starts all of the Hadoop cluster daemons
/data/sy0218/hadoop-3.3.5/sbin/start-all.sh
hdfs namenode -bootstrapStandby (master2): bootstraps the second NameNode as the standby by copying the active NameNode's metadata
/data/sy0218/hadoop-3.3.5/sbin/stop-all.sh (master1)
/data/sy0218/hadoop-3.3.5/sbin/start-all.sh (master1)
hdfs haadmin -getServiceState namenode1 (master1)
hdfs haadmin -getServiceState namenode2 (master1)
hdfs dfsadmin -report (master1)
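After start-all.sh, a quick way to see whether the expected daemons (NameNode, DataNode, JournalNode, DFSZKFailoverController, ResourceManager, NodeManager) came up on every node is to run jps over SSH; a small sketch, assuming the same three IPs and that jps is on the PATH of the remote non-interactive shell:
for ip in 192.168.56.10 192.168.56.11 192.168.56.12; do
    echo "=== ${ip} ==="
    ssh ${ip} jps
done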
Final setup check (output of hdfs dfsadmin -report):


Configured Capacity: 198771019776 (185.12 GB)
Present Capacity: 158007382016 (147.16 GB)
DFS Remaining: 158007283712 (147.16 GB)
DFS Used: 98304 (96 KB)
DFS Used%: 0.00%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (3):
Name: 192.168.56.10:9866 (kube-control1)
Hostname: kube-control1
Decommission Status : Normal
Configured Capacity: 66257006592 (61.71 GB)
DFS Used: 32768 (32 KB)
Non DFS Used: 10655657984 (9.92 GB)
DFS Remaining: 52202463232 (48.62 GB)
DFS Used%: 0.00%
DFS Remaining%: 78.79%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Tue Jul 23 17:09:03 KST 2024
Last Block Report: Tue Jul 23 16:57:26 KST 2024
Num of Blocks: 0
Name: 192.168.56.11:9866 (kube-node1)
Hostname: kube-node1
Decommission Status : Normal
Configured Capacity: 66257006592 (61.71 GB)
DFS Used: 32768 (32 KB)
Non DFS Used: 9957601280 (9.27 GB)
DFS Remaining: 52900519936 (49.27 GB)
DFS Used%: 0.00%
DFS Remaining%: 79.84%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Tue Jul 23 17:09:03 KST 2024
Last Block Report: Tue Jul 23 16:55:05 KST 2024
Num of Blocks: 0
Name: 192.168.56.12:9866 (kube-data1)
Hostname: kube-data1
Decommission Status : Normal
Configured Capacity: 66257006592 (61.71 GB)
DFS Used: 32768 (32 KB)
Non DFS Used: 9953820672 (9.27 GB)
DFS Remaining: 52904300544 (49.27 GB)
DFS Used%: 0.00%
DFS Remaining%: 79.85%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Tue Jul 23 17:09:02 KST 2024
Last Block Report: Tue Jul 23 16:54:59 KST 2024
Num of Blocks: 0