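Hive lets you work with data on Hadoop using HiveQL, an SQL-like language.
HBase is the better-known database on Hadoop, but Hive serves a
fundamentally different purpose: it provides a more user-friendly
interface to data stored in HDFS.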

———————————————————————————
Excerpt from http://hive.apache.org/
———————————————————————————
Hive is a data warehouse system for Hadoop that facilitates easy data summarization,
ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems.
Hive provides a mechanism to project structure onto this data and query the data using a
SQL-like language called HiveQL. At the same time this language also allows traditional
map/reduce programmers to plug in their custom mappers and reducers when it is
inconvenient or inefficient to express this logic in HiveQL.
———————————————————————————-

Building a Hadoop + Hive test environment
Making full use of Hadoop Hive, SQL-style!
Hadoop Hive and MySQL usage examples (29-FEB-2012)

[root@colinux ~]# vi /home/hiveuser/.bashrc

[Append to /home/hiveuser/.bashrc]
export PATH=$PATH:/usr/java/latest/bin
export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/usr/local/hadoop
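
When repeating this setup, the exports can be appended from a script instead of an editor. The following is a minimal sketch, not the procedure used in this post: append_once is a hypothetical helper that adds a line to a file only if that exact line is not already there, so running it twice does not duplicate entries.

```shell
# append_once FILE LINE: append LINE to FILE unless an identical line exists.
# Hypothetical helper for illustration; not part of Hadoop or Hive.
append_once() {
    file=$1
    line=$2
    # Make sure the file exists so grep does not fail on a missing path.
    touch "$file"
    # -q quiet, -x match the whole line, -F treat LINE as a fixed string
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, append_once /home/hiveuser/.bashrc 'export JAVA_HOME=/usr/java/latest' adds the line on the first call and does nothing on later calls.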

Hive Download Site
http://www.apache.org/dyn/closer.cgi/hive/

Hive Install

[root@colinux ~]# su - hadoop
[hadoop@colinux ~]$ pwd
/home/hadoop
[hiveuser@colinux ~]$ wget http://ftp.kddilabs.jp/infosystems/apache/hive/stable/hive-0.8.1.tar.gz
--2012-05-20 14:50:13-- http://ftp.kddilabs.jp/infosystems/apache/hive/stable/hive-0.8.1.tar.gz
Resolving ftp.kddilabs.jp... 192.26.91.193, 2001:200:601:10:206:5bff:fef0:466c
Connecting to ftp.kddilabs.jp|192.26.91.193|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 31325840 (30M) [application/x-gzip]
Saving to: `hive-0.8.1.tar.gz'

100%[================================================================================>] 31,325,840 2.90M/s in 11s

2012-05-20 14:50:24 (2.81 MB/s) - `hive-0.8.1.tar.gz' saved [31325840/31325840]
[hadoop@colinux ~]$
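
wget reported a length of 31325840 bytes for the archive. As a quick sanity check before extracting, the size on disk can be compared with that figure. This is only a minimal sketch: size_matches is a hypothetical helper name, and a matching size is no substitute for verifying a checksum or signature.

```shell
# size_matches FILE EXPECTED_BYTES: succeed only if FILE is exactly
# EXPECTED_BYTES long. Hypothetical helper for illustration.
size_matches() {
    # wc -c counts bytes; tr strips the padding some wc versions add.
    actual=$(wc -c < "$1" | tr -d ' ')
    [ "$actual" -eq "$2" ]
}
```

Usage would look like: size_matches hive-0.8.1.tar.gz 31325840 && echo "size OK"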

[hadoop@colinux ~]$ tar xvfz hive-0.8.1.tar.gz

[root@colinux ~]# mv /home/hadoop/hive-0.8.1 /usr/local/
[root@colinux ~]# cd /usr/local/
[root@colinux local]# ln -s hive-0.8.1 hive

With future version upgrades in mind, create a symbolic link after extracting the archive.
[root@colinux local]# ls -l
total 88
drwxr-xr-x 2 root root 4096 2011-12-10 10:12 bin
drwxr-xr-x 2 root root 4096 2011-12-10 10:12 etc
drwxr-xr-x 2 root root 4096 2007-04-17 21:46 games
lrwxrwxrwx 1 root root 23 2012-05-12 12:17 hadoop -> /usr/local/hadoop-1.0.1
drwxr-xr-x 15 hadoop hadoop 4096 2012-05-12 13:06 hadoop-1.0.1
lrwxrwxrwx 1 root root 10 2012-05-20 18:37 hive -> hive-0.8.1
drwxr-xr-x 9 root root 4096 2012-05-20 18:36 hive-0.8.1
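
The benefit of the symlink shows up at upgrade time: only the link has to change, while paths such as /usr/local/hive stay the same. Below is a minimal sketch, assuming a newer release has already been extracted next to the old one. The helper name switch_version and any newer directory name are hypothetical.

```shell
# switch_version BASE_DIR TARGET LINK: repoint BASE_DIR/LINK at BASE_DIR/TARGET.
# Hypothetical helper for illustration.
switch_version() {
    base=$1
    target=$2
    link=$3
    # Refuse to create a dangling link.
    [ -d "$base/$target" ] || { echo "no such directory: $base/$target" >&2; return 1; }
    # -s symbolic, -f replace an existing link,
    # -n treat the existing link itself as the destination instead of following it
    ( cd "$base" && ln -sfn "$target" "$link" )
}
```

Called as switch_version /usr/local hive-0.8.1 hive, it reproduces the link shown in the listing above; passing a newer directory name later would switch versions without touching anything that refers to /usr/local/hive.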

[root@colinux local]# chown -R hadoop:hadoop hive/

Start Hadoop and use jps to confirm that the daemons are running.

[hadoop@colinux ~]$ /usr/local/hadoop/bin/hadoop namenode -format
12/05/26 09:21:10 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = colinux/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.0.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1243785; compiled by 'hortonfo' on Tue Feb 14 08:15:38 UTC 2012
************************************************************/
Re-format filesystem in /tmp/hadoop-hadoop/dfs/name ? (Y or N) Y
12/05/26 09:21:16 INFO util.GSet: VM type = 32-bit
12/05/26 09:21:16 INFO util.GSet: 2% max memory = 19.33375 MB
12/05/26 09:21:16 INFO util.GSet: capacity = 2^22 = 4194304 entries
12/05/26 09:21:16 INFO util.GSet: recommended=4194304, actual=4194304
12/05/26 09:21:17 INFO namenode.FSNamesystem: fsOwner=hadoop
12/05/26 09:21:17 INFO namenode.FSNamesystem: supergroup=supergroup
12/05/26 09:21:17 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/05/26 09:21:17 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/05/26 09:21:17 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/05/26 09:21:17 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/05/26 09:21:18 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/05/26 09:21:18 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
12/05/26 09:21:18 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at colinux/127.0.0.1
************************************************************/
[hadoop@colinux ~]$ /usr/local/hadoop/bin/start-all.sh
starting namenode, logging to /usr/local/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-namenode-colinux.out
localhost: starting datanode, logging to /usr/local/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-datanode-colinux.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-secondarynamenode-colinux.out
starting jobtracker, logging to /usr/local/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-jobtracker-colinux.out
localhost: starting tasktracker, logging to /usr/local/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-tasktracker-colinux.out
[hadoop@colinux ~]$

[hadoop@colinux ~]$ jps
3438 Jps
3270 JobTracker
3402 TaskTracker
3058 DataNode
3185 SecondaryNameNode
[hadoop@colinux ~]$
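
Reading the jps listing by eye is easy to get wrong, so the check can be scripted. The sketch below assumes the five daemons of a single-node Hadoop 1.x setup (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker). missing_daemons is a hypothetical helper that reads jps output on standard input and prints every expected daemon it does not find.

```shell
# missing_daemons: read `jps` output on stdin and print each expected daemon
# that has no exactly matching process name. Prints nothing if all are present.
# The list of expected daemons is an assumption for a single-node Hadoop 1.x setup.
missing_daemons() {
    awk '
        { seen[$2] = 1 }   # second column of jps output is the process name
        END {
            n = split("NameNode DataNode SecondaryNameNode JobTracker TaskTracker", want, " ")
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) print want[i]
        }
    '
}
```

Usage: jps | missing_daemons. The match is exact, so SecondaryNameNode does not count as NameNode. Applied to the listing above, it would print NameNode, because that listing contains no NameNode line. In that situation the NameNode log file named in the start-all.sh output is a reasonable place to look.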

Download a test data file to verify that Hadoop and Hive work.

[hadoop@colinux ~]$ mkdir ~/localfiles
[hadoop@colinux ~]$ cd localfiles/
[hadoop@colinux localfiles]$ wget http://www.atmarkit.co.jp/fdb/single/s_hive/dl/data.tar.gz
--2012-05-26 08:55:18-- http://www.atmarkit.co.jp/fdb/single/s_hive/dl/data.tar.gz
Resolving www.atmarkit.co.jp... 202.218.219.147
Connecting to www.atmarkit.co.jp|202.218.219.147|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2071417 (2.0M) [application/x-tar]
Saving to: `data.tar.gz'

100%[=========================================================================================>] 2,071,417 705K/s in 2.9s

2012-05-26 08:55:21 (705 KB/s) - `data.tar.gz' saved [2071417/2071417]

[hadoop@colinux localfiles]$

Extract the test files, then run the hive command.
[hadoop@colinux ~]$ /usr/local/hive/bin/hive

hive> CREATE TABLE pref (id int, pref STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

hive> desc pref;
OK
id int
pref string
Time taken: 0.61 seconds
hive>

hive> LOAD DATA LOCAL INPATH '/home/hadoop/localfiles/pref.csv' OVERWRITE INTO TABLE pref;
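
LOAD DATA LOCAL copies the local file into the table's storage without checking it against the schema, and fields that do not parse typically show up as NULL at query time. A pre-check on the file can therefore save a debugging round. The sketch below assumes pref.csv holds one id,pref pair per line with an integer id, matching the table definition above. check_pref_csv is a hypothetical helper.

```shell
# check_pref_csv FILE: print every line that does not look like "<integer>,<text>".
# Exit status is 0 only when all lines pass. Hypothetical helper for illustration;
# the two-field layout is an assumption based on the CREATE TABLE statement.
check_pref_csv() {
    awk -F, '
        NF != 2 || $1 !~ /^[0-9]+$/ || $2 == "" {
            printf "line %d: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Usage: check_pref_csv /home/hadoop/localfiles/pref.csv && echo "pref.csv looks consistent with the table".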

Run a SELECT to check the data.

hive> select A.pref from pref A;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201205260921_0003, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201205260921_0003
Kill Command = /usr/local/hadoop-1.0.1/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201205260921_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-05-26 09:45:36,731 Stage-1 map = 0%, reduce = 0%
2012-05-26 09:45:45,791 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
2012-05-26 09:45:46,811 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
2012-05-26 09:45:47,831 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
2012-05-26 09:45:48,831 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
2012-05-26 09:45:49,851 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
2012-05-26 09:45:50,881 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
2012-05-26 09:45:51,891 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
2012-05-26 09:45:52,911 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
2012-05-26 09:45:53,931 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
2012-05-26 09:45:55,511 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.5 sec
MapReduce Total cumulative CPU time: 1 seconds 500 msec
Ended Job = job_201205260921_0003
MapReduce Jobs Launched:
Job 0: Map: 1 Accumulative CPU: 1.5 sec HDFS Read: 820 HDFS Write: 479 SUCESS
Total MapReduce CPU Time Spent: 1 seconds 500 msec
OK
北海道
青森県
岩手県
宮城県
秋田県
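
The same check can be run without the interactive prompt, which is handy for re-testing after a restart. This is a minimal sketch using the hive launcher's -e (inline query) and -S (silent) options. HIVE_BIN and hive_query are names introduced here for illustration, with the path defaulting to the install location used in this post.

```shell
# Path to the hive launcher; override HIVE_BIN if Hive is installed elsewhere.
HIVE_BIN="${HIVE_BIN:-/usr/local/hive/bin/hive}"

# hive_query SQL: run one HiveQL statement non-interactively.
# -S keeps informational messages out of the output, -e takes the query string.
hive_query() {
    "$HIVE_BIN" -S -e "$1"
}
```

Usage: hive_query 'select A.pref from pref A;' | head -n 5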

Note: Because I used the hadoop user created the other day, I did not use the hive account.
