memcached caches data and objects in memory. This improves performance by reducing how often data has to be fetched from a database or other backend, and by keeping sessions in memcached you can also get fast session management.

The latest stable memcached release is v1.4.13 (memcached.org).

Free & open source, high-performance, distributed memory object caching system,
generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

Installing libevent

[root@ Sun May 27]# yum install libevent libevent-devel
Loaded plugins: fastestmirror, priorities, security, update-motd
Loading mirror speeds from cached hostfile
* amzn-main: packages.ap-northeast-1.amazonaws.com
* amzn-updates: packages.ap-northeast-1.amazonaws.com
amzn-main | 2.1 kB 00:00
amzn-updates | 2.3 kB 00:00
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package libevent.x86_64 0:1.4.13-1.6.amzn1 will be installed
---> Package libevent-devel.x86_64 0:1.4.13-1.6.amzn1 will be installed
--> Finished Dependency Resolution

(output omitted)
Dependencies Resolved

Downloading Packages:
(1/2): libevent-1.4.13-1.6.amzn1.x86_64.rpm | 114 kB 00:00
(2/2): libevent-devel-1.4.13-1.6.amzn1.x86_64.rpm | 386 kB 00:00

Installing memcached

[root@ Sun May 27]# wget http://memcached.googlecode.com/files/memcached-1.4.13.tar.gz
--2012-05-27 17:56:05-- http://memcached.googlecode.com/files/memcached-1.4.13.tar.gz
Resolving memcached.googlecode.com… 72.14.203.82
Connecting to memcached.googlecode.com|72.14.203.82|:80… connected.
HTTP request sent, awaiting response… 200 OK
Length: 320751 (313K) [application/x-gzip]
Saving to: "memcached-1.4.13.tar.gz"

100%[==========================================================================>] 320,751 1.37M/s in 0.2s

2012-05-27 17:56:06 (1.37 MB/s) - "memcached-1.4.13.tar.gz" saved [320751/320751]

[root@ Sun May 27]#

[root@ Sun May 27]# tar zxf memcached-1.4.13.tar.gz
[root@ Sun May 27]# ls -l
total 320
drwxr-xr-x 6 1000 1000 4096 Feb 3 06:24 memcached-1.4.13
-rw-r--r-- 1 root root 320751 Feb 3 06:27 memcached-1.4.13.tar.gz
[root@ Sun May 27]# cd memcached-1.4.13
[root@ Sun May 27]# ./configure
checking build system type… x86_64-unknown-linux-gnu
checking host system type… x86_64-unknown-linux-gnu
checking target system type… x86_64-unknown-linux-gnu
checking for a BSD-compatible install… /usr/bin/install -c
checking whether build environment is sane… yes
checking for a thread-safe mkdir -p… /bin/mkdir -p
checking for gawk… gawk
checking whether make sets $(MAKE)… yes
checking for gcc… gcc
checking whether the C compiler works… yes
checking for C compiler default output file name… a.out
checking for suffix of executables…
checking whether we are cross compiling… no
checking for suffix of object files… o
checking whether we are using the GNU C compiler… yes
checking whether gcc accepts -g… yes

(output omitted)

[root@ Sun May 27]# make
make all-recursive
make[1]: Entering directory `/home/ec2-user/memcached/memcached-1.4.13'
Making all in doc
make[2]: Entering directory `/home/ec2-user/memcached/memcached-1.4.13/doc'
make all-am
make[3]: Entering directory `/home/ec2-user/memcached/memcached-1.4.13/doc'
make[3]: Nothing to be done for `all-am'.
make[3]: Leaving directory `/home/ec2-user/memcached/memcached-1.4.13/doc'
make[2]: Leaving directory `/home/ec2-user/memcached/memcached-1.4.13/doc'
make[2]: Entering directory `/home/ec2-user/memcached/memcached-1.4.13'
gcc -std=gnu99 -DHAVE_CONFIG_H -I. -DNDEBUG -g -O2 -pthread -Wall -Werror -pedantic
-Wmissing-prototypes -Wmissing-declarations -Wredundant-decls -fno-strict-aliasing -MT memcached-memcached.o
-MD -MP -MF .deps/memcached-memcached.Tpo -c -o memcached-memcached.o `test -f 'memcached.c' || echo './'`memcached.c

[root@ Sun May 27]# make install
make install-recursive
make[1]: Entering directory `/home/ec2-user/memcached/memcached-1.4.13'
Making install in doc
make[2]: Entering directory `/home/ec2-user/memcached/memcached-1.4.13/doc'
make install-am
make[3]: Entering directory `/home/ec2-user/memcached/memcached-1.4.13/doc'
make[4]: Entering directory `/home/ec2-user/memcached/memcached-1.4.13/doc'
make[4]: Nothing to be done for `install-exec-am'.
test -z "/usr/local/share/man/man1" || /bin/mkdir -p "/usr/local/share/man/man1"
/usr/bin/install -c -m 644 memcached.1 '/usr/local/share/man/man1'
make[4]: Leaving directory `/home/ec2-user/memcached/memcached-1.4.13/doc'
make[3]: Leaving directory `/home/ec2-user/memcached/memcached-1.4.13/doc'
make[2]: Leaving directory `/home/ec2-user/memcached/memcached-1.4.13/doc'
make[2]: Entering directory `/home/ec2-user/memcached/memcached-1.4.13'
make[3]: Entering directory `/home/ec2-user/memcached/memcached-1.4.13'
test -z "/usr/local/bin" || /bin/mkdir -p "/usr/local/bin"
/usr/bin/install -c memcached '/usr/local/bin'
test -z "/usr/local/include/memcached" || /bin/mkdir -p "/usr/local/include/memcached"
/usr/bin/install -c -m 644 protocol_binary.h '/usr/local/include/memcached'
make[3]: Leaving directory `/home/ec2-user/memcached/memcached-1.4.13'
make[2]: Leaving directory `/home/ec2-user/memcached/memcached-1.4.13'
make[1]: Leaving directory `/home/ec2-user/memcached/memcached-1.4.13'
[root@ Sun May 27]#

[root@ Sun May 27]# whereis memcached
memcached: /usr/local/bin/memcached
[root@ Sun May 27]#

Basic startup check

[ec2-user@ Sun May 27]$ /usr/local/bin/memcached -p 11211 -m 10m -vv
slab class 1: chunk size 96 perslab 10922
slab class 2: chunk size 120 perslab 8738
slab class 3: chunk size 152 perslab 6898
slab class 4: chunk size 192 perslab 5461
slab class 5: chunk size 240 perslab 4369
slab class 6: chunk size 304 perslab 3449
slab class 7: chunk size 384 perslab 2730
slab class 8: chunk size 480 perslab 2184
slab class 9: chunk size 600 perslab 1747
slab class 10: chunk size 752 perslab 1394
slab class 11: chunk size 944 perslab 1110
slab class 12: chunk size 1184 perslab 885
slab class 13: chunk size 1480 perslab 708
slab class 14: chunk size 1856 perslab 564
slab class 15: chunk size 2320 perslab 451
slab class 16: chunk size 2904 perslab 361
slab class 17: chunk size 3632 perslab 288
slab class 18: chunk size 4544 perslab 230
slab class 19: chunk size 5680 perslab 184
slab class 20: chunk size 7104 perslab 147
slab class 21: chunk size 8880 perslab 118
slab class 22: chunk size 11104 perslab 94
slab class 23: chunk size 13880 perslab 75
slab class 24: chunk size 17352 perslab 60
slab class 25: chunk size 21696 perslab 48
slab class 26: chunk size 27120 perslab 38
slab class 27: chunk size 33904 perslab 30
slab class 28: chunk size 42384 perslab 24
slab class 29: chunk size 52984 perslab 19
slab class 30: chunk size 66232 perslab 15
slab class 31: chunk size 82792 perslab 12
slab class 32: chunk size 103496 perslab 10
slab class 33: chunk size 129376 perslab 8
slab class 34: chunk size 161720 perslab 6
slab class 35: chunk size 202152 perslab 5
slab class 36: chunk size 252696 perslab 4
slab class 37: chunk size 315872 perslab 3
slab class 38: chunk size 394840 perslab 2
slab class 39: chunk size 493552 perslab 2
slab class 40: chunk size 616944 perslab 1
slab class 41: chunk size 771184 perslab 1
slab class 42: chunk size 1048576 perslab 1
<26 server listening (auto-negotiate)
<27 server listening (auto-negotiate)
<28 send buffer was 229376, now 268435456
<29 send buffer was 229376, now 268435456
<28 server listening (udp)
<29 server listening (udp)

Starting in the background and verifying operation

Start memcached as a daemon in the background:
[ec2-user@ Sun May 27]$ /usr/local/bin/memcached -p 11211 -m 10m -d
[ec2-user@ Sun May 27]$ ps -ef | grep mem
ec2-user 18639 1 0 18:04 ? 00:00:00 /usr/local/bin/memcached -p 11211 -m 10m -d
ec2-user 18646 17197 0 18:04 pts/0 00:00:00 grep mem
[ec2-user@ Sun May 27]$
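For reference, a minimal sketch of a slightly more locked-down startup, assuming you also want to bind to localhost only and keep a PID file (the -l and -P options appear in the help output below; the path /tmp/memcached.pid is just an example):

[ec2-user@ Sun May 27]$ /usr/local/bin/memcached -p 11211 -m 10m -l 127.0.0.1 -d -P /tmp/memcached.pid
[ec2-user@ Sun May 27]$ kill `cat /tmp/memcached.pid`   # one way to stop the daemon later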

[ec2-user@ Sun May 27]$ /usr/local/bin/memcached -h
memcached 1.4.13
-p TCP port number to listen on (default: 11211)
-U UDP port number to listen on (default: 11211, 0 is off)
-s UNIX socket path to listen on (disables network support)
-a access mask for UNIX socket, in octal (default: 0700)
-l interface to listen on (default: INADDR_ANY, all addresses)
may be specified as host:port. If you don't specify
a port number, the value you specified with -p or -U is
used. You may specify multiple addresses separated by comma
or by using -l multiple times
-d run as a daemon
-r maximize core file limit
-u <username> assume identity of <username> (only when run as root)
-m max memory to use for items in megabytes (default: 64 MB)
-M return error on memory exhausted (rather than removing items)
-c max simultaneous connections (default: 1024)
-k lock down all paged memory. Note that there is a
limit on how much memory you may lock. Trying to
allocate more than that would fail, so be sure you
set the limit correctly for the user you started
the daemon with (not for -u <username> user;
under sh this is done with 'ulimit -S -l NUM_KB').
-v verbose (print errors/warnings while in event loop)
-vv very verbose (also print client commands/reponses)
-vvv extremely verbose (also print internal state transitions)
-h print this help and exit
-i print memcached and libevent license
-P <file> save PID in <file>, only used with -d option
-f chunk size growth factor (default: 1.25)
-n minimum space allocated for key+value+flags (default: 48)
-L Try to use large memory pages (if available). Increasing
the memory page size could reduce the number of TLB misses
and improve the performance. In order to get large pages
from the OS, memcached will allocate the total item-cache
in one large chunk.
-D <char> Use <char> as the delimiter between key prefixes and IDs.
This is used for per-prefix stats reporting. The default is
":" (colon). If this option is specified, stats collection
is turned on automatically; if not, then it may be turned on
by sending the “stats detail on” command to the server.
-t number of threads to use (default: 4)
-R Maximum number of requests per event, limits the number of
requests process for a given connection to prevent
starvation (default: 20)
-C Disable use of CAS
-b Set the backlog queue limit (default: 1024)
-B Binding protocol - one of ascii, binary, or auto (default)
-I Override the size of each slab page. Adjusts max item size
(default: 1mb, min: 1k, max: 128m)
-o Comma separated list of extended or experimental options
- (EXPERIMENTAL) maxconns_fast: immediately close new
connections if over maxconns limit
- hashpower: An integer multiplier for how large the hash
table should be. Can be grown at runtime if not big enough.
Set this based on "STAT hash_power_level" before a
restart.
[ec2-user@ Sun May 27]$

Basic checks over telnet (storing data, etc.)

[ec2-user@ Sun May 27]$ telnet localhost 11211
Trying 127.0.0.1…
Connected to localhost.
Escape character is '^]'.
stats
STAT pid 18639
STAT uptime 1139
STAT time 1338110621
STAT version 1.4.13
STAT libevent 1.4.13-stable
STAT pointer_size 64
STAT rusage_user 0.015997
STAT rusage_system 0.015997
STAT curr_connections 10
STAT total_connections 11

set key01 0 0 5
00001
set key02 0 0 5
00002
set key03 0 0 5
00003
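As a quick sanity check, the values stored above can be read back in the same telnet session. A minimal sketch (the server answers STORED to each set above; get returns a VALUE block):

get key01
VALUE key01 0 5
00001
END
quit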


With systems (servers included) getting faster and the arrival of the big-data era, distributed processing seems to be attracting a lot of attention.

When HPCC was in fashion about ten years ago it never felt very close to home, but now that open-source systems such as Hadoop and MongoDB make it easy to deploy distributed processing, it has been drawing attention again over the last two or three years. One thing that is easy to forget is that the network can just as easily become the bottleneck, so scale-out, including the network, needs to be designed in properly when the system is first introduced.

TCP/IP topics that HPC users should know
ESnet: http://fasterdata.es.net/
———————————————————-
To make better use of its accumulated knowledge, ESnet has developed this Fasterdata Knowledge Base.
The knowledge base provides proven, operationally-sound methods for troubleshooting and
solving performance issues. Our solutions fall into five categories:

Network Architecture, including the Science DMZ model
Host Tuning
Network Tuning
Data Transfer Tools
Network Performance Testing
———————————————————-
According to the HPC material above, it looks worthwhile to tune the settings in this area properly as well.
There are also various tools available, so if you want to investigate, installing them to get a picture of the current state is a good first step.
nuttcp, for example, can apparently also spot retransmissions (see the sketch after the links below).

■Data Transfer Tools
http://fasterdata.es.net/data-transfer-tools/

■Network Troubleshooting Tools
http://fasterdata.es.net/performance-testing/network-troubleshooting-tools/

■Phil Dykstra’s nuttcp quick start guide
http://wcisd.hpc.mil/nuttcp/Nuttcp-HOWTO.html
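As mentioned above, here is a minimal nuttcp sketch for a quick throughput test between two hosts; this assumes nuttcp is installed on both sides and "serverhost" is a placeholder for the receiving machine:

# on the receiving side
nuttcp -S
# on the sending side: report every second, run for 10 seconds
nuttcp -i1 -T10 serverhost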

Example: checking the network path, including the MTU, with scamper.
———————————————————————-
http://fasterdata.es.net/performance-testing/network-troubleshooting-tools/scamper/

To install scamper:
wget http://www.wand.net.nz/scamper/scamper-cvs-20110421.tar.gz
tar xvzf scamper-cvs-20110421.tar.gz
./configure; make; make install

[root@ip-xxx-xxx-xxx-xxx1 scamper-cvs-20110421]# ./configure; make; make install
checking for a BSD-compatible install… /usr/bin/install -c
checking whether build environment is sane… yes
checking for a thread-safe mkdir -p… /bin/mkdir -p
checking for gawk… gawk
checking whether make sets $(MAKE)… yes
checking build system type… x86_64-unknown-linux-gnu
checking host system type… x86_64-unknown-linux-gnu
checking how to print strings… printf
checking for style of include used by make… GNU
checking for gcc… gcc
checking whether the C compiler works… yes

[root@ip-xxx-xxx-xxx-xxx1 scamper-cvs-20110421]# dig yahoo.co.jp

; <<>> DiG 9.7.3-P3-RedHat-9.7.3-8.P3.15.amzn1 <<>> yahoo.co.jp
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24120
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;yahoo.co.jp. IN A

;; ANSWER SECTION:
yahoo.co.jp. 287 IN A 203.216.243.240
yahoo.co.jp. 287 IN A 124.83.187.140

;; Query time: 1 msec
;; SERVER: 172.16.0.23#53(172.16.0.23)
;; WHEN: Sun May 27 08:24:32 2012
;; MSG SIZE rcvd: 61

[root@ip-xxx-xxx-xxx-xxx1 scamper-cvs-20110421]#
[root@ip-xxx-xxx-xxx-xxx1 scamper-cvs-20110421]# /usr/local/bin/scamper -c "trace -M" -i 124.83.187.140
traceroute from 10.157.37.241 to 124.83.187.140
 1 10.157.36.2 4.163 ms [mtu: 1500]
 2 10.1.22.9 0.378 ms [mtu: 1500]
 3 175.41.192.21 0.397 ms [mtu: 1500]
 4 27.0.0.165 0.321 ms [mtu: 1500]
 5 27.0.0.205 7.595 ms [mtu: 1500]
 6 27.0.0.188 10.107 ms [mtu: 1500]
 7 61.200.80.201 7.698 ms [mtu: 1500]
 8 61.200.80.134 7.857 ms [mtu: 1500]
 9 61.200.82.138 7.942 ms [mtu: 1500]
10 124.83.128.26 12.923 ms [mtu: 1500]
11 124.83.128.146 9.725 ms [mtu: 1500]
12 124.83.128.146 9.852 ms !X [mtu: 1500]
[root@ip-xxx-xxx-xxx-xxx1 scamper-cvs-20110421]#

In addition, it seems the server-side NIC/kernel memory settings can be tuned for each environment (a sketch of changing them follows after the two listings below).

[root@colinux ~]# /sbin/sysctl -a | grep mem
net.ipv4.udp_wmem_min = 4096
net.ipv4.udp_rmem_min = 4096
net.ipv4.udp_mem = 2324160 3098880 4648320
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_mem = 196608 262144 393216
net.ipv4.igmp_max_memberships = 20
net.core.optmem_max = 20480
net.core.rmem_default = 129024
net.core.wmem_default = 129024
net.core.rmem_max = 131071
net.core.wmem_max = 131071
vm.lowmem_reserve_ratio = 256 256 32
vm.overcommit_memory = 0
[root@colinux ~]#

[root@ip-xxx-xxx-xxx-xxx ec2-user]# /sbin/sysctl -a | grep mem
vm.overcommit_memory = 0
vm.lowmem_reserve_ratio = 256 256 32
net.core.wmem_max = 131071
net.core.rmem_max = 131071
net.core.wmem_default = 229376
net.core.rmem_default = 229376
net.core.optmem_max = 20480
net.ipv4.igmp_max_memberships = 20
net.ipv4.tcp_mem = 14679 19574 29358
net.ipv4.tcp_wmem = 4096 16384 626368
net.ipv4.tcp_rmem = 4096 87380 626368
net.ipv4.udp_mem = 14679 19574 29358
net.ipv4.udp_rmem_min = 4096
net.ipv4.udp_wmem_min = 4096
[root@ip-xxx-xxx-xxx-xxx ec2-user]#
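As mentioned above, a minimal sketch of raising the socket buffer ceilings; the values are placeholders and should be sized to your own bandwidth-delay product, not copied as-is:

# apply immediately (lost at reboot)
/sbin/sysctl -w net.core.rmem_max=16777216
/sbin/sysctl -w net.core.wmem_max=16777216
/sbin/sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
/sbin/sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
# to make the change persistent, add the same keys to /etc/sysctl.conf and run: sysctl -p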

Supplement:
For Windows, there seem to be a few points to be aware of starting with Windows Server 2008.

TCP receive window auto-tuning does not work correctly in Windows Server 2008 R2

All the TCP/IP ports that are in a TIME_WAIT status are not closed after 497 days

Do you know about the Scalable Networking Pack?


Hive lets you work with data on Hadoop using HiveQL, an SQL-like language.
HBase is the best-known database on Hadoop, but Hive provides a more
user-friendly interface on top of HDFS, so its reason for existence is
fundamentally different from HBase.

———————————————————————————
Excerpt from http://hive.apache.org/
———————————————————————————
Hive is a data warehouse system for Hadoop that facilitates easy data summarization,
ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems.
Hive provides a mechanism to project structure onto this data and query the data using a
SQL-like language called HiveQL. At the same time this language also allows traditional
map/reduce programmers to plug in their custom mappers and reducers when it is
inconvenient or inefficient to express this logic in HiveQL.
———————————————————————————-

Building a Hadoop + Hive test environment
Working Hadoop Hive hard, SQL-style!
Hadoop Hive and MySQL usage example (29-FEB-2012)

[root@colinux ~]# vi /home/hiveuser/.bashrc


【Append to /home/hiveuser/.bashrc】
export PATH=$PATH:/usr/java/latest/bin
export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/usr/local/hadoop
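If you also want to run hive without typing the full path, you could additionally export HIVE_HOME and put its bin directory on PATH. This is only a sketch and assumes the /usr/local/hive symlink created later in this post:

export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin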

Hive Download Site
http://www.apache.org/dyn/closer.cgi/hive/

Hive Install

[root@colinux ~]# su – hadoop
[hadoop@colinux ~]$ pwd
/home/hadoop
[hiveuser@colinux ~]$ wget http://ftp.kddilabs.jp/infosystems/apache/hive/stable/hive-0.8.1.tar.gz
--2012-05-20 14:50:13-- http://ftp.kddilabs.jp/infosystems/apache/hive/stable/hive-0.8.1.tar.gz
Resolving ftp.kddilabs.jp... 192.26.91.193, 2001:200:601:10:206:5bff:fef0:466c
Connecting to ftp.kddilabs.jp|192.26.91.193|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 31325840 (30M) [application/x-gzip]
Saving to: `hive-0.8.1.tar.gz'

100%[================================================================================>] 31,325,840 2.90M/s in 11s

2012-05-20 14:50:24 (2.81 MB/s) - `hive-0.8.1.tar.gz' saved [31325840/31325840]
[hadoop@colinux ~]$


[hadoop@colinux ~]$ tar xvfz hive-0.8.1.tar.gz

[root@colinux ~]# mv /home/hadoop/hive-0.8.1 /usr/local/
[root@colinux ~]# cd /usr/local/
[root@colinux local]# ln -s hive-0.8.1 hive

With future version upgrades in mind, create a symbolic link after extracting.
[root@colinux local]# ls -l
total 88
drwxr-xr-x 2 root root 4096 2011-12-10 10:12 bin
drwxr-xr-x 2 root root 4096 2011-12-10 10:12 etc
drwxr-xr-x 2 root root 4096 2007-04-17 21:46 games
lrwxrwxrwx 1 root root 23 2012-05-12 12:17 hadoop -> /usr/local/hadoop-1.0.1
drwxr-xr-x 15 hadoop hadoop 4096 2012-05-12 13:06 hadoop-1.0.1
lrwxrwxrwx 1 root root 10 2012-05-20 18:37 hive -> hive-0.8.1
drwxr-xr-x 9 root root 4096 2012-05-20 18:36 hive-0.8.1

[root@colinux local]# chown -R hadoop:hadoop hive/

Start Hadoop and confirm it is running with jps.

[hadoop@colinux ~]$ /usr/local/hadoop/bin/hadoop namenode -format
12/05/26 09:21:10 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = colinux/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.0.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1243785; compiled by ‘hortonfo’ on Tue Feb 14 08:15:38 UTC 2012
************************************************************/
Re-format filesystem in /tmp/hadoop-hadoop/dfs/name ? (Y or N) Y
12/05/26 09:21:16 INFO util.GSet: VM type = 32-bit
12/05/26 09:21:16 INFO util.GSet: 2% max memory = 19.33375 MB
12/05/26 09:21:16 INFO util.GSet: capacity = 2^22 = 4194304 entries
12/05/26 09:21:16 INFO util.GSet: recommended=4194304, actual=4194304
12/05/26 09:21:17 INFO namenode.FSNamesystem: fsOwner=hadoop
12/05/26 09:21:17 INFO namenode.FSNamesystem: supergroup=supergroup
12/05/26 09:21:17 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/05/26 09:21:17 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/05/26 09:21:17 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/05/26 09:21:17 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/05/26 09:21:18 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/05/26 09:21:18 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
12/05/26 09:21:18 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at colinux/127.0.0.1
************************************************************/
[hadoop@colinux ~]$ /usr/local/hadoop/bin/start-all.sh
starting namenode, logging to /usr/local/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-namenode-colinux.out
localhost: starting datanode, logging to /usr/local/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-datanode-colinux.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-secondarynamenode-colinux.out
starting jobtracker, logging to /usr/local/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-jobtracker-colinux.out
localhost: starting tasktracker, logging to /usr/local/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-tasktracker-colinux.out
[hadoop@colinux ~]$

[hadoop@colinux ~]$ jps
3438 Jps
3270 JobTracker
3402 TaskTracker
3058 DataNode
3185 SecondaryNameNode
[hadoop@colinux ~]$

Download the test data file and verify that Hadoop and Hive work.

[hadoop@colinux ~]$ mkdir ~/localfiles
[hadoop@colinux ~]$ cd localfiles/
[hadoop@colinux localfiles]$ wget http://www.atmarkit.co.jp/fdb/single/s_hive/dl/data.tar.gz
--2012-05-26 08:55:18-- http://www.atmarkit.co.jp/fdb/single/s_hive/dl/data.tar.gz
Resolving www.atmarkit.co.jp... 202.218.219.147
Connecting to www.atmarkit.co.jp|202.218.219.147|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2071417 (2.0M) [application/x-tar]
Saving to: `data.tar.gz'

100%[=========================================================================================>] 2,071,417 705K/s in 2.9s

2012-05-26 08:55:21 (705 KB/s) - `data.tar.gz' saved [2071417/2071417]

[hadoop@colinux localfiles]$

Extract the test file and run the hive command.
[hadoop@colinux ~]$ /usr/local/hive/bin/hive

hive> CREATE TABLE pref (id int, pref STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';


hive> desc pref;
OK
id int
pref string
Time taken: 0.61 seconds
hive>

hive> LOAD DATA LOCAL INPATH '/home/hadoop/localfiles/pref.csv' OVERWRITE INTO TABLE pref;


Check the data with a SELECT.


hive> select A.pref from pref A;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201205260921_0003, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201205260921_0003
Kill Command = /usr/local/hadoop-1.0.1/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201205260921_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-05-26 09:45:36,731 Stage-1 map = 0%, reduce = 0%
2012-05-26 09:45:45,791 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
2012-05-26 09:45:46,811 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
2012-05-26 09:45:47,831 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
2012-05-26 09:45:48,831 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
2012-05-26 09:45:49,851 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
2012-05-26 09:45:50,881 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
2012-05-26 09:45:51,891 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
2012-05-26 09:45:52,911 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
2012-05-26 09:45:53,931 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
2012-05-26 09:45:55,511 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.5 sec
MapReduce Total cumulative CPU time: 1 seconds 500 msec
Ended Job = job_201205260921_0003
MapReduce Jobs Launched:
Job 0: Map: 1 Accumulative CPU: 1.5 sec HDFS Read: 820 HDFS Write: 479 SUCESS
Total MapReduce CPU Time Spent: 1 seconds 500 msec
OK
北海道
青森県
岩手県
宮城県
秋田県

Note: the hive account was not used, since I used the hadoop user created earlier.
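As one more check, an aggregate query also runs as a MapReduce job. A minimal sketch (the MapReduce progress lines are the same as above and are omitted here); it should return the number of rows loaded from pref.csv:

hive> SELECT COUNT(1) FROM pref;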



RDBMSs (MS SQL, Oracle, MySQL, etc.) cover real-time data analysis such as OLTP,
while NoSQL has been getting attention as a DWH-style option for today's content-rich
services and large-volume (non-real-time) data analysis, so I want to review it a little.
First I tested the Windows build, since it is easy to try on my own client machine.

MongoDB is an open source, document-oriented database designed with both scalability
and developer agility in mind. Instead of storing your data in tables and rows as you
would with a relational database, in MongoDB you store JSON-like documents with dynamic schemas.
The goal of MongoDB is to bridge the gap between key-value stores (which are fast and scalable)
and relational databases (which have rich functionality).

MongoDB can scale horizontally through sharding.
This is very similar to the scaling model of BigTable and PNUTS.
The developer chooses a shard key, and that key decides how the data is distributed:
the shard that stores a document is selected based on the key in the data.

Related documents
Presentations Matching: MongoDB Tokyo 2012

http://www.10gen.com/
http://www.mongodb.org/
http://www.mongodb.org/downloads

Supported OS
OS X 32-bit / OS X 64-bit
Linux 32-bit / Linux 64-bit
Windows 32-bit / Windows 64-bit
Solaris i86pc / Solaris 64
Source

MongoDB on Windows
http://www.mongodb.org/downloads

Using it on Windows 7
Download mongodb-win32-x86_64-2.0.5.zip

Installation steps
http://www.mongodb.org/display/DOCS/Quickstart+Windows

① Unzip
Unzip the downloaded binary package to the location of your choice.
You may want to rename mongo-xxxxxxx to just “mongo” for convenience.

② Create a data directory
By default MongoDB will store data in \data\db, but it won't automatically
create that folder.

※ If you prefer to place datafiles elsewhere,
 use the --dbpath command line parameter when starting mongod.exe.
 In other words, if you want to keep the data on a different path, it seems you can simply
 specify the data path at startup, much like datadir in MySQL.

I created a folder for the extracted binaries and another one for the data.
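A minimal sketch of starting mongod with an explicit data directory, assuming the folders created above are C:\mongodb and C:\mongodb\data (the paths are only examples):

C:\mongodb\bin>mongod --dbpath C:\mongodb\data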


③ Run and connect to the server

C:\mongodb\bin>mongod
mongod --help for help and startup options
Sun May 20 08:20:03 [initandlisten] MongoDB starting : pid=10660 port=27017 dbpath=/data/db 64-bit host=any-place
Sun May 20 08:20:03 [initandlisten] db version v2.0.5, pdfile version 4.5
Sun May 20 08:20:03 [initandlisten] git version: 1bb4de4630302fad8af53824ca4f627db490b753
Sun May 20 08:20:03 [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_42
Sun May 20 08:20:03 [initandlisten] options: {}
Sun May 20 08:20:03 [initandlisten] journal dir=/data/db/journal
Sun May 20 08:20:03 [initandlisten] recover : no journal files present, no recovery needed
Sun May 20 08:20:04 [initandlisten] waiting for connections on port 27017
Sun May 20 08:20:04 [websvr] admin web console waiting for connections on port 28017
Sun May 20 08:21:04 [clientcursormon] mem (MB) res:20 virt:82 mapped:0

Note: It is also possible to run the server as a Windows Service.


④ Start the administrative shell

C:\mongodb\bin>mongo
MongoDB shell version: 2.0.5
connecting to: test
> 3+3
6
> db
test
> db.foo.insert( { a : 1 } )
> db.foo.find()
{ "_id" : ObjectId("4fb82c2961d4fad4249cd4a5"), "a" : 1 }
> show dbs
local (empty)
test 0.078125GB
> show collections
foo
system.indexes
>

You can INSERT, SAVE, and FIND data.
Fundamentally, everything is stored as "key"/"value" pairs.

> db.foo.insert( { d : 4 } )
> db.foo.insert( { e : 5 } )
> db.foo.find()
{ "_id" : ObjectId("4fb82c2961d4fad4249cd4a5"), "a" : 1 }
{ "_id" : ObjectId("4fb832e8606ee417ccf80408"), "b" : 2 }
{ "_id" : ObjectId("4fb8331c606ee417ccf80409"), "c" : 3 }
{ "_id" : ObjectId("4fb833a4606ee417ccf8040a"), "d" : 4 }
{ "_id" : ObjectId("4fb833ab606ee417ccf8040b"), "e" : 5 }

> db.foo.save( { f : 6 } )
> db.foo.find()
{ "_id" : ObjectId("4fb82c2961d4fad4249cd4a5"), "a" : 1 }
{ "_id" : ObjectId("4fb832e8606ee417ccf80408"), "b" : 2 }
{ "_id" : ObjectId("4fb8331c606ee417ccf80409"), "c" : 3 }
{ "_id" : ObjectId("4fb833a4606ee417ccf8040a"), "d" : 4 }
{ "_id" : ObjectId("4fb833ab606ee417ccf8040b"), "e" : 5 }
{ "_id" : ObjectId("4fb834a3606ee417ccf8040c"), "f" : 6 }
>

> db.foo.find({"e":5})
{ "_id" : ObjectId("4fb833ab606ee417ccf8040b"), "e" : 5 }
>
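Updates and deletes follow the same pattern. A minimal sketch in the same shell, using the documents inserted above (the output of the final find is omitted here):

> db.foo.update( { e : 5 }, { $set : { e : 50 } } )
> db.foo.remove( { f : 6 } )
> db.foo.find()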


———————————————————–
Help output
———————————————————–
db.help() help on db methods
db.mycoll.help() help on collection methods
rs.help() help on replica set methods
help admin administrative help
help connect connecting to a db help
help keys key shortcuts
help misc misc things to know
help mr mapreduce

show dbs show database names
show collections show collections in current database
show users show users in current database
show profile show most recent system.profile entries with time >= 1ms
show logs show the accessible logger names
show log [name] prints out the last segment of log in memory, 'global' is default
use <db_name> set current database
db.foo.find() list objects in collection foo
db.foo.find( { a : 1 } ) list objects in foo where a == 1
it result of the last line evaluated; use to further iterate
DBQuery.shellBatchSize = x set default number of items to display on shell
exit quit the mongo shell
———————————————————–

Checking the data folder


Check the status at http://localhost:28017/


Drivers
MongoDB currently has client support for the following programming languages:
http://www.mongodb.org/display/DOCS/Drivers

mongodb.org Supported

C
C++
Erlang
Haskell
Java
Javascript
.NET (C# F#, PowerShell, etc)
Node.js
Perl
PHP
Python
Ruby
Scala

There seem to be various others under Community Supported.
I will look at the Linux build and integration with Hadoop another day.


【Original sites】
Welcome to Apache™ Hadoop™!
Hadoop Common Releases

Other Info
Distributed computing with Linux and Hadoop

【Installing Java】
■ JDK install: Java SE Development Kit (JDK, v1.6 or later recommended)
http://www.oracle.com/technetwork/java/javase/downloads/index.html
http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u32-downloads-1594644.html

[root@colinux hadoop]# ls -l
total 67060
-rw-rw-r-- 1 root root 68593311 2012-05-12 08:03 jdk-6u32-linux-i586-rpm.bin
[root@colinux hadoop]# chmod 755 jdk-6u32-linux-i586-rpm.bin

[root@colinux hadoop]# ./jdk-6u32-linux-i586-rpm.bin
Unpacking…
Checksumming…
Extracting…
UnZipSFX 5.50 of 17 February 2002, by Info-ZIP (Zip-Bugs@lists.wku.edu).
inflating: jdk-6u32-linux-i586.rpm
inflating: sun-javadb-common-10.6.2-1.1.i386.rpm
inflating: sun-javadb-core-10.6.2-1.1.i386.rpm
inflating: sun-javadb-client-10.6.2-1.1.i386.rpm
inflating: sun-javadb-demo-10.6.2-1.1.i386.rpm
inflating: sun-javadb-docs-10.6.2-1.1.i386.rpm
inflating: sun-javadb-javadoc-10.6.2-1.1.i386.rpm
Preparing... ########################################### [100%]
1:jdk ########################################### [100%]
Unpacking JAR files…
rt.jar…
jsse.jar…
charsets.jar…
tools.jar…
localedata.jar…
plugin.jar…
javaws.jar…
deploy.jar…
Installing JavaDB
Preparing... ########################################### [100%]
1:sun-javadb-common ########################################### [ 17%]
2:sun-javadb-core ########################################### [ 33%]
3:sun-javadb-client ########################################### [ 50%]
4:sun-javadb-demo ########################################### [ 67%]
5:sun-javadb-docs ########################################### [ 83%]
6:sun-javadb-javadoc ########################################### [100%]

Java(TM) SE Development Kit 6 successfully installed.

Product Registration is FREE and includes many benefits:
* Notification of new versions, patches, and updates
* Special offers on Oracle products, services and training
* Access to early releases and documentation

[root@colinux hadoop]# java -version
java version "1.6.0_32"
Java(TM) SE Runtime Environment (build 1.6.0_32-b05)
Java HotSpot(TM) Client VM (build 20.7-b02, mixed mode, sharing)
[root@colinux hadoop]#

【Installing Hadoop】
※ As of May 2012
http://hadoop.apache.org/common/releases.html#Download
http://ftp.kddilabs.jp/infosystems/apache/hadoop/common/
http://ftp.kddilabs.jp/infosystems/apache/hadoop/common/stable/

1.0.X – current stable version, 1.0 release
1.1.X – current beta version, 1.1 release
0.23.X – current alpha version, MR2
0.22.X – does not include security
0.20.203.X – legacy stable version
0.20.X – legacy version

Release notes
http://ftp.kddilabs.jp/infosystems/apache/hadoop/common/stable/RELEASE_NOTES_HADOOP-1.0.1.html

[root@colinux hadoop]# wget http://ftp.kddilabs.jp/infosystems/apache/hadoop/common/stable/hadoop-1.0.1.tar.gz
--2012-05-12 12:11:03-- http://ftp.kddilabs.jp/infosystems/apache/hadoop/common/stable/hadoop-1.0.1.tar.gz
Resolving ftp.kddilabs.jp... 192.26.91.193, 2001:200:601:10:206:5bff:fef0:466c
Connecting to ftp.kddilabs.jp|192.26.91.193|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 60811130 (58M) [application/x-gzip]
Saving to: `hadoop-1.0.1.tar.gz'

100%[=============================================================================================================================>] 60,811,130 3.24M/s in 21s

2012-05-12 12:11:24 (2.72 MB/s) - `hadoop-1.0.1.tar.gz' saved [60811130/60811130]

[root@colinux hadoop]#

[root@colinux hadoop]# mv hadoop-1.0.1.tar.gz /usr/local/
[root@colinux local]# pwd
/usr/local
[root@colinux local]# tar zxf hadoop-1.0.1.tar.gz
[root@colinux local]#

[root@colinux local]# ls -l
total 84
drwxr-xr-x 2 root root 4096 2011-12-10 10:12 bin
drwxr-xr-x 2 root root 4096 2011-12-10 10:12 etc
drwxr-xr-x 2 root root 4096 2007-04-17 21:46 games
drwxr-xr-x 14 root root 4096 2012-02-14 17:18 hadoop-1.0.1
drwxr-xr-x 3 root root 4096 2011-11-05 09:00 include
drwxr-xr-x 2 root root 4096 2007-04-17 21:46 lib
drwxr-xr-x 2 root root 4096 2007-04-17 21:46 libexec
lrwxrwxrwx 1 root root 38 2009-12-26 01:40 mysql -> mysql-5.5.0-m2-linux-i686-icc-glibc23/
drwxr-xr-x 14 mysql mysql 4096 2009-12-22 00:23 mysql-5.1.41-linux-i686-icc-glibc23
drwxr-xr-x 14 mysql mysql 4096 2009-12-26 01:37 mysql-5.5.0-m2-linux-i686-icc-glibc23
drwxr-xr-x 2 root root 4096 2007-04-17 21:46 sbin
drwxr-xr-x 6 root root 4096 2011-12-10 10:12 share
drwxr-xr-x 2 root root 4096 2011-01-09 17:14 src
drwxrwxrwt 2 root root 40 2012-05-12 06:49 tmp
[root@colinux local]# ln -s /usr/local/hadoop-1.0.1 /usr/local/hadoop
[root@colinux local]# ls -l
total 84
drwxr-xr-x 2 root root 4096 2011-12-10 10:12 bin
drwxr-xr-x 2 root root 4096 2011-12-10 10:12 etc
drwxr-xr-x 2 root root 4096 2007-04-17 21:46 games
lrwxrwxrwx 1 root root 23 2012-05-12 12:17 hadoop -> /usr/local/hadoop-1.0.1
drwxr-xr-x 14 root root 4096 2012-02-14 17:18 hadoop-1.0.1
drwxr-xr-x 3 root root 4096 2011-11-05 09:00 include
drwxr-xr-x 2 root root 4096 2007-04-17 21:46 lib
drwxr-xr-x 2 root root 4096 2007-04-17 21:46 libexec
lrwxrwxrwx 1 root root 38 2009-12-26 01:40 mysql -> mysql-5.5.0-m2-linux-i686-icc-glibc23/
drwxr-xr-x 14 mysql mysql 4096 2009-12-22 00:23 mysql-5.1.41-linux-i686-icc-glibc23
drwxr-xr-x 14 mysql mysql 4096 2009-12-26 01:37 mysql-5.5.0-m2-linux-i686-icc-glibc23
drwxr-xr-x 2 root root 4096 2007-04-17 21:46 sbin
drwxr-xr-x 6 root root 4096 2011-12-10 10:12 share
drwxr-xr-x 2 root root 4096 2011-01-09 17:14 src
drwxrwxrwt 2 root root 40 2012-05-12 06:49 tmp
[root@colinux local]#

【Hadoop service account setup (passphrase-less key authentication)】

[root@colinux local]# /usr/sbin/useradd hadoop
[root@colinux local]# chown -R hadoop:hadoop /usr/local/hadoop-1.0.1
[root@colinux local]#
[root@colinux local]# passwd hadoop
Changing password for user hadoop.
New UNIX password:
Retype new UNIX password:
passwd: all authentication tokens updated successfully.
[root@colinux local]#
[root@colinux local]# id hadoop
uid=503(hadoop) gid=503(hadoop) groups=503(hadoop)
[root@colinux local]#

[root@colinux local]# su – hadoop
[hadoop@colinux ~]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_dsa.
Your public key has been saved in /home/hadoop/.ssh/id_dsa.pub.
The key fingerprint is:
d0:5c:57:22:9b:8e:38:97:e4:47:0f:ac:08:13:4c:ae hadoop@colinux
[hadoop@colinux ~]$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
[hadoop@colinux ~]$ chmod 600 ~/.ssh/authorized_keys
[hadoop@colinux ~]$

[hadoop@colinux ~]$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is a2:b7:25:e3:78:61:15:2a:59:ed:fb:9f:1c:e7:94:db.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
[hadoop@colinux ~]$ exit
[hadoop@colinux ~]$ ssh localhost
Last login: Sat May 12 12:31:20 2012 from localhost.localdomain
[hadoop@colinux ~]$

[hadoop@colinux ~]$ ls -l /usr/java/
total 4
lrwxrwxrwx 1 root root 16 2012-05-12 08:11 default -> /usr/java/latest
drwxr-xr-x 7 root root 4096 2012-05-12 08:11 jdk1.6.0_32
lrwxrwxrwx 1 root root 21 2012-05-12 08:11 latest -> /usr/java/jdk1.6.0_32
[hadoop@colinux ~]$

【Editing the Hadoop configuration files】

[hadoop@colinux ~]$ cd /usr/local/hadoop-1.0.1/conf/

[hadoop@colinux conf]$ vi hadoop-env.sh
# Set Hadoop-specific environment variables here.
—————————————————-
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
export JAVA_HOME=/usr/java/default
# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=
—————————————————-
[hadoop@colinux conf]$ vi core-site.xml
[hadoop@colinux conf]$ cat core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

[hadoop@colinux conf]$
[hadoop@colinux conf]$ vi hdfs-site.xml
[hadoop@colinux conf]$ cat hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

[hadoop@colinux conf]$
[hadoop@colinux conf]$ vi mapred-site.xml
[hadoop@colinux conf]$ cat mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
</configuration>

[hadoop@colinux conf]$

【Initial setup and starting the services】

[hadoop@colinux conf]$ /usr/local/hadoop/bin/hadoop namenode -format
12/05/12 13:05:05 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = colinux/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.0.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1243785; compiled by ‘hortonfo’ on Tue Feb 14 08:15:38 UTC 2012
************************************************************/
12/05/12 13:05:06 INFO util.GSet: VM type = 32-bit
12/05/12 13:05:06 INFO util.GSet: 2% max memory = 19.33375 MB
12/05/12 13:05:06 INFO util.GSet: capacity = 2^22 = 4194304 entries
12/05/12 13:05:06 INFO util.GSet: recommended=4194304, actual=4194304
12/05/12 13:05:08 INFO namenode.FSNamesystem: fsOwner=hadoop
12/05/12 13:05:08 INFO namenode.FSNamesystem: supergroup=supergroup
12/05/12 13:05:08 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/05/12 13:05:08 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/05/12 13:05:08 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/05/12 13:05:08 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/05/12 13:05:09 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/05/12 13:05:10 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
12/05/12 13:05:10 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at colinux/127.0.0.1
************************************************************/
[hadoop@colinux conf]$

[hadoop@colinux conf]$ /usr/local/hadoop/bin/start-all.sh
starting namenode, logging to /usr/local/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-namenode-colinux.out
localhost: starting datanode, logging to /usr/local/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-datanode-colinux.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-secondarynamenode-colinux.out
starting jobtracker, logging to /usr/local/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-jobtracker-colinux.out
localhost: starting tasktracker, logging to /usr/local/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-tasktracker-colinux.out
[hadoop@colinux conf]$

[hadoop@colinux conf]$ jps
4689 Jps
4313 SecondaryNameNode
4062 NameNode
4186 DataNode
4561 TaskTracker
4399 JobTracker
[hadoop@colinux conf]$
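Before running a job, a quick way to confirm that HDFS itself responds is to create and list a directory; a minimal sketch (the directory name is just an example):

[hadoop@colinux conf]$ /usr/local/hadoop/bin/hadoop fs -mkdir /tmp/smoketest
[hadoop@colinux conf]$ /usr/local/hadoop/bin/hadoop fs -ls /tmp
[hadoop@colinux conf]$ /usr/local/hadoop/bin/hadoop fs -rmr /tmp/smoketest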

【Basic verification】
NameNode
$ http://localhost:50070/
 e.g. http://192.168.0.2:50070/dfshealth.jsp

JobTracker
$ http://localhost:50030/
 e.g. http://192.168.0.2:50030/jobtracker.jsp

【Sample test】

[hadoop@colinux hadoop]$ ./bin/hadoop jar hadoop-examples-1.0.1.jar pi 4 1000
Number of Maps = 4
Samples per Map = 1000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Starting Job
12/05/12 13:24:36 INFO mapred.FileInputFormat: Total input paths to process : 4
12/05/12 13:24:37 INFO mapred.JobClient: Running job: job_201205121308_0001
12/05/12 13:24:38 INFO mapred.JobClient: map 0% reduce 0%
12/05/12 13:25:35 INFO mapred.JobClient: map 25% reduce 0%
12/05/12 13:25:50 INFO mapred.JobClient: map 50% reduce 0%
12/05/12 13:26:35 INFO mapred.JobClient: map 75% reduce 0%
12/05/12 13:26:54 INFO mapred.JobClient: map 75% reduce 16%
12/05/12 13:27:09 INFO mapred.JobClient: map 100% reduce 25%
12/05/12 13:27:16 INFO mapred.JobClient: map 100% reduce 33%
12/05/12 13:27:29 INFO mapred.JobClient: map 100% reduce 100%
12/05/12 13:27:43 INFO mapred.JobClient: Job complete: job_201205121308_0001
12/05/12 13:27:45 INFO mapred.JobClient: Counters: 30
12/05/12 13:27:45 INFO mapred.JobClient: Job Counters
12/05/12 13:27:45 INFO mapred.JobClient: Launched reduce tasks=1
12/05/12 13:27:45 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=245935
12/05/12 13:27:45 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/05/12 13:27:45 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/05/12 13:27:45 INFO mapred.JobClient: Launched map tasks=4
12/05/12 13:27:45 INFO mapred.JobClient: Data-local map tasks=4
12/05/12 13:27:45 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=110001
12/05/12 13:27:45 INFO mapred.JobClient: File Input Format Counters
12/05/12 13:27:45 INFO mapred.JobClient: Bytes Read=472
12/05/12 13:27:45 INFO mapred.JobClient: File Output Format Counters
12/05/12 13:27:45 INFO mapred.JobClient: Bytes Written=97
12/05/12 13:27:45 INFO mapred.JobClient: FileSystemCounters
12/05/12 13:27:45 INFO mapred.JobClient: FILE_BYTES_READ=94
12/05/12 13:27:45 INFO mapred.JobClient: HDFS_BYTES_READ=964
12/05/12 13:27:45 INFO mapred.JobClient: FILE_BYTES_WRITTEN=108240
12/05/12 13:27:45 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=215
12/05/12 13:27:45 INFO mapred.JobClient: Map-Reduce Framework
12/05/12 13:27:45 INFO mapred.JobClient: Map output materialized bytes=112
12/05/12 13:27:45 INFO mapred.JobClient: Map input records=4
12/05/12 13:27:45 INFO mapred.JobClient: Reduce shuffle bytes=112
12/05/12 13:27:45 INFO mapred.JobClient: Spilled Records=16
12/05/12 13:27:45 INFO mapred.JobClient: Map output bytes=72
12/05/12 13:27:45 INFO mapred.JobClient: Total committed heap usage (bytes)=816316416
12/05/12 13:27:45 INFO mapred.JobClient: CPU time spent (ms)=44840
12/05/12 13:27:45 INFO mapred.JobClient: Map input bytes=96
12/05/12 13:27:45 INFO mapred.JobClient: SPLIT_RAW_BYTES=492
12/05/12 13:27:45 INFO mapred.JobClient: Combine input records=0
12/05/12 13:27:45 INFO mapred.JobClient: Reduce input records=8
12/05/12 13:27:45 INFO mapred.JobClient: Reduce input groups=8
12/05/12 13:27:45 INFO mapred.JobClient: Combine output records=0
12/05/12 13:27:45 INFO mapred.JobClient: Physical memory (bytes) snapshot=451342336
12/05/12 13:27:45 INFO mapred.JobClient: Reduce output records=0
12/05/12 13:27:45 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1873936384
12/05/12 13:27:45 INFO mapred.JobClient: Map output records=8
Job Finished in 189.673 seconds
Estimated value of Pi is 3.14000000000000000000
[hadoop@colinux hadoop]$

[hadoop@colinux hadoop]$ /usr/local/hadoop/bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
[hadoop@colinux hadoop]$

Admin console during job execution ①

Admin console during job execution ②

Admin console during job execution ③: execution time, etc.

Admin console during job execution ④: other details

Admin console during job execution ⑤

Reference sites
What is the Apache Hadoop project?

Hadoop primer: Hadoop and high availability (09-MAY-2012)

Installing and trying out Hadoop (06-APR-2011)

Monitoring Hadoop with Nagios (21-APR-2011)

Monitoring Hadoop with Ganglia (22-APR-2011)


Notes on Google Maps API Version 3

1) No API key required (up to V2 you had to obtain one)
2) iPhone/Android support
3) Changes to how the JavaScript is written

    Add the following Google Maps API V3 reference to the page header.

Close each of these with a "script" tag.

For personal use
src="http://maps.google.com/maps/api/js?v=3&sensor=false" type="text/javascript"

If you have a contract with Google (Maps API for Business)
src="http://maps.google.com/maps/api/js?v=3&sensor=false&client=gme-your-assigned-ID&channel=your-channel-name" type="text/javascript"

function initialize() {
var latlng = new google.maps.LatLng(35.710269,139.811146);
var opts = {
zoom: 15,
center: latlng,
mapTypeId: google.maps.MapTypeId.ROADMAP
};
var map = new google.maps.Map(document.getElementById("map_v3"), opts);
}
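If you also want a pin at the center, a minimal sketch using google.maps.Marker inside the same initialize() function (the title string is just an example):

var marker = new google.maps.Marker({
position: latlng,
map: map,
title: "Tokyo Skytree area"
});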

sensor=
On devices with a location sensor (GPS), such as smartphones, use sensor=true;
for a map on an ordinary PC, use sensor=false.

    Add the following to the BODY to call the Google Maps API.

<BODY BGCOLOR="#ffffee" onload="initialize()">
………………………….
<div id="map_v3" style="width: 500px; height: 500px">
………………………….


Note: when you specify a client ID or channel with Google Maps API for Business, details such as gas stations and hotels
may not be shown on the map by default, so adjust the following settings as needed.

Map types in the Google Maps JavaScript API V3

Excerpt
-------------------------------------------------------------------------
visibility (on, off, or simplified) specifies whether, and how, an element is displayed on the map.
The simplified visibility state indicates that the way these elements are drawn should be simplified for readability
(for example, simplifying the road structure may mean that fewer roads are displayed).
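A minimal sketch of the kind of style array this excerpt refers to, here turning business POI labels on (the note above mentions such details may be hidden by default); the featureType is just an example and should be changed to whatever elements you actually need:

var styles = [
{ featureType: "poi.business", elementType: "labels", stylers: [ { visibility: "on" } ] }
];
map.setOptions({ styles: styles });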

Reference: latitude/longitude lookup
http://www.geocoding.jp/
http://gmaps-samples-v3.googlecode.com/svn/trunk/styledmaps/wizard/index.html