I heard that wget handles CSS differently depending on its version, so I checked.

http://ja.wikipedia.org/wiki/GNU_Wget
Wget 1.12 (released September 2009): added URL parsing from CSS on the web and handling of internationalized resource identifiers (IRIs).

http://wget.addictivecode.org/FrequentlyAskedQuestions?action=show&redirect=Faq

5.2. Can Wget download links found in CSS?
Thanks to code supplied by Ted Mielczarek, Wget can now parse embedded CSS stylesheet data
and text/css files to find additional links for recursion, as of version 1.12.
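
Since the FAQ ties the feature to version 1.12, a script can check the version banner before relying on CSS links being followed. The following is only a minimal sketch: `wget_parses_css` is a name made up for this note, and it assumes the banner has the form "GNU Wget <version> ..." as in the transcripts below.

```shell
# Hypothetical helper: succeed if the given `wget --version` banner is 1.12 or newer.
# $1 is the first line of the banner, e.g. "GNU Wget 1.13 built on linux-gnu."
wget_parses_css() {
  printf '%s\n' "$1" | awk '{
    split($3, v, ".")                       # "1.10.2" -> v[1]=1, v[2]=10
    ok = (v[1] + 0 > 1) || (v[1] + 0 == 1 && v[2] + 0 >= 12)
    exit !ok                                # exit status 0 means "supported"
  }'
}

for banner in "GNU Wget 1.10.2 (Red Hat modified)" "GNU Wget 1.13 built on linux-gnu."; do
  if wget_parses_css "$banner"; then
    echo "$banner: CSS links are followed"
  else
    echo "$banner: CSS links are ignored"
  fi
done
```

On a real machine the argument would presumably be "$(wget --version | head -n 1)".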

■ Comparison
——————————————————————————–

[root@colinux tools]# wget --version
GNU Wget 1.10.2 (Red Hat modified)
[root@colinux 1.10.2]# wget -p -H http://kakaku.com/

FINISHED --10:09:00--
Downloaded: 519,822 bytes in 76 files
[root@colinux 1.10.2]#

[root@colinux wget-1.13]# wget --version
GNU Wget 1.13 built on linux-gnu.
[root@colinux 1.13]# wget -p -H http://kakaku.com/

FINISHED --2011-12-10 10:15:15--
Total wall clock time: 4.6s
Downloaded: 121 files, 601K in 0.3s (2.35 MB/s)

[root@colinux test]# wget -p -H -e robots=off http://kakaku.com/

FINISHED --2011-12-10 11:09:12--
Total wall clock time: 6.1s
Downloaded: 136 files, 727K in 0.3s (2.54 MB/s)
[root@colinux test]#

——————————————————————————–


wget 1.13 downloads every image referenced in the CSS.
Is it also downloading images that are not needed to render the HTML?

[root@colinux wget_compare]# ls -l 1.10.2/ 1.13/
1.10.2/:
total 24
drwxr-xr-x 3 root root 4096 2011-12-10 10:23 image.akiba.kakaku.com
drwxr-xr-x 5 root root 4096 2011-12-10 10:23 img.kakaku.com
drwxr-xr-x 3 root root 4096 2011-12-10 10:23 img2.kakaku.k-img.com
drwxr-xr-x 4 root root 4096 2011-12-10 10:23 kakaku.com
drwxr-xr-x 2 root root 4096 2011-12-10 10:23 notice.kakaku.com
drwxr-xr-x 2 root root 4096 2011-12-10 10:23 www.googleadservices.com

1.13/:
total 24
drwxr-xr-x 3 root root 4096 2011-12-10 10:28 image.akiba.kakaku.com
drwxr-xr-x 5 root root 4096 2011-12-10 10:28 img.kakaku.com
drwxr-xr-x 3 root root 4096 2011-12-10 10:28 img2.kakaku.k-img.com
drwxr-xr-x 4 root root 4096 2011-12-10 10:28 kakaku.com
drwxr-xr-x 2 root root 4096 2011-12-10 10:28 notice.kakaku.com
drwxr-xr-x 2 root root 4096 2011-12-10 10:28 www.googleadservices.com
[root@colinux wget_compare]#

[root@colinux wget_compare]# cat wget_1.10.log | egrep -i http:// | awk '{print $2}' > wget_1.10_http.log
[root@colinux wget_compare]# cat wget_1.13.log | egrep -i http:// | awk '{print $3}' > wget_1.13_http.log
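
The field number differs ($2 vs $3) because 1.10.2 logs each request as "--HH:MM:SS--  URL" while 1.13 logs "--YYYY-MM-DD HH:MM:SS--  URL", so the date adds one field. A possible way to avoid hard-coding the field is sketched below. `wget_log_urls` is a hypothetical helper, and it only looks at lines starting with "--", which is an assumption about which log lines matter.

```shell
# Hypothetical helper: print the URL from each request line ("--<timestamp>--  <url>")
# of a wget log on stdin, whichever field it happens to be in.
wget_log_urls() {
  awk '/^--/ { for (i = 2; i <= NF; i++) if ($i ~ /^https?:\/\//) { print $i; break } }'
}

# One sample line in each log format.
printf '%s\n' \
  '--10:09:00--  http://kakaku.com/' \
  '--2011-12-10 10:15:15--  http://kakaku.com/' | wget_log_urls
```

With this, the same command could be run against both wget_1.10.log and wget_1.13.log.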

[root@colinux wget_compare]# diff wget_1.10_http.log wget_1.13_http.log
77a78,122
> http://img.kakaku.com/images/home/home_header_bg.gif
> http://img.kakaku.com/images/icon_login.gif
> http://img.kakaku.com/images/icon_guide.gif
> http://img.kakaku.com/images/icon_register.gif
> http://img.kakaku.com/images/icon_mypage.gif
> http://img.kakaku.com/images/icon_history.gif
> http://img.kakaku.com/images/h1_btm.gif
> http://img.kakaku.com/images/h1bg.gif
> http://img.kakaku.com/images/itemview/item/bm_tweetn-ja.png
> http://img.kakaku.com/images/itemview/item/icon_guide.gif
> http://img.kakaku.com/images/dot_999999.gif
> http://img.kakaku.com/images/itemview/item/arrow_pagetop.gif
> http://img.kakaku.com/images/article/pickup/template/link_bk.jpg
> http://img.kakaku.com/images/home/arrow_next01.gif
> http://img.kakaku.com/images/itemview/item/tab_bar_default.gif
> http://img.kakaku.com/images/balloonhelp/balloon_tp.png
> http://img.kakaku.com/images/balloonhelp/balloon_tp2.png
> http://img.kakaku.com/images/balloonhelp/balloon_tp3.png
> http://img.kakaku.com/images/balloonhelp/balloon_md.png
> http://img.kakaku.com/images/balloonhelp/balloon_bt.png
> http://img.kakaku.com/images/balloonhelp/balloon_bt2.png
> http://img.kakaku.com/images/balloonhelp/balloon_bt3.png
> http://img.kakaku.com/images/category/btn_search_sub.gif
> http://img.kakaku.com/images/itemlist/btn_search.gif
> http://img.kakaku.com/images/home/icon_all.png
> http://img.kakaku.com/images/home/box_bg_in.png
> http://img.kakaku.com/images/home/box_bg.png
> http://img.kakaku.com/images/home/h2_top_all.png
> http://img.kakaku.com/images/home/dotline01.gif
> http://img.kakaku.com/images/home/bg_search.png
> http://img.kakaku.com/images/home/bg_category.png
> http://img.kakaku.com/images/home/home_icon_category.png
> http://img.kakaku.com/images/home/home_icon_group.png
> http://img.kakaku.com/images/home/icon_all.gif
> http://img.kakaku.com/images/home/icon_slider.png
> http://img.kakaku.com/images/home/h2_sub_all.png
> http://img.kakaku.com/images/home/icon_reviewall.gif
> http://img.kakaku.com/images/home/icon_mag.png
> http://img.kakaku.com/images/home/icon_akiba_all.png
> http://img.kakaku.com/images/home/trendnews_category.png
> http://img.kakaku.com/images/home/icon_tv.png
> http://img.kakaku.com/images/home/menu_boxall_h2.png
> http://img.kakaku.com/images/home/dotline02.gif
> http://img.kakaku.com/images/home/attention_arrow.gif
> http://img.kakaku.com/images/home/menu_group_h2.gif
[root@colinux wget_compare]#
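
If only the newly fetched URLs are of interest, `comm` on sorted lists is an alternative to reading diff output. This is a rough sketch with a made-up helper name (`only_in_second`) and tiny stand-in files instead of the real logs.

```shell
# Hypothetical helper: print URLs that appear in the second list but not the first.
only_in_second() {
  a=$(mktemp) && b=$(mktemp) || return 1
  sort -u "$1" > "$a"
  sort -u "$2" > "$b"
  comm -13 "$a" "$b"      # -13 hides lines unique to $1 and lines common to both
  rm -f "$a" "$b"
}

# Tiny stand-ins for wget_1.10_http.log and wget_1.13_http.log.
dir=$(mktemp -d)
printf '%s\n' http://kakaku.com/ > "$dir/old.log"
printf '%s\n' http://kakaku.com/ http://img.kakaku.com/images/h1bg.gif > "$dir/new.log"
only_in_second "$dir/old.log" "$dir/new.log"
rm -rf "$dir"
```

Unlike diff, this ignores ordering and duplicates, so it only answers "which URLs are new".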

[root@colinux css]# cat global_new.css | grep home_header_bg.gif
background: url(http://img.kakaku.com/images/home/home_header_bg.gif) repeat-x left bottom;
[root@colinux css]#

[root@colinux css]# cat home_common.css | grep attention_arrow.gif
background:url(http://img.kakaku.com/images/home/attention_arrow.gif) no-repeat left top;
[root@colinux css]#
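
Instead of grepping for one image at a time, the url(...) references in a stylesheet can be listed in one pass. The sketch below is a rough pattern match, not wget's actual CSS parser, and `css_urls` is a hypothetical name. It assumes each reference fits on one line and contains no ")" inside the URL.

```shell
# Hypothetical helper: list the targets of url(...) references in CSS read from stdin.
# This is a rough pattern match, not wget's actual CSS parser.
css_urls() {
  grep -oE 'url\([^)]*\)' |
    sed -E -e 's/^url\(//' -e 's/\)$//' -e "s/^[\"' ]+//" -e "s/[\"' ]+\$//"
}

printf '%s\n' \
  'h1 { background: url(http://img.kakaku.com/images/h1bg.gif) repeat-x; }' \
  'a { background: url("http://img.kakaku.com/images/icon_login.gif") no-repeat; }' \
  'p { color: red; }' | css_urls
```

Running something like `css_urls < global_new.css` should give a list that can be compared with the extra URLs in the diff above.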

This confirms that wget's behavior changes depending on the version.

■ Installation
——————————————————————————–

[root@colinux wget-1.13]# wget http://ftp.gnu.org/gnu/wget/wget-1.13.tar.gz
[root@colinux wget-1.13]# tar zxvf wget-1.13.tar.gz
[root@colinux wget-1.13]# ./configure --with-ssl=openssl
[root@colinux wget-1.13]# make
[root@colinux wget-1.13]# make install
[root@colinux wget-1.13]# whereis wget
wget: /usr/local/bin/wget
[root@colinux wget-1.13]# ln -s /usr/local/bin/wget /usr/bin/wget
[root@colinux wget-1.13]# wget --version
GNU Wget 1.13 built on linux-gnu.

■ Uninstalling the old wget
———————————————————————————

[root@colinux wget-1.13]# yum list installed | grep wget
wget.i386 1.10.2-15.fc7 installed
[root@colinux wget-1.13]# yum remove wget.i386
Setting up Remove Process
fedora 100% |=========================| 2.1 kB 00:00
updates 100% |=========================| 2.3 kB 00:00
Resolving Dependencies
--> Running transaction check
---> Package wget.i386 0:1.10.2-15.fc7 set to be erased
--> Finished Dependency Resolution

Transaction Test Succeeded
Running Transaction
Erasing : wget ######################### [1/1]

Removed: wget.i386 0:1.10.2-15.fc7
Complete!
[root@colinux wget-1.13]# yum list installed | grep wget
[root@colinux wget-1.13]#

■ Other packages that had to be installed beforehand
———————————————————————————

[root@colinux wget-1.13]# yum search openssl | grep dev
openssl-devel.i386 : Files for development of applications which will use OpenSSL
openssl-devel.i386 : Files for development of applications which will use OpenSSL
xmlsec1-openssl-devel.i386 : OpenSSL crypto plugin for XML Security Library
tcltls-devel.i386 : Header files for the OpenSSL extension for Tcl
[root@colinux wget-1.13]# yum install openssl-devel.i386 xmlsec1-openssl-devel.i386 tcltls-devel.i386
Setting up Install Process
Parsing package install arguments
Resolving Dependencies
--> Running transaction check
---> Package tcltls-devel.i386 0:1.5.0-11.fc6 set to be updated
--> Processing Dependency: tcltls = 1.5.0-11.fc6 for package: tcltls-devel
---> Package xmlsec1-openssl-devel.i386 0:1.2.9-8.1 set to be updated
--> Processing Dependency: libxslt-devel >= 1.1.0 for package: xmlsec1-openssl-devel
--> Processing Dependency: libxml2-devel >= 2.6.0 for package: xmlsec1-openssl-devel
--> Processing Dependency: xmlsec1 = 1.2.9 for package: xmlsec1-openssl-devel
--> Processing Dependency: xmlsec1-devel = 1.2.9 for package: xmlsec1-openssl-devel
--> Processing Dependency: libxmlsec1-openssl.so.1 for package: xmlsec1-openssl-devel
--> Processing Dependency: xmlsec1-openssl = 1.2.9 for package: xmlsec1-openssl-devel
---> Package openssl-devel.i386 0:0.9.8b-15.fc7 set to be updated
--> Processing Dependency: zlib-devel for package: openssl-devel
--> Processing Dependency: krb5-devel for package: openssl-devel
--> Running transaction check
---> Package libxslt-devel.i386 0:1.1.24-1.fc7 set to be updated
--> Processing Dependency: libgcrypt-devel for package: libxslt-devel
---> Package libxml2-devel.i386 0:2.6.31-1.fc7 set to be updated
---> Package xmlsec1.i386 0:1.2.9-8.1 set to be updated
---> Package tcltls.i386 0:1.5.0-11.fc6 set to be updated
---> Package krb5-devel.i386 0:1.6.1-9.fc7 set to be updated
--> Processing Dependency: e2fsprogs-devel for package: krb5-devel
---> Package xmlsec1-devel.i386 0:1.2.9-8.1 set to be updated
---> Package zlib-devel.i386 0:1.2.3-10.fc7 set to be updated
---> Package xmlsec1-openssl.i386 0:1.2.9-8.1 set to be updated
--> Running transaction check
---> Package e2fsprogs-devel.i386 0:1.40.2-3.fc7 set to be updated
---> Package libgcrypt-devel.i386 0:1.2.4-1 set to be updated
--> Processing Dependency: libgpg-error-devel for package: libgcrypt-devel
--> Running transaction check
---> Package libgpg-error-devel.i386 0:1.4-2 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

=============================================================================
Package Arch Version Repository Size
=============================================================================
Installing:
tcltls-devel i386 1.5.0-11.fc6 fedora 4.3 k
xmlsec1-openssl-devel i386 1.2.9-8.1 fedora 87 k
Installing for dependencies:
e2fsprogs-devel i386 1.40.2-3.fc7 updates 627 k
krb5-devel i386 1.6.1-9.fc7 updates 1.1 M
libgcrypt-devel i386 1.2.4-1 fedora 274 k
libgpg-error-devel i386 1.4-2 fedora 17 k
libxml2-devel i386 2.6.31-1.fc7 updates 2.1 M
libxslt-devel i386 1.1.24-1.fc7 updates 324 k
openssl-devel i386 0.9.8b-15.fc7 updates 1.8 M
tcltls i386 1.5.0-11.fc6 fedora 28 k
xmlsec1 i386 1.2.9-8.1 fedora 176 k
xmlsec1-devel i386 1.2.9-8.1 fedora 667 k
xmlsec1-openssl i386 1.2.9-8.1 fedora 71 k
zlib-devel i386 1.2.3-10.fc7 fedora 81 k

Transaction Summary
=============================================================================
Install 14 Package(s)
Update 0 Package(s)
Remove 0 Package(s)

Total download size: 7.4 M
Is this ok [y/N]:

Other useful URLs
Compare cURL Features with Other Download Tools
http://curl.haxx.se/docs/comparison-table.html


Notes on the curl and wget commands.

[root@colinux tmp]# wget http://variable.jp
Accesses the URL and saves the HTML to a file.

[root@colinux tmp]# curl http://variable.jp
Accesses the URL and prints the HTML to the terminal.
——————————————————————————————–

[root@colinux wget]# wget http://variable.jp/wp-content/themes/cloudy/VARIABLE.JP.jpg
--15:39:37--  http://variable.jp/wp-content/themes/cloudy/VARIABLE.JP.jpg
           => `VARIABLE.JP.jpg.1'
Resolving variable.jp... 59.106.12.216
Connecting to variable.jp|59.106.12.216|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2,873 (2.8K) [image/jpeg]

100%[====================================>] 2,873         --.--K/s

15:39:37 (561.13 KB/s) - `VARIABLE.JP.jpg.1' saved [2873/2873]

[root@colinux wget]#

With curl, use the -O option to save to a file.
[root@colinux curl]# curl -O http://variable.jp/wp-content/themes/cloudy/VARIABLE.JP.jpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2873  100  2873    0     0  12491      0 --:--:-- --:--:-- --:--:-- 10550
[root@colinux curl]# ls
VARIABLE.JP.jpg
[root@colinux curl]#

or

[root@colinux curl]# curl --remote-name http://variable.jp/wp-content/themes/cloudy/VARIABLE.JP.jpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2873  100  2873    0     0  10640      0 --:--:-- --:--:-- --:--:--  1787
[root@colinux curl]# ls
VARIABLE.JP.jpg
[root@colinux curl]#

[root@colinux curl]# curl -O "http://variable.jp/wp-content/uploads/2009/01/s{1,2}.jpg"

[1/2]: http://variable.jp/wp-content/uploads/2009/01/s1.jpg --> s1.jpg
--_curl_--http://variable.jp/wp-content/uploads/2009/01/s1.jpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   278    0   278    0     0   1263      0 --:--:-- --:--:-- --:--:--     0

[2/2]: http://variable.jp/wp-content/uploads/2009/01/s2.jpg --> s2.jpg
--_curl_--http://variable.jp/wp-content/uploads/2009/01/s2.jpg
100   278    0   278    0     0  27800      0 --:--:-- --:--:-- --:--:-- 27800
[root@colinux curl]#
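
In the command above the URL is quoted so that the shell leaves the braces alone and curl expands {1,2} itself. For comparison, here is a small sketch that spells out the same list of URLs in plain shell. `numbered_urls` is a hypothetical helper, and its argument order (prefix, suffix, first, last) is just a choice made for this note.

```shell
# Hypothetical helper: print <prefix><N><suffix> for N from first to last.
# Arguments: prefix, suffix, first number, last number.
numbered_urls() {
  n=$3
  while [ "$n" -le "$4" ]; do
    printf '%s%s%s\n' "$1" "$n" "$2"
    n=$((n + 1))
  done
}

numbered_urls http://variable.jp/wp-content/uploads/2009/01/s .jpg 1 2
```

Piping the list into `xargs -n 1 curl -O` would presumably fetch the same files one by one, though curl's own expansion is shorter to type.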

——————————————————————————————–

[root@colinux curl]# curl -o logo1.jpg http://variable.jp/wp-content/themes/cloudy/VARIABLE.JP.jpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2873  100  2873    0     0  14364      0 --:--:-- --:--:-- --:--:-- 11253
[root@colinux curl]# ls -l
total 4
-rw-r--r-- 1 root root 2873 2009-01-29 15:54 logo1.jpg
[root@colinux curl]# curl -o logo2.jpg http://variable.jp/wp-content/themes/cloudy/VARIABLE.JP.jpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2873  100  2873    0     0  13058      0 --:--:-- --:--:-- --:--:--  9929
[root@colinux curl]# ls -l
total 8
-rw-r--r-- 1 root root 2873 2009-01-29 15:54 logo1.jpg
-rw-r--r-- 1 root root 2873 2009-01-29 15:54 logo2.jpg
[root@colinux curl]# curl -o logo3.jpg http://variable.jp/wp-content/themes/cloudy/VARIABLE.JP.jpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2873  100  2873    0     0  14364      0 --:--:-- --:--:-- --:--:-- 10550
[root@colinux curl]#

——————————————————————————————–

[root@colinux curl]# ls -l
total 12
-rw-r--r-- 1 root root 2873 2009-01-29 15:54 logo1.jpg
-rw-r--r-- 1 root root 2873 2009-01-29 15:54 logo2.jpg
-rw-r--r-- 1 root root 2873 2009-01-29 15:54 logo3.jpg
[root@colinux curl]# curl -R -o logo4.jpg http://variable.jp/wp-content/themes/cloudy/VARIABLE.JP.jpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2873  100  2873    0     0  13680      0 --:--:-- --:--:-- --:--:-- 11253
[root@colinux curl]# ls -l
total 16
-rw-r--r-- 1 root root 2873 2009-01-29 15:54 logo1.jpg
-rw-r--r-- 1 root root 2873 2009-01-29 15:54 logo2.jpg
-rw-r--r-- 1 root root 2873 2009-01-29 15:54 logo3.jpg
-rw-r--r-- 1 root root 2873 2008-12-31 09:55 logo4.jpg
[root@colinux curl]#

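Note: -R keeps the remote file's timestamp on the local copy (logo4.jpg above is dated 2008-12-31 instead of the download time).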
——————————————————————————————–

[root@colinux curl]# curl http://www.google.com/

302 Moved

302 Moved


The document has moved
here.

[root@colinux curl]# curl -L http://www.google.com/

Note: -L makes curl follow redirects.

——————————————————————————————–

Set the referer: -e
[root@colinux curl]# curl -e http://variable.jp/ http://reconfirm.jp/

Change the user agent: -A
[root@colinux curl]# curl -A curlAgent http://variable.jp/

Change or add an HTTP header: -H
Send form data: -d
Upload a file through a form (POST): -F
Upload a file: -T
Limit the maximum total transfer time (seconds): -m
[root@colinux curl]# curl -m 60 http://variable.jp/
Limit the maximum time allowed for connecting (seconds): --connect-timeout
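
Several of these options can be combined in one call. The wrapper below is only a sketch for this note: `fetch_with_limits` and the `DRY_RUN` switch are made up here, and the 10-second connect limit is an arbitrary assumption (the 60 seconds, the agent string and the referer come from the examples above).

```shell
# Hypothetical wrapper combining options from the list above.
# With DRY_RUN set to a non-empty value, the command is printed instead of run.
fetch_with_limits() {
  ${DRY_RUN:+echo} curl --connect-timeout 10 -m 60 \
    -A curlAgent -e http://variable.jp/ "$1"
}

# Print the command rather than touching the network.
DRY_RUN=1
fetch_with_limits http://reconfirm.jp/
```

Leaving DRY_RUN unset (or empty) runs the real request.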

[root@colinux curl]# curl -# -O http://variable.jp/wp-content/themes/cloudy/VARIABLE.JP.jpg
######################################################################## 100.0%
[root@colinux curl]#

[root@colinux curl]# curl -I http://variable.jp/
HTTP/1.1 200 OK
Date: Thu, 29 Jan 2009 21:25:53 GMT
Server: Apache
X-Powered-By: PHP/4.4.9
X-Pingback: http://variable.jp/xmlrpc.php
Content-Type: text/html; charset=UTF-8
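
When the headers are wanted in a script rather than on screen, the status code can be picked out of the first line. A minimal sketch, assuming the input is a single response's headers as printed by curl -I; `http_status` is a hypothetical helper name.

```shell
# Hypothetical helper: print the status code from HTTP response headers on stdin.
http_status() {
  awk 'NR == 1 { print $2; exit }'   # first line looks like "HTTP/1.1 200 OK"
}

# Sample headers in place of a live request.
printf 'HTTP/1.1 200 OK\r\nServer: Apache\r\n\r\n' | http_status
```

Against a live server this would be something like `curl -sI http://variable.jp/ | http_status`. With -L added there can be several responses, and this sketch only reports the first.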

Note: with the -v option you can see the request that was sent and the server's response.
[root@colinux curl]# curl -vIL http://variable.jp/
* Trying 59.106.12.216... connected
* Connected to variable.jp (59.106.12.216) port 80 (#0)
> HEAD / HTTP/1.1
> User-Agent: curl/7.16.4 (i386-redhat-linux-gnu) libcurl/7.16.4 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.8
> Host: variable.jp
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Date: Thu, 29 Jan 2009 21:26:53 GMT
Date: Thu, 29 Jan 2009 21:26:53 GMT
< Server: Apache
Server: Apache
< X-Powered-By: PHP/4.4.9
X-Powered-By: PHP/4.4.9
< X-Pingback: http://variable.jp/xmlrpc.php
X-Pingback: http://variable.jp/xmlrpc.php
< Content-Type: text/html; charset=UTF-8
Content-Type: text/html; charset=UTF-8
* no chunk, no close, no size. Assume close to signal end
<
* Closing connection #0

There are many other useful options as well. Look them up with --help.
