This is the estimated character encoding in which the HTML is written. It is detected automatically, in the following order:
- if the four characters "meta" appear in the HTML, the charset of the HTML meta element
- if "meta" does not appear, the charset of the Content-Type response header
- if there is no Content-Type response header either, the result of running NKF.guess on the HTML
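The lookup order above can be sketched in plain Ruby. This is only an illustration, not Mechanize's actual code: `detect_charset` and its `guesser` argument are hypothetical names, the regular expressions are simplified, and the lambda stands in for a real guesser such as NKF.guess.

```ruby
# Hypothetical helper that mirrors the lookup order described above.
# It is an illustration, not Mechanize's actual implementation.
def detect_charset(html, headers, guesser)
  bin = html.b # work on raw bytes so non-UTF-8 input cannot break the regexps

  # 1. The HTML contains the four characters "meta": use the meta charset.
  #    (Assumption: fall through when the meta element names no charset.)
  if bin =~ /meta/i
    charset = bin[/<meta[^>]*charset\s*=\s*["']?([\w-]+)/i, 1]
    return charset if charset
  end

  # 2. Otherwise use the charset of the Content-Type response header.
  charset = headers['content-type'].to_s[/charset\s*=\s*["']?([\w-]+)/i, 1]
  return charset if charset

  # 3. Otherwise guess from the bytes.
  guesser.call(html)
end

guess = lambda { |_html| 'UTF-8' } # stand-in for a guesser such as NKF.guess
meta_html = '<meta http-equiv="content-type" content="text/html; charset=Shift_JIS">'

p detect_charset(meta_html, { 'content-type' => 'text/html' }, guess)
p detect_charset('<p>hi</p>', { 'content-type' => 'text/html; charset=EUC-JP' }, guess)
p detect_charset('<p>hi</p>', {}, guess)
```

Note that the first step trusts whatever name the page declares, which is why a page labelled Shift_JIS but actually written in CP932 ends up with the wrong value.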
It is used to convert the HTML before parsing (page encoding to UTF-8) and to convert multibyte text data when a form is submitted (UTF-8 to page encoding). The conversion is done by Iconv, so if this value disagrees with the actual character encoding as Iconv understands it, things go wrong. (Ideally Iconv would raise an exception on the spot and force the user to deal with it, but the exception is suppressed so that the problem can be handled later in one place.)
require 'rubygems'; require 'mechanize'
require 'kconv'
html = <<HTML.tosjis
<html>
<meta http-equiv="content-type" content="text/html; charset=Shift_JIS">
<title>①②③</title>
</html>
HTML
agent = Mechanize.new
page = Mechanize::Page.new(
URI.parse('http://example.com/'),
{'content-type' => 'text/html'},
html,
'200',
agent)
agent.__send__(:add_to_history, page)
p agent.page.title
Result:
nil
Iconv's Shift_JIS is the original, unextended Shift_JIS, which does not include the circled digits, so parsing fails and the title element cannot be retrieved.
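The same distinction can be reproduced without Iconv. The sketch below uses the transcoder built into Ruby 1.9 and later, not Iconv itself, so it only illustrates the difference between the two character sets. The `decode` helper is a hypothetical name, and the bytes are the Shift_JIS-style encoding of the circled digits from the example page.

```ruby
# Raw bytes of the circled digits 1 to 3 as they appear in the page.
bytes = "\x87\x40\x87\x41\x87\x42".b

# Hypothetical helper: interpret the bytes as `encoding_name` and convert
# to UTF-8, returning nil when the conversion is not possible.
def decode(bytes, encoding_name)
  bytes.dup.force_encoding(encoding_name).encode('UTF-8')
rescue EncodingError => e
  warn "#{encoding_name}: #{e.class}"
  nil
end

strict = decode(bytes, 'Shift_JIS') # plain Shift_JIS has no circled digits
cp932  = decode(bytes, 'CP932')     # the Windows variant does

p strict
p cp932
```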
Errors raised during parsing are stored as an array in agent.page.parser.errors (only when the standard Nokogiri parser is used). The array also contains things such as HTML syntax errors, but Iconv's character encoding errors are recorded there as well.
puts agent.page.parser.errors # print any parse errors
Result:
input conversion failed due to input error, bytes 0x87 0x40 0x87 0x41
input conversion failed due to input error, bytes 0x87 0x40 0x87 0x41
htmlCheckEncoding: encoder error
input conversion failed due to input error, bytes 0x87 0x40 0x87 0x41
encoder error
When errors about encoding or conversion appear, as above, use
#encoding= to specify a character encoding name that matches the actual HTML and that Iconv can interpret. The page is then parsed again as HTML in the specified character encoding.
require 'rubygems'; require 'mechanize'
require 'kconv'
html = <<HTML.tosjis
<html>
<meta http-equiv="content-type" content="text/html; charset=Shift_JIS">
<title>①②③</title>
</html>
HTML
agent = Mechanize.new
page = Mechanize::Page.new(
URI.parse('http://example.com/'),
{'content-type' => 'text/html'},
html,
'200',
agent)
agent.__send__(:add_to_history, page)
agent.page.encoding = 'CP932'
p agent.page.title
Result:
"\342\221\240\342\221\241\342\221\242"
The "Shift_JIS" commonly used on Windows is closest to CP932 in Iconv (Windows-31J requires an iconv with the Japanese patch applied). Note that "sjis" in Ruby's kconv and NKF corresponds to CP932, not Shift_JIS.
The ordinary EUC-JP seen these days, which takes interoperability with Windows into account, corresponds to CP51932, eucJP-ms, or EUC-JP-MS in Iconv, but these require an unofficial iconv with the Japanese patch applied.
If you cannot set up an iconv that handles these properly, another approach is to convert the HTML itself with a library designed with Japanese conversion in mind, such as kconv in Ruby 1.8 or Encoding in Ruby 1.9.
- in Mechanize#post_connect_hooks, convert param[:body] with toutf8 and replace the meta charset with UTF-8
- pass the toutf8-converted body to #body= and specify UTF-8 with #encoding= so that the page is parsed again
Either of these should work. However, when a form is submitted, Mechanize tries to convert the data to the return value of
#encoding (UTF-8 in this case) before sending it, which differs from the usual Web browser behavior of "unless told otherwise, send the data in the character encoding of the HTML that contains the form". If this is a problem, immediately before submitting the form, specify the original encoding name (one that Iconv can read) with the encoding= method of
Mechanize::Form#page.
agent.get(uri)
# convert the whole body with kconv
agent.page.body = agent.page.body.toutf8
agent.page.encoding = 'UTF-8'
agent.page.form_with(:name => 'f1'){|form|
...
form.page.encoding = 'CP932' # the original character encoding name
form.click_button
}
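The body conversion used in the first option listed earlier (convert to UTF-8 and rewrite the meta charset) might look roughly like the following. It is a sketch under assumptions: `to_utf8_html` is a hypothetical helper, it uses Ruby 1.9's String#encode rather than toutf8, it assumes the page is really CP932, and the regular expression only handles simple meta elements. How it would be wired into a hook is left out.

```ruby
# Hypothetical helper, not part of Mechanize: interpret the raw body as
# `from`, convert it to UTF-8, and point the meta charset at UTF-8.
def to_utf8_html(body, from = 'CP932')
  utf8 = body.dup.force_encoding(from)
             .encode('UTF-8', invalid: :replace, undef: :replace)
  # Keep everything up to "charset=" and swap only the encoding name.
  utf8.sub(/(<meta[^>]*charset\s*=\s*["']?)[\w-]+/i) { "#{$1}UTF-8" }
end

# Sample page: declared as Shift_JIS, title holds CP932 circled digits.
raw = [
  '<html>',
  '<meta http-equiv="content-type" content="text/html; charset=Shift_JIS">',
  "<title>\x87\x40\x87\x41\x87\x42</title>",
  '</html>'
].join("\n").b

converted = to_utf8_html(raw)
puts converted
```

Rewriting the meta charset matters because the parser would otherwise trust the stale Shift_JIS declaration and try to convert the already-converted text again.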
Mechanize's internal iconv raises an exception when the text contains a character outside the range of the target character encoding (exceptions inside Nokogiri are suppressed after they occur). With kconv and NKF, characters that cannot be converted are discarded and simply disappear (Ruby 1.9's Encoding lets you choose how they are handled). Check carefully that the data you need is still intact.
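As a rough illustration of choosing the handling with Ruby 1.9's Encoding, String#encode can raise, substitute, or drop a character that the target encoding cannot represent. The sample text is an assumption made for this sketch, and dropping the character only imitates the kconv/NKF behavior described above.

```ruby
text = "a\u2460b" # "a", circled digit one, "b"

# Default: a character the target encoding lacks raises an exception.
strict =
  begin
    text.encode('Shift_JIS')
    :converted
  rescue Encoding::UndefinedConversionError
    :raised
  end

# Substitute a visible marker for the unconvertible character.
replaced = text.encode('Shift_JIS', undef: :replace, replace: '?')

# Drop the unconvertible character silently.
dropped = text.encode('Shift_JIS', undef: :replace, replace: '')

# CP932 contains the circled digits, so nothing is lost.
cp932 = text.encode('CP932')

p strict
p replaced
p dropped
p cp932.bytes
```

Silent dropping is convenient but hides data loss, so comparing the result against the input is worthwhile when the text matters.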