Getting the Title from HTML Files


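This sample program prints the title of every HTML file passed to it on the command line. It is a rough example of extracting data from HTML files. It is not necessarily robust, however: it matches the first <title> tag it finds with a plain pattern, so it is fooled by the rare case of a commented-out line such as <!-- <title>ack</title> -->.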

Example usage (from a shell):

$ ls *.html
cgi.html          htaccess.html  mod_include.html   urlmapping.html
configuring.html  mod_auth.html  mod_rewrite.html
core.html         mod_cgi.html   rewriteguide.html
$ ./title.lua *.html
cgi.html: Apache Tutorial: Dynamic Content with CGI
configuring.html: Configuration Files
core.html: Apache Core Features
htaccess.html: Apache Tutorial: .htaccess files
mod_auth.html: Apache module mod_auth
mod_cgi.html: Apache module mod_cgi
mod_include.html: Apache module mod_include
mod_rewrite.html: Apache module mod_rewrite
rewriteguide.html: Apache 1.3 URL Rewriting Guide
urlmapping.html: Mapping URLs to Filesystem Locations - Apache HTTP Server

Below is the Lua program title.lua:

#!/usr/bin/env lua

function getTitle(fname)
  local fp = io.open(fname, "r")
  if fp == nil then
    return false
  end

  -- Read up to 8KB (avoid problems when trying to parse /dev/urandom).
  -- read() returns nil for an empty file, so fall back to an empty string.
  local s = fp:read(8192) or ""
  fp:close()

  -- Remove optional spaces from the tags.
  s = string.gsub(s, "[\r\n\t]", " ")
  s = string.gsub(s, " *< *", "<")
  s = string.gsub(s, " *> *", ">")

  -- Put all the tags in lowercase.
  s = string.gsub(s, "(<[^ >]+)", string.lower)

  -- ".-" is the shortest match, so the capture stops at the first </title>.
  local _, _, t = string.find(s, "<title>(.-)</title>")
  return t or ""
end

if arg[1] == nil then
  print("Usage: lua " .. arg[0] .. " <filename> [...]")
  os.exit(1)
end

-- Keep the loop variables local instead of leaking them as globals.
local i = 1
while arg[i] do
  local t = getTitle(arg[i])
  if t then
    print(arg[i] .. ": " .. t)
  else
    print(arg[i] .. ": File opening error.")
  end
  i = i + 1
end
os.exit(0)

-- AlexandreErwinIttner
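
The commented-out title case mentioned at the top of the page can be handled, at least roughly, by deleting HTML comments before searching for the title. The following is a minimal sketch rather than part of the original title.lua: the function name titleFromString is hypothetical, it works on a string instead of a file, and it assumes comments are well formed (an unterminated <!-- is left in place).

```lua
-- Hypothetical helper (not part of title.lua): extract the title from an
-- HTML string, ignoring anything inside <!-- ... --> comments.
function titleFromString(html)
  -- Drop comments first. ".-" is the shortest match, so each comment
  -- ends at its own "-->" instead of swallowing everything in between.
  local s = string.gsub(html, "<!%-%-.-%-%->", "")

  -- Whitespace and tag normalisation similar to title.lua.
  s = string.gsub(s, "%s+", " ")
  s = string.gsub(s, " *< *", "<")
  s = string.gsub(s, " *> *", ">")
  s = string.gsub(s, "(<[^ >]+)", string.lower)

  local _, _, t = string.find(s, "<title>(.-)</title>")
  return t or ""
end

print(titleFromString("<!-- <title>ack</title> -->\n< TITLE > Real Title </Title >"))
--> Real Title
```

This is still pattern matching, not HTML parsing: titles with attributes, scripts containing the text "<title>", and character entities are not handled.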

Alternatively, the [lua-gumbo] library can be used:

#!/usr/bin/env lua

local gumbo = require "gumbo"
local document = assert(gumbo.parseFile(arg[1] or io.stdin))
print(document.title)

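In this case, the HTML5 parser and the Document.title implementation are fully compliant with the specification and should produce exactly the same result as a modern browser.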

lua-gumbo can be installed with: luarocks install gumbo


Last edited June 16, 2018 12:00 am GMT