Find URLs on web pages
With just three lines of Ruby, the URIs/URLs in the source of a web page can be read out easily. This is especially useful for finding semantic references in text. A mini example using text from the last post; the code is red, the result blue, the source text gray.
------------------
irb(main):001:0> require 'uri'
=> true
irb(main):002:0> text = %{Today, my son showed me <a href="http://www.adidas.com/us/shared/home.asp">this commercial</a> made by adidas. The soccer-kids from St. Margarets are not usual winners like David Beckham and others. But they are not less greatly!}
irb(main):004:0> URI.extract(text)
=> ["http://www.adidas.com/us/shared/home.asp"]
irb(main):005:0>
------------------
Three lines of Ruby code are enough to find all URIs/URLs in the source of a web page. This also makes it easy to search for semantic annotations. In the example, the code is red, the text gray, and the result blue.
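The same call can be wrapped in a small helper when only web links are of interest. This is a minimal sketch, not part of the original example: the method name `extract_web_urls` and the sample markup (including the `mailto:` address) are assumptions made for illustration. It relies on the optional second argument of `URI.extract`, which restricts the matches to the given schemes.

```ruby
require 'uri'

# Hypothetical helper (not from the post): returns the unique http/https
# URLs found in a string of page source, in order of first appearance.
def extract_web_urls(source)
  # The second argument limits matches to the listed URI schemes.
  URI.extract(source, %w[http https]).uniq
end

# Sample markup is an assumption; a real page source would go here.
html = <<~HTML
  <p>Today, my son showed me
  <a href="http://www.adidas.com/us/shared/home.asp">this commercial</a>
  made by adidas. Write to mailto:someone@example.com for details.</p>
HTML

urls = extract_web_urls(html)
puts urls
```

Without the scheme list, `URI.extract` would also return the `mailto:` address, since it matches any URI scheme. Recent Ruby versions mark `URI.extract` as obsolete, so it is probably best kept for quick experiments like this one.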
Labels: Code, microformats, Mikroformate, OWL, RDF, Ruby, Semantic Web, Semantisches Web, Web Search, Webbased

