Importing HTML files to Bookpedia?

Post by **phule92** » Sun Jan 31, 2010 9:06 pm

Is it possible to import a listing of books in an HTML file into Bookpedia?

Post by **Conor** » Wed Feb 03, 2010 10:55 am

Because HTML is so variable, there is no direct import from HTML. The best technique is to transform the HTML via regular expressions into a tab-delimited file, which can then be read by the import function in Bookpedia. However, since regular expressions that pull out several details for each book can get complicated, you can instead concentrate on pulling out a single value and creating a list of ISBNs or titles that can then be copied and pasted into the add multiple window.
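As a rough illustration of the HTML to tab-delimited idea, here is a minimal Python sketch. It assumes a hypothetical export template in which every book is one table row with exactly three cells (title, author, ISBN); the pattern `BOOK_ROW` and the function names are made up for this example and would need adjusting to whatever your HTML actually looks like.

```python
import html
import re

# Hypothetical template: one <tr> per book holding title, author and ISBN cells.
# Real exports will differ, so this pattern is an assumption to adapt.
BOOK_ROW = re.compile(
    r"<tr>\s*<td>(.*?)</td>\s*<td>(.*?)</td>\s*<td>(.*?)</td>\s*</tr>",
    re.IGNORECASE | re.DOTALL,
)


def clean(cell: str) -> str:
    """Decode entities such as &amp; and collapse tabs/newlines to single spaces."""
    return " ".join(html.unescape(cell).split())


def html_to_tab_delimited(page: str) -> str:
    """Return one tab-separated line per book row found in the page."""
    lines = []
    for match in BOOK_ROW.finditer(page):
        lines.append("\t".join(clean(cell) for cell in match.groups()))
    return "\n".join(lines)
```

The returned text can be saved to a file and tried with the import function. Collapsing whitespace inside each cell matters because a stray tab or line break in a title would otherwise shift the columns.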

A program like BBEdit or TextWrangler can run a regular expression search over a number of files, in case not all the books are in a single HTML file. Run a multi-file find using a regular expression such as <title>(.*)</title> to extract everything between the title tags. You can then copy the results list into a new document, resulting in something like this:

Code:

/Users/me/anExport/page1.html:6:  <title>Adaptation</title>
/Users/me/anExport/page2.html:6:    <title>The Lies of Locke Lamora</title>
/Users/me/anExport/page3.html:6:  <title>Little House on the Prairie</title>
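If you do not have BBEdit or TextWrangler at hand, the same multi-file search can be approximated with a short Python script. This is only a sketch: the function names are invented for the example, and it uses the non-greedy `(.*?)` instead of `(.*)` so that a title split across lines is still captured as one value.

```python
import re
from pathlib import Path

# Non-greedy variant of <title>(.*)</title>; DOTALL lets the title span lines.
TITLE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(page: str):
    """Return the text between the first pair of title tags, or None."""
    match = TITLE.search(page)
    if match is None:
        return None
    return " ".join(match.group(1).split())


def titles_in_folder(folder):
    """Collect one title per .html file in the folder, in file-name order."""
    titles = []
    for path in sorted(Path(folder).glob("*.html")):
        title = extract_title(path.read_text(encoding="utf-8", errors="replace"))
        if title:
            titles.append(title)
    return titles


# Usage: print("\n".join(titles_in_folder("/Users/me/anExport")))
```

This goes straight to a clean list of titles, one per line, rather than the file-path-prefixed results list shown above.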

This can be cleaned up to be only the title with the following find and replace:
find: .*<title>(.*)</title>
replace: \1
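
The same find and replace can be expressed with Python's `re.sub`, in case you would rather script the cleanup. A minimal sketch, where `strip_to_titles` is just an illustrative name:

```python
import re


def strip_to_titles(results: str) -> str:
    """Apply find '.*<title>(.*)</title>' / replace '\\1' to every line."""
    # '.' does not match a newline by default, so each line is handled separately.
    return re.sub(r".*<title>(.*)</title>", r"\1", results)
```

Run on the three result lines above, this leaves only the titles, each on its own line.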

This is just an example of how to pull out the title from one particular template. I would recommend pulling out the ISBN if possible, as it will give you exact results. The list can be copied into the add multiple window: even though the field is only one line high, it will take a long list and separate the entries at the newline character.
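
For the ISBN route, a sketch along these lines could build the one-per-line list. It assumes ISBNs appear in the HTML as 10 or 13 characters, optionally separated by hyphens or spaces, which may not hold for every template. The names `ISBN_CANDIDATE`, `is_valid_isbn` and `extract_isbns` are made up for the example. The standard ISBN check digit is used to discard other numbers that merely look like ISBNs.

```python
import re

# Assumed shape: optional 978/979 prefix, nine digits, then a check character.
ISBN_CANDIDATE = re.compile(r"\b(?:97[89][- ]?)?(?:\d[- ]?){9}[\dXx]\b")


def is_valid_isbn(isbn: str) -> bool:
    """Verify the check digit of a bare ISBN-10 or ISBN-13 string."""
    if len(isbn) == 10 and isbn[:9].isdigit() and (isbn[9].isdigit() or isbn[9] == "X"):
        # ISBN-10: weights 10..1, where a final X counts as 10; sum divisible by 11.
        total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(isbn))
        return total % 11 == 0
    if len(isbn) == 13 and isbn.isdigit():
        # ISBN-13: weights alternate 1, 3; sum divisible by 10.
        total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(isbn))
        return total % 10 == 0
    return False


def extract_isbns(page: str):
    """Return the unique valid ISBNs in order of first appearance."""
    found = []
    for match in ISBN_CANDIDATE.finditer(page):
        isbn = re.sub(r"[- ]", "", match.group()).upper()
        if is_valid_isbn(isbn) and isbn not in found:
            found.append(isbn)
    return found


# Usage: print("\n".join(extract_isbns(page_text))) and paste the output.
```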

Post by **phule92** » Wed Feb 03, 2010 1:26 pm

Conor wrote: Because HTML is so variable there is no direct import from HTML. The best technique is to transform the HTML via regular expressions into a tab delimited file. The file can then be read by the import function in Bookpedia. However, since regular expressions can be complicated to pull out a number of details for each book you can concentrate on pulling out a single value and creating a list of ISBNs or titles that can then be copied and pasted into the add multiple window.

[SNIP]

Since importing an HTML file is so complicated, I'll stick with a tab-delimited file instead. And here I thought HTML would be simpler. Oh well. Thanks for clearing things up.

