BlackBerry Forums Support Community               

03-17-2009, 11:10 AM   #1
Thumbs Must Hurt
 
Join Date: Nov 2007
Model: Bold
PIN: N/A
Carrier: Rogers
Posts: 109
How to put all of Project Gutenberg's e-book collection on your BlackBerry

One of the things I love about my Bold is that it makes an excellent e-book reader, so I've been able to spend a lot more time reading for pleasure. An incredible resource for free and legal e-books is Project Gutenberg, which aims to digitize and make freely available works of literature that are in the public domain. While the website makes it easy to find and download e-books, I don't like reading the plain-text format the books come in (I much prefer PDFs set in a clean sans-serif font, read with PDF To Go's excellent word wrap and zoom), and I regularly find myself at 30,000 feet with nothing new to read.

Fortunately, Gutenberg makes its collection available on DVD, including a 4GB DVD containing its first 17,000 books. Unfortunately, all the books are zipped text files, which makes them unsuitable (for me, at least) to put on my Bold's memory card.

Here, then, are the steps I took to extract all the Project Gutenberg text files, convert them to PDF and copy them to my Bold's 16GB memory card. Please note this thread is just for reference, and a certain degree of technical know-how is required to follow along.

1. Download the July 2006 DVD using your favourite BitTorrent client.

2. Extract the ISO's contents to a directory on your hard drive. I use a Mac, so I'll extract to /temp; use that as your working directory from now on.

3. Next, unzip all the ZIP files containing the e-books in TXT format. This command unzips each file's contents into the same directory as the ZIP file itself. I prefer that over unzipping everything into one directory, since it can be hard to find what you're looking for among 17,000+ files (plus I didn't know how the Bold would handle it):

find . -name '*.zip' -execdir unzip -o {} \;
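
If you'd rather not rely on find and unzip (on Windows, say), the same step can be sketched in Java with the standard java.util.zip package. This is a minimal sketch under stated assumptions, not part of the procedure above: the class and method names (UnzipInPlace, unzipAll, unzip) are hypothetical, and it assumes each ZIP should be extracted next to itself, overwriting existing files the way unzip -o does.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: a pure-Java stand-in for
//   find . -name '*.zip' -execdir unzip -o {} \;
// Each ZIP is extracted into the directory that contains it.
class UnzipInPlace {

    static void unzipAll(Path root) throws IOException {
        // Collect the ZIPs first so the walk never sees files we are extracting.
        List<Path> zips;
        try (Stream<Path> walk = Files.walk(root)) {
            zips = walk.filter(Files::isRegularFile)
                       .filter(p -> p.getFileName().toString().endsWith(".zip"))
                       .collect(Collectors.toList());
        }
        for (Path zip : zips) {
            unzip(zip, zip.toAbsolutePath().getParent());
        }
    }

    static void unzip(Path zip, Path targetDir) throws IOException {
        Path base = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(zip))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                Path out = base.resolve(entry.getName()).normalize();
                // Refuse entries that would land outside targetDir (e.g. "../x").
                if (!out.startsWith(base)) {
                    throw new IOException("Suspicious entry " + entry.getName() + " in " + zip);
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Overwrite existing files, like unzip -o; the copy stops at the end of this entry.
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

Calling UnzipInPlace.unzipAll(Paths.get("/temp")) would process every ZIP under /temp. Collecting the list before extracting anything keeps the directory walk from tripping over newly written files.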

4. Now build a list of all the text files you've just unzipped:

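find /temp/ -name '*.txt' ! -name txts.txt > /temp/txts.txt

(The ! -name txts.txt keeps the list from including itself: the shell creates txts.txt before find starts, so find would otherwise pick it up.)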
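
The list can also be built in Java, which makes skipping the list file itself explicit. A minimal sketch; TextFileLister and writeList are hypothetical names, and it writes one absolute path per line, the same one-path-per-line shape that GutenbergConverter reads. Sorting is an arbitrary choice that just makes the order repeatable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: a pure-Java stand-in for
//   find /temp/ -name '*.txt' ! -name txts.txt > /temp/txts.txt
class TextFileLister {

    // Collect every .txt file under root (one absolute path per line), skipping
    // listFile itself, then write the list to listFile and return it.
    static List<String> writeList(Path root, Path listFile) throws IOException {
        Path self = listFile.toAbsolutePath().normalize();
        List<String> txts;
        try (Stream<Path> walk = Files.walk(root)) {
            txts = walk.filter(Files::isRegularFile)
                       .map(p -> p.toAbsolutePath().normalize())
                       .filter(p -> p.getFileName().toString().endsWith(".txt"))
                       .filter(p -> !p.equals(self))
                       .map(Path::toString)
                       .sorted()   // Files.walk order is unspecified; sort for repeatable output
                       .collect(Collectors.toList());
        }
        Files.write(listFile, txts);   // UTF-8, one entry per line
        return txts;
    }
}
```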

5. At this point you're free to use whatever PDF creation software you like (I tried using AppleScript to drive Acrobat, but it choked on the huge number of files). I've used iText for Java in the past, so that's what I'll use here.

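Download the iText JAR, add it to your CLASSPATH, then write a Java class that reads txts.txt and makes a PDF document from each file listed in it. The code below uses the com.lowagie package names of iText 2.x and earlier; later versions moved to com.itextpdf: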

Code:
import com.lowagie.text.*;
import com.lowagie.text.pdf.BaseFont;
import com.lowagie.text.pdf.PdfWriter;

import java.io.*;

public class GutenbergConverter {

    // list of text files to convert, one path per line
    public static final String TXTS = "/temp/txts.txt";

    public static void main(String[] args) {
        try {
            System.out.println("run find /temp/ -name '*.txt' ! -name txts.txt > /temp/txts.txt then press enter");
            new BufferedReader(new InputStreamReader(System.in)).readLine();
            System.out.println("Processing");

            // PDF To Go doesn't recognize the standard Helvetica font, so use Arial instead.
            // You must install Arial on your machine for this to work; the path is hardcoded to its location on a Mac.
            // The font is loaded once here rather than once per file.
            BaseFont bfArial = BaseFont.createFont("/Library/Fonts/Arial.ttf", BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
            Font arial = new Font(bfArial, 12);

            // make a PDF from each text file listed in txts.txt
            try (BufferedReader list = new BufferedReader(new FileReader(TXTS))) {
                String txt;
                while ((txt = list.readLine()) != null) {
                    makePdf(txt, arial);
                }
            }
        } catch (Exception e) {
            System.out.println("Error: " + e);
            e.printStackTrace();
        }
    }

    private static void makePdf(String txt, Font font) {
        // try-with-resources closes the text file even if conversion fails;
        // the original never closed its FileReader, so a handle could leak for every file
        try (BufferedReader br = new BufferedReader(new FileReader(txt))) {
            // create the PDF file name by swapping the extension for .pdf
            String pdf = txt.substring(0, txt.lastIndexOf('.')) + ".pdf";
            System.out.println(pdf);
            // create a new PDF document; see the iText documentation for page size and margin options
            Document document = new Document(PageSize.A4, 50, 50, 50, 50);
            PdfWriter.getInstance(document, new FileOutputStream(pdf));
            document.open();
            String line;
            while ((line = br.readLine()) != null) {
                if (line.isEmpty()) {
                    // keep blank lines from the text file as empty paragraphs
                    document.add(new Paragraph(new Chunk(Chunk.NEWLINE)));
                } else {
                    document.add(new Paragraph(line, font));
                }
            }
            // closing the document also closes the PDF output stream
            document.close();
        } catch (Exception e) {
            // report which file failed, then carry on with the next one
            System.out.println("Error converting " + txt + ": " + e.getMessage());
            e.printStackTrace();
        }
    }
}
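
One thing to be aware of: Gutenberg's plain-text files are hard-wrapped, and the class above turns every line into its own Paragraph, so those line breaks carry over into the PDF. If you'd prefer the text to flow in whole paragraphs, one option is to join consecutive non-blank lines and treat blank lines as paragraph breaks. This is a minimal sketch rather than part of the converter; Reflow and toParagraphs are hypothetical names, and the rule that blank lines separate paragraphs is an assumption that won't suit poetry, tables of contents and the like.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: joins hard-wrapped lines into paragraphs.
// Assumption: one or more blank lines separate paragraphs.
class Reflow {

    static List<String> toParagraphs(Reader text) throws IOException {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        BufferedReader br = new BufferedReader(text);
        String line;
        while ((line = br.readLine()) != null) {
            // leading/trailing spaces are dropped, so any indentation is lost
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                // a blank line ends the paragraph being built, if there is one
                if (current.length() > 0) {
                    paragraphs.add(current.toString());
                    current.setLength(0);
                }
            } else {
                // otherwise append the line, separated from the previous one by a space
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(trimmed);
            }
        }
        // flush the last paragraph if the file doesn't end with a blank line
        if (current.length() > 0) {
            paragraphs.add(current.toString());
        }
        return paragraphs;
    }
}
```

Inside makePdf you could pass the BufferedReader to toParagraphs and add each returned string to the document as a Paragraph in the Arial font, instead of adding one Paragraph per line.
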
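6. Run the Java class to generate the PDF files, then clean up the ZIP and TXT files you don't need anymore. Build a list of the ZIP files the same way you built txts.txt, then feed it to rm (run xargs rm < /temp/txts.txt as well to get rid of the text files):

find /temp/ -name '*.zip' > /temp/zips.txt
xargs rm < /temp/zips.txt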

NOTE: there are a handful of other file types on the DVD that you may or may not want to keep, such as GIF/JPG/PNG/TIF, MP3/MID etc. If you don't want them, you can delete every file in the tree that isn't a PDF with find; no regex is needed, since find can negate a -name test:

find /temp/ -type f ! -name '*.pdf' -delete

Be careful: this removes everything except the PDFs, including the list files.
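
If you'd rather see exactly what would be removed before deleting anything, a Java version with a dry-run switch is straightforward. A minimal sketch; KeepOnlyPdfs and deleteNonPdfs are hypothetical names, and it treats any regular file whose name doesn't end in .pdf (case-insensitively) as disposable, which includes the txts.txt list.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: deletes every regular file under root whose name does not
// end in ".pdf" (ZIPs, TXTs, images, audio...). Directories are left in place.
// This is destructive, so try it with dryRun = true first.
class KeepOnlyPdfs {

    // Returns the files that were (or, in a dry run, would be) deleted.
    static List<Path> deleteNonPdfs(Path root, boolean dryRun) throws IOException {
        List<Path> doomed;
        try (Stream<Path> walk = Files.walk(root)) {
            doomed = walk.filter(Files::isRegularFile)
                         .filter(p -> !p.getFileName().toString()
                                        .toLowerCase(Locale.ROOT).endsWith(".pdf"))
                         .collect(Collectors.toList());
        }
        if (!dryRun) {
            for (Path p : doomed) {
                Files.delete(p);
            }
        }
        return doomed;
    }
}
```

Run it once with dryRun set to true and print the returned list; if it looks right, run it again with false. Directories stay in place even if they end up empty.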

At this point you've converted all the e-books on Project Gutenberg's July 2006 DVD to PDFs in a comfortable sans-serif font for reading on a BlackBerry with PDF To Go or other software. The complete collection is just over 7GB, which means it will fit on an 8GB card, though without much room to spare. If you're having trouble figuring out where to start, I highly recommend Harold Bloom's The Western Canon as a fantastic guide to the history and important works of Western literature. Happy reading!
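
To check whether your own converted tree will fit before copying it over, you can total up the PDFs. Keep in mind that card capacities are quoted in decimal gigabytes (10^9 bytes), while many tools report sizes in units of 2^30 bytes, so an "8GB" card holds about 7.45 of the latter, before any filesystem overhead. A minimal sketch; CollectionSize and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper: totals the size of the converted PDFs and compares it
// with a card's advertised capacity.
class CollectionSize {

    static final long BYTES_PER_GB = 1_000_000_000L;  // how card capacities are quoted
    static final long BYTES_PER_GIB = 1L << 30;       // how many tools report sizes

    // Sum of the sizes of every regular file under root whose name ends in ".pdf".
    static long pdfBytes(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .filter(p -> p.getFileName().toString().endsWith(".pdf"))
                       .mapToLong(p -> {
                           try {
                               return Files.size(p);
                           } catch (IOException e) {
                               throw new UncheckedIOException(e);
                           }
                       })
                       .sum();
        }
    }

    // True if the bytes fit within the advertised capacity. Filesystem overhead
    // is ignored, so treat a narrow "true" with suspicion.
    static boolean fitsOnCard(long bytes, int advertisedGB) {
        return bytes <= advertisedGB * BYTES_PER_GB;
    }

    static double toGiB(long bytes) {
        return bytes / (double) BYTES_PER_GIB;
    }
}
```

For example, CollectionSize.fitsOnCard(CollectionSize.pdfBytes(Paths.get("/temp")), 8) gives a first-order answer for an 8GB card.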

Copyright 2004-2014 BlackBerryForums.com.
The names RIM and BlackBerry are registered Trademarks of BlackBerry Inc.