Automate downloading books and PDFs on springerlink.com

Linux

Researching

Author

Vinh Nguyen

Published

November 2, 2010

Having an electronic version of a book is great. I can skim and search through it very easily. Although I do find hardcopies useful at times, I prefer softcopies 99% of the time due to their accessibility and searchability.

Most universities have deals with publishers where students can access the electronic version of a book at the publisher's website. This saves me the trip of running to the library when I need a book and solves the "book checked out" issue.

Springer books can be found online. You have to be on your school's network (or connected to it through VPN) to access them. The crappy thing is that they put up books chapter by chapter, so you have to save each chapter manually if you need the entire book. I've used wget before to download the PDF files easily. However, wget doesn't seem to work anymore because the files are no longer plain HTML links. A quick Google search ("springerlink download whole book") turned up the springer_download Python script. It depends on stapler, which in turn depends on pyPdf. To install and use:

git clone git://github.com/milianw/springer_download
git clone http://github.com/hellerbarde/stapler.git
git clone http://github.com/mfenniak/pyPdf
cd pyPdf ## the clone directory is pyPdf; Linux paths are case-sensitive
sudo python setup.py install
cd ../stapler/
cp ./stapler.sh ~/Documents/bin/ ## or copy it to /usr/local/bin
cd ../springer_download
cp springer_download.py ~/Documents/bin ## or copy it to /usr/local/bin
## to download
springer_download.py -l http://springerlink.com/content/HASH/STUFF
## output: a concatenated, full pdf file of the book
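
Both scripts have to be reachable through PATH for the download command to work, so a quick check after copying them can save some head-scratching. This is only a minimal sketch: missing_tools is a hypothetical helper name of my own, and the two script names are simply the ones copied in the steps above.

```shell
#!/bin/sh
# Print the name of every command passed as an argument that is NOT on PATH.
# (missing_tools is a hypothetical helper, not part of any of the tools above.)
missing_tools() {
    for tool in "$@"; do
        # `command -v` is the POSIX way to ask whether a command resolves.
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the two scripts copied in the install steps.
missing=$(missing_tools stapler.sh springer_download.py)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All tools found on PATH."
fi
```

If something is reported as missing, the directory you copied the scripts into (~/Documents/bin in my case) probably isn't on your PATH, or the scripts aren't executable.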

Very neat!
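
If you need more than one book, the same command can be wrapped in a loop. The following is a rough sketch under my own assumptions: the list file (say, books.txt, one SpringerLink URL per line) and the download_books function are hypothetical, and the only thing taken from the script itself is the `-l URL` invocation shown above.

```shell
#!/bin/sh
# Command used to fetch one book; override it (e.g. DOWNLOADER=echo) for a dry run.
DOWNLOADER=${DOWNLOADER:-springer_download.py}

# download_books FILE: run the downloader once for every URL listed in FILE.
# (Hypothetical helper; FILE holds one URL per line.)
download_books() {
    list_file=$1
    # The `|| [ -n "$url" ]` part handles a last line without a trailing newline.
    while IFS= read -r url || [ -n "$url" ]; do
        # Skip blank lines and lines starting with '#'.
        case $url in
            ''|'#'*) continue ;;
        esac
        # Keep going with the next book if one download fails.
        "$DOWNLOADER" -l "$url" || echo "failed: $url" >&2
    done < "$list_file"
}

# Usage (assuming books.txt exists):
#   download_books books.txt
```

Each book should still end up as its own concatenated PDF, since the loop just repeats the single-book command.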