Mining the Social Web, 2nd Edition

Appendix A: Virtual Machine Experience

This IPython Notebook provides a brief overview of how to install and configure the book's virtual machine so that you can follow along with the numbered examples from Mining the Social Web (2nd Edition). You are very strongly encouraged to use the virtual machine as your development environment instead of your existing Python installation. Installing IPython Notebook and its dependencies alongside the various third-party Python packages used throughout the book involves some non-trivial configuration management, and supporting readers across multiple platforms only compounds that complexity. In short, the virtual machine is intended to give every reader of this book, and every consumer of its source code, the best possible experience. Even if you are an expert with Python developer tools, you will likely save some time by using the virtual machine on your first pass through the book, so give it a try.

The remainder of this notebook provides a brief overview of how to install the virtual machine along with a few important notes to keep in mind each step of the way.

In the unlikely event that you've stumbled across this notebook outside of its context on GitHub, you can find the full source code repository here.

Screencast Overview

The following screencast is less than 3 minutes long and walks through the step-by-step instructions below for installing the Mining the Social Web virtual machine.

Quick Start Instructions

In order to start the Vagrant-based virtual machine for Mining the Social Web, there are just a few easy steps to follow:

  1. Download and install the latest copy of VirtualBox for your operating system
    • Although a working knowledge of VirtualBox could be helpful, simply completing an installation is sufficient. However, Windows and Linux users may need to adjust their computer's BIOS settings to enable virtualization.
    • The version of VirtualBox used as of this writing is 4.2.x. Type "VirtualBox --help" in a terminal to get version information if you already have it installed.
  2. Download and install Vagrant for your operating system
    • It is highly recommended that you take a moment to read Vagrant's excellent "Getting Started" guide to familiarize yourself with the tool.
    • If you already have Vagrant installed, be sure that it's running version 1.2 or higher by typing "vagrant -v" in a terminal.
    • The creator of Vagrant has written a book about it entitled Vagrant: Up and Running.
  3. Check out this book's source code from its GitHub repository to your machine using Git, or use the download links at the top of the main GitHub page.
    • Windows users should install Git for Windows, as it comes bundled with an SSH client that might come in handy later if you'd like to easily log in to your virtual machine guest with Vagrant. (See the screenshots and notes below. It is critical that you choose the option that includes an SSH client.)
    • Although you could opt to download the latest version of the source code from GitHub as a zip file, basic familiarization with Git is likely to serve you well in your programming endeavors and is encouraged.
    • In a terminal, navigate to the top level directory of the source code checkout that contains Vagrantfile.
    • On a Windows system, look for a Command Prompt program that's likely somewhere under your "Accessories" menu.
  4. Run the following command from within the top level directory that contains your Vagrantfile: vagrant up
    • The first time you run this command, Vagrant will prompt you to download a base image for your virtual machine called precise64, which is an Ubuntu 12.04 Linux image. Depending on your connection speed, it may take anywhere between 10 and 30 minutes to download the base image and install the necessary updates and third-party packages.
    • If you are running a 32-bit system, you'll need to change "precise64" to "precise32" in your Vagrantfile.
    • You should disable any settings that may allow your system to go into sleep or hibernation mode while your virtual machine initially bootstraps.
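Steps 2 and 4 involve two small checks that are easy to script. The Python sketch below parses the output of a version command such as "vagrant -v", compares it against the 1.2 minimum mentioned above, and performs the "precise64" to "precise32" substitution for 32-bit hosts. It assumes that vagrant is on your PATH and that the box name appears literally as "precise64" in your Vagrantfile; all of the function names are hypothetical helpers, not part of the book's repository.

```python
import re
import subprocess
from pathlib import Path

# Minimum Vagrant version called for in step 2 above.
MIN_VAGRANT = (1, 2)


def parse_version(text):
    """Return the first dotted version number in `text` as a tuple of ints,
    e.g. "Vagrant 1.2.7" -> (1, 2, 7)."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(text, minimum=MIN_VAGRANT):
    """True if the version found in `text` is at least `minimum`."""
    found = parse_version(text)
    width = max(len(found), len(minimum))

    def pad(version):
        # Pad with zeros so that (1, 2) and (1, 2, 0) compare as equal.
        return version + (0,) * (width - len(version))

    return pad(found) >= pad(minimum)


def installed_vagrant_output():
    """Run "vagrant -v" (assumes vagrant is on PATH) and return its output."""
    result = subprocess.run(["vagrant", "-v"], capture_output=True,
                            text=True, check=True)
    return result.stdout


def use_32bit_box(vagrantfile_text):
    """Swap the 64-bit base box name for the 32-bit one."""
    return vagrantfile_text.replace("precise64", "precise32")


def switch_vagrantfile_to_32bit(path="Vagrantfile"):
    """Rewrite a Vagrantfile in place for a 32-bit host system."""
    vagrantfile = Path(path)
    vagrantfile.write_text(use_32bit_box(vagrantfile.read_text()))
```

For example, meets_minimum(installed_vagrant_output()) should be True on any Vagrant 1.2 or later. The parsing looks only for the first dotted number, so it tolerates small differences in how a tool words its version banner.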

What Happens Next

[2013-07-27T01:45:27+00:00] INFO: runit_service[ipython] enabled
[2013-07-27T01:45:27+00:00] INFO: Chef Run complete in 1553.918395 seconds
[2013-07-27T01:45:27+00:00] DEBUG: Cleaning the checksum cache
[2013-07-27T01:45:27+00:00] INFO: Running report handlers
[2013-07-27T01:45:27+00:00] INFO: Report handlers complete
[2013-07-27T01:45:27+00:00] DEBUG: Exiting
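The most informative entry in this log tail is "Chef Run complete in 1553.918395 seconds", which reports how long provisioning took: 1553.918395 / 60 ≈ 25.9 minutes, within the 10 to 30 minute range estimated in step 4. The Python sketch below pulls that duration out of captured "vagrant up" output. Treating the presence of this line as the signal that provisioning finished is an assumption based only on the excerpt shown here, and the helper name is hypothetical.

```python
import re

# The log tail shown above, one entry per line.
LOG_TAIL = """\
[2013-07-27T01:45:27+00:00] INFO: runit_service[ipython] enabled
[2013-07-27T01:45:27+00:00] INFO: Chef Run complete in 1553.918395 seconds
[2013-07-27T01:45:27+00:00] DEBUG: Cleaning the checksum cache
[2013-07-27T01:45:27+00:00] INFO: Running report handlers
[2013-07-27T01:45:27+00:00] INFO: Report handlers complete
[2013-07-27T01:45:27+00:00] DEBUG: Exiting
"""


def chef_run_seconds(log_text):
    """Return the duration from the "Chef Run complete" line, or None
    if the log contains no such line."""
    match = re.search(r"Chef Run complete in ([\d.]+) seconds", log_text)
    return float(match.group(1)) if match else None


seconds = chef_run_seconds(LOG_TAIL)
print(f"{seconds:.0f} s = {seconds / 60:.1f} min")  # prints "1554 s = 25.9 min"
```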

Vagrant Cheat Sheet

You are strongly encouraged to peruse Vagrant's online documentation to get a basic understanding of how it works. Once you have a general working knowledge, the following commands are likely to be the ones you'll use most. Whenever you run them, do so from the top-level source code directory where your Vagrantfile is located; the Vagrantfile provides the configuration that these commands operate on.
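Because every Vagrant command depends on being run alongside the Vagrantfile, a script that drives Vagrant needs to find that directory first. The following Python sketch walks upward from a starting directory until it finds a file named Vagrantfile, then runs a subcommand such as "vagrant status" there. The helper names are hypothetical, and the sketch assumes vagrant is on your PATH.

```python
import subprocess
from pathlib import Path


def find_vagrantfile_dir(start="."):
    """Walk upward from `start` and return the first directory that
    contains a file named "Vagrantfile", or None if there is none."""
    current = Path(start).resolve()
    for directory in (current, *current.parents):
        if (directory / "Vagrantfile").is_file():
            return directory
    return None


def run_vagrant(*args, start="."):
    """Run a vagrant subcommand from the directory holding the Vagrantfile."""
    root = find_vagrantfile_dir(start)
    if root is None:
        raise FileNotFoundError("no Vagrantfile in this directory or its parents")
    # cwd=root makes vagrant pick up the book's Vagrantfile.
    return subprocess.run(["vagrant", *args], cwd=root, check=True)
```

With this in place, run_vagrant("status") behaves the same whether you start from the top-level directory or from one of its subdirectories.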

Essential Commands

Commands for Advanced Users

Troubleshooting

Consult (or contribute to) the Virtual Machine Installation Page on the wiki or open a ticket if you experience any problems.

Git and GitHub

If you've never used a version control system such as Git to obtain or manage source code, rest assured that learning Git fundamentals is well worth the investment. The first two chapters of http://git-scm.com/ are particularly worth the 15 or so minutes that they take to complete, and Stack Overflow contains a plethora of answers to common Git questions and best-practice guidelines.

The absolute minimum Git skills that you'll want to know for consuming the source code of this book include:

As you become more comfortable with Git, you may want to fork a Git repository, commit changes to it, and push your changes to the master branch on GitHub. Consult http://git-scm.com/ for more information on how to do these things when you are ready to take that additional leap.

You can certainly download a zip archive of a GitHub repository's source code (look for the "Download ZIP" button in the right margin), but doing so would be a bit ironic. This book is all about the social web, and you'd be bypassing the premier social coding platform that hosts its project code. GitHub is inherently social, and some benefits of participating are available only by plugging in, becoming part of the community, and applying some Git fundamentals to contribute from time to time. Forking code, opening pull requests, and otherwise contributing through GitHub's tooling is much easier than you might initially think because GitHub delivers such a tremendous user experience. Take a few extra minutes to check out the source code from GitHub instead of downloading a zip archive; you'll be glad you did.
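Once you have a checkout, keeping it current mostly means cloning the repository once and pulling updates afterward. The Python sketch below chooses between the two standard Git commands, "git clone" and "git pull", depending on whether a checkout already exists. Here repo_url is a placeholder for the repository address, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def sync_command(repo_url, target_dir):
    """Return the git command that brings `target_dir` up to date:
    "git clone" the first time, "git pull" once a checkout exists."""
    target = Path(target_dir)
    if (target / ".git").is_dir():
        # -C runs git as if it had been started inside `target`.
        return ["git", "-C", str(target), "pull"]
    return ["git", "clone", repo_url, str(target)]


def sync(repo_url, target_dir):
    """Run the command chosen by sync_command (assumes git is on PATH)."""
    subprocess.run(sync_command(repo_url, target_dir), check=True)
```

Separating command selection from execution keeps the decision logic easy to inspect before anything touches the network.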

"Git for Windows" Installation Screenshots

The following screenshots may be helpful as references for Windows users who are installing Git for Windows.


Windows users should opt to install the developer tools while installing Git for Windows in order to get SSH, which allows Vagrant's "vagrant ssh" command to work seamlessly.


Logging into your virtual machine (should you need or want to do so for advanced troubleshooting) is as easy as running "vagrant ssh", as long as you have an SSH client on your PATH.
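Whether "vagrant ssh" can work therefore depends on whether an ssh executable is reachable through your PATH. This Python sketch uses the standard library's shutil.which to check. The helper name is hypothetical, and finding an ssh binary is a good sign rather than a guarantee that "vagrant ssh" will succeed.

```python
import shutil


def find_on_path(program="ssh"):
    """Return the location of `program` as found via PATH, or None."""
    return shutil.which(program)


location = find_on_path()
if location is None:
    print("No SSH client found on PATH; on Windows, revisit the Git for Windows "
          "installation and include its SSH tools.")
else:
    print(f"Found an SSH client at {location}")
```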


Once you have run "vagrant up" and your virtual machine is running, you can treat it as just another piece of software on your computer. For example, you'll use your web browser as usual to access IPython Notebook, which is where you'll spend all of your time. The nice thing about the virtual machine experience is that you keep using your host operating system as usual, while all of the messy configuration management details are encapsulated in a well-known and highly controlled environment.
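From the host's point of view, IPython Notebook is simply a web server that the virtual machine makes available to your browser. The Python sketch below checks whether anything is accepting connections on a local port before opening the notebook in your default browser. Port 8888 is IPython Notebook's usual default, but whether the book's Vagrantfile forwards it to that same port on your host is an assumption here; check the Vagrantfile's port forwarding settings and adjust the port argument if needed. The helper names are hypothetical.

```python
import socket
import webbrowser


def port_is_open(port=8888, host="127.0.0.1", timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def open_notebook(port=8888):
    """Open the notebook in the default browser if the port is reachable."""
    if port_is_open(port):
        webbrowser.open(f"http://localhost:{port}/")
        return True
    print(f"Nothing is listening on port {port}; is the VM running (vagrant up)?")
    return False
```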

Thank You!

Please file tickets here on GitHub if you experience any trouble whatsoever, and thanks again for your interest in Mining the Social Web (2nd Edition). The goal of providing a completely turnkey virtual machine is to let you get the most out of the book and its source code, not to divert your attention to unnecessary system configuration issues. Feedback on ways to improve this experience is always welcome, and pull requests are especially appreciated.

You are free to use or adapt this notebook for any purpose you'd like. However, please respect the Simplified BSD License that governs its use.