2013 LBJava tutorial

Overview

This page collects supplemental material for the IllinoisCloudNLP tutorial held on October 23, 2014. Material that is not specific to the lab where the tutorial takes place will be added to the main IllinoisCloudNLP documentation at a later date. For reference, that documentation is here: http://cogcomp.cs.illinois.edu/page/software_view/CloudNLP

Setting up your computer to use IllinoisCloudNLP

First, check that you have Java 1.7 or later. If the first command below reports version 1.6 or lower, use the other commands to list the versions that are available and to load an appropriate one.



[user2030@lab001 ~] java -version
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)

[user2030@lab001 ~] module avail
...
sun-jdk/1.6.0_33-el6-i686
sun-jdk/1.6.0_33-el6-x86_64
sun-jdk/1.7.0-latest-el6-i686
sun-jdk/1.7.0-latest-el6-x86_64
...

[user2030@lab001 ~] module load sun-jdk/1.7.0-latest-el6-x86_64
[user2030@lab001 ~] java -version
java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
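
The version check above can also be scripted. The following is a minimal sketch, assuming the first line of the `java -version` output has the form shown in the transcript (`java version "1.7.0_17"`). The function names `java_version_string` and `java_is_17_or_later` are hypothetical helpers introduced for illustration; they are not part of Java or IllinoisCloudNLP.

```shell
#!/bin/sh
# Hypothetical helper: extract the quoted version string (e.g. 1.7.0_17)
# from the first line of `java -version` output passed as $1.
java_version_string() {
    printf '%s\n' "$1" | sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# Hypothetical helper: succeed (exit status 0) if the reported version
# is 1.7 or later, fail otherwise.
java_is_17_or_later() {
    version=$(java_version_string "$1")
    major=${version%%.*}     # text before the first dot
    rest=${version#*.}       # text after the first dot
    minor=${rest%%.*}        # second component, e.g. 7 in 1.7.0_17
    # Reject anything whose first component is not a plain number.
    case "$major" in ''|*[!0-9]*) return 1 ;; esac
    # Newer numbering schemes (9, 11, ...) have a first component above 1.
    [ "$major" -gt 1 ] && return 0
    [ "$major" -eq 1 ] || return 1
    case "$minor" in ''|*[!0-9]*) return 1 ;; esac
    [ "$minor" -ge 7 ]
}

# Demonstration on the two version strings shown in the transcript above.
java_is_17_or_later 'java version "1.6.0_10"' || echo "1.6.0_10: too old"
java_is_17_or_later 'java version "1.7.0_17"' && echo "1.7.0_17: ok"
```

To check the Java on your own machine, pass the live output instead: `java_is_17_or_later "$(java -version 2>&1)"`. The redirection is needed because `java -version` writes to standard error.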

    

Setting up your Amazon EC2 account

You need to set up your EC2 account by launching a test instance of an "m3.2xlarge" virtual machine. This allows the IllinoisCloudNLP client to start up this type of instance remotely.

  1. In your browser, navigate to https://uiuc-cse.signin.aws.amazon.com/console
  2. Sign in using your id and password (see the CSE tutorial page at http://uiuc-cse.github.io/data-fa14/#nlpcloud)
  3. Click on the "Launch Instance" button
  4. Select the first option on the list shown (Amazon Linux)
  5. Select the "m3.2xlarge" option (7th in the list)
  6. Click on the "Review and Launch" button at the bottom right corner of the page
  7. Click on the "Launch" button at the bottom right of the page
  8. In the top menu on the dialog that appears, select "Proceed without a key pair" and check the box "I acknowledge..."
  9. Click on the "Launch Instance" button at the bottom of the dialog box
  10. In the top box on the page that appears, which should have the title "Your instance is now launching", click on the link after "The following instance launch has been initiated"
  11. You should now see the "Instances" pane. The information about your new instance should be displayed with the status "running" in the Instance State column. Right-click anywhere in the row showing the instance information and select "Terminate" (under the "Actions" subtitle). In the dialog that appears, select "Yes, Terminate".

Download the tutorial data:

http://goo.gl/4a8WD2

View the report after training a custom text classifier

  1. In the AWS interface: select Services, then S3
  2. You will see a list of buckets. Pick the bucket name corresponding to your account name.
  3. You should see a directory named "classifier_coll". Click on it, and you should see a directory with the name you gave your classifier.
  4. Click on the classifier directory. There you can examine the report generated by the machine learning program, which includes the classifier's performance.

Run an application using processed S3 data

First, download the S3 client "s3cmd" from http://s3tools.org/s3cmd, then unpack and configure it:



[user2030@lab001 ~] cd cloudnlp

[user2030@lab001 cloudnlp] mv ~/Downloads/s3cmd-master.zip .

[user2030@lab001 cloudnlp] unzip s3cmd-master.zip
Archive:  s3cmd-master.zip
83b933ca0d96b4ff9374e5beca360e9edf3deb38
   creating: s3cmd-master/
...
[user2030@lab001 cloudnlp] cd s3cmd-master

[user2030@lab001 s3cmd-master] ./s3cmd --configure


    

Follow the prompts given by the s3cmd program: enter your access key and secret key, and specify a password. For the remaining prompts, you should be able to accept the default values suggested by s3cmd.

Next, list and download the annotated records from your S3 bucket. In the commands below, use your own bucket name in place of "csetraining19".



[user2030@lab001 s3cmd-master] ./s3cmd ls s3://csetraining19/record_coll/*
2014-10-23 06:05    104225   s3://csetraining19/record_coll/ff61d811e068450e0c012cdee3cfbaad3e98949c
....

[user2030@lab001 s3cmd-master] ./s3cmd get --recursive s3://csetraining19/record_coll
...
s3://csetraining19/record_coll/ff61d811e068450e0c012cdee3cfbaad3e98949c -> ./record_coll/ff61d811e068450e0c012cdee3cfbaad3e98949c  [396 of 396]
 104225 of 104225   100% in    0s  1735.45 kB/s  done

[user2030@lab001 s3cmd-master]
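
Before downloading, it can be useful to know how many records the bucket holds and how large they are in total. The following is a minimal sketch that summarizes a listing, assuming each object line has the layout shown in the transcript above (date, time, size in bytes, then the `s3://` URL). The function name `summarize_s3_listing` is a hypothetical helper introduced for illustration; it is not part of s3cmd.

```shell
#!/bin/sh
# Hypothetical helper: read `s3cmd ls` output on standard input and print
# the number of objects and their total size. Lines whose third field is
# not a number (for example, directory entries) are skipped.
summarize_s3_listing() {
    awk '$3 ~ /^[0-9]+$/ && $4 ~ /^s3:\/\// { n++; bytes += $3 }
         END { printf "objects: %d, bytes: %.0f\n", n, bytes }'
}

# Demonstration on the single listing line shown in the transcript above.
printf '%s\n' \
    '2014-10-23 06:05    104225   s3://csetraining19/record_coll/ff61d811e068450e0c012cdee3cfbaad3e98949c' \
    | summarize_s3_listing
# prints "objects: 1, bytes: 104225"
```

Against a real bucket, pipe the listing into the helper, again with your own bucket name: `./s3cmd ls s3://csetraining19/record_coll/ | summarize_s3_listing`.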

    

Next, download curator-utils from here: http://cogcomp.cs.illinois.edu/download/software/58. Then run the sample application on the processed data. The following instructions assume you downloaded and unzipped curator-utils into the same cloudnlp/ directory.



[user2030@lab001 s3cmd-master] cd ~/cloudnlp/curator-utils
[user2030@lab001 curator-utils] ./scripts/runNerHistogram.sh ~/cloudnlp/s3cmd-master/result_coll/election-news/election2012/ > nerHistogram.out
[user2030@lab001 curator-utils] cat nerHistogram.out | more

    

Credits

Code licensed under Apache License v2.0, documentation under CC BY 3.0.