Overview
This page collects some supplemental material for the IllinoisCloudNLP tutorial on October 23, 2014. The relevant material (that is not specific to the lab where the tutorial takes place) will be added to the main IllinoisCloudNLP documentation at a later date. For reference, that documentation is here: http://cogcomp.cs.illinois.edu/page/software_view/CloudNLP
First, check that you have Java 1.7 or later. If "java -version" reports 1.6 or lower, use "module avail" to list the versions that are available and "module load" to load an appropriate one.
[user2030@lab001 ~] java -version
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)
[user2030@lab001 ~] module avail
...
sun-jdk/1.6.0_33-el6-i686
sun-jdk/1.6.0_33-el6-x86_64
sun-jdk/1.7.0-latest-el6-i686
sun-jdk/1.7.0-latest-el6-x86_64
...
[user2030@lab001 ~] module load sun-jdk/1.7.0-latest-el6-x86_64
[user2030@lab001 ~] java -version
java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
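The version check above can also be scripted. The following is a minimal sketch, assuming bash is available; the function name java_is_17_or_later is hypothetical and not part of IllinoisCloudNLP. It accepts version strings in the "1.x.y_z" form shown above and treats any major version greater than 1 as new enough.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of IllinoisCloudNLP): succeeds (exit 0)
# if the given version string denotes Java 1.7 or later, e.g. "1.7.0_17".
java_is_17_or_later() {
    local version="$1"
    local major minor
    major="${version%%.*}"          # text before the first dot
    minor="${version#*.}"           # text after the first dot
    minor="${minor%%[!0-9]*}"       # keep only the leading digits
    minor="${minor:-0}"
    if [ "$major" -gt 1 ]; then
        return 0                    # newer numbering scheme (9, 11, ...)
    fi
    [ "$major" -eq 1 ] && [ "$minor" -ge 7 ]
}

# Usage: "java -version" prints to stderr; take the quoted version
# from its first line and test it.
if command -v java >/dev/null 2>&1; then
    current="$(java -version 2>&1 | head -n 1 | sed 's/.*"\(.*\)".*/\1/')"
    if java_is_17_or_later "$current"; then
        echo "Java $current is new enough"
    else
        echo "Java $current is too old; try 'module avail' and 'module load'"
    fi
fi
```

If the check fails, load a newer JDK with "module load" as in the transcript above and run the check again.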
You need to set up your EC2 account by launching a test instance of an "m3.2xlarge" virtual machine. This allows the IllinoisCloudNLP client to start up this type of instance remotely.
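Next, download the S3 client "s3cmd" here: http://s3tools.org/s3cmd. The commands below assume the archive was saved to ~/Downloads and that a cloudnlp/ directory exists in your home directory.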
[user2030@lab001 ~] cd cloudnlp
[user2030@lab001 cloudnlp] mv ~/Downloads/s3cmd-master.zip .
[user2030@lab001 cloudnlp] unzip s3cmd-master.zip
Archive: s3cmd-master.zip
83b933ca0d96b4ff9374e5beca360e9edf3deb38
creating: s3cmd-master/
...
[user2030@lab001 cloudnlp] cd s3cmd-master
[user2030@lab001 s3cmd-master] ./s3cmd --configure
Follow the prompts given by the s3cmd program: enter your access key and secret key, and specify a password. For the remaining prompts, you should be able to accept the default values suggested by s3cmd.
Next, you can list and download the annotated records from your S3 bucket. In the commands listed below, use your own bucket name rather than "csetraining19".
[user2030@lab001 s3cmd-master] ./s3cmd ls s3://csetraining19/record_coll/*
2014-10-23 06:05 104225 s3://csetraining19/record_coll/ff61d811e068450e0c012cdee3cfbaad3e98949c
....
[user2030@lab001 s3cmd-master] ./s3cmd get --recursive s3://csetraining19/record_coll
...
s3://csetraining19/record_coll/ff61d811e068450e0c012cdee3cfbaad3e98949c -> ./record_coll/ff61d811e068450e0c012cdee3cfbaad3e98949c [396 of 396]
104225 of 104225 100% in 0s 1735.45 kB/s done
[user2030@lab001 s3cmd-master]
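After the download completes, it can be worth confirming that the local copy holds as many files as the transfer reported ("396 of 396" in the transcript above). The following is a minimal sketch, assuming the records were fetched into ./record_coll as shown; the function name count_records is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of regular files under a directory.
count_records() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Usage: compare the result with the "[N of N]" figure printed by s3cmd.
if [ -d ./record_coll ]; then
    echo "downloaded records: $(count_records ./record_coll)"
fi
```

A lower count than s3cmd reported suggests an interrupted transfer, in which case the "get --recursive" command can be run again.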
Next, download curator-utils from here: http://cogcomp.cs.illinois.edu/download/software/58. Then run the sample application on the processed data. The following instructions assume you downloaded and unzipped curator-utils into the same cloudnlp/ directory.
[user2030@lab001 s3cmd-master] cd ~/cloudnlp/curator-utils
[user2030@lab001 curator-utils] ./scripts/runNerHistogram.sh ~/cloudnlp/s3cmd-master/result_coll/election-news/election2012/ > nerHistogram.out
[user2030@lab001 curator-utils] cat nerHistogram.out | more
Code licensed under Apache License v2.0, documentation under CC BY 3.0.