Modifier and Type | Class and Description |
---|---|
static interface | BatchTrainer.DoneWithRound: Provides access to a hook into train(int) so that additional processing can be performed at the end of each round. |
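As a rough illustration of how an end-of-round hook of this kind could work, the following self-contained sketch models a training loop that invokes a callback after each pass over the data. The names `RoundHookSketch`, `RoundHook`, `doneWithRound`, and `runRounds` are hypothetical stand-ins, not library API; the actual method signature of BatchTrainer.DoneWithRound is not shown on this page. The loop bounds follow the documented meaning of start (the 1-based first round) and rounds (the total number of rounds, including those before start).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration; this is not the library's BatchTrainer. */
final class RoundHookSketch {
    /** Hypothetical callback, modeled loosely on the role of DoneWithRound. */
    interface RoundHook {
        void doneWithRound(int round);
    }

    /**
     * Runs rounds start..rounds (1-based, inclusive), invoking the hook after
     * each pass. Returns the number of passes actually executed.
     */
    static int runRounds(int start, int rounds, Runnable onePass, RoundHook hook) {
        int executed = 0;
        for (int round = start; round <= rounds; ++round) {
            onePass.run();                 // one pass over the training data
            ++executed;
            if (hook != null) {
                hook.doneWithRound(round); // additional processing at the end of the round
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        int executed = runRounds(1, 3, () -> { }, round -> seen.add(round));
        System.out.println(executed + " rounds, hook saw " + seen);
    }
}
```

A hook like this is a natural place to evaluate on held-out data or to write a checkpoint between rounds.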
Modifier and Type | Field and Description |
---|---|
protected int | examples: The number of examples extracted during pre-extraction. |
protected Field | fieldIsTraining: learner's isTraining field. |
protected Learner | learner: The learning classifier being trained. |
protected Class | learnerClass: learner's class. |
protected int | lexiconSize: The number of features extracted during pre-extraction. |
protected String | messageIndent: Spacing for making status messages prettier. |
protected Parser | parser: The parser from which training data for learner is received. |
protected int | progressOutput: The number of training examples in between status messages printed to STDOUT, or 0 to suppress these messages. |
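To make the interplay of progressOutput and messageIndent concrete, here is a minimal sketch of periodic status reporting. It assumes that a message is emitted after every progressOutput examples and that 0 suppresses all output, as the field descriptions state. The class `ProgressSketch`, the method `statusLines`, and the message wording are hypothetical and are not taken from the library.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of periodic status messages; not library code. */
final class ProgressSketch {
    /**
     * Returns the status lines that would be printed while processing
     * `total` examples, one line per `progressOutput` examples.
     */
    static List<String> statusLines(int total, int progressOutput, String messageIndent) {
        List<String> lines = new ArrayList<>();
        if (progressOutput <= 0) {
            return lines; // 0 suppresses status messages
        }
        for (int i = 1; i <= total; ++i) {
            if (i % progressOutput == 0) {
                // The message text is invented for this sketch.
                lines.add(messageIndent + i + " examples processed");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : statusLines(10, 4, "  ")) {
            System.out.println(line);
        }
    }
}
```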
Constructor and Description |
---|
BatchTrainer(Learner l, Parser p): Creates a new trainer that doesn't produce status messages. |
BatchTrainer(Learner l, Parser p, int o): Creates a new trainer that produces status messages. |
BatchTrainer(Learner l, Parser p, int o, String i): Creates a new trainer that produces status messages with the specified indentation spacing for status messages. |
BatchTrainer(Learner l, String p): Creates a new trainer that doesn't produce status messages. |
BatchTrainer(Learner l, String p, boolean z): Creates a new trainer that doesn't produce status messages. |
BatchTrainer(Learner l, String p, boolean z, int o): Creates a new trainer that produces status messages. |
BatchTrainer(Learner l, String p, boolean z, int o, String i): Creates a new trainer that produces status messages with the specified indentation spacing for status messages. |
BatchTrainer(Learner l, String p, int o): Creates a new trainer that produces status messages. |
BatchTrainer(Learner l, String p, int o, String i): Creates a new trainer that produces status messages with the specified indentation spacing for status messages. |
Modifier and Type | Method and Description |
---|---|
double[][] | crossValidation(int[] rounds, int k, FoldParser.SplitPolicy splitPolicy, double alpha, TestingMetric metric, boolean statusMessages): Performs cross validation, computing a confidence interval on the performance of the learner after each of the specified rounds of training. |
protected double | crossValidationTesting(FoldParser foldParser, TestingMetric metric, boolean clone, boolean statusMessages): Tests the learner as a subroutine inside cross validation. |
void | fillInSizes(): This method sets the examples and lexiconSize variables by querying parser and learner respectively. |
protected boolean | getIsTraining(): Returns the value of the static isTraining flag inside learner's runtime class. |
Parser | getParser(): Returns the value of parser. |
int | getProgressOutput(): Returns the value of progressOutput. |
Lexicon | preExtract(String exampleFile): Performs labeled feature vector pre-extraction into the specified file (or memory), replacing parser with one that reads from that file (or memory). |
Lexicon | preExtract(String exampleFile, boolean zip): Performs labeled feature vector pre-extraction into the specified file (or memory), replacing parser with one that reads from that file (or memory). |
Learner | preExtract(String exampleFile, boolean zip, Lexicon.CountPolicy countPolicy): Performs labeled feature vector pre-extraction into the specified file (or memory), replacing parser with one that reads from that file (or memory). |
Learner | preExtract(String exampleFile, Lexicon.CountPolicy countPolicy): Performs labeled feature vector pre-extraction into the specified file (or memory), replacing parser with one that reads from that file (or memory). |
void | pruneDataset(String exampleFile, boolean zip, Lexicon.PruningPolicy policy, Learner preExtractLearner): Prunes the data returned by parser according to the given policy, under the assumption that feature counts have already been compiled in the given learner's lexicon. |
void | pruneDataset(String exampleFile, Lexicon.PruningPolicy policy, Learner preExtractLearner): Prunes the data returned by parser according to the given policy, under the assumption that feature counts have already been compiled in the given learner's lexicon. |
protected void | setIsTraining(boolean b): Sets the static isTraining flag inside learner's runtime class to the specified value. |
protected double | testMidTraining(Parser testParser, TestingMetric metric, boolean clone): Tests learner on the specified data while making provisions under the assumption that this test happens in between rounds of training. |
void | train(int rounds): Trains learner for the specified number of rounds. |
void | train(int rounds, BatchTrainer.DoneWithRound dwr): Trains learner for the specified number of rounds. |
void | train(int start, int rounds): Trains learner for the specified number of rounds. |
void | train(int start, int rounds, BatchTrainer.DoneWithRound dwr): Trains learner for the specified number of rounds. |
Learner.Parameters | tune(Learner.Parameters[] parameters, int[] rounds, int k, FoldParser.SplitPolicy splitPolicy, double alpha, TestingMetric metric): Tune learning algorithm parameters using cross validation. |
Learner.Parameters | tune(Learner.Parameters[] parameters, int[] rounds, Parser devParser, TestingMetric metric): Tune learning algorithm parameters against a development set. |
static void | writeExample(edu.illinois.cs.cogcomp.core.datastructures.vectors.ExceptionlessOutputStream out, int[] featureIndexes, double[] featureValues, int[] labelIndexes, double[] labelValues): Writes an example vector to the specified stream, with all features being written in the order they appear in the vector. |
static void | writeExample(edu.illinois.cs.cogcomp.core.datastructures.vectors.ExceptionlessOutputStream out, int[] featureIndexes, double[] featureValues, int[] labelIndexes, double[] labelValues, int unpruned): Writes an example vector to the specified stream, with all features being written in the order they appear in the vector. |
static void | writeExample(edu.illinois.cs.cogcomp.core.datastructures.vectors.ExceptionlessOutputStream out, int[] featureIndexes, double[] featureValues, int[] labelIndexes, double[] labelValues, int unpruned, Lexicon lexicon): Writes an example vector contained in an object array to the underlying output stream, with features sorted according to their representations in the given lexicon if present, or in the order they appear in the vector otherwise. |
static void | writeExample(edu.illinois.cs.cogcomp.core.datastructures.vectors.ExceptionlessOutputStream out, int[] featureIndexes, double[] featureValues, int[] labelIndexes, double[] labelValues, Lexicon lex): Writes an example vector contained in an object array to the underlying output stream, with features sorted according to their representations in the given lexicon if present, or in the order they appear in the vector otherwise. |
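The array returned by crossValidation pairs each evaluated round count with a mean score and a half-width: per the method's documentation, results[i][0] is the average performance after rounds[i] rounds and results[i][1] is half the size of the confidence interval. The sketch below shows one way such a result could be consumed. It is self-contained and does not call the library; `CrossValidationResultsSketch`, `bestIndex`, and `describe` are hypothetical helpers, and it assumes that a higher metric value is better and that the interval is symmetric around the mean.

```java
import java.util.Locale;

/** Hypothetical helpers for reading a results[i] = {mean, halfWidth} array. */
final class CrossValidationResultsSketch {
    /** Returns the index of the entry with the highest mean (assumes higher is better). */
    static int bestIndex(double[][] results) {
        int best = 0;
        for (int i = 1; i < results.length; ++i) {
            if (results[i][0] > results[best][0]) {
                best = i;
            }
        }
        return best;
    }

    /** Formats one entry as a mean with the interval mean - half .. mean + half. */
    static String describe(int rounds, double[] result) {
        double low = result[0] - result[1];
        double high = result[0] + result[1];
        return String.format(Locale.ROOT, "%d rounds: %.3f in [%.3f, %.3f]",
                rounds, result[0], low, high);
    }

    public static void main(String[] args) {
        int[] rounds = {1, 5, 10};
        // Made-up numbers, purely to exercise the helpers.
        double[][] results = {{0.80, 0.02}, {0.85, 0.03}, {0.83, 0.01}};
        int best = bestIndex(results);
        System.out.println(describe(rounds[best], results[best]));
    }
}
```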
protected Learner learner
protected int progressOutput
The number of training examples in between status messages printed to STDOUT, or 0 to suppress these messages.
protected String messageIndent
protected int examples
protected int lexiconSize
public BatchTrainer(Learner l, String p)
l - The learner to be trained.
p - The path to an example file.
public BatchTrainer(Learner l, String p, int o)
l - The learner to be trained.
p - The path to an example file.
o - The number of examples in between status messages on STDOUT.
public BatchTrainer(Learner l, String p, int o, String i)
l - The learner to be trained.
p - The path to an example file.
o - The number of examples in between status messages on STDOUT.
i - The indentation spacing for status messages.
public BatchTrainer(Learner l, String p, boolean z)
l - The learner to be trained.
p - The path to an example file.
z - Whether or not the example file is compressed.
public BatchTrainer(Learner l, String p, boolean z, int o)
l - The learner to be trained.
p - The path to an example file.
z - Whether or not the example file is compressed.
o - The number of examples in between status messages on STDOUT.
public BatchTrainer(Learner l, String p, boolean z, int o, String i)
l - The learner to be trained.
p - The path to an example file.
z - Whether or not the example file is compressed.
o - The number of examples in between status messages on STDOUT.
i - The indentation spacing for status messages.
public BatchTrainer(Learner l, Parser p)
l - The learner to be trained.
p - The parser from which training data is received.
public BatchTrainer(Learner l, Parser p, int o)
l - The learner to be trained.
p - The parser from which training data is received.
o - The number of examples in between status messages on STDOUT.
public BatchTrainer(Learner l, Parser p, int o, String i)
l - The learner to be trained.
p - The parser from which training data is received.
o - The number of examples in between status messages on STDOUT.
i - The indentation spacing for status messages.
public static void writeExample(edu.illinois.cs.cogcomp.core.datastructures.vectors.ExceptionlessOutputStream out, int[] featureIndexes, double[] featureValues, int[] labelIndexes, double[] labelValues)
out - The output stream.
featureIndexes - The lexicon indexes of the features.
featureValues - The values or "strengths" of the features.
labelIndexes - The lexicon indexes of the labels.
labelValues - The values or "strengths" of the labels.
public static void writeExample(edu.illinois.cs.cogcomp.core.datastructures.vectors.ExceptionlessOutputStream out, int[] featureIndexes, double[] featureValues, int[] labelIndexes, double[] labelValues, int unpruned)
out - The output stream.
featureIndexes - The lexicon indexes of the features.
featureValues - The values or "strengths" of the features.
labelIndexes - The lexicon indexes of the labels.
labelValues - The values or "strengths" of the labels.
unpruned - The number of features in the vector that aren't pruned.
public static void writeExample(edu.illinois.cs.cogcomp.core.datastructures.vectors.ExceptionlessOutputStream out, int[] featureIndexes, double[] featureValues, int[] labelIndexes, double[] labelValues, Lexicon lex)
out - The output stream.
featureIndexes - The lexicon indexes of the features.
featureValues - The values or "strengths" of the features.
labelIndexes - The lexicon indexes of the labels.
labelValues - The values or "strengths" of the labels.
lex - A lexicon.
public static void writeExample(edu.illinois.cs.cogcomp.core.datastructures.vectors.ExceptionlessOutputStream out, int[] featureIndexes, double[] featureValues, int[] labelIndexes, double[] labelValues, int unpruned, Lexicon lexicon)
out - The output stream.
featureIndexes - The lexicon indexes of the features.
featureValues - The values or "strengths" of the features.
labelIndexes - The lexicon indexes of the labels.
labelValues - The values or "strengths" of the labels.
unpruned - The number of features in the vector that aren't pruned.
lexicon - A lexicon.
public int getProgressOutput()
Returns the value of progressOutput.
protected void setIsTraining(boolean b)
Sets the static isTraining flag inside learner's runtime class to the specified value. This probably doesn't need to be tinkered with after pre-extraction, since it can only affect the code that does the extraction.
b - The new value for the flag.
protected boolean getIsTraining()
Returns the value of the static isTraining flag inside learner's runtime class.
public Lexicon preExtract(String exampleFile)
Performs labeled feature vector pre-extraction into the specified file (or memory), replacing parser with one that reads from that file (or memory). After pre-extraction, the lexicon is written to disk. It is assumed that learner already knows where to write the lexicon. If it doesn't, call Learner.setLexiconLocation(String) or Learner.setLexiconLocation(java.net.URL) on that object before calling this method.
Calling this method is equivalent to calling preExtract(String,boolean) with the second argument true.
exampleFile - The full path to a file into which examples will be written, or null to extract into memory.
public Lexicon preExtract(String exampleFile, boolean zip)
Performs labeled feature vector pre-extraction into the specified file (or memory), replacing parser with one that reads from that file (or memory). After pre-extraction, the lexicon is written to disk. It is assumed that learner already knows where to write the lexicon. If it doesn't, call Learner.setLexiconLocation(String) or Learner.setLexiconLocation(java.net.URL) on that object before calling this method.
exampleFile - The full path to a file into which examples will be written, or null to extract into memory.
zip - Whether or not to compress the extracted examples.
public Learner preExtract(String exampleFile, Lexicon.CountPolicy countPolicy)
Performs labeled feature vector pre-extraction into the specified file (or memory), replacing parser with one that reads from that file (or memory). If exampleFile already exists, this method writes the examples to a temporary file, then copies the contents to the existing file after pre-extraction completes. This is done in case the parser providing the examples to this method is reading the existing file.
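The temporary-file precaution described above is a general pattern for rewriting a file that may still be open for reading. The following sketch shows the idea with the standard java.nio.file API only; it is not the library's implementation, and `TempFileRewriteSketch`, `rewriteInPlace`, and the line-by-line transform are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.UnaryOperator;

/** Hypothetical illustration of "write to a temp file, then copy over the original". */
final class TempFileRewriteSketch {
    /** Reads each line of `file`, transforms it, and stores the result at the same path. */
    static void rewriteInPlace(Path file, UnaryOperator<String> transform) throws IOException {
        Path temp = Files.createTempFile("examples", ".tmp");
        try {
            // Output goes to the temporary file while the original is still being read.
            try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
                 BufferedWriter out = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
                String line;
                while ((line = in.readLine()) != null) {
                    out.write(transform.apply(line));
                    out.newLine();
                }
            }
            // Only after reading has finished is the existing file overwritten.
            Files.copy(temp, file, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(temp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sketch", ".txt");
        Files.write(file, java.util.Arrays.asList("first", "second"));
        rewriteInPlace(file, String::toUpperCase);
        System.out.println(Files.readAllLines(file));
        Files.deleteIfExists(file);
    }
}
```

Writing directly to the path being read would truncate the input before it had been consumed, which is the failure this pattern avoids.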
Note that this method does not write the feature lexicon it produces to disk. Calling this method is equivalent to calling preExtract(String,boolean,Lexicon.CountPolicy) with the second argument true.
exampleFile - The full path to a file into which examples will be written, or null to extract into memory.
countPolicy - The feature counting policy for the learner's feature lexicon.
public Learner preExtract(String exampleFile, boolean zip, Lexicon.CountPolicy countPolicy)
Performs labeled feature vector pre-extraction into the specified file (or memory), replacing parser with one that reads from that file (or memory). If exampleFile already exists, this method writes the examples to a temporary file, then copies the contents to the existing file after pre-extraction completes. This is done in case the parser providing the examples to this method is reading the existing file.
Note that this method does not write the feature lexicon it produces to disk.
exampleFile - The full path to a file into which examples will be written, or null to extract into memory.
zip - Whether or not to compress the extracted examples.
countPolicy - The feature counting policy for the learner's feature lexicon.
public void fillInSizes()
This method sets the examples and lexiconSize variables by querying parser and learner respectively. It sets examples to 0 if parser is not an ArrayFileParser, and lexiconSize to 0 if learner neither has the lexicon loaded nor knows where to find it.
public void pruneDataset(String exampleFile, Lexicon.PruningPolicy policy, Learner preExtractLearner)
Prunes the data returned by parser according to the given policy, under the assumption that feature counts have already been compiled in the given learner's lexicon. The pruned data is written to the given file (or memory), and at the end of the method, parser is replaced with a new parser that reads from that file (or memory). The pruned lexicon is also written to disk.
If exampleFile already exists, this method writes the examples to a temporary file, then copies the contents to the existing file after pruning completes. This is done in case the parser providing the examples to this method is reading the existing file.
When calling this method, it must be the case that parser is an ArrayFileParser. This condition is easy to satisfy, since the preExtract(String,boolean,Lexicon.CountPolicy) method will usually be called prior to this method to count the features in the dataset, and that method also replaces parser with an ArrayFileParser.
It is assumed that preExtractLearner already knows where to write the lexicon. If it doesn't, call Learner.setLexiconLocation(String) or Learner.setLexiconLocation(java.net.URL) on that object before calling this method.
Calling this method is equivalent to calling pruneDataset(String,boolean,Lexicon.PruningPolicy,Learner) with the second argument true.
exampleFile - The full path to a file into which examples will be written, or null to extract into memory.
policy - The type of feature pruning.
preExtractLearner - A learner whose lexicon contains all the necessary feature count information.
public void pruneDataset(String exampleFile, boolean zip, Lexicon.PruningPolicy policy, Learner preExtractLearner)
Prunes the data returned by parser according to the given policy, under the assumption that feature counts have already been compiled in the given learner's lexicon. The pruned data is written to the given file (or memory), and at the end of the method, parser is replaced with a new parser that reads from that file (or memory). The pruned lexicon is also written to disk.
If exampleFile already exists, this method writes the examples to a temporary file, then copies the contents to the existing file after pruning completes. This is done in case the parser providing the examples to this method is reading the existing file.
When calling this method, it must be the case that parser is an ArrayFileParser. This condition is easy to satisfy, since the preExtract(String,boolean,Lexicon.CountPolicy) method will usually be called prior to this method to count the features in the dataset, and that method also replaces parser with an ArrayFileParser.
It is assumed that preExtractLearner already knows where to write the lexicon. If it doesn't, call Learner.setLexiconLocation(String) or Learner.setLexiconLocation(java.net.URL) on that object before calling this method.
exampleFile - The full path to a file into which examples will be written, or null to extract into memory.
zip - Whether or not to compress the extracted examples.
policy - The type of feature pruning.
preExtractLearner - A learner whose lexicon contains all the necessary feature count information.
public void train(int rounds)
Trains learner for the specified number of rounds. This learning happens on top of any learning that learner may have already done.
rounds - The number of passes to make over the training data.
public void train(int start, int rounds)
Trains learner for the specified number of rounds. This learning happens on top of any learning that learner may have already done.
start - The 1-based number of the first training round.
rounds - The total number of training rounds including those before start.
public void train(int rounds, BatchTrainer.DoneWithRound dwr)
Trains learner for the specified number of rounds. This learning happens on top of any learning that learner may have already done.
rounds - The number of passes to make over the training data.
dwr - Performs post-processing at the end of each round.
public void train(int start, int rounds, BatchTrainer.DoneWithRound dwr)
Trains learner for the specified number of rounds. This learning happens on top of any learning that learner may have already done.
start - The 1-based number of the first training round.
rounds - The total number of training rounds including those before start.
dwr - Performs post-processing at the end of each round.
public double[][] crossValidation(int[] rounds, int k, FoldParser.SplitPolicy splitPolicy, double alpha, TestingMetric metric, boolean statusMessages)
Performs cross validation, computing a confidence interval on the performance of the learner after each of the specified rounds of training. This method assumes that learner has not yet done any learning. The learner will again be empty in this sense when the method exits, except that any label lexicon present before the method was called will be restored. The label lexicon needs to persist in this way so that it can ultimately be written into the model file.
rounds - An array of training rounds after which performance of the learner should be evaluated on the testing data.
k - The number of folds.
splitPolicy - The policy according to which the data is split up.
alpha - The fraction of the distribution to leave outside the confidence interval. For example, alpha = .05 gives a 95% confidence interval.
metric - A metric with which to evaluate the learner on testing data.
statusMessages - If set to true, status messages will be produced, even if progressOutput is zero.
Returns: An array results where results[i][0] is the average performance of the learner after rounds[i] rounds of training and results[i][1] is half the size of the corresponding confidence interval.
protected double crossValidationTesting(FoldParser foldParser, TestingMetric metric, boolean clone, boolean statusMessages)
foldParser - The cross validation parser that splits up the data.
metric - The metric used to evaluate the performance of the learner.
clone - Whether or not the learner should be cloned (and it should be cloned if more learning will take place after making this call).
statusMessages - If set to true, status messages will be produced, even if progressOutput is zero.
Accuracy.
public Learner.Parameters tune(Learner.Parameters[] parameters, int[] rounds, int k, FoldParser.SplitPolicy splitPolicy, double alpha, TestingMetric metric)
Tune learning algorithm parameters using cross validation. This method takes both an array of Learner.Parameters objects and an array of rounds. As such, the value in the Learner.Parameters.rounds field is ignored during tuning. It is also overwritten in each of the Learner.Parameters objects when the optimal number of rounds is determined in terms of the other parameters in each object. Finally, in addition to returning the parameters that got the best performance, this method also sets learner with those parameters at the end of the method.
This method assumes that learner has not yet done any learning. The learner will again be empty in this sense when the method exits, except that any label lexicon present before the method was called will be restored. The label lexicon needs to persist in this way so that it can ultimately be written into the model file.
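The selection logic described above (score every parameter setting at every candidate round count, record the best round count in each settings object, and return the best setting overall) can be sketched independently of the library. In the sketch below, `TuneSketch`, `Params`, `learningRate`, and the scoring function are hypothetical stand-ins for Learner.Parameters and for cross validated performance; the real method trains and evaluates the learner rather than calling a function, and higher scores are assumed to be better.

```java
import java.util.function.ToDoubleBiFunction;

/** Hypothetical illustration of tuning over parameter settings and round counts. */
final class TuneSketch {
    /** Hypothetical stand-in for a parameter settings object with a rounds field. */
    static final class Params {
        final double learningRate;
        int rounds; // ignored as input; overwritten with the best round count found

        Params(double learningRate) {
            this.learningRate = learningRate;
        }
    }

    /**
     * Scores each candidate at each round count, stores the best round count in
     * the candidate, and returns the candidate with the highest score overall.
     */
    static Params tune(Params[] candidates, int[] rounds,
                       ToDoubleBiFunction<Params, Integer> score) {
        Params best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Params p : candidates) {
            double bestForP = Double.NEGATIVE_INFINITY;
            for (int r : rounds) {
                double s = score.applyAsDouble(p, r);
                if (s > bestForP) {
                    bestForP = s;
                    p.rounds = r; // optimal rounds for this particular setting
                }
            }
            if (bestForP > bestScore) {
                bestScore = bestForP;
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Params[] candidates = {new Params(0.1), new Params(0.5)};
        // Invented scoring function that peaks at learningRate 0.5 and 5 rounds.
        Params best = tune(candidates, new int[] {1, 5, 10},
                (p, r) -> -Math.abs(p.learningRate - 0.5) - Math.abs(r - 5));
        System.out.println(best.learningRate + " with " + best.rounds + " rounds");
    }
}
```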
parameters - An array of parameter settings objects.
rounds - An array of training rounds after which performance of the learner should be evaluated on the testing data.
k - The number of folds.
splitPolicy - The policy according to which the data is split up.
alpha - The fraction of the distribution to leave outside the confidence interval. For example, alpha = .05 gives a 95% confidence interval.
metric - A metric with which to evaluate the learner.
Returns: The element of parameters that resulted in the best performance according to metric.
public Learner.Parameters tune(Learner.Parameters[] parameters, int[] rounds, Parser devParser, TestingMetric metric)
Tune learning algorithm parameters against a development set. This method takes both an array of Learner.Parameters objects and an array of rounds. As such, the value in the Learner.Parameters.rounds field is ignored during tuning. It is also overwritten in each of the Learner.Parameters objects when the optimal number of rounds is determined in terms of the other parameters in each object. Finally, in addition to returning the parameters that got the best performance, this method also sets learner with those parameters at the end of the method.
This method assumes that learner has not yet done any learning. The learner will again be empty in this sense when the method exits, except that any label lexicon present before the method was called will be restored. The label lexicon needs to persist in this way so that it can ultimately be written into the model file.
parameters - An array of parameter settings objects.
rounds - An array of training rounds after which performance of the learner should be evaluated on the testing data.
devParser - A parser from which development set examples are obtained.
metric - A metric with which to evaluate the learner.
Returns: The element of parameters that resulted in the best performance according to metric.
protected double testMidTraining(Parser testParser, TestingMetric metric, boolean clone)
Tests learner on the specified data while making provisions under the assumption that this test happens in between rounds of training.
testParser - A parser producing labeled testing examples.
metric - The metric used to evaluate the performance of the learner.
clone - Whether or not the learner should be cloned (and it should be cloned if more learning will take place after making this call).
Accuracy.
Copyright © 2016. All rights reserved.