Processors¶
Base Processors¶
BaseProcessor¶
BaseBatchProcessor¶
class forte.processors.base.batch_processor.BaseBatchProcessor[source]¶

The base class of processors that process data in batches. This processor enables easy data batching by analyzing the context and data objects. The context defines the scope of analysis of a particular task. For example, in dependency parsing the context is normally a sentence, while in entity coreference the context is normally a document. The processor will create data batches relative to the context.

Key fields in this processor:

- context_type (Annotation): defines the context (scope) to process.
- input_info: A data request. If use_coverage_index is set to True, the processor will build an index based on this input information to speed up entry searching.
- batcher: The processing batcher used for this processor. The batcher will also keep track of the relation between the pack and the batch data.
- use_coverage_index: If True, the index will be built based on input_info.
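To make these fields concrete, here is a minimal sketch of a subclass setting them up (the subclass name is hypothetical, Sentence and Token are taken from the base ontology for illustration, and the abstract methods documented below are omitted):

    from forte.processors.base.batch_processor import BaseBatchProcessor
    from ft.onto.base_ontology import Sentence, Token

    class DummyTagger(BaseBatchProcessor):
        # Illustrative subclass; not part of Forte itself.

        def initialize(self, resources, configs):
            super().initialize(resources, configs)
            self.context_type = Sentence    # batch one sentence at a time
            self.input_info = {Token: []}   # request Token entries per context
            self.use_coverage_index = True  # index the input to speed up search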
initialize(resources, configs)[source]¶

The pipeline will call the initialize method at the start of processing. The processor and reader will be initialized with configs, and global resources will be registered into resources. The implementation should set up the states of the component.

Parameters
- resources (Resources) – A global resource registry. Users can register shareable resources here, for example, the vocabulary.
- configs (Config) – The configuration passed in to set up this component.
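As a hedged sketch (assuming Resources acts as a simple key-value store with update() and get(); the "vocab" key and its contents are illustrative), an override might register a shared vocabulary:

    class MyProcessor(BaseBatchProcessor):
        def initialize(self, resources, configs):
            super().initialize(resources, configs)
            # Register a shareable resource so other components can reuse it.
            resources.update(vocab={"<PAD>": 0, "<UNK>": 1})
            self.vocab = resources.get("vocab")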
flush()[source]¶

Indicates that there will be no more packs to be passed in and handles whatever remains in the buffer.
abstract predict(data_batch)[source]¶

The function that task processors should implement. Makes predictions for the input data_batch.

Parameters
- data_batch (dict) – A batch of instances in our dict format.

Returns
The prediction results in dict format.
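A minimal predict() might look like the following sketch (the "Token" key, the "text" and "ner_tag" fields, and self.model are assumptions for illustration, not guarantees of the batch format):

    def predict(self, data_batch: dict) -> dict:
        # Pull the token text out of the batched dict format.
        token_texts = data_batch["Token"]["text"]
        # Run a hypothetical trained tagger over the batch.
        tags = self.model.decode(token_texts)
        # Return the predictions in the same dict style.
        return {"Token": {"ner_tag": tags}}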
pack_all(packs, output_dict)[source]¶

Packs the prediction results in output_dict back into the corresponding packs.
abstract pack(pack, inputs)[source]¶

The function that task processors should implement. It adds the corresponding fields to pack, defining how the predicted values are added back.

Parameters
- pack (PackType) – The pack to add entries or fields to.
- inputs – The prediction results returned by predict(). You need to add entries or fields corresponding to these prediction results to pack.
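For illustration, a pack() implementation might write one predicted tag per token back onto the pack (the "ner_tag" key matches the predict() sketch above; Token.ner is the base-ontology field assumed here):

    from ft.onto.base_ontology import Token

    def pack(self, pack, inputs):
        # `inputs` is the dict returned by predict().
        tags = inputs["Token"]["ner_tag"]
        for token, tag in zip(pack.get(Token), tags):
            token.ner = tag  # attach the predicted field to the entry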
abstract static define_batcher()[source]¶

Defines a specific batcher for this processor. A single-pack BatchProcessor initializes the batcher to be a ProcessingBatcher, and a MultiPackBatchProcessor initializes the batcher to be a MultiPackProcessingBatcher.
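Following that description, a single-pack subclass could simply return a ProcessingBatcher (the import path is an assumption based on the class names above):

    from forte.data.batchers import ProcessingBatcher

    @staticmethod
    def define_batcher():
        # A MultiPackBatchProcessor would return a
        # MultiPackProcessingBatcher here instead.
        return ProcessingBatcher()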
BatchProcessor¶
BasePackProcessor¶
Task Processors¶
CoNLLNERPredictor¶
class forte.processors.ner_predictor.CoNLLNERPredictor[source]¶

A Named Entity Recognizer trained according to Ma, Xuezhe, and Eduard Hovy. "End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF."

Note that to use CoNLLNERPredictor, the ontology of the Pipeline must include ft.onto.base_ontology.Token and ft.onto.base_ontology.Sentence.
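A hedged usage sketch (the reader choice, data path, and any model resources or configs the predictor needs are illustrative and omitted here):

    from forte.pipeline import Pipeline
    from forte.data.readers import CoNLL03Reader
    from forte.processors.ner_predictor import CoNLLNERPredictor

    pl = Pipeline()
    pl.set_reader(CoNLL03Reader())
    pl.add_processor(CoNLLNERPredictor())
    pl.initialize()

    for pack in pl.process_dataset("path/to/conll03"):
        pass  # predicted NER tags are now stored in each pack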
.-
initialize
(resources, configs)[source]¶ The pipeline will call the initialize method at the start of a processing. The processor and reader will be initialized with
configs
, and register global resources intoresource
. The implementation should set up the states of the component.- Parameters
resources (Resources) – A global resource register. User can register shareable resources here, for example, the vocabulary.
configs (Config) – The configuration passed in to set up this component.
predict(data_batch)[source]¶

The function that task processors should implement. Makes predictions for the input data_batch.

Parameters
- data_batch (dict) – A batch of instances in our dict format.

Returns
The prediction results in dict format.
pack(data_pack, output_dict=None)[source]¶

Writes the prediction results back to the datapack by writing the predicted NER tags to the original tokens.
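After processing, the predictions can be read back from the tokens, for example (a sketch assuming the base ontology's Token.ner field):

    from ft.onto.base_ontology import Sentence, Token

    for sentence in data_pack.get(Sentence):
        for token in data_pack.get(Token, sentence):
            print(token.text, token.ner)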
get_batch_tensor(data, device=None)[source]¶

Get the tensors to be fed into the model.

Parameters
- data – A list of tuples (word_ids, char_id_sequences).
- device – The device for the tensors.

Returns
A tuple (words, chars, masks, lengths), where:
- words: A tensor of shape [batch_size, batch_length] representing the word ids in the batch.
- chars: A tensor of shape [batch_size, batch_length, char_length] representing the char ids for each word in the batch.
- masks: A tensor of shape [batch_size, batch_length] representing the indices to be masked in the batch; 1 indicates no masking.
- lengths: A tensor of shape [batch_size] representing the length of each sentence in the batch.
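To make the returned shapes concrete, here is a small, self-contained sketch of the padding scheme described above (plain PyTorch, not the library's internal code):

    import torch

    # Two "sentences" of word ids, each word with a char-id sequence.
    data = [
        ([3, 7], [[1, 2], [4]]),
        ([5], [[6, 8, 9]]),
    ]
    batch_size = len(data)
    batch_length = max(len(w) for w, _ in data)              # longest sentence
    char_length = max(len(c) for _, cs in data for c in cs)  # longest word

    words = torch.zeros(batch_size, batch_length, dtype=torch.long)
    chars = torch.zeros(batch_size, batch_length, char_length, dtype=torch.long)
    masks = torch.zeros(batch_size, batch_length)
    lengths = torch.tensor([len(w) for w, _ in data])

    for i, (word_ids, char_seqs) in enumerate(data):
        words[i, : len(word_ids)] = torch.tensor(word_ids)
        masks[i, : len(word_ids)] = 1.0  # 1 indicates no masking
        for j, char_ids in enumerate(char_seqs):
            chars[i, j, : len(char_ids)] = torch.tensor(char_ids)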
SRLPredictor¶
class forte.processors.srl_predictor.SRLPredictor[source]¶

A Semantic Role Labeler trained according to He, Luheng, et al. "Jointly predicting predicates and arguments in neural semantic role labeling."
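A hedged usage sketch (StringReader and the base ontology's PredicateLink are assumptions for illustration; the model resources the predictor needs are omitted):

    from forte.pipeline import Pipeline
    from forte.data.readers import StringReader
    from forte.processors.srl_predictor import SRLPredictor
    from ft.onto.base_ontology import PredicateLink

    pl = Pipeline()
    pl.set_reader(StringReader())
    pl.add_processor(SRLPredictor())
    pl.initialize()

    pack = pl.process("Mary gave John a book.")
    for link in pack.get(PredicateLink):
        # Each link connects a predicate mention to one argument span.
        print(link.get_parent().text, "->", link.get_child().text)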
initialize(resources, configs)[source]¶

The pipeline will call the initialize method at the start of processing. The processor and reader will be initialized with configs, and global resources will be registered into resources. The implementation should set up the states of the component.

Parameters
- resources (Resources) – A global resource registry. Users can register shareable resources here, for example, the vocabulary.
- configs (Config) – The configuration passed in to set up this component.
predict(data_batch)[source]¶

The function that task processors should implement. Makes predictions for the input data_batch.

Parameters
- data_batch (dict) – A batch of instances in our dict format.

Returns
The prediction results in dict format.
pack(data_pack, inputs)[source]¶

The function that task processors should implement. It adds the corresponding fields to the pack, defining how the predicted values are added back.

Parameters
- pack (PackType) – The pack to add entries or fields to.
- inputs – The prediction results returned by predict(). You need to add entries or fields corresponding to these prediction results to pack.
VocabularyProcessor¶
Alphabet¶
class forte.processors.vocabulary_processor.Alphabet(name, word_cnt=None, keep_growing=True, ignore_case_in_query=True, other_embeddings=None)[source]¶

Parameters
- name – The name of the alphabet.
- keep_growing – If True, entries not yet in the vocabulary will be added as they are encountered.
- ignore_case_in_query – If True, the Alphabet will try to query the lowercased input from its vocabulary when it cannot find the input among its keys.
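A small usage sketch based on the parameters above (the add() and get_index() method names are assumptions about this vocabulary class):

    from forte.processors.vocabulary_processor import Alphabet

    alphabet = Alphabet("words", keep_growing=True, ignore_case_in_query=True)
    alphabet.add("forte")
    # "Forte" is not a key, so the query falls back to the
    # lowercased form "forte" and still resolves to an index.
    idx = alphabet.get_index("Forte")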