Processors

Base Processors

BaseProcessor

class forte.processors.base.base_processor.BaseProcessor[source]

Base class inherited by all kinds of processors, such as trainers, predictors, and evaluators.

classmethod default_configs()[source]

Returns a dict of configurations for the processor with default values. These defaults fill in any values missing from the input configs during pipeline construction.
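
For example, a custom processor can extend the defaults it inherits; the user-supplied config is then merged over these values. A minimal sketch (MyProcessor and its config fields are hypothetical, not part of Forte):

    from forte.processors.base.base_processor import BaseProcessor

    class MyProcessor(BaseProcessor):
        @classmethod
        def default_configs(cls):
            # Start from the parent defaults, then add fields
            # specific to this processor (names are illustrative).
            config = super().default_configs()
            config.update({
                "batch_size": 16,
                "model_path": None,
            })
            return config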

BaseBatchProcessor

class forte.processors.base.batch_processor.BaseBatchProcessor[source]

The base class of processors that process data in batches. This processor enables easy data batching by analyzing the context and data objects. The context defines the scope of analysis for a particular task.

For example, in dependency parsing, the context is normally a sentence, in entity coreference, the context is normally a document. The processor will create data batches relative to the context.

Key fields in this processor (a subclass sketch follows this list):
  • context_type (Annotation): define the context (scope) to process.

  • input_info: A data request specifying the input entries. If use_coverage_index is set to True, the processor will build an index based on this input information to speed up entry searching.

  • batcher: The processing batcher used for this processor. The batcher also keeps track of the relation between the pack and the batch data.

  • use_coverage_index: If true, the index will be built based on the input_info.
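
Putting these fields together, a subclass typically fixes the context and the data request when it is set up. The following is an illustrative sketch, not Forte source, and the exact data-request format may differ across versions:

    from forte.processors.base.batch_processor import BatchProcessor
    from ft.onto.base_ontology import Sentence, Token

    class SentenceTagger(BatchProcessor):  # hypothetical subclass
        def initialize(self, resources, configs):
            super().initialize(resources, configs)
            # Each data batch is scoped to a single Sentence annotation.
            self.context_type = Sentence
            # Request Token entries for every instance in the batch.
            self.input_info = {Token: []}
            # Build the coverage index over the requested types to
            # speed up entry searching within each context.
            self.use_coverage_index = True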

initialize(resources, configs)[source]

The pipeline calls this initialize method at the start of processing. The processor and reader are initialized with configs, and global resources can be registered into resources. The implementation should set up the state of the component.

Parameters
  • resources (Resources) – A global resource register. User can register shareable resources here, for example, the vocabulary.

  • configs (Config) – The configuration passed in to set up this component.

flush()[source]

Indicates that no more packs will be passed in; handles whatever remains in the buffer.

abstract predict(data_batch)[source]

The function that task processors should implement to make predictions for the input data_batch.

Parameters

data_batch (dict) – A batch of instances in our dict format.

Returns

The prediction results in dict format.
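
A sketch of a typical predict implementation, assuming a model object stored on the processor during initialize (the model attribute, its decode method, and the key names are hypothetical):

    def predict(self, data_batch):
        # The batch carries the entries requested via input_info;
        # run the model and return predictions keyed in whatever
        # layout the matching pack() implementation expects.
        tokens = data_batch["Token"]["text"]
        tags = self.model.decode(tokens)  # hypothetical model call
        return {"Token": {"ner_tag": tags}}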

pack_all(packs, output_dict)[source]

Pack the prediction results output_dict back to the corresponding packs.

classmethod default_configs()[source]

The default config contains a field for the batcher.

abstract pack(pack, inputs)[source]

The function that task processors should implement to write results back.

Adds the corresponding fields to pack. Subclasses customize how the predicted values are added back.

Parameters
  • pack (PackType) – The pack to add entries or fields to.

  • inputs – The prediction results returned by predict(). You need to add entries or fields corresponding to these prediction results to pack, as shown in the sketch below.
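
For instance, an NER-style pack() could attach each predicted tag to its token. This assumes the output layout of the hypothetical predict() sketch above; the ner field is taken from ft.onto.base_ontology.Token:

    from ft.onto.base_ontology import Token

    def pack(self, pack, inputs):
        # Walk the tokens in the same order the batch was built and
        # copy each predicted tag onto the corresponding Token entry.
        for token, tag in zip(pack.get(Token), inputs["Token"]["ner_tag"]):
            token.ner = tag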

abstract static define_batcher()[source]

Define a specific batcher for this processor. The single-pack BatchProcessor initializes the batcher to be a ProcessingBatcher, while MultiPackBatchProcessor initializes it to be a MultiPackProcessingBatcher.
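
A single-pack processor can simply return the stock batcher; a sketch, assuming ProcessingBatcher is importable from forte.data.batchers (the path may vary by version):

    from forte.data.batchers import ProcessingBatcher

    @staticmethod
    def define_batcher():
        # One batcher per processor class; the pipeline uses it to
        # group instances and to map batch results back to packs.
        return ProcessingBatcher()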

BatchProcessor

class forte.processors.base.batch_processor.BatchProcessor[source]

The batch processor that processes DataPack objects.

BasePackProcessor

class forte.processors.base.pack_processor.BasePackProcessor[source]

The base class of processors that process one pack at a time, sequentially. If you are looking for batching (which might happen across packs), refer to BaseBatchProcessor.

PackProcessor

class forte.processors.base.pack_processor.PackProcessor[source]

The base class of processors that process one DataPack each time.
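
Subclasses implement the per-pack hook to do their work; in recent Forte versions this hook is named _process(), but treat the exact name and the explicit add_entry call as version-dependent. A naive illustrative example:

    from forte.data.data_pack import DataPack
    from forte.processors.base.pack_processor import PackProcessor
    from ft.onto.base_ontology import Sentence

    class PeriodSentenceSplitter(PackProcessor):  # illustrative only
        def _process(self, input_pack: DataPack):
            # Mark every period-terminated span as a Sentence
            # (naive splitting, purely for demonstration).
            begin = 0
            for i, ch in enumerate(input_pack.text):
                if ch == ".":
                    sent = Sentence(input_pack, begin, i + 1)
                    input_pack.add_entry(sent)
                    begin = i + 1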

Task Processors

CoNLLNERPredictor

class forte.processors.ner_predictor.CoNLLNERPredictor[source]

A Named Entity Recognizer trained according to Ma, Xuezhe, and Eduard Hovy, “End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF.”

Note that to use CoNLLNERPredictor, the ontology of Pipeline must be an ontology that includes ft.onto.base_ontology.Token and ft.onto.base_ontology.Sentence.
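
A usage sketch wiring the predictor into a pipeline whose reader produces the required annotations. The reader choice, config key, and data path are placeholders, and older Forte versions use add_processor() instead of add():

    from forte.pipeline import Pipeline
    from forte.data.readers import CoNLL03Reader
    from forte.processors.ner_predictor import CoNLLNERPredictor

    pipeline = Pipeline()
    pipeline.set_reader(CoNLL03Reader())
    # Point the config at the trained model resources; the key
    # name below is a placeholder, not the exact schema.
    pipeline.add(CoNLLNERPredictor(), config={"model_path": "ner_model/"})
    pipeline.initialize()

    for pack in pipeline.process_dataset("path/to/conll/data"):
        pass  # each pack now carries predicted NER tags on its tokens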

initialize(resources, configs)[source]

The pipeline calls this initialize method at the start of processing. The processor and reader are initialized with configs, and global resources can be registered into resources. The implementation should set up the state of the component.

Parameters
  • resources (Resources) – A global resource register. User can register shareable resources here, for example, the vocabulary.

  • configs (Config) – The configuration passed in to set up this component.

predict(data_batch)[source]

The function that task processors should implement to make predictions for the input data_batch.

Parameters

data_batch (dict) – A batch of instances in our dict format.

Returns

The prediction results in dict format.

pack(data_pack, output_dict=None)[source]

Write the prediction results back to the data pack by attaching the predicted NER tags to the original tokens.

get_batch_tensor(data, device=None)[source]

Get the tensors to be fed into the model.

Parameters
  • data – A list of tuples (word_ids, char_id_sequences).

  • device – The device for the tensors.

Returns

A tuple where

  • words: A tensor of shape [batch_size, batch_length] representing the word ids in the batch

  • chars: A tensor of shape [batch_size, batch_length, char_length] representing the char ids for each word in the batch

  • masks: A tensor of shape [batch_size, batch_length] masking out the padded positions in the batch; 1 indicates a real (unmasked) token.

  • lengths: A tensor of shape [batch_size] representing the length of each sentence in the batch (a padding sketch follows this list).
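
To make the shapes concrete, here is a self-contained PyTorch sketch of how such padded word tensors, masks, and lengths could be built; it mirrors the description above and is not the Forte source:

    import torch

    def pad_word_batch(word_id_seqs, pad_id=0):
        lengths = torch.tensor([len(s) for s in word_id_seqs])
        batch_length = int(lengths.max())
        words = torch.full((len(word_id_seqs), batch_length), pad_id)
        masks = torch.zeros(len(word_id_seqs), batch_length)
        for i, seq in enumerate(word_id_seqs):
            words[i, : len(seq)] = torch.tensor(seq)
            masks[i, : len(seq)] = 1.0  # 1 marks a real token, 0 padding
        return words, masks, lengths

    words, masks, lengths = pad_word_batch([[4, 8, 15], [16, 23]])
    # words and masks have shape [2, 3]; lengths is tensor([3, 2])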

classmethod default_configs()[source]

Default config for NER Predictor

SRLPredictor

class forte.processors.srl_predictor.SRLPredictor[source]

A Semantic Role Labeler trained according to He, Luheng, et al., “Jointly predicting predicates and arguments in neural semantic role labeling.”

initialize(resources, configs)[source]

The pipeline calls this initialize method at the start of processing. The processor and reader are initialized with configs, and global resources can be registered into resources. The implementation should set up the state of the component.

Parameters
  • resources (Resources) – A global resource register. User can register shareable resources here, for example, the vocabulary.

  • configs (Config) – The configuration passed in to set up this component.

predict(data_batch)[source]

The function that task processors should implement to make predictions for the input data_batch.

Parameters

data_batch (dict) – A batch of instances in our dict format.

Returns

The prediction results in dict format.

pack(data_pack, inputs)[source]

The function that task processors should implement to write results back.

Adds the corresponding fields to pack. Subclasses customize how the predicted values are added back.

Parameters
  • pack (PackType) – The pack to add entries or fields to.

  • inputs – The prediction results returned by predict(). You need to add entries or fields corresponding to these prediction results to pack.

classmethod default_configs()[source]

This defines a basic config structure.

VocabularyProcessor

class forte.processors.vocabulary_processor.VocabularyProcessor[source]

Build vocabulary from the input DataPack, write the result into the shared resources.

Alphabet

class forte.processors.vocabulary_processor.Alphabet(name, word_cnt=None, keep_growing=True, ignore_case_in_query=True, other_embeddings=None)[source]
Parameters
  • name

  • keep_growing

  • ignore_case_in_query – If True, Alphabet will try to query the lowercased input from its vocabulary when it cannot find the original input among its keys.

get_index(instance)[source]
Parameters

instance – The input token.

Returns

The index of the queried token in the dictionary.
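
A small usage sketch of the case-insensitive fallback; the add() method is assumed from typical alphabet APIs (only get_index and save are documented here):

    alphabet = Alphabet("words", ignore_case_in_query=True)
    alphabet.add("paris")  # add() is an assumed method, see note above
    # "Paris" is not a key, so the lowercased query "paris" is used
    # as a fallback and both calls resolve to the same index.
    assert alphabet.get_index("Paris") == alphabet.get_index("paris")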

save(output_directory, name=None)[source]

Save both alphabet records to the given directory.

Parameters
  • output_directory – Directory to save model and weights.

  • name – The name under which to save the alphabet, optional.