Processors

Base Processors

BaseProcessor

class forte.processors.base.base_processor.BaseProcessor[source]

The base class inherited by all kinds of processors, such as trainers, predictors, and evaluators.

record(record_meta)[source]

Method to add the output record of the current processor to forte.data.data_pack.Meta.record. The key of the record should be the entry type and the values should be attributes of the entry type. All the information will be used for consistency checking purposes if the pipeline is initialized with enforce_consistency=True.

Parameters

record_meta – The field in the datapack that stores type records, which this method fills in for consistency checking.
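
For example, a processor that produces part-of-speech tags might declare its output like this (a minimal sketch; the entry type and attribute are illustrative):

    from typing import Dict, Set

    def record(self, record_meta: Dict[str, Set[str]]):
        # Declare that this processor outputs Token entries carrying a
        # "pos" attribute, so downstream processors can rely on it.
        record_meta["ft.onto.base_ontology.Token"] = {"pos"}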

expected_types_and_attributes()[source]

Method to add the expected types and attributes for the input of the current processor, which will be checked before running the processor if the pipeline is initialized with enforce_consistency=True.
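
A matching sketch of declaring input expectations (the type name is illustrative):

    from typing import Dict, Set

    def expected_types_and_attributes(self) -> Dict[str, Set[str]]:
        # Require Sentence entries to be present in the input pack; the
        # empty set means no particular attributes are needed.
        return {"ft.onto.base_ontology.Sentence": set()}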

check_record(input_pack)[source]

Method to check type consistency if the pipeline is initialized with enforce_consistency=True. If any expected type or attribute does not exist in the datapack record of the previous pipeline component, an ExpectedRecordNotFound error will be raised.

Parameters

input_pack – The input datapack.

write_record(input_pack)[source]

Method to write the records of the output type of the current processor to the datapack. The key of the record should be the entry type and the values should be attributes of the entry type. All the information will be used for consistency checking purposes if the pipeline is initialized with enforce_consistency=True.

Parameters

input_pack – The input datapack.

classmethod default_configs()[source]

Returns a dict of configurations of the processor with default values. Used to fill in the missing values of input configs during pipeline construction.
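
When overriding this method, a subclass would typically extend the parent defaults rather than replace them, roughly as follows (the config keys shown are hypothetical):

    @classmethod
    def default_configs(cls):
        config = super().default_configs()
        # "model_path" and "batch_size" are illustrative keys only;
        # consult the concrete processor for its real config structure.
        config.update({
            "model_path": None,
            "batch_size": 10,
        })
        return config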

BaseBatchProcessor

class forte.processors.base.batch_processor.BaseBatchProcessor[source]

The base class of processors that process data in batches. This processor enables easy data batching by analyzing the context and data objects. The context defines the scope of analysis for a particular task.

For example, in dependency parsing, the context is normally a sentence, in entity coreference, the context is normally a document. The processor will create data batches relative to the context.

Key fields in this processor:

  • batcher: The processing batcher used for this processor. The batcher will also keep track of the relation between the pack and the batch data.

  • use_coverage_index: If true, a coverage index will be built based on the requests.

initialize(resources, configs)[source]

The pipeline will call the initialize method at the start of processing. The processor and reader will be initialized with configs, and global resources will be registered into resources. The implementation should set up the states of the component.

Parameters
  • resources (Resources) – A global resource registry. Users can register shareable resources here, for example, the vocabulary.

  • configs (Config) – The configuration passed in to set up this component.
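
A typical override calls the parent implementation first and then sets up component state, roughly like this (the "vocab" resource key is an assumption):

    def initialize(self, resources, configs):
        super().initialize(resources, configs)
        # Fetch a shared vocabulary registered earlier in the pipeline;
        # the "vocab" key is hypothetical.
        self.vocab = resources.get("vocab")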

flush()[source]

Indicate that there will be no more packs to be passed in, and handle whatever remains in the buffer.

classmethod default_configs()[source]

Defines the default configs for the batching processor.

abstract classmethod define_batcher()[source]

Define a specific batcher for this processor. The single-pack BatchProcessor initializes the batcher to be a ProcessingBatcher, while MultiPackBatchProcessor initializes the batcher to be a MultiPackProcessingBatcher.
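
For a single-pack batch processor, one plausible implementation is the following sketch (assuming forte.data.batchers provides FixedSizeDataPackBatcher):

    from forte.data.batchers import FixedSizeDataPackBatcher

    @classmethod
    def define_batcher(cls):
        # A fixed-size batcher groups instances into batches of a
        # configurable size.
        return FixedSizeDataPackBatcher()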

BasePackProcessor

class forte.processors.base.pack_processor.BasePackProcessor[source]

The base class of processors that process one pack in a streaming way. If you are looking for batching (which might happen across packs), refer to BaseBatchProcessor.

PackProcessor

class forte.processors.base.pack_processor.PackProcessor[source]

The base class of processors that process one DataPack each time.
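
For example, a minimal whitespace tokenizer can be written by overriding _process (a sketch; in some Forte versions, newly created annotations must also be registered with input_pack.add_entry):

    from forte.data.data_pack import DataPack
    from forte.processors.base.pack_processor import PackProcessor
    from ft.onto.base_ontology import Token

    class WhitespaceTokenizer(PackProcessor):
        def _process(self, input_pack: DataPack):
            # Create a Token annotation for each whitespace-separated span.
            offset = 0
            for word in input_pack.text.split():
                begin = input_pack.text.find(word, offset)
                Token(input_pack, begin, begin + len(word))
                offset = begin + len(word)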

Task Processors

CoNLLNERPredictor

class forte.processors.nlp.ner_predictor.CoNLLNERPredictor[source]

A named entity recognizer trained according to Ma, Xuezhe, and Eduard Hovy, “End-to-end sequence labeling via bi-directional lstm-cnns-crf”.

Note that to use CoNLLNERPredictor, the ontology of the Pipeline must include ft.onto.base_ontology.Token and ft.onto.base_ontology.Sentence.
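
A typical pipeline setup might look like the following sketch (the reader choice and data path are assumptions; consult default_configs() for the predictor's actual config keys):

    from forte.data.readers import CoNLL03Reader
    from forte.pipeline import Pipeline
    from forte.processors.nlp.ner_predictor import CoNLLNERPredictor

    pl = Pipeline()
    pl.set_reader(CoNLL03Reader())
    pl.add(CoNLLNERPredictor())
    pl.initialize()

    for pack in pl.process_dataset("path/to/conll03_data"):
        pass  # each pack now carries the NER predictions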

initialize(resources, configs)[source]

The pipeline will call the initialize method at the start of processing. The processor and reader will be initialized with configs, and global resources will be registered into resources. The implementation should set up the states of the component.

Parameters
  • resources (Resources) – A global resource registry. Users can register shareable resources here, for example, the vocabulary.

  • configs (Config) – The configuration passed in to set up this component.

predict(data_batch)[source]

The function that task processors should implement; it makes predictions for the input data_batch.

Parameters

data_batch (dict) – A batch of instances in our dict format.

Returns

The prediction results in dictionary form.

pack(pack, predict_results, _=None)[source]

Write the prediction results back to the datapack by writing the predicted NER tags to the original tokens.
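
After processing, the written results can be read back from a processed pack (here called data_pack; a sketch, and whether predictions surface as Token.ner tags, EntityMention spans, or both depends on the predictor version):

    from ft.onto.base_ontology import EntityMention, Token

    for token in data_pack.get(Token):
        print(token.text, token.ner)
    for mention in data_pack.get(EntityMention):
        print(mention.text, mention.ner_type)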

get_batch_tensor(data, device=None)[source]

Get the tensors to be fed into the model.

Parameters
  • data – A list of tuples (word_ids, char_id_sequences)

  • device – The device for the tensors.

Returns

A tuple where

  • words: A tensor of shape [batch_size, batch_length] representing the word ids in the batch

  • chars: A tensor of shape [batch_size, batch_length, char_length] representing the char ids for each word in the batch

  • masks: A tensor of shape [batch_size, batch_length] representing the indices to be masked in the batch. 1 indicates no masking.

  • lengths: A tensor of shape [batch_size] representing the length of each sentence in the batch
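
A sketch of the call, where predictor is an initialized CoNLLNERPredictor and data holds the id sequences in the documented tuple format:

    import torch

    # data: a list of (word_ids, char_id_sequences) tuples, one per sentence.
    words, chars, masks, lengths = predictor.get_batch_tensor(
        data, device=torch.device("cpu")
    )
    assert words.shape[0] == len(data)  # one row per sentence in the batch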

classmethod default_configs()[source]

Default configs for the NER predictor.

SRLPredictor

class forte.processors.nlp.srl_predictor.SRLPredictor[source]

A semantic role labeler trained according to He, Luheng, et al., “Jointly predicting predicates and arguments in neural semantic role labeling”.

initialize(resources, configs)[source]

The pipeline will call the initialize method at the start of processing. The processor and reader will be initialized with configs, and global resources will be registered into resources. The implementation should set up the states of the component.

Parameters
  • resources (Resources) – A global resource registry. Users can register shareable resources here, for example, the vocabulary.

  • configs (Config) – The configuration passed in to set up this component.

predict(data_batch)[source]

The function that task processors should implement; it makes predictions for the input data_batch.

Parameters

data_batch (dict) – A batch of instances in our dict format.

Returns

The prediction results in dictionary form.

pack(pack, predict_results, _=None)[source]

The function that task processors should implement; it defines how to add the predicted output back to the data pack.

Parameters
  • pack (PackType) – The pack to add entries or fields to.

  • predict_results (Dict) – The prediction results returned by predict(). This processor will add these results to the provided pack as entries and attributes.

  • context (Optional[Annotation]) – The context entry that the prediction is performed on; the pack operation should be performed relative to this range annotation. If None, the whole data pack is used as the context.
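
A hedged sketch of such an implementation using the base ontology's predicate types (the layout of predict_results below is an assumption, not the actual SRLPredictor output format):

    from ft.onto.base_ontology import (
        PredicateArgument, PredicateLink, PredicateMention,
    )

    def pack(self, pack, predict_results, context=None):
        # Assumed result layout: (predicate span, argument span, label).
        for (p_begin, p_end), (a_begin, a_end), label in predict_results["links"]:
            pred = PredicateMention(pack, p_begin, p_end)
            arg = PredicateArgument(pack, a_begin, a_end)
            link = PredicateLink(pack, pred, arg)
            link.arg_type = label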

classmethod default_configs()[source]

This defines the default configuration structure for the predictor.

VocabularyProcessor

class forte.processors.misc.vocabulary_processor.VocabularyProcessor[source]

Build a vocabulary from the input DataPack and write the result into the shared resources.

Alphabet

class forte.processors.misc.vocabulary_processor.Alphabet(name, word_cnt=None, keep_growing=True, ignore_case_in_query=True, other_embeddings=None)[source]
Parameters
  • name – The name of the alphabet

  • keep_growing – If True, new instances not found during get_index will be added to the vocabulary.

  • ignore_case_in_query – If True, the Alphabet will try to query the lower-cased input from its vocabulary if it cannot find the input in its keys.

get_index(instance)[source]
Parameters

instance – The input token.

Returns

The index of the queried token in the dictionary.

save(output_directory, name=None)[source]

Save both alphabet records to the given directory.

Parameters
  • output_directory – Directory to save model and weights.

  • name – The name under which to save the alphabet; optional.
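
Basic usage follows directly from the constructor and methods above (a sketch):

    from forte.processors.misc.vocabulary_processor import Alphabet

    alphabet = Alphabet("word", keep_growing=True)
    idx = alphabet.get_index("forte")   # unseen token: added to the vocabulary
    same = alphabet.get_index("Forte")  # found via the lower-cased fallback
    alphabet.save("vocab_dir", name="word_alphabet")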