A pipeline component that wraps an inference model and sets up inference-related work.
initialize(): The pipeline calls this method at the start of processing. The processor is initialized with configs and registers global resources into forte.common.Resources. The implementation should set up the state of the component.
default_configs is a class method that returns the default configuration as a dictionary. The configuration of a parent class is merged with, or overwritten by, that of the child class.
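The merge behaviour described above can be pictured with a minimal, library-free sketch. The classes ParentProcessor and ChildProcessor, their configuration fields, and the helper merged_defaults are hypothetical stand-ins for illustration only; they are not Forte's implementation, which may resolve the class hierarchy differently.

```python
from typing import Any, Dict


class ParentProcessor:
    """Hypothetical parent class; not part of Forte."""

    @classmethod
    def default_configs(cls) -> Dict[str, Any]:
        return {"overwrite": False, "batch_size": 8}


class ChildProcessor(ParentProcessor):
    """Hypothetical child class that overrides one field and adds another."""

    @classmethod
    def default_configs(cls) -> Dict[str, Any]:
        return {"batch_size": 32, "lowercase": True}


def merged_defaults(cls: type) -> Dict[str, Any]:
    """Merge default configs along the class hierarchy, parents first."""
    merged: Dict[str, Any] = {}
    for klass in reversed(cls.__mro__):
        # Only consult classes that define default_configs themselves,
        # so an inherited method is not applied twice.
        if "default_configs" in vars(klass):
            merged.update(klass.default_configs())
    return merged


child_defaults = merged_defaults(ChildProcessor)
```

In this sketch the child keeps the parent's "overwrite" field, replaces "batch_size", and contributes "lowercase".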
To use an existing processor, check the default_configs method of that processor to find out which configurations can be customized. For example, suppose that after checking the processor API we decide to use BaseProcessor. We then check the source of forte.processors.base.base_processor.BaseProcessor.default_configs() and find that "overwrite" is a boolean configuration, which we can set to False in our customized configuration when we don’t want the default value. The default configuration is overwritten when we initialize the processor with our customized configuration.
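The two steps in this workflow, inspecting the defaults and then initializing with a customized configuration, can be sketched without the library. ToyProcessor is a hypothetical stand-in rather than a Forte class, the signature of its initialize method is simplified, and the default value True for "overwrite" is an assumption made only for this illustration.

```python
from typing import Any, Dict, Optional


class ToyProcessor:
    """Hypothetical stand-in for a processor; not a Forte class."""

    @classmethod
    def default_configs(cls) -> Dict[str, Any]:
        # Assumed default value, chosen for illustration only.
        return {"overwrite": True}

    def initialize(
        self,
        resources: Dict[str, Any],
        configs: Optional[Dict[str, Any]] = None,
    ) -> None:
        # Start from the defaults, then let user-supplied values win.
        self.configs = {**self.default_configs(), **(configs or {})}
        self.resources = resources


# Step 1: inspect which fields can be customized.
available = ToyProcessor.default_configs()

# Step 2: initialize with a customized configuration.
processor = ToyProcessor()
processor.initialize(resources={}, configs={"overwrite": False})
```

After initialization, processor.configs holds the customized value, while ToyProcessor.default_configs() still reports the untouched default.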
To implement a new processor, first choose the appropriate processor to inherit from. One consideration is whether you want to process a single DataPack or a DataPack batch in each processing iteration. If it’s a single DataPack, inherit from PackProcessor. If it’s a DataPack batch, inherit from BaseBatchProcessor. For example, VocabularyProcessor inherits from PackProcessor because it builds its vocabulary from a DataPack. You can then consider adding a new configuration field in default_configs() as needed, or overwriting a configuration field from the parent class. This is only one simple consideration that illustrates how to choose the right processor; many other processors with more features are available to inherit from. Refer to the Processors API for more information.
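The difference between the two processing granularities can be sketched with plain Python. Every class below (ToyPackProcessor, ToyBatchProcessor, ToyVocabularyProcessor, ToyBatchLengths) is a hypothetical stand-in, plain strings play the role of data packs, and the way batches are formed here is an assumption that does not reflect how Forte actually assembles them.

```python
from typing import Iterable, List, Set


class ToyPackProcessor:
    """Stand-in base class: handles one pack per iteration."""

    def run(self, packs: Iterable[str]) -> None:
        for pack in packs:
            self._process(pack)

    def _process(self, input_pack: str) -> None:
        raise NotImplementedError


class ToyBatchProcessor:
    """Stand-in base class: handles a batch of packs per iteration."""

    batch_size = 2

    def run(self, packs: Iterable[str]) -> None:
        batch: List[str] = []
        for pack in packs:
            batch.append(pack)
            if len(batch) == self.batch_size:
                self._process_batch(batch)
                batch = []
        if batch:  # flush the final, possibly smaller, batch
            self._process_batch(batch)

    def _process_batch(self, batch: List[str]) -> None:
        raise NotImplementedError


class ToyVocabularyProcessor(ToyPackProcessor):
    """Builds a vocabulary pack by pack, so it extends the per-pack base."""

    def __init__(self) -> None:
        self.vocab: Set[str] = set()

    def _process(self, input_pack: str) -> None:
        self.vocab.update(input_pack.split())


class ToyBatchLengths(ToyBatchProcessor):
    """Records the size of every batch it receives."""

    def __init__(self) -> None:
        self.sizes: List[int] = []

    def _process_batch(self, batch: List[str]) -> None:
        self.sizes.append(len(batch))


packs = ["a b", "b c", "c d"]
vocab_proc = ToyVocabularyProcessor()
vocab_proc.run(packs)
batch_proc = ToyBatchLengths()
batch_proc.run(packs)
```

The vocabulary builder only ever needs one pack at a time, which is why the per-pack base class is the natural parent in this sketch; the batch variant is the better fit when work is cheaper to do on several inputs at once.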
resource is for advanced developers. It is a shared object that stores data accessible by every PipelineComponent in the pipeline.
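As a rough picture of such a shared object, the sketch below passes one store to two components in turn. ToyResources, VocabBuilder, and Indexer are hypothetical names, and the update and get methods are assumptions of this sketch rather than the interface of forte.common.Resources.

```python
from typing import Any, Dict


class ToyResources:
    """Hypothetical shared store; not forte.common.Resources."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def update(self, **kwargs: Any) -> None:
        self._store.update(kwargs)

    def get(self, key: str) -> Any:
        return self._store.get(key)


class VocabBuilder:
    def initialize(self, resources: ToyResources) -> None:
        # Register a global resource for later components.
        resources.update(vocab={"hello": 0, "world": 1})


class Indexer:
    def initialize(self, resources: ToyResources) -> None:
        # Read what an earlier component registered.
        self.vocab = resources.get("vocab")


shared = ToyResources()
builder, indexer = VocabBuilder(), Indexer()
for component in (builder, indexer):
    component.initialize(shared)
```

Because both components receive the same object, the second one sees the vocabulary that the first one registered without any direct coupling between them.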
_process(): The main function of the processor. The implementation should process the input_pack and perform operations such as adding entries to the pack, or produce side effects such as writing data to disk.
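Both kinds of work, adding entries and producing a side effect, appear in the minimal sketch below. ToyPack and ToyTokenizer are hypothetical stand-ins: a real data pack and its entry types are richer than a text string with a list of (begin, end) spans, and the output file name pack.json is chosen only for this example.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyPack:
    """Hypothetical stand-in for a data pack: text plus span entries."""

    text: str
    entries: List[Tuple[int, int]] = field(default_factory=list)


class ToyTokenizer:
    """Hypothetical processor that tokenizes on whitespace."""

    def __init__(self, output_dir: str) -> None:
        self.output_dir = output_dir

    def _process(self, input_pack: ToyPack) -> None:
        # Operation 1: add one (begin, end) entry per whitespace token.
        offset = 0
        for word in input_pack.text.split():
            begin = input_pack.text.index(word, offset)
            end = begin + len(word)
            input_pack.entries.append((begin, end))
            offset = end
        # Operation 2: a side effect, writing the result to disk.
        path = os.path.join(self.output_dir, "pack.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(
                {"text": input_pack.text, "entries": input_pack.entries}, f
            )


pack = ToyPack("Forte builds pipelines")
out_dir = tempfile.mkdtemp()
ToyTokenizer(out_dir)._process(pack)
```

The pack is modified in place, so later components in a pipeline would see the new entries, while the JSON file is an effect that lives outside the pack.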
We also have plenty of ready-made processors available to use. If none of them suits your case, you can refer to the pipeline examples, API, or tutorials to customize a new processor.