Processor¶
A pipeline component that wraps inference model and set up inference related work.
Functions¶
initialize()
: Pipeline will call it at the start of processing. The processor will be initialized withconfigs
, and register global resources intoforte.common.Resources
. The implementation should set up the states of the component.default_configs
is a class method that returns default configuration in a dictionary format. Parent reader class configuration will be merged or overwritten by child class.default_configs
usage exampleTo use an existing processor, User should check configurations from method
default_configs
of the particular processor used to find out what configurations can be customized. For example, suppose after checking processor API we decide to useBaseProcessor
. Then we need to check the source offorte.processors.base.base_processor.BaseProcessor.default_configs()
and found that"overwrite"
is a boolean configuration and we can set it toFalse
in our customized configuration when we don’t want the default configuration. The default configuration will be overwritten when we initialize the processor with our customized configuration.To implement a new processor, User should check the appropriate processor to inherit from. One consideration is whether User wants to process a data pack or a data pack batch for each processing iteration. If it’s the data pack, then User should inherit from
PackProcessor
. If it’s the data pack batch, then User should inherit fromBaseBatchProcessor
For example, in the implementation ofVocabularyProcessor
, it inherits fromPackProcessor
because it builds vocabulary from data pack. Then User can consider adding a new configuration field indefault_configs()
based on the needs or overwrite the configuration field from its parent class. It’s just a simple consideration to explain the process of choosing the right processor, there are many other processors with more features that User can inherit from. User can refer to Processors API for more information.
resource
is for advanced developer. It’s an shared object that stores data accessible by allPipelineComponent
in the pipeline.
_process()
: The main function of the processor. The implementation should process theinput_pack
, and conduct operations such as adding entries into the pack, or produce some side-effect such as writing data into the disk.
We also have plenty of written processors available to use. If you don’t find one suitable in your case, you can refer to pipeline examples, API or tutorials to customize a new processor.