Processor
A pipeline component that wraps an inference model and sets up inference-related work.
Functions
initialize(): The pipeline calls this method at the start of processing. The processor is initialized with configs and registers global resources into forte.common.Resources. The implementation should set up the state of the component.
default_configs(): A class method that returns the default configuration as a dictionary. The configuration of a parent class is merged with, or overwritten by, that of the child class.
default_configs usage example:
To use an existing processor, the user should check the default_configs method of that processor to find out which configurations can be customized. For example, suppose that after checking the processor API we decide to use BaseProcessor. We then check the source of forte.processors.base.base_processor.BaseProcessor.default_configs() and find that "overwrite" is a boolean configuration, which we can set to False in our customized configuration when we don't want the default value. The default configuration is overwritten when we initialize the processor with our customized configuration.
To implement a new processor, the user should first choose the appropriate processor to inherit from. One consideration is whether each processing iteration should handle a DataPack or a DataPack batch. For a DataPack, inherit from PackProcessor. For a DataPack batch, inherit from BaseBatchProcessor. For example, VocabularyProcessor inherits from PackProcessor because it builds a vocabulary from a DataPack. The user can then add a new configuration field in default_configs() as needed, or overwrite a configuration field from the parent class. This is only a simple illustration of how to choose the right processor; many other processors with more features are available to inherit from. Refer to the Processors API for more information.
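
The merge behaviour described above can be sketched in plain Python. This is only an illustration of the pattern, not Forte's implementation: ParentProcessor, ChildProcessor, resolve_configs, and every field other than "overwrite" are hypothetical names, and all default values are made up. The real defaults should be read from the default_configs() of the processor in use.

```python
from typing import Any, Dict


class ParentProcessor:
    """Hypothetical stand-in for a parent processor; not a Forte class."""

    @classmethod
    def default_configs(cls) -> Dict[str, Any]:
        # Default values here are assumptions made for illustration.
        return {"overwrite": True, "batch_size": 8}


class ChildProcessor(ParentProcessor):
    """Hypothetical child that extends and overrides the parent defaults."""

    @classmethod
    def default_configs(cls) -> Dict[str, Any]:
        configs = super().default_configs()
        # Overwrite one inherited field and add a new one.
        configs.update({"batch_size": 16, "min_frequency": 2})
        return configs


def resolve_configs(processor_cls, custom: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay user-supplied values on the class defaults (shallow merge)."""
    configs = processor_cls.default_configs()
    unknown = set(custom) - set(configs)
    if unknown:
        # Reject fields the processor does not declare.
        raise KeyError(f"Unknown configuration fields: {sorted(unknown)}")
    configs.update(custom)
    return configs


# The customized value replaces the default; other fields keep their defaults.
resolved = resolve_configs(ChildProcessor, {"overwrite": False})
print(resolved)
```

The sketch rejects unknown fields so that a typo in a customized configuration fails early; whether a given processor does the same is an assumption to verify against its documentation.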
resource is intended for advanced developers. It is a shared object that stores data accessible by all PipelineComponent instances in the pipeline.
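
To make the idea of a shared object concrete, here is a minimal sketch in which one component registers data during initialization and a later component reads it. SharedResources, VocabBuilder, and VocabUser are hypothetical stand-ins written for this example; they are not Forte classes, and the method names on the container are assumptions rather than the library's API.

```python
from typing import Any, Dict


class SharedResources:
    """Hypothetical stand-in for a container shared by all components."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def update(self, **kwargs: Any) -> None:
        # Register or replace named resources.
        self._store.update(kwargs)

    def get(self, key: str) -> Any:
        # Return the resource, or None if it was never registered.
        return self._store.get(key)


class VocabBuilder:
    """Hypothetical component that publishes a vocabulary."""

    def initialize(self, resources: SharedResources, configs: Dict[str, Any]) -> None:
        resources.update(vocab={"<unk>": 0, "hello": 1})


class VocabUser:
    """Hypothetical component that consumes the published vocabulary."""

    def initialize(self, resources: SharedResources, configs: Dict[str, Any]) -> None:
        self.vocab = resources.get("vocab")

    def lookup(self, token: str) -> int:
        # Fall back to the unknown-token id for unseen tokens.
        return self.vocab.get(token, self.vocab["<unk>"])


# Every component receives the same container, in pipeline order.
resources = SharedResources()
components = [VocabBuilder(), VocabUser()]
for component in components:
    component.initialize(resources, configs={})
```

Because both components hold a reference to the same container, the order of initialization matters: a consumer placed before its producer would find nothing registered.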
_process(): The main function of the processor. The implementation should process the input_pack and perform operations such as adding entries to the pack, or produce side effects such as writing data to disk.
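
The division of work between initialize() and _process() might look like the following sketch. It deliberately avoids Forte's own classes: SimplePack and WhitespaceTokenizer are hypothetical stand-ins, and the "min_length" field is invented for the example. A real processor would inherit from a Forte base class such as PackProcessor and add entries through the pack's own API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class SimplePack:
    """Hypothetical stand-in for a data pack: raw text plus token spans."""

    text: str
    tokens: List[Tuple[int, int]] = field(default_factory=list)


class WhitespaceTokenizer:
    """Hypothetical pack-level processor; not a Forte class."""

    def initialize(self, resources: Dict[str, Any], configs: Dict[str, Any]) -> None:
        # Set up component state once, before any pack is processed.
        self.min_length = configs.get("min_length", 1)

    def _process(self, input_pack: SimplePack) -> None:
        # Add one (begin, end) span per whitespace-delimited token.
        start = None
        # The trailing space closes a token that ends at the end of the text.
        for i, ch in enumerate(input_pack.text + " "):
            if ch.isspace():
                if start is not None and i - start >= self.min_length:
                    input_pack.tokens.append((start, i))
                start = None
            elif start is None:
                start = i


processor = WhitespaceTokenizer()
processor.initialize(resources={}, configs={"min_length": 2})
pack = SimplePack("Forte is a toolkit")
processor._process(pack)
print([pack.text[begin:end] for begin, end in pack.tokens])
```

Here _process() mutates the pack in place and returns nothing, which mirrors the description above of a processor whose result is the entries it adds to the input pack.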
We also provide many ready-made processors. If none of them suits your case, refer to the pipeline examples, the API documentation, or the tutorials to write a custom processor.