Pipeline

Process Pipeline

class forte.pipeline.Pipeline(resource=None)[source]

This controls the main inference flow of the system. A pipeline consists of a set of Components (readers and processors). Data flows through the pipeline as data packs, and each component uses or adds information to the data packs.
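
A minimal usage sketch (StringReader, set_reader, and add are standard Forte APIs; the commented-out processor is a hypothetical placeholder):

    from forte.data.readers import StringReader
    from forte.pipeline import Pipeline

    # Build the pipeline: set a reader, add processors, then initialize.
    pipeline = Pipeline()
    pipeline.set_reader(StringReader())
    # pipeline.add(MyProcessor())  # hypothetical processor component

    pipeline.initialize()
    pack = pipeline.process("Forte pipelines pass data packs between components.")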

init_from_config_path(config_path)[source]

Read the configurations from the given path config_path and build the pipeline with the config.

Parameters

config_path – A string of the configuration path, which points to a YAML file that specifies the structure and parameters of the pipeline.
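
For example (the file name is hypothetical):

    from forte.pipeline import Pipeline

    pipeline = Pipeline()
    pipeline.init_from_config_path("sample_pipeline.yml")  # hypothetical YAML file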

init_from_config(configs)[source]

Initialize the pipeline (ontology and processors) from the given configurations.

Parameters

configs – The configs used to initialize the pipeline.
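
A sketch passing an explicit config object, assuming it follows the same schema as the YAML file described above:

    import yaml

    from forte.pipeline import Pipeline

    # Load the configurations yourself, then hand them to the pipeline.
    with open("sample_pipeline.yml") as config_file:  # hypothetical YAML file
        configs = yaml.safe_load(config_file)

    pipeline = Pipeline()
    pipeline.init_from_config(configs)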

add_gold_packs(pack)[source]

Add gold packs to an internal dictionary used by the evaluator when calling consume_next(…)

Parameters

pack (Dict) – A dictionary mapping job.id to the corresponding gold_pack.

process(*args, **kwargs)[source]

Alias for process_one().

Parameters
  • args – The positional arguments used to get the initial data.

  • kwargs – The keyword arguments used to get the initial data.

run(*args, **kwargs)[source]

Run the whole pipeline and ignore all returned DataPacks. This is mostly used when you need to run the pipeline but do not need the output, relying instead on its side effects; for example, when the pipeline writes some data to disk (see the sketch below).

Calling this function will automatically call initialize() at the beginning and finish() at the end.

Parameters
  • args – The positional arguments used to get the initial data.

  • kwargs – The keyword arguments used to get the initial data.
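
Continuing the sketch above, with a pipeline whose reader consumes a directory path (the path is hypothetical):

    # initialize() and finish() are invoked automatically by run().
    pipeline.run("path/to/input_dir")  # hypothetical data directory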

process_one(*args, **kwargs)[source]

Process a single data pack. This is done by reading and processing only the first pack from the reader.

Parameters

kwargs – The information needed to load the data. For example, if _reader is StringReader, this should contain a single piece of text in the form of a string variable. If _reader is a file reader, this can point to the file path.
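
Continuing the StringReader sketch above:

    # With a StringReader, pass the raw text directly.
    pack = pipeline.process_one("A single piece of text.")

    # With a file-based reader, this would instead be a path, e.g.
    # pack = pipeline.process_one("path/to/a_file.txt")  # hypothetical path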

process_dataset(*args, **kwargs)[source]

Process the documents in the data source(s) and return an iterator or list of DataPacks. The arguments are directly passed to the reader to take data from the source.
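
For example (the data path is hypothetical):

    # Iterate lazily over all packs produced from the data source.
    for pack in pipeline.process_dataset("path/to/data_dir"):  # hypothetical path
        print(pack.text[:80])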

finish()[source]

Call the finish method of all pipeline components. This needs to be called explicitly to release all resources.

Train Pipeline

class forte.train_pipeline.TrainPipeline(train_reader, trainer, dev_reader, configs, preprocessors=None, evaluator=None, predictor=None)[source]

Pipeline Component

class forte.pipeline_component.PipelineComponent[source]
initialize(resources, configs)[source]

The pipeline will call the initialize method at the start of processing. The processor and reader will be initialized with configs, and can register global resources into resources. The implementation should set up the states of the component (see the sketch after the parameter list).

Parameters
  • resources (Resources) – A global resource registry. Users can register shareable resources here, for example, the vocabulary.

  • configs (Config) – The configuration passed in to set up this component.
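
A sketch of overriding initialize in a component subclass (MyComponent and the model_path key are hypothetical; Resources.update is assumed to register shareable objects):

    from forte.common.configuration import Config
    from forte.common.resources import Resources
    from forte.processors.base import PackProcessor

    class MyComponent(PackProcessor):  # hypothetical component
        def initialize(self, resources: Resources, configs: Config):
            super().initialize(resources, configs)
            # Set up component state from the merged configs.
            self.model_path = configs.model_path  # hypothetical config key
            # Register a shareable object for downstream components (assumed API).
            resources.update(vocabulary={"<unk>": 0})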

add_entry(pack, entry)[source]

The component can manually call this function to add an entry into the data pack immediately. Otherwise, the system will add the entries automatically when this component finishes (see the sketch below).

Parameters
  • pack (BasePack) – The pack to add the entry into.

  • entry (Entry) – The entry to be added.

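An illustrative processor that creates an entry and adds it immediately (the processor itself is hypothetical; PackProcessor and Sentence come from Forte's standard modules and base ontology):

    from forte.data.data_pack import DataPack
    from forte.processors.base import PackProcessor
    from ft.onto.base_ontology import Sentence

    class WholeTextMarker(PackProcessor):  # hypothetical processor
        def _process(self, input_pack: DataPack):
            # Create one Sentence spanning the whole text and add it now,
            # instead of waiting for the automatic collection at the end.
            sentence = Sentence(input_pack, 0, len(input_pack.text))
            self.add_entry(input_pack, sentence)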

flush()[source]

Indicate that there will be no more packs to be passed in; handle whatever remains in the buffer.

finish(resource)[source]

The pipeline will call this function at the end of processing to notify all the components. The user can implement this function to release resources used by this component. The component can also add objects to the resources.

Parameters

resource (Resources) – A global resource registry.

classmethod make_configs(configs)[source]

Create the component configuration for this class by merging the provided configs with the values from default_configs().

The following config conventions are expected:
  • The top-level key can be a special config_path.

  • config_path should point to a file system path, which will be a YAML file containing configurations.

  • Other key values in the configs will be considered as parameters (see the example below).

Parameters

configs – The input config to be merged with the default config.

Returns

The merged configuration.
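
For example, an input config following these conventions might look like this (all keys and values are illustrative):

    configs = {
        "config_path": "my_component.yml",  # hypothetical YAML file with more settings
        "batch_size": 32,                   # treated as a regular parameter
    }
    merged = MyComponent.make_configs(configs)  # MyComponent as sketched above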

classmethod default_configs()[source]

Returns a dict of configurations of the component with default values, used to fill in missing values of the input configs during pipeline construction.
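
A typical override declaring the component's defaults (keys and values are illustrative):

    from forte.processors.base import PackProcessor

    class MyComponent(PackProcessor):  # hypothetical component
        @classmethod
        def default_configs(cls):
            # These defaults fill in values missing from user-provided configs.
            configs = super().default_configs()
            configs.update({
                "model_path": None,
                "batch_size": 32,
            })
            return configs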