Core Design Principles
The core design principle of Forte is the abstraction of NLP concepts and machine learning models, which provides better separation between data, models and tasks while still enabling interactions between the different components of the pipeline. Based on this, we make Forte:
Composable: Forte helps users decompose a problem into data, models and tasks. The tasks can be further divided into sub-tasks. A complex use case can be solved by composing heterogeneous modules via straightforward Python APIs or declarative configuration files. The components (e.g., models or tasks) in the pipeline can be flexibly swapped in and out, as long as their API contracts match. This approach greatly improves module reusability, enables fast development and makes the library flexible enough to meet user needs.
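As a rough illustration of swapping components under a shared API contract, the following plain-Python sketch does not use Forte at all. The `Processor` protocol, the `SimplePipeline` class and the tokenizer classes are hypothetical stand-ins: every component agrees to read and update a shared dictionary, so one tokenizer can replace another without any change to the downstream `TokenCounter`.

```python
from typing import Any, Dict, List, Protocol


# Hypothetical API contract (not Forte's API): every component reads from
# and writes to a shared dictionary in place.
class Processor(Protocol):
    def process(self, pack: Dict[str, Any]) -> None:
        ...


class WhitespaceTokenizer:
    """Splits the text on whitespace only."""

    def process(self, pack: Dict[str, Any]) -> None:
        pack["tokens"] = pack["text"].split()


class PunctuationAwareTokenizer:
    """Splits on whitespace and separates trailing punctuation."""

    def process(self, pack: Dict[str, Any]) -> None:
        tokens: List[str] = []
        for word in pack["text"].split():
            if len(word) > 1 and word[-1] in ".,!?":
                tokens.extend([word[:-1], word[-1]])
            else:
                tokens.append(word)
        pack["tokens"] = tokens


class TokenCounter:
    """Downstream component that depends only on the 'tokens' field."""

    def process(self, pack: Dict[str, Any]) -> None:
        pack["num_tokens"] = len(pack["tokens"])


class SimplePipeline:
    """Runs components in order over one shared pack (hypothetical helper)."""

    def __init__(self) -> None:
        self.components: List[Processor] = []

    def add(self, component: Processor) -> "SimplePipeline":
        self.components.append(component)
        return self  # return self to allow chaining

    def run(self, text: str) -> Dict[str, Any]:
        pack: Dict[str, Any] = {"text": text}
        for component in self.components:
            component.process(pack)
        return pack


# Swap the tokenizer; the counter works unchanged because the contract holds.
text = "Forte composes modules."
result_ws = SimplePipeline().add(WhitespaceTokenizer()).add(TokenCounter()).run(text)
result_punct = (
    SimplePipeline().add(PunctuationAwareTokenizer()).add(TokenCounter()).run(text)
)
print(result_ws["tokens"], result_ws["num_tokens"])
print(result_punct["tokens"], result_punct["num_tokens"])
```

In Forte the role of the shared dictionary is played by its universal data format (the DataPack); the sketch only shows why matching contracts make components interchangeable.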
Generalizable and Extensible: Forte is designed not only to generalize across a wide range of NLP tasks, but also to be extensible to new tasks and new domains. In particular, Forte provides an Ontology system that helps users define types according to their tasks. Users can simply specify the types declaratively through JSON files, and our Code Generation tool will automatically generate Python files ready to be used in your project. Check out our Ontology Generation documentation for more details.
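The sketch below illustrates the general idea of declarative type definitions combined with code generation. It is not Forte's code generator: the JSON layout (`entry_name`, `parent_entry`, `attributes`) is only loosely modeled on an ontology spec and should be treated as an assumption, and `generate_class` / `generate_module` are hypothetical helpers that emit a simple Python dataclass per definition. See the Ontology Generation documentation for the real schema and tool.

```python
import json
from typing import Any, Dict

# Hypothetical ontology spec; field names are illustrative, not Forte's exact schema.
ONTOLOGY_JSON = """
{
  "name": "example_ontology",
  "definitions": [
    {
      "entry_name": "ExampleWord",
      "parent_entry": "Annotation",
      "attributes": [
        {"name": "pos", "type": "str"},
        {"name": "is_stopword", "type": "bool"}
      ]
    }
  ]
}
"""

# Default value emitted for each supported attribute type.
DEFAULTS = {"str": '""', "int": "0", "float": "0.0", "bool": "False"}


def generate_class(definition: Dict[str, Any]) -> str:
    """Emit Python source for one ontology definition as a dataclass."""
    lines = [
        "@dataclass",
        f"class {definition['entry_name']}({definition['parent_entry']}):",
    ]
    attributes = definition.get("attributes", [])
    if not attributes:
        lines.append("    pass")
    for attr in attributes:
        lines.append(f"    {attr['name']}: {attr['type']} = {DEFAULTS[attr['type']]}")
    return "\n".join(lines)


def generate_module(spec_text: str) -> str:
    """Emit a whole module: a minimal base span type plus every definition."""
    spec = json.loads(spec_text)
    header = [
        "from dataclasses import dataclass",
        "",
        "",
        "@dataclass",
        "class Annotation:",
        "    begin: int = 0",
        "    end: int = 0",
    ]
    body = [generate_class(d) for d in spec["definitions"]]
    return "\n".join(header) + "\n\n\n" + "\n\n\n".join(body) + "\n"


source = generate_module(ONTOLOGY_JSON)
print(source)

# A real tool would write `source` to a .py file; here we exec it to show it is valid.
namespace: Dict[str, Any] = {"__name__": "generated_ontology"}
exec(source, namespace)
word = namespace["ExampleWord"](begin=0, end=5, pos="NOUN")
print(word)
```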
Transparent Data Flow: Central to Forte's composable architecture is a universal data format that supports seamless data flow between different steps. Forte advocates a transparent data flow to facilitate flexible process intervention and simple pipeline control. Combined with the general data format, this makes Forte well suited for data inspection, component swapping and result sharing. This is particularly helpful in team collaborations!
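To make "transparent data flow" concrete, here is a minimal plain-Python sketch, not Forte code. Every step receives the same hypothetical `Pack` object, and a hook (`on_step`) runs after each step. Because the hook sees the full, mutable state, it could also intervene; here it simply serializes a JSON snapshot so intermediate results can be inspected or shared. The names `Pack`, `run` and `record` are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List


@dataclass
class Pack:
    """Hypothetical universal container passed between all steps."""
    text: str
    annotations: Dict[str, List[str]] = field(default_factory=dict)


Step = Callable[[Pack], None]


def lowercase_tokens(pack: Pack) -> None:
    pack.annotations["tokens"] = pack.text.lower().split()


def find_long_words(pack: Pack) -> None:
    pack.annotations["long_words"] = [
        t for t in pack.annotations["tokens"] if len(t) > 5
    ]


def run(pack: Pack, steps: List[Step], on_step: Callable[[str, Pack], None]) -> Pack:
    for step in steps:
        step(pack)
        on_step(step.__name__, pack)  # expose the full state after every step
    return pack


snapshots: Dict[str, str] = {}


def record(step_name: str, pack: Pack) -> None:
    # asdict() deep-copies, so each snapshot freezes the state at that moment.
    snapshots[step_name] = json.dumps(asdict(pack), sort_keys=True)


result = run(
    Pack("Transparent data flow helps debugging"),
    [lowercase_tokens, find_long_words],
    record,
)
print(snapshots["lowercase_tokens"])
print(result.annotations["long_words"])
```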
- DataPack: a data class that stores structured data and supports efficient data retrieval.
- Pipeline: an inference system that contains a set of processing components.
- Ontology: a system that defines the relations between NLP annotations, for example, the relation between words and documents, or between two words.
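To illustrate how a DataPack-style container and ontology types fit together, here is a minimal, self-contained sketch. It is not Forte's `DataPack` implementation: the `MiniDataPack`, `Annotation` and `Link` classes and the `get(type_name, within=...)` method are hypothetical. Annotations are stored per type in lists sorted by offset, so retrieving all words inside a sentence is a binary search plus a short scan, and a `Link` records a typed relation between two words.

```python
import bisect
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True, order=True)
class Annotation:
    """A typed span over the pack's text (hypothetical, for illustration)."""
    type_name: str
    begin: int
    end: int


@dataclass(frozen=True)
class Link:
    """A typed relation between two annotations, e.g. a dependency arc."""
    rel: str
    parent: Annotation
    child: Annotation


class MiniDataPack:
    def __init__(self, text: str) -> None:
        self.text = text
        # One list per type; within a list, ordering is by (begin, end).
        self._index: Dict[str, List[Annotation]] = {}
        self.links: List[Link] = []

    def add(self, type_name: str, begin: int, end: int) -> Annotation:
        ann = Annotation(type_name, begin, end)
        bisect.insort(self._index.setdefault(type_name, []), ann)
        return ann

    def add_link(self, rel: str, parent: Annotation, child: Annotation) -> Link:
        link = Link(rel, parent, child)
        self.links.append(link)
        return link

    def text_of(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]

    def get(self, type_name: str, within: Optional[Annotation] = None) -> List[Annotation]:
        anns = self._index.get(type_name, [])
        if within is None:
            return list(anns)
        # Binary search for the first annotation starting at or after within.begin.
        lo = bisect.bisect_left(anns, Annotation(type_name, within.begin, within.begin))
        found = []
        for ann in anns[lo:]:
            if ann.begin >= within.end:
                break  # sorted by begin, so nothing later can be inside
            if ann.end <= within.end:
                found.append(ann)
        return found


text = "Forte is composable. It is extensible."
pack = MiniDataPack(text)
s1 = pack.add("Sentence", 0, 20)
s2 = pack.add("Sentence", 21, 38)
for m in re.finditer(r"\w+", text):
    pack.add("Token", m.start(), m.end())

forte_tok, _, composable_tok = pack.get("Token", within=s1)
pack.add_link("nsubj", parent=composable_tok, child=forte_tok)

print([pack.text_of(t) for t in pack.get("Token", within=s2)])
```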
Rich examples are included to demonstrate the use of Forte, including implementations of cutting-edge models/algorithms and complete system construction. More examples are continuously being added.
Serialization: showcases how to serialize and deserialize data.
Chat Bot: This example showcases the use of Forte to build a retrieval-based chatbot and perform text analysis on the retrieved results.
Audio Reading: a simple speech processing example that showcases Forte's capability to support a wide range of audio processing tasks.
Classification: a text classification example that supports various formats of table-like datasets.
Clinical Pipeline: a project that handles clinical datasets and shows how to make Forte and Stave work side by side.
Content Rewriter: an example that, given a table and a sentence, rewrites the sentence based on the table.
Data Augmentation: this example demonstrates the usage of forte/models/da_rl/MetaAugmentationWrapper, which wraps a BERT masked language model used as the data augmentation model to perform RL-based adaptive learning with a BERT-based text classifier as the downstream model.
Tagging: an implementation of the CNN-BiLSTM-CRF model, built on top of Texar and PyTorch.
Twitter sentiment analysis: this example shows how to use Forte to perform sentiment analysis on a user's retrieved tweets.