Vocabulary
class forte.data.vocabulary.Vocabulary(method, need_pad, use_unk)
This class stores the “Elements” that are added, assigns “Ids” to them, and returns “Representations” when queried. These three are the main concepts of the class.
Element: any hashable instance that the user wants to store.
Id: each element has a unique Id, which is an integer.
Representation: depending on the configuration, the representation of an element is either an integer (its Id) or a one-hot vector (a list of integers).
There are two special elements.
One is the <PAD> element, which is mapped to Id 0 (indexing) or -1 (one-hot) and has a different representation depending on the setting.
The other is the <UNK> element, which, if added to the vocabulary, serves as the default element when a queried element is not found.
The table below shows how the Vocabulary class behaves under different settings. element0 is the first element added to the vocabulary; elements added later are element1, element2, and so on. They follow the same behavior as element0 and are omitted from the table for readability.
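Vocabulary behavior under different settings

+---------------+----------------------+-------------+-------------+-------------------+-------------------+
| vocab_method  | raw (handle outside) | indexing    | indexing    | one-hot           | one-hot           |
+===============+======================+=============+=============+===================+===================+
| need_pad      | assume False         | True        | False       | True              | False             |
+---------------+----------------------+-------------+-------------+-------------------+-------------------+
| get_pad_value | None                 | 0           | None        | [0,0,0]           | None              |
+---------------+----------------------+-------------+-------------+-------------------+-------------------+
| inner_mapping | None                 | 0:<PAD>     | 0:element0  | -1:<PAD>          | 0:element0        |
|               |                      | 1:element0  |             | 0:element0        |                   |
+---------------+----------------------+-------------+-------------+-------------------+-------------------+
| element2repr  | raise Error          | <PAD>->0    | element0->0 | <PAD>->[0,0,0]    | element0->[1,0,0] |
|               |                      | element0->1 |             | element0->[1,0,0] |                   |
+---------------+----------------------+-------------+-------------+-------------------+-------------------+
| id2element    | raise Error          | 0-><PAD>    | 0->element0 | -1-><PAD>         | 0->element0       |
|               |                      | 1->element0 |             | 0->element0       |                   |
|               |                      |             |             | (be careful)      |                   |
+---------------+----------------------+-------------+-------------+-------------------+-------------------+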
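The indexing and one-hot columns of the table can be reproduced with a small stand-in class. This is only a sketch of the documented behavior, not the library's implementation: `ToyVocabulary`, its private attributes, and the choice to ignore duplicate additions are all assumptions made for illustration, and the `raw` method and `<UNK>` handling are left out.

```python
from typing import Dict, Hashable, List, Optional, Union

PAD = "<PAD>"


class ToyVocabulary:
    """Hypothetical stand-in that mirrors the behavior table above."""

    def __init__(self, method: str, need_pad: bool) -> None:
        if method not in ("indexing", "one-hot"):
            raise ValueError(f"unsupported method: {method}")
        self.method = method
        self.need_pad = need_pad
        self._element2id: Dict[Hashable, int] = {}
        self._id2element: Dict[int, Hashable] = {}
        self._next_id = 0  # Id given to the next regular element
        if need_pad:
            if method == "indexing":
                # <PAD> takes Id 0, so regular elements start at 1.
                self._register(PAD, 0)
                self._next_id = 1
            else:
                # <PAD> takes Id -1, so regular elements still start at 0.
                self._register(PAD, -1)

    def _register(self, element: Hashable, idx: int) -> None:
        self._element2id[element] = idx
        self._id2element[idx] = element

    def add_element(self, element: Hashable) -> None:
        # Assumption: adding an element twice keeps its first Id.
        if element not in self._element2id:
            self._register(element, self._next_id)
            self._next_id += 1

    def element2repr(self, element: Hashable) -> Union[int, List[int]]:
        idx = self._element2id[element]  # KeyError if the element is missing
        if self.method == "indexing":
            return idx
        # One slot per regular element; <PAD> (Id -1) stays all zeros.
        vec = [0] * self._next_id
        if idx >= 0:
            vec[idx] = 1
        return vec

    def get_pad_value(self) -> Optional[Union[int, List[int]]]:
        return self.element2repr(PAD) if self.need_pad else None

    def id2element(self, idx: int) -> Hashable:
        return self._id2element[idx]


indexing = ToyVocabulary("indexing", need_pad=True)
one_hot = ToyVocabulary("one-hot", need_pad=True)
for vocab in (indexing, one_hot):
    for element in ("element0", "element1", "element2"):
        vocab.add_element(element)

print(indexing.get_pad_value(), indexing.element2repr("element0"))  # 0 1
print(one_hot.get_pad_value(), one_hot.element2repr("element0"))    # [0, 0, 0] [1, 0, 0]
print(one_hot.id2element(-1))                                       # <PAD>
```

One plausible reading of the "be careful" note in the one-hot column is that -1 is also a valid Python index for the last item of a list, so an Id of -1 should not be used to index into a sequence directly.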
- Parameters
add_element(element)
Adds an element to the vocabulary.
- Parameters
element (Hashable) – The element to be added.
element2repr(element)
Maps an element to its representation.
- Parameters
element (Hashable) – The queried element.
- Returns
The corresponding representation of the element. See the behavior table in the class description for how the result differs across settings.
- Return type
int or List[int]
- Raises
KeyError – If the element is not found and the vocabulary does not use the <UNK> element.
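The lookup rule described for element2repr (return the stored value, fall back to <UNK> when it is in use, otherwise raise KeyError) can be sketched with a plain dictionary. The helper `lookup_id`, the sample mappings, and the Id assigned to <UNK> are hypothetical and only illustrate the control flow.

```python
from typing import Dict, Hashable

UNK = "<UNK>"


def lookup_id(element2id: Dict[Hashable, int], element: Hashable, use_unk: bool) -> int:
    """Return the Id of `element`, falling back to <UNK> when allowed."""
    if element in element2id:
        return element2id[element]
    if use_unk:
        # Unknown elements share the Id of the <UNK> element.
        return element2id[UNK]
    raise KeyError(element)


# Assumption: <UNK> happens to hold Id 0 in this sample mapping.
with_unk = {UNK: 0, "cat": 1}
without_unk = {"cat": 0}

print(lookup_id(with_unk, "dog", use_unk=True))  # 0
try:
    lookup_id(without_unk, "dog", use_unk=False)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'dog'
```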
has_element(element)
Checks whether an element has been added to the vocabulary.
- Parameters
element (Hashable) – The queried element.
- Returns
Whether element is found.
- Return type
bool
items()
Iterates over the (element, id) pairs stored in this class.
- Returns
An iterable of (element, id) pairs.
- Return type
Iterable[Tuple]
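An iterable of (element, id) pairs is convenient for building derived lookups, such as the reverse Id-to-element map. The sketch below uses a plain dictionary in place of the vocabulary; the free function `items` and the sample mapping are hypothetical and only mimic the documented return shape.

```python
from typing import Dict, Hashable, Iterable, Tuple


def items(element2id: Dict[Hashable, int]) -> Iterable[Tuple[Hashable, int]]:
    """Yield (element, id) pairs, mimicking the documented return shape."""
    yield from element2id.items()


mapping = {"<PAD>": 0, "element0": 1, "element1": 2}

# Invert the pairs to get an Id -> element lookup.
id2element = {idx: element for element, idx in items(mapping)}
print(id2element[1])  # element0
```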