nefnir package
Submodules
nefnir.nefnir module
class nefnir.nefnir.Nefnir
Bases: object
A rule-based lemmatizer.
lemmatize(form, tag)
Lemmatize a word form given its part-of-speech tag.
Parameters:
- form – A word form.
- tag – The word form’s part-of-speech tag.
Returns: The word form’s lemma.
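As an illustration of what "rule-based" can mean here, the sketch below assumes a lemmatizer driven by per-tag suffix substitution rules. The rule table, the tags and the name `toy_lemmatize` are hypothetical and are not taken from Nefnir's implementation or data.

```python
from typing import Dict, List, Tuple

# Toy rule table: tag -> list of (form suffix, lemma suffix) pairs.
# Both the tags and the rules are illustrative assumptions, not Nefnir's data.
TOY_RULES: Dict[str, List[Tuple[str, str]]] = {
    "nkeog": [("inn", "ur")],
    "nkfn": [("ar", "ur")],
}


def toy_lemmatize(form: str, tag: str) -> str:
    """Apply the longest matching suffix rule for the tag, if any."""
    lowered = form.lower()
    # Try longer suffixes first so the most specific rule wins.
    candidates = sorted(
        TOY_RULES.get(tag, []), key=lambda rule: len(rule[0]), reverse=True
    )
    for form_suffix, lemma_suffix in candidates:
        if lowered.endswith(form_suffix):
            stem = lowered[: len(lowered) - len(form_suffix)]
            return stem + lemma_suffix
    # No rule matched: fall back to the lowercased form.
    return lowered
```

With these toy rules, `toy_lemmatize("Hestinn", "nkeog")` strips `inn` and appends `ur`. A form whose tag has no rule is returned lowercased and otherwise unchanged.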
recase(form, tag, lemma)
Determine how to properly case a lemma given the word form and part-of-speech tag it was derived from.
Nefnir transforms words into lowercase prior to lemmatization. Some words, such as proper nouns, abbreviations and foreign words, therefore need to be re-capitalized or changed back into uppercase.
Parameters:
- form – A word form, cased as it was written.
- tag – The word form’s part-of-speech tag.
- lemma – The word form’s lemma, in lowercase.
Returns: A properly cased lemma.
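The recasing step described above could look roughly like the following sketch. It assumes two simple heuristics: fully uppercase forms are treated as abbreviations, and a small, hypothetical set of tags marks proper nouns. The names `PROPER_NOUN_TAGS` and `toy_recase` are invented for illustration, and the rules Nefnir actually applies may differ.

```python
from typing import FrozenSet

# Hypothetical tags marking proper nouns; a real tagset will differ.
PROPER_NOUN_TAGS: FrozenSet[str] = frozenset({"nven-s", "nveo-s"})


def toy_recase(form: str, tag: str, lemma: str) -> str:
    """Restore casing on a lowercase lemma, using the original form as a guide."""
    # Fully uppercase multi-letter forms are treated as abbreviations.
    if len(form) > 1 and form.isupper():
        return lemma.upper()
    # Proper nouns written with an initial capital get that capital back.
    if tag in PROPER_NOUN_TAGS and form[:1].isupper():
        return lemma[:1].upper() + lemma[1:]
    # Everything else stays lowercase, including sentence-initial common nouns.
    return lemma
```

Checking the tag as well as the form keeps a sentence-initial common noun such as "Hestinn" from being capitalized in its lemma.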
nefnir.wrapper module
nefnir.wrapper.lemmatize(form: str, tag: str) → str
Lemmatize a word form given its part-of-speech tag.
Parameters:
- form – A word form.
- tag – The word form’s part-of-speech tag.
Returns: The word form’s lemma.
nefnir.wrapper.lemmatize_line(line: str, separator: str = '\t') → Tuple[Optional[str], Optional[str], Optional[str]]
Lemmatize a line containing a word form and its part-of-speech tag.
Parameters:
- line – A line with form and tag separated by separator.
- separator – The token separator.
Returns: A tuple of (form, tag, lemma); any element can be None if the data is invalid.
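To make the line format and the None-on-invalid behaviour concrete, here is a minimal sketch of how such a function might split a line. It assumes that a valid line has exactly two non-empty fields and that an invalid line yields None in all three positions. The documentation above does not specify either detail, and `toy_lemmatize_line` and its `lemmatize` argument are hypothetical stand-ins rather than the package's code.

```python
from typing import Callable, Optional, Tuple

Triple = Tuple[Optional[str], Optional[str], Optional[str]]


def toy_lemmatize_line(
    line: str,
    lemmatize: Callable[[str, str], str],
    separator: str = "\t",
) -> Triple:
    """Split a line into form and tag, then lemmatize the pair."""
    fields = line.rstrip("\r\n").split(separator)
    # Assumption: a valid line has exactly two non-empty fields.
    if len(fields) != 2 or not all(fields):
        return None, None, None
    form, tag = fields
    return form, tag, lemmatize(form, tag)
```

Passing the lemmatizer in as a callable keeps the sketch independent of any particular rule set.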
nefnir.wrapper.recase(form: str, tag: str, lemma: str) → str
Determine how to properly case a lemma given the word form and part-of-speech tag it was derived from.
Nefnir transforms words into lowercase prior to lemmatization. Some words, such as proper nouns, abbreviations and foreign words, therefore need to be re-capitalized or changed back into uppercase.
Parameters:
- form – A word form, cased as it was written.
- tag – The word form’s part-of-speech tag.
- lemma – The word form’s lemma, in lowercase.
Returns: A properly cased lemma.
Module contents
Top-level package for nefnir (nefnir-package).
nefnir.lemmatize(form: str, tag: str) → str
Lemmatize a word form given its part-of-speech tag.
Parameters:
- form – A word form.
- tag – The word form’s part-of-speech tag.
Returns: The word form’s lemma.
nefnir.lemmatize_line(line: str, separator: str = '\t') → Tuple[Optional[str], Optional[str], Optional[str]]
Lemmatize a line containing a word form and its part-of-speech tag.
Parameters:
- line – A line with form and tag separated by separator.
- separator – The token separator.
Returns: A tuple of (form, tag, lemma); any element can be None if the data is invalid.
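Because any element of the returned tuple can be None, callers processing many lines will usually want to filter out invalid results. The helper below is a hypothetical sketch of that pattern and is not part of the package. It accepts any callable that takes a line and returns a (form, tag, lemma) tuple, so it is not tied to a specific implementation.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

Triple = Tuple[Optional[str], Optional[str], Optional[str]]


def lemmatize_lines(
    lines: Iterable[str],
    lemmatize_line: Callable[[str], Triple],
) -> Iterator[Tuple[str, str, str]]:
    """Yield (form, tag, lemma) for each line, skipping invalid lines."""
    for line in lines:
        form, tag, lemma = lemmatize_line(line)
        # Any None element signals invalid data for that line.
        if form is None or tag is None or lemma is None:
            continue
        yield form, tag, lemma
```

Given the signature documented above, `nefnir.lemmatize_line` should be usable as the second argument when the package is installed, with an open file of tab-separated form and tag pairs as the first.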
nefnir.recase(form: str, tag: str, lemma: str) → str
Determine how to properly case a lemma given the word form and part-of-speech tag it was derived from.
Nefnir transforms words into lowercase prior to lemmatization. Some words, such as proper nouns, abbreviations and foreign words, therefore need to be re-capitalized or changed back into uppercase.
Parameters:
- form – A word form, cased as it was written.
- tag – The word form’s part-of-speech tag.
- lemma – The word form’s lemma, in lowercase.
Returns: A properly cased lemma.