nefnir package

Submodules

nefnir.nefnir module

class nefnir.nefnir.Nefnir

Bases: object

A rule-based lemmatizer.

lemmatize(form, tag)

Lemmatize a word form given its part-of-speech tag.

Parameters:
    form – A word form.
    tag – The word form’s part-of-speech tag.
Returns:
    The word form’s lemma.

recase(form, tag, lemma)

Determine how to properly case a lemma given the word form and part-of-speech tag it was derived from.

Nefnir converts words to lowercase before lemmatization. The lemmas of some words, such as proper nouns, abbreviations, and foreign words, therefore need to be re-capitalized or restored to uppercase.

Parameters:
    form – A word form, cased as it was written.
    tag – The word form’s part-of-speech tag.
    lemma – The word form’s lemma, in lowercase.
Returns:
    A properly cased lemma.
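
As a rough illustration of the re-casing step described above, the following sketch restores case from the original word form. It is not Nefnir's implementation: the name recase_sketch, the cased_tags parameter, and the placeholder tag "PROPN" are assumptions made for this example, and Nefnir's actual tag set and casing rules may differ.

```python
from typing import FrozenSet


def recase_sketch(
    form: str,
    tag: str,
    lemma: str,
    cased_tags: FrozenSet[str] = frozenset({"PROPN"}),  # hypothetical tag set
) -> str:
    """Restore the casing of a lowercase lemma from its original word form."""
    # Tags outside the (assumed) case-sensitive set keep the lowercase lemma.
    if tag not in cased_tags:
        return lemma
    # A fully uppercase form of more than one character (e.g. an
    # abbreviation) yields an uppercase lemma.
    if len(form) > 1 and form.isupper():
        return lemma.upper()
    # A capitalized form yields a capitalized lemma.
    if form[:1].isupper():
        return lemma[:1].upper() + lemma[1:]
    return lemma


print(recase_sketch("Reykjavíkur", "PROPN", "reykjavík"))
print(recase_sketch("NATO", "PROPN", "nato"))
print(recase_sketch("Hestur", "NOUN", "hestur"))
```

Passing the case-sensitive tags as a parameter keeps the sketch independent of any particular tag set.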

nefnir.nefnir.get_suffixes(s)

Return an iterator yielding a string’s suffixes, from longest to shortest.

Parameters:
    s – A text string.
Returns:
    An iterator over the string’s suffixes.
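
The behaviour described above can be sketched with a short generator. This is a hypothetical stand-in named iter_suffixes, not the library's code. It assumes that the whole string counts as the longest suffix and that the empty suffix is not yielded. The documentation does not state either detail.

```python
from typing import Iterator


def iter_suffixes(s: str) -> Iterator[str]:
    """Yield the suffixes of s, longest first (assumed behaviour)."""
    # Each start index gives one suffix; moving the index right shortens it.
    for start in range(len(s)):
        yield s[start:]


print(list(iter_suffixes("hestar")))
# ['hestar', 'estar', 'star', 'tar', 'ar', 'r']
```

Yielding the longest suffix first means a caller that looks suffixes up in a table can stop at the first, most specific, match.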

nefnir.nefnir.main()

nefnir.wrapper module

nefnir.wrapper.init() → None

Read configuration files.

nefnir.wrapper.lemmatize(form: str, tag: str) → str

Lemmatize a word form given its part-of-speech tag.

Parameters:
    form – A word form.
    tag – The word form’s part-of-speech tag.
Returns:
    The word form’s lemma.

nefnir.wrapper.lemmatize_line(line: str, separator: str = '\t') → Tuple[Optional[str], Optional[str], Optional[str]]

Lemmatize a line containing a word form and its part-of-speech tag.

Parameters:
    line – A line with the form and tag separated by separator.
    separator – The token separator.
Returns:
    A tuple of form, tag, and lemma (any of them can be None if the data is invalid).
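
To make the tuple-of-Optional return shape concrete, here is a minimal sketch of line-based lemmatization. The function lemmatize_line_sketch and the toy lemmatizer passed to it are hypothetical. The real function's handling of malformed lines is not documented beyond "any can be None", so returning three None values for every invalid line is an assumption.

```python
from typing import Callable, Optional, Tuple

Result = Tuple[Optional[str], Optional[str], Optional[str]]


def lemmatize_line_sketch(
    line: str,
    lemmatize: Callable[[str, str], str],  # stand-in for a real lemmatizer
    separator: str = "\t",
) -> Result:
    """Split a 'form<separator>tag' line and lemmatize the form."""
    fields = line.rstrip("\n").split(separator)
    # Assumed policy: anything other than two non-empty fields is invalid.
    if len(fields) != 2 or not fields[0] or not fields[1]:
        return None, None, None
    form, tag = fields
    return form, tag, lemmatize(form, tag)


def toy_lemmatize(form: str, tag: str) -> str:
    """Toy lemmatizer for demonstration only: lowercases the form."""
    return form.lower()


print(lemmatize_line_sketch("Hestar\tNOUN\n", toy_lemmatize))
print(lemmatize_line_sketch("", toy_lemmatize))
```

Callers can then skip invalid rows by checking whether the returned lemma is None.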

nefnir.wrapper.recase(form: str, tag: str, lemma: str) → str

Determine how to properly case a lemma given the word form and part-of-speech tag it was derived from.

Nefnir converts words to lowercase before lemmatization. The lemmas of some words, such as proper nouns, abbreviations, and foreign words, therefore need to be re-capitalized or restored to uppercase.

Parameters:
    form – A word form, cased as it was written.
    tag – The word form’s part-of-speech tag.
    lemma – The word form’s lemma, in lowercase.
Returns:
    A properly cased lemma.

Module contents

Top-level package for nefnir (nefnir-package).

nefnir.init() → None

Read configuration files.

nefnir.lemmatize(form: str, tag: str) → str

Lemmatize a word form given its part-of-speech tag.

Parameters:
    form – A word form.
    tag – The word form’s part-of-speech tag.
Returns:
    The word form’s lemma.

nefnir.lemmatize_line(line: str, separator: str = '\t') → Tuple[Optional[str], Optional[str], Optional[str]]

Lemmatize a line containing a word form and its part-of-speech tag.

Parameters:
    line – A line with the form and tag separated by separator.
    separator – The token separator.
Returns:
    A tuple of form, tag, and lemma (any of them can be None if the data is invalid).

nefnir.recase(form: str, tag: str, lemma: str) → str

Determine how to properly case a lemma given the word form and part-of-speech tag it was derived from.

Nefnir converts words to lowercase before lemmatization. The lemmas of some words, such as proper nouns, abbreviations, and foreign words, therefore need to be re-capitalized or restored to uppercase.

Parameters:
    form – A word form, cased as it was written.
    tag – The word form’s part-of-speech tag.
    lemma – The word form’s lemma, in lowercase.
Returns:
    A properly cased lemma.
