Processing "de Meunier" doesn't recognize the prefix #121

akimotode · 2021-03-17T00:21:25Z

I'm not sure if I am missing something, but if I run the parser on the string "de Mesnil", I am expecting it to give me either a first or a last name of "de Mesnil" (preferably the latter), given that "de" is a known prefix.

Instead I am getting a first name "de" and a last name "Mesnil".

That seems contradictory to the documentation for prefixes: Name pieces that appear before a last name. Prefixes join to the piece that follows them to make one new piece.

derek73 · 2021-03-17T18:03:50Z

I have not looked into the code to be sure, but I believe the parser treats a prefix as an ordinary name piece rather than a prefix when there are only 2 space-separated strings in total. This would be helpful for people whose first names clash with prefixes. I'd have to check whether there are any specific examples in the tests.

I'm curious about your use case. Do you have examples in your data that occasionally include only last names, and you want the parser to tell you that it is indeed a last name?

akimotode · 2021-03-18T21:18:04Z

Thanks for the quick answer!

I believe my use case is what you describe. In this specific case, the text includes three versions of the "name" in different places: "Sergeant de Mesnil", "Walter de Mesnil" and "de Mesnil".

After adding "sergeant" as a custom title I get three different parsings:
"Sergeant de Mesnil" --> {title: "sergeant", last_name: "de Mesnil"}
"Walter de Mesnil" --> {first_name: "Walter", last_name: "de Mesnil"}
"de Mesnil" --> {first_name: "de", last_name: "Mesnil"}

On a side note: it would be neat if there were an explicit LAST_NAME_TITLE option for titles. This would be handy for military titles like General, Colonel, Major, etc., as well as most nobility titles outside of King/Queen and Lord/Lady. I think it sort of works out of the box, but I was surprised not to see it made explicit.

derek73 · 2021-03-19T17:57:04Z

There is a set of titles that, when followed by a single name, assume that name is a first name (it looks like it's not exposed in the documentation, though):

python-nameparser/nameparser/config/titles.py

Line 4 in d498968

FIRST_NAME_TITLES = set([

All other titles go through the rest of the normal parsing process, so the name that follows is assumed to be a last name because there's more than one name part.

It currently includes King/Queen but not Lady/Lord; maybe it should. The Wikipedia page makes me think it could go either way: https://en.wikipedia.org/wiki/Lady
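To make the rule above concrete, here is a toy model of it in plain Python. It is only a sketch of the behaviour as described in this comment, not nameparser's actual code; `FIRST_NAME_TITLES_SKETCH` and `assign_single_name` are hypothetical names, and the set holds only the two titles this thread confirms.

```python
# Toy model of the "title + single name" rule described above.
# Not nameparser's implementation; names and set contents are illustrative.
FIRST_NAME_TITLES_SKETCH = {"king", "queen"}


def assign_single_name(title, name, first_name_titles=FIRST_NAME_TITLES_SKETCH):
    """Decide whether the lone name after a title is a first or a last name."""
    key = title.lower().rstrip(".")
    if key in first_name_titles:
        # Titles in the set are assumed to precede a first name.
        return {"title": title, "first": name, "last": ""}
    # Every other title falls through to the default: treat it as a last name.
    return {"title": title, "first": "", "last": name}


print(assign_single_name("Queen", "Elizabeth"))
print(assign_single_name("Sergeant", "de Mesnil"))
```

Under this model, adding "lord" and "lady" to the set would be the only change needed to treat them like King/Queen.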

akimotode · 2021-03-23T22:12:21Z

FYI, I fixed the issue now by manually checking and fixing the output after parsing for the known prefix cases I have in my data.

if human_name.first in ['de', 'st', 'st.', 'van']:
    human_name.last = human_name.first + " " + human_name.last
    human_name.first = ""

I think the default behaviour could (should?) be similar to the above. If the original is "PREFIX NAME", the output should be last = PREFIX + " " + NAME instead of first = PREFIX & last = NAME.

Thanks for the pointer on the FIRST_NAME_TITLES. Using it now.
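For anyone wanting to reuse the post-parse fix from this comment, it could be wrapped in a small helper along these lines. This is a minimal sketch that works on plain strings so it can be run without the library; `merge_leading_prefix` and `KNOWN_PREFIXES` are hypothetical names, and the case-insensitive match is an added assumption (the original snippet compares exact lowercase strings).

```python
# Prefixes taken from the workaround snippet in this comment.
KNOWN_PREFIXES = {"de", "st", "st.", "van"}


def merge_leading_prefix(first, last, prefixes=KNOWN_PREFIXES):
    """Return (first, last) with a prefix-only first name folded into last."""
    # Assumption: match case-insensitively, so "De Mesnil" is handled too.
    if first.lower() in prefixes and last:
        return "", f"{first} {last}"
    return first, last


print(merge_leading_prefix("de", "Mesnil"))
print(merge_leading_prefix("Walter", "de Mesnil"))
```

With a parsed name object, the result would be assigned back the same way the original snippet does, e.g. `human_name.first, human_name.last = merge_leading_prefix(human_name.first, human_name.last)`.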

patvdleer · 2022-01-30T20:50:51Z

I'm running into something similar with my name, Patrick van der Leer. In Dutch we call the "van der" part a tussenvoegsel. Even "Patrick van Leer" gives me "van Leer" as the surname/last_name and nothing for the middle name.

EDIT

python-nameparser/tests.py

Line 2069 in 8b73ff9

def test_multiple_prefixes(self):

This was not what I was expecting. Yes, "van der" would be part of the full surname/last name, but I would set "van der" as a middle name or as a prefix of the surname.
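One way to get the split described here, without changing the parser, would be to post-process the parsed last name. The sketch below is only an illustration: `split_surname` is a hypothetical helper, and `TUSSENVOEGSELS` is a deliberately partial, assumed list of Dutch prefix words.

```python
# Hypothetical, partial list of Dutch surname prefix words.
TUSSENVOEGSELS = {"van", "der", "den", "de", "ter", "te"}


def split_surname(last_name, prefixes=TUSSENVOEGSELS):
    """Split a last name into (tussenvoegsel, family name)."""
    parts = last_name.split()
    i = 0
    # Consume leading prefix words, always leaving at least one word
    # for the family name itself.
    while i < len(parts) - 1 and parts[i].lower() in prefixes:
        i += 1
    return " ".join(parts[:i]), " ".join(parts[i:])


print(split_surname("van der Leer"))
print(split_surname("van Leer"))
```

The full last name stays recoverable by joining the two parts with a space, so nothing the parser produced is lost.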

patvdleer mentioned this issue Jan 31, 2022

Tussenvoegsels / family name prefixes #130

Open
