Processing "de Meunier" doesn't recognize the prefix #121

Open
akimotode opened this issue Mar 17, 2021 · 5 comments

Comments

@akimotode

I'm not sure if I am missing something, but if I run the parser on the string "de Mesnil", I am expecting it to give me either a first or a last name of "de Mesnil" (preferably the latter), given that "de" is a known prefix.

Instead I am getting a first name "de" and a last name "Mesnil".

That seems to contradict the documentation for prefixes: "Name pieces that appear before a last name. Prefixes join to the piece that follows them to make one new piece."

@derek73
Owner

derek73 commented Mar 17, 2021

I have not looked into the code to be sure, but I believe the parser treats a prefix as an ordinary name piece, not as a prefix, when the input has only two space-separated strings in total. This helps people whose first names clash with prefixes. I'd have to check whether there are any specific examples in the tests.

I'm curious about your use case. Do you have examples in your data that occasionally include only last names, and you want the parser to tell you that it is indeed a last name?

@akimotode
Author

Thanks for the quick answer!

I believe my use is what you imply. In this specific case, the text includes three versions of the "name" in different places: "Sergeant de Mesnil", "Walter de Mesnil" and "de Mesnil".

After adding "sergeant" as a custom title I get three different parsings:
"Sergeant de Mesnil" --> {title: "sergeant", last_name: "de Mesnil"}
"Walter de Mesnil" --> {first_name: "Walter", last_name: "de Mesnil"}
"de Mesnil" --> {first_name: "de", last_name: "Mesnil"}

On a side-note: It would be neat if there was an explicit LAST_NAME_TITLE option for titles. This would be handy for military titles like General, Colonel, Major, etc. as well as most nobility titles outside of King/Queen and Lord/Lady. I think it sort of works out-of-the-box, but I was surprised to not see it explicit.

@derek73
Owner

derek73 commented Mar 19, 2021

There is a set of titles that, when followed by a single name, cause that name to be treated as a first name. (It looks like it's not exposed in the documentation, though.)

FIRST_NAME_TITLES = set([

All other titles go through the normal parsing process, so the name that follows is assumed to be a last name because there is more than one name part.

It currently includes King/Queen but not Lady/Lord; maybe it should. The Wikipedia page makes me think it could go either way: https://en.wikipedia.org/wiki/Lady
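The single-name rule described above can be pictured with a small toy model. This is only a sketch, not the library's implementation: `EXAMPLE_FIRST_NAME_TITLES` and `assign_single_name` are hypothetical names, and the set holds just the two titles named in this comment, not the full `FIRST_NAME_TITLES` set.

```python
# Illustrative subset only (assumption): the library's real FIRST_NAME_TITLES
# set is longer and is not reproduced here.
EXAMPLE_FIRST_NAME_TITLES = {"king", "queen"}


def assign_single_name(title: str, name: str) -> dict:
    """Toy model: decide whether a lone name after a title is a first or a last name."""
    key = title.lower().rstrip(".")
    if key in EXAMPLE_FIRST_NAME_TITLES:
        # Titles in the set are followed by a first name ("Queen Elizabeth").
        return {"title": title, "first": name, "last": ""}
    # Every other title falls through to the default: the name is a last name.
    return {"title": title, "first": "", "last": name}


print(assign_single_name("Queen", "Elizabeth"))
print(assign_single_name("Sergeant", "Mesnil"))
```

Under this model, adding Lady/Lord would amount to one more entry in the set.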

@akimotode
Author

FYI, I have worked around the issue by checking the parsed output and correcting it for the known prefix cases in my data.

if human_name.first in ['de', 'st', 'st.', 'van']:
    human_name.last = human_name.first + " " + human_name.last
    human_name.first = ""

I think the default behaviour could (should?) be similar to the above. If the original is "<prefix> <name>", the output should be last = <prefix> + " " + <name> instead of first = <prefix> & last = <name>.

Thanks for the pointer on the FIRST_NAME_TITLES. Using it now.
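For anyone who wants to reuse the workaround, here is a minimal sketch of the same check wrapped in a function. `merge_leading_prefix` and `KNOWN_PREFIXES` are hypothetical names, and the `SimpleNamespace` object only stands in for a parsed name. The function should work on any object with writable `first` and `last` attributes; passing a parsed name object from the library directly is an assumption based on the snippet above, which assigns to those attributes.

```python
from types import SimpleNamespace

# Prefixes copied from the snippet above; extend this set for your own data.
KNOWN_PREFIXES = {"de", "st", "st.", "van"}


def merge_leading_prefix(name, prefixes=KNOWN_PREFIXES):
    """If `first` is a known prefix, fold it into `last` and clear `first`.

    Mutates `name` in place and also returns it for convenience.
    """
    if name.first and name.last and name.first.lower() in prefixes:
        name.last = f"{name.first} {name.last}"
        name.first = ""
    return name


# Stand-in for the parsed result of "de Mesnil" (first="de", last="Mesnil").
parsed = SimpleNamespace(first="de", last="Mesnil")
merge_leading_prefix(parsed)
print(repr(parsed.first), repr(parsed.last))  # '' 'de Mesnil'
```

The comparison is case-insensitive, so "De Mesnil" is handled as well. A name whose first name is not in the set is left untouched.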

@patvdleer

patvdleer commented Jan 30, 2022

I'm running into something similar with my name, Patrick van der Leer. In Dutch we call the "van der" part a tussenvoegsel. Even "Patrick van Leer" gives me "van Leer" as the surname/last_name and nothing for the middle name.


EDIT

def test_multiple_prefixes(self):

This was not what I was expecting. Yes, "van der" is part of the full surname/last name, but I would set "van der" as a middle name or as a prefix of the surname.
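One way to get the tussenvoegsel as a separate value is to split the parsed last name after the fact. The sketch below assumes nothing about the library's internals: `split_surname` and the `TUSSENVOEGSELS` set are hypothetical, and the set is a short illustrative list, not a complete one.

```python
from typing import Tuple

# Hypothetical, illustrative list of Dutch tussenvoegsels; not taken from the library.
TUSSENVOEGSELS = {"van", "der", "den", "de", "ter", "te"}


def split_surname(last_name: str) -> Tuple[str, str]:
    """Split a surname into (prefix, core).

    Leading words found in TUSSENVOEGSELS form the prefix; the rest is the core.
    """
    words = last_name.split()
    i = 0
    # Stop one word early so the core surname is never empty.
    while i < len(words) - 1 and words[i].lower() in TUSSENVOEGSELS:
        i += 1
    return " ".join(words[:i]), " ".join(words[i:])


print(split_surname("van der Leer"))  # ('van der', 'Leer')
print(split_surname("Leer"))          # ('', 'Leer')
```

Keeping the split outside the parser leaves the full last name intact for callers who want "van der Leer" as one piece.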


3 participants