Consider names followed by a period as titles or suffixes #109
I'm working with version nameparser==1.0.6.
I think the problem with these is that they are all titles that could also be first names, possibly with the exception of "Major". We do have the abbreviation "maj" but not the full title. OK, I just did a quick search, and it seems about 50-500 people named "Major" are born each year. That's surprising. It seems to have become more popular in 2016. And there are women named "Major" too? Wow. OK, I guess. :)

The parser avoids putting titles that could also be first names into its set of titles. The title check happens first, so adding a word like "Dean", although it is a common title, would force it to always be parsed as a title, and the parser would fail in the much more common case of Dean as a first name. So I think these are the kind of edge cases where no simple rule-based approach can always be right. It's probably more upsetting for people named "Major" to have the parser always get their name wrong than for people with the title Major to have it mistake that title for their first name. At least the latter error is a more understandable one for a computer to make, and there are other, less ambiguous ways of formatting "Major" as a title that the parser interprets correctly.

It is possible to adjust the parser so that all of these names would always be counted as titles, as long as Dona isn't upset by that. :) I was going to close this as won't fix, but in these examples all of the names are followed by a period. At least here, we could take that as a clue and count them as titles. I wonder if we could make that a rule: any name part that is followed by a period, as long as it's longer than one character, must be some kind of title or suffix?
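A minimal sketch of the period rule proposed above, written as a standalone function rather than as a change to nameparser's internals. The function name `tag_name_pieces` and its labels are hypothetical, chosen only for illustration; the input format assumed is "last_name, title. first_name middle_initial_or_name".

```python
from typing import List, Tuple


def tag_name_pieces(full_name: str) -> List[Tuple[str, str]]:
    """Label each whitespace-separated piece of a name (hypothetical helper).

    Proposed rule: a piece ending in "." that is longer than one character
    (ignoring the period) is treated as a title or suffix. A single letter
    followed by "." is treated as an initial. Everything else is a name.
    """
    tagged = []
    for raw in full_name.split():
        piece = raw.rstrip(",")   # drop the comma in "Last, First" formats
        core = piece.rstrip(".")  # the piece without its trailing period(s)
        if piece.endswith(".") and len(core) > 1:
            label = "title_or_suffix"  # e.g. "Major.", "Dr.", "Jr."
        elif piece.endswith("."):
            label = "initial"          # e.g. "A."
        else:
            label = "name"             # e.g. "Dean" with no period stays a name
        tagged.append((piece, label))
    return tagged


if __name__ == "__main__":
    print(tag_name_pieces("Smith, Major. John A."))
    print(tag_name_pieces("Dean Martin"))
```

The one-character cutoff keeps middle initials such as "A." from being promoted to titles, while "Dean" without a period is still read as a first name. One weakness of the rule is abbreviated surname prefixes such as "St." in "St. John", which it would mislabel as a title.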
Thanks for reviewing these cases. Your insight about the period is interesting, particularly for the format "last_name, title. first_name middle_initial_or_name". This is the style of the list I'm looking at. I looked to see whether any style guides include the period after "Major". In a brief search, I did not find any style guide that suggests writing "Major." (with the period). Most academic and literary citation formats seem to prefer leaving out titles. Modern US Army style guides for things like wedding invitations do not seem to use a period when the word is spelled out in full. It is not clear how last-name-first "guest lists" should work. For my little project, I can just change "Major." to "Maj." and things will work with this library. Thank you for a super helpful library.
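The "Major." to "Maj." workaround mentioned above can be applied as a preprocessing step before the string reaches the parser. A minimal sketch, assuming a hypothetical helper `abbreviate_titles` and a hypothetical mapping `TITLE_ABBREVIATIONS` that you maintain yourself for your own data:

```python
import re

# Hypothetical mapping of spelled-out titles to abbreviations the parser
# already knows; extend it with whatever titles appear in your data.
TITLE_ABBREVIATIONS = {"Major": "Maj"}

# Match a full title only when it is a whole word followed by a period.
_TITLE_WITH_PERIOD = re.compile(
    r"\b(" + "|".join(map(re.escape, TITLE_ABBREVIATIONS)) + r")\."
)


def abbreviate_titles(full_name: str) -> str:
    """Rewrite e.g. 'Major.' as 'Maj.' so it reads as an abbreviated title."""
    return _TITLE_WITH_PERIOD.sub(
        lambda m: TITLE_ABBREVIATIONS[m.group(1)] + ".", full_name
    )


if __name__ == "__main__":
    print(abbreviate_titles("Smith, Major. John A."))
    print(abbreviate_titles("Major Smith"))
```

Because the pattern requires the trailing period, a first name such as "Major" written without one is left untouched, so the trade-off described earlier in the thread is preserved. The rewritten string can then be passed to nameparser's HumanName as usual.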
There are a couple of somewhat obscure titles in the Titanic data set that Name Parser does not get right by itself.
In these names.
Love the work that NameParser does. Passing along the issue based on your request for feedback.