Add tests for English transcriptions with J and EE #26
base: master
@@ -33,17 +33,22 @@ module.exports = {
english: {

// english (appropriate mode) (sorted)
"Jack": "anga;quesse:a,tilde-below",
"happy": "hyarmen;parma:a,tilde-below;long-carrier:y",
"style": "silme;tinco;lambe:y,i-below",
"yellow": "anna;lambe:e,tilde-below;vala:o",
"phone": "formenparma;numen:o,i-below",
I’ve called this parma-extended. I like formenparma and similar names better, though. I’d be grateful for an issue to track that.
"cake": "quesse;quesse:a,i-below",
"cakes": "quesse;quesse:a;silme-nuquerna:e",
"cats.": "quesse;tinco:a,s-final;full-stop", // regression
"green": "ungwe;romen;long-carrier:e;numen",
"green": "ungwe;romen;short-carrier:e;numen:e",
Are there any cases where we need to preserve the long carrier for English in the mode for general use? This will require changes in

We don't specifically need the long carrier at all for English. Tolkien sometimes used it for pronounced vowels at the ends of words, like the Y in "by" and "history" in the title page inscription, but other times he didn't use a long carrier at all. So, for General Use, there is no pressing need for a long carrier. I like to use it at the ends of words because I think it's pretty, but that's just my personal preference.

We might be able to preserve the long carrier for final Y as an æsthetic option. Thank you for the details.
"hobbits": "hyarmen;umbar:o,tilde-below;tinco:i,s-final",
"hobbits'": "hyarmen;umbar:o,tilde-below;tinco:i,s-inverse",
"hobbits''": "hyarmen;umbar:o,tilde-below;tinco:i,s-extended",
"hobbits'''": "hyarmen;umbar:o,tilde-below;tinco:i,s-flourish",
"there": "thule;romen:e,i-below",
"there": "thule;ore:e,i-below",
Can you restate concisely the rule for romen vs ore? My understanding is that the choice is both mode- and language-dependent, and this case may even need special consideration given that the following vowel is silent. It would be good to include test cases that exercise every variation; otherwise, we might end up playing whack-a-mole.

When Tolkien used Rómen and Órë, he used them according to his own dialect, which is an R-dropping (non-rhotic) dialect. This means that the R was only retained when followed by a pronounced vowel ("there" would use Órë, but "therein" uses Rómen). Since the computer can't read pronunciation, I suggest automatically using Rómen when R is followed by a vowel (excluding silent E) and Órë the rest of the time. This will get it close to correct most of the time.

Thanks again for the details. I agree we’d have a hard time capturing all the rules of the non-rhotic dialect. These examples are good. Let’s capture all of them in cases, including the ones where we’d need a backtick or a whole-word exception to force the right output, like
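The R heuristic suggested in this thread (Rómen when a vowel other than a silent final E follows, Órë otherwise) could be sketched roughly as follows. This is a hypothetical helper, not code from the transcriber: `chooseR`, `rChoices`, the vowel set, and the assumptions that a plain word-final "e" is silent while "ë" is pronounced are all illustrative.

```javascript
// Hypothetical sketch of the proposed R heuristic; not the transcriber's actual code.
// Assumptions: a plain word-final "e" is silent, "ë" is pronounced,
// and "y" counts as a vowel when it follows r (as in "very").
const VOWELS = new Set(["a", "e", "i", "o", "u", "y", "ë", "ï", "á", "é", "í", "ó", "ú"]);

// Chooses the tengwa for the r at position `index` of a lowercase, NFC-normalized word.
function chooseR(word, index) {
  const next = word[index + 1];
  if (next === undefined || !VOWELS.has(next)) {
    return "ore"; // end of word or a consonant follows: the R is not pronounced
  }
  const nextIsFinalE = next === "e" && index + 1 === word.length - 1;
  return nextIsFinalE ? "ore" : "romen"; // a silent final E does not keep the R
}

// Lists the tengwa chosen for each r in the word, left to right.
function rChoices(word) {
  // NFC keeps "ë" as a single code unit so the index arithmetic above holds.
  const normalized = word.toLowerCase().normalize("NFC");
  const choices = [];
  for (let i = 0; i < normalized.length; i++) {
    if (normalized[i] === "r") choices.push(chooseR(normalized, i));
  }
  return choices;
}
```

With this sketch, `rChoices("there")` gives `["ore"]`, `rChoices("therein")` gives `["romen"]`, and `rChoices("rare")` gives `["romen", "ore"]`, which matches the cases above. Words whose pronunciation breaks the heuristic would still need the backtick or whole-word exceptions mentioned in the thread.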
"these": "thule;silme-nuquerna:e;short-carrier:i-below",
"these'": "thule;silme-nuquerna:e;short-carrier:e",
"these'": "thule;silme-nuquerna:e;short-carrier:e", // invalid input
Let’s add tests for words with a final "e" that are exceptions to the rule, like "see" and "naïve". In these cases, we will need a notation that lets the user override the default behavior, and maybe even hard-code as many known exceptions as possible. I’ll note that pattern-matching vowel-consonant-vowel to distinguish the silent E case from voiced final E cases may require extraordinary programming. Anything valid should be expressible, even if it requires manual intervention to express. That’s the current function of the apostrophe.

Is there any way to classify the letters so it can tell "a symbol from this list is a vowel, and a symbol from this list is a consonant"? Or is this what you mean by "extraordinary programming"? I don't know enough about this sort of thing to tell whether it would be too complex.

Classifying vowels and consonants is easy, except for letters like w and y, which are ambiguous for the purposes of English rules. Matching patterns of three phonemes requires considerable additional work, because it involves backtracking and also has to account for consonant and vowel clusters (letters do not map one-to-one onto phonemes). It is far easier to pass the burden of distinguishing silent and voiced final e to the user, just assuming e is unvoiced and ë or e` are voiced.

In the case of naïve, to infer that the final e is voiced from the diaeresis over the ï, we’d have to thread a hint forward, over the following consonant and into the code that matches final e. That is not so hard, but threading the additional state requires altering most of the function calls in the transcriber.
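The convention proposed above (a plain final e is silent, while ë or e` marks a voiced final e) needs only a look at the end of each word, with no vowel-consonant-vowel backtracking. A minimal sketch, assuming hypothetical helpers `classify` and `finalEKind` and illustrative letter sets; w and y are deliberately left "ambiguous" rather than forced into either class, as the thread notes.

```javascript
// Hypothetical sketch; names and letter sets are illustrative assumptions,
// not part of the existing transcriber.
const VOWELS = new Set(["a", "e", "i", "o", "u", "ë", "ï", "á", "é", "í", "ó", "ú"]);
const AMBIGUOUS = new Set(["w", "y"]); // vowel or consonant depending on the word

// Classifies one letter as "vowel", "consonant", "ambiguous", or "other".
function classify(letter) {
  const c = letter.toLowerCase().normalize("NFC");
  if (VOWELS.has(c)) return "vowel";
  if (AMBIGUOUS.has(c)) return "ambiguous";
  return /^[a-z]$/.test(c) ? "consonant" : "other";
}

// Reports how a word's final e is treated under the proposed convention:
// a plain "e" is silent; "ë" or "e`" marks a voiced final e.
function finalEKind(word) {
  const w = word.toLowerCase().normalize("NFC");
  if (w.endsWith("e`") || w.endsWith("ë")) return "voiced";
  if (w.endsWith("e")) return "silent";
  return "none";
}
```

Under this convention the exceptions from the thread would be written by the user as, for example, "naïve`" or "naïvë", trading a little manual marking for not having to thread the diaeresis hint through the transcriber.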
"finwë": "formen;short-carrier:i;numen:w;short-carrier:e",
"finwe": "formen;short-carrier:i;numen:w,i-below", // invalid input
"helcaraxë": "hyarmen;lambe:e;quesse;romen:a;quesse:a,s;short-carrier:e",
We will need to create mappings for y-above in dan-smith.js. The y in this notation currently refers to double dots below and can only be carried by the round carrier (which looks like a Latin "c"): https://tengwarjs-fpft2n6u3.now.sh/#y

We can change the names in this notation as long as we are thorough. It may be better to make "y" imply "above", or to make both "y-above" and "y-below" explicit.
Then, for the Sindarin vowel Y, y-sindar?

Can you describe the appearance of y-sindar? I’m only aware of two dots below and the inverted chapeau above.
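One way to make both positions explicit, as suggested in this thread, is to keep a table of canonical tehta names and treat the bare legacy name as an alias while the notation migrates. The sketch below is hypothetical: the table shape, `resolveTehta`, and the choice to alias bare "y" to "y-below" (its current meaning) are assumptions for illustration, not the structure of dan-smith.js.

```javascript
// Hypothetical naming table; the real mappings live in dan-smith.js and may differ.
const TEHTAR = new Map([
  ["y-below", { position: "below" }],
  ["y-above", { position: "above" }],
]);

// Legacy names that keep working during the migration.
// Assumption: bare "y" keeps its current meaning (double dots below).
const ALIASES = new Map([["y", "y-below"]]);

// Resolves a tehta name used in the notation to its canonical name and position.
function resolveTehta(name) {
  const canonical = ALIASES.get(name) ?? name;
  const entry = TEHTAR.get(canonical);
  if (entry === undefined) {
    throw new Error(`Unknown tehta: ${name}`);
  }
  return { name: canonical, ...entry };
}
```

Keeping the alias means existing notation strings such as "lambe:y,i-below" would still resolve while the explicit names are adopted; dropping the alias later would turn any remaining bare "y" into an error that is easy to find.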