Add tests for English transcriptions with J and EE #26

Open: wants to merge 1 commit into base: master
11 changes: 8 additions & 3 deletions test/general-use.js
@@ -33,17 +33,22 @@ module.exports = {
english: {

// english (appropriate mode) (sorted)
"Jack": "anga;quesse:a,tilde-below",
"happy": "hyarmen;parma:a,tilde-below;long-carrier:y",
Owner

We will need to create mappings for y-above in dan-smith.js. The y in this notation currently refers to double dots below and can only be carried by the round carrier (which looks like a Latin "c").

https://tengwarjs-fpft2n6u3.now.sh/#y

We can change the names in this notation as long as we are thorough. It may be better to make "y" imply "above", or to make both "y-above" and "y-below" explicit.

Collaborator Author

Then for the Sindarin vowel Y, y-sindar?

Owner

Can you describe the appearance of y-sindar? I’m only aware of two dots below and the inverted chapeau above.

"style": "silme;tinco;lambe:y,i-below",
"yellow": "anna;lambe:e,tilde-below;vala:o",
"phone": "formenparma;numen:o,i-below",
Owner

I’ve called this parma-extended. I like formenparma and the like better, though. I would be grateful for an issue to track that.

"cake": "quesse;quesse:a,i-below",
"cakes": "quesse;quesse:a;silme-nuquerna:e",
"cats.": "quesse;tinco:a,s-final;full-stop", // regression
"green": "ungwe;romen;long-carrier:e;numen",
"green": "ungwe;romen;short-carrier:e;numen:e",
Owner

Are there any cases where we need to preserve the long carrier for English in the mode for general use? This will require changes in general-use.js.

Collaborator Author

We don't specifically need the long carrier at all for English. Tolkien sometimes used it for pronounced vowels at the ends of words, like the Y in "by" and "history" in the title page inscription. At other times he didn't use a long carrier at all. So for General Use there is no pressing need for a long carrier.

I like to use it at the ends of words because I think it's pretty, though. That's just my personal preference.

Owner

We might be able to preserve the long carrier for the æsthetic of a final Y. Thank you for the details.

"hobbits": "hyarmen;umbar:o,tilde-below;tinco:i,s-final",
"hobbits'": "hyarmen;umbar:o,tilde-below;tinco:i,s-inverse",
"hobbits''": "hyarmen;umbar:o,tilde-below;tinco:i,s-extended",
"hobbits'''": "hyarmen;umbar:o,tilde-below;tinco:i,s-flourish",
"there": "thule;romen:e,i-below",
"there": "thule;ore:e,i-below",
Owner

Can you restate concisely the rule for romen vs ore? My understanding is that the choice depends on both mode and language, and that this case may even need special consideration because the following vowel is silent. It would be good to include test cases that exercise every variation. Otherwise, we might end up playing whack-a-mole.

Collaborator Author

When Tolkien used Rómen and Órë, he followed his own dialect, which is an R-dropping (non-rhotic) dialect. The R was retained only when followed by a pronounced vowel, so "there" would use Órë but "therein" uses Rómen. Since the computer can't read pronunciation, I suggest that it automatically use Rómen when the R is followed by a vowel (excluding silent E) and Órë the rest of the time. This will get it close to being correct most of the time.
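
The suggested heuristic could be sketched roughly as follows. This is only an illustration: `isVowel` and `chooseR` are hypothetical names, not functions from general-use.js, and the sketch assumes that a silent E can be recognised only as a plain "e" at the very end of the word. The letter "y" is deliberately left out of the vowel list, so a word like "history" would get Órë here.

```javascript
// Hypothetical helper names; nothing here is taken from general-use.js.
const VOWELS = "aeiouáéíóúäëïöü";

function isVowel(ch) {
    return ch !== undefined && VOWELS.includes(ch.toLowerCase());
}

// Returns "romen" or "ore" for the letter "r" found at word[index].
function chooseR(word, index) {
    const next = word[index + 1];
    if (!isVowel(next)) {
        return "ore"; // end of word, or a consonant follows
    }
    // Assumption: a plain "e" that ends the word is treated as silent.
    const isSilentFinalE =
        next.toLowerCase() === "e" && index + 2 === word.length;
    return isSilentFinalE ? "ore" : "romen";
}

console.log(chooseR("there", 3));   // "ore"
console.log(chooseR("therein", 3)); // "romen"
console.log(chooseR("green", 1));   // "romen"
```

A rule like this only looks one or two letters ahead, so words whose pronunciation differs from their spelling would still need a manual override.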

Owner

Thanks again for the details. I agree we’d have a hard time capturing all the rules of the non-rhotic dialect.

These examples are good. Let’s capture all of them in test cases, including the ones where we’d need to use a backtick or a full-word exception to force the right output, like "ther`e`in".

"these": "thule;silme-nuquerna:e;short-carrier:i-below",
"these'": "thule;silme-nuquerna:e;short-carrier:e",
"these'": "thule;silme-nuquerna:e;short-carrier:e", // invalid input
Owner

Let’s add tests for words with a final "e" that would be the exceptions to the rule like "see" and "naïve". In these cases, we will need to use a notation to allow the user to override the default behavior, and maybe even hard-code as many known exceptions as possible.

I’ll note that pattern matching vowel-consonant-vowel to distinguish the silent E case from other voiced final E cases may require extraordinary programming.

Anything valid should be expressible, even if it requires manual intervention to express. That’s the current function of the apostrophe.

Collaborator Author

Is there any way to classify the letters so that it can tell "a symbol from this list is a vowel, and a symbol from this list is a consonant"? Or is this what you mean by "extraordinary programming"? I don't know enough about this sort of thing to tell whether this would be too complex.

Owner

Classifying vowels and consonants is easy, except for the cases like w and y where they’re ambiguous for the purposes of English rules.

Matching patterns of three phonemes requires considerable additional work because it involves back-tracking and also taking into account consonant and vowel clusters (there is not a one-character-to-one-phoneme relation). It is far easier to pass the burden of distinguishing silent and voiced final e to the user, by assuming that e is unvoiced and that ë or e` is voiced.

As in the case of naïve, to infer that the final e is voiced from the diaeresis over the ï, we’d have to thread a hint forward, over the following consonant and into the code that matches final e. That is not so hard, but threading the additional state requires altering most of the function calls in the transcriber.
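
As a rough illustration of the two points above (letter classification and the "plain e is unvoiced, ë or e` is voiced" convention), here is a minimal sketch. The function names `classifyLetter` and `finalEIsVoiced` and the letter lists are hypothetical and are not part of the transcriber. The sketch does not attempt the forward hint described for "naïve".

```javascript
// Hypothetical helpers; the names and letter lists are illustrative only.
const VOWEL_LETTERS = new Set("aeiou");
const AMBIGUOUS_LETTERS = new Set("wy");

function classifyLetter(ch) {
    // Strip combining marks so that "ë" and "ï" classify like "e" and "i".
    const base = ch
        .normalize("NFD")
        .replace(/[\u0300-\u036f]/g, "")
        .toLowerCase();
    if (VOWEL_LETTERS.has(base)) return "vowel";
    if (AMBIGUOUS_LETTERS.has(base)) return "ambiguous";
    if (/^[a-z]$/.test(base)) return "consonant";
    return "other";
}

// Returns true when the word ends in an explicitly voiced e ("ë" or "e`"),
// false when it ends in a plain "e" (assumed silent), and null otherwise.
function finalEIsVoiced(word) {
    const w = word.normalize("NFC").toLowerCase();
    if (w.endsWith("ë") || w.endsWith("e`")) return true;
    if (w.endsWith("e")) return false;
    return null;
}

console.log(finalEIsVoiced("finwë")); // true
console.log(finalEIsVoiced("finwe")); // false
console.log(finalEIsVoiced("naïve")); // false, so the user would write "naïvë" or "naïve`"
```

Keeping the decision local to the last one or two characters avoids threading extra state through the transcriber, at the cost of asking the user to mark voiced final e explicitly.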

"finwë": "formen;short-carrier:i;numen:w;short-carrier:e",
"finwe": "formen;short-carrier:i;numen:w,i-below", // invalid input
"helcaraxë": "hyarmen;lambe:e;quesse;romen:a;quesse:a,s;short-carrier:e",