Parse IRI links (#57)
* upgrade rust toolchain to 1.77.2

* don't parse internal markdown links. see #66 in the repo for details

* move parenthesis, bracket and angle parsing into dedicated function

* fix parenthesis in target of labeled link

---------

Co-authored-by: Simon Laux <[email protected]>
farooqkz and Simon-Laux authored May 7, 2024
1 parent 2d03478 commit bbf5061
Showing 33 changed files with 2,206 additions and 872 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/ci.yml
@@ -17,7 +17,7 @@ jobs:
- uses: actions-rs/toolchain@v1
with:
profile: minimal
-toolchain: 1.64.0
+toolchain: 1.77.2
override: true
- run: rustup component add rustfmt
- uses: actions-rs/cargo@v1
@@ -31,7 +31,7 @@ jobs:
- uses: actions/checkout@v2
- uses: actions-rs/toolchain@v1
with:
-toolchain: 1.64.0
+toolchain: 1.77.2
components: clippy
override: true
- uses: actions-rs/clippy-check@v1
@@ -68,9 +68,9 @@ jobs:
matrix:
include:
- os: ubuntu-latest
-rust: 1.64.0
+rust: 1.77.2
- os: windows-latest
-rust: 1.64.0
+rust: 1.77.2
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@master
20 changes: 20 additions & 0 deletions benches/moar_links.txt
@@ -0,0 +1,20 @@
Let's add some more links just for testing and benching:

these are some IPv6 links:

gopher://[::1]/
https://[::1]/سلام
https://[2345:0425:2CA1:0000:0000:0567:5673:23b5]/hello_world
https://[2345:425:2CA1:0:0:0567:5673:23b5]/hello_world

an IPvfuture link:
ftp://mrchickenkiller@[vA.A]/var/log/boot.log

some normal links:

https://www.ietf.org/rfc/rfc3987.txt
https://iamb.chat/messages/index.html
https://github.com/deltachat/message-parser/issues/67
https://far.chickenkiller.com
gopher://republic.circumlunar.space
https://far.chickenkiller.com/religion/a-god-who-does-not-care/
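
The IPv6 and IPvFuture entries above exercise the bracketed `IP-literal` host form from RFC 3986/3987. As a rough illustration only (this is not the crate's parser; the function names `is_ip_literal` and `is_ipvfuture` are hypothetical), such hosts could be checked with the Rust standard library along these lines:

```rust
use std::net::Ipv6Addr;

/// Hypothetical helper: checks the RFC 3986 `IPvFuture` shape,
/// i.e. "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" ).
fn is_ipvfuture(s: &str) -> bool {
    let rest = match s.strip_prefix('v').or_else(|| s.strip_prefix('V')) {
        Some(rest) => rest,
        None => return false,
    };
    let (version, addr) = match rest.split_once('.') {
        Some(parts) => parts,
        None => return false,
    };
    !version.is_empty()
        && version.chars().all(|c| c.is_ascii_hexdigit())
        && !addr.is_empty()
        && addr
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || "-._~!$&'()*+,;=:".contains(c))
}

/// Hypothetical helper: accepts a host such as `[::1]` or `[vA.A]`.
/// The brackets are required; the content must be an IPv6 address
/// (as accepted by `std::net::Ipv6Addr`) or an IPvFuture literal.
fn is_ip_literal(host: &str) -> bool {
    let inner = match host.strip_prefix('[').and_then(|h| h.strip_suffix(']')) {
        Some(inner) => inner,
        None => return false,
    };
    inner.parse::<Ipv6Addr>().is_ok() || is_ipvfuture(inner)
}
```

Under these assumptions, `is_ip_literal("[::1]")` and `is_ip_literal("[vA.A]")` are both accepted, while an unbracketed `::1` is rejected. Relying on `Ipv6Addr` keeps the sketch short, but its accepted syntax may differ in corner cases from the grammar the parser in this commit implements.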
9 changes: 8 additions & 1 deletion benches/my_benchmark.rs
@@ -1,10 +1,13 @@
use criterion::{black_box, criterion_group, criterion_main, Criterion};
-use deltachat_message_parser::parser::{parse_desktop_set, parse_markdown_text, parse_only_text};
+use deltachat_message_parser::parser::{
+parse_desktop_set, parse_markdown_text, parse_only_text, LinkDestination,
+};

pub fn criterion_benchmark(c: &mut Criterion) {
let testdata = include_str!("testdata.md");
let lorem_ipsum_txt = include_str!("lorem_ipsum.txt");
let r10s_update_message = include_str!("r10s_update_message.txt");
+let links = include_str!("moar_links.txt");

c.bench_function("only_text_lorem_ipsum.txt", |b| {
b.iter(|| parse_only_text(black_box(lorem_ipsum_txt)))
@@ -35,6 +38,10 @@ pub fn criterion_benchmark(c: &mut Criterion) {
c.bench_function("markdown_r10s_update_message.txt", |b| {
b.iter(|| parse_markdown_text(black_box(r10s_update_message)))
});
+
+c.bench_function("parse_link_moar_links.txt", |b| {
+b.iter(|| LinkDestination::parse(black_box(links)))
+});
}

criterion_group!(benches, criterion_benchmark);
760 changes: 760 additions & 0 deletions benches/testdata.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion message_parser_wasm/src/lib.rs
@@ -23,7 +23,7 @@ pub fn parse_text(s: &str, enable_markdown: bool) -> JsValue {
serde_wasm_bindgen::to_value(&ast).expect("Element converts to JsValue")
}

-/// parses text to json AST (text elements and labled links, to replicate current desktop implementation)
+/// parses text to json AST (text elements and labeled links, to replicate current desktop implementation)
#[wasm_bindgen]
pub fn parse_desktop_set(s: &str) -> JsValue {
serde_wasm_bindgen::to_value(&deltachat_message_parser::parser::parse_desktop_set(s))
2 changes: 1 addition & 1 deletion rust-toolchain
@@ -1 +1 @@
-1.64.0
+1.77.2
1 change: 1 addition & 0 deletions spec.md
@@ -41,6 +41,7 @@ Make email addresses clickable, opens the chat with that contact and creates it
Make URLs clickable.

- detect all valid hyperlink URLs that have the `://` (protocol://host).
+- according to [RFC3987](https://www.rfc-editor.org/rfc/rfc3987) and [RFC3988](https://www.rfc-editor.org/rfc/rfc3988)

- other links like `mailto:` (note there is just a single `:`, no `://`) will get separate parsing that includes a whitelisted protocol name, otherwise there will likely be unexpected behavior if user types `hello:world` - will be recognized as link.
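
The distinction drawn in the spec excerpt above, between `scheme://` links and whitelisted single-colon schemes, can be sketched roughly as follows. This is an illustrative assumption, not the crate's implementation: `LinkKind`, `SCHEME_WHITELIST`, `is_valid_scheme` and `classify_link` are hypothetical names, and the whitelist content is made up for the example.

```rust
/// Hypothetical classification of a link candidate (not the crate's API).
#[derive(Debug, PartialEq)]
enum LinkKind {
    /// `scheme://...`: any syntactically valid scheme is accepted.
    Generic,
    /// `scheme:...`: accepted only when the scheme is whitelisted.
    Whitelisted,
}

/// Assumed whitelist, for illustration only.
const SCHEME_WHITELIST: &[&str] = &["mailto"];

/// Scheme syntax per RFC 3986: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ).
fn is_valid_scheme(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'))
}

/// Classifies a string that is expected to start with a scheme.
/// Returns `None` when the input should be treated as plain text.
fn classify_link(input: &str) -> Option<LinkKind> {
    let (scheme, rest) = input.split_once(':')?;
    if !is_valid_scheme(scheme) {
        return None;
    }
    if let Some(after_slashes) = rest.strip_prefix("//") {
        // `scheme://` with nothing after it is not a usable link.
        return if after_slashes.is_empty() {
            None
        } else {
            Some(LinkKind::Generic)
        };
    }
    let whitelisted = SCHEME_WHITELIST
        .iter()
        .any(|allowed| allowed.eq_ignore_ascii_case(scheme));
    if whitelisted && !rest.is_empty() {
        Some(LinkKind::Whitelisted)
    } else {
        None
    }
}
```

With this sketch, `hello:world` falls through to `None` because `hello` is not whitelisted, which is the behavior the spec asks for, while `mailto:` addresses and any `scheme://` link are still recognized.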

2 changes: 1 addition & 1 deletion src/lib.rs
@@ -9,7 +9,7 @@
clippy::get_last_with_len,
clippy::get_unwrap,
clippy::get_unwrap,
-clippy::integer_arithmetic,
+clippy::arithmetic_side_effects,
clippy::match_on_vec_items,
clippy::match_wild_err_arm,
clippy::missing_panics_doc,