Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Struct unmarshmaling and more #29

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Struct unmarshmaling and more #29

wants to merge 6 commits into from

Conversation

buger
Copy link
Owner

@buger buger commented Mar 29, 2016

This is ongoing PR but i wanted to show what i'm working on right now.

There is 2 major changes here:

  • Support for querying multiple keys at once
  • Struct unmarshmaling, same way as encoding/json works (use feature above)

I added new API function KeyOffset which returns offsets in base data structure for all keys at once, so it can do less reads.

Here is example taken from benchmark:

r := smallFixture
offsets := jsonparser.KeyOffsets(r,
    []string{"uuid"},
    []string{"tz"},
    []string{"ua"},
    []string{"st"},
)

jsonparser.Get(r[offsets[0]:])
jsonparser.GetInt(r[offsets[1]:])
jsonparser.Get(r[offsets[2]:])
jsonparser.GetInt(r[offsets[3]:])

As you can see from UX perspective it looks way worse, but it can give significant speedup, see benchmarks below. As future improvement i wanted implement iterator like access, similar to ArrayEach which reduce number of reads even more.

Based on feature above, i implemented basic version of Unmarshaler, you can check the code and bechmarks (looks not bad, considering that it does not use code generation). As unique feature i plan add support for nested keys, so you can specify them like:

struct Person {
    githubFollowers int `json:"person.github.followers_count"`
    ...
}

Here is initial benchmarks:

BenchmarkJsonParserLarge              100000         75014 ns/op          32 B/op          2 allocs/op
BenchmarkJsonParserLargeOffsets       100000         81327 ns/op         528 B/op         33 allocs/op
BenchmarkJsonParserMedium             500000         15577 ns/op          48 B/op          3 allocs/op
BenchmarkJsonParserMediumOffsets     1000000          7757 ns/op          80 B/op          4 allocs/op
BenchmarkJsonParserSmall             5000000          1289 ns/op           0 B/op          0 allocs/op
BenchmarkJsonParserSmallOffsets     10000000          1065 ns/op          32 B/op          1 allocs/op
BenchmarkJsonParserSmallStruct       2000000          3280 ns/op         384 B/op         11 allocs/op

KeyOffsets can be optimized much more in my view.

@buger
Copy link
Owner Author

buger commented Mar 29, 2016

@daboyuka
Copy link
Contributor

This looks really cool. Thanks for keeping the offset-computing code separate so as to not slow down the simpler searchKeys case (this is nice for me since I'm not sure the multi-key search will be needed for my use case).

@daboyuka
Copy link
Contributor

What if you made a wrapper around the core functions for a more user-friendly experience? E.g.:

lookup := NewMultiKeyLookup(data)
refs := lookup.MultiGet(
    []string{"a"},
    []string{"b"},
    []string{"c"},
)

val1 := lookup.Get(ref[0])
val2 := lookup.GetInt(ref[1])
val3 := lookup.Get(ref[2])

The refs returned by MultiGet would probably just be the offset to the value, but could be typedefed to be something meaningful (valueref or something).

The idea is to hide the need for slicing the data and calling Get() for each offset lookup inside the API.

Just thinking out loud, not sure this is actually a good idea :).

@buger
Copy link
Owner Author

buger commented Mar 29, 2016

Yeah, this could be a good idea, and we can hide caching inside.

@buger
Copy link
Owner Author

buger commented Mar 29, 2016

Well actually syntax above is not supported, as Go do not support variable list of returned arguments

@daboyuka
Copy link
Contributor

Doh, you're right, I guess I meant refs := and then refs[0], etc. Edited.

@Allendar
Copy link

Allendar commented Jun 10, 2016

I want to test this soon with some big data-sets and will supply the results here. Just by mere conversion from ffjson for inspections on JSON on multiple of my API's I'm seeing very reliable 9,21 (1k entries) to 9,38 (100k entries) times more speed and way lower footprint. Can't wait to try this out next! Thanks so much for this amazing contribution!

Update: Maybe there's also focus points here. These are the hotspots on my pprof:

(pprof) top5
1060ms of 1260ms total (84.13%)
Showing top 5 nodes out of 48 (cum >= 110ms)
      flat  flat%   sum%        cum   cum%
     470ms 37.30% 37.30%      470ms 37.30%  github.com/buger/jsonparser.stringEnd
     210ms 16.67% 53.97%      210ms 16.67%  runtime.usleep
     150ms 11.90% 65.87%      460ms 36.51%  github.com/buger/jsonparser.blockEnd
     140ms 11.11% 76.98%      320ms 25.40%  github.com/buger/jsonparser.searchKeys

@buger
Copy link
Owner Author

buger commented Jun 11, 2016

@Allendar also check this change that was merged yesterday #45

Also, I'll be frank, this PR is not ready yet and supports only simple structs.

@ryanleary
Copy link

@buger Is work continuing on this effort?

@Villenny
Copy link
Contributor

Doh, you're right, I guess I meant refs := and then refs[0], etc. Edited.

Doh, you're right, I guess I meant refs := and then refs[0], etc. Edited.

Have you considered some variation on:

type ValueOffset int
type ValueOffsets []ValueOffset 

lookup := jsonparser.ParseKeys(make([]ValueOffset, 3)[:],
    []string{"a"},
    []string{"b"},
    []string{"c"},
)

val1 := lookup.Get(0)
val2 := lookup.GetInt(1)
val3 := lookup.GetUnsafeString(2)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants