Ubergraph filters #28

Open
1 task done
dosumis opened this issue Aug 3, 2023 · 7 comments

Comments

Contributor

dosumis commented Aug 3, 2023

STATUS: DRAFT

Problem

Direct queries of the Ubergraph redundant and non-redundant graphs often give suboptimal results for biologist-facing use cases. The redundant

Case 1: Most precise object term needed.

Querying the redundant graph returns object terms that are too abstract. Querying the non-redundant graph fails to return any object term in cases where redundancy stripping assumes users can infer it from the properties of a more general class.

Examples:

| cell_ontology | GO |
| --- | --- |
| sensory epithelial cell | biological_process |
| interneuron | biological_process |
| motor neuron | biological_process |
| sensory neuron | biological_process |
| polymodal neuron | biological_process |

  • Querying the non-redundant graph gives too little, e.g., the GABAergic link to GO BP appears only on the most general grouping class.

https://api.triplydb.com/s/166HwhWEo

| cell_ontology | GO |
| --- | --- |
| GABAergic neuron | gamma-aminobutyric acid secretion, neurotransmission |

The redundant graph returns 67 cell types:

| cell_ontology | GO |
| --- | --- |
| basket cell | gamma-aminobutyric acid secretion, neurotransmission |
| cerebellar Golgi cell | gamma-aminobutyric acid secretion, neurotransmission |
| GABAergic neuron | gamma-aminobutyric acid secretion, neurotransmission |
| Kolmer-Agduhr neuron | gamma-aminobutyric acid secretion, neurotransmission |
| rosehip neuron | gamma-aminobutyric acid secretion, neurotransmission |
| cerebral cortex GABAergic interneuron | gamma-aminobutyric acid secretion, neurotransmission |
| GABAergic interneuron | gamma-aminobutyric acid secretion, neurotransmission |

...

  • What we want:

| cell_ontology | GO |
| --- | --- |
| fan Martinotti neuron | biological_process |
| fan Martinotti neuron | transmission of nerve impulse |
| fan Martinotti neuron | secretion by cell |
| fan Martinotti neuron | acid secretion |
| fan Martinotti neuron | gamma-aminobutyric acid secretion, neurotransmission |
| fan Martinotti neuron | secretion |
| fan Martinotti neuron | transport |
| fan Martinotti neuron | cellular process |
| fan Martinotti neuron | biological regulation |
| fan Martinotti neuron | regulation of neurotransmitter levels |
| fan Martinotti neuron | system process |
| fan Martinotti neuron | neurotransmitter transport |
| fan Martinotti neuron | neurotransmitter secretion |
| fan Martinotti neuron | signal release |
| fan Martinotti neuron | multicellular organismal process |
| fan Martinotti neuron | nervous system process |
| fan Martinotti neuron | gamma-aminobutyric acid secretion |
| fan Martinotti neuron | gamma-aminobutyric acid transport |
| fan Martinotti neuron | localization |
| fan Martinotti neuron | establishment of localization |
| fan Martinotti neuron | regulation of biological quality |
| fan Martinotti neuron | organic substance transport |
| fan Martinotti neuron | export from cell |
| fan Martinotti neuron | establishment of localization in cell |
| fan Martinotti neuron | signal release from synapse |

-->

| cell_ontology | GO |
| --- | --- |
| fan Martinotti neuron | transmission of nerve impulse |
| fan Martinotti neuron | gamma-aminobutyric acid secretion, neurotransmission |

  • Proposed Solution
    For each subject, query for all subClassOf relationships between the object terms returned. Then filter out every triple from the original query whose object term has a subclass among those object terms, so only the most specific terms remain. Done naively, this would require many secondary queries and so would be inefficient. Is there some clever way to do this in SPARQL with subqueries?
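
A minimal sketch of the filtering step described above, done in memory in Python rather than in SPARQL. The function name `most_specific` and the input shapes are illustrative assumptions, not part of Ubergraph or OAK. The subClassOf pairs are a small hand-written subset chosen to match the reduction shown for fan Martinotti neuron, and are assumed to be transitively closed.

```python
from collections import defaultdict


def most_specific(pairs, subclass_of):
    """Keep (subject, object) pairs whose object has no proper subclass
    among the objects returned for the same subject.

    pairs: iterable of (subject, object) tuples from the original query.
    subclass_of: set of (child, parent) tuples, assumed transitively closed.
    """
    objects_by_subject = defaultdict(set)
    for subject, obj in pairs:
        objects_by_subject[subject].add(obj)

    kept = []
    for subject, objects in objects_by_subject.items():
        for obj in sorted(objects):
            # Ignore reflexive subClassOf entries (a class is its own subclass).
            has_narrower = any(
                other != obj and (other, obj) in subclass_of
                for other in objects
            )
            if not has_narrower:
                kept.append((subject, obj))
    return kept


# Illustrative data only: a few rows from the fan Martinotti neuron example.
GABA_NT = "gamma-aminobutyric acid secretion, neurotransmission"
pairs = [
    ("fan Martinotti neuron", "biological_process"),
    ("fan Martinotti neuron", "secretion"),
    ("fan Martinotti neuron", "gamma-aminobutyric acid secretion"),
    ("fan Martinotti neuron", GABA_NT),
    ("fan Martinotti neuron", "nervous system process"),
    ("fan Martinotti neuron", "transmission of nerve impulse"),
]
subclass_of = {
    (GABA_NT, "gamma-aminobutyric acid secretion"),
    (GABA_NT, "secretion"),
    (GABA_NT, "biological_process"),
    ("gamma-aminobutyric acid secretion", "secretion"),
    ("gamma-aminobutyric acid secretion", "biological_process"),
    ("secretion", "biological_process"),
    ("transmission of nerve impulse", "nervous system process"),
    ("transmission of nerve impulse", "biological_process"),
    ("nervous system process", "biological_process"),
}

result = most_specific(pairs, subclass_of)
for subject, obj in result:
    print(subject, "|", obj)
```

This needs one query for the subject-object pairs and one for the subClassOf pairs between object terms, rather than one secondary query per subject.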

Case 2: Graph-view generation

Aim: simple redundancy stripping that does not assume users can deal with inheritance of properties down the class hierarchy.

{details and examples TBA}

Contributor Author

dosumis commented Aug 3, 2023

@balhoff - wondering if there is a way to add a third graph with less redundancy stripping than the nonredundant graph that fulfills this use case. Also would like to consider OAK as the best home for methods to take advantage of this if we can't fix with additional graphs.

Member

balhoff commented Aug 3, 2023

@dosumis what do you think about these query results? https://api.triplydb.com/s/MGJqUWP_6

Contributor Author

dosumis commented Aug 3, 2023

Not quite. e.g. the first entry here is redundant with the second in both of these cases:

[Two images of query results; not preserved]

Finding it hard to get my head around why this works as well as it does.

Member

balhoff commented Aug 3, 2023

@dosumis I think this one more directly encodes what you're looking for, but sometimes NOT EXISTS is expensive. It seems to work in this case: https://api.triplydb.com/s/c5DmHCzmG

and I think it gives the right results for those two terms: https://api.triplydb.com/s/SinnoFp-e
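
The linked queries are not reproduced in this thread, so the following is only a guess at the general shape of a NOT EXISTS filter for this use case, built as a string in Python. The function name, the template, and all IRIs (`http://example.org/...`) are hypothetical placeholders; substitute the real Ubergraph graph and relation IRIs before running it against an endpoint.

```python
from string import Template

# string.Template is used because SPARQL is full of curly braces,
# which would need escaping in an f-string or str.format().
QUERY_TEMPLATE = Template("""\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?subject ?object
WHERE {
  GRAPH <$data_graph> { ?subject <$relation> ?object . }
  # Drop ?object if the same subject also links to a proper subclass of it.
  FILTER NOT EXISTS {
    GRAPH <$data_graph> { ?subject <$relation> ?narrower . }
    GRAPH <$hierarchy_graph> { ?narrower rdfs:subClassOf ?object . }
    FILTER (?narrower != ?object)
  }
}
""")


def build_most_specific_query(relation, data_graph, hierarchy_graph):
    """Return a SPARQL query string that keeps only the most specific objects.

    All three arguments are IRIs supplied by the caller (assumptions here).
    """
    return QUERY_TEMPLATE.substitute(
        relation=relation,
        data_graph=data_graph,
        hierarchy_graph=hierarchy_graph,
    )


# Hypothetical placeholder IRIs, for illustration only.
query = build_most_specific_query(
    relation="http://example.org/relation/capable_of",
    data_graph="http://example.org/graph/redundant",
    hierarchy_graph="http://example.org/graph/subclass_closure",
)
print(query)
```

The inner `FILTER (?narrower != ?object)` matters when the hierarchy graph contains reflexive subClassOf triples; without it every row would be filtered out.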

Contributor Author

dosumis commented Aug 4, 2023

That works & makes sense too. Thanks! (I think too slowly in SPARQL)

Member

balhoff commented Aug 4, 2023

Do you want to try the query approach and hold off on thinking about new graphs for the moment?

Contributor Author

dosumis commented Aug 6, 2023

I think the query approach makes the most sense & seems to scale well. We could always revisit the new graph approach in future if use-cases justify it.
