Big Table Questions #30

Open
sr320 opened this issue Oct 3, 2016 · 12 comments

Comments

@sr320
Owner

sr320 commented Oct 3, 2016

Hi @mdelrio1

Thanks for updating the "Big Table"! I have a couple of questions and suggestions.

  1. It looks like the annotation is duplicated: the columns I marked in pink on the far right are redundant with the columns marked with the thin blue line. Could the blue-line columns be removed?
  2. I suggest removing the columns marked with black lines.
  3. For expression, I would include only the columns you used. I believe that was unique, not total?

[Snapshot image: bt]

@mdelrio1
Collaborator

mdelrio1 commented Oct 3, 2016

Hello Steven
@sr320
1, 2. Yes, they are redundant columns; I was checking this earlier today. I'll remove the blue- and black-marked columns.
3. Yes, I only used unique, so I'll delete the total column too.

Also, I was wondering whether we need to add the data from the BLAST results to the table:
Column5 = pident
Column6 = length
Column7 = mismatch
Column8 = gapopen
Column9 = qstart
Column10 = qend
Column11 = sstart
Column12 = send
so that we can change the sequences that need to be reverse-complemented and place them in the FASTA file.
I have not been able to find these results in the repository, only
paper-pano-go/data-results/Geoduck-transcriptome-v2-GO-Slim.csv
However, this file does not agree with
Geoduck-transcriptome_v3.fa
since some contigs are not in both files (for instance, comp100065_c0_seq1 is in v3 but not in v2-GO-Slim).
Do you have it? Please let me know. Thank you
Miguel
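
A minimal sketch of how the coordinate columns listed above could be used to decide orientation. It assumes blastx-style tabular output in which `qstart > qend` means the hit lies on the reverse strand of the contig; the function names and the example sequence are hypothetical and are not taken from the repository.

```python
# Hypothetical helpers; assumes blastx-style tabular hits where
# qstart > qend indicates a hit on the reverse strand of the query contig.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hit_is_reverse(qstart, qend):
    """True when the query coordinates run from high to low."""
    return qstart > qend


def orient(seq, qstart, qend):
    """Return seq unchanged for forward hits, reverse-complemented otherwise."""
    return reverse_complement(seq) if hit_is_reverse(qstart, qend) else seq


# Made-up example sequence and coordinates.
example = "ATGCCG"
forward = orient(example, 1, 6)
flipped = orient(example, 6, 1)
print(forward, flipped)
```

Whether the contigs should be flipped at all is a separate decision; the sketch only shows how the orientation could be read from the table.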

@mdelrio1
Collaborator

mdelrio1 commented Oct 4, 2016

Hi @sr320
Do you want me to delete the old big-table and replace it with the new one, without the repeated columns and the other (black-marked) columns?

@sr320
Owner Author

sr320 commented Oct 4, 2016

Sure - you could just overwrite it. GitHub will keep the older versions.

I do not think the sequences need to be (or should be) flipped, though we can discuss.

I will be at the Hatchery all day tomorrow so cannot make the call. Should we try to reschedule?
Maybe 1pm Thursday?

@mdelrio1
Collaborator

mdelrio1 commented Oct 5, 2016

Hi @sr320
I have a meeting on Thursday at 11:00. I hope it finishes before 1:00 pm; in case it doesn't, could you please wait for me? Thanks.
I'll update the files.

@mdelrio1
Collaborator

mdelrio1 commented Oct 5, 2016

The files are up to date:
paper-pano-go/jupyter-nbs/10panopeadataresults.ipynb [https://github.com/sr320/paper-pano-go/blob/master/jupyter-nbs/10panopeadataresults.ipynb]
and
paper-pano-go/data-results/Geoduck-transcriptome_v3_bigtable.csv.zip [https://github.com/sr320/paper-pano-go/blob/master/data-results/Geoduck-transcriptome_v3_bigtable.csv.zip]

@mdelrio1
Collaborator

Hi Steven @sr320
I'm trying to add the matching files in order to calculate the GC and CpG content, taking into account whether each sequence matches in the forward orientation (5'-3') or as the reverse complement (3'-5'), but the file
paper-pano-go/jupyter-nbs/analyses/Geoduck-tranv3-blastx_sprot.sorted
(where all the BLAST results are) does not match
paper-pano-go/data-results/Geoduck-transcriptome-v3.fa.zip
Please let me know if you have the file (I don't understand what happened when we talked about it, sorry).
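
A minimal sketch of the two measures mentioned above, assuming GC content is the fraction of G and C bases and that "CpG" refers to the commonly used observed/expected ratio, (number of CG × length) / (number of C × number of G). The thread does not state the exact formula used in the notebook, so both function names and the definition are assumptions.

```python
def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_oe(seq):
    """CpG observed/expected ratio; None when it is undefined (no C or no G)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return None
    return seq.count("CG") * len(seq) / (c_count * g_count)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


# Made-up example sequence.
example = "ACGTCGGA"
print(gc_content(example), cpg_oe(example))
print(gc_content(reverse_complement(example)), cpg_oe(reverse_complement(example)))
```

Under these definitions both values are the same for a sequence and its reverse complement, because the reverse complement of the dinucleotide CG is CG and the C and G counts simply swap.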

@sr320
Owner Author

sr320 commented Oct 12, 2016

@mdelrio1
I will take a look. I am at a conference today, so I cannot make the Skype call (sorry, I think I forgot to tell you before now).

Be in touch soon.
thanks

@mdelrio1
Collaborator

@sr320
Don't worry, let me know when you come back.
Take care

@sr320
Owner Author

sr320 commented Oct 18, 2016

@mdelrio1 Can you clarify what you mean when you say:

paper-pano-go/jupyter-nbs/analyses/Geoduck-tranv3-blastx_sprot.sorted
(where all the blast results are) does not match the
paper-pano-go/data-results/Geoduck-transcriptome-v3.fa.zip
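
One way to make "does not match" concrete is to compare the contig IDs in the two files. The sketch below is a hedged illustration, assuming the FASTA headers start with the contig ID and that the BLAST table is tab-separated with the contig ID in a known column (`id_column` is a hypothetical parameter; the real column position is not stated in this thread). Apart from comp100065_c0_seq1, which is quoted earlier in the thread, the IDs and lines are made up.

```python
def fasta_ids(lines):
    """Collect sequence IDs (the first word after '>') from FASTA lines."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def blast_query_ids(lines, id_column=0):
    """Collect contig IDs from tab-separated BLAST lines (id_column is assumed)."""
    ids = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.strip() and len(fields) > id_column:
            ids.add(fields[id_column])
    return ids


# Made-up stand-ins for the two files.
fasta_lines = [">comp100065_c0_seq1 len=300", "ATGC", ">contig_A", "GGCC"]
blast_lines = ["contig_A\thit_1\t90.0", "contig_B\thit_2\t80.0"]

only_in_fasta = fasta_ids(fasta_lines) - blast_query_ids(blast_lines)
only_in_blast = blast_query_ids(blast_lines) - fasta_ids(fasta_lines)
print(sorted(only_in_fasta), sorted(only_in_blast))
```

IDs that appear only in the FASTA may simply be contigs with no BLAST hit; IDs that appear only in the BLAST table would suggest the table was built from a different version of the transcriptome.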

@mdelrio1
Collaborator

Hi @sr320
Please check paper-pano-go/jupyter-nbs/10Panopea_databases.ipynb
https://github.com/sr320/paper-pano-go/blob/master/jupyter-nbs/10Panopea_databases.ipynb
I think I optimised some cells and finally got the merging working properly!
There are still some issues. I think we could reduce the number of columns and probably split the data file into two:
a) contigs with general information and expression levels (sex included): all 153982
b) blastx, gigaton, ruphi, Dh, and sex expression values: only 22974 contigs
This is in order to reduce the number of empty cells.
Check In[54], where it shows all the column names.
I tried to remove redundant columns but found that some data are missing when I compared the "UniProt_Acc", "sseqid2", and "SPID" columns (this was in the last In cell, but I cleared it before uploading the file, sorry). I hope I can run it tomorrow morning and insert this information. Let's talk about it tomorrow.
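
The proposed split can be sketched as follows. This is a minimal illustration using plain dictionaries rather than the notebook's actual code; `split_table`, the column names other than "SPID", and the example rows (including the accession) are hypothetical.

```python
def split_table(rows, annotation_columns):
    """Split merged rows into (general, annotated).

    general:   every row, with the annotation columns dropped.
    annotated: only rows with at least one non-empty annotation value,
               keeping all of their columns.
    """
    annotated = [
        row for row in rows if any(row.get(col) for col in annotation_columns)
    ]
    general = [
        {key: value for key, value in row.items() if key not in annotation_columns}
        for row in rows
    ]
    return general, annotated


# Made-up rows standing in for the merged big table.
rows = [
    {"contig": "contig_A", "unique_reads": "12", "SPID": "P00001"},
    {"contig": "contig_B", "unique_reads": "0", "SPID": ""},
    {"contig": "contig_C", "unique_reads": "7", "SPID": ""},
]
general, annotated = split_table(rows, ["SPID"])
print(len(general), len(annotated))
```

The general table keeps every contig with no empty annotation cells, while the annotated table holds only the contigs that have annotation values.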

@sr320
Owner Author

sr320 commented Oct 26, 2016

Sounds good - but I guess I do not see an issue with empty cells.

@mdelrio1
Collaborator

@sr320
the jupyter notebooks are ready
https://github.com/sr320/paper-pano-go/blob/master/jupyter-nbs/10Panopea_databases.ipynb
for the big and the small table.
The tables themselves are in the data-results folder
https://github.com/sr320/paper-pano-go/tree/master/data-results
I couldn't upload files larger than 25 MB, so I had to compress the bigtable file.
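
A minimal sketch of compressing a large CSV before upload, using Python's standard `zipfile` module. The helper name `zip_file` and the demo file are hypothetical; the thread does not say which tool was actually used to create the .zip.

```python
import os
import tempfile
import zipfile


def zip_file(src_path, zip_path=None):
    """Write src_path into a deflate-compressed zip archive and return its path."""
    zip_path = zip_path or src_path + ".zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store only the file name, not the full directory path.
        archive.write(src_path, arcname=os.path.basename(src_path))
    return zip_path


# Demo with a made-up, highly repetitive CSV in a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_csv = os.path.join(demo_dir, "demo_bigtable.csv")
with open(demo_csv, "w") as handle:
    handle.write("contig,unique_reads\n" + "contig_A,0\n" * 10000)

demo_zip = zip_file(demo_csv)
print(os.path.getsize(demo_csv), os.path.getsize(demo_zip))
```

How much a real table shrinks depends on its content; tables with many empty or repeated cells usually compress well.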

The small table notebook is for processing small table data for the Venn diagrams in R and python
https://github.com/sr320/paper-pano-go/blob/master/jupyter-nbs/10Panopea_databases_smalltable.ipynb.

The Venn diagrams are in the manuscript figures folder
https://github.com/sr320/paper-pano-go/tree/master/manuscript/figures

I hope I'm not missing anything we talked about. Let me know.
