Iterator query over network stuck with CRAM on FTP #1877

Open
rick-heig opened this issue Jan 22, 2025 · 0 comments


rick-heig commented Jan 22, 2025

Hello,
I am accessing CRAM files over the network and sometimes sam_itr_querys gets stuck indefinitely (while still downloading data).

I have tested HTSlib 1.16 and 1.21 (built from a git checkout of each tag) and get the same behaviour with both.

This may be related to issue #604.


I open my files as follows and iterate over regions with sam_itr_querys():

        htsFile *fp = hts_open(cram_file.c_str(), "r");
        if (!fp) {
            std::string error("Cannot open ");
            error += cram_file;
            throw DataCallerError(error);
        }
        hts_idx_t *idx = sam_index_load(fp, std::string(cram_file + ".crai").c_str());
        if (!idx) {
            throw DataCallerError(std::string("Failed to load index file"));
        }
        sam_hdr_t *hdr = sam_hdr_read(fp);
        if (!hdr) {
            std::string error("Failed to read header from file ");
            error += cram_file;
            throw DataCallerError(error);
        }

        hts_itr_t *iter = NULL;
        while(...) { /* Iterate over many regions */
            if (iter) {
                 sam_itr_destroy(iter);
                 iter = NULL;
            }
            /* Assign to the outer iter; redeclaring it here would shadow it and leak the iterator */
            iter = sam_itr_querys(idx, hdr, region.c_str());
            ... do some work, e.g., pile up of reads ...
        }
       

Sometimes this works and I can read the CRAM data; other times the query hangs and never returns. While it hangs, network activity shows data being downloaded continuously. If I rerun the program, the query normally returns quickly and downloads only a small amount of data.
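
To narrow down which region triggers the hang, one option is a small watchdog around each query. The following is only a minimal sketch using the C++ standard library: `finished_within` is a hypothetical helper, not part of HTSlib, and it cannot cancel a stuck call. It only reports that the deadline passed, so the region can be logged before the program is aborted.

```cpp
#include <chrono>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical helper (not an HTSlib API): runs `task` on a helper thread
// and waits at most `timeout` for it to complete.
// Returns true if the task completed (or threw) in time, false if it is
// still running. On timeout the helper thread is left running detached,
// so this is meant for diagnosis only, not as a recovery mechanism.
bool finished_within(std::function<void()> task,
                     std::chrono::milliseconds timeout) {
    // The promise is shared so it outlives this function on timeout.
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> fut = done->get_future();

    std::thread([task = std::move(task), done]() {
        try {
            task();
            done->set_value();
        } catch (...) {
            // Record the exception instead of terminating the process.
            done->set_exception(std::current_exception());
        }
    }).detach();

    return fut.wait_for(timeout) == std::future_status::ready;
}
```

In the loop above, the call to sam_itr_querys() could be wrapped in a lambda passed to `finished_within` with a generous deadline (say, 60 seconds, an arbitrary choice); when it returns false, print the region string and abort, since the detached thread may still be using the file handle.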

When I interrupt my program, I get the following backtrace:

  * frame #0: 0x00007ff80ba8dd1a libsystem_kernel.dylib`__select + 10
    frame #1: 0x000000010010430c phase_caller`wait_perform(fp=0x000000010124c340) at hfile_libcurl.c:729:17 [opt]
    frame #2: 0x0000000100105710 phase_caller`libcurl_read(fpv=0x000000010124c340, bufferv=0x0000000102809000, nbytes=<unavailable>) at hfile_libcurl.c:834:17 [opt]
    frame #3: 0x0000000100049d86 phase_caller`refill_buffer(fp=0x000000010124c340) at hfile.c:186:13 [opt]
    frame #4: 0x000000010004a0ee phase_caller`hread2(fp=<unavailable>, destv=0x0000700007d75960, nbytes=43, nread=65493) at hfile.c:339:23 [opt]
    frame #5: 0x00000001000ca179 phase_caller`cram_seek [inlined] hread(fp=0x000000010124c340, buffer=0x0000700007d75960, nbytes=65536) at hfile.h:244:56 [opt]
    frame #6: 0x00000001000ca127 phase_caller`cram_seek(fd=<unavailable>, offset=11493247130, whence=<unavailable>) at cram_io.c:5453:20 [opt]
    frame #7: 0x00000001000bea42 phase_caller`cram_seek_to_refpos(fd=0x00000001003af000, r=0x0000700007d85af8) at cram_index.c:583:22 [opt]
    frame #8: 0x00000001000cabd4 phase_caller`cram_set_voption(fd=0x00000001003af000, opt=<unavailable>, args=0x0000700007d85ac0) at cram_io.c:5815:17 [opt]
    frame #9: 0x00000001000ca789 phase_caller`cram_set_option(fd=<unavailable>, opt=<unavailable>) at cram_io.c:5703:9 [opt]
    frame #10: 0x0000000100063b94 phase_caller`cram_itr_query(idx=0x000000010124c9d0, tid=16, beg=<unavailable>, end=248678, readrec=<unavailable>) at sam.c:1696:19 [opt]
    frame #11: 0x0000000100057b9e phase_caller`hts_itr_querys(idx=0x000000010124c9d0, reg="chr17:248676-248678", getid=(phase_caller`bam_name2id at sam.h:780), hdr=0x000000010124ce30, itr_query=(phase_caller`cram_itr_query at sam.c:1681), readrec=<unavailable>) at hts.c:4161:12 [opt]
    frame #12: 0x0000000100063d21 phase_caller`sam_itr_querys(idx=<unavailable>, hdr=<unavailable>, region=<unavailable>) at sam.c:1757:12 [opt] [artificial]

I tested with the following CRAM file and its index:

ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR323/ERR3239334/NA12878.final.cram
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR323/ERR3239334/NA12878.final.cram.crai

In some runs I have managed to execute a few thousand queries; in others it gets stuck after only a few.

If you have any insight into what to look for, I can do some debugging.
Thanks.
Rick
