Graph Data Management Lab, School of Computer Science gdm.fudan.edu.cn Luyiqi Locus based alignment storage.


1 Locus based alignment storage & XMLSnippet
Luyiqi, Graph Data Management Lab (GDM@FUDAN), School of Computer Science
Email: luyiqi@gmail.com, gdm.fudan.edu.cn

2 Summary of works
• Locus based storage: A Novel Approach for Alignments Output Storage Problem Facing Clinical Scenarios
• XMLSnippet

3 Background
• A sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of relationships between the sequences.
• Data storage costs have become an appreciable proportion of total cost.
• The rate of increase in DNA sequencing is significantly outstripping the rate of increase in disk storage capacity.

4 Related works
• Reference-based compression (Markus Hsi-Yang Fritz et al., 2011)
• SAM/BAM toolset

5 Basic Idea
• Locus based storage

6 Basic Idea
• Objective: minimize the number of intervals by renumbering the reads.
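To make the objective concrete, here is a minimal sketch. It assumes that, at each locus, the read numbers carrying a given base are stored as maximal runs of consecutive integers (intervals). The helper `to_intervals`, the sample read numbers, and the `renumber` mapping are all hypothetical and do not come from the slides.

```python
def to_intervals(read_numbers):
    """Collapse read numbers into sorted maximal runs [(start, end), ...]."""
    intervals = []
    for n in sorted(set(read_numbers)):
        if intervals and n == intervals[-1][1] + 1:
            # n extends the current run by one
            intervals[-1] = (intervals[-1][0], n)
        else:
            # n starts a new run
            intervals.append((n, n))
    return intervals


# Hypothetical locus: reads 1, 3, 5 carry base A; reads 2, 4, 6 carry base C.
before = {"A": [1, 3, 5], "C": [2, 4, 6]}

# Hypothetical renumbering that puts reads with the same base side by side.
renumber = {1: 1, 3: 2, 5: 3, 2: 4, 4: 5, 6: 6}
after = {base: [renumber[n] for n in nums] for base, nums in before.items()}

count_before = sum(len(to_intervals(v)) for v in before.values())
count_after = sum(len(to_intervals(v)) for v in after.values())
print(count_before, count_after)  # 6 2
```

In this toy case the renumbering cuts the interval count from 6 to 2, which is the quantity the objective asks to minimize.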

7 Basic Idea
• Solution
1. Reduce to the Travelling Salesman Problem (TSP) to get a suboptimal read numbering function (NF).
2. Generate the intervals under the new NF.
3. Further optimization.
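The slides do not spell out the reduction, so the following is only a sketch under stated assumptions: every read covers the same loci, the cost between two reads is the number of loci where their bases differ (each difference forces an interval break when the two reads are numbered consecutively), and a greedy nearest-neighbour tour stands in for whatever TSP heuristic the authors used. The read contents are invented for illustration.

```python
def hamming(a, b):
    """Number of loci at which two equal-length reads differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbour_order(reads):
    """Greedy TSP path: always move to the closest unvisited read."""
    names = list(reads)
    order = [names[0]]
    remaining = set(names[1:])
    while remaining:
        last = reads[order[-1]]
        # sorted() makes tie-breaking deterministic
        nxt = min(sorted(remaining), key=lambda n: hamming(last, reads[n]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def count_intervals(reads, order):
    """Count runs of equal bases per locus when reads are numbered in `order`."""
    total = 0
    for locus in range(len(reads[order[0]])):
        prev = None
        for name in order:
            base = reads[name][locus]
            if base != prev:
                total += 1
            prev = base
    return total


# Hypothetical reads (not taken from the slides).
reads = {
    "r001": "ACGT", "r002": "TCGA", "r003": "ACGT",
    "r004": "TCGA", "r005": "ACGA", "r006": "TCGT",
}
original = sorted(reads)
renumbered = nearest_neighbour_order(reads)
print(count_intervals(reads, original), count_intervals(reads, renumbered))  # 13 7
```

The greedy tour is not guaranteed to be optimal, which is consistent with the slide's wording of a "suboptimal" numbering function.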

8 Basic Idea
• Solution
1. Reduce to the Travelling Salesman Problem (TSP) to get a suboptimal read numbering function (NF).
2. Generate the intervals under the new NF.
3. Further optimization.

9 Travelling Salesman Problem

10 Reduction

11 Example
r001 r006 r005 r002 r003 r004

12 Example (Re-numbering)
r001 r006 r005 r002 r003 r004

13 Basic Idea
• Solution
1. Reduce to the Travelling Salesman Problem (TSP) to get a suboptimal read numbering function (NF).
2. Generate the intervals under the new NF.
3. Further optimization.

14 Optimization Technique

15 Optimization Technique
• Ordered Intervals Refinement
Current intervals: T [5,6],10   C 4,[7,8]   G [2,3],9,11   A 1,[12,15]
1. Determine the ACGT value order.
2. Refine the intervals: T [5,6],10   C [4,8]   G [2,11]   A [1,15]
3. How to restore?
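One way to read the refinement step, sketched under assumptions: bases are processed in a fixed order, and a gap between two read numbers of the current base may be filled in whenever every number in that gap already belongs to an earlier base. The function names are hypothetical, and the gap-filling rule is inferred from the slide's example rather than stated in it.

```python
def to_intervals(numbers):
    """Collapse integers into sorted maximal runs [(start, end), ...]."""
    intervals = []
    for n in sorted(set(numbers)):
        if intervals and n == intervals[-1][1] + 1:
            intervals[-1] = (intervals[-1][0], n)
        else:
            intervals.append((n, n))
    return intervals


def refine(real_sets, order):
    """Fill gaps that are fully covered by read numbers of earlier bases."""
    seen = set()      # read numbers claimed by bases processed so far
    stored = {}
    for base in order:
        real = sorted(real_sets[base])
        filled = set(real)
        for a, b in zip(real, real[1:]):
            gap = set(range(a + 1, b))
            if gap and gap <= seen:
                filled |= gap  # safe: restore can subtract these later
        stored[base] = filled
        seen |= filled
    return stored


# Read numbers per base, taken from the slide's "current" intervals.
real = {
    "T": {5, 6, 10},
    "C": {4, 7, 8},
    "G": {2, 3, 9, 11},
    "A": {1, 12, 13, 14, 15},
}
stored = refine(real, "TCGA")
stored_intervals = {base: to_intervals(s) for base, s in stored.items()}
print(stored_intervals)
# {'T': [(5, 6), (10, 10)], 'C': [(4, 8)], 'G': [(2, 11)], 'A': [(1, 15)]}
```

With the order T, C, G, A this reproduces the refined intervals shown on the slide: eight stored items shrink to five.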

16 Optimization Technique
• Restore
Origin (stored): T [5,6],10   C [4,8]   G [2,11]   A [1,15]
temp after T: [5,6],10   after C: [4,8],10   after G: [2,11]
real_now = org_now - temp; temp = temp + org_now;
Real (restored): T [5,6],10   C 4,[7,8]   G [2,3],9,11   A 1,[12,15]
OK!
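The restore rule `real_now = org_now - temp; temp = temp + org_now` can be sketched with set difference and set union, assuming `org_now` is the stored (refined) set for the current base and `temp` accumulates everything stored so far. The helper names `expand` and `restore` are hypothetical.

```python
def expand(intervals):
    """Turn [(start, end), ...] into the set of read numbers it covers."""
    return {n for start, end in intervals for n in range(start, end + 1)}


def restore(stored_sets, order):
    """Recover the real read numbers per base from the refined sets."""
    temp = set()
    real = {}
    for base in order:
        org_now = stored_sets[base]
        real[base] = org_now - temp   # real_now = org_now - temp
        temp = temp | org_now         # temp = temp + org_now
    return real


# Refined intervals from the slide, processed in the order T, C, G, A.
stored = {
    "T": expand([(5, 6), (10, 10)]),
    "C": expand([(4, 8)]),
    "G": expand([(2, 11)]),
    "A": expand([(1, 15)]),
}
real = restore(stored, "TCGA")
print({base: sorted(s) for base, s in real.items()})
# {'T': [5, 6, 10], 'C': [4, 7, 8], 'G': [2, 3, 9, 11], 'A': [1, 12, 13, 14, 15]}
```

The restored sets match the original intervals on the previous slide, which is why the bases must be decoded in the same order that was used for refinement.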

17 Binary Format and Index
• Inverted Table
• GZIP
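The slide gives no details of the binary layout, so the following is purely an illustrative sketch. It assumes an inverted table keyed by (locus, base), with each entry holding its intervals as little-endian 32-bit pairs compressed with GZIP. The layout, the key shape, and the locus value 100 are all hypothetical.

```python
import gzip
import struct


def pack_intervals(intervals):
    """Serialize [(start, end), ...] as a count plus uint32 pairs, then gzip."""
    payload = struct.pack("<I", len(intervals))
    for start, end in intervals:
        payload += struct.pack("<II", start, end)
    return gzip.compress(payload)


def unpack_intervals(blob):
    """Inverse of pack_intervals."""
    payload = gzip.decompress(blob)
    (count,) = struct.unpack_from("<I", payload, 0)
    return [struct.unpack_from("<II", payload, 4 + 8 * i) for i in range(count)]


# Hypothetical inverted table: (locus, base) -> compressed interval list.
inverted_table = {
    (100, "T"): pack_intervals([(5, 6), (10, 10)]),
    (100, "C"): pack_intervals([(4, 8)]),
}


def reads_at(table, locus, base):
    """Look up which read-number intervals carry `base` at `locus`."""
    blob = table.get((locus, base))
    return unpack_intervals(blob) if blob is not None else []


print(reads_at(inverted_table, 100, "T"))  # [(5, 6), (10, 10)]
```

Compressing each entry separately is chosen here only to keep lookups independent; for entries this small GZIP's overhead outweighs any saving, so a real format would more likely compress larger blocks.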

18 Summary of works
• Locus based storage: A Novel Approach for Alignments Output Storage Problem Facing Clinical Scenarios
• XMLSnippet

19 Motivation of XMLSnippet
• Framework-based applications are very popular in the current commercial application-building environment. The leading framework-based applications are J2EE applications.
• Frameworks vary, and each one has a learning curve for newcomers.
• Most of these frameworks are open source and come from open communities. The documentation may not be complete, and the programmer may not have enough time to master all the details.

20 XMLSnippet – Related work
• Non-mining based: compare the context of the code under development with the code samples in example repositories; interact with a code search engine to gather relevant code samples and perform static analysis over the gathered samples.
• Mining based: mine sequence association rules.
• Predefined: directly generate the sub-elements/attributes of a certain element based on the predefined schema of the XML files.

21 Our solution
• Basic idea

22 XMLSnippet – basic idea

23 XMLSnippet - Framework

24 XMLSnippet – Key techniques
• Closed frequent tree pattern mining
• XML tree pattern & syntax tree pattern mapping
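Full closed frequent tree pattern mining is beyond a short example. As a much simpler stand-in, the sketch below counts only two-node patterns (parent tag, child tag) across a set of XML documents and keeps those that reach a minimum support. It does not compute closed patterns and is not the authors' algorithm; the sample documents and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def edge_patterns(xml_text):
    """Return the set of (parent tag, child tag) pairs in one document."""
    root = ET.fromstring(xml_text)
    edges = set()
    for parent in root.iter():
        for child in parent:
            edges.add((parent.tag, child.tag))
    return edges


def frequent_edges(documents, min_support):
    """Keep the pairs that occur in at least `min_support` documents."""
    support = Counter()
    for doc in documents:
        # a set per document, so each pair counts once per document
        support.update(edge_patterns(doc))
    return {edge: n for edge, n in support.items() if n >= min_support}


# Hypothetical configuration files from a framework-based project.
documents = [
    "<beans><bean><property/></bean></beans>",
    "<beans><bean><property/><property/></bean></beans>",
    "<beans><import/></beans>",
]
frequent = frequent_edges(documents, min_support=2)
print(frequent)
```

A frequent pair such as `bean` containing `property` is the kind of pattern a snippet recommender could offer when the programmer starts typing a `bean` element.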

