Published by Lesley Anderson. Modified over 9 years ago.
1
Graph Data Management Lab, School of Computer Science Email: luyiqi@gmail.com GDM@FUDAN gdm.fudan.edu.cn Luyiqi gdm@fudan Locus based alignment storage & XMLSnippet
2
Summary of works
Locus based storage: A Novel Approach for Alignments Output Storage Problem Facing Clinical Scenarios
XMLSnippet
3
Background
A sequence alignment is a way of arranging sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of relationships between the sequences.
Data storage costs have become an appreciable proportion of total cost. The rate of increase in DNA sequencing is significantly outstripping the rate of increase in disk storage capacity.
4
Related works
Reference-based compression (Markus Hsi-Yang Fritz et al., 2011)
SAM/BAM toolset
5
Basic idea: locus based storage
6
Basic idea: objectives
Minimize the number of intervals by renumbering the reads.
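The objective above can be illustrated with a minimal Python sketch. It assumes that, at each locus, the read numbers carrying a given base are stored as maximal runs of consecutive integers (intervals); the helper names `to_intervals` and `renumber` and the sample read numbers are hypothetical and do not come from the slides.

```python
from typing import Dict, Iterable, List, Tuple


def to_intervals(read_ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse read numbers into maximal runs of consecutive integers."""
    intervals: List[Tuple[int, int]] = []
    for rid in sorted(set(read_ids)):
        if intervals and rid == intervals[-1][1] + 1:
            # Extend the current run by one.
            intervals[-1] = (intervals[-1][0], rid)
        else:
            # Start a new run.
            intervals.append((rid, rid))
    return intervals


def renumber(read_ids: Iterable[int], numbering: Dict[int, int]) -> List[int]:
    """Apply a numbering function (old read number -> new read number)."""
    return [numbering[r] for r in read_ids]


# Hypothetical locus: reads 1, 4 and 6 carry the same base here.
before = to_intervals([1, 4, 6])

# A hypothetical numbering function that places those reads next to each other.
numbering = {1: 1, 4: 2, 6: 3, 2: 4, 3: 5, 5: 6}
after = to_intervals(renumber([1, 4, 6], numbering))

print(before)  # three singleton intervals
print(after)   # a single interval
```

With the original numbering the three reads need three intervals; after renumbering they collapse into one, which is the effect the objective aims for across all loci.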
7
Basic idea: solution
1. Reduce to the Travelling Salesman Problem (TSP) to get a suboptimal read numbering function (NF).
2. Generate the intervals under the new NF.
3. Optimize further.
9
Travelling Salesman Problem
10
Reduction
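The transcript does not preserve the details of the reduction, so the following is only one plausible reading, sketched under explicit assumptions: each read is a "city", the distance between two reads is the number of (locus, base) sets that contain exactly one of them, and a short path through all reads gives a numbering with few interval boundaries. The greedy nearest-neighbour heuristic, the function names, and the sample data are all hypothetical; the slides only say that a suboptimal numbering function is obtained via TSP.

```python
from typing import Dict, List, Set, Tuple

Feature = Tuple[int, str]  # hypothetical encoding: (locus, base)


def distance(a: Set[Feature], b: Set[Feature]) -> int:
    """Count the (locus, base) sets that contain exactly one of the two reads."""
    return len(a ^ b)


def greedy_order(features: Dict[str, Set[Feature]]) -> List[str]:
    """Nearest-neighbour heuristic for a short path visiting every read once."""
    names = sorted(features)
    order = [names[0]]
    remaining = set(names[1:])
    while remaining:
        last = features[order[-1]]
        # Sort first so that ties are broken deterministically by read name.
        nxt = min(sorted(remaining), key=lambda r: distance(last, features[r]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical reads, each described by the (locus, base) pairs it covers.
reads: Dict[str, Set[Feature]] = {
    "r001": {(1, "A"), (2, "C")},
    "r002": {(5, "G"), (6, "T")},
    "r003": {(1, "A"), (2, "C"), (3, "G")},
    "r004": {(5, "G"), (6, "T"), (7, "A")},
}

order = greedy_order(reads)
numbering = {name: i + 1 for i, name in enumerate(order)}
print(order)
print(numbering)
```

In this toy input, reads that share loci end up adjacent in the new numbering, so the set of reads at each shared locus becomes a single interval. A real implementation would likely use a stronger TSP heuristic than nearest neighbour.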
11
Example: r001 r006 r005 r002 r003 r004
12
Example (re-numbering): r001 r006 r005 r002 r003 r004
13
Basic idea: solution
1. Reduce to the Travelling Salesman Problem (TSP) to get a suboptimal read numbering function (NF).
2. Generate the intervals under the new NF.
3. Optimize further.
14
Optimization technique
15
Optimization technique: ordered intervals refinement
Current intervals: T: [5,6],10   C: 4,[7,8]   G: [2,3],9,11   A: 1,[12,15]
1. Determine the order of the A/C/G/T values (here T, C, G, A).
2. Refine the intervals: T: [5,6],10   C: [4,8]   G: [2,11]   A: [1,15]
3. How to restore?
16
Optimization technique: restore
For each base in order: real_now = org_now - temp; temp = temp + org_now (read as set difference and set union, with temp initially empty).

Base   Origin (stored)   temp (before)   Real (restored)
T      [5,6],10          (empty)         [5,6],10
C      [4,8]             [5,6],10        4,[7,8]
G      [2,11]            [4,8],10        [2,3],9,11
A      [1,15]            [2,11]          1,[12,15]

The restored values match the original intervals. OK!
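The refine and restore steps can be checked with a short Python sketch using the numbers from the slide. It assumes that each read number appears under at most one base at a locus, and that refinement fills a gap inside a base's intervals only when every read number in that gap already belongs to an earlier base in the chosen order. The function names `refine` and `restore` are hypothetical; only the rule `real_now = org_now - temp; temp = temp + org_now` comes from the slide.

```python
from typing import List, Set, Tuple

Entry = Tuple[str, Set[int]]  # (base, read numbers at this locus)


def refine(ordered: List[Entry]) -> List[Entry]:
    """Fill gaps that are fully covered by earlier bases, merging intervals."""
    seen: Set[int] = set()
    stored: List[Entry] = []
    for base, real in ordered:
        ids = sorted(real)
        filled = set(ids)
        for lo, hi in zip(ids, ids[1:]):
            gap = set(range(lo + 1, hi))
            # Assumption: a gap may be filled only if earlier bases own all of it.
            if gap and gap <= seen:
                filled |= gap
        stored.append((base, filled))
        seen |= real
    return stored


def restore(stored: List[Entry]) -> List[Entry]:
    """Apply the slide's rule: real_now = org_now - temp; temp = temp + org_now."""
    temp: Set[int] = set()
    restored: List[Entry] = []
    for base, org_now in stored:
        real_now = org_now - temp
        temp = temp | org_now
        restored.append((base, real_now))
    return restored


# Values from the slide, in the order T, C, G, A.
original: List[Entry] = [
    ("T", {5, 6, 10}),
    ("C", {4, 7, 8}),
    ("G", {2, 3, 9, 11}),
    ("A", {1, 12, 13, 14, 15}),
]

stored = refine(original)
roundtrip = restore(stored)
print(stored)
print(roundtrip == original)
```

Under these assumptions the stored sets for C, G and A become the single intervals [4,8], [2,11] and [1,15], and restoring them in the same base order recovers the original sets.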
17
Binary format and index
Inverted table
GZIP
18
Summary of works
Locus based storage: A Novel Approach for Alignments Output Storage Problem Facing Clinical Scenarios
XMLSnippet
19
Motivation of XMLSnippet
Framework-based applications are very popular in current commercial application development.
The leading framework-based applications are J2EE applications.
Frameworks vary, and each one has its own learning curve for newcomers.
Most of these frameworks are open source and come from open communities.
The documentation may not be complete.
Programmers may not have enough time to master all the details.
20
XMLSnippet – related work
Non-mining based: compare the context of the code under development with the code samples in example repositories; interact with a code search engine to gather relevant code samples and perform static analysis over the gathered samples.
Mining based: mine sequence association rules.
Predefined: directly generate the sub-elements and attributes of a given element based on the predefined schema of the XML files.
21
Our solution: basic idea
22
XMLSnippet – basic idea
23
XMLSnippet – framework
24
XMLSnippet – key techniques
Closed frequent tree pattern mining
XML tree pattern and syntax tree pattern mapping