
1 Lung Cancer Project Data Management

2 Requirements
Manage/Produce data for analysis
1. For every family, establish a valid structure
2. Genotype data transformation
3. Phenotype data transformation
Store the data
–Store simply?
–Store effectively (for retrieval and processing, and maybe for space)
Others
–Tracking anything
Why?

3 Project dataflow
Family data are collected at different sites on different levels/concerns
–Demographic
–Medical
–Samples
–Samples are genotyped at different sites
Data are sent to us, abstracted (HIPAA?)
–A huge bulk
–A few records
Sites then use the transformations for further analysis

4 Data Management Issues/Concerns
Technical issues
–User validation, normalization, security, safety, …
Business issues
–Inconsistencies, duplications, …
User (usually oneself) friendly issues
–Effective, reusable approaches
Data processing: quality

5 Family Structure Establishment
Definition: establish a family (in linkage format) such that the relationships among its members are valid (at least for commonly used linkage tools)
3rd-party tools to help: MAKEPED, PEDCHECK, RELATIVE
Modify relationships within a family to eliminate problems (and meet the restrictions of linkage analysis)
Common actions: trimming, adding placeholders, reassigning parents, …
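The kind of validity check described on this slide can be sketched in a few lines of Python. This is only an illustration, not the project's code and not a substitute for PEDCHECK or MAKEPED. It assumes the usual pre-MAKEPED column order (family id, person id, father id, mother id, sex with 1 = male and 2 = female, "0" for an unknown parent). The names Person, check_family and add_placeholders are hypothetical, and the placeholder rule (one shared placeholder spouse per known parent) is an assumption made for the example.

```python
from collections import namedtuple

# One pedigree row in the assumed linkage column order.
Person = namedtuple("Person", "fam pid father mother sex")

MALE, FEMALE, UNKNOWN = "1", "2", "0"


def check_family(members):
    """Return a list of human-readable problems found in one family."""
    by_id = {p.pid: p for p in members}
    problems = []
    if len(by_id) != len(members):
        problems.append("duplicate person id")
    for p in members:
        # Linkage tools generally expect both parents or neither.
        if (p.father == UNKNOWN) != (p.mother == UNKNOWN):
            problems.append(f"{p.pid}: only one parent given")
        for parent_id, expected_sex, role in (
            (p.father, MALE, "father"),
            (p.mother, FEMALE, "mother"),
        ):
            if parent_id == UNKNOWN:
                continue
            parent = by_id.get(parent_id)
            if parent is None:
                problems.append(f"{p.pid}: {role} {parent_id} not in family")
            elif parent.sex != expected_sex:
                problems.append(f"{p.pid}: {role} {parent_id} has wrong sex")
    return problems


def add_placeholders(members):
    """Give every one-parent person a placeholder for the missing parent.

    Assumption for this sketch: all one-parent children of the same known
    parent share a single placeholder spouse, named "ph_<known parent id>".
    """
    out, added = [], set()
    for p in members:
        new = None
        if p.father == UNKNOWN and p.mother != UNKNOWN:
            new = Person(p.fam, "ph_" + p.mother, UNKNOWN, UNKNOWN, MALE)
            p = p._replace(father=new.pid)
        elif p.mother == UNKNOWN and p.father != UNKNOWN:
            new = Person(p.fam, "ph_" + p.father, UNKNOWN, UNKNOWN, FEMALE)
            p = p._replace(mother=new.pid)
        out.append(p)
        if new is not None and new.pid not in added:
            added.add(new.pid)
            out.append(new)
    return out
```

A typical use would be to run check_family on each family, apply add_placeholders (or trimming, or parent reassignment) where problems are reported, and run check_family again before handing the pedigree to the third-party tools.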

6 Phenotype Data Preparation
Definition: for every person in a given family structure, get all the predefined attributes.
A relatively stable format (~50 variables)
Many fields are derived
–Age, hasLungCancer, hasAnyCancer, square of (py - 22.9)/10, …
–Our API: isSib, getYoungest, getAverageSibAge, …
Transformation logic is implemented entirely on the database server side
–Benefits: real-time data, at network speed
–Code maintenance?
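To make the derived fields concrete, here is a rough Python sketch of a few of them. The slides say the real logic lives on the database server, so this only illustrates the calculations. The Subject record layout is hypothetical, "py" is taken as given from the slide, and "square of (py - 22.9)/10" is read here as ((py - 22.9)/10) squared, which is an assumption. The function names loosely mirror the API names on the slide but their signatures are invented for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Subject:
    # Hypothetical record layout; the real schema is not shown in the slides.
    pid: str
    father: str
    mother: str
    birth_date: date
    cancers: List[str] = field(default_factory=list)
    py: Optional[float] = None  # the slide's "py" variable, taken as given


def age(s, on):
    """Age in whole years on the reference date `on`."""
    had_birthday = (on.month, on.day) >= (s.birth_date.month, s.birth_date.day)
    return on.year - s.birth_date.year - (0 if had_birthday else 1)


def has_lung_cancer(s):
    return "lung" in s.cancers


def has_any_cancer(s):
    return bool(s.cancers)


def py_term(s):
    """((py - 22.9) / 10) squared; None when py is missing."""
    return None if s.py is None else ((s.py - 22.9) / 10) ** 2


def is_sib(a, b):
    """Full sibs: two different people who share both known parents."""
    return (a.pid != b.pid and a.father != "0" and a.mother != "0"
            and a.father == b.father and a.mother == b.mother)


def get_average_sib_age(s, family, on):
    """Mean age of the full sibs of `s`, or None if there are none."""
    sibs = [m for m in family if is_sib(s, m)]
    return sum(age(m, on) for m in sibs) / len(sibs) if sibs else None
```

Keeping each derived field as a small, separately testable function is one way to approach the "code maintenance?" question raised on the slide, whichever side of the database the logic ends up on.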

7 Genotype Data Preparation
Definition:
–For every person in a given family structure, and for every marker on a chromosome of interest, process the lab data and then align the data (in a format that most linkage tools will accept).
–Throughout the process, the quality of the lab data is checked (and data are dropped if necessary)
–Heavy involvement of authorities
Tools: binner, downcoder, Pedcheck, Makeped, Relative (and for every tool we might need an input formatter and an output interpreter).
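The slides do not describe what the project's downcoder does internally. As a hedged illustration of the general idea, the sketch below assumes the common meaning of downcoding: the allele labels seen at one marker are recoded to consecutive integers starting at 1, with 0 kept for missing alleles. The function name, the input layout (person id mapped to a pair of allele strings) and the assumption that alleles are integer-like strings such as binned fragment sizes are all hypothetical.

```python
def downcode(genotypes, missing="0"):
    """Recode one marker's alleles to consecutive integers starting at 1.

    `genotypes` maps person id -> (allele1, allele2), both as strings.
    Missing alleles are returned as 0.
    Returns (recoded genotypes, allele label -> new code).
    """
    # Collect the distinct non-missing alleles, ordered numerically.
    observed = sorted(
        {a for pair in genotypes.values() for a in pair if a != missing},
        key=int,
    )
    allele_map = {a: i for i, a in enumerate(observed, start=1)}
    recoded = {
        pid: tuple(allele_map.get(a, 0) for a in pair)
        for pid, pair in genotypes.items()
    }
    return recoded, allele_map
```

Returning the allele map alongside the recoded genotypes keeps the transformation traceable, so a recoded value can be tied back to the original lab label when an error report needs interpreting.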

8 Genotype Data Preparation continued
[Diagram; surviving labels: LabData, Families + Markers, Binner, DownCoder, Pedcheck, Relative, ErrorInterpreter, Rn, Other Formatter]

