Published by Camilla Moore. Modified over 8 years ago.
1
Data Mining and Its Application in Medicine
In the name of God
Student name: Babak Razaghi
Student number: 85233510
Supervisor: Dr. Towhidkhah
(Seminar for the course "Applications of Information Technology in Medicine")
2
Necessity is the mother of invention
Huge amounts of data: electronic records of our decisions (choices in the supermarket, financial records, our comings and goings)
We swipe our way through the world: every swipe is a record in a database
Data rich, but information poor
Lying hidden in all this data is information!
3
Extracting or "mining" knowledge from large amounts of data
Data-driven discovery and modeling of hidden patterns in large volumes of data
Extraction of implicit, previously unknown and unexpected, potentially extremely useful information from data
4
Large database; data mining
Data visualization: ways of seeing patterns in large data sets; uses the efficiency of human pattern recognition
5
Gold mining
Knowledge mining from databases
Knowledge extraction
Data/pattern analysis
Knowledge Discovery in Databases (KDD)
6
Knowledge Discovery Process
Data stages: Raw Data, Data Warehouse, Target Data, Transformed Data, Patterns and Rules, Knowledge
Process steps: Integration, Selection & Cleaning, Transformation, Data Mining, Interpretation & Evaluation, Understanding
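The process steps named on this slide can be sketched as a chain of small functions. This is only an illustrative sketch: the records, the function names, and the choice of a per-class average as the "mining" step are assumptions made here, not material from the presentation.

```python
from statistics import mean

# Hypothetical raw records (not from the slides): (age, systolic_bp, outcome)
RAW = [
    (70, 85, "early death"),
    (55, 120, "survivor"),
    (None, 110, "survivor"),  # missing age, dropped during cleaning
    (80, 88, "early death"),
    (45, 130, "survivor"),
]


def select_and_clean(rows):
    """Selection & cleaning: keep only complete records."""
    return [r for r in rows if all(v is not None for v in r)]


def transform(rows):
    """Transformation: rescale systolic pressure to the 0..1 range."""
    lo = min(r[1] for r in rows)
    hi = max(r[1] for r in rows)
    return [(age, (bp - lo) / (hi - lo), outcome) for age, bp, outcome in rows]


def mine(rows):
    """Data mining: average scaled pressure per outcome class."""
    classes = {r[2] for r in rows}
    return {c: mean(r[1] for r in rows if r[2] == c) for c in classes}


def interpret(patterns):
    """Interpretation & evaluation: state the pattern as a readable finding."""
    lowest = min(patterns, key=patterns.get)
    return f"lowest average pressure is seen in class: {lowest}"


target = select_and_clean(RAW)
patterns = mine(transform(target))
knowledge = interpret(patterns)
print(knowledge)
```

Each function corresponds to one arrow between data stages, so a stage can be swapped out without touching the others.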
7
Find true patterns and avoid overfitting (false patterns due to randomness)
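A toy illustration of overfitting, assuming hand-made data that is not from the presentation: the underlying pattern is "label 1 when x > 5", and two training labels are deliberately flipped to act as noise. A nearest-neighbor model memorizes the noise, while a simple threshold rule does not. The data, function names, and both models are assumptions chosen for the sketch.

```python
# Hypothetical training data: (x, label). The points (3, 1) and (8, 0) are noise.
TRAIN = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1)]
# Hypothetical held-out data that follows the clean rule "label 1 when x > 5".
TEST = [(0.5, 0), (2.6, 0), (3.4, 0), (5.5, 1), (7.6, 1), (8.4, 1)]


def nearest_neighbor(x):
    """Memorizing model: copy the label of the closest training point."""
    return min(TRAIN, key=lambda point: abs(point[0] - x))[1]


def threshold_rule(x):
    """Simple model: one threshold, no memory of individual points."""
    return 1 if x > 5 else 0


def accuracy(model, data):
    """Fraction of points in data that the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


nn_train = accuracy(nearest_neighbor, TRAIN)
nn_test = accuracy(nearest_neighbor, TEST)
rule_train = accuracy(threshold_rule, TRAIN)
rule_test = accuracy(threshold_rule, TEST)
print(nn_train, nn_test, rule_train, rule_test)
```

The memorizing model scores perfectly on the data it has seen and poorly on new points, which is the signature of a false pattern; the simpler rule scores lower on the noisy training set but holds up on held-out data.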
8
Classification: predicting an item class
Clustering: finding clusters in data
Associations: e.g. A & B & C occur frequently
Visualization: to facilitate human discovery
Summarization: describing a group
Estimation: predicting a continuous value
Deviation detection: finding changes
Link analysis: finding relationships
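As one concrete instance of the tasks listed, the association example ("A & B & C occur frequently") can be sketched by counting how often item combinations appear together. The transactions, the `frequent_itemsets` name, and the support threshold below are hypothetical, and this brute-force count stands in for real association-mining algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set holds the items seen together once.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "C", "D"},
    {"B", "D"},
    {"A", "B", "C"},
]


def frequent_itemsets(transactions, size, min_support):
    """Return itemsets of the given size whose support meets min_support.

    Support is the fraction of transactions containing the whole itemset.
    """
    counts = Counter()
    for items in transactions:
        for combo in combinations(sorted(items), size):
            counts[combo] += 1
    total = len(transactions)
    return {
        combo: count / total
        for combo, count in counts.items()
        if count / total >= min_support
    }


triples = frequent_itemsets(transactions, size=3, min_support=0.6)
print(triples)
```

Counting every combination is only practical for small data; it becomes expensive quickly as the number of items grows.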
9
Computationally expensive to investigate all possibilities
Dealing with noise, missing information, and errors in data
Choosing appropriate attributes and input representation
Finding the minimal attribute space
Finding adequate evaluation function(s)
Extracting meaningful information
Avoiding overfitting
10
INSIGHTFUL MINER Angoss Knowledge ACCESS ARMiner Eudaptics Viscovery Goal TV MDR Viscovery SOMine SPSS
11
Science: chemistry, physics
Bioscience: sequence-based analysis, protein structure and function prediction, protein family classification, microarray gene expression
Financial industry: banks, businesses, e-commerce
Stock and investment analysis
Pharmaceutical companies
Health care
Sports and entertainment
12
Clinical data mining processes
Digital format for all pertinent data
Create structure
Obtain coded information
Natural language understanding
Create a widely accessible repository
14
Decision tree:
Minimum systolic blood pressure over a 24-hour period following admission to the hospital: <= 91 -> Class 2: Early death; > 91 -> check age of patient
Age of patient: <= 62.5 -> Class 1: Survivors; > 62.5 -> check sinus tachycardia
Was there sinus tachycardia? YES -> Class 2: Early death; NO -> Class 1: Survivors
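The tree on this slide can be read as three nested questions. A minimal sketch, assuming the function and parameter names (which are not from the presentation) and reading the tachycardia branch as YES leading to early death, which is an interpretation of the slide layout:

```python
def classify(min_systolic_bp, age, sinus_tachycardia):
    """Walk the three-question tree and return the predicted class label.

    min_systolic_bp: minimum systolic pressure in the first 24 hours
    age: age of the patient in years
    sinus_tachycardia: True if sinus tachycardia was present
    """
    if min_systolic_bp <= 91:
        return "Class 2: Early death"
    if age <= 62.5:
        return "Class 1: Survivors"
    if sinus_tachycardia:
        return "Class 2: Early death"
    return "Class 1: Survivors"


print(classify(min_systolic_bp=85, age=50, sinus_tachycardia=False))
print(classify(min_systolic_bp=110, age=70, sinus_tachycardia=True))
```

Each patient is classified with at most three comparisons, and every prediction can be traced back to the answers that produced it.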
16
An organism's genome is the "program" for making the organism, encoded in DNA
Human DNA has about 30,000-35,000 genes
A gene is a segment of DNA that specifies how to make a protein
Cells are different because of differential gene expression
About 40% of human genes are expressed at one time
Microarray devices measure gene expression
17
Gene              Value
D26528_at         193
D26561_cds1_at    -70
D26561_cds2_at    144
D26561_cds3_at    33
D26579_at         318
D26598_at         1764
D26599_at         1537
D26600_at         1204
D28114_at         707
[Figure labels: scanner; enlarged section of raw image; raw data]
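Raw output of this kind is a list of probe identifiers paired with expression values, so a first processing step is to load it into a lookup table. The sketch below uses the nine rows shown on the slide; the function name and the whitespace-separated layout are assumptions made for illustration.

```python
# The nine gene/value rows shown on the slide, one pair per line.
RAW = """\
D26528_at 193
D26561_cds1_at -70
D26561_cds2_at 144
D26561_cds3_at 33
D26579_at 318
D26598_at 1764
D26599_at 1537
D26600_at 1204
D28114_at 707
"""


def parse_expression(text):
    """Parse 'gene value' lines into a dict mapping gene id to integer value."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        gene, value = line.split()
        table[gene] = int(value)
    return table


expression = parse_expression(RAW)
highest = max(expression, key=expression.get)
negative = [gene for gene, value in expression.items() if value < 0]
print(highest, expression[highest], negative)
```

Negative readings such as the one for `D26561_cds1_at` are worth flagging early, since they indicate noise that later modeling has to tolerate.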
18
New and better molecular diagnostics
New molecular targets for therapy: few new drugs, large pipeline, …
Outcome depends on genetic signature: best treatment?
Fundamental biological discovery: finding and refining biological pathways
Personalized medicine?!
19
Avoiding false positives, due to:
too few records (samples), usually < 100
too many columns (genes), usually > 1,000
Model needs to be robust in the presence of noise
For reliability, need large gene sets; for diagnostics or drug targets, need small gene sets
Estimate class probability
Model needs to be explainable to biologists
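A rough calculation shows why few samples combined with many genes invites false positives. The sketch below rests on strong simplifying assumptions that are not from the presentation: each gene is reduced to an on/off value, genes behave like independent fair coin flips, and there are two classes. Under those assumptions, a single random gene matches the class labels (or their exact opposite) with probability 2^(1 - n) for n samples.

```python
def chance_of_spurious_gene(n_samples, n_genes):
    """Probability that at least one purely random on/off gene separates
    the two classes perfectly, under the coin-flip assumptions above."""
    # One random gene matches the labels or their complement: 2 / 2**n_samples.
    p_one = 2.0 ** (1 - n_samples)
    # Complement of "no gene out of n_genes is a perfect match".
    return 1.0 - (1.0 - p_one) ** n_genes


for n_samples in (10, 20):
    print(n_samples, chance_of_spurious_gene(n_samples, n_genes=1000))
```

With 10 samples and 1,000 genes the chance of a perfect but meaningless separator comes out at about 0.86; doubling the samples to 20 drops it to roughly 0.002. Real expression data is continuous and correlated, so these figures are only indicative of the trend.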
29
Discover useful relationships in data
Discover information otherwise overlooked
Provide intelligence to improve various phases
Intellectual property
Competitive advantages: getting more out of your data; finding other relevant information faster
Exploratory, hypothesis-generating analyses
Increase productivity: reduced amount of time and money
31
Thank you all
razaghi.b@gmail.com