Data Mining and Its Application in Medicine


Slide 1
Data Mining and Its Application in Medicine
In the name of God
Student: Babak Razaghi (student ID: 85233510)
Supervisor: Dr. Tohidkhah
(Seminar for the course "Application of Information Technology in Medicine")

Slide 2
- Necessity is the mother of invention
- Huge amounts of data
- Electronic records of our decisions
  - Choices in the supermarket
  - Financial records
  - Our comings and goings
- We swipe our way through the world: every swipe is a record in a database
- Data rich, but information poor
- Lying hidden in all this data is information!

Slide 3
- Extracting or “mining” knowledge from large amounts of data
- Data-driven discovery and modeling of hidden patterns in large volumes of data
- Extraction of implicit, previously unknown and unexpected, potentially extremely useful information from data

Slide 4
Figure labels: Large database; Data mining; Data visualization
- Data visualization:
  - Ways of seeing patterns in large data sets
  - Uses the efficiency of human pattern recognition

Slide 5
- Gold mining
- Knowledge mining from databases
- Knowledge extraction
- Data/pattern analysis
- Knowledge Discovery in Databases (KDD)

Slide 6
Figure: Knowledge Discovery Process
Raw data → Integration → Data warehouse → Selection & cleaning → Target data → Transformation → Transformed data → Data mining → Patterns and rules → Interpretation & evaluation → Knowledge (understanding)
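The stages of the knowledge discovery process can be sketched in a few lines of Python. This is a toy illustration, not the presenter's method: the two record sources, the field names (`patient_id`, `glucose`, `outcome`), the 126 threshold and the pattern-counting "mining" step are all hypothetical choices made only to show how each stage feeds the next.

```python
from collections import Counter

# Hypothetical raw data from two separate sources (field names are illustrative).
lab_results = [
    {"patient_id": 1, "glucose": "140"},
    {"patient_id": 2, "glucose": ""},   # missing measurement
    {"patient_id": 3, "glucose": "95"},
]
admissions = [
    {"patient_id": 1, "outcome": "readmitted"},
    {"patient_id": 2, "outcome": "discharged"},
    {"patient_id": 3, "outcome": "discharged"},
]


def integrate(*sources):
    """Integration: merge records from all sources on patient_id (a toy 'warehouse')."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["patient_id"], {}).update(record)
    return list(merged.values())


def select_and_clean(rows):
    """Selection & cleaning: keep the fields of interest and drop incomplete rows."""
    return [
        {"glucose": float(r["glucose"]), "outcome": r["outcome"]}
        for r in rows
        if r.get("glucose") and r.get("outcome")
    ]


def transform(rows, threshold=126.0):
    """Transformation: discretise glucose into 'high'/'normal' (threshold is illustrative)."""
    return [("high" if r["glucose"] >= threshold else "normal", r["outcome"]) for r in rows]


def mine(rows):
    """Data mining: count how often each (glucose level, outcome) pair occurs."""
    return Counter(rows)


def evaluate(patterns, min_count=1):
    """Interpretation & evaluation: keep only patterns seen at least min_count times."""
    return {pattern: count for pattern, count in patterns.items() if count >= min_count}


knowledge = evaluate(mine(transform(select_and_clean(integrate(lab_results, admissions)))))
print(knowledge)
```

In a real project each stage would be far richer (for example a proper warehouse, domain-specific cleaning rules and a real mining algorithm), but the chain of function calls mirrors the order of the figure.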

Slide 7
Find true patterns and avoid overfitting (false patterns that arise from randomness)
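To see how a "pattern" can be pure chance, the following sketch uses an assumed, synthetic setup: random binary features and coin-flip labels, so there is nothing real to learn. A "model" that simply memorises the training records scores almost perfectly on the data it has seen, but only about chance level on fresh data, which is the signature of overfitting.

```python
import random

random.seed(0)
N_FEATURES = 30  # many binary attributes, so records are almost always unique


def make_data(n):
    """Synthetic records with random binary features and coin-flip labels (no real pattern)."""
    X = [tuple(random.randint(0, 1) for _ in range(N_FEATURES)) for _ in range(n)]
    y = [random.randint(0, 1) for _ in range(n)]
    return X, y


X_train, y_train = make_data(200)
X_test, y_test = make_data(200)

# A "model" that memorises every training record: the extreme case of overfitting.
memory = dict(zip(X_train, y_train))


def predict(x):
    # Records never seen during training fall back to class 0.
    return memory.get(x, 0)


def accuracy(X, y):
    return sum(predict(x) == label for x, label in zip(X, y)) / len(y)


train_acc = accuracy(X_train, y_train)
test_acc = accuracy(X_test, y_test)
print(f"training accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The gap between the two numbers is why models are judged on held-out data rather than on the records they were built from.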

Slide 8
- Classification: predicting the class of an item
- Clustering: finding clusters in data
- Associations: e.g., A, B and C frequently occur together
- Visualization: facilitating human discovery
- Summarization: describing a group
- Estimation: predicting a continuous value
- Deviation detection: finding changes
- Link analysis: finding relationships
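As a minimal sketch of the association task, the code below counts the support of every 3-item set in a handful of hypothetical transactions and keeps those that occur in at least half of them. It enumerates combinations by brute force for clarity; practical tools use pruning algorithms such as Apriori or FP-growth instead.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, e.g. sets of items or findings recorded together.
transactions = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B"},
    {"B", "C", "D"},
    {"A", "B", "C"},
]


def frequent_itemsets(transactions, size, min_support):
    """Return itemsets of `size` items whose support (share of transactions containing them) >= min_support."""
    counts = Counter()
    for t in transactions:
        # sorted() gives each itemset a canonical order, e.g. ('A', 'B', 'C')
        for itemset in combinations(sorted(t), size):
            counts[itemset] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


result = frequent_itemsets(transactions, size=3, min_support=0.5)
print(result)  # {('A', 'B', 'C'): 0.6}
```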

Slide 9
- Investigating all possibilities is computationally expensive
- Dealing with noise, missing information and errors in data
- Choosing appropriate attributes and input representation
- Finding the minimal attribute space
- Finding adequate evaluation function(s)
- Extracting meaningful information
- Avoiding overfitting
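One common, simple way of dealing with missing values is median imputation. The sketch below uses hypothetical records and field names (`age`, `sbp`) and fills each gap with the median of the observed values for that field; it is one option among many, and it can bias results when values are not missing at random.

```python
import statistics

# Hypothetical patient records; None marks a missing measurement.
records = [
    {"age": 54, "sbp": 120.0},
    {"age": 67, "sbp": None},
    {"age": None, "sbp": 88.0},
    {"age": 71, "sbp": 135.0},
]


def impute_median(rows, fields):
    """Replace missing values in each field with the median of that field's observed values."""
    medians = {
        f: statistics.median(r[f] for r in rows if r[f] is not None) for f in fields
    }
    return [{f: (r[f] if r[f] is not None else medians[f]) for f in fields} for r in rows]


cleaned = impute_median(records, ["age", "sbp"])
print(cleaned)
```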

Slide 10
Data mining software shown: INSIGHTFUL MINER, Angoss Knowledge ACCESS, ARMiner, Eudaptics Viscovery, Goal TV MDR, Viscovery SOMine, SPSS

Slide 11
- Science: chemistry, physics
- Bioscience
  - Sequence-based analysis
  - Protein structure and function prediction
  - Protein family classification
  - Microarray gene expression
- Financial industry: banks, businesses, e-commerce
  - Stock and investment analysis
- Pharmaceutical companies
- Health care
- Sports and entertainment

Slide 12
Clinical data mining process
- Digital format for all pertinent data
- Create structure
- Obtain coded information
- Natural language understanding
- Create a widely accessible repository


Slide 14
Decision tree (figure):
Minimum systolic blood pressure over the 24-hour period following admission to the hospital
- <= 91: Class 2: Early death
- > 91: Age of patient
  - <= 62.5: Class 1: Survivors
  - > 62.5: Was there sinus tachycardia?
    - YES: Class 2: Early death
    - NO: Class 1: Survivors
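The tree can be read as nested if/else rules. The function below is a sketch based on reading the flattened slide (the branch-to-class mapping is inferred from the order of the labels, and blood pressure is assumed to be in mmHg and age in years); it is not a validated clinical rule.

```python
def classify(min_systolic_bp_24h: float, age: float, sinus_tachycardia: bool) -> str:
    """Apply the slide's decision tree to one patient (units assumed: mmHg, years)."""
    # Root split: minimum systolic blood pressure in the first 24 hours after admission
    if min_systolic_bp_24h <= 91:
        return "Class 2: Early death"
    # Second split: age of the patient
    if age <= 62.5:
        return "Class 1: Survivors"
    # Third split: presence of sinus tachycardia
    if sinus_tachycardia:
        return "Class 2: Early death"
    return "Class 1: Survivors"


print(classify(85, 50, False))   # Class 2: Early death
print(classify(120, 70, True))   # Class 2: Early death
print(classify(120, 70, False))  # Class 1: Survivors
```

A tree like this is easy for clinicians to inspect, which is one reason decision trees are popular in medical data mining.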


Slide 16
- An organism’s genome is the “program” for making the organism, encoded in DNA
- Human DNA contains about 20,000 protein-coding genes (early-2000s estimates were 30,000 to 35,000)
- A gene is a segment of DNA that specifies how to make a protein
- Cell types differ because of differential gene expression
- About 40% of human genes are expressed at any one time
- Microarray devices measure gene expression

Slide 17
Figure: scanner; enlarged section of the raw image; raw data (excerpt below)
Gene              Value
D26528_at         193
D26561_cds1_at    -70
D26561_cds2_at    144
D26561_cds3_at    33
D26579_at         318
D26598_at         1764
D26599_at         1537
D26600_at         1204
D28114_at         707

Slide 18
- New and better molecular diagnostics
- New molecular targets for therapy
  - few new drugs, large pipeline, …
- Outcome depends on genetic signature
  - best treatment?
- Fundamental biological discovery
  - finding and refining biological pathways
- Personalized medicine?!

Slide 19
- Avoiding false positives, which arise from:
  - too few records (samples), usually < 100
  - too many columns (genes), usually > 1,000
- The model needs to be robust in the presence of noise
- For reliability, large gene sets are needed; for diagnostics or drug targets, small gene sets are needed
- Estimate class probabilities
- The model needs to be explainable to biologists
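The first point, false positives when there are far more genes than samples, is easy to reproduce with synthetic data. In this sketch (hypothetical sizes: two groups of 20 samples, 2,000 genes of pure Gaussian noise), no gene truly differs between the groups, yet a per-gene Welch t-statistic with a cut-off of |t| > 2 still flags roughly 5% of the genes as "different".

```python
import random
import statistics

random.seed(1)
N_PER_GROUP = 20   # hypothetical: 20 arrays per class, 40 samples in total
N_GENES = 2000     # far more columns (genes) than rows (samples)

# Pure noise: every gene has the same distribution in both groups.
group_a = [[random.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)] for _ in range(N_GENES)]
group_b = [[random.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)] for _ in range(N_GENES)]


def welch_t(a, b):
    """Welch's t-statistic for the difference in means of two samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / (va / len(a) + vb / len(b)) ** 0.5


# |t| > 2 is roughly the two-sided 5% level at this sample size.
hits = sum(1 for a, b in zip(group_a, group_b) if abs(welch_t(a, b)) > 2.0)
print(f"{hits} of {N_GENES} pure-noise genes look 'significant'")
```

Corrections for multiple testing (for example Bonferroni or false discovery rate control) and validation on independent samples are the usual safeguards against such false positives.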


Slide 29
- Discover useful relationships in data
- Discover information that would otherwise be overlooked
- Provide intelligence to improve various phases
- Intellectual property
- Competitive advantages:
  - Getting more out of your data
  - Finding other relevant information faster
  - Exploratory, hypothesis-generating analyses
  - Increased productivity: less time and money spent


Slide 31
Thank you all
razaghi.b@gmail.com

