TIES445 Data Mining, Lecture 1 (Nov-Dec 2007), Sami Äyrämö. These slides are additional material for TIES445.


1 Data Mining, Lecture 1. TIES445 Data mining, Nov-Dec 2007. Sami Äyrämö

2 Data Mining
• 12-14 lectures (on weeks 44-50)
  – Mondays 12:15-14:00
  – Tuesdays 10:15-12:00
  – NOTE: No lectures on week 47
• 3 x 2h demonstrations (on weeks 48-50 in a computer classroom)
• Final exam in January 2008
• 3 cr without seminar work
• 5 cr with seminar work (will be held in January 2008)

3 About lectures
The lectures are based on:
• Han and Kamber (based on Data Mining: Concepts and Techniques) http://www-faculty.cs.uiuc.edu/~hanj/bk2/slidesindex.html
• Tan, Steinbach and Kumar (based on Introduction to Data Mining) http://www-users.cs.umn.edu/~kumar/dmbook/index.php#item4
• Some slides by the lecturer

4 Literature
• P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison Wesley, 2005.
• J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2005.
• D. Hand, H. Mannila, and P. Smyth, Principles of Data Mining, MIT Press, 2001.
• D. Pyle, Data Preparation for Data Mining, Morgan Kaufmann, 1999.
• M. Berry, Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, Wiley, 2004.
• T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, 2001.
• U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Advances in Knowledge Discovery and Data Mining, MIT Press, 1996.
• M.H. Dunham, Data Mining: Introductory and Advanced Topics, Prentice Hall, 2003.
• I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 2000.
• J.P. Bigus, Data Mining with Neural Networks, McGraw-Hill, 1996.
• J.-M. Adamo, Data Mining for Association Rules and Sequential Patterns: Sequential and Parallel Algorithms, Springer-Verlag, 2001.
• H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining, Kluwer, 1998.

5 Theses, publications etc.
• M. Pechenizkiy, Feature Extraction for Supervised Learning in Knowledge Discovery Systems, PhD thesis, University of Jyväskylä, 2005.
• S. Äyrämö, Knowledge Mining using Robust Clustering, PhD thesis, University of Jyväskylä, 2006.
• J. Mäkinen, Roskapostin älykäs suodattaminen, Master's thesis, University of Jyväskylä, 2003.
• M. Nurminen, Tiedonlouhinta rakenteisista dokumenteista, Master's thesis, University of Jyväskylä, 2005.
• K. Arkko, Assosiaatioiden ja sekvenssien louhinta suurista tietomassoista, Master's thesis, University of Jyväskylä, 2006.
• J. Hänninen, Batch- ja online-hermoverkko-opetusalgoritmien ominaisuudet ja eroavaisuudet, Master's thesis, University of Jyväskylä, 2006.
• T. Kärkkäinen, MLP-network in a layer-wise form with applications to weight decay, Neural Computation, 14 (6), 1451-1480, 2002.
• T. Kärkkäinen and E. Heikkola, Robust Formulations for Training Multilayer Perceptrons, Neural Computation, 16 (4), 837-862, 2004.
• T. Kärkkäinen and S. Äyrämö, Robust Clustering Methods for Incomplete and Erroneous Data, in Data Mining V: Data Mining, Text Mining and their Business Applications, 2004.
• S. Äyrämö, T. Kärkkäinen, and K. Majava, Robust refinement of initial prototypes for partitioning-based clustering algorithms, in C. Skiadas (Ed.), Recent Advances in Stochastic Modeling and Data Analysis, pp. 473-482, World Scientific, 2007.
• ...and many more!

6 Journals, conferences,…
• Journals
  – Data Mining and Knowledge Discovery, Springer
  – ACM Transactions on Knowledge Discovery from Data (TKDD), ACM
  – IEEE Transactions on Knowledge and Data Engineering, IEEE
  – SIGKDD Explorations
  – Statistical Analysis and Data Mining, Wiley
  – Data & Knowledge Engineering, Elsevier
  – Computational Statistics & Data Analysis, Elsevier
• Conferences, seminars, workshops
  – ACM SIGKDD, PKDD, PAKDD, (IEEE) ICDM, SIAM Data Mining (SDM), DMIN,...
  – ICTAI, IJCAI, VLDB, ICDE, ICML, CVPR, MSR,...

7 Sample application
[Figure: process diagram. Labels: Control data, Process data, Quality, Feedback, Customer, Manager, Operator, Laboratory technician]

8 Real-world data set

9 Mining Large Data Sets - Motivation
• R. Grossman (2001): "During the next decade, the amount of data will continue to explode, while the number of scientists and engineers available to analyze it will remain essentially constant."
• P.S. Bradley (2003): "The ability of organizations to effectively utilize this information for decision support typically lags behind their ability to collect and store it. But, organizations that can leverage their data for decision support are more likely to have a competitive edge in their sector of the market."

10 Knowledge Mining (KM) process

11 Origins of Data Mining
• Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
[Figure: Data Mining shown together with its source fields. Labels: Database systems; Statistics/Numerical optimization; Machine Learning/Pattern Recognition/Artificial Intelligence; Visualization]

12 Major Issues and Challenges in DM/KDD
• Mining methodology
  – Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web
  – Algorithmic requirements: performance (efficiency, scalability, robustness, reliability)
  – High dimensionality, complex and heterogeneous data
  – Pattern evaluation: the interestingness problem
  – Incorporation of background knowledge
  – Data quality: handling noise and incomplete data (robustness, reliability)
  – Parallel, distributed and incremental mining methods
  – Integration of the discovered knowledge with existing knowledge: knowledge fusion
  – Data ownership and distribution
• User interaction
  – Expression and visualization of data mining results
  – Interactive mining of knowledge at multiple levels of abstraction
• Applications and social impacts
  – Domain-specific data mining & invisible data mining
  – Protection of data security, integrity, and privacy
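
As a small illustration of the "handling noise and incomplete data" item on this slide, the sketch below compares two location estimates on a data set with missing values and one gross outlier. The readings are made up for illustration, the helper name `available` is hypothetical, and the median is just one simple robust estimator chosen here; it is not a method taken from the slides.

```python
import math
import statistics


def available(values):
    """Return only the observed values, dropping None and NaN entries."""
    return [v for v in values if v is not None and not math.isnan(v)]


# Hypothetical readings: two missing entries and one gross outlier (950.0).
readings = [10.1, 9.8, None, 10.3, 950.0, 10.0, float("nan"), 9.9]

# Incomplete data: estimate from the available values only.
observed = available(readings)

# Noise: the mean is dragged far away by the single outlier,
# while the median stays close to the bulk of the data.
mean_estimate = statistics.mean(observed)
median_estimate = statistics.median(observed)

print(f"observed values: {len(observed)}")
print(f"mean:   {mean_estimate:.2f}")
print(f"median: {median_estimate:.2f}")
```

Using only the available values avoids imputing anything, and the gap between the two estimates shows why robustness is listed among the algorithmic requirements.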

