CS 257 Database Systems. Dr. T. Y. Lin. Ultimate Goal: Data Science (Big Data)
CS 257 Overview: CS 257 and Big Data. +: VLDB (Very Large Databases). +: Unstructured data, e.g. text/web, image, multimedia, video, vision, bio, and scientific data processing. Light: cloud computing. Light: data science / knowledge engineering, etc.
CS 257 Overview: Major Applications in Big Data. Medical informatics: VLDB + image + cloud + security (CS286). Financial informatics: VLDB + BI + cloud + security (CS286). Web engineering. Business intelligence (BI). Data science (knowledge engineering in web/image/bio/etc. data).
CS 257 Overview: Instructor. IEEE Best Contribution Award in Data Mining (ICDM 2001); ACM/IEEE Best Service Award, Web Intelligence (WI-2007); Best Contribution Award, Rough Set (2005); Pioneer Award in Granular Computing (2008).
Project Overview: Verification and Validation of the Core Engine of a Concept-Based Semantic Search Engine
Main Idea: 1) Latent Semantic Index (LSI): a set of documents is associated with a matrix, and its row vectors are treated as points in Euclidean space (point = TF-IDF vector). This is Google’s approach.
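The document matrix on the LSI slide can be illustrated with a minimal sketch. It assumes whitespace tokenization, term frequency normalized by document length, and idf = log(N / df); none of these choices come from the slides, and the function name `tfidf_matrix` is hypothetical. It builds only the TF-IDF matrix and omits the singular value decomposition that LSI usually applies to it.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, matrix); each row of matrix is one document's TF-IDF vector.

    Assumptions (not from the slides): whitespace tokenization,
    tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        row = [
            (counts[term] / total) * math.log(n_docs / df[term])
            for term in vocab
        ]
        matrix.append(row)
    return vocab, matrix


if __name__ == "__main__":
    docs = ["wall street bank", "river bank", "wall street"]
    vocab, matrix = tfidf_matrix(docs)
    print(vocab)
    for row in matrix:
        print([round(weight, 3) for weight in row])
```

Each row is a point in a Euclidean space whose dimension equals the vocabulary size, so documents can be compared by distance or by the angle between their vectors.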
Main Idea: 2) Topological approach: a polyhedron (combinatorially, a simplicial complex) is built to capture and structure the concepts.
An open segment is a 1-simplex, an open triangle (face) is a 2-simplex, an open tetrahedron is a 3-simplex, and so on up to an n-simplex. A collection of simplexes that satisfies the closed condition is called a simplicial complex; it is a combinatorial representation of a polyhedron, an idea that led to a “new” subject called algebraic topology. The project is an algebraic-topology-based search engine.
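The simplex and simplicial complex definitions above can be sketched combinatorially. This is a minimal sketch that assumes the "closed condition" is the standard one for abstract simplicial complexes: every nonempty face of a simplex in the collection is also in the collection. A simplex is modeled as a set of vertices, so an n-simplex has n + 1 vertices. The function names are hypothetical and are not taken from the project.

```python
from itertools import combinations


def dimension(simplex):
    """An n-simplex has n + 1 vertices."""
    return len(simplex) - 1


def faces(simplex):
    """Return all nonempty proper faces of a simplex as frozensets."""
    vertices = sorted(simplex)
    return {
        frozenset(combo)
        for size in range(1, len(vertices))
        for combo in combinations(vertices, size)
    }


def is_simplicial_complex(simplices):
    """Check the closed condition: every face of every simplex is present."""
    collection = {frozenset(s) for s in simplices}
    return all(face in collection for s in collection for face in faces(s))


def closure(simplices):
    """Smallest simplicial complex containing the given simplices."""
    result = set()
    for s in simplices:
        s = frozenset(s)
        result.add(s)
        result |= faces(s)
    return result


if __name__ == "__main__":
    triangle = {"a", "b", "c"}  # a single 2-simplex
    print(is_simplicial_complex([triangle]))  # edges and vertices are missing
    complex_ = closure([triangle])
    print(len(complex_), is_simplicial_complex(complex_))
```

A lone triangle fails the check because its edges and vertices are absent; taking the closure adds them, giving three vertices, three edges, and the triangle itself.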