Big Data Analytics Carlos Ordonez. Big Data Analytics research Input? BIG DATA (large data sets, large files, many documents, many tables, fast growing)


1 Big Data Analytics Carlos Ordonez

2 Big Data Analytics research
Input? BIG DATA (large data sets, large files, many documents, many tables, fast growing)
How? Fast external algorithms; memory-efficient data structures at two storage levels; parallel: multi-threaded or multi-node
Efficiency goal: linear time O(n) and linear speedup
Hardware? Single node or parallel cluster
Infrastructure? Parallel file system; any large files
Challenging: Theory+programming in action
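As an illustration of the "fast external algorithm" and linear-time goal on this slide, here is a minimal Python sketch of a one-pass scan over a binary file that keeps only one block in memory at a time. The record layout (one little-endian 8-byte float per record), the block size, and the names `scan_summary` and `write_demo_file` are assumptions made for the example; they do not come from the presentation.

```python
import os
import struct
import tempfile

# Assumed record layout: one little-endian 8-byte float per record.
RECORD = struct.Struct("<d")


def scan_summary(path, records_per_block=1024):
    """Single sequential pass over the file: O(n) time, one block of memory."""
    n, total, total_sq = 0, 0.0, 0.0
    block_bytes = RECORD.size * records_per_block
    with open(path, "rb") as f:
        while True:
            block = f.read(block_bytes)
            if not block:
                break
            # Drop a trailing partial record, if the file is truncated.
            usable = len(block) - len(block) % RECORD.size
            for (x,) in RECORD.iter_unpack(block[:usable]):
                n += 1
                total += x
                total_sq += x * x
    return n, total, total_sq


def write_demo_file(values):
    """Write values in the assumed binary layout and return the file path."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))
    return path
```

The count, sum, and sum of squares are enough to derive the mean and variance afterwards, so the file never has to fit in RAM and is read exactly once.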

3 Systems research today
Transaction processing? Main memory, lock-free
Efficient analysis? Optimal joins, compiled queries, streams, exploit ample RAM, exploit multi-core
Compiler versus interpreter?
Massive storage? POSIX, HDFS
Fast external algorithms? Simple tasks.
Parallel computation? Multi-core with threads, shared-nothing, message-passing
Exploiting new hardware? Difficult/customized
Analyzing: queries, cubes, statistics, machine learning
Hot today: Information integration (database+files)
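The "multi-core with threads" and "shared-nothing" items above can be sketched roughly as follows: each worker summarizes its own partition without touching shared state, and the partial results are merged at the end. This is only an illustrative sketch; the function names and the choice of summary (count, sum, max) are assumptions, and in CPython the global interpreter lock means threads show the structure of the computation rather than a real CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_summary(partition):
    """Summarize one partition independently: (count, sum, max)."""
    return len(partition), sum(partition), max(partition)


def merge(a, b):
    """Combine two partial summaries into one."""
    return a[0] + b[0], a[1] + b[1], max(a[2], b[2])


def parallel_summary(data, workers=4):
    """Partition the data, summarize each partition in a thread, then merge."""
    if not data:
        raise ValueError("data must be non-empty")
    size = -(-len(data) // workers)  # ceiling division
    partitions = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_summary, partitions))
    result = partials[0]
    for p in partials[1:]:
        result = merge(result, p)
    return result
```

Because the merge step is associative, the same pattern carries over to a multi-node setting where partial summaries are exchanged as messages instead of returned from threads.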

4 DB Systems involves Core CS research: Theory+Programming
Theory we use:
– Time complexity (big O()) and I/O cost (disk, solid state memory)
– Data structures (trees, hash tables, linked lists)
– Relational model and information retrieval models
– Multivariate statistics, machine learning, discrete mathematics, linear algebra
– Compilers and programming languages: parsing/compiling/optimizing code; recursion
Programming:
– Languages: mostly C++, but also R, SQL, Java
– Unix, but we have a lot of past work on MS Windows
– Systems: threads, binary I/O, parallel file systems, code generation, code optimization, interpreter runtime

5 Sample of target problems
Business Intelligence: cubes, lattices
Big Data summarization: vector outer products
Bayesian statistics: MCMC, classification, regression, variable/feature selection
Graph transitive closure and linear recursion
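Two of the target problems above lend themselves to short sketches. The first function accumulates a summary from vector outer products in one pass (a count, a sum vector, and a sum of outer products), from which a mean and covariance can be derived. The second computes a graph transitive closure by semi-naive evaluation of a linear recursion. The names `n`, `L`, `Q`, and all function names are hypothetical choices for this example, not taken from the slides, and the code is a plain-Python sketch rather than the approach used in the research itself.

```python
def summarize(points):
    """One pass: n = count, L = sum of x, Q = sum of outer products x x^T."""
    d = len(points[0])
    n = 0
    L = [0.0] * d
    Q = [[0.0] * d for _ in range(d)]
    for x in points:
        n += 1
        for i in range(d):
            L[i] += x[i]
            for j in range(d):
                Q[i][j] += x[i] * x[j]
    return n, L, Q


def mean_and_covariance(n, L, Q):
    """Derive the mean vector and population covariance from the summary."""
    d = len(L)
    mean = [L[i] / n for i in range(d)]
    cov = [[Q[i][j] / n - mean[i] * mean[j] for j in range(d)]
           for i in range(d)]
    return mean, cov


def transitive_closure(edges):
    """Semi-naive evaluation of the linear recursion R = E union (R join E)."""
    edges = set(edges)
    successors = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)
    closure = set(edges)
    delta = set(edges)  # pairs discovered in the previous iteration
    while delta:
        # Join only the newly found pairs with the base edges.
        new = {(u, w) for (u, v) in delta
               for w in successors.get(v, ())} - closure
        closure |= new
        delta = new
    return closure
```

The summary is small (d + d*d numbers) regardless of how many points are scanned, and the closure loop stops as soon as an iteration adds no new pairs, so it also terminates on graphs with cycles.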

6 Why join the DBMS group?
Just came back from AT&T Labs (formerly the famous AT&T Bell Labs). My head is spinning with C++14 and Unix commands.
Currently programming with my PhD students.
Balance between theory (mathematics) and programming (C++)
Mature and stable CS research area
Job prospects upon graduation are excellent. Great opportunity to join industrial labs.
Visit my web page or DBLP, Google “Ordonez SQL”, or stop by during my office hours.

