Speaker: Hyelim Choi Date: Feb 27, 2018

Presentation transcript:

What I did at Work

Good morning everyone, and thank you for coming. Today, my topic is “What I did at Work”.

Career
2012.01 ~ 2017.02: SK holdings C&C
IT Outsourcing, Application Development
Target Industry: Telecommunications, Finance, Semiconductors…
Key Topic: AI, Cloud, Smart Factory
2018.02 ~ : NEMO Lab. of Yonsei University

I worked for SK holdings C&C from January 2012 to February 2017. The company provides IT services to its customers, such as IT outsourcing and application development. Its major customer industries include telecommunications, finance, semiconductors, energy/chemicals, and logistics. Its areas of interest these days are artificial intelligence, cloud services, and smart factories. Now I have entered Yonsei University again to study at the NEMO lab. So, nice to meet you.

Work Experience
2012.04 ~ 2012.12: SKT Marketing System
OLTP system of SKT for marketing and billing
Developed part of the settlement of mobile device installment bonds
Language & Tool: C, JS, Unix, Oracle 11g
2013.01 ~ 2015.10: SKT/B DBM
OLAP system of SKT and SKB for data-driven decision support
Operated and maintained the data warehouse
Data modeling, ETL programming
Language & Tool: SQL, Shell (Ksh), Unix, Sybase IQ
2016.11 ~ 2017.02: SKB MDW
Media-specific data warehouse of SKB (e.g. IPTV viewing data)
System maintenance project leader
Requirement analysis, Hadoop cluster maintenance, data consistency management
Language & Tool: SQL, Shell (Bash), Linux (RHEL), Exadata, Hadoop ecosystem

1. My first job was to improve the SK Telecom marketing system, named “Ukey”, an online transaction processing (OLTP) system essential to the business, covering subscription, billing, and termination of mobile phone service. I was in charge of developing part of the settlement of mobile phone installment bonds. These are difficult words: settlement, installment, bond. Do you understand them? When you buy a cell phone, you usually do not pay for the device all at once. You pay monthly, with the installment included in your cell phone bill. The telcos then sell to banks the right to receive the money you still owe. This is an installment bond. (Bond: a certificate of debt that is issued by a government or corporation in order to raise money.) I developed a batch program that calculates and transfers the value of the bonds, using C and JavaScript on a Unix OS with an Oracle DB.

2. My second job was to operate and maintain a database marketing system, called DBM. The system is for online analytical processing (OLAP). OLAP is a system that enables end users to analyze data collected from original data sources (ODS). Recently, the term OLAP is often used to mean an OLAP tool. An OLAP tool makes it easy for people who are not familiar with database queries to analyze and visualize data. I'll skip further details. I was in charge of analyzing customer requirements, modeling the database, and developing the ETL programs in the system. I will explain what ETL is on the next slide. I developed the programs using SQL and shell scripts on a Sybase IQ DB and a Unix OS.

3. Up to this point, I had played the role of a developer in my projects. Next, I moved to another team and got the chance to lead a whole project. The project was to manage a Hadoop-based, media-specific data warehouse for SK Broadband. It contained IPTV viewing data, channel schedules, customer information, and so on. The media data was so massive that it grew by about 10 TB per week despite periodic deletion. So I ran a Hadoop cluster of about 60 servers to manage it, and when I left the company, our customer planned to add up to a hundred. What I learned during this project is that many people, including me, think Hadoop is cheaper because it is open source, but it is not. Although the software licenses are free of charge, problems are difficult to troubleshoot, so it takes a lot of time and personnel cost. On the other hand, it is very easy to scale out. In this project, I was in charge of analyzing the customer's requirements, maintaining the Hadoop cluster, and managing data consistency. (Data consistency here means data integrity, or coherence of the data.)

Key Concepts
ODS: Original Data Source
ETL: Extract, Transform, Load
Structured / Unstructured Data
Structured data: table
Unstructured data: log, picture, video
RDBMS / noSQL
RDBMS: Oracle RDB, MySQL, SQL Server
noSQL: HBase, Cassandra, Redis, MongoDB
Dimension / Measure
Dimension: sex, age group, rate plan, device model
Measure: ARPU, viewing time, data usage

Now, let me introduce what a data warehouse is. I don't want to make the explanation too difficult. If you think it is too detailed, please tell me. Before getting to the point, I have a few concepts to talk about. Maybe you have already heard of some of them.

First, ODS means original data source. It could be an OLTP system like a banking system, or sensor data, files, streaming data, and so on. In short, any system that generates data can be an ODS.

Next, ETL means Extract, Transform, Load. ETL programs and tools extract data from the ODS, transform the data, and load it into a database.

Next, I will talk about the types of data. Data can be structured or unstructured. Structured data is well-shaped data. It can be stored in a table, like an Excel sheet. In contrast, unstructured data, such as social media logs or pictures, cannot be stored in a table as it is. Special techniques are needed to process such data. I don't know those techniques well yet, but I want to learn them.

Next, RDBMS means relational database management system, like Oracle. A relational database (RDB) stores all data in two-dimensional tables. Relations between the tables are defined through primary keys, foreign keys, and other constraints. This provides more efficient access to complex data. Many NoSQL databases are based on a key-value structure. When you don't need complicated relationships in your data, you can choose this structure to access the data quickly. Examples of NoSQL databases are HBase, Cassandra, Redis, MongoDB, and so on.
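The difference between the relational and the key-value model can be illustrated with a minimal sketch. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for an RDBMS and a plain Python dictionary as a stand-in for a key-value store; the table names, columns, and sample rows are hypothetical and were not part of the talk.

```python
import sqlite3

# Relational model: two tables linked by a primary key / foreign key pair.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute(
    "CREATE TABLE rate_plan (plan_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY,"
    " age_group TEXT NOT NULL,"
    " plan_id INTEGER NOT NULL REFERENCES rate_plan(plan_id))"
)
conn.execute("INSERT INTO rate_plan VALUES (1, 'Basic'), (2, 'Unlimited')")
conn.execute("INSERT INTO customer VALUES (100, '20s', 2), (101, '30s', 1)")


def plan_of(customer_id):
    """Look up a customer's rate plan name by joining the two tables."""
    row = conn.execute(
        "SELECT p.name FROM customer c"
        " JOIN rate_plan p ON p.plan_id = c.plan_id"
        " WHERE c.customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0] if row else None


def insert_customer_with_unknown_plan():
    """Try to insert a customer whose plan_id does not exist in rate_plan."""
    try:
        conn.execute("INSERT INTO customer VALUES (102, '40s', 99)")
    except sqlite3.IntegrityError:
        return False  # the foreign key constraint rejected the row
    return True


# Key-value model: one lookup by key, no joins and no constraints.
kv_store = {
    "customer:100": {"age_group": "20s", "plan": "Unlimited"},
    "customer:101": {"age_group": "30s", "plan": "Basic"},
}


def kv_plan_of(customer_id):
    """Fetch the whole record by its key and read the plan from it."""
    record = kv_store.get(f"customer:{customer_id}")
    return record["plan"] if record else None
```

In the relational version the database itself rejects a customer that points at a missing rate plan, while the key-value version answers a lookup in a single step but leaves such checks to the application.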

Data Warehouse Architecture

Next, let me introduce the data warehouse architecture. This architecture is commonly used in DW systems, whether they handle big data or not. The only difference is which hardware or software is used at each point. For example, if the Hadoop ecosystem is used at these points, the system is generally called a big data system.

This architecture is very simple. Look at this figure. First, gather the data from the ODS. This is the E of ETL. The data can be structured or unstructured. For this step, you need to use the proper techniques and tools to extract data from the ODS. Second, transform and load the data. This is the T and L of ETL. In this step, you can use the tools of an RDBMS or NoSQL database such as Oracle, Hive, or Cassandra. Finally, analyze the data. You can use an OLAP tool in this step. If you know an analytic language like SQL, R, or Python, you can analyze the data directly. Recently, companies have come to expect these skills from analysts.

Let me show an example of data. This data was downloaded from data.go.kr (the open data portal). Look at this sheet.

If you get involved in a DW project, this area and this area are the role of the data architect and the DBA. They need some tools and policies to do these things, but you don't need to know the details for now. This is the data engineer's area, and this is the business analyst's area. Also, maybe you know what people call the person who carries out the overall work. Yes, it is a data scientist. I have mainly been in charge of this area so far, but I want to expand my area to here. And that's why I'm here now.
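The three steps of this architecture (extract, transform and load, analyze) can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the CSV text stands in for an export from a source system, SQLite stands in for the warehouse database, and the column names and values are made up rather than taken from the data shown in the talk. The analysis step groups a measure (viewing minutes) by a dimension (age group).

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system; one row has a missing value.
RAW_CSV = """customer_id,age_group,viewing_minutes
1,20s,120
2,20s,60
3,30s,
4,30s,90
"""


def extract(text):
    """E: read the raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """T: drop rows without a measure value and convert types."""
    cleaned = []
    for row in rows:
        if not row["viewing_minutes"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["customer_id"]), row["age_group"], int(row["viewing_minutes"]))
        )
    return cleaned


def load(conn, rows):
    """L: create the target table and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE viewing ("
        " customer_id INTEGER PRIMARY KEY,"
        " age_group TEXT,"
        " viewing_minutes INTEGER)"
    )
    conn.executemany("INSERT INTO viewing VALUES (?, ?, ?)", rows)


def analyze(conn):
    """Analysis: total of the measure per value of the dimension."""
    return conn.execute(
        "SELECT age_group, SUM(viewing_minutes)"
        " FROM viewing GROUP BY age_group ORDER BY age_group"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
result = analyze(conn)  # list of (age_group, total_viewing_minutes) tuples
```

In a real system each function would be replaced by the tools chosen for that point in the architecture, but the flow from source to table to grouped query stays the same.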

Thank you! Q & A

Thank you for listening. Does anyone have a question?