Speaker: Hyelim Choi Date: Feb 27, 2018 What I did at Work Good morning everyone, and thank you for coming. Today, my topic is “What I did at Work”.
Career
2012.01 ~ 2017.02 SK holdings C&C
IT Outsourcing, Application Development
Target Industry: Telecommunications, Finance, Semiconductors…
Key Topic: AI, Cloud, Smart Factory
2018.02 ~ NEMO Lab. of Yonsei University

I worked for SK holdings C&C from January 2012 to February 2017. The company provides IT services to its customers, such as IT outsourcing and application development. Its major customer industries include telecommunications, finance, semiconductors, energy/chemicals, and logistics, and its areas of interest these days are artificial intelligence, cloud services, and smart factories. Now I have entered Yonsei University again to study at the NEMO Lab. So, nice to meet you.
Work Experience
2012.04 ~ 2012.12 SKT Marketing System
OLTP system of SKT for marketing and billing
Developed part of the settlement of mobile device installment bonds
Language & Tool: C, JS, Unix, Oracle 11g
2013.01 ~ 2015.10 SKT/B DBM
OLAP system of SKT and SKB for data-driven decision support
Operated and maintained the data warehouse
Data modeling, ETL programming
Language & Tool: SQL, Shell(Ksh), Unix, Sybase IQ
2016.11 ~ 2017.02 SKB MDW
Media-specific data warehouse of SKB (e.g. IPTV viewing data)
System maintenance project leader
Requirement analysis, Hadoop cluster maintenance, data consistency management
Language & Tool: SQL, Shell(Bash), Linux(RHEL), Exadata, Hadoop ecosystem

1. My first job was improving the SK Telecom marketing system, named “Ukey”, an online transaction processing (OLTP) system essential to the business, covering subscription, billing, and termination of mobile phone service. I was in charge of developing part of the settlement of mobile phone installment bonds. These are difficult words: settlement, installment, bond. Do you understand them? When you buy a cell phone, you usually do not pay for the device all at once. You pay for it monthly as part of your cell phone bill. The telcos then sell to banks the right to receive the money you still owe. This is an installment bond. (Bond: a certificate of debt that is issued by a government or corporation in order to raise money.) I developed a batch program that calculates and transfers the value of the bonds, using C and JavaScript on a Unix OS and an Oracle DB.

2. My second job was to operate and maintain a database marketing system called DBM. The system is for online analytical processing (OLAP). OLAP is a system that enables end users to analyze data collected from original data sources (ODS); recently the term is often used to mean an OLAP tool. An OLAP tool makes it easy for anyone who is not familiar with database queries to analyze and visualize data. I will skip the details.
I was in charge of analyzing customer requirements, modeling the database, and developing the ETL programs in the system. I will explain what ETL is on the next slide. I developed the programs using SQL and shell scripts on a Sybase IQ DB and a Unix OS.

3. Up to this point, I had played the role of a developer in the projects. Next, I moved to another team and got the chance to lead a whole project. The project was to manage a Hadoop-based, media-specific data warehouse of SK Broadband. It contained IPTV viewing data, channel schedules, customer information, and so on. The media data was so massive that it grew by about 10 TB per week despite periodic deletion. So I ran a Hadoop cluster of about 60 servers to manage it, and when I left the company, our customer had a plan to add up to a hundred. What I learned during this project is that many people, including me, think Hadoop is cheaper because it is open source, but it is not. Although the software licenses are free of charge, the system is difficult to troubleshoot, so it costs a lot in time and personnel. On the other hand, it is very easy to scale out. By the way, in this project I was in charge of analyzing the customer's requirements, maintaining the Hadoop cluster, and managing data consistency. (Data consistency: also rendered as data integrity or data coherence.)
Key Concepts
ODS: Original Data Source
ETL: Extract, Transform, Load
Structured / Unstructured Data
Structured data: table
Unstructured data: log, picture, video
RDBMS / noSQL
RDBMS: Oracle RDB, MySQL, SQL Server
noSQL: HBase, Cassandra, Redis, MongoDB
Dimension / Measure
Dimension: sex, age group, rate plan, device model
Measure: ARPU, viewing time, data usage

Now, let me introduce what a data warehouse is. I do not want to make the explanation too difficult. If you think it is too detailed, please tell me. Before I get to the point, I have a few concepts to talk about. Maybe you have already heard of some of them.

First, ODS means original data source. It could be an OLTP system like a banking system, or sensor data, files, streaming data, and so on. In short, every system that generates data could be an ODS.

Next, ETL means Extract, Transform, Load. ETL programs and tools extract data from the ODS, transform the data, and load it into a database.

Next, I will talk about the types of data. Data can be structured or unstructured. Structured data is well-shaped data. It can be stored in a table, like an Excel sheet. In contrast, unstructured data cannot be stored in a table as it is, such as social media logs or pictures. Special techniques are needed to process such data. I do not know them well yet either, but I want to learn them.

Next, RDBMS, or RDB, means relational database management system, like Oracle. An RDB stores all data in two-dimensional tables, and the relations between tables are defined through primary keys, foreign keys, and other constraints. This provides more efficient access to complex data. NoSQL databases are typically based on simpler structures such as key-value pairs. When your data does not need complicated relationships, you can choose this structure to access the data quickly. Examples of noSQL databases are HBase, Cassandra, Redis, MongoDB, and so on.
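To make the ETL idea and the RDB versus key-value contrast above a little more concrete, here is a minimal sketch in Python. It is only an illustration, not the programs used in the projects: the CSV text, the table names (`rate_plan`, `customer_usage`), the column names, and the `customer:<id>` key format are all assumptions, and SQLite stands in for a real warehouse database.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an ODS (all values are made up for illustration).
RAW_CSV = """customer_id,age,rate_plan,data_usage_mb
1,23,basic,1200
2,37,premium,5300
3,41,basic,
4,29,premium,4100
"""


def extract(raw_text):
    """E: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """T: drop incomplete rows, convert types, and derive an age group."""
    cleaned = []
    for row in rows:
        if not row["data_usage_mb"]:
            continue  # skip rows whose measure is missing
        age = int(row["age"])
        age_group = f"{age // 10 * 10}s"  # e.g. 23 -> "20s"
        cleaned.append(
            (int(row["customer_id"]), age_group, row["rate_plan"],
             int(row["data_usage_mb"]))
        )
    return cleaned


def load(conn, records):
    """L: store the rows in relational tables linked by a foreign key."""
    conn.execute("CREATE TABLE rate_plan (name TEXT PRIMARY KEY)")
    conn.execute(
        """CREATE TABLE customer_usage (
               customer_id   INTEGER PRIMARY KEY,
               age_group     TEXT NOT NULL,
               rate_plan     TEXT NOT NULL REFERENCES rate_plan(name),
               data_usage_mb INTEGER NOT NULL)"""
    )
    plans = sorted({record[2] for record in records})
    conn.executemany("INSERT INTO rate_plan VALUES (?)", [(p,) for p in plans])
    conn.executemany("INSERT INTO customer_usage VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only if asked
records = transform(extract(RAW_CSV))
load(conn, records)
loaded = conn.execute(
    "SELECT customer_id, age_group, rate_plan, data_usage_mb "
    "FROM customer_usage ORDER BY customer_id"
).fetchall()

# Key-value view of the same records: one lookup by key, no relations between keys.
kv_store = {
    f"customer:{cid}": {"age_group": group, "rate_plan": plan, "data_usage_mb": usage}
    for cid, group, plan, usage in records
}
```

In the relational version, the database itself refuses a usage row whose rate plan does not exist in `rate_plan`. In the key-value version there is no such check, but reading `kv_store["customer:2"]` is a single direct lookup.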
Data Warehouse Architecture

Next, let me introduce the data warehouse architecture. This architecture is commonly used in DW systems, whether they handle big data or not. The only difference is which hardware or software is used at each point. For example, if the Hadoop ecosystem is used at these points, it is generally called a big data system.

This architecture is very simple. Look at this figure. First, gather the data from the ODS. This is the E of ETL. The data can be structured or unstructured. For this process, you need to use the proper techniques and tools to extract data from the ODS. Second, transform and load the data. This is the T and L of ETL. In this step, you can use the tools of an RDBMS or a noSQL DB, such as Oracle, Hive, or Cassandra. Finally, analyze the data. You can use an OLAP tool in this step. If you know an analytic language like SQL, R, or Python, you can analyze the data directly. Recently, companies have been expecting these skills from analysts.

Let me show an example of data. This data was downloaded from data.go.kr (the open data portal). Look at this sheet.

If you get involved in a DW project, this area and this area are the roles of the data architect and the DBA. They need some tools and policies to do these things, but you do not need to know the details for now. This is the data engineer's area, and this is the business analyst's area. Also, maybe you know what people call the person who carries out the overall work. Yes, it is the data scientist. I have mainly been in charge of this area so far, but I want to expand my area to here. And that is why I am here now.
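The final "analyze" step of the architecture can be sketched with SQL driven from Python, summing a measure for each value of a dimension. This is only a minimal illustration under stated assumptions: the `viewing` table, its columns, the sample rows, and the `measure_by_dimension` helper are all hypothetical, and SQLite stands in for the real warehouse.

```python
import sqlite3

# Hypothetical rows already loaded into the warehouse:
# (age_group, rate_plan, viewing_minutes). All values are made up.
ROWS = [
    ("20s", "basic", 30),
    ("20s", "premium", 90),
    ("30s", "basic", 60),
    ("30s", "premium", 120),
    ("20s", "basic", 50),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE viewing (age_group TEXT, rate_plan TEXT, viewing_minutes INTEGER)"
)
conn.executemany("INSERT INTO viewing VALUES (?, ?, ?)", ROWS)

ALLOWED_DIMENSIONS = {"age_group", "rate_plan"}


def measure_by_dimension(connection, dimension):
    """Return (dimension value, total minutes, average minutes) per group."""
    if dimension not in ALLOWED_DIMENSIONS:
        # Column names cannot be bound as SQL parameters, so validate them first.
        raise ValueError(f"unknown dimension: {dimension}")
    query = (
        f"SELECT {dimension}, SUM(viewing_minutes), AVG(viewing_minutes) "
        f"FROM viewing GROUP BY {dimension} ORDER BY {dimension}"
    )
    return connection.execute(query).fetchall()


by_age = measure_by_dimension(conn, "age_group")
by_plan = measure_by_dimension(conn, "rate_plan")

for value, total, average in by_age:
    print(f"{value}: total={total} min, average={average:.1f} min")
```

Here `age_group` and `rate_plan` play the role of dimensions and `viewing_minutes` is the measure. An OLAP tool builds this kind of GROUP BY query for the user, so the same question can be asked without writing SQL by hand.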
Thank you! Q & A Thank you for listening. Does anyone have any questions?