The Contemporary Firm 550 By: Beatriz Guzman BIG DATA
What is Big Data? Big data is defined as any kind of data source that has the following characteristics: volume, velocity, variety, and veracity. Big data is important because it enables organizations to gather, store, manage, and manipulate vast amounts of data at the right speed, at the right time, to gain the right insights.
Where does data come from? Transaction Processing Systems: (loyalty cards, cash register) Enterprise Software: “Customer relationship management systems are often used to empower employees to track and record data at nearly every point of customer contact. Someone calls for a quote? Brings a return back to a store? Writes a complaint e-mail?” Surveys: “Sometimes firms supplement operational data with additional input from surveys and focus groups. Oftentimes, direct surveys can tell you what your cash register can’t. Zara store managers informally survey customers in order to help shape designs and product mix.” External Sources
Management Decisions: Once a team has established its goals, it can decide how to use its data to accomplish them by working through questions in the following areas: data sourcing, data quantity, data quality, data hosting, and data governance.
Data Challenges: Businesses want to stay up to date with competitors in the market, so they attempt to capture as much client information as possible. This creates high volumes of data, but the data is only useful if they are able to use it correctly. Some data is structured and stored in a traditional relational database, while other data, including documents, customer service records, and even pictures and videos, is unstructured. How should one organize this valuable information?
Cycle of Big Data Data must first be captured, and then organized and integrated. After this phase is successfully implemented, data can be analyzed based on the problem being addressed. Finally, management takes action based on the outcome of that analysis.
Business Processes: Structured Query Language (SQL): SQL has evolved in lockstep with relational database management system (RDBMS) technology and is the most widely used mechanism for creating, querying, maintaining, and operating relational databases. Hadoop: “It is not a type of database, but rather a software ecosystem that allows for massively parallel computing. It is an enabler of certain types NoSQL distributed databases, which can allow for data to be spread across thousands of servers with little reduction in performance.” (Lo)
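To make the "creating, querying, maintaining" description of SQL concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: define a relational table.
cur.execute("CREATE TABLE purchases (customer TEXT, item TEXT, amount REAL)")

# Maintain: insert a few illustrative rows.
rows = [
    ("alice", "shoes", 60.0),
    ("bob", "jacket", 120.0),
    ("alice", "scarf", 25.0),
]
cur.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
conn.commit()

# Query: total spend per customer, largest first.
cur.execute(
    "SELECT customer, SUM(amount) FROM purchases "
    "GROUP BY customer ORDER BY SUM(amount) DESC"
)
totals = cur.fetchall()
print(totals)

conn.close()
```

The same statements would look much the same against a server-based RDBMS; only the connection step would differ.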
Business Processes: Hadoop Distributed File System (HDFS). HDFS works by breaking large files into smaller pieces called blocks. The blocks are stored on DataNodes, and it is the responsibility of the NameNode to know which blocks on which DataNodes make up the complete file. The DataNodes report to the NameNode regularly so that it can keep track of where each block is stored.
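The block idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how HDFS is implemented: the node names are hypothetical, the block size is tiny, placement is simple round-robin, and replication is left out.

```python
from itertools import cycle

BLOCK_SIZE = 4  # bytes; tiny for illustration (real HDFS blocks are far larger)
DATA_NODES = ["datanode-1", "datanode-2", "datanode-3"]  # hypothetical names


def split_into_blocks(data, block_size):
    """Break a file's bytes into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store_file(data):
    """Place blocks on data nodes round-robin and record where each one went.

    block_map plays the role of the NameNode's metadata;
    storage plays the role of the DataNodes' disks.
    """
    storage = {node: {} for node in DATA_NODES}
    block_map = []  # ordered list of (block_id, node)
    nodes = cycle(DATA_NODES)
    for block_id, block in enumerate(split_into_blocks(data, BLOCK_SIZE)):
        node = next(nodes)
        storage[node][block_id] = block
        block_map.append((block_id, node))
    return block_map, storage


def read_file(block_map, storage):
    """Reassemble the file by following the block map in order."""
    return b"".join(storage[node][block_id] for block_id, node in block_map)


original = b"big data file"
block_map, storage = store_file(original)
restored = read_file(block_map, storage)
print(block_map)
```

Reading the file back only works because the block map records the order and location of every block, which is the job the slide assigns to the NameNode.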
Business Processes: Hadoop MapReduce. HDFS and MapReduce perform their work on nodes in a cluster hosted on racks of commodity servers.
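The map-then-reduce pattern that MapReduce is named for can be illustrated with a single-machine word count. This is only a sketch of the idea: the function names are hypothetical, and in Hadoop each chunk would be handled by a different node in the cluster rather than in one Python process.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Each string stands in for one independent chunk of a larger data set.
chunks = ["big data is big", "data is everywhere"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because each chunk is mapped independently, the map step can run in parallel across many machines, which is what makes the pattern suit a cluster of commodity servers.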
Real World Use: Financial sector companies are using Hadoop to reduce risk, analyze fraud patterns, identify rogue traders, target their marketing campaigns more precisely based on customer segmentation, and improve customer satisfaction. Two examples: Customer Segmentation Analysis and Credit Risk Assessment.
The future of big data is getting cloudy. “Businesses using data will see $430 billion in productivity benefits over their competition not using data by 2020, according to International Institute for Analytics. Data volumes will continue to grow. There’s absolutely no question that we will continue generating larger and larger volumes of data, especially considering that the number of handheld devices and Internet-connected devices is expected to grow exponentially. Big data will face huge challenges around privacy, especially with the new privacy regulation by the European Union. Companies will be forced to address the ‘elephant in the room’ around their privacy controls and procedures. Gartner predicts that by 2018, 50% of business ethics violations will be related to data.” (Forbes)
Question 1 Which of the following is the most widely used language for working with relational databases? a) SQL b) PaaS c) Hadoop
Question 2 Which of the following is the new, fourth “V” in the description of big data? a) vivid b) veracity c) visible d) versify
Question 3 What process does Hadoop use to split the input data set into chunks for processing? a) MapReduce b) OLAP c) Matchreduce
Reference
https://www.forbes.com/sites/bernardmarr/2016/03/15/17-predictions-about-the-future-of-big-data-everyone-should-read/
http://eecs.wsu.edu/~yinghui/mat/courses/fall%202015/resources/Big%20data%20for%20dummies.pdf
https://scholar.flatworldknowledge.com/books/30507/fwk-38086-ch10_s02/read
https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=BK&infotype=PM&appname=SWGE_IM_EZ_USEN&htmlfid=IMM14164USEN&attachment=IMM14164USEN.PDF
Customer Segmentation Analysis: Banks can create a more meaningful and effective context for marketing to customers if they can define distinct categories, or “segments,” to which each customer belongs. They can use the MapR Converged Data Platform to collect and analyze all of the data, such as daily transaction data, interaction data from multiple customer touchpoints (e.g., online, call centers), home value data, and merchant records. Banks can then analyze these data sets to group customers into one or more segments based on their needs in terms of banking products and services, and plan their sales, promotion, and marketing campaigns accordingly.
Credit Risk Assessment: Due to the global financial crisis, there are now much more stringent rules for determining whether or not to give a customer a loan, so banks need more accurate ways to determine a person’s credit risk. A number of quantitative indicators are used for credit risk assessment and credit scoring. The MapR Converged Data Platform enables banks to pull in customer data on everything from deposit information to customer service emails to credit card purchase history in order to gain a holistic view of their customers. With the MapR Converged Data Platform, financial institutions have the tools they need to construct an in-depth view of their customers so they can provide accurate credit scoring and analysis.