SEMINAR IN DATABASE SYSTEMS 236826 Oded Shmueli Computer Science Dept. Technion April 2016
Coordinates
- 236826: Seminar in Databases
- Prof. Oded Shmueli, oshmu@cs.technion.ac.il, Tel. 4280
- Office hours: Wednesday, 16:00-17:00
- Seminar time: Wednesday, 10:30-12:30
- Spring semester dates: 20/3/17-4/7/17

Topics
- The seminar deals with recent developments in database systems in the context of cloud-based systems.
- The focus will be on developments in concurrency control, recovery and query processing in the context of modern distributed databases and cloud-based systems.
Outline
- Background: Cloud Storage and Computation
  - The Basic Idea
  - Public, Private, Hybrid Clouds
  - Challenges: security, pricing, availability, reliability, efficiency
  - Example: Amazon’s AWS: EC2, S3, EBS, CloudFront
  - Our focus: data and databases on the cloud
- Seminar Goals
- Rules of the game
Cloud Storage and Computation: The Basic Idea
- Pay as you use, instead of maintaining enterprise-based computing and storage
- View computing and storage as utilities
- Elasticity: adapt resource consumption to business needs
- Economies of scale allow up-to-date, available, reliable services with fewer personnel
- Required technology: virtualization of machines, storage and network resources
The Space Map – Cloud Times, http://cloudtimes.org/2011/11/30/top-paas-saas-and-iaas-cloud-companies-by-cloudtimes/
Main Players
- Amazon
- Microsoft
- Google
- Salesforce
- IBM
Cloud Storage and Computation: Public, Private, Hybrid Clouds
- A public cloud shares its resources among many organizations
- A private cloud usually operates on the premises of a single organization and may use the same technologies as a public cloud
- In a hybrid cloud, an organization uses both private and public clouds
Reminder: Storage Sizes

Decimal value   Symbol   Metric name
1000            kB       kilobyte
1000^2          MB       megabyte
1000^3          GB       gigabyte
1000^4          TB       terabyte
1000^5          PB       petabyte
1000^6          EB       exabyte
1000^7          ZB       zettabyte
1000^8          YB       yottabyte

“The sheer amount of data being created by digital systems is growing in almost incomprehensible proportions. In the past two years, more data has been created than in all previous history. Today, more than 4.4 zettabytes of data exist but by 2020, that will grow tenfold to 44 zettabytes, according to the IDC Digital Universe Study.”
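To make the decimal units concrete, here is a minimal Python sketch that converts between them and checks the growth factor implied by the quote. The names UNITS, to_bytes and convert are illustrative helpers, not part of any library.

```python
# Decimal (SI) storage units: each step up multiplies by 1000.
UNITS = ["kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value: int, unit: str) -> int:
    """Convert a whole number of decimal units to bytes."""
    exponent = UNITS.index(unit) + 1  # kB -> 1000**1, MB -> 1000**2, ...
    return value * 1000 ** exponent


def convert(value: float, src: str, dst: str) -> float:
    """Convert between two decimal units, e.g. ZB -> EB."""
    steps = UNITS.index(src) - UNITS.index(dst)
    return value * 1000.0 ** steps


if __name__ == "__main__":
    print(to_bytes(1, "ZB"))        # bytes in one zettabyte
    print(convert(44, "ZB", "EB"))  # the projected 44 ZB, expressed in exabytes
    print(44 / 4.4)                 # growth factor implied by the quote
```

Integer arithmetic is used in to_bytes so that very large values stay exact; convert uses floats because converting to a larger unit produces fractions.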
Example: Amazon’s EC2
- Elastic Compute Cloud, a few zones
- Enterprise software types: WebLogic server, DB2...
- Xen hypervisor for running machine instances
- Various machine images: Linux, Windows
- Various characteristics: speed, main memory, disk space
- Services: queues, notification, CloudWatch, load balancing
- Storage attached to machine instances: fast, expensive
- Pricing models: On-Demand Instance, Reserved Instance, Spot Instance
- We plan to see it in action

Amazon Web Services (AWS), a subsidiary of Amazon.com, offers a suite of cloud computing services that make up an on-demand computing platform. These services operate from 16 geographical regions across the world. They include Amazon Elastic Compute Cloud, also known as "EC2", and Amazon Simple Storage Service, also known as "S3". As of 2016, AWS had more than 70 services spanning a wide range, including compute, storage, networking, database, analytics, application services, deployment, management, mobile, developer tools and tools for the Internet of Things. Amazon markets AWS as a service that provides large computing capacity more quickly and cheaply than a client company building an actual physical server farm.

Reserved Instances
- Reserved Instances are a billing discount and capacity reservation that is applied to instances to lower hourly running costs. A Reserved Instance is not a physical instance.
- The discounted usage price is fixed for as long as you own the Reserved Instance, allowing you to predict compute costs over the term of the reservation.
- If you expect consistent, heavy use, Reserved Instances can provide substantial savings over owning your own hardware or running only On-Demand instances.
- When you purchase a Reserved Instance, the reservation is automatically applied to running instances that match your specified parameters. Alternatively, you can launch an On-Demand EC2 instance with the same configuration as the purchased reserved capacity.
- No Upfront and Partial Upfront Reserved Instances are billed on an hourly basis, regardless of whether or not they are being used. All Upfront Reserved Instances have no additional hourly charges.
- Reserved Instances do not renew automatically; when a reservation expires, you can continue using the EC2 instance without interruption, but you will be charged On-Demand rates.
- New Reserved Instances can have the same parameters as the expired ones, or you can purchase Reserved Instances with different parameters.
- You can use Auto Scaling or other AWS services to launch the On-Demand instances that use your Reserved Instance benefits.
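The trade-off between On-Demand and Reserved pricing can be sketched as a small calculation. This is a minimal illustration that assumes a reservation costs an upfront fee plus an hourly rate charged for every hour of the term, while On-Demand charges only for hours used. All prices below are hypothetical and are not taken from any AWS price list.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap years


def reserved_cost(upfront: float, hourly: float, term_hours: int) -> float:
    """Total cost of a reservation over its term.

    The hourly part is charged for every hour of the term, used or not.
    An All Upfront reservation is modeled with hourly=0.
    """
    return upfront + hourly * term_hours


def on_demand_cost(hourly: float, hours_used: int) -> float:
    """On-Demand cost: pay only for the hours actually used."""
    return hourly * hours_used


def break_even_hours(upfront: float, reserved_hourly: float,
                     on_demand_hourly: float, term_hours: int) -> float:
    """Hours of use at which On-Demand spending reaches the reservation cost."""
    return reserved_cost(upfront, reserved_hourly, term_hours) / on_demand_hourly


if __name__ == "__main__":
    # Hypothetical prices, for illustration only.
    od_rate = 0.10
    upfront, ri_rate = 300.0, 0.03
    total = reserved_cost(upfront, ri_rate, HOURS_PER_YEAR)
    hours = break_even_hours(upfront, ri_rate, od_rate, HOURS_PER_YEAR)
    print(f"reserved, 1-year term: {total:.2f}")
    print(f"break-even utilization: {hours / HOURS_PER_YEAR:.0%}")
```

Below the break-even utilization, running On-Demand is cheaper under these assumptions; above it, the reservation wins, which matches the "consistent, heavy use" guidance.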
Regions and Availability Zones (AZs)
- See: https://blog.rackspace.com/aws-201-understanding-the-default-virtual-private-cloud?_ga=1.95528028.1105833043.1475352311
- AZ: a logical data center in a Region, available for use by any AWS customer
- Each AZ has redundant and separate power, networking and connectivity
- Each AZ is backed by one or more physical data centers, possibly 5
Amazon’s AWS Storage: Temporary, S3, EBS, CloudFront
- AMI instance storage is temporary
- S3 (Simple Storage Service), persistent
  - Object size up to 5GB
  - Flat storage
  - Web API
  - Cheap, slow, reliable, availability issues
- EBS (Elastic Block Store), persistent
  - High operational performance, available, more expensive
  - Volumes with blocks, up to 1TB, initially unformatted
  - Appear as devices attached to AMI instances; support duplication
- CloudFront: CDN (Content Delivery Network)
  - Primarily for serving files, pages and objects to many clients in various zones
Comparison: Temporary, S3, EBS, CloudFront
- See Table 9-3 in Cloud Computing Bible
- Amazon Elastic Block Store (Amazon EBS) provides persistent block-level storage volumes for use with Amazon EC2 instances in the AWS Cloud.
- Each Amazon EBS volume is automatically replicated within its Availability Zone to protect you from component failure, offering high availability and durability.
- Amazon EBS volumes offer the consistent and low-latency performance needed to run your workloads.
- With Amazon EBS, you can scale your usage up or down within minutes, all while paying a low price for only what you provision.
Amazon S3 Storage Classes
- Amazon S3 offers a range of storage classes designed for different use cases.
- These include Amazon S3 Standard for general-purpose storage of frequently accessed data, Amazon S3 Infrequent Access for long-lived but less frequently accessed data, and Amazon Glacier for long-term archive.
- Amazon S3 also offers configurable lifecycle policies for managing your data throughout its lifecycle.
- Once a policy is set, data will automatically migrate to the most appropriate storage class without any changes to your application.
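As a rough illustration of a lifecycle policy, the sketch below writes one as a Python dictionary and simulates which storage class an object would be in at a given age. The dictionary follows the general shape of an S3 lifecycle configuration as passed to boto3's put_bucket_lifecycle_configuration, to the best of my understanding; no AWS call is made here. The rule ID, the key prefix and the day thresholds are hypothetical, and storage_class is an illustrative helper that simplifies S3's actual rule evaluation.

```python
# Shape follows an S3 lifecycle configuration; all values are hypothetical.
LIFECYCLE = {
    "Rules": [
        {
            "ID": "archive-old-objects",    # hypothetical rule name
            "Filter": {"Prefix": "logs/"},  # hypothetical key prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
        }
    ]
}


def storage_class(key: str, age_days: int, config: dict = LIFECYCLE) -> str:
    """Return the class an object would be in under the rules (simplified)."""
    current = "STANDARD"
    for rule in config["Rules"]:
        if rule["Status"] != "Enabled":
            continue
        if not key.startswith(rule["Filter"].get("Prefix", "")):
            continue
        # Apply transitions in order of age; the last one reached wins.
        for step in sorted(rule["Transitions"], key=lambda t: t["Days"]):
            if age_days >= step["Days"]:
                current = step["StorageClass"]
    return current


if __name__ == "__main__":
    for age in (10, 30, 400):
        print(age, storage_class("logs/app.log", age))
```

The application keeps reading and writing the same keys; only the storage class behind them changes as objects age.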
Main Focus
- Distributed Relational Databases
- Distributed Storage (and non-relational)
- Performance
- Concurrency and Recovery
- Consistency and Properties of Distributed Systems
- Query processing
- Distributed Synchronization Tools
- Miscellaneous
Seminar Goals
- Gain familiarity with the cloud
- Read and digest a scientific/technical paper (including its related work)
- Understand the “big picture” as well as the details
- Construct a coherent presentation
- Deliver it in class in a friendly, efficient way
How to Choose a Topic
- Browse papers (online) and books (in the library)
- Read the relevant entries in Wikipedia
- From 22/3/17 through 4/4/17 (inclusive), email me 3 topics you would like to present, and name your partner (if relevant to the topic)
- Please also explain, for each topic, why you think it is an especially good fit for your skills or areas of interest
- Starting 24/3/17, this file will be updated with the seminar topics that have already been "taken"
- If the topics you are interested in have been "taken", please contact me for another attempt
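Course Requirements
- There is no formal prerequisite, but you must be proficient in databases at least at the level of the course "Database Systems"
- Full attendance is mandatory
- Homework assignment, done in pairs (10%)
- Preparing and delivering a lecture: mastery and understanding of the topic and its background (including the main bibliography), and a clear presentation
- Format: usually a slide presentation; a DEMO is also possible (well prepared and tested in advance in the lecture room)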
The Lectures
- Lectures will not necessarily take place in their scheduled week. You must be ready to present at any meeting from the week before your scheduled week onward, regardless of any other lectures being given or other such calculations
- The meeting of 17.5 may instead be held on Sunday, 21.5, at 16:30 in Taub 4; please reserve this time
- There may be meetings beyond the expected number of meetings (13). Attendance at such meetings is recommended but not mandatory
- All topics require reading and mastery of the relevant literature, not only of the specific topic; papers are sometimes just a starting point from which you need to expand into a complete presentation
- Send me your presentation by Monday of the teaching week preceding the week of your scheduled lecture; this will let all students review it
- This is also important because you may be asked to present before the scheduled date due to unforeseen constraints
- Some topics are intended for individuals and some for a team of two presenters
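Grade
- 80% for preparing and presenting the topic, 10% for attendance and following the rules, and 10% for the homework assignment
- A bonus of 10% is possible for a particularly successful DEMO; in any case, the maximum grade is 100
- No grade percentage is given for active participation, but participation is very important so that we all get the most out of the seminar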
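The grading rule (80% presentation, 10% attendance and rules, 10% homework, a possible demo bonus, capped at 100) can be written as a short Python function. This is a sketch under assumptions: each component is scored on a 0-100 scale, and the 10% bonus is read as up to 10 points added to the weighted total. The function name and parameters are illustrative.

```python
def final_grade(presentation: float, attendance: float, homework: float,
                demo_bonus: float = 0.0) -> float:
    """Weighted grade on a 0-100 scale, capped at 100.

    Assumes each component is scored 0-100 and the demo bonus
    is between 0 and 10 points (an interpretation of the 10% bonus).
    """
    if not 0 <= demo_bonus <= 10:
        raise ValueError("demo_bonus is assumed to be between 0 and 10 points")
    weighted = (80 * presentation + 10 * attendance + 10 * homework) / 100
    return min(100.0, weighted + demo_bonus)


if __name__ == "__main__":
    print(final_grade(90, 100, 100))
    print(final_grade(100, 100, 100, demo_bonus=10))
```

The cap matters only when a strong presentation is combined with the bonus; otherwise the result is the plain weighted sum.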
And now, to work... https://www.youtube.com/watch?v=RVDCQppvvDA