Published by Erin Barrett. Modified over 9 years ago.
1
© Hortonworks Inc. 2013.
MR / Tez Query Comparison
HW = 20 nodes (48 GB RAM, 6x disk)
SW = Hive trunk (Nov 13, 2013) + ORCFile + Vectorization
All runs use Hive trunk; times are in seconds.

Query     M/R (s)  Tez Cold (s)  Tez Hot (s)  Tez Relative Gain  Hot/Cold Gain
query12     297.5          20.0          9.7            2958.8%         105.2%
query15      75.9          62.6         58.7              29.4%           6.7%
query21      39.3          52.4         46.9             -16.2%          11.7%
query26      46.9          33.0         23.3             101.0%          41.3%
query27      37.6          17.7          8.4             348.8%         111.8%
query28      39.9          24.2         12.5             218.4%          93.4%
query3       58.9          23.9         16.1             265.9%          48.7%
query34      87.7          31.7         25.3             246.6%          25.3%
query39     234.5          62.8         55.5             322.1%          13.1%
query43      57.0          34.4         26.0             119.2%          32.4%
query46     103.3          46.1         30.9             234.4%          49.2%
query52      59.9          23.5         15.6             285.3%          51.3%
query55      59.3          27.7         18.3             224.1%          51.4%
query67     820.9         821.1        787.2               4.3%
query68     102.7          53.1         42.2             143.2%          25.9%
query7       47.9          27.7         18.9             153.1%          46.2%
query73      87.7          29.7         22.7             287.1%          31.0%
query88     483.3          95.0         90.2             435.6%           5.3%
query90     122.3          41.1         26.9             355.3%          53.0%
query92     278.6         142.5        135.7             105.4%           5.1%
query96      42.8          24.4         16.4             160.6%          48.6%
query97     279.0         147.5        133.4             109.1%          10.6%
query98    1451.5          50.0         38.6            3662.2%          29.5%
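The slide does not define the two gain columns. The figures are consistent with Tez Relative Gain = M/R / Tez (Hot) - 1 and Hot/Cold Gain = Tez (Cold) / Tez (Hot) - 1. The sketch below recomputes a few rows under that assumption. Small differences are presumably due to the times being rounded to one decimal; query98, for example, recomputes to about 3660.4% against 3662.2% on the slide. The reading of "cold" and "hot" as a first run versus a warmed-up session is also an assumption.

```python
# Sketch: recompute the gain columns of the MR / Tez comparison table.
# Assumption (inferred from the numbers, not stated on the slide):
#   Tez Relative Gain = M/R time / Tez hot time - 1
#   Hot/Cold Gain     = Tez cold time / Tez hot time - 1
from typing import NamedTuple


class Run(NamedTuple):
    query: str
    mr: float        # Hive trunk on MapReduce, seconds
    tez_cold: float  # Hive trunk on Tez, cold run (assumed: fresh session), seconds
    tez_hot: float   # Hive trunk on Tez, hot run (assumed: warmed-up session), seconds


def relative_gain_pct(run: Run) -> float:
    """Percent by which Tez (hot) beats M/R; negative means Tez is slower."""
    return (run.mr / run.tez_hot - 1.0) * 100.0


def hot_cold_gain_pct(run: Run) -> float:
    """Percent by which the hot Tez run beats the cold one."""
    return (run.tez_cold / run.tez_hot - 1.0) * 100.0


ROWS = [
    Run("query21", 39.3, 52.4, 46.9),
    Run("query34", 87.7, 31.7, 25.3),
    Run("query97", 279.0, 147.5, 133.4),
    Run("query98", 1451.5, 50.0, 38.6),
]

if __name__ == "__main__":
    for r in ROWS:
        print(f"{r.query}: relative {relative_gain_pct(r):.1f}%, "
              f"hot/cold {hot_cold_gain_pct(r):.1f}%")
```

query21 is the only row where M/R beats Tez, which shows up as a negative relative gain.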
2
Query 88
select * from
  (select count(*) h8_30_to_9
   from store_sales
   JOIN household_demographics ON store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
   JOIN time_dim ON store_sales.ss_sold_time_sk = time_dim.t_time_sk
   JOIN store ON store_sales.ss_store_sk = store.s_store_sk
   where time_dim.t_hour = 8
     and time_dim.t_minute >= 30
     and ((household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2)
       or (household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)
       or (household_demographics.hd_dep_count = 1 and household_demographics.hd_vehicle_count<=1+2))
     and store.s_store_name = 'ese') s1
  JOIN
  (select count(*) h9_to_9_30
   from store_sales...
8 full table scans
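The dimension filters of the first subquery can be restated as plain predicates, which makes the "N+2" literals easier to read. For dependent counts 3, 0 and 1, a household qualifies with at most dep_count + 2 vehicles. The sketch below is an illustrative restatement of the visible SQL only. The function names are hypothetical, and the elided subqueries are not reconstructed.

```python
# Hypothetical restatement of the first Query 88 subquery's filters.
# Each filter touches a small dimension table (household_demographics,
# time_dim, store), so these are the sides a broadcast join could pre-filter.

ALLOWED_DEP_COUNTS = (3, 0, 1)  # the three OR-branches shown on the slide


def household_matches(hd_dep_count: int, hd_vehicle_count: int) -> bool:
    """household_demographics filter: vehicles <= dep_count + 2 for listed counts."""
    return any(
        hd_dep_count == dep and hd_vehicle_count <= dep + 2
        for dep in ALLOWED_DEP_COUNTS
    )


def time_matches(t_hour: int, t_minute: int) -> bool:
    """time_dim filter of the h8_30_to_9 subquery (8:30 up to, not including, 9:00)."""
    return t_hour == 8 and t_minute >= 30


def store_matches(s_store_name: str) -> bool:
    """store filter shared by the visible subquery."""
    return s_store_name == "ese"


if __name__ == "__main__":
    for dep, veh in [(3, 5), (3, 6), (0, 2), (1, 3), (2, 0)]:
        print(dep, veh, household_matches(dep, veh))
```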
3
Query 88: M/R
Total MapReduce jobs = 29
...
Total MapReduce CPU Time Spent: 0 days 2 hours 52 minutes 39 seconds 380 msec
OK
345617 687625 686131 1032842 1030364 606859 604232 692428
Time taken: 403.28 seconds, Fetched: 1 row(s)
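The M/R run reports both aggregate CPU time and wall-clock time. Converting the CPU-time string to seconds makes the two comparable: about 10,359 CPU-seconds spread over 403.28 wall-clock seconds. The parser below is a minimal sketch written against the single line shown on the slide. It assumes the "N days N hours N minutes N seconds N msec" layout and is not a general parser for Hive logs.

```python
import re

# Matches the layout shown on the slide, e.g.
# "Total MapReduce CPU Time Spent: 0 days 2 hours 52 minutes 39 seconds 380 msec"
CPU_RE = re.compile(
    r"(\d+) days (\d+) hours (\d+) minutes (\d+) seconds (\d+) msec"
)


def cpu_time_seconds(line: str) -> float:
    """Convert a 'Total MapReduce CPU Time Spent' line to seconds."""
    m = CPU_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized CPU time line: {line!r}")
    days, hours, minutes, seconds, msec = map(int, m.groups())
    return days * 86400 + hours * 3600 + minutes * 60 + seconds + msec / 1000


if __name__ == "__main__":
    line = ("Total MapReduce CPU Time Spent: "
            "0 days 2 hours 52 minutes 39 seconds 380 msec")
    cpu = cpu_time_seconds(line)
    wall = 403.28  # "Time taken" reported for the same M/R run
    print(f"CPU {cpu:.2f} s, wall {wall} s, ratio {cpu / wall:.1f}")
```

The ratio of CPU time to wall-clock time (about 25.7 here) is roughly the average number of cores kept busy over the run.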
4
Query 88: Tez
Map 1: 1/1 Map 11: 1/1 Map 12: 1/1 Map 13: 1/1 Map 14: 1/1 Map 15: 1/1 Map 16: 241/241 Map 18: 1/1 Map 19: 1/1 Map 2: 1/1 Map 20: 1/1 Map 21: 1/1 Map 22: 1/1 Map 23: 1/1 Map 24: 241/241 Map 26: 1/1 Map 27: 1/1 Map 28: 1/1 Map 29: 1/1 Map 3: 241/241 Map 30: 240/240 Map 32: 241/241 Map 34: 1/1 Map 35: 1/1 Map 36: 1/1 Map 37: 1/1 Map 38: 1/1 Map 39: 241/241 Map 42: 1/1 Map 43: 1/1 Map 44: 240/240 Map 46: 241/241 Reducer 10: 1/1 Reducer 17: 1/1 Reducer 25: 1/1 Reducer 31: 1/1 Reducer 33: 1/1 Reducer 4: 1/1 Reducer 40: 1/1 Reducer 41: 1/1 Reducer 45: 1/1 Reducer 47: 1/1 Reducer 5: 1/1 Reducer 6: 1/1 Reducer 7: 1/1 Reducer 8: 1/1 Reducer 9: 1/1
Status: Finished successfully
OK
345617 687625 686131 1032842 1030364 606859 604232 692428
Time taken: 90.233 seconds, Fetched: 1 row(s)
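The progress line lists every vertex of the Tez DAG for Query 88 with its completed/total task counts. The small parser below is a sketch written against the line as it appears on the slide, not against any documented Hive output format. Applied to this line, it finds:

- 47 vertices (32 map, 15 reducer), compared with 29 separate jobs on M/R;
- 8 map vertices with 240 or 241 tasks each, which plausibly correspond to the 8 full table scans.

For the same query, wall-clock time drops from 403.28 s to 90.233 s (about 4.5x), and the result row is identical.

```python
import re
from collections import Counter

# Each entry looks like "<Map|Reducer> <vertex id>: <done>/<total>".
# Written against the line shown on the slide, not a documented format.
VERTEX_RE = re.compile(r"(Map|Reducer) (\d+): (\d+)/(\d+)")

STATUS = (
    "Map 1: 1/1 Map 11: 1/1 Map 12: 1/1 Map 13: 1/1 Map 14: 1/1 Map 15: 1/1 "
    "Map 16: 241/241 Map 18: 1/1 Map 19: 1/1 Map 2: 1/1 Map 20: 1/1 "
    "Map 21: 1/1 Map 22: 1/1 Map 23: 1/1 Map 24: 241/241 Map 26: 1/1 "
    "Map 27: 1/1 Map 28: 1/1 Map 29: 1/1 Map 3: 241/241 Map 30: 240/240 "
    "Map 32: 241/241 Map 34: 1/1 Map 35: 1/1 Map 36: 1/1 Map 37: 1/1 "
    "Map 38: 1/1 Map 39: 241/241 Map 42: 1/1 Map 43: 1/1 Map 44: 240/240 "
    "Map 46: 241/241 Reducer 10: 1/1 Reducer 17: 1/1 Reducer 25: 1/1 "
    "Reducer 31: 1/1 Reducer 33: 1/1 Reducer 4: 1/1 Reducer 40: 1/1 "
    "Reducer 41: 1/1 Reducer 45: 1/1 Reducer 47: 1/1 Reducer 5: 1/1 "
    "Reducer 6: 1/1 Reducer 7: 1/1 Reducer 8: 1/1 Reducer 9: 1/1"
)


def parse_vertices(status):
    """Return {vertex name: (kind, completed tasks, total tasks)}."""
    return {
        f"{kind} {vid}": (kind, int(done), int(total))
        for kind, vid, done, total in VERTEX_RE.findall(status)
    }


def summarize(vertices):
    """Count vertices by kind, total tasks, multi-task map vertices, completion."""
    kinds = Counter(kind for kind, _, _ in vertices.values())
    total_tasks = sum(total for _, _, total in vertices.values())
    large_maps = sorted(
        name for name, (kind, _, total) in vertices.items()
        if kind == "Map" and total > 1
    )
    all_done = all(done == total for _, done, total in vertices.values())
    return kinds, total_tasks, large_maps, all_done


if __name__ == "__main__":
    vertices = parse_vertices(STATUS)
    kinds, tasks, large_maps, done = summarize(vertices)
    print(f"{len(vertices)} vertices {dict(kinds)}, {tasks} tasks, finished={done}")
    print("multi-task map vertices:", large_maps)
```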
5
Status
Broadcast Join
  – Regular tasks filter/prep the side to broadcast
  – Hashtables are assembled in the join task
  – Can run in any vertex (not just map)
TezSessions (AM, FS, UGI, MetaStore)
  – Started with the CLI/HS2 session
  – Brings up the AM, connects to the metastore, etc.
  – Set up only once per session
Container reuse
  – Task launch is now cheap
  – Multiple waves and reuse within a session
  – Stragglers
Multiple inputs/outputs/TezProcessor
  – Can handle multiple scatter/gather + broadcast + 1-1 edges
  – Can handle multiple outputs for the multi-table insert case
  – No need for a single task with multiple operator pipelines
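In the broadcast join described above, regular tasks filter and prepare the small side. Every join task then receives that side, builds an in-memory hashtable, and streams its slice of the large side past it. The sketch below shows the idea with plain Python lists standing in for table partitions. The function names and sample rows (including the `ss_ticket` column) are hypothetical, and this is not Hive's implementation.

```python
from collections import defaultdict


def build_hashtable(small_rows, key):
    """Build phase: index the broadcast (small) side by its join key."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    return table


def probe(big_partition, hashtable, key):
    """Probe phase: stream one slice of the big side past the hashtable."""
    for row in big_partition:
        for match in hashtable.get(row[key], ()):
            yield {**row, **match}


def broadcast_join(big_partitions, small_rows, big_key, small_key, small_filter):
    # 1. A regular task filters/preps the side to broadcast.
    prepped = [r for r in small_rows if small_filter(r)]
    results = []
    # 2. Each partition stands in for one join task; every task gets the same
    #    prepped rows and assembles its own hashtable before probing.
    for partition in big_partitions:
        table = build_hashtable(prepped, small_key)
        results.extend(probe(partition, table, big_key))
    return results


# Hypothetical sample data (column names modeled on Query 88's join keys).
STORES = [
    {"s_store_sk": 1, "s_store_name": "ese"},
    {"s_store_sk": 2, "s_store_name": "able"},
]
SALES_PARTITIONS = [
    [{"ss_store_sk": 1, "ss_ticket": 101}, {"ss_store_sk": 2, "ss_ticket": 102}],
    [{"ss_store_sk": 1, "ss_ticket": 103}, {"ss_store_sk": 3, "ss_ticket": 104}],
]

if __name__ == "__main__":
    rows = broadcast_join(SALES_PARTITIONS, STORES, "ss_store_sk", "s_store_sk",
                          lambda r: r["s_store_name"] == "ese")
    print(rows)
```

Because the large side never has to be shuffled by key, the join can run in whatever vertex already holds those rows, which matches the "not just map" point above.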
6
Status
Localization
  – Works with hive-exec + UDFs
  – If desired: avoids re-localization of hive-exec
Split gen in AM/TezGroupedSplits/Caching
  – Splits generated according to headroom
  – Caching of NN connections
Statistics (not Tez specific)
  – Allows computing the number of tasks
  – Used for join conversion
  – Degrades with available stats
MetaStore improvements (not Tez specific)
  – Partition pruning is MUCH faster now
TezMiniMR
  – .q file tests for Tez
Explain plan
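Statistics let the planner estimate data sizes before anything runs. The sketch below shows the two uses listed above in simplified form:

- estimating a reducer count from an estimated input size;
- deciding whether a join side is small enough to broadcast.

The formulas and threshold values are illustrative assumptions, loosely modeled on Hive's `hive.exec.reducers.bytes.per.reducer`, `hive.exec.reducers.max` and `hive.auto.convert.join.noconditionaltask.size` settings. They are not Hive's exact logic or defaults.

```python
import math
from typing import Optional

# Illustrative values only (assumptions, not Hive defaults).
BYTES_PER_REDUCER = 256 * 1024 * 1024
MAX_REDUCERS = 1000
BROADCAST_THRESHOLD = 10 * 1024 * 1024


def estimate_reducers(est_bytes: Optional[int],
                      bytes_per_reducer: int = BYTES_PER_REDUCER,
                      max_reducers: int = MAX_REDUCERS,
                      fallback: int = 1) -> int:
    """Reducer count from an estimated input size; fixed fallback without stats."""
    if est_bytes is None:  # no statistics available: degrade to a fixed guess
        return fallback
    needed = math.ceil(est_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, needed))


def use_broadcast_join(small_side_bytes: Optional[int],
                       threshold: int = BROADCAST_THRESHOLD) -> bool:
    """Convert to a broadcast join only if stats show the side fits the threshold."""
    # Without stats we cannot show the side fits in memory, so keep the shuffle join.
    return small_side_bytes is not None and small_side_bytes <= threshold


if __name__ == "__main__":
    print(estimate_reducers(3 * BYTES_PER_REDUCER + 1))  # just over 3 reducers' worth
    print(use_broadcast_join(512 * 1024), use_broadcast_join(None))
```

Treating missing statistics conservatively (a fallback reducer count, no broadcast) is one possible reading of "degrades with available stats"; the slide does not say how Hive actually behaves.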
7
Current limitations
Not in phase I
  – RC merge task/analyze uses MR on Tez
  – UNION ALL not yet supported
  – SMB join not yet supported
In phase I
  – More testing + bug fixes!
  – Integrate with new annotated
  – Re-localization (Tez)
  – Tez release
8
Try Tez For Yourself
1: Download Hortonworks Sandbox 2.0: hortonworks.com/sandbox
2: Log in: root/hadoop
3: git clone https://github.com/t3rmin4t0r/tez-autobuild/
4: cd tez-autobuild; make dist install
5: /opt/hive/bin/hive
6: set hive.optimize.tez=true/false