Lesson 2: Getting Started


Presentation on theme: "Lesson 2: Getting Started"— Presentation transcript:

1 Lesson 2: Getting Started
Chapter 2B: How to use HDFS or S3 data

2 Lesson 2, Chapter 2B: How to use HDFS and S3 Data
In this chapter, you will create a Flow from:
- HDFS
- S3

A datasource is a reference to a set of data that has been imported into the system. The application does not modify the source, and one datasource can be used in multiple datasets. When you use Trifacta to wrangle a source file, the original file stays unchanged, so you can reuse it, for example to prepare output in several different ways.

Datasources are created on the Datasources page, or when a new dataset is created. There are two ways to add a datasource to your Trifacta instance:
- Locate and select a file in HDFS (the Hadoop Distributed File System) using the file browser.
- Upload a local file from your machine. Local files are limited to 1 GB.

Several file formats are supported:
- CSV
- LOG
- JSON
- AVRO
- EXCEL (if you upload an Excel file with multiple worksheets, each worksheet is imported as a separate source)

Trifacta. Confidential & Proprietary.
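The local-upload constraints described on this slide (a 1 GB size limit and a fixed set of formats) can be checked before you try an upload. The following is only a rough sketch and is not part of Trifacta: the function name `check_local_upload`, the reading of "1 GB" as 1024**3 bytes, and the mapping from file extensions to formats are all assumptions made for illustration.

```python
import os

# Assumption: the slide's "1 GB" limit is read here as 1024**3 bytes.
MAX_LOCAL_UPLOAD_BYTES = 1024 ** 3

# Assumption: illustrative mapping from file extensions to the formats
# named on the slide (CSV, LOG, JSON, AVRO, EXCEL).
SUPPORTED_EXTENSIONS = {
    ".csv": "CSV",
    ".log": "LOG",
    ".json": "JSON",
    ".avro": "AVRO",
    ".xls": "EXCEL",
    ".xlsx": "EXCEL",
}


def check_local_upload(path, size_bytes=None):
    """Hypothetical pre-upload check; returns (ok, detail).

    detail is the format name when ok is True, otherwise a reason string.
    size_bytes can be passed explicitly; if omitted, the size is read from disk.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return False, "unsupported file extension: " + (ext or "(none)")
    if size_bytes is None:
        size_bytes = os.path.getsize(path)
    if size_bytes > MAX_LOCAL_UPLOAD_BYTES:
        return False, "file exceeds the 1 GB local upload limit"
    return True, SUPPORTED_EXTENSIONS[ext]
```

For example, `check_local_upload("sales.csv", size_bytes=2048)` would pass under these assumptions, while a JSON file larger than the limit would be rejected with a reason string. Files that fail this check because of size are candidates for the HDFS route instead.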

