"Data sources index": a web application to list projects in Hadoop
Luca Menichetti


1 "Data sources index": a web application to list projects in Hadoop
Luca Menichetti

2 Scope, Problem
One goal of the AWG: to collect data (in our Hadoop clusters) coming from different IT service projects, allowing easy and fast analysis.
TWiki page: contains an inventory of all data sources and their available metrics.
TWiki page limitations:
◦ It is a static list (no ETL updates from the origin sources)
◦ It focuses on the original metric descriptions written by the provider (no later manipulations or alternative formats)
◦ It offers no defined APIs

3 Purpose
To offer a web application where users can:
◦ browse all collected data sources and see what is currently available for each project,
◦ monitor the daily ETL state from the origin source to the cluster,
◦ list other formats besides the main one,
◦ share new derived datasets (e.g. created from a join with other datasets or enriched with external information),
◦ access the index through public APIs
…without replacing the TWiki, which remains the main documentation reference for each project.

4 How
A Java web application with a REST API implementing CRUD logic.
Running on: OpenStack, win.medium size
Container: Tomcat
Database: MongoDB (Morphia for object mapping)
Content type standard: Collection+JSON
Web interface: Bootstrap
Web homepage: http://awg-virtual/data-sources-index
Source code: Gitlab project

5 REST model
For each data source in the TWiki, there are one or more “Projects” stored in the web index application.
A Project may have many “Formats”
◦ CSV, Parquet, Avro, …
A Format may have a list of “Entries”
◦ Representing single imports
“Notes” can be attached to a Format and optionally to an Entry
◦ General purpose messages
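The hierarchy described on this slide can be sketched as plain Python data classes. This is only an illustration of the relationships (Project, Format, Entry, Note); the actual application is written in Java with Morphia, and every field name other than `project_name` and `format_name` (which appear in the API examples later in the deck) is a hypothetical placeholder.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Note:
    """General-purpose message attached to a Format or an Entry."""
    text: str  # hypothetical field name


@dataclass
class Entry:
    """A single import of data into the cluster."""
    label: str  # hypothetical field name identifying the import
    notes: List[Note] = field(default_factory=list)


@dataclass
class Format:
    """One storage format of a project (CSV, Parquet, Avro, ...)."""
    format_name: str
    entries: List[Entry] = field(default_factory=list)
    notes: List[Note] = field(default_factory=list)


@dataclass
class Project:
    """A data source registered in the index."""
    project_name: str
    formats: List[Format] = field(default_factory=list)


# Build a small tree: one project, one format, one entry, one note.
project = Project("jm-atlas")
avro = Format("jm-atlas-avro")
avro.entries.append(Entry("example-import"))
avro.notes.append(Note("example note on the format"))
project.formats.append(avro)
```

Each level owns a list of its children, which mirrors the nested REST paths (a project's formats are listed under the project).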

6 Web interface and API

7 Example – landb (1)
[Screenshots: TWiki, Data Sources Index]

8 Example – landb (2)
Compact view for fast visualization of all formats and their schemas
Bookmark the web page for a quick reference
Not meant to replace the TWiki, where all the metrics are described
Every resource is accessible through its REST path
No “Entries” are available for these Formats (clicking on the links produces an empty result)

9 Example – experiment job monitoring (1)
Links: TWiki, Data Sources Index

10 Example – experiment job monitoring (2)
[Screenshots: listing formats, listing entries]

11 Example – EOS logs (1)
[Screenshots: listing formats, listing notes]

12 Example – REST API (1)
$ cat templates/jm-atlas_CJ_template.json
{ "template": {
    "data": [
      { "name": "project_name", "value": "jm-atlas" },
      { "name": "full_name", "value": "Experiment Dashboard Job Monitoring Atlas" },
      { "name": "description", "value": "the job monitoring logs of all executions submitted by Atlas in.. },
      ...
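A template like the one on this slide can also be generated rather than written by hand. The sketch below wraps a flat dictionary in the Collection+JSON `template` structure; the helper name `to_cj_template` is hypothetical, and only the two fields shown in full on the slide are used (the description is truncated there and is left out).

```python
import json


def to_cj_template(fields):
    """Wrap a flat dict as a Collection+JSON write template."""
    data = [{"name": name, "value": value} for name, value in fields.items()]
    return {"template": {"data": data}}


template = to_cj_template({
    "project_name": "jm-atlas",
    "full_name": "Experiment Dashboard Job Monitoring Atlas",
})

# Serialized form, suitable for saving as a *_CJ_template.json file.
print(json.dumps(template, indent=2))
```

Whether the service accepts a template with only these two fields is not stated in the slides; the set of required fields should be checked against the project documentation.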

13 Example – REST API (2)
# Create a project
curl -X POST -d @templates/jm-atlas_CJ_template.json -H "Content-Type: application/vnd.collection+json" awg-virtual/data-sources-index/rest/projects/
# Retrieve
curl awg-virtual/data-sources-index/rest/projects/jm-atlas
# Delete
curl -X DELETE awg-virtual/data-sources-index/rest/projects/jm-atlas
# Create a format
curl -X POST -d @templates/jm-atlas-Avro_CJ_template.json -H "Content-Type: application/vnd.collection+json" awg-virtual/data-sources-index/rest/projects/jm-atlas/formats
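The same "create a project" call can be prepared from Python with the standard library. This is a minimal sketch, not one of the project's own scripts: the helper name `build_create_request` is hypothetical, the base URL is the one quoted in the slides (reachable only inside CERN), and the request is built but deliberately not sent.

```python
import json
import urllib.request

# Base URL taken from the slides; not visible outside CERN.
BASE_URL = "http://awg-virtual/data-sources-index/rest"
CJ_TYPE = "application/vnd.collection+json"


def build_create_request(path, template):
    """Prepare a POST carrying a Collection+JSON template (does not send it)."""
    body = json.dumps(template).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": CJ_TYPE},
        method="POST",
    )


# Assumed minimal payload; the real template carries more fields.
payload = {"template": {"data": [{"name": "project_name", "value": "jm-atlas"}]}}
req = build_create_request("/projects/", payload)

# To actually send it from inside the CERN network:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status)
```

Building the request separately from sending it keeps the example runnable anywhere and makes the method, URL and content type easy to inspect.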

14 Example – REST API (3)
$ curl -v awg-virtual/data-sources-index/rest/projects/jm-atlas/formats
> GET /data-sources-index/rest/projects/jm-atlas/formats HTTP/1.1
< HTTP/1.1 200 OK
< Content-Type: application/vnd.collection+json
{ "collection": {
    "version": "1.0",
    "href": "/projects/jm-atlas/formats",
    "items": [
      { "href": "/formats/55b72f41080d827ec79968f9",
        "data": [
          { "name": "ID", "prompt": "the internal ID given by morphia to store the object in mongodb", "value": "55b72f41080d827ec79968f9" },
          { "name": "format_name", "prompt": "the short name for the project, commonly used in the HDFS path and as parameter for awgrepotool", "value": "jm-atlas-avro" },
          …
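A Collection+JSON response such as the one on this slide is verbose to read directly, so a client will typically flatten each item's `data` array into a plain dictionary. The sketch below does this for a hand-written sample that contains only the two fields visible on the slide (prompts omitted); the helper name `items_to_dicts` is hypothetical.

```python
import json


def items_to_dicts(document):
    """Flatten Collection+JSON items into dicts keyed by each data 'name'."""
    items = document["collection"].get("items", [])
    result = []
    for item in items:
        flat = {"href": item["href"]}
        for datum in item.get("data", []):
            flat[datum["name"]] = datum.get("value")
        result.append(flat)
    return result


# Sample reduced to the fields shown on the slide.
sample = json.loads("""
{ "collection": {
    "version": "1.0",
    "href": "/projects/jm-atlas/formats",
    "items": [
      { "href": "/formats/55b72f41080d827ec79968f9",
        "data": [
          { "name": "ID", "value": "55b72f41080d827ec79968f9" },
          { "name": "format_name", "value": "jm-atlas-avro" }
        ] }
    ] } }
""")

formats = items_to_dicts(sample)
print(formats[0]["format_name"])  # prints "jm-atlas-avro"
```

Keeping `href` in the flattened dictionary preserves the link to the individual resource, which is how a client would follow up on a single format.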

15 References
Web application: http://awg-virtual/data-sources-index
◦ Not visible outside CERN
REST API: http://awg-virtual/data-sources-index/rest
Gitlab project / Wiki
◦ Visible to all CERN members
◦ Only restricted users can modify the project
Documentation and examples
◦ Detailed description of the REST APIs
◦ Python scripts, ready to use
Contact: lmeniche@cern.ch
◦ or open an issue

16 Questions or suggestions?
Thank you

