Published by Penelope Kern. Modified over 9 years ago.
1
DataExplorerPush Operator
InfoSphere Streams Version 3.0
Manasa K Rao
Toolkits
© 2012 IBM Corporation
2
Important Disclaimer
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF: CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
The information on the new product is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information on the new product is for informational purposes only and may not be incorporated into any contract. The information on the new product is not a commitment, promise, or legal obligation to deliver any material, code or functionality. The development, release, and timing of any features or functionality described for our products remains at our sole discretion.
3
Agenda
Overview
Architecture Diagram
Use Cases
Overview of InfoSphere Data Explorer
Terminologies and Concepts
Software Prerequisites
Using the DataExplorerPush operator
Update scenario
Using the optional error output port
Metrics
4
Overview
The DataExplorerPush operator is a Java primitive operator added to the existing BigData toolkit.
It is a Streams sink adapter that pushes data into the IBM InfoSphere Data Explorer infrastructure.
It is in the namespace com.ibm.streams.bigdata.dataexplorer.
It has one non-windowed input port and an optional error output port.
It supports sending data of the types int8, int16, int32, int64, uint8, uint16, uint32, float32, float64, timestamp, rstring, and ustring.
5
Architecture Diagram
6
Use Cases
Consider a large sports equipment manufacturing firm. In addition to the multiple data sources that already exist within the firm, social media data is an indispensable source of information that can give indications of user experiences and sentiments regarding its products.
The social media data can be fed into Streams and sent to InfoSphere Data Explorer using the DataExplorerPush operator.
This data, together with the existing enterprise data and the knowledge gained from analyzing it, can be used to quickly discover positive and negative trends, the causes of the negative trends, and leader-follower patterns, and to act on this valuable information in time.
7
Overview of Data Explorer
InfoSphere® Data Explorer V8.2 can help organizations discover, navigate, and visualize vast amounts of structured and unstructured information across many enterprise systems and data repositories.
Some of the benefits that InfoSphere Data Explorer offers:
– Unlocks the value of big data by enabling organizations to quickly navigate large volumes of content to discover high-value sources.
– Creates applications that combine structured, semistructured, and unstructured information in a single interface, enabling organizations to create a complete contextual view of topics such as customers, products, employees, projects, and more.
– Delivers a new application framework component that changes the information access paradigm by proactively pushing relevant information to each user based on their activities and business context.
– Empowers organizations to cost-effectively build 360-degree information applications to improve efficiency and solve information-intensive business challenges.
8
Terminologies and Concepts
BigSearch API – A set of APIs that lets the API user add or modify records in a Data Explorer index while hiding the complexity of the operation. Internally it uses the IBM InfoSphere Data Explorer API.
Connection document – A text file containing the information needed to connect to Data Explorer. It has the form:
    zookeeperNamespace=
    zookeeperEndpoints=
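The key=value layout of the connection document can be read with the standard java.util.Properties class. The following is only a minimal sketch of such a reader, not the way the operator or the BigSearch API actually loads the file. The class name ConnectionDocument and the host name host.example.com are hypothetical and chosen for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper (not part of the toolkit): holds the two keys
// that the connection document is described as containing.
class ConnectionDocument {
    final String zookeeperNamespace;
    final String zookeeperEndpoints;

    ConnectionDocument(String zookeeperNamespace, String zookeeperEndpoints) {
        this.zookeeperNamespace = zookeeperNamespace;
        this.zookeeperEndpoints = zookeeperEndpoints;
    }

    // Parses "key = value" lines; whitespace around '=' is tolerated by Properties.
    static ConnectionDocument parse(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        return new ConnectionDocument(
                required(props, "zookeeperNamespace"),
                required(props, "zookeeperEndpoints"));
    }

    // Rejects missing or blank values so that a bad file fails early.
    private static String required(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing value for " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample content; the endpoint is a placeholder host name.
        String text = "zookeeperNamespace = Test\n"
                    + "zookeeperEndpoints = host.example.com\n";
        ConnectionDocument doc = parse(new StringReader(text));
        System.out.println(doc.zookeeperNamespace + " @ " + doc.zookeeperEndpoints);
    }
}
```

Failing on a blank value is a design choice of this sketch: a connection document with an empty key is more likely a mistake than an intended default.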
9
Software Prerequisites
The BigSearch API is required for using the DataExplorerPush operator.
The BigSearch API and its dependencies must be in a location accessible to the DataExplorerPush operator.
The environment variable BIGSEARCH_JAR must be set to the path of the BigSearch API jar.
For example, if the BigSearch API jar file is named bigsearch1.jar and is located in /opt/DataExplorer/lib, then BIGSEARCH_JAR is set as follows:
    export BIGSEARCH_JAR='/opt/DataExplorer/lib/bigsearch1.jar'
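Before launching an application, it can be useful to confirm that BIGSEARCH_JAR is set and points to a readable jar file. The check below is a sketch under that assumption. The class BigSearchJarCheck and its messages are hypothetical, and the operator itself may validate the variable differently.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check; not part of the toolkit.
class BigSearchJarCheck {

    // Returns null when the value looks usable, otherwise a short description of the problem.
    static String problemWith(String value) {
        if (value == null || value.trim().isEmpty()) {
            return "BIGSEARCH_JAR is not set";
        }
        Path jar = Paths.get(value.trim());
        if (!Files.isRegularFile(jar)) {
            return "BIGSEARCH_JAR does not point to a file: " + jar;
        }
        if (!Files.isReadable(jar)) {
            return "BIGSEARCH_JAR points to an unreadable file: " + jar;
        }
        if (!jar.getFileName().toString().endsWith(".jar")) {
            return "BIGSEARCH_JAR does not point to a .jar file: " + jar;
        }
        return null;
    }

    public static void main(String[] args) {
        // Reads the variable from the environment of the current process.
        String problem = problemWith(System.getenv("BIGSEARCH_JAR"));
        System.out.println(problem == null ? "BIGSEARCH_JAR looks usable" : problem);
    }
}
```

The check only inspects the jar named by the variable. It does not verify that the dependencies of the BigSearch API are also reachable.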
10
Using the DataExplorerPush operator

    namespace application;

    use com.ibm.streams.bigdata.dataexplorer::DataExplorerPush;

    composite DataExplorerPushMain {
      graph
        // Schema elided here; the tuple has attributes a, b, c, d, and e,
        // one per column of Tweet.txt.
        stream<...> InStream = FileSource() {
          param
            file : "Tweet.txt";
        }

        // Sink: pushes each input tuple to Data Explorer as a record of type "Tweet".
        () as Sink1 = DataExplorerPush(InStream) {
          param
            connectionDocument : "/home/streamsuser/connections/DataExplorerConnection.txt";
            recordType : "Tweet";
            recordIdAttribute : "c";
            retrievableAttributes : "a", "b", "c", "d", "e";
            sortableAttributes : "b", "e";
            filterableAttributes : "a";
            nonSearchableAttributes : "a";
            suppress : "d";
        }
    }

Contents of DataExplorerConnection.txt

    zookeeperNamespace = Test
    zookeeperEndpoints = xxxxxxxxx.ibm.com

Contents of Tweet.txt

    "Text1",11,1,11.1,"ai\u00f1ata"
    "Text2",22,2,22.2,"bi\u00f1ata"
    "Text3",33,3,33.3,"ci\u00f1ata"
    "Text4",44,4,44.4,"di\u00f1ata"
    "Text5",55,5,55.5,"ei\u00f1ata"
11
Using the DataExplorerPush operator (cont'd)
12
Using the DataExplorerPush operator (cont'd)
Attribute 'a' is nonSearchable.
Consider the record: "Text2",22,2,22.2,"bi\u00f1ata"
Search using the value of 'b', i.e. 22, yields:
Search using the value of 'a', i.e. Text2, yields:
13
Update scenario
If a record with the same recordId as the current record already exists in the collection and is of the same record type, an update is performed on that record.

Contents of Tweet.txt

    "Text1Changed",11,1,11.1,"ai\u00f1ata"
    "Text2",22,2,22.2,"bi\u00f1ata"
    "Text3",33,3,33.3,"ci\u00f1ata"
    "Text4",44,4,44.4,"di\u00f1ata"
    "Text5",55,5,55.5,"ei\u00f1ata"
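The update rule can be modeled as an upsert keyed by record type and record ID. The sketch below uses an in-memory map purely to illustrate that rule. It is not how Data Explorer stores or indexes records, and the class UpsertSketch and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the update rule; not the Data Explorer index.
class UpsertSketch {
    // recordType -> (recordId -> payload)
    private final Map<String, Map<String, String>> collection = new HashMap<>();

    // Returns true if an existing record of the same type and ID was updated,
    // false if a new record was added.
    boolean push(String recordType, String recordId, String payload) {
        Map<String, String> byId =
                collection.computeIfAbsent(recordType, t -> new HashMap<>());
        return byId.put(recordId, payload) != null;
    }

    // Returns the stored payload, or null when no such record exists.
    String get(String recordType, String recordId) {
        Map<String, String> byId = collection.get(recordType);
        return byId == null ? null : byId.get(recordId);
    }

    public static void main(String[] args) {
        UpsertSketch index = new UpsertSketch();
        // First run: the record with ID "1" is new.
        System.out.println("updated=" + index.push("Tweet", "1", "Text1"));
        // Second run with changed text: same type and ID, so it is an update.
        System.out.println("updated=" + index.push("Tweet", "1", "Text1Changed"));
        System.out.println(index.get("Tweet", "1"));
    }
}
```

Keying on both values matters: in this model a record with the same ID but a different record type is treated as a separate record, which matches the "same record type" condition stated above.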
14
Using the optional error output port

    namespace application;

    use com.ibm.streams.bigdata.dataexplorer::DataExplorerPush;

    composite DataExplorerPushMain {
      graph
        // Schema elided here; the tuple has attributes a, b, c, d, and e,
        // one per column of Tweet.txt.
        stream<...> InStream = FileSource() {
          param
            file : "Tweet.txt";
        }

        // The type of inTuple and the name of the error stream are elided here.
        stream<... inTuple, rstring recordId, rstring errorMsg, rstring collectionName, rstring recordType> ... = DataExplorerPush(InStream) {
          param
            connectionDocument : "/home/streamsuser/connections/DataExplorerConnection.txt";
            recordType : "Tweet";
            recordIdAttribute : "c";
            retrievableAttributes : "a", "b", "c", "d", "e";
            sortableAttributes : "b", "e";
            filterableAttributes : "a";
            nonSearchableAttributes : "a";
            suppress : "d";
        }
    }

Tuples emitted on the error output port:

    {a="Text1Changed",e="aiñata"},"1","com.ibm.dataexplorer.bigsearch.IndexerException: com.ibm.dataexplorer.bigsearch.IndexerException: Failure.","induceerror","Tweet"
    {a="Text2",e="biñata"},"2","com.ibm.dataexplorer.bigsearch.IndexerException: com.ibm.dataexplorer.bigsearch.IndexerException: Failure.","induceerror","Tweet"
    {a="Text3",e="ciñata"},"3","com.ibm.dataexplorer.bigsearch.IndexerException: com.ibm.dataexplorer.bigsearch.IndexerException: Failure.","induceerror","Tweet"
    {a="Text4",e="diñata"},"4","com.ibm.dataexplorer.bigsearch.IndexerException: com.ibm.dataexplorer.bigsearch.IndexerException: Failure.","induceerror","Tweet"
    {a="Text5",e="eiñata"},"5","com.ibm.dataexplorer.bigsearch.IndexerException: com.ibm.dataexplorer.bigsearch.IndexerException: Failure.","induceerror","Tweet"
15
Metrics
Four metrics are supported: nRecordsPushed, nRequestsOutstanding, nRecordsFailed, and nRecordsWithNonIndexableFields.
16
Thank You
17
Backup Slides
18
Zookeeper Namespace
To create a ZooKeeper namespace, run the following in the BigSearch lib:
    java -jar xxx.jar -n -s -i
Entity model file: This file specifies the Velocity instance or instances for which this ZooKeeper namespace is being configured, the collection name or names that the data must go to, and the entity type or types of the data being sent.