High Performance Computing Cluster (HPCC) Mary Galvin Managing Principal, American Innovations Consulting https://www.linkedin.com/pub/mary-galvin/15/340/397.

Presentation transcript:

High Performance Computing Cluster (HPCC) Mary Galvin Managing Principal, American Innovations Consulting

Big Data at LexisNexis
US Public Records
- 50 billion records
- 10k+ data sources
- mil. unique identities
- billion unique businesses
Patent Data
- patenting authorities
- Translations for all non-English content
- On average, sources go back roughly 30 years (some go back 100+ years)
Case Law
- 20 million+ court records from federal, state and local governments
- Non-US countries include France, Australia, Hong Kong, Canada, and the UK
News Articles
- 20k+ sources, including traditional print (newspapers, magazines, trade journals, etc.) and “new” media (i.e., blogs, Twitter feeds, audio & video transcripts)

History of the HPCC
Timeline: Late 90s/Early 2000s to 2012
- Designed and developed from the ground up to meet LexisNexis’ internal big data needs.
- Google’s MapReduce paper is published.
- The United States government sought to bring LexisNexis’ data capabilities in-house for its internal data mining needs.
- First release of Hadoop becomes available (designed after the MapReduce papers).
- The idea of releasing the HPCC to the OSS community is presented to LexisNexis corporate management.
- The HPCC is officially released to the open source community.
- The spread of HPCC users has gone global, and as a result, innovation ignites.

HPCC Architectural Overview

ECL Overview
Task: Produce a set of records wherein a particular field contains a specific set of values
Typical approach for solving this in many programming languages
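
A minimal sketch of what such a typical approach might look like, written here in Python. The record layout, the `state` field, the sample data, and the function name `filter_records` are all assumptions chosen for illustration and are not taken from the slides. The point is only the shape of the solution: an explicit loop that tests each record and collects the matches.

```python
def filter_records(records, field, allowed_values):
    """Return the records whose `field` value is one of `allowed_values`.

    Hypothetical helper for illustration; records are plain dicts.
    """
    allowed = set(allowed_values)  # set gives fast membership tests
    matches = []
    for record in records:
        # Records that lack the field yield None, which never matches here.
        if record.get(field) in allowed:
            matches.append(record)
    return matches


# Made-up sample data, not from the presentation.
people = [
    {"name": "Ann", "state": "FL"},
    {"name": "Bob", "state": "NY"},
    {"name": "Cy", "state": "GA"},
]

selected = filter_records(people, "state", ["FL", "GA"])
```

Here `selected` holds the records for Ann and Cy, in their original order. The programmer spells out how to iterate and accumulate, which is the style the next slide contrasts with ECL.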

ECL Overview (cont’d)
Task: Produce a set of records wherein a particular field contains a specific set of values
Approach for solving this problem in ECL

HPCC Modules & Plugins
- Exploratory Data Analysis (EDA) Toolkit
- Scalable Automated Linking Technology (SALT)
- Data Ingest
- Data Profiling
- Data Hygiene
- Clustering
- Relationship Extraction
Other
- H2H Connector
- Machine Learning Module
- R Integration
- Eclipse IDE
- JDBC Driver
- ……..

HPCC Academic Program
Audience: Colleges and Universities
Benefits:
- Internship opportunities
- Invitation-only conferences
- Free training for qualifying projects
- Access to an external cluster, as available

Additional Learning Options
Online: Includes both prerequisites and tailored courses depending on role type (i.e., developers, analysts, and administrators)
In-Person:

Getting Started