Download presentation
Presentation is loading. Please wait.
1
Fabrizio Gagliardi EMEA & LATAM Director Technical Computing MSR External Research Microsoft Corporation
2
Advancement of Science Global Collaboration Technology Excellence Interoperability Putting computing into science… Applying Microsoft products and research technologies to advance the scientific research and engineering innovation process Putting science into computing… Ensuring that research community requirements are factored into future versions of Microsoft software
3
Current or Completed Projects o Cornell – arXiv.org + Word 2007 (and repository interoperability via SWORD) o MIT / Broad Institute – Authoring (Word 2007) + data for research reproducibility o MSR – CMT++ interoperability with data + metadata transfer/exchange (conference management tool enhancements) o LiveLabs – eJournal publishing online service (community publishing tool) o UC San Diego / PLoS – Semantic mark-up of scholarly articles (+ submission) o Chem4Word with Office & Cambridge University – Create add-in to Word 2007 to facilitate drawing of chemical compounds and equations o Johns Hopkins University – Digital Archive for Astronomy/Astrophysics data (storage, preservation and access) o Planets Project / EU (with MSR – Cambridge) OpenXML and file format preservation + interoperability o eChemistry Project (Cornell, Penn State, Indiana, Cambridge, Southampton) – ORE exemplar: access to compound chemical info objects (cross-repository access to open chemistry data) o British Library – Researcher Information Centre (RIC) online workflow tool for scientists and researchers o Creative Commons Add-in for Office 2007 – evolving the Word 2003 effort o University of Southampton (UK) – Port ePrints Repository Software for installation on the Windows platform o University of Manchester / “MyExperiment” Project – social networking for scientists o ORE Acceleration Project (OAI – Object Reuse & Exchange) – Alpha spec development o Indiana University – Toolbox for Social Networking (SRT) o UK National Archives – Virtual PC / Emulation of legacy systems to facilitate preservation o National Library of Medicine / NCBI – “PubMed Int’l” UK version of PubMed + NLM DTD Pipeline o DRIVER 2 (EU) – Infrastructure integration of across a network of European research repositories
4
Goals A platform for building services and tools for research output repositories Papers, Videos, Presentations, Lectures, References, Data, Code, etc. Relationships between stored entities Enable a tools and services ecosystem for “research output” repositories on MS technologies Execution Utilizing OAI-ORE, SWORD, and other community protocols In development, deployment within MSR in early Q4 Beta release to the community in late Q4 Built on SQL Server 2008 + Entity Framework Using WPF and Silverlight for UI Research output repository platform UIs Desktop Tools SyndicationInterop Search
5
Goals Create a platform for building “research output” repositories Engage with the digital library and scholarly communications community Become the “research output” repository for MSR (RMCr project) – Papers, Videos, Presentations, Lectures, References, Data, Code, etc. Support an ecosystem of services and tools Available to the community for free (we are still considering the open source route) Build an easy-to-install collection of basic services and tools Non-goals A generic platform for asset management Support the lifecycle of publications Compete with existing repository solutions
6
Researchers manage their personal research entities (data, citations, documents, workflows, etc.) Entities + Relationships can be synched to cloud storage so that they are: - Always Available - Sharable - Mixable - Harvestable Support of harvesting & federation to/from Institutional Repositories - arXiv.org - DSpace - ePrints - Fedora - etc.
7
Limit Tech Preview release due June 2008 Public Beta targeted for Aug/Sept 2008 For more details – Contact: Alex Wade (Program Manager) / alex.wade@microsoft.comalex.wade@microsoft.com – Community Forum: http://community.research.microsoft.com/forums/90.aspx
8
The cyberinfrastructure for the next generation of researchers
9
Expect scientific research environments will follow similar trends to the commercial sector – Leverage computing and data storage in the cloud – Scientists already experimenting with Amazon S3 and EC2 services, with mixed results; For many of the same reasons – Siloed research teams, no resource sharing across labs – High storage costs – Low resource utilization – Excess capacity – High costs of reliably keeping machines up-to-date – Little support for developers, system operators 9
10
Collective intelligence – If last.fm can recommend what song to broadcast to me based on what my friends are listening to, why cannot the cyberinfrastructure of the future recommend articles of potential interest based on what the experts in the field that I respect are reading? – Already examples emerging but the process is manual (Connotea, BioMedCentral Faculty of 1000...) Automatic correlation of scientific data Smart composition of services and functionality Cloud computing to aggregate, process, analyze and visualize data
11
Important/key considerations – Formats or “well-known” representations of data/information – Pervasive access protocols are key (e.g. HTTP) – Data/information is uniquely identified (e.g. URIs) – Links/associations between data/information Data/information is inter- connected through machine- interpretable information (e.g. paper X is about star Y) Social networks are a special case of ‘data networks’ Attribution: Richard CyganiakRichard Cyganiak
12
scholarly communications domain-specific services instant messaging identity document store blogs & social networking mail notification search books citations search books citations visualization and analysis services storage/data services compute services virtualization compute services virtualization Project management Reference management knowledge management knowledge discovery Vision of Future Research Environment with both Software + Services
14
Thousand years ago – Experimental Science – Description of natural phenomena Last few hundred years – Theoretical Science – Newton’s Laws, Maxwell’s Equations… Last few decades – Computational Science – Simulation of complex phenomena Today – eScience or Data-centric Science – Unify theory, experiment, and simulation – Using data exploration and data mining Data captured by instruments Data generated by simulations Data generated by sensor networks – Scientists overwhelmed with data – Computer Science and IT companies have technologies that will help (With thanks to Jim Gray)
15
Web users... Generate content on the Web – Blogs, wikis, podcasts, videocasts, etc. Form communities – Social networks, virtual worlds Interact, collaborate, share – Instant messaging, web forums, content sites Consume information and services – Search, annotate, syndicate Scientists... Annotate, share, discover data – Custom, standalone tools Conferences, Journals – Publication process is long, subscriptions, discoverability issues Collaborate on projects, exchange ideas – Email, F2F meetings, video- conferences Use workflow tools to compose services – Domain-specific services/tools
16
http://ecrystals.chem.soton.ac.uk Thanks to Jeremy Frey
17
SensorMap Functionality: Map navigation Data: sensor-generated temperature, video camera feed, traffic feeds, etc. Taverna Workflow Compose services from the Web
18
With thanks to Catharine van Ingen
19
Sloan Digital Sky Server/SkyServer http://cas.sdss.org/dr5/en/
20
storingcomputing managingindexing huge amounts of data For example, Google and Microsoft both have copies of the Web for indexing purposes Computers are great tools for
21
acquisitiondiscovery aggregationorganization correlationanalysis interpretationinference We would like computers to also help with the automatic of the world’s information storingcomputing managingindexing huge amounts of data Computers will still be great tools for
23
Set of concepts and technologies – Data modeling – Relationships – Ontologies – Machine learning (entity extraction) – Inference, reasoning – Data, information, knowledge… DataInformationKnowledgeIntelligenceWisdom Current technologies Possibilities for innovation
24
Term used to refer to the concept of “meaning” The linguistics, AI, Natural Language Processing, etc. communities have been working on “meaning” and ”knowledge” related technologies for decades Pragmatic approach to Semantic Computing – Emergence of a new breed of technologies to capture meaning (RDF, OWL, etc.) – Combine with the pervasiveness of the Web community technologies such as folksonomies …
25
The term is used to describe a set of technologies used to represent data, concepts, and their relationships – Become a buzzword like Web 2.0 Prefer to use the term “Semantic Computing” which is about modeling data in ways that can be automatically processed by computers
26
Some efforts are driven by the traditional “knowledge engineering” community – Engaged in building well-controlled ontologies – Important for domain-specific vocabularies with data formats and relationships specific to a community – Model does not easily scale to the Internet Some efforts are driven by the Web 2.0 community – Focus on the pervasiveness of Web protocols/standards – Emphasis on microformats (small, flexible, embeddable structures) – Exploit evolving and ever-expanding vocabularies such as folksonomies and tag clouds
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.