Data Science: Challenges and Directions
A paper by Longbing Cao
Presented by Nathan Navarro Griffin
The Paper
- The Data
- The Present
- The Future
What Is Data Science?
- The intersection of statistics, informatics, computing, communication, sociology, and management, given data, domain, and thinking
- Goal: translating data into insight and intelligence
- Complex systems
  - X-complexities
  - X-intelligence
The Data
X-Complexities
- Complex relationships encoded in data, often hidden
- Some of the complexities involved:
  - Behavior of subjects
  - Domain (expert knowledge, policies, …)
  - Social dynamics
  - Learning (for humans)
X-Intelligence
- Refers to comprehensive and valuable information
- Informs the underlying challenges:
  - Complex systems
  - X-complexities
- Some of the intelligences involved:
  - Behavior intelligence
  - Domain intelligence
  - Human intelligence
  - Social intelligence
The Present
Understanding Known-to-Unknown
- Modern data science seeks to find hidden “CKI”:
  - Complexities
  - Knowledge
  - Intelligence
- Modern data science focuses on the case where “we do not know what we do not know”
- Tackles CKI invisibility
Modern Data Science
- Challenges lie in areas that are not currently well managed:
  - Data/business understanding
  - Mathematical and statistical foundations
  - Data quality and social issues
  - Data value, impact, and utility
  - Data-to-decision and action-taking challenges
- Mine/model data to better understand these issues
The IID Assumption
- Relatively narrow assumptions are made
- IID (independent and identically distributed) treats data as simple, even though it is not
  - Assumes strongly structured data
  - Often ignores many X-complexities: influences in data that affect linkage, co-occurrence, correlation, etc.
- Needs more research to resolve
  - Deeper understanding of non-IID characteristics
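To make the cost of the IID assumption concrete, here is a minimal sketch (not from Cao's paper) that assumes the coupling takes its simplest form: serial dependence, where each observation depends on the previous one (an AR(1) process). The helper names `ar1_series`, `lag1_autocorrelation`, and `standard_errors` are hypothetical. Treating such coupled data as IID makes the standard error of the mean look much smaller than it really is.

```python
import math
import random
import statistics


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation: near 0 for IID data, far from 0 when neighbors are coupled."""
    mean = statistics.fmean(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


def ar1_series(n, phi, rng):
    """Hypothetical coupled data: each value is phi times the previous value plus Gaussian noise."""
    xs = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        xs.append(phi * xs[-1] + rng.gauss(0.0, 1.0))
    return xs


def standard_errors(xs):
    """Return (naive IID standard error of the mean, AR(1)-adjusted standard error)."""
    n = len(xs)
    sd = statistics.stdev(xs)
    naive = sd / math.sqrt(n)
    rho = lag1_autocorrelation(xs)
    # Assumes AR(1)-style dependence: effective sample size is n * (1 - rho) / (1 + rho).
    n_eff = n * (1 - rho) / (1 + rho)
    return naive, sd / math.sqrt(n_eff)


if __name__ == "__main__":
    rng = random.Random(0)
    iid = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    coupled = ar1_series(2000, phi=0.8, rng=rng)
    print("lag-1 autocorrelation, IID:    ", round(lag1_autocorrelation(iid), 3))
    print("lag-1 autocorrelation, coupled:", round(lag1_autocorrelation(coupled), 3))
    print("naive vs adjusted SE, coupled: ", standard_errors(coupled))
```

With phi = 0.8 the adjusted standard error comes out at roughly three times the naive one, since (1 + 0.8) / (1 - 0.8) = 9 and the square root of 9 is 3; the exact printed values depend on the random seed.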
Data Characteristics
- Data characteristics include X-complexities and other factors (dimension, heterogeneity, uncertainty…)
- A need to understand data characteristics:
  1. How do we define data characteristics?
  2. How do we represent and model data characteristics?
  3. How do we achieve data-characteristics-driven data understanding?
  4. How do we evaluate the quality of that understanding?
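As a toy illustration of question 2 (representing data characteristics), the sketch below summarizes a few of the factors the slide names for a list of records. `profile_characteristics` is a hypothetical helper, not a method from the paper, and it makes strong simplifying assumptions: dimension is the number of distinct fields, heterogeneity is the number of distinct value types per field, and the missing-value rate stands in, crudely, for uncertainty.

```python
def profile_characteristics(records):
    """Hypothetical data-characteristics profile for a list of dict records.

    dimension     -> number of distinct fields seen across all records
    heterogeneity -> per field, number of distinct value types among non-missing values
    missing_rate  -> per field, share of records where the field is absent or None
                     (a crude stand-in for uncertainty, by assumption)
    """
    if not records:
        return {"dimension": 0, "heterogeneity": {}, "missing_rate": {}}
    fields = sorted({key for rec in records for key in rec})
    n = len(records)
    heterogeneity, missing_rate = {}, {}
    for field in fields:
        present = [rec[field] for rec in records if rec.get(field) is not None]
        heterogeneity[field] = len({type(value).__name__ for value in present})
        missing_rate[field] = (n - len(present)) / n
    return {"dimension": len(fields), "heterogeneity": heterogeneity, "missing_rate": missing_rate}


if __name__ == "__main__":
    records = [
        {"age": 34, "city": "Sydney"},
        {"age": "unknown", "city": "Perth"},  # mixed value types in "age"
        {"city": None},                       # "age" absent, "city" missing
    ]
    print(profile_characteristics(records))
```

For the three sample records this reports a dimension of 2, two value types in `age` (an int and a string), and one missing value out of three in each field.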
The Future
Human-like Intelligence
- The public likes to talk about it (e.g., AlphaGo)
- No system has yet exhibited human-like thought:
  - Creativity
  - Curiosity
- Models cannot yet upgrade their own intelligence
- Data analytical thinking
Building Complex Systems
- Need to synthesize X-intelligence in complex data from many fields
- Ubiquitous intelligence to understand data
- Qualitative-to-quantitative metasynthesis:
  - Iterative
  - Cognitive
  - Human-machine-cooperative
What to Take Away
- Challenges hidden in the data reflect the immaturity of the field:
  - X-intelligence
  - X-complexities
- Future work will require cross-disciplinary action
- The field's impact could be unprecedented, drawing parallels to the internet revolution
Questions?