Keynote Speaker

Tanveer Syeda-Mahmood, Ph.D.
IBM Fellow & Chief Scientist, Medical Sieve Radiology Grand Challenge
IBM Almaden Research Center

Dr. Tanveer Syeda-Mahmood is an IBM Fellow and the Chief Scientist and overall lead for the global Medical Sieve Radiology Grand Challenge project in IBM Research. As a worldwide expert in artificial intelligence for medical imaging and clinical decision support, she is leading the company's future in cognitive health and helping define new IBM products through her group's research in biomedical imaging, computer vision, deep learning, and knowledge and reasoning.
Dr. Syeda-Mahmood graduated with a Ph.D. from the MIT Artificial Intelligence Lab in 1993. Prior to coming to IBM, she led the image indexing program at Xerox Research and was one of the early originators of the field of content-based image and video retrieval. Over the past 30 years, her research interests have spanned a variety of areas in artificial intelligence, ranging from computer vision and image and video databases to recent applications in medical image analysis, healthcare informatics, and clinical decision support. She has over 250 refereed publications and over 100 filed patents. Dr. Syeda-Mahmood has chaired numerous conferences and workshops over the years at forums such as IEEE CVPR, ICCV, ACM, and MICCAI, including MICCAI 2016 (Industrial Chair), IEEE HISB 2011 (General Chair), and IEEE CVPR 2008 (Program Chair).
Dr. Syeda-Mahmood is a Fellow of IEEE and a member of the IBM Academy of Technology. She was named a Master Inventor in 2011. She is the recipient of key awards including the IBM Corporate Award (2015), the Best of IBM Award (2015, 2016), and several outstanding innovation awards. In 2016, she received the highest technical honor at IBM and was conferred the title of IBM Fellow.
Speech Title: Role of Deep Learning and Artificial Intelligence in Clinical Decision Support for Imaging
Abstract: With the advent of new machine learning techniques, the field of automated clinical decision support is poised for new growth. Previously, decision support systems were predominantly rule-based and built on fixed, pre-determined associations from clinical knowledge. The IBM AALIM system pioneered a new direction in evidence-based medicine using the concept of patient-data-driven learning, exploiting the consensus opinions of other physicians who have looked at similar patients. With the advent of deep learning methods, learning-based decision support can be combined with clinical knowledge-driven techniques to define the next generation of clinical decision support systems.
In this talk, I will discuss the role of deep learning techniques in decision support, with examples from radiology and cardiology imaging. I will also describe the IBM Medical Sieve Radiology Grand Challenge, a worldwide collaborative research effort across IBM research labs that is expanding patient-data-driven and knowledge-driven learning to define new clinical decision support systems for radiologists that will one day serve as their cognitive assistants.



Prof. Hong Shen (沈鸿教授)

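National Endowed Expert of China; awardee of the "Thousand Talents Plan" of the Organization Department of the CPC Central Committee; awardee of the "100 Talents" program of the Chinese Academy of Sciences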
Sun Yat-sen University, China

Hong Shen is a specially-appointed and endowed Professor at Sun Yat-sen University, China. He is also a tenured Professor of Computer Science at the University of Adelaide, Australia. He received the B.Eng. degree from Beijing University of Science and Technology, the M.Eng. degree from the University of Science and Technology of China, and the Ph.Lic. and Ph.D. degrees from Åbo Akademi University, Finland, all in Computer Science. He was Professor and Chair of the Computer Networks Laboratory at the Japan Advanced Institute of Science and Technology (JAIST) during 2001-2006, and Professor of Computer Science at Griffith University, Australia, where he taught for nine years from 1992. With main research interests in parallel and distributed computing, algorithms, data mining, privacy-preserving computing, and high-performance networks, he has led numerous research centres and projects in different countries. He has published more than 400 papers, including over 100 papers in international journals such as a variety of IEEE and ACM Transactions. Prof. Shen has received many honours and awards, including National Endowed Expert of China and “100 Talents” of the Chinese Academy of Sciences. He has served on the editorial boards of several major international journals and chaired numerous conferences.

Speech Title: Privacy-Preserving Big Data Computing in Clouds

Abstract: With the rapid increase in the popularity and variety of big data analytics in cloud environments, privacy disclosure has become a major concern that hinders the widespread adoption of big data analytics in clouds. Privacy-preserving computing (PPC) has been shown to be effective in addressing this concern. It achieves secure distributed computing on big data in clouds by trading off the utility (accuracy) of the data to be shared (published), either original or final (results). In this talk, I will first address the research challenges of privacy-preserving computing in the cloud computing environment and big data analytics, and then give an overview of the existing work based on data protection techniques and output security levels. I will then introduce our recent work on privacy-preserving statistical and set operations on anonymised and randomized data, structural information protection in hypergraph data publishing, privacy-preserving recommendation, and privacy-preserving clustering. Finally, I will conclude the talk by presenting some interesting open problems in privacy-preserving combinatorial optimization that my team is currently working on.
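
The privacy/utility trade-off mentioned in the abstract can be illustrated with classic randomized response. This is a minimal sketch of a textbook technique, not the speaker's method; the function names and the `p_truth` parameter are assumptions chosen for the example. Each participant reports the true answer only with probability `p_truth`, so no single report is reliable, yet the population rate can still be estimated, at the cost of extra variance.

```python
import random


def randomize(answer, p_truth, rng):
    """Report the true answer with probability p_truth, else a fair coin flip."""
    if rng.random() < p_truth:
        return answer
    return rng.random() < 0.5


def estimate_true_rate(reported, p_truth):
    """Invert the known noise: observed = p_truth * true + (1 - p_truth) / 2."""
    observed = sum(reported) / len(reported)
    return (observed - (1 - p_truth) / 2) / p_truth


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [True] * 3000 + [False] * 7000  # hypothetical sensitive attribute
    reported = [randomize(a, 0.5, rng) for a in truth]
    print(estimate_true_rate(reported, 0.5))  # noisy estimate of the true rate
```

Lowering `p_truth` strengthens the deniability of each report but makes the estimate noisier, which is the accuracy cost the abstract refers to.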

Nong Ye, Professor
School of Computing, Informatics, and Decision Systems Engineering
Arizona State University, USA

Dr. Ye is a full professor at Arizona State University. Her past and current research has received over $8M in external funding and has produced eighty-five journal papers and five books, including Data Mining: Theories, Algorithms, and Examples. Her recent research focuses on developing data mining algorithms to discover multivariate data associations, capturing both partial-value and full-value variable associations as well as both individual and interactive effects of multiple variables. The new algorithms have been applied to cyber attack detection, engineering retention and education, and energy systems modeling.

Speech Title: Learning Partial-Value Variable Associations

Abstract: Existing data analytic techniques are mostly based on building a single model of variable relations over the full ranges of all variable values, although relations among variables may exist only for certain values of the variables, or different relations may exist for different values. This talk introduces and describes the Partial-Value Association Discovery (PVAD) algorithm, which discovers variable relations/associations that exist in partial ranges of variable values from large amounts of data in a computationally efficient way. The PVAD algorithm allows building a structural model of partial- and full-value variable associations in multiple layers that captures individual and interactive effects of multiple variables by learning from data. Applications of the PVAD algorithm to cyber attack detection and engineering retention are also presented.


Jun Wang, Professor, University of Central Florida, USA

Dr. Jun Wang is a Full Professor of Computer Engineering and Director of the Computer Architecture and Storage Systems (CASS) Laboratory at the University of Central Florida, Orlando, FL, USA. He received his Ph.D. in Computer Science and Engineering from the University of Cincinnati in 2002. He is the recipient of a National Science Foundation Early Career Award (2009) and a Department of Energy Early Career Principal Investigator Award (2005). He has authored over 120 publications in premier journals such as IEEE Transactions on Computers and IEEE Transactions on Parallel and Distributed Systems, and in leading HPC and systems conferences such as VLDB, HPDC, EuroSys, IPDPS, ICS, Middleware, and FAST. He has conducted extensive research in the areas of computer systems and high performance computing. His specific research interests include massive storage and file systems in local, distributed, and parallel systems environments. He has graduated 13 Ph.D. students, who upon graduation were employed by major US IT corporations (e.g., Google and Microsoft). He has served on numerous US NSF and US DOE grant panels and as a TPC member for many premier conferences. He serves on the editorial boards of IEEE Transactions on Parallel and Distributed Systems and IEEE Transactions on Cloud Computing. He is a general executive chair for IEEE DASC/DataCom/PIcom/CyberSciTech 2017, and has co-chaired technical programs for numerous computer systems conferences, including the 20th IEEE International Conference on High Performance Computing and Communications (HPCC-2018), the 10th IEEE International Conference on Networking, Architecture, and Storage (NAS 2015), and the 1st International Workshop on Storage and I/O Virtualization, Performance, Energy, Evaluation and Dependability (SPEED 2008), held together with HPCA.

Speech Title: Approximation and Sampling in Big Data and Big Learning

Abstract: In this talk, we cover state-of-the-art sampling techniques in the era of big data and big learning. One example is enabling both efficient and accurate approximations on arbitrary sub-datasets of a large dataset. Due to the prohibitive storage overhead of caching offline samples for each sub-dataset, existing offline-sample-based systems provide high-accuracy results for only a limited number of sub-datasets, such as the popular ones. On the other hand, current online-sample-based approximation systems, which generate samples at runtime, do not take into account the uneven storage distribution of a sub-dataset. They work well when a sub-dataset is uniformly distributed, but suffer from low sampling efficiency and poor estimation accuracy on unevenly distributed sub-datasets.
To address this problem, we develop a distribution-aware method called Sapprox. Our idea is to collect the occurrences of a sub-dataset at each logical partition of a dataset (its storage distribution) in the distributed system, and to use this information to facilitate online sampling. There are three thrusts in Sapprox. First, we develop a probabilistic map to reduce the exponential number of recorded sub-datasets to a linear one. Second, we apply the theory of cluster sampling with unequal probability to implement a distribution-aware sampling method for efficient online sub-dataset sampling. Third, we quantitatively derive the optimal sampling unit size in a distributed file system by associating it with approximation costs and accuracy. We have implemented Sapprox in the Hadoop ecosystem as an example system and open-sourced it on GitHub. Our comprehensive experimental results show that Sapprox can achieve a speedup of up to a factor of 20 over precise execution.
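
The second thrust, cluster sampling with unequal probability, can be sketched with the textbook Hansen-Hurwitz estimator. This is an illustrative sketch only, not Sapprox's implementation: the function name, the in-memory list-of-lists layout, and the use of exact occurrence counts (rather than a probabilistic map) are all assumptions made for the example. Each logical partition is treated as a cluster and is drawn, with replacement, with probability proportional to the recorded number of sub-dataset records it holds.

```python
import random


def estimate_total(partitions, occurrences, n_draws, rng):
    """Hansen-Hurwitz estimate of the sum of a sub-dataset's values.

    partitions  -- partitions[i] lists the sub-dataset values stored in
                   logical partition i (only sampled partitions are read)
    occurrences -- recorded count of sub-dataset records per partition,
                   i.e. the storage distribution (hypothetical input format)
    """
    total = sum(occurrences)
    if total == 0 or n_draws <= 0:
        raise ValueError("need at least one occurrence and one draw")
    # Partitions with no recorded occurrences are never sampled.
    candidates = [i for i, c in enumerate(occurrences) if c > 0]
    probs = {i: occurrences[i] / total for i in candidates}
    drawn = rng.choices(candidates,
                        weights=[probs[i] for i in candidates],
                        k=n_draws)
    # Each drawn cluster total is scaled by the inverse of its draw probability.
    return sum(sum(partitions[i]) / probs[i] for i in drawn) / n_draws


if __name__ == "__main__":
    rng = random.Random(0)
    # Unevenly distributed sub-dataset: most records sit in two partitions.
    partitions = [[3.0] * 50, [3.0] * 5, [], [3.0] * 45]
    occurrences = [len(p) for p in partitions]
    print(estimate_total(partitions, occurrences, n_draws=2, rng=rng))
```

Because draw probabilities follow the storage distribution, empty partitions are skipped and dense ones are read more often. The closer the per-partition totals are to proportional to the recorded occurrences, the lower the variance of the estimate.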