Keynote Speakers

We are honored to announce four keynote speakers at CIKM 2013: Dr. Alon Y. Halevy, Dr. C. Lee Giles, Dr. Ronald Fagin, and Dr. Carlos Guestrin. They will bring us exciting insights into several research areas in information and knowledge management.

Alon Y. Halevy

Keynote Talk: Structured Data in Web Search

Abstract – For the first time since the emergence of the Web, structured data is playing a key role in search engines and is therefore being collected via a concerted effort. Much of this data is being extracted from the Web, which contains vast quantities of structured data on a variety of domains, such as hobbies, products and reference data. Moreover, the Web provides a platform that encourages publishing more data sets from governments and other public organizations. The Web also supports new data management opportunities, such as effective crisis response, data journalism and crowd-sourcing data sets.
I will describe some of the efforts we are conducting at Google to collect structured data, filter the high-quality content, and serve it to our users. These efforts include providing Google Fusion Tables, a service for easily ingesting, visualizing and integrating data; mining the Web for high-quality HTML tables; and contributing these data assets to Google’s other services.

Biography – Dr. Alon Halevy [Personal page at Google][Personal page at University of Washington] heads the Structured Data Group at Google Research. Prior to that, he was a Professor of Computer Science at the University of Washington, where he founded the Database Research Group. From 1993 to 1997 he was a Principal Member of Technical Staff at AT&T Bell Laboratories (later AT&T Laboratories). He received his Ph.D. in Computer Science from Stanford University in 1993, and his Bachelor's degree in Computer Science and Mathematics from the Hebrew University in Jerusalem in 1988.

Dr. Halevy’s research interests are in data integration, structured data on the Web, semantic heterogeneity, personal information management, management of XML data, web-site management, peer-data management systems, query optimization, database theory, knowledge representation, and more generally, the intersection between Database and AI technologies. His research developed several systems, such as the Information Manifold data integration system, the Strudel web-site management system, the LSD schema matching system, and the Tukwila XML data integration system. He was also a co-developer of XML-QL, which later contributed to the development of the XQuery standard for querying XML data. His group at Google developed Google Fusion Tables, a collaborative tool for data management and visualization in the cloud.

In 1999, Dr. Halevy co-founded Nimble Technology, one of the first companies in the Enterprise Information Integration space (acquired by Actuate in 2003). In 2004, Dr. Halevy founded Transformic Inc., a company that created search engines for the deep web, content residing in databases behind web forms (acquired by Google in 2005).

Dr. Halevy is a Fellow of the Association for Computing Machinery (ACM). He was a Sloan Fellow (1999-2000), and received the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2000. His original paper on the Information Manifold System received the VLDB 2006 Test of Time Award. He currently serves on the Board of Trustees of the VLDB Endowment. He served on the editorial board of the VLDB Journal, as an associate editor and advisor for the Journal of Artificial Intelligence Research, and on the editorial board of ACM Transactions on Internet Technology. He served as the program chair for the ACM SIGMOD 2003 Conference. He has given multiple keynotes at top conferences, and Distinguished Lectures at several Computer Science Departments.

C. Lee Giles

Keynote Talk: Scholarly Big Data: Information Extraction and Data Mining

Abstract – Collections of scholarly documents are usually not thought of as big data. However, large collections of scholarly documents often have many millions of publications, authors, citations, equations, figures, etc., and large scale related data and structures such as social networks, slides, data sets, etc. We discuss scholarly big data challenges, insights, methodologies and applications. We illustrate scholarly big data issues with examples of specialized search engines and recommendation systems that use information extraction and data mining in various areas such as computer science, chemistry, archaeology, acknowledgements, reference recommendation, collaboration recommendation, and others.

Biography – Dr. C. Lee Giles [Personal page at PSU] is the David Reese Professor at the College of Information Sciences and Technology at the Pennsylvania State University, University Park, PA. He is also graduate college Professor of Computer Science and Engineering, courtesy Professor of Supply Chain and Information Systems, and Director of the Intelligent Systems Research Laboratory. He directs the Next Generation CiteSeer, CiteSeerx, project and co-directs the ChemxSeer project at Penn State. He has been associated with Columbia University, the University of Maryland, the University of Pennsylvania, Princeton University, and the University of Trento.

He and his collaborators, including current and former graduate students, have published over 300 journal and conference papers, book chapters, edited books and proceedings. His work has over 18,000 citations and, according to Google Scholar, his h-index of 62 is one of the top 100 in Computer Science. His 2006 coauthored paper in Science proposes a cyberinfrastructure for the historical sciences. His coauthored paper in 2004 in the Proceedings of the National Academy of Sciences created an automatic acknowledgement indexing methodology and showed that various funding agencies and individuals in computer and information science are much more acknowledged than others. In 2002, he coauthored the paper “Winners Don’t Take All,” published in the Proceedings of the National Academy of Sciences, on how the topic-based web does not follow a power law distribution. In 1998, he coauthored a paper published in Science on the size and search engine coverage of the Web that was well cited in the popular press, and in 1999 a well-received follow-up paper in Nature.

He has been involved in the creation and development of various novel search engines and digital libraries. He was one of the creators of the novel metasearch engines Inquirus and Inquirus2. He was also one of the creators of the popular computer and information science search engine CiteSeer, an autonomous citation indexing search engine and digital library, now hosted at the College of Information Sciences and Technology at Penn State University. Recently, it has been replaced by the Next Generation CiteSeer, CiteSeerx. He also created a niche search engine, eBizSearch, for e-business documents, and SMEALSearch, a search engine and digital library for academic business documents. He is very interested in cyberinfrastructure for science and the academy and is currently a co-developer of ChemxSeer, a portal and search tool for environmental chemistry. He prototyped a novel search engine for archaeology, ArchSeer, and also developed a new search engine for robots.txt, BotSeer, that indexed over 2 million robots.txt files. Currently, he is working on collaboration networks, CollabSeer, and citation recommendation, RefSeer.

He is a Fellow of the ACM, a Fellow of the IEEE and a Fellow of the International Neural Network Society, and a member of AAAI and AAAS. He has twice received the IBM Distinguished Faculty Award. He is also a member of Sigma Xi, Tau Beta Pi, and Eta Kappa Nu. His previous positions include a Senior Research Scientist at NEC Research Institute (now NEC Labs), Princeton, NJ; a Program Manager at the Air Force Office of Scientific Research in Washington, D.C.; a research scientist at the Naval Research Laboratory, Washington, D.C.; and an Assistant Professor of Electrical and Computer Engineering at Clarkson University, Potsdam, N.Y. During part of his graduate education he was a research engineer at Ford Motor Company’s Scientific Research Laboratory. His graduate degrees are from the University of Michigan and the University of Arizona and his undergraduate degrees are from Rhodes College and the University of Tennessee. His academic genealogy includes two Nobel laureates and prominent mathematicians.

Ronald Fagin

Keynote Talk: Applying Theory to Practice

Abstract – We discuss the art of applying theory to practice. In particular, we discuss in detail our interactions with two research projects at IBM Almaden: the Garlic project, which built a multimedia database system on top of various existing systems, and the Clio project, which developed tools for converting data from one format to another. We discuss the problems we resolved, and the impact this had both on the Garlic and Clio systems and on the broader scientific community. We draw morals from these interactions, including why theoreticians do better theory by working with system builders, and why system builders build better systems by working with theoreticians. We present the remarkably simple Threshold Algorithm [16], which is optimal in an extremely strong sense: optimal not just in the worst case, or in the average case, but in every case! The Threshold Algorithm and its variants have applications to numerous areas, including information retrieval, fuzzy and uncertain databases, group recommendation systems, and the semantic web.
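To make the flavor of the algorithm concrete, here is a minimal Python sketch of the Threshold Algorithm for top-k aggregation over sorted score lists. The data layout and function names are illustrative, not from the talk; only the access pattern (sorted access, random access, and the stopping threshold) follows the algorithm.

```python
import heapq

def threshold_algorithm(sorted_lists, k, agg=sum):
    """Return the top-k (score, object) pairs under a monotone aggregate.

    sorted_lists: m lists of (object, score) pairs, each sorted by score
    descending; every object is assumed to appear in every list.
    """
    m = len(sorted_lists)
    index = [dict(lst) for lst in sorted_lists]  # random-access lookup
    seen = set()
    top = []  # min-heap holding the best k (score, object) pairs so far
    for depth in range(max(len(lst) for lst in sorted_lists)):
        last = []  # score last seen under sorted access in each list
        for lst in sorted_lists:
            obj, s = lst[min(depth, len(lst) - 1)]
            last.append(s)
            if obj in seen:
                continue
            seen.add(obj)
            # Random access: fetch obj's score from every list.
            total = agg(index[j][obj] for j in range(m))
            heapq.heappush(top, (total, obj))
            if len(top) > k:
                heapq.heappop(top)
        # Threshold: best possible score of any object not yet seen.
        threshold = agg(last)
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(top, reverse=True)
```

The "every case" optimality in the abstract refers to instance optimality: on each input, the algorithm's cost is within a constant factor of any correct algorithm's cost on that same input.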

Biography – Ronald Fagin [Personal Page at IBM] is an IBM Fellow at IBM Research – Almaden. The title of Fellow is IBM’s highest technical honor: there are about 70 Fellows among 450,000 IBM employees, and there have been 246 Fellows in the 50-year history of the program.

Fagin received his B.A. in mathematics from Dartmouth College, and his Ph.D. in mathematics from the University of California at Berkeley. In his Ph.D. thesis, he essentially initiated the field of finite model theory. In his thesis he proved what is now widely known as Fagin’s Theorem, which says that the important complexity class NP coincides with the class of properties expressible in existential second-order logic. This is a syntactic characterization of NP, which, remarkably, does not explicitly involve any notion of machine, computation, or time. In his thesis, he also proved the zero-one law of first-order logic, which says that every property expressible in first-order logic is either almost surely true (in an asymptotic probabilistic sense) or almost surely false.
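Stated compactly (a standard textbook formulation, not a quotation from the thesis):

```latex
% Fagin's Theorem: a machine-free, syntactic characterization of NP.
% A property of finite structures is in NP iff it is definable by an
% existential second-order sentence:
\mathrm{NP} \;=\; \exists\mathrm{SO}

% Zero-one law: if \mu_n(\varphi) denotes the fraction of structures
% with universe \{1,\dots,n\} satisfying a first-order sentence
% \varphi, then
\lim_{n\to\infty} \mu_n(\varphi) \;\in\; \{0,\,1\}
```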

Most of his research has focused on database theory. He first became well-known in databases through his introduction of the strongest normal forms for relational databases. Normal forms describe ways to structure the relations to avoid various anomalies (such as the deletion of one piece of information accidentally leading to the loss of other information). Normal forms are critical for good database design. In recent years, his main focus in database theory has been on the problem of data exchange, where data is translated from one format to another.

He has also been a key leader in the area of reasoning about knowledge, a topic of importance in distributed computing systems, artificial intelligence, cryptography, and game theory. These subjects all deal with systems in which multiple agents are reasoning about the world and about each other’s knowledge. Fagin is a co-author of the definitive text on reasoning about knowledge.

Fagin was named a Fellow of IEEE for “contributions to finite-model theory and to relational database theory”. He was named a Fellow of ACM for “creating the field of finite model theory and for fundamental research in relational database theory and in reasoning about knowledge”. He was named a Fellow of AAAS (American Association for the Advancement of Science), for “fundamental contributions to computational complexity theory, database theory, and the theory of multi-agent systems”. He was named Docteur Honoris Causa by the University of Paris, and a “Highly Cited Researcher” by ISI (the Institute for Scientific Information). He won Best Paper awards at the 1985 International Joint Conference on Artificial Intelligence, the 2001 ACM Symposium on Principles of Database Systems, and the 2010 International Conference on Database Theory. He won Test-of-Time Awards at the 2011 ACM Symposium on Principles of Database Systems and the 2013 International Conference on Database Theory. He won a 2011 IEEE Technical Achievement Award “for pioneering contributions to the theory of rank and score aggregation”, and the 2012 IEEE W. Wallace McDowell Award “for fundamental and lasting contributions to the theory of databases”. He was the winner of the 2004 ACM SIGMOD Edgar F. Codd Innovations Award, a lifetime achievement award in databases, for “fundamental contributions to database theory”.

Carlos Guestrin

Keynote Talk: Usability in Machine Learning at Scale with GraphLab

Abstract – Today, machine learning (ML) methods play a central role in industry and science. The growth of the Web and improvements in sensor data collection technology have been rapidly increasing the magnitude and complexity of the ML tasks we must solve. This growth is driving the need for scalable, parallel ML algorithms that can handle Big Data. In this talk, we will focus on:
1. Examining common algorithmic patterns in distributed ML methods.
2. Qualifying the challenges of implementing these algorithms in real distributed systems.
3. Describing computational frameworks for implementing these algorithms at scale.
4. Addressing a significant core challenge to large-scale ML — enabling the widespread adoption of machine learning beyond experts.
In the latter part, we will focus mainly on the GraphLab framework, which naturally expresses asynchronous, dynamic, graph-parallel computation.
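As a rough illustration of that vertex-program style (a hypothetical Python sketch of the pattern, not GraphLab's actual API), a dynamically scheduled PageRank update might look like:

```python
from collections import deque

def pagerank_dynamic(out_edges, damping=0.85, tol=1e-6):
    """Asynchronous, dynamically scheduled PageRank sketch.

    out_edges: dict mapping each vertex to its list of out-neighbors;
    assumes every vertex has at least one out-edge.
    """
    vertices = list(out_edges)
    n = len(vertices)
    in_edges = {v: [] for v in vertices}
    for u, outs in out_edges.items():
        for v in outs:
            in_edges[v].append(u)
    rank = {v: 1.0 / n for v in vertices}
    queue = deque(vertices)        # vertices scheduled for an update
    scheduled = set(vertices)
    while queue:
        v = queue.popleft()
        scheduled.discard(v)
        # "Vertex program": gather from in-neighbors, apply new value.
        new = (1 - damping) / n + damping * sum(
            rank[u] / len(out_edges[u]) for u in in_edges[v])
        if abs(new - rank[v]) > tol:
            rank[v] = new
            # Dynamic scheduling: signal out-neighbors to recompute,
            # so work concentrates where values are still changing.
            for w in out_edges[v]:
                if w not in scheduled:
                    scheduled.add(w)
                    queue.append(w)
    return rank
```

The point of the pattern is that only vertices whose inputs actually changed are rescheduled, rather than sweeping the whole graph every iteration.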

Biography – Dr. Carlos Guestrin [Personal page at University of Washington] is the Amazon Professor of Machine Learning at the Computer Science & Engineering Department of the University of Washington. He is also a co-founder and CEO of GraphLab Inc., focusing on large-scale machine learning and graph analytics. His previous positions include the Finmeccanica Associate Professor at Carnegie Mellon University and senior researcher at the Intel Research Lab in Berkeley. Carlos received his Ph.D. and Master's degrees from Stanford University, and a Mechatronics Engineer degree from the University of Sao Paulo, Brazil.

Carlos’ current research interests focus on two core aspects of “Big Data”: 1) the computing perspective, where he works on developing algorithms, abstractions and systems for tackling large-scale machine learning and graph analytics, and 2) the human perspective, where he focuses on taming information overload by developing algorithms and representations to help humans understand huge and complex sources of information.

Carlos’ work has been recognized by awards at a number of conferences and two journals: KDD 2007 and 2010, IPSN 2005 and 2006, VLDB 2004, NIPS 2003 and 2007, UAI 2005, ICML 2005, AISTATS 2010, JAIR in 2007 & 2012, and JWRPM in 2009. He is also a recipient of the ONR Young Investigator Award, NSF Career Award, Alfred P. Sloan Fellowship, IBM Faculty Fellowship, the Siebel Scholarship and the Stanford Centennial Teaching Assistant Award. Carlos was named one of the 2008 ‘Brilliant 10’ by Popular Science Magazine, received the IJCAI Computers and Thought Award and the Presidential Early Career Award for Scientists and Engineers (PECASE). He is a former member of the Information Sciences and Technology (ISAT) advisory group for DARPA.