Seminar #1:
Exploring the Power of Links in Scalable Data Analysis
Jiawei Han, University of Illinois, Urbana-Champaign; Xiaoxin
Yin, Google; and Philip Yu, IBM T. J. Watson Research
Center
Duration: 1.5 hours
Tuesday Apr 8th, 11.15-12.45
Location: Uxmal
|
Algorithms like PageRank and HITS
were developed in the late 1990s to explore links among Web pages to
discover authoritative pages and hubs. Links have also been widely
used in citation analysis and social network analysis. However, there
is a lack of systematic treatment of how to fully explore the power of
links in scalable data analysis. Besides a survey of the recent
research work on link analysis and link mining, we also show that the
power of links can be explored thoroughly to improve the effectiveness
and efficiency of typical data analysis tasks, including
classification, clustering, information integration, and other
interesting data mining tasks, especially in multi-relational
database and World-Wide Web environments. Some recent results
that explore the crucial information hidden in links will be
introduced, including (1) multi-relational data mining, (2)
user-guided multi-relational clustering, (3) scalable methods for
link-based cluster analysis, (4) information integration and object
distinction analysis, and (5) link analysis for graph mining and
information network mining. The power of links for other analysis
tasks will also be discussed in the tutorial.
|
 |
Jiawei Han, Professor, Department of
Computer Science, University of Illinois at Urbana-Champaign. He has
been working on research into data mining, data warehousing, database
systems, data mining from spatiotemporal data, multimedia data, stream
and RFID data, Web data, social network data, and biological data,
with over 350 journal and conference publications. He has chaired or
served on over 100 program committees of international conferences and
workshops, including PC co-chair for KDD, SDM, and ICDM conferences,
vice chair for ICDE and ICDM conferences, and Americas Coordinator for
a VLDB conference. He is also serving as the founding Editor-In-Chief
of ACM Transactions on Knowledge Discovery from Data. He is an ACM
Fellow and has received the 2004 ACM SIGKDD Innovations Award and the 2005
IEEE Computer Society Technical Achievement Award. His book "Data
Mining: Concepts and Techniques" (2nd ed., Morgan Kaufmann, 2006)
has been widely used as a textbook worldwide.
|
 |
Xiaoxin Yin is an applied researcher
at Microsoft Research. He received his Ph.D. in Computer Science from
the University of Illinois at Urbana-Champaign in May 2007. His research
interests include data mining, link analysis, clustering,
classification, similarity analysis, and multi-relational data mining.
Xiaoxin Yin served as the information director for ACM
Transactions on Knowledge Discovery from Data in 2005-2007, and has been a
reviewer for many journals and conferences, including IEEE
Transactions on Knowledge and Data Engineering, Data Mining and
Knowledge Discovery, SIGMOD, KDD, ICDE, and VLDB conferences.
|
 |
Philip S. Yu received the B.S. degree in
E.E. from National Taiwan University, the M.S. and Ph.D. degrees in E.E.
from Stanford University, and the M.B.A. degree from New York University.
He is with the IBM T.J. Watson Research Center and is currently manager
of the Software Tools and Techniques group. Dr. Yu has published more
than 500 papers in refereed journals and conferences. He holds or has
applied for more than 300 US patents.
Dr. Yu is a Fellow of the ACM and of the IEEE. He is an associate editor
of ACM Transactions on Internet Technology and ACM Transactions on
Knowledge Discovery from Data. He is on the steering committee of the IEEE
Conference on Data Mining. He was the Editor-in-Chief of IEEE
Transactions on Knowledge and Data Engineering (2001-2004). Dr. Yu has
received several IBM honors, including two IBM Outstanding Innovation
Awards, an Outstanding Technical Achievement Award, two Research
Division Awards, and the 92nd plateau of Invention Achievement Awards.
He received a Research Contributions Award from the IEEE Intl. Conference
on Data Mining in 2003 and also an IEEE Region 1 Award for "promoting
and perpetuating numerous new electrical engineering concepts" in
1999. Dr. Yu is an IBM Master Inventor.
|
Seminar #2:
Mobile and Embedded Database Systems and Technology
Anil Nori, Microsoft Corp.
Duration: 3 hours
Tuesday Apr 8th, 14.00-15.30 and 16.00-17.30
Location: Uxmal
|
Recent advances in processors,
memory, storage, and connectivity have paved the way for next-generation
applications that are data-driven, whose data can reside
anywhere (i.e., on the server, on the desktop, on devices, or embedded in
applications), and that support access from anywhere (i.e., local,
remote, over the network, or from PDAs, in connected and disconnected
fashion). Memory sizes have gone up and prices have come down
significantly; with 64-bit addressability, it is not uncommon to
configure servers with 8 to 16 GB of memory and desktops with 2
to 4 GB of memory. With advances in flash memory technology,
large flash drives are available at reasonable prices. Computers with
32 GB flash drives are making their way into the market. Flash drives not
only eliminate seek time and rotational latency, they also consume
significantly less power than conventional disk drives, making
them ideal for portable devices. All these trends lead the way to
applications that are data-centric, distributed, mobile, and embedded.
In this tutorial we will cover the following topics in detail: flash
technologies and their applications; device trends and technologies;
mobile and embedded applications and their requirements; mobile and
embedded DBMS architectures; embedded DBMSs such as Berkeley DB, Sybase
iAnywhere, SQL Server Compact, and TinyDB; and future trends and
challenges.
|
 |
Anil Nori is a Distinguished
Engineer at Microsoft. He is an architect in the SQL Server
organization, focusing on data and application platform technologies.
He is part of the senior leadership overseeing the overall vision,
strategy, and architecture for the Microsoft data platforms. He has
over 25 years of experience in building complex database and
application systems. Before coming to Microsoft, Anil was the CTO of
Asera, which he co-founded with Vinod Khosla of Kleiner Perkins. Asera
pioneered Composite Applications and the composite application
platform. Prior to Asera, Anil was at Oracle as an Architect for the
Oracle database system, where he was responsible for Oracle
object-relational and extensible technology, Internet and multi-media
DBMS development, and XML technology. Before joining Oracle, Anil was
a Database Architect for DEC Rdb database products, where he was
involved in the development of centralized and distributed DBMS
products. Prior to DEC, Anil was a Computer Scientist at Computer
Corporation of America, a leader in Database Research. Anil is active
and well known in the database and applications community.
|
Seminar #3:
Stream Processing: Going Beyond Database Management Systems
Sharma Chakravarthy, University of Texas, Arlington
Duration: 3 hours
Wednesday Apr 9th, 9.00-10.30 and 11.00-12.30
Location: Uxmal
|
Currently, a large class of
data-intensive applications in which data is in the form of continuous
streams has been widely recognized. Furthermore, these applications
have to respond in a timely manner. In other words, these applications
have specific Quality of Service (QoS) requirements for query
processing. In this tutorial, we discuss the main challenges, approaches,
techniques, and solutions for developing a general-purpose data stream
management system (DSMS) and present our work in this area as well
as work from the literature. We present work on Aurora, STREAM,
Fjord/Telegraph, and MavStream (to name a few), covering the major
efforts in data stream management systems.
We will cover the following topics in detail during the tutorial:
differences between traditional query processing in a DBMS and
continuous query processing, operator and query modeling for stream
processing, scheduling strategies (for conserving memory and reducing
tuple latency), capacity estimation (to determine strategies and to
determine when and how much load to shed), and load shedding
strategies. The emphasis will be on satisfying QoS requirements as it
is extremely important for stream processing applications.
Implementation of a stream processing system will also be covered
using the implementation of MavStream at UTA. Finally, the need for
the integration of stream and complex event processing will be
outlined.
|
|
Sharma Chakravarthy is a Professor in the
Computer Science and Engineering Department at The University of Texas at
Arlington, Texas. He established the Information Technology Laboratory
at UT Arlington in Jan 2000 and currently heads it. He is the
recipient of the college-level “Excellence in Research”
award in 2006, the university-level “Creative Outstanding
Researcher” award in 2003, and the department-level senior
outstanding researcher award in 2002. His current research includes
web technologies, stream data processing, information integration,
mining and knowledge discovery (association, graph, and text),
active databases, distributed and heterogeneous databases, query
optimization, and multi-media databases. He has published over 120
papers in refereed international journals and conference proceedings.
He has given tutorials on a number of database topics, such as graph
mining, stream processing, database mining, active, real-time,
distributed, object-oriented, and heterogeneous databases in North
America, Europe, and Asia. He is listed in Who's Who among South
Asian Americans and Who's Who among America's Teachers.
Prior to joining UTA, he was with the University of Florida,
Gainesville. Prior to that, he worked as a Computer Scientist at the
Computer Corporation of America (CCA) and as a Member, Technical Staff
at Xerox Advanced Information Technology, Cambridge, MA. Sharma
Chakravarthy received the B.E. degree in Electrical Engineering from
the Indian Institute of Science, Bangalore and M.Tech from IIT Bombay,
India. He worked at TIFR (Tata Institute of Fundamental Research),
Bombay, India for a few years. He received the M.S. and Ph.D. degrees from
the University of Maryland at College Park in 1981 and 1985,
respectively.
|
Seminar #4:
The Java Persistence API (JPA): Technology, Standards, and
Implementations
Patrick Linskey, BEA Systems, Inc.
Duration: 3 hours
Wednesday Apr 9th, 14.00-15.30 and 16.00-17.30
Location: Uxmal
|
The tutorial starts with a high-level
overview of the Java Persistence API (JPA), mainly focusing on
JPA version 1.0. We will do a brief survey of the landscape of JPA
implementations, and discuss upcoming features in the JPA 2.0
specification, currently in development. Thereafter, we will examine
the steps required to build a persistent domain model and a
related service that uses JPA. Two deployment scenarios are
considered: deployment as an EJB 3 stateless session bean and as part
of a J2SE application. The last half of the tutorial examines
JPA's support for relationships, detachment, attachment, and
queries. Questions are welcomed throughout.
|
|
Patrick has been involved in
object/relational mapping for 6+ years. As the founder and CTO of
SolarMetric, Patrick drove the technical direction of the company and
oversaw the development of Kodo. Now at BEA, he leads the EJB team in
the design and implementation of the WebLogic Server EJB solution.
Patrick is one of the leaders on the EJB3 and the JDO specification
teams, and is BEA's representative on the EJB3 expert group.
Patrick is involved in several industry consortia, serving as a
luminary on JDOcentral and as the moderator on the forthcoming
JavaPersistence.com. He has been the face of standards-based
persistence, having evangelized JDO and EJB Persistence in hundreds of
talks throughout the world. Patrick is co-author of Bitter EJB, and is
on the JAOO Conference Program Committee. Patrick has also worked for
TechTrader, MIT's Media Lab and Bank One in various technical
roles. Under Patrick's leadership, Kodo has become the market-leading
JDO implementation, with over 450 customers throughout the
world spanning all industries.
|
Seminar #5:
Data and Metadata Alignment: Concepts and Techniques
Lise Getoor, University of Maryland and Renee Miller, University
of Toronto
Duration: 3 hours
Thursday Apr 10th, 14.00-15.30 and 16.00-17.30
Location: Uxmal
|
Alignment is the act of adjusting
or aligning the parts of a device in relation to each other.
Information alignment is the process of finding, modeling, and using
the correspondences or connections that place information artifacts in
relation to each other. Alignment forms the basis of many
information integration, sharing, and management tasks ranging from
data integration and exchange to data cleaning, record linkage, and
deduplication. In many cases, there is no single optimal alignment;
the best alignment is task- and context-specific.
The way in which alignment is performed can also be quite different
depending on the task and context. For example, in aligning ontologies
the primary tools used are conceptual modeling techniques together
with logical inference. In aligning data objects, statistical
inference is used. For both tasks, inference may be augmented with
techniques from natural language processing or other reasoning based
on information theory, semantics, or a variety of task-specific
principles.
This tutorial will provide an introduction to the basics of alignment
as used in common information management tasks. We will give a
taxonomy of tasks, and discuss how alignment is exploited in each. Our
classification includes data alignment, metadata alignment, and new
unified approaches which combine both. A primary goal of our tutorial
will be to identify commonalities and differences in the way alignment
has been formalized and used in different environments and
communities.
|
|
Lise Getoor is an assistant
professor in the Computer Science Department at the University of
Maryland, College Park. She received her PhD from Stanford University
in 2001. Her current work includes research on link mining,
statistical relational learning and representing uncertainty in
structured and semi-structured data. She has published numerous
articles in machine learning, data mining, database, and AI
forums. She is a member of the AAAI Executive Council, is on the
editorial board of the Machine Learning Journal, is a JAIR associate
editor and has served on a variety of program committees including
AAAI, ICML, IJCAI, KDD, SIGMOD, UAI, VLDB, and WWW.
|
|
Renee J. Miller is a professor of
computer science and the Bell Chair of Information Systems at the
University of Toronto. She received the US Presidential Early Career
Award for Scientists and Engineers (PECASE), the highest honor
bestowed by the United States government on outstanding scientists and
engineers beginning their careers. She received an NSF CAREER Award,
the Premier's Research Excellence Award, and an IBM Faculty
Award. Her research interests are in the efficient, effective
use of large volumes of complex, heterogeneous data. This interest
spans data integration and exchange, inconsistent and uncertain data
management, and knowledge curation. She serves on the Board of
Trustees of the VLDB Endowment, was a member of and chaired the ACM
Kanellakis Awards committee, and served as PC co-chair of VLDB in
2004. She received her PhD in Computer Science from the
University of Wisconsin, Madison and bachelor's degrees in
Mathematics and Cognitive Science from MIT.
|
Seminar #6:
Performance Evaluation in Database Research: Principles and Experience
Ioana Manolescu, INRIA Futurs and Stefan Manegold, CWI
Duration: 3 hours
Thursday Apr 10th, 14.00-15.30 and 16.00-17.30
Location: Coba
|
A significant part of today's
database research focuses on improving the performance of a specific
system. Quantitative experiments are the best way to validate such
results. However, performing experiments is not always easy. Besides
the complexity of the system under test, designing an experiment,
choosing the right environment and parameter values, analyzing the data
that is gathered, and reporting it to a third party in an expressive
and intelligible way are all hard.
In this tutorial, we present a general roadmap to the above steps,
based on classical measure taking theory, as well as our own
experience. The tutorial is primarily aimed at MS and PhD students
seeking to improve their experiment practices, but more senior
attendees may also find it interesting.
The tutorial will also devote a short time (~15 minutes) to tips and
tricks on how to organize and present code that performs experiments,
so that an outsider can repeat them.
|
|
Ioana Manolescu is a researcher in
the Gemo group of INRIA Futurs, in France. Her research work is
centered around XML data management, distributed data management
systems, and Web application modeling. She is a founder of two
SIGMOD-affiliated workshops, XIME-P on XQuery processing and ExpDB on
experimental evaluation in database research.
|
|
Stefan Manegold is a researcher in
the database group at CWI in Amsterdam, The Netherlands. His research
work comprises database architectures and data management on modern
hardware as well as database-supported XML processing, with a
particular interest in performance and benchmarking. He is co-founder
of the DaMoN workshop series (co-located with SIGMOD since 2005) and
co-chair of ExpDB 2007.
|