Saturday Keynote
DATA INTEGRATION CHALLENGES IN COMMUNITY SYSTEMS
AnHai Doan (University of Wisconsin)
Abstract. Online communities often contain much data that
are scattered across many disparate sources. Community Information
Management Systems (CIMS) discover, extract, and integrate such
data, then provide services over the processed data to community
users. These users can clean the processed data and contribute even
more data to the systems, using a variety of Web 2.0-style
techniques. As such, CIMS can prove valuable in many domains,
ranging from e-science, government agencies, and business data
management to end-user communities on the World-Wide Web. Building
CIMS, however, raises many difficult data integration challenges. In
this talk I will describe these challenges, then sketch our
preliminary solutions, as developed in Cimple, a joint project
between Wisconsin and Yahoo Research. In particular, I will focus on
developing a best-effort, pay-as-you-go integration architecture, on
integrating text data, and on integrating the data discovered by the
system with the data contributed by a multitude of users. Finally, I
will reflect on the broader integration challenges facing our
research community, as we move inexorably into a community-centric
world drowned in text data of unprecedented proportions.