Workshop on Information Integration Methods, Architectures, and Systems (IIMAS)

Friday & Saturday, April 11-12, 2008
In conjunction with ICDE 2008

Saturday Keynote

DATA INTEGRATION CHALLENGES IN COMMUNITY SYSTEMS

AnHai Doan (University of Wisconsin)

Abstract. Online communities often contain large amounts of data scattered across many disparate sources. Community Information Management Systems (CIMS) discover, extract, and integrate such data, then provide services over the processed data to community users. These users can clean the processed data and contribute even more data to the systems, using a variety of Web 2.0-style techniques. As such, CIMS can prove valuable in many domains, ranging from e-science, government agencies, and business data management to end-user communities on the World-Wide Web. Building CIMS, however, raises many difficult data integration challenges. In this talk I will describe these challenges, then sketch our preliminary solutions, as developed in Cimple, a joint project between Wisconsin and Yahoo Research. In particular, I will focus on developing a best-effort, pay-as-you-go integration architecture, on integrating text data, and on integrating the data discovered by the system with the data contributed by a multitude of users. Finally, I will reflect on the broader integration challenges facing our research community as we move inexorably into a community-centric world drowning in text data of unprecedented proportions.