Truth Finding on the Deep Web
The Web has changed our lives enormously, and people rely more and more on
the Web to fulfill their information needs. Compared with traditional
media, information on the Web can be published fast, but with fewer
guarantees of quality and credibility. Indeed, Web sources vary in quality,
sometimes providing conflicting, out-of-date, and incomplete data. Sources
can also easily copy, reformat, and modify data from other sources,
propagating erroneous data.
In this talk we present a recent study of the truthfulness of Deep Web data
in two domains where we believe data quality is important to people's lives:
Stock and Flight. We then describe how we can resolve conflicts between
different sources by using statistical models that leverage the accuracy of
the sources and the copying relationships between them. We demo our
SOLOMON system, which can effectively detect copying between data sources,
leverage the results in truth discovery, and provide a user-friendly
interface to help users understand the results.
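As a rough illustration of the accuracy-based conflict resolution mentioned
above, the sketch below alternates between accuracy-weighted voting and
re-estimating each source's accuracy. It is a simplified stand-in, not the
SOLOMON implementation: it omits copy detection entirely, the function name
`discover_truth` and its parameters are hypothetical, and the sample data is
made up.

```python
from collections import defaultdict


def discover_truth(claims, rounds=10, initial_accuracy=0.8):
    """Simplified accuracy-weighted voting (illustrative only).

    claims: {source: {item: value}}
    Returns (truths, accuracy), where truths maps item -> chosen value
    and accuracy maps source -> estimated fraction of correct claims.
    """
    accuracy = {source: initial_accuracy for source in claims}
    truths = {}
    for _ in range(rounds):
        # Each source backs its value with a weight equal to its accuracy.
        votes = defaultdict(lambda: defaultdict(float))
        for source, items in claims.items():
            for item, value in items.items():
                votes[item][value] += accuracy[source]
        new_truths = {item: max(vals, key=vals.get)
                      for item, vals in votes.items()}
        # Re-estimate accuracy as the share of a source's claims that
        # agree with the currently chosen values.
        for source, items in claims.items():
            if items:
                agree = sum(1 for item, value in items.items()
                            if new_truths[item] == value)
                accuracy[source] = agree / len(items)
        if new_truths == truths:  # converged: chosen values stopped changing
            break
        truths = new_truths
    return truths, accuracy


# Made-up example: three sources reporting three values each.
claims = {
    "A": {"x": 10, "y": 20, "z": 30},
    "B": {"x": 10, "y": 20, "z": 31},
    "C": {"x": 11, "y": 20, "z": 30},
}
truths, accuracy = discover_truth(claims)
```

In this toy data, source A agrees with the chosen value on every item, so it
ends up with a higher estimated accuracy than B or C. A model that also
accounts for copying would additionally discount votes from sources that
appear to copy from one another.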
Location
- Stevens Institute of Technology
- 513 River Street
- Hoboken, New Jersey
- United States 07030
- Building: Babbio Center
- Room Number: 319
- Host: Prof. Hong Man, Dept. of Electrical and Computer Engineering, Stevens Institute of Technology
Speakers
Dr. Xin Luna Dong of AT&T Labs-Research
Biography:
Dr. Xin Luna Dong is a researcher at AT&T Labs-Research. She received a
Ph.D. in Computer Science and Engineering from the University of Washington
in 2007, a Master's Degree in Computer Science from Peking University in
China in 2001, and a Bachelor's Degree in Computer Science from Nankai
University in China in 1998. Her research interests include databases,
information retrieval, and machine learning, with an emphasis on data
integration, data cleaning, personal information management, and web
search. She has led the Solomon project, whose goal is to detect copying
between structured sources and to leverage the results in various aspects of
data integration, and the Semex personal information management system,
which received a Best Demo award (one of the top three) at Sigmod'05. She has
co-chaired the Sigmod/PODS PhD Symposium'12, the Sigmod New Researcher
Symposium'12, QDB'12, and WebDB'10; has served as a track chair on the
program committees of ICDE'13 and CIKM'11; and has served on the program
committees of VLDB'13, Sigmod'12, VLDB'12, Sigmod'11, VLDB'11, PVLDB'10,
WWW'10, ICDE'10, VLDB'09, etc.
Address: New Jersey, United States