International Journal of Emerging Trends & Technology in Computer Science
A Motivation for Recent Innovation & Research
ISSN 2278-6856
www.ijettcs.org

Volume & Issue no: Volume 3, Issue 5, September - October 2014

Title:
Vector space model for deep web data retrieval and extraction
Author Name:
Dr. Poonam Yadav
Abstract:
Deep web data extraction has recently become a challenging problem because the structured data in deep web pages is embedded in an intricate page structure. Consequently, the extraction of data from deep web pages has received much attention among researchers. In this research, a vector space model and content features are utilized for deep web data extraction. Initially, the retrieved deep web pages are taken as input to the proposed method, and a Document Object Model (DOM) tree is constructed for each page. Using the DOM tree, the information on the web page is split into blocks, and each block, together with its contents, is passed to the feature computation process. Here, frequency-level, title-level and numerical-level features are calculated after constructing the vector space model, which is a vector of words and their frequencies. Based on the feature score of every block, the important blocks are chosen as the final useful data for the given web page. The proposed approach is implemented using deep web pages collected from the Complete Planet web site, and the performance of the system is evaluated using precision and recall.
Keywords:
Deep web data extraction, deep search engine, web data extraction, DOM tree, precision, recall
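The block-scoring pipeline outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration only: the blocks are taken as already split from the DOM tree, the three features are simple ratios, the weights are equal, and the names `tokenize`, `term_vector`, `block_features`, `score_blocks` and `precision_recall` are hypothetical. It should not be read as the author's implementation.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word and number tokens."""
    return re.findall(r"[a-z]+|\d+(?:\.\d+)?", text.lower())


def term_vector(text):
    """Vector space model: a mapping from each word to its frequency."""
    return Counter(tokenize(text))


def block_features(block_text, page_vector, title_vector):
    """Assumed feature definitions (not taken from the paper)."""
    tokens = tokenize(block_text)
    total = sum(page_vector.values())
    if not tokens or total == 0:
        return {"frequency": 0.0, "title": 0.0, "numerical": 0.0}
    n = len(tokens)
    return {
        # Average page-level relative frequency of the block's terms.
        "frequency": sum(page_vector[t] for t in tokens) / (n * total),
        # Share of the block's terms that also occur in the page title.
        "title": sum(1 for t in tokens if t in title_vector) / n,
        # Share of the block's terms that are numbers.
        "numerical": sum(1 for t in tokens if t[0].isdigit()) / n,
    }


def score_blocks(blocks, title, weights=(1.0, 1.0, 1.0)):
    """Return one score per block as a weighted sum of the three features."""
    page_vector = term_vector(" ".join(blocks))
    title_vector = term_vector(title)
    w_freq, w_title, w_num = weights
    scores = []
    for block in blocks:
        f = block_features(block, page_vector, title_vector)
        scores.append(
            w_freq * f["frequency"] + w_title * f["title"] + w_num * f["numerical"]
        )
    return scores


def precision_recall(selected, relevant):
    """Precision and recall of the selected block ids against the relevant ones."""
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical blocks, as if already extracted from a page's DOM tree.
    blocks = ["Home Login Contact", "Laptop price 499 USD", "Laptop price 799 USD"]
    for block, score in zip(blocks, score_blocks(blocks, "Laptop price list")):
        print(f"{score:.3f}  {block}")
```

Under these assumed definitions, navigation-style blocks that share no terms with the title and contain no numbers receive lower scores than data-record blocks, so a threshold or top-k rule could then select the important blocks; the selection rule itself is also left open by the abstract.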
Cite this article:
Dr. Poonam Yadav, "Vector space model for deep web data retrieval and extraction", International Journal of Emerging Trends & Technology in Computer Science (IJETTCS), Volume 3, Issue 5, September - October 2014, pp. 274-276, ISSN 2278-6856.