Web semantization

Because anyone can join the Internet and contribute to it, the amount of information on the web grows every second. The web is full of varied data, such as daily newspapers, discussion forums, shop catalogs, images, and videos, accessible to almost anyone. Extracting this data automatically remains a major problem today.

Web pages containing huge amounts of information are designed for human readers, which makes automatic processing by computers difficult. Moreover, web pages are alive: their content keeps changing. Developing applications for this purpose is enormously complex because the nature of the data does not conform to common programming paradigms.

Our team members working on this problem continually invent new methods that make these tasks easier and faster. We have designed, developed, or contributed to several projects related to web data extraction and web semantization, originally at the university. Several of these projects are now publicly available:

LinqToWeb

LinqToWeb is a framework for web data extraction. Its design lets you define a strongly typed object model that transparently reflects data on the living web. This mechanism provides access to raw web data in a completely object-oriented way using Language Integrated Query (LINQ). With this framework, development of web-based applications such as data semantization tools is more efficient and type-safe, and the resulting product is easy to maintain and extend.
With LinqToWeb, .NET, and LINQ, you can easily access information from various sources. The project focuses on generating abstractions over web resources, generating .NET proxy classes, and enabling LINQ to read the web in a strongly typed, object-oriented way. The project's development status is available at linqtoweb.codeplex.com. If you would like to integrate LinqToWeb into your .NET application, please feel free to contact us.
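
The idea of a typed object model queried like a collection can be sketched in a few lines. The example below is only an illustration in Python, not LinqToWeb's actual API: the `Product` class, the `products` function, and the sample HTML are all hypothetical, and a generator expression stands in for a LINQ query.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Iterator, List, Optional

# Hypothetical sample page; a real source would be fetched from the web.
SAMPLE_HTML = """
<ul>
  <li class="product" data-price="12.50">Keyboard</li>
  <li class="product" data-price="99.00">Monitor</li>
  <li class="product" data-price="7.25">Cable</li>
</ul>
"""


@dataclass(frozen=True)
class Product:
    """Typed view of one product entry on the page (hypothetical model)."""
    name: str
    price: float


class _ProductParser(HTMLParser):
    """Collects Product objects from <li class="product"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.products: List[Product] = []
        self._pending_price: Optional[float] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "li" and attr.get("class") == "product":
            self._pending_price = float(attr["data-price"])

    def handle_data(self, data):
        # The first non-blank text after a product tag is taken as its name.
        if self._pending_price is not None and data.strip():
            self.products.append(Product(data.strip(), self._pending_price))
            self._pending_price = None


def products(html: str) -> Iterator[Product]:
    """Expose the page as an iterable of typed objects."""
    parser = _ProductParser()
    parser.feed(html)
    parser.close()
    yield from parser.products


# Query the page like an ordinary typed collection.
cheap = sorted((p for p in products(SAMPLE_HTML) if p.price < 20),
               key=lambda p: p.price)
cheap_names = [p.name for p in cheap]
print(cheap_names)
```

The calling code works only with `Product` objects and never touches the markup, which is the separation the framework description above aims for.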

AgentMat

The AgentMat system is designed for efficient extraction of large amounts of data from web pages. Processing is driven by an XML-based language that describes the extraction task declaratively. A task description consists of system components that, connected together, perform the desired functionality on a general web page. With this scraping system, raw content from irregularly updated, unstructured web pages can be kept categorized and accessed together with its semantic metadata.
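
A declarative task built from connected components might look roughly like the sketch below. It is written in Python and does not use AgentMat's real task language: the `<task>` and `<component>` elements, the component types, and the sample page text are all invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical task description; element and attribute names are assumptions.
TASK_XML = r"""
<task>
  <component type="regex" pattern="(\d+) CZK" />
  <component type="to_int" />
  <component type="filter_min" value="100" />
</task>
"""


def regex_component(attrs):
    """Extract every match of the configured pattern from each input text."""
    pattern = re.compile(attrs["pattern"])
    return lambda items: [m for text in items for m in pattern.findall(text)]


def to_int_component(attrs):
    """Convert each extracted string to an integer."""
    return lambda items: [int(x) for x in items]


def filter_min_component(attrs):
    """Keep only values greater than or equal to the configured minimum."""
    minimum = int(attrs["value"])
    return lambda items: [x for x in items if x >= minimum]


# Registry mapping the declarative component type to its implementation.
COMPONENTS = {
    "regex": regex_component,
    "to_int": to_int_component,
    "filter_min": filter_min_component,
}


def build_pipeline(task_xml):
    """Read the task description and connect its components in order."""
    root = ET.fromstring(task_xml)
    stages = [COMPONENTS[c.get("type")](c.attrib)
              for c in root.findall("component")]

    def run(page_text):
        data = [page_text]
        for stage in stages:
            data = stage(data)
        return data

    return run


# Hypothetical page content standing in for a downloaded web page.
PAGE = "Mouse 250 CZK, Cable 80 CZK, Headset 1200 CZK"
prices = build_pipeline(TASK_XML)(PAGE)
print(prices)
```

Changing what is extracted means editing the XML description only; the component implementations stay untouched.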

Publications

  • Jakub Míšek, Filip Zavoral:
    High-Level Web Data Abstraction Using Language Integrated Query,
    Intelligent Distributed Computing IV – IDC 2010, Studies in Computational Intelligence 315, Springer Verlag, 2010
  • Miloslav Beňo, Jakub Míšek, Filip Zavoral:
    AgentMat: Framework for Data Scraping and Semantization,
    Proceedings of the Third International Conference on Research Challenges in Information Science, IEEE Computer Society Press, 2009
