Reuters releases free research news archive

Reuters is to release its archive news stories free of charge to research communities around the globe. The first Reuters Corpus archive includes over 800,000 English language news stories on emerging technologies.

The Reuters Corpus is intended to support research into language processing, speech synthesis, voice recognition, indexing, search and information retrieval.

Richard Willis, head of research and standards, Reuters, says: "To strengthen our links with the research community around the world, we have made available one of the most complete news archives ever released. The data provided will aid research into many aspects of language processing and information retrieval."

The archive includes all English language stories produced by Reuters globally between 20 August 1996 and 19 August 1997. The news data is available on two CD-ROMs and formatted in XML to make it easier to use as a research tool. All the news stories are fully referenced using a total of 775 different category codes for topic, geography and industry sector, says Willis.

Dr Marc Moens, head of Edinburgh University’s language technology group, says the Corpus allows for the systematic evaluation of progress and comparison of results between different development groups.

Users of the archive must agree to supply Reuters with a copy of any material published using the data. Working with this feedback, Reuters hopes to bring out other Corpora, including multilingual versions and volumes covering other date ranges.

