A range of electronic corpora has become increasingly accessible via the WWW and CD-ROM. This development has coincided with improvements in the standards governing the collecting, encoding and archiving of such data. Less attention, however, has been paid to making other types of digital data available, especially those which one might describe as 'unconventional', namely the fragmentary texts and voices left to us as accidents of history. Advances in technology have enabled the collection and organisation of such data sets into a growing number of user-friendly electronic corpora. These corpora have the potential to offer new insights into linguistic universals, for instance, since they allow, for the first time, rapid and systematic comparisons across genres as well as across social, temporal and geographical space. This book provides state-of-the-art methods and guidelines for creating and digitising these resources, taking full advantage of the dramatic recent improvements in computing and analytical tools.