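To create a Bookworm using OneClick, you need a zip file containing the following 3 components: a field descriptions file (field_descriptions.json), a metadata catalog file (jsoncatalog.txt), and the raw text files for your collection.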
Sometimes it's easiest if we just start by looking at some completed examples. In the table below you'll find a growing collection of demo zip files we've put together. These examples demonstrate how to incorporate different types of data into a Bookworm.
The Difficulty rating assigned to each example is primarily for relative comparisons. The examples rated closer to Hard mostly just make use of more types of metadata.
Although these example files are here for you to learn how to structure your own zip file, you should also feel free to create a Bookworm with them. Just use one of the URLs along with a name you come up with and Create a Bookworm. You may find this useful in seeing how quick and painless the whole process is!
|Example||Zip File||Difficulty||Time Resolution||Description|
|US Congress Bills||congress.zip||Medium||Daily data using Monthly and Yearly bins.||Text files containing the summary of bills, resolutions, and amendments in the US Senate and House of Representatives from late 2006 to early 2013. The metadata here is marginally more complex than in the history dissertations, and the text files are a lot longer (relatively speaking) as well.|
|History PhD Dissertation Titles||historydiss.zip||Easy-Medium||Annual data using Yearly bins.||Text files containing the titles of History Ph.D. dissertations dating back to the early 1800s. The .txt files themselves are still small, but the metadata is a bit more complex than in the Baby Names data.|
|Baby Names||babynames.zip||Easy||Annual data using Yearly bins.||Contains first names given to a sample of children born from 1920 to 2008.|
These files should help get your feet wet with what to expect while creating your zip files. For a more fine-grained look, the next section provides a detailed description for each of the 3 required components.
The field descriptions file describes the properties of each available metadata field. It is a JSON array of hashmaps, each corresponding to one metadata field which you will be supplying for at least some of the texts in your collection. Each hashmap consists of the following parameters:
|field||string||The name of the metadata variable.|
|datatype||string||The type of the data.
|type||string||The format of the data.
|unique||boolean||Whether any given text can have only one value for this field (e.g. title) or several (e.g. subject).|
If the datatype is time, there is an additional parameter "derived" which maps to an array of hashmaps, each corresponding to a time variable (x-axis) which you would like to make available to the API/front-end (e.g. month or year). Each hashmap consists of:
|resolution||string||The time resolution to bin by (e.g.
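The structure described above can be checked before uploading. The following is a minimal sketch, not part of OneClick itself: the function name `check_field_descriptions` is hypothetical, the sample `datatype` and `type` values other than `time` are placeholders rather than confirmed allowed values, and treating `derived` as mandatory for time fields is an assumption.

```python
import json

# The four parameters listed for every field description.
REQUIRED_KEYS = {"field", "datatype", "type", "unique"}


def check_field_descriptions(raw_json):
    """Return a list of problems found in a field descriptions document."""
    entries = json.loads(raw_json)
    if not isinstance(entries, list):
        return ["top level must be an array of hashmaps"]

    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not a hashmap")
            continue
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
        if "unique" in entry and not isinstance(entry["unique"], bool):
            problems.append(f"entry {i}: 'unique' must be a boolean")
        # Time fields carry an extra 'derived' array of {"resolution": ...}.
        # Assumption: the array is required and must not be empty.
        if entry.get("datatype") == "time":
            derived = entry.get("derived")
            if not isinstance(derived, list) or not derived:
                problems.append(f"entry {i}: time field needs a 'derived' array")
            elif not all(isinstance(d, dict) and "resolution" in d for d in derived):
                problems.append(f"entry {i}: each 'derived' item needs 'resolution'")
    return problems


# Illustrative input only; 'character', 'categorical' and 'text' are
# placeholder values, not confirmed by this guide.
example = json.dumps([
    {"field": "date", "datatype": "time", "type": "character", "unique": True,
     "derived": [{"resolution": "month"}, {"resolution": "year"}]},
    {"field": "subject", "datatype": "categorical", "type": "text", "unique": False},
])
```

Calling `check_field_descriptions(example)` returns an empty list when nothing is wrong, so the result can be printed or used to stop a packaging script early.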
The metadata catalog file lists the metadata for each text, one JSON hashmap per line, each corresponding to one text in your collection. Each hashmap should consist of mappings from fields (as defined in field_descriptions.json) to values for as many fields as are available.
There are 3 required fields that must be in each JSON hashmap:
|Key||Description of Value|
|filename||The filename of the corresponding text file (with .txt omitted and no whitespace in the name).|
|date||The date corresponding to a text file. Dates which are not integers should be specified as a string in the format: YYYY-MM-DD.|
|searchstring||The HTML code displayed for a text when points are clicked on in the ngram graph.|
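As a rough illustration of the one-hashmap-per-line format, the sketch below validates a record against the three required fields and serializes it to a single line. The helper name `catalog_line` and the sample record are hypothetical, and the checks only cover the rules stated in the table above.

```python
import json
import re

REQUIRED_FIELDS = ("filename", "date", "searchstring")
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # YYYY-MM-DD


def catalog_line(record):
    """Validate one metadata record and return it as a single JSON line."""
    for key in REQUIRED_FIELDS:
        if key not in record:
            raise ValueError(f"missing required field: {key}")

    filename = record["filename"]
    # The filename is given without the .txt extension and without whitespace.
    if filename.endswith(".txt") or re.search(r"\s", filename):
        raise ValueError(f"bad filename: {filename!r}")

    date = record["date"]
    # Integer dates (such as a bare year) are accepted as-is;
    # anything else must be a YYYY-MM-DD string.
    is_iso_string = isinstance(date, str) and DATE_PATTERN.match(date)
    if not isinstance(date, int) and not is_iso_string:
        raise ValueError(f"bad date: {date!r}")

    # json.dumps without indent never emits a raw newline, so each
    # record stays on one line.
    return json.dumps(record, ensure_ascii=False)


# Hypothetical record, for illustration only.
record = {
    "filename": "bill_0001",
    "date": "2007-01-04",
    "searchstring": "<b>Example bill summary</b>",
}
line = catalog_line(record)
```

Writing the catalog is then a matter of joining the lines with `"\n"` and saving the result as UTF-8.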
The raw texts are the text files in your collection (in .txt format). Each text file should contain only the raw text for each document. For example, here are 2 text files (bills in the US Congress) corresponding to the example jsoncatalog.txt and field_descriptions.json files used above:
Place all of your text files in the /texts/raw/ directory of the zip file.
The contents of these text files should be encoded as Unicode (UTF-8). Our system does a pretty decent job of handling ugly characters, but after too many of them it starts to get upset and may cause your Bookworm to fail when building. Also, avoid having any whitespace in the filename.
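The encoding and filename rules above can be checked locally before zipping. This is a minimal sketch under the assumption that your text files sit in a single local folder; the function name `check_raw_texts` is hypothetical and is not part of OneClick.

```python
import re
from pathlib import Path


def check_raw_texts(raw_dir):
    """Return a list of problems found among the files in raw_dir."""
    problems = []
    for path in sorted(Path(raw_dir).iterdir()):
        if not path.is_file():
            continue
        if path.suffix != ".txt":
            problems.append(f"{path.name}: not a .txt file")
        if re.search(r"\s", path.name):
            problems.append(f"{path.name}: whitespace in filename")
        try:
            # Decoding the raw bytes is the simplest way to confirm UTF-8.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            problems.append(f"{path.name}: not valid UTF-8")
    return problems
```

For example, `check_raw_texts("texts/raw")` returns an empty list when every file passes, and one message per problem otherwise.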