A Portable Web Publication is a collection of content items (e.g., pages, chapters, modules, articles) whose content is compatible with Web usage, and structured as a single, self-contained logical unit. This document describes the use cases that inform the requirements for a Portable Web Publication, and should be read as part of the Portable Web Publications for the Open Web Platform.
This document is a Work In Progress.
Publisher P works with multiple authors to create an anthology and uses resources from different rights holders from different locations on the Web. Following the current practice on the Web, the publication consists of many different resources (HTML, SVG, CSS, etc.). The publisher needs the collection of all the resources as a unit in order to include it in its business workflow. The publication must also be deposited with the national library as a legal deposit.
A book on wines can be read from A to Z, or personalized so that the reader sees only red wines or wines from a specific region. Each wine may be a resource/small chunk of data.
User A has access to materials only through an old computer in her local library. While she has time to read the entire copy of War and Peace, the system is unable to display the entire resource as one huge HTML file. Parsing through one 2000-page HTML document is difficult and resource-intensive. Parsing through a package of 20 10-page HTML documents is less resource-intensive.
Connectivity as a commodity: Students reading e-textbooks in a village in Africa where there is no connection, or no reliable connection.
Many institutions, such as schools and government organizations (even in the wealthiest countries), do not have the resources to update equipment frequently. Therefore, it is necessary for publications to be accessible on current as well as older browsers.
Bob wants to read a PWP on an airplane.
Anna is a self-publishing author. Anna creates a PWP, both packed and unpacked. Anna uses some cloud storage system such as DropBox to publish her PWP online. Her friend Bob is able to read Anna's PWP online and offline, either packed or unpacked, as Bob sees fit.
As a reader, I want to discover a publication on the web so that I can start reading it right away.
Bob finds a PWP online. His preference is to read the publication in his web browser.
Publisher ACME creates a publication that is consumable across a variety of reader platforms, whether online or offline.
Reading systems have a variety of requirements around the ability to validate the contents, determine the order, and handle any processing instructions involved in the rendering of a PWP.
Requirement: Manifest Requirement 1
Dave wants to read a publication that consists of several chapters and includes lots of illustrations. Dave also wants to be able to read that content offline.
The publisher's production workflow is such that each chapter is provided as a separate HTML file; each HTML file links to a number of auxiliary files like CSS or data files, and the illustrations are also provided as separate high-resolution images. The HTML files may also include links to other Web resources that do not form an essential part of the publication (e.g., a link to a Wikipedia page describing the subject of the publication); Dave is fine if such a resource is not available while offline.
Requirement: Manifest Requirement 2
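One way to picture the distinction in Dave's use case is a manifest that lists the resources belonging to the publication, so that a reading system can tell which links must be stored for offline reading and which may stay online-only. The following Python sketch is illustrative only: the manifest structure, the "resources" key, the file names, and the example.org URL are assumptions, not something defined by this document.

```python
# Hypothetical manifest: the structure and key names are assumptions
# made for illustration, not a format defined by this document.
manifest = {
    "resources": [
        "chapter1.html",
        "chapter2.html",
        "style.css",
        "images/figure1.png",
    ]
}


def split_links(links, manifest):
    """Separate links into publication members and external links."""
    members = set(manifest["resources"])
    internal = [link for link in links if link in members]
    external = [link for link in links if link not in members]
    return internal, external


# Links found in one chapter; the last one points outside the publication.
links_in_chapter1 = [
    "style.css",
    "images/figure1.png",
    "https://example.org/wiki/subject",
]
internal, external = split_links(links_in_chapter1, manifest)
# "internal" must be available offline; "external" may be unavailable offline.
```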
Users engage with many publications, especially long-form fiction and non-fiction, for many hours. During this time, the user may shift states in many ways: starting consumption on an internet-enabled PC, moving to an internet-enabled portable device, going into offline mode on that device, and then going back to the PC.
During all of these experiences, the user needs to be sure of having access to critical pieces of data, while secondary assets have a predefined fallback that will allow the user to continue (for example, a poster image of a video that serves as a placeholder for an externally streamed video when the internet is unavailable).
Let's take the use case of a user; let's call him Nick. Nick is reading long-form narrative non-fiction: a publication filled with text, images, sounds, and multimedia files. Nick is also a multi-device user who wishes to consume the publication on multiple devices. Some of those devices have limited storage, and some of them have limited connectivity. Nick also rides the subway, where he loses his internet connection without warning for long stretches of time.
During offline or low-storage situations, there are still critical parts of the publication that are consumable, mainly the text (and possibly images). Having a reasonable fallback for video (a poster image or placeholder image) would allow Nick to read the content while offline or with limited storage. While this should be the job of the reading system, a method in the publication for the author to mark which items are critical and which need a fallback in limited connectivity/storage situations would greatly help the reading system, and would give the publisher more control to ensure a consistent experience of consuming the publication.
Nick may know he's going to be in a no-connectivity situation and may want to obtain and locally store the entire (even non-critical) contents. This would be up to the reading system to provide a mechanism, but having a way to denote critical and fallback assets ensures that an entire package isn't downloaded when not necessary.
In the case of scripting, it is possible that certain items will be dynamic and will hit an external resource (or server) to generate on-the-fly data. In offline mode, the user would want to be alerted that content could not be obtained, or be shown some fallback set of data. In this case, being able to specify a "no-connectivity" or "offline-mode" alternative for scripts would allow the publication author to have more control over the user's experience and replace a potential error display with a limited subset of a good experience.
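The idea of marking critical assets and declaring fallbacks in Nick's use case can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the Asset type, its "critical" and "fallback" fields, and the sample URLs are hypothetical, and the document does not prescribe how a reading system should make this decision.

```python
from typing import NamedTuple, Optional


class Asset(NamedTuple):
    """Hypothetical description of one resource of a publication."""
    url: str
    critical: bool = False          # assumed to be stored with the publication
    fallback: Optional[str] = None  # assumed local stand-in, e.g. a poster image


def resolve(asset: Asset, online: bool) -> Optional[str]:
    """Return what a reading system could render for this asset."""
    if online or asset.critical:
        return asset.url
    # Offline and not critical: use the declared fallback, if any.
    return asset.fallback


text = Asset("chapter1.html", critical=True)
video = Asset("https://example.org/interview.mp4", fallback="interview-poster.jpg")
chart = Asset("https://example.org/live-chart.js")  # no fallback declared
```

With these sample assets, the text stays readable offline, the video degrades to its poster image, and the script without a fallback resolves to nothing, which is the case where the reading system would have to alert the user.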
Writer Annie has her book published on her own web space as a PWP [http/packed]. Reader Bob opens it online using the PWP reading plugin PWPRead-plugin, and selects a nice quote to bookmark via PWPMark-plugin.
Writer Annie has her book published on her own web space as a PWP [http/packed]. Reader Bob caches it using a plugin [cache/packed], and selects a nice quote to bookmark via PWPMark-plugin.
Writer Annie has her book published on her own web space as a PWP, both packed [http/packed] and unpacked [http/unpacked]. Reader Bob reads [http/unpacked] online and selects a nice quote to bookmark via PWPMark-plugin. Then, Bob downloads [http/packed] to his local filesystem [file/packed] to open in his reading system of choice, namely, PWPRead-soft.
PWPRead-soft synchronizes with Bob's PWPMark profile, and can show Bob's bookmarks when he continues reading [file/packed].
Writer Annie has her book published on her own web space as a PWP [http/unpacked]. Reader Bob reads it online, and selects a nice quote to bookmark via his browser's PWP bookmarking plugin: PWPMark-plugin.
When Bob re-opens the PWP offline, the bookmarks are shown via PWPMark-plugin.
When Bob re-visits the original PWP, PWPMark-plugin can also show Bob's bookmarks in the online version.
Chef Bob writes a cookbook with a lot of embedded videos to explain certain techniques. Bob finds it very important that his videos remain available even offline, and configures this in his cookbook. Reader Annie starts reading Bob's cookbook online. When Annie gets disconnected, the fonts of the cookbook fall back to the system fonts, but the videos remain available.
Typographer Charlie writes a book on typography, and configures it differently: he finds fonts a very important aspect of his book, whilst the embedded videos may fall back to a still. Annie can read Charlie's book without error, online or offline. The fonts remain available, but the videos fall back to stills when offline.
Author David does not configure anything for his novel, but still, Annie can read David's book without problems whether she is online or not.
Corp.Inc. creates a PWP with dynamically updatable stock exchange information in chapter 4. Anna sends the locator for chapter 4 to Bob on April 1st. When Bob reads the PWP offline, chapter 4 is filled with some default content. However, when Bob gets online and clicks on the locator for chapter 4, he gets the updated stock exchange information, which might be different from the stock exchange information that Anna saw when she created the bookmark.
Alice is working on potentially Nobel prize winning research, and has drafted her paper describing her discoveries. She asks Bob to review the paper, but needs to make sure that the PWP retains specific protections, regardless of whether it is read online or offline.
Publisher Corp. Inc. publishes a new PWP and sends it to its customers. This PWP is downloaded to devices, or synced across several devices, or made available to a customer-specific cloud. Customers can access this file from different retailers, through different applications, either directly or downloaded from a private cloud. Thus, the PWP is duplicated many times, resulting in a huge number of items. There is one source manifestation, one ISBN identifier, and lots of items spread across devices and buyers.
Annie buys a book and downloads it offline. She bookmarks a certain chapter (i.e., creates a locator for that chapter). She sends that bookmark to Bob. Bob is able to use that locator on any item of the same PWP, and gets redirected to the correct chapter.
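The locator in Annie's bookmark can be thought of as a relative locator that is resolved against whatever base locator a given copy of the PWP has. The Python sketch below uses the standard urllib.parse.urljoin function for that resolution; the relative locator, the canonical base, and the location of Bob's local copy are all hypothetical values chosen for illustration, and the document does not mandate this mechanism.

```python
from urllib.parse import urljoin

# Hypothetical relative locator stored in Annie's bookmark.
relative_locator = "chapters/chapter4.html#section-2"

# Hypothetical base locators for two items of the same PWP.
canonical_base = "https://example.org/books/anthology/"
bob_local_base = "file:///home/bob/books/anthology/"


def resolve_locator(base: str, relative: str) -> str:
    """Combine a base locator of one PWP item with a relative locator."""
    return urljoin(base, relative)


online_target = resolve_locator(canonical_base, relative_locator)
local_target = resolve_locator(bob_local_base, relative_locator)
```

Because the bookmark only carries the relative part, the same bookmark leads to the same chapter whether it is opened against the online publication or against Bob's own copy.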
(e)books that are sold need to be delivered, so that purchasers can load them on offline devices.
Purchased content carries different expectations, one being that you have “something”, or a reasonable use of that content in a logical way (such as always being able to read your Amazon purchases through the Amazon app).
The web is not permanent: sites go down. When you purchase a book, you need an offline copy that you can continue to read when the retailer you purchased from goes kaput.
Sales Auditing -> Ability to track “what” is sold so that it can be paid. If all content is just free and different chunks are purchased, chasing rights/payments is an issue. Basically the package can have an ISBN associated so that it can be tracked for sale - even multiple versions.
EsteemedPublisher creates apps to distribute several journals to readers. These apps share script libraries, CSS files, and other resources. Distributing many journals as a package should enable systems to call the shared resources once, speeding up processing and enabling offline reading.
Writer Annie writes a dissertation. She references her Master's thesis, published on the university website. Her colleague Bob has read her Master's thesis before. When he clicks the reference in Annie's dissertation, he gets redirected to his local copy of Annie's Master's thesis. Her friend, Charlie, hasn't read her Master's thesis before. Charlie needs to be online when clicking the reference, to read Annie's Master's thesis.
Rosa submitted an article to EsteemedJournal and provided her research data in CSV format. She and EsteemedJournal wish to provide users access to the CSVs when they gain access to her article. EsteemedJournal recommends that the package is built in such a way that a system can query the manifest to assess whether it is situationally appropriate to offer downloads. For example, the package might not offer the option to download the CSV while a user is reading offline.
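A system querying the manifest to decide whether a download is situationally appropriate might look roughly like the following Python sketch. The manifest layout and the "items", "role", and "requires_network" keys are assumptions made for this illustration; the document does not define a manifest format.

```python
# Hypothetical manifest; the keys below are assumptions for illustration only.
manifest = {
    "items": [
        {"href": "article.html", "role": "content", "requires_network": False},
        {
            "href": "https://example.org/data/results.csv",
            "role": "data",
            "requires_network": True,
        },
    ]
}


def offered_downloads(manifest, online):
    """List the data items that can sensibly be offered for download now."""
    return [
        item["href"]
        for item in manifest["items"]
        if item["role"] == "data" and (online or not item["requires_network"])
    ]


online_offers = offered_downloads(manifest, online=True)
offline_offers = offered_downloads(manifest, online=False)
```

In this sketch the CSV is offered only while the user is online, which matches the example in the use case where the download option is withheld during offline reading.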
EsteemedJournalPublisher would like to offer the users of the EsteemedJournal of Chemistry App the opportunity to read only the abstracts of the journals in the app. The App Package must offer the user a list (table of contents) of abstracts (disjoint objects in the package with semantic information or metadata informing the package of the nature of the object).
(Is the abstract-only view built-in? spun-off using shared resource? totally independent publication?)
Shoshana is an organic chemist. She has purchased the Esteemed Journal of Chemistry App. She downloads Organic Chem Quarterly in her lab and reads the first article over lunch. Shoshana begins the book reviews during office hours but must tend to her students' questions, so she closes the app. Shoshana opens the app on the train ride home to resume reading the book reviews. She is happy to find that the app opens to the exact location and opens quickly because most of the material does not need to be downloaded a second time.
An archival service needs to update an Archival Information Package (i.e., a previously harvested PWP) because a new version of a component of the PWP has been published.
An archival service wants to harvest (spider) a PWP, and expects to find in the manifest what it will need to make sure it gets all the pieces of the PWP that need to be archived, even if on separate servers.
A government agency publishes information (e.g., laws, regulations, judicial decisions) that needs to persist forever without any loss of information.
A journal article (e.g., announcing a novel compound in chemistry) must be published in a method that is persistent because it serves as the document of record for the scientific record.
An archival service needs to harvest the retraction notice and update the Archival Information Package for the original PWP to include / link to the Retraction Notice.
As an archiving service provider, I would like to be able to harvest the subset of a PWP's components that have been added or changed since the last archived version so that I can ensure archival completeness and minimize unnecessary storage costs, post-harvest processing, latency in synchronizing published and archived PWP versions, and load on the PWP host's servers.
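Harvesting only what has been added or changed could, for example, be driven by comparing per-component digests between the archived version and the currently published one. The Python sketch below assumes, purely for illustration, that each version can be summarized as a mapping from component path to a SHA-256 digest; the document does not state that a manifest carries such digests.

```python
import hashlib


def digest(content: bytes) -> str:
    """SHA-256 digest used here as a stand-in for a component fingerprint."""
    return hashlib.sha256(content).hexdigest()


def plan_harvest(archived, current):
    """Compare two {path: digest} maps (a hypothetical manifest summary)."""
    # Components that are new, or whose digest differs from the archived one.
    to_fetch = sorted(p for p, d in current.items() if archived.get(p) != d)
    # Components that no longer appear in the published version.
    removed = sorted(p for p in archived if p not in current)
    return to_fetch, removed


archived = {
    "chapter1.html": digest(b"old text"),
    "style.css": digest(b"body {}"),
}
current = {
    "chapter1.html": digest(b"new text"),
    "style.css": digest(b"body {}"),
    "chapter2.html": digest(b"more text"),
}
to_fetch, removed = plan_harvest(archived, current)
```

Only the changed chapter and the new chapter would be fetched; the unchanged stylesheet is skipped, which is where the savings in storage, processing, and server load would come from.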
An archival service needs to update an Archival Information Package (i.e., a previously harvested PWP) because it or one of its components has been taken down by the publisher.
Alice, a dyslexic student, downloads a textbook and proceeds to personalize the material with larger font and different contrast.
While reading a book on computer programming, Bob wants to change the font into a local font. However, the code should remain in a fixed-width font.
As a publisher of accessible content, I need to add content such as a braille style sheet, image descriptions, or video captioning (text / descriptive audio) to a PWP previously published by a third party.
As a user of assistive technology, Alice wants to perceive the full PWP.
Bobbie is learning to read and is viewing a picture book. The picture book has a fixed layout, turns the pages, and reads along in sync with the page currently open.
Alice wants to download a PWP that captures only the external resources she needs to perceive the PWP.
Any kind of pagination and also indexing has to be through a whole collection of documents that constitute a PWP, which may raise issues around transition between documents.
As a reader, I want to choose between a scrolled view and a paginated view of content that extends across multiple HTML documents.
As a publisher I want to navigate from chapter 1 to chapter 2 (separate HTML files) seamlessly. The presentation should not be bound to the HTML organization. This happens, for example, in Japanese publications where there are no chapter separations but content is still split between multiple files for organizational reasons.
As a publisher I want my footnotes to number sequentially across the publication, even when the publication is constructed of multiple documents.
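Sequential numbering across documents amounts to carrying a counter from one document to the next in reading order. The short Python sketch below shows the arithmetic; the file names and footnote counts are made up for illustration, and how a reading system or style sheet would actually apply the numbers is left open.

```python
# Hypothetical reading order with the number of footnotes in each document.
footnotes_per_document = [
    ("chapter1.html", 3),
    ("chapter2.html", 0),
    ("chapter3.html", 5),
]


def first_footnote_numbers(counts):
    """Map each document to the number its first footnote would receive."""
    starts = {}
    next_number = 1
    for document, count in counts:
        starts[document] = next_number
        next_number += count  # the counter carries over to the next document
    return starts


starts = first_footnote_numbers(footnotes_per_document)
```

Here the first chapter uses numbers 1 to 3, so the third chapter starts at 4 rather than restarting at 1.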
Content may have significantly different styles between files. For instance, some Japanese books will have documents whose root element is vertical-rl and others whose root element is horizontal-tb. These root element styles must be preserved.
Placeholder: Comics-like transition
Placeholder: Page Transition effects both within an HTML document, and between HTML documents
Corp.Inc. creates an internal manual for its employees as a PWP. This PWP is not published online, but is sent around to all its employees. Employee Anna has some questions about figure 2b, and sends an email to co-worker Bob with a locator to that figure. Bob clicks on that locator, and his company-branded PWP reader opens Bob's personal copy of the manual, and redirects immediately to that figure.
Oksana submits a scholarly article to EsteemedJournal. EsteemedJournal puts the article through the Peer Review process, during which EsteemedJournal editors and third-party reviewers provide comments on the article. Ultimately, EsteemedJournal chooses not to publish Oksana's article, but the comments (annotations) that have already been made should persist as she submits it to RespectedJournal.
Placeholder: Formal usage terms and engineering or legal documents, possibly for accessibility also.
Specialized semantics are required for users and processors.
Be able to handle intermittent changes in online/offline status during a single reading session while still minimizing the amount of material cached locally.
It SHOULD be possible to describe explicitly which resources do and which do not belong to the PWP, or to the “portable” part of a PWP.
There MUST be a separation between a format-independent (“canonical”) and format-dependent locator.
It MUST be possible (and necessary) to use, for all cross-references, the canonical locator.
There MUST be a separation between the identifier (e.g., ISBN) and the (canonical) locator of a specific instance of a PWP.
There SHOULD be a possibility in the PWP to follow (if necessary) the copying (provenance) chain
The Identifier (e.g., ISBN) MAY serve as a canonical locator for a specific instance of a PWP
It SHOULD be possible to use, in all circumstances, a relative locator to manipulate, annotate, etc, content in a PWP
A PWP Processor MUST be able to combine a relative locator with the canonical as well as state dependent locators of a PWP
Locating a resource within a PWP should not depend on the PWP's state.
A state independent locator should be part of the PWP.
There needs to be persistence of identifiers across PWP instances.
Any set up and mechanism, handling canonical and state-dependent locators, MUST be easily settable on any server (albeit maybe not in the most efficient manner) based on basic server behavior control.
There should be the capability for dynamic updating of information based on online/offline status and time of publication.
A PWP must allow for access control and write protections as part of the resource.
Package may contain/point to textual/graphical/media as well as data (CSV, code repository)
Package should include a queryable manifest.
Content in package is fully annotatable
Annotations are considered part of the package
There needs to be metadata on package components
Manifest with metadata and semantic information
Customizable manifest functionality
Manifest indicates whether content is new (relative to last download) and triggers download only when content is new/updated.
There should be a discovery service or trigger available to indicate content changes or long-term availability changes.
The complex collection of documents, media, and other resources that comprise a publication must remain intact and complete across all transitions (online/offline), rendering in various formats, and distribution over time.
Data that affects content may be stored apart from the content.
The choice should be driven by content or workflow, not mandated by specification.
Long-form content should not be stored in one giant file. This affects performance, storage, workflow.
The components of a publication should be aggregated or disaggregated without loss of information.
Every resource can hold its own rights. The rights of this resource should be kept in the process of distributing the resource.
A collection of documents must be treated as a single unit for: Searching, values of counters (page counters, section numbering, footnotes), and user stylesheets (users must be able to adjust display, e.g., font selection, font-size adjustment, background color)
PWP needs to support rendering for visual output, tactile output, and audio output.
Package needs to support time-based media synchronized with text, such as audio synchronized with text, video synchronized with text, and sign language synchronized with text.
Referenced in: Manifest Requirement 1
Referenced in: Manifest Requirement 2