Wikimedia Product/Analytics Infrastructure/Standard

The instrumentation platform (Actual name TBD) is a set of interfaces that standardize the way we design and build software instrumentation. By using the instrumentation platform, software instruments can be controlled and coordinated, even across products. Through its guidelines and conventions, the instrumentation platform supports software re-use, and practices that make instrumentation more adaptable, rigorous, and safe. TODO: it's for the creation of data sets updated according to production conditions.

It facilitates the creation of instrumentation events in response to situations of interest (such as an error or a button click) that are detected by software instruments. To assist in this process, an instrumentation platform library is provided for each product platform, with algorithms and an API regulated using a common standard. Developers have the ability to design their own instrumentation events using an event schema, collect events from one or more instruments together into different event streams, and manage those instruments remotely using an event stream configuration.

It is built using the event platform, which is a more general system that also carries other kinds of event traffic, such as events used to drive application behavior.

Conformance
How conformance is determined. RFC 2119

Status
This software is under active development. It is maintained by the team a. Platform support is given below.

Overview
To start using the instrumentation platform, you must have a software application you want to instrument, called the software under measurement, and an instrumentation platform library for that software application.

If the application does not have an instrumentation platform library (see current coverage), it may still be possible to use the instrumentation platform with a reduced feature set. In a production environment, a formal instrumentation plan is worked out among data scientists, engineers, and product owners, in order to specify the study question, the data needed to answer the question, and how to collect it. Once this is done and approved, implementation can proceed.

Implementing an instrumentation plan involves the use of three different programming interfaces: instrument, event schema, and event stream.

These interfaces are supported by two additional pieces of software: the instrumentation platform library, which resides on the client under instrumentation, and the event intake service, which resides on the server receiving events. The role of the instrumentation platform library is to apply event stream configuration directives at collection time, and also to assign certain properties on the instrumentation event. The event intake service may also perform some limited processing of property values, but its primary role is to validate the event against its event schema.

Example instrumentation task
The following example illustrates the relationship between the three programmable interfaces of the instrumentation platform.

Example application
Consider the following example application (in this case a simple website with embedded JavaScript). This application renders a button on screen, and will print a message to the browser console when the button is clicked. Suppose we want to instrument that click, so that the message is also sent as an instrumentation event.

Example event schema
The first thing we need to do is create a schema for the instrumentation event that will be able to carry the message. In this case, we will want a property called message that takes values of type string. For a full guide to schema definition, see this page.

    {
      "title": "clickMessageSchema",
      "description": "Carries a click message",
      "$schema": "https://json-schema.org/draft-07/schema#",
      "type": "object",
      "required": [
        "$schema"
      ],
      "properties": {
        "$schema": {
          "type": "string",
          "description": "URI of the event's schema."
        },
        "message": {
          "type": "string",
          "description": "Message from the click"
        }
      }
    }

Example event stream
The events need to go to an event stream, so we need to configure one that will receive events of this schema. By default, event streams will be directed to a Hive table in the Data Lake with a name corresponding to the stream. See Event stream configuration.

    "clickMessageStream": {
      "stream_name": "clickMessageStream",
      "schema_title": "clickMessageSchema"
    }

Example instrumented application
Finally, we need to add the instrumentation code that detects the event, and use it to call the submit method of the instrumentation platform library API.
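
A minimal sketch of what the instrumented click handler could look like. The `ipl` object below is a hypothetical stand-in for the instrumentation platform library, and the argument order of `submit` and the value used for `$schema` are assumptions made for illustration, not the library's documented API.

```javascript
// Hypothetical stand-in for the instrumentation platform library.
// A real library would apply stream configuration and transmit the event;
// this stub only records what was submitted.
const ipl = {
  submitted: [],
  submit(streamName, eventData) {
    this.submitted.push({ streamName, eventData });
  },
};

// Click handler of the example application, with instrumentation added.
function onButtonClick() {
  const message = "Button was clicked";
  console.log(message); // original application behavior

  // Instrumentation: strictly observational, so the handler behaves the
  // same whether or not the event is actually sent.
  ipl.submit("clickMessageStream", {
    $schema: "clickMessageSchema", // placeholder; a real event carries the schema URI
    message,
  });
  return message;
}

// In a browser this would be wired up with something like:
//   document.querySelector("button").addEventListener("click", onButtonClick);
onButtonClick();
```

The handler only supplies the data property (`message`); the time and other metadata are left for the library to assign.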

<!--

Terminology

 * Event
 * An event is a collection of properties and associated property values including at least:
 * a time (at which the event occurred)
 * a type (specifying its intended data structure).
 * The event will usually also contain additional properties describing the event. Such data are specified by the type.


 * Property
 * A property of an event refers to the properties of the JSON object encoded by the JSON string representation of the event. Properties may be composite or integral. Integral properties (those whose values are not JSON arrays or JSON objects) are the smallest indexable units of data in the instrumentation platform. For a full list of values, etc, see the JSONSchema docs.


 * Instrumentation Event
 * An instrumentation event is an event carrying an observation about the software under instrumentation. It is strictly observational - the software under instrumentation must behave identically whether or not the event is fired. This property allows instrumentation to be enabled or disabled at will, and ensures that regressions or interruptions in instrumentation do not degrade the actual software under instrumentation. This makes it safer for instrumentation to be managed independently, and allows systems which carry instrumentation events to be held to a lower service tier.


 * Instrument
 * An instrument is the unit of application code responsible for submitting the event data to the instrumentation platform library. The event data alone is not yet an event, as it does not have a time. The time, and other additional fields, will be added by the instrumentation platform library before the event is produced.


 * Metric
 * The value that is computed from the data. Examples are "average page dwell time", "frequency of button clicks," or "number of abandoned edits". The metric depends on individual pieces of data for its computation. For example, to compute the average page dwell time, we must have a set of page dwell times. Each collected page dwell time is its own piece of data.


 * An instrumentation event can carry any number of properties, and these properties may support multiple different metrics. For example, we could define a "buttonClick" schema that has properties "isUserLoggedIn" and "isUserUsingDarkMode" which are filled out accordingly at the time the button is clicked. Such an event would support metrics like "frequency of clicks among logged in users", "frequency of clicks among dark mode users", and their respective negations, conjunction, and disjunction.


 * Is this a good idea? It depends. Often this is done in order to provide flexibility to the data scientist down the road. Because it can take time for events to land in the database, and in some situations the data is so sparse that it may take time to make enough observations, it is reasonable to add fields that may be of use in advance. Taken to its extreme, this approach is termed schema-on-read and can be summarized as "collect everything you can, and only use what you need". This approach is more flexible and allows for answering unanticipated questions, but it is very wasteful of storage, and does not respect the user's privacy. Our approach heavily favors schema-on-write, which can be summarized as "collect only what you need."

-->
 * Instrument
 * Application (or client)
 * Library
 * Event
 * Schema
 * Stream
 * Database

Instrumentation Event
An instrumentation event is a special kind of event. Like all events, its canonical form is a JSON string, and it must carry a reference to a JSONSchema event schema file that can be used by the event intake service to validate its properties. The schema must include the instrumentation common schema fragment, and may also include other fragments in the instrumentation namespace. Instrumentation events must follow a strict observational policy. This policy states that the software under measurement must behave identically, whether or not the event is fired. This is unlike other events, for example the kind that drive application behavior in an event-driven architecture. Such events are also carried by the event platform.

The properties of instrumentation events can be divided into two categories.


 * Data properties
 * The direct observations taken by an instrument. Once set by the instrument, they are not modified again (the faithfulness policy). Faithfulness avoids involving general systems with properties that have a high turnover rate and low generalizability. Data quality problems arising from these properties will be local to the instrumentation code, which is under the developer's direct control.


 * Metadata properties
 * The wider context in which the event occurred. They are automatically managed: their values are assigned automatically by either the instrumentation platform library or the event intake service. Unlike data properties, their values are not faithful; they may be subject to replacement or censorship. Metadata properties are exactly the properties defined by instrumentation schema fragments.

The management of these properties, and of the event itself, follows a well-defined lifecycle. To produce an instrumentation event, an instrument submits the data properties to the IPL. The IPL will assign the metadata properties, and perform a number of other computations, before transmitting the event. This process, and the subsequent steps, are detailed in the Instrumentation event lifecycle section.

Instrumentation event schema
An event schema is a JSONSchema file that can be used with a JSONSchema validator to classify any JSON object as "valid" or "invalid", according to the rules of the schema file. In the event platform (and hence the instrumentation platform), validation occurs server-side, after the event has been received by the event intake service, but before it has been inserted into a database. Events which are classified invalid are discarded immediately.

All events, and hence all instrumentation events, must identify a schema against which they will be validated. An event stream only accepts events from one designated schema, but multiple event streams may designate the same schema. Schemas are authored in YAML.

Instrumentation event schema fragments
A schema fragment is a JSONSchema file that can be included in other event schemas in a modular way. Instrumentation event schemas can choose from a set of standardized fragments designed specifically for instrumentation. Using these fragments makes schemas faster to author, and makes common property names and types consistent across different schemas (and hence database tables). This makes analysts' jobs easier and makes it simpler to compare, join, and union tables.

The properties defined by instrumentation schema fragments are all metadata properties and have values managed downstream of the instrument.

As an example, we can modify the event schema from the example instrumentation task to use a couple of fragments.

Identifiers fragment
The identifiers schema fragment provides the core identifiers that are used to associate together events in the same scope. The properties specified in this fragment are reserved for the use of the instrumentation platform library. Within the library, they are serviced by the association controller.


 * (string)
 * Identifies a client across multiple sessions. This is the "app install ID" on mobile apps and enables calculation of retention metrics for anonymous users since we do not have a user ID for those. MediaWiki-based instrumentation does not include this identifier in the events it sends.


 * (string)
 * Identifies a session. On MediaWiki, a session lasts for the lifetime of the browser process (refer to T223931 for additional information) and can be retrieved with . On iOS and Android apps, where the app is allowed to enter a background state, sessions expire after 15 minutes of inactivity. If the app returns to the foreground after 15 minutes, a new session ID is generated.


 * (string)
 * Identifies a page view, applicable only on the web. Interactions with multiple features (instrumented separately) on the same page may be linked together via this identifier. On MediaWiki this is retrievable with.
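
The 15-minute expiry rule described for app sessions can be sketched as follows. This is only an illustration under stated assumptions: the `AppSession` class, its method names, and the injectable ID generator are hypothetical, and the default generator is not cryptographically strong.

```javascript
const SESSION_TIMEOUT_MS = 15 * 60 * 1000; // 15 minutes of inactivity

// Hypothetical default token generator (illustrative only, not secure).
const defaultGenerateId = () => Math.random().toString(36).slice(2);

class AppSession {
  // `now` is a millisecond timestamp; it is injectable so the logic can be tested.
  constructor(now = Date.now(), generateId = defaultGenerateId) {
    this.generateId = generateId;
    this.sessionId = generateId();
    this.lastActiveAt = now;
  }

  // Call on every interaction, and when the app returns to the foreground.
  // Returns the session ID that applies to the current activity.
  touch(now = Date.now()) {
    if (now - this.lastActiveAt >= SESSION_TIMEOUT_MS) {
      // The previous session expired while the app was inactive.
      this.sessionId = this.generateId();
    }
    this.lastActiveAt = now;
    return this.sessionId;
  }
}
```

Each call to `touch` resets the inactivity window, so a session only ends after a full 15 minutes without any activity.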

Activity sequencing fragment
The activity sequencing fragment is used to reconstruct sequences of events, e.g. a funnel. For example, suppose the user is making an edit. We group the actions performed in this activity with an activity identifier. In the old way of doing things, this would have been a feature-specific "editing_session_id". As the user interacts with various (instrumented) features or elements in the editor, previews the edit, continues editing, and finally publishes the edit, specific data about all of those interactions can be tracked in schema-specific fields, but the order in which those interactions happen is recorded in a sequence counter.


 * (string)
 * Identifies a sequence of actions in the same context or funnel. In the past, teams have used terms like "session ID" and "sub-session ID" to refer to a set of connected events, such as interacting with a widget. This identifier is useful for grouping together impressions with corresponding clicks, and for grouping together steps in a process such as making an edit. The activity identifier can be randomly generated or a counter.


 * (integer)
 * Starting at 1, this is a counter for reconstructing the order of events in the same activity. For a variety of reasons we cannot trust the timestamp of receipt or the client-side timestamp of when the event was generated for putting events in order. In cases where the exact sequence of events needs to be established, this identifier can be used to record which event happened 1st, which happened 2nd, and so on.
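
A minimal sketch of how an activity identifier and a sequence counter could work together. The property names `activity` and `sequence` and the `Activity` class are hypothetical names chosen for illustration; the fragment's actual property names are not reproduced here.

```javascript
// Tracks one activity (for example, one edit attempt) and numbers its events.
class Activity {
  constructor(activityId) {
    this.activityId = activityId;
    this.nextSequence = 1; // the counter starts at 1
  }

  // Returns the sequencing properties to attach to the next event.
  next() {
    return { activity: this.activityId, sequence: this.nextSequence++ };
  }
}

// Reconstructs the order of events belonging to one activity at analysis
// time, without relying on client or server timestamps.
function orderBySequence(events) {
  return [...events].sort((a, b) => a.sequence - b.sequence);
}
```

Because the counter is assigned at the moment each event is produced, events that arrive out of order can still be put back into the order in which they happened.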

Event stream configuration
'''TODO: Put the spec here along with a table of feature-level support on each platform for each item. (e.g. web does not support device_id).'''

In the previous instrumentation system, an event schema not only mapped to a database table schema, but to a database table itself. That meant that re-using a schema meant sharing the same table, which is not a good idea, and so developers would copy and paste the desired schema and simply rename it. Schemas proliferated and diverged in subtle ways over time. In some cases, to avoid the hassle of creating and deploying a new schema, teams would simply add a new field to an existing one, and use that field as a flag to filter their events from the other events when querying the database table. Because event schemas must be backwards compatible, these flags remained even after the studies they were used for had finished, building up and obscuring the schema's purpose.

The lap times for a human sprinter and a racecar use the same (id, timestamp) schema, and can be measured with the same instrument (a stopwatch), but that does not mean that racecar times and human times should be stored in the same table. Yet the only way around this in the previous system was to make one (id, timestamp) schema for the human, and another (id, timestamp) schema for the racecar, and to insist that they were different by virtue of the fact that they had different names. This caused numerous problems, mostly due to the confusing fact that a schema's name was not naming the schema; it was naming the database table that events using this schema would end up in. The schema itself defined structure, but the schema's name defined routing. Combining these two functions in the same piece of metadata caused many problems.

To address this, the event platform introduced the event stream. An event stream expects events of a particular schema, and will route those events to a particular database table or other backend depending on how the stream is configured. This allows an event schema to focus on defining structure, and makes it more re-usable and robust. It also frees the event stream configuration from the strict backwards-compatibility requirements of the schema, allowing it to be more dynamic.

It also provides the opportunity to interact with the instrumentation. An instrument knows what kind of event it is making (event schema), and it knows where it is sending it (event stream). But an event stream is not just "where it's being sent". In the human/racecar laptime example, why was it not a good idea to put human and racecar laptimes together in the same table?

Part of it is just making sure that distinct studies have distinct datasets. The racecar data scientists, for example, may insist on flushing all racecar laptimes at the end of each work day to avoid industrial espionage, whereas the human laptimers not only want theirs to be stored forever, they want to publish them in the press! Having separate datasets makes all of these things more convenient.

But there's another reason: racecars are faster than humans. The statistical properties of one dataset are very different from those of the other. In the example, this is because racecars have V-12 engines and humans have legs: the thing being measured is different. But the statistical properties can also change if the process or conditions under which the thing is measured are different. If we mixed the laptimes of students running their first race with the laptimes of all runners, it would also be confusing.

So the other function of stream config is to define and shape these conditions with sampling.

So in the new system, the laptime example would use a single (id, timestamp) "laptime" schema, and have two event streams, one for human laptimes and one for racecar laptimes, each of which expects events using that schema. The two streams can direct those events into different database tables as they wish.
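
A rough sketch of the laptime arrangement, re-using the `stream_name` and `schema_title` keys from the example event stream earlier in this document. The stream names, the schema title, and the two helper functions are hypothetical, and the routing rule simply follows the stated default of a table named after the stream.

```javascript
// Two streams that share one schema. All names here are illustrative.
const laptimeStreamConfigs = {
  humanLaptimeStream: {
    stream_name: "humanLaptimeStream",
    schema_title: "laptime",
  },
  racecarLaptimeStream: {
    stream_name: "racecarLaptimeStream",
    schema_title: "laptime",
  },
};

// A stream accepts only events of its one designated schema.
function streamAccepts(streamName, schemaTitle) {
  const config = laptimeStreamConfigs[streamName];
  return config !== undefined && config.schema_title === schemaTitle;
}

// Routing is decided by the stream, not by the schema: by default the
// destination table is named after the stream.
function destinationTable(streamName) {
  const config = laptimeStreamConfigs[streamName];
  return config === undefined ? undefined : config.stream_name;
}
```

Both streams validate against the same structure, yet their events land in separate tables that can be retained, published, or purged independently.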

Instrumentation event lifecycle
The instrumentation event pipeline is a well-defined sequence of processing stages that an instrumentation event will undergo. As it progresses from stage to stage, the instrumentation event will change in various ways in a process called the instrumentation event lifecycle. The pipeline and lifecycle span multiple software systems, from the originating instrument to the ultimate datastore.

Event rejection
Event rejection refers to the deliberate discarding of events. An event will be rejected by the client if the client is misconfigured or if the event is found to be out of sample according to the instrumentation platform library's assessment of the associated stream configuration. An event will be rejected by the server if it fails schema validation by the event intake service.

Event rejection is the major controlling factor in the event pipeline. Imagined as a flow of events, the pipeline resembles a funnel, and a rather harsh one.

Configuration conflict
Configuration conflict is a form of involuntary sampling. Clients which do not meet certain requirements are not able to receive events from certain streams. This is not classified as loss due to error as it is defined behavior, but does not reflect the specification of the instrument developer.


 * Event stream not provided
 * The client's clock, randomness, or other compatibility has caused the library to disable itself
 * The client's submission rate has climbed so high that the library has disabled itself

Sampling
Sampling is interpreted broadly to mean any determination made about whether or not to send an event on a particular stream on a particular client. Fine-grained control of sampling and targeting is performed by the IPL via the event stream configuration. The IPL itself can also place the entire client out of sample, independent of any particular stream.


 * Stream disabled
 * User opt-out
 * User not in sample (according to sampling logic)
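
The three conditions above could be combined into a single in-sample decision along these lines. This is a sketch only: the field names `enabled`, `rate`, `optedOut`, and `sampleUnit` are assumptions introduced for illustration and are not part of any documented stream configuration.

```javascript
// Decides whether an event on a stream should be sent from this client.
// streamConfig.enabled : hypothetical flag; false disables the stream
// streamConfig.rate    : hypothetical sampling rate in [0, 1]; defaults to 1
// client.optedOut      : hypothetical flag for a user opt-out
// client.sampleUnit    : hypothetical number in [0, 1), fixed per client or
//                        session so the decision is stable across events
function isInSample(streamConfig, client) {
  if (streamConfig.enabled === false) return false; // stream disabled
  if (client.optedOut) return false;                // user opt-out
  const rate = streamConfig.rate ?? 1;              // no rate means "send all"
  return client.sampleUnit < rate;                  // sampling logic
}
```

Keeping `sampleUnit` fixed for a client means that client is consistently in or out of a stream's sample, rather than contributing a random subset of its events.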

Validation
Events which are classified as invalid (against their schema) by the event intake service are immediately rejected. Schema validation failure in production should be rare. It should mostly occur while the instrumentation is being tested and proven in development.

Event CC-ing
An event bound for one stream can be copied to other streams in a prescribed fashion. This makes it convenient to perform multiple simultaneous studies using the same instrument for data collection. This allows the entire study to be implemented without alteration of application code. Event CC-ing occurs on the client and the CC targets are prescribed in the stream that receives the original event. For a full explanation, see submit function.

Event loss due to error
Event loss due to error (as opposed to rejection) refers to the loss of events due to factors beyond the control of the event pipeline. It can occur due to error at any point of the pipeline, but in practice is concentrated in the network interface between the client and the server.


 * Network connectivity loss
 * Network route failure (firewall, packet loss, proxy, etc.)
 * Intake server timeout (handling too many concurrent requests)
 * Intake server misconfiguration
 * Client malformed HTTP POST request
 * Client malformed JSON object in POST body
 * Failure to fetch schema for validation
 * Failure to compute validation
 * Failure to perform property value replacement

Event gain due to error
Event gain due to error is a rare but documented error state in which an event can become duplicated just prior to being sent to the network. It is suspected to result from inconsistent behavior of web browsers in certain situations. Measures are taken by the event intake service to mitigate this duplication.
 * Duplicate events sent as a result of browser error (this has happened, see bug

Property value assignment
Property value assignment refers to the assignment of values to metadata properties by either the instrumentation platform library or (less frequently) the event intake service. Assignment of metadata property values must be completed prior to schema validation. After schema validation, property values should be treated as immutable by policy. This is because validation is the last reliable time to verify that insertion into the destination storage will be successful.

Property value replacement
Property value replacement refers to the replacement of a metadata property's value with another value. This is most commonly done as a form of cardinality reduction. "Cardinality reduction" is a catch-all term for any procedure that maps values from a large set to values from a smaller (typically much smaller) set. A common example of cardinality reduction in web analytics is mapping the (large) set of user agent strings to the (much smaller) set of browsers.

    "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0" => "firefox"
    "Mozilla/5.0 (Windows; U; Windows NT 6.0;en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6" => "firefox"
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36 OPR/54.0.2952.64" => "opera"

In data science, cardinality reduction is typically done for properties whose values will be used to partition the whole dataset. If the cardinality of the value set is too large, the database cannot keep an efficient index and will not partition efficiently, particularly for large datasets. It is also done as a form of censorship: the mapping from a large set to a small set involves a loss of information, which can benefit privacy (for example, mapping an IP address to a country code). It makes sense to perform mappings like these prior to insertion, rather than at query time or as part of a regular table turnover, because the large cost of the operation is amortized.
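
As a toy illustration of cardinality reduction, the function below maps a user agent string to a browser family using a handful of substring checks. It is a deliberately simplistic sketch, not the mapping used in production: real user agent parsing covers far more cases, and the function name and the set of output values are assumptions.

```javascript
// Maps the (large) set of user agent strings to a (small) set of browser
// families. The order of the checks matters: Opera's user agent string also
// contains "Chrome/" and "Safari/", and Chrome's also contains "Safari/".
function browserFamily(userAgent) {
  if (userAgent.includes("OPR/")) return "opera";
  if (userAgent.includes("Firefox/")) return "firefox";
  if (userAgent.includes("Chrome/")) return "chrome";
  if (userAgent.includes("Safari/")) return "safari";
  return "other";
}
```

The mapping is lossy by design: once the value has been replaced, the original user agent string cannot be recovered from the stored event.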


 * All property value replacement must occur prior to schema validation (and none after validation)
 * The originating instrument may perform property value replacement
 * Property value replacement should not be performed between the originating instrument and the instrumentation platform library. Additional layers of pre-processing which take place outside of the scope of this documented standard will lead to a loss of a developer's ability to reason about how an event evolves.
 * Property value replacement must not be performed by the instrumentation platform library (it does perform property value assignment). Properties and property values that the originating instrument provides to the submit function are faithfully handled.
 * Property value replacement at the server (pre-validation) is indicated only when
   * The property value that will be reduced is assigned at the server (pre-validation), OR
   * The cardinality reduction is too complicated to be computed on the client

The mapping of user agent strings to browsers is an example of a property value replacement that is best sited at the server (pre-validation), for both reasons. First, the client is actually not very good at providing its user agent string, but it can be found in an HTTP request header, which is easily accessible to the server. Second, computing the mapping requires a high degree of sophistication due to the number of cases involved (see e.g. UA Parser). It is not desirable to perform this computation on the client because it would require a large amount of code being sent for only this purpose, which harms performance.

Property value censorship
Property value censorship refers to the intentional destruction of information for the purposes of privacy. Some property values may contain sensitive information (or information that could become sensitive if stored with other values). This need not only refer to PII as legally defined. In general, care should be taken collecting property values that will need to be censored. If possible, this information should be handled on the server.

Testing
TODO

Instrumentation platform library (IPL)
It should be called something like instrument hub or instrument control hub or something to emphasize that it is driven by external config, and is central, i.e., events do not usually pass out of the client unless going through it.

A single application may have numerous instruments, designed independently, operating simultaneously, and maintained by different teams for different purposes. The instrumentation platform library acts as a hub for all of the instruments, ultimately controlling which events the instruments signal get sent to the network, and which are discarded. If an event will be sent to the network, it is also responsible for assigning values to certain standard properties of the event.

The library is constructed in modules, called controllers. Each controller is driven by configuration controls which are dynamically loaded at runtime, allowing the instrumentation to be changed remotely without additional software deployment.


 * Stream controller
 * Loads the event stream configuration controls, and provides them to the library at runtime


 * Sampling controller
 * Determines whether an event or a stream is in- or out- sample


 * Association controller
 * Controls the assignment of random and unique tokens that associate events belonging to a common scope (consider renaming to scope controller?).

Standardizing properties
In previous systems, we observed a high degree of heterogeneity among both event schemas and instruments with regard to the names of properties, the format of property values, and the method by which those values are assigned. For example, an audit found six different properties identifying themselves as a unique session identifier token, most with completely different concepts of what a session was. What a session was could only be determined by detailed inspection of the application code, and in some cases, did not agree with the documentation.

Such cases are not unique. In order to be useful for data analysis, most instrumentation events need to carry a lot of additional information besides what the instrument has set out to measure. For example, the time at which the event occurred, whether it occurred in association with any other events, the domain or program where the event originated, and so on. Standardization of this additional information is helpful for analysts who can expect a consistent set of fields at query time, and for event schema designers looking to re-use properties and value definitions. The instrumentation platform makes this easy with a collection of schema fragments that can be composed with any new instrumentation event schema.

But these properties still need to be filled out somewhere on the client application. Historically, the instrument has been responsible for filling out all of the event properties, but because instruments are designed and operate independently, this has led to inconsistency in how some values are computed. Most platforms are full of quirks that affect the quality of this additional information if not handled with care, and the treatment of some commonly used properties is notorious for its inconsistency. Along with the creation of schema fragments for commonly used properties, the instrumentation platform library was created to assign values to those properties. By reserving the task of assigning these properties, the library allows the instrumentation code to focus on assigning only the data which it was designed to collect, and stay out of complicated or unnecessary bookkeeping. It also allows enforcement of the standard fragment semantics, since value assignment is happening in a standard way.

Supporting portability
The instrumentation library concentrates the majority of the unportable code in itself, where care and attention can be focused on ensuring that portability is working properly. Instruments become simpler, and data quality and portability are improved.

Portability is important, because the goal of the instrumentation platform is to allow as many client platforms as possible to access a common set of instrumentation capabilities. This set of capabilities is defined in the common instrumentation definition [WIP].

Each target platform receives an instrumentation library that implements the CID[WIP]. Most target platforms will not use the same programming language or runtime environment. To ensure portability, core algorithms are built with a small but well-chosen set of primitives that can be implemented in a transparent style, avoiding the use of language-specific abstractions. This makes it easier to verify critical behavior in a new target language. This core is then wrapped by an integration layer that implements platform-specific functions according to the specified contract.

Do I need to use the IPL?
No, but it's complicated.

streams
.

Configure
These are properties on the library itself, is how you can think about it. TODO: put the random block here?

The instrumentation platform library is dynamically configured at runtime with a set of controls. Once configuration has occurred, the controls are immutable until the runtime is reset. The point of first configuration and the point of runtime reset are platform-dependent and are listed in the table.

The, along with the value of the , completely determine the behavior of the instrumentation platform library during the runtime of the application under measurement.

Stream intake service URI
A URL of the event intake service that will receive an event. The instrumentation library may have a default stream intake service URL or it may be stream-dependent.

Queue linger seconds
TODO: Change the name...

Initialize
Binding of various things related to the integration layer, session change detection, etc.

Run.

Submit
Submit is the main algorithm of the library and is the only driver of runtime behavior post-initialization. Its steps are carried out in a particular order.

To submit an instrumentation event carrying data eventData to event stream streamName, the submit method must run these steps:


 * 1) Verify the event stream streamName is recognized
   * 1) Let config be the result of running getStreamConfig on streamName.
   * 2) If config is , return.
     * The StreamController could not recognize streamName.
     * We assume the client is misconfigured, i.e., the instrumentor has failed to create an event stream configuration, failed to make an event stream configuration available for loading, or has specified the wrong streamName. Rather than produce potentially inconsistent data, the event submission does not proceed.
 * 2) Verify eventData is well-formed
   * 1) If eventData is ,   (implementation-dependent), or empty, return.
     * Every event must define at least its $schema, therefore an empty eventData can never be valid.
   * 2) If eventData.$schema is ,  , or the empty string, return.
     * Every event must define its $schema. See .
 * 3) Assign the event's stream
   * 1) Let eventData.meta.stream be streamName.
 * 4) Assign the event's timestamp. Jump to the first appropriate substep:
   * TODO: Actually doesn't this need to be done for ALL of the properties that are set by the library, not just this one? Yes, I think this is the case.
   * If eventData.client_dt is not set
     * Let eventData.client_dt be the current time to millisecond resolution, formatted as an ISO 8601 string.
   * If eventData.client_dt is set
     * Do nothing. Assume that this is a copied event, which should have the same timestamp as its original.
 * 5) Dispatch copies of eventData to any designated copy target streams
   * 1) For each copyRecipientStreamName in config.copyToStreamNames[ streamName ]
     * 1) Let copyOfEventData be a copy of eventData
     * 2) Run
 * 6) Schedule the event for transmission
   * 1) If STREAM_INTAKE_SERVICE_URL is not defined, return.
   * 2) Let eventDataSerialized be the result of running a JSON string serialization algorithm on eventData.
   * 3) Run schedule on streamIntakeServiceURI and eventDataSerialized.
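
The submit steps can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the library's implementation: STREAM_CONFIG, the copyTargets key, the intake URI, the scheduled list, and current_timestamp are hypothetical stand-ins; None is used as the "unset" value; and copies are assumed to be dispatched by re-running submit on each copy target.

```python
import copy
import json
from datetime import datetime, timezone

# Hypothetical stand-ins for illustration; the standard does not define these values.
STREAM_CONFIG = {
    "example.stream": {"copyTargets": ["example.copy"]},
    "example.copy": {},
}
STREAM_INTAKE_SERVICE_URI = "https://intake.example.org/events"
scheduled = []  # stand-in for the queue used by the Schedule algorithm


def current_timestamp():
    """Current UTC time as an ISO 8601 string with millisecond resolution."""
    now = datetime.now(timezone.utc)
    return now.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def submit(stream_name, event_data):
    # 1. Verify the stream is recognized (None is this sketch's "unset" value).
    config = STREAM_CONFIG.get(stream_name)
    if config is None:
        return
    # 2. Verify the event is well-formed: non-empty and carrying a $schema.
    if not event_data or not event_data.get("$schema"):
        return
    # 3. Assign the event's stream.
    event_data.setdefault("meta", {})["stream"] = stream_name
    # 4. Assign the timestamp unless one is already set (a copied event).
    if "client_dt" not in event_data:
        event_data["client_dt"] = current_timestamp()
    # 5. Dispatch copies; re-running submit on each copy is an assumption.
    for copy_target in config.get("copyTargets", []):
        submit(copy_target, copy.deepcopy(event_data))
    # 6. Schedule the serialized event for transmission.
    if not STREAM_INTAKE_SERVICE_URI:
        return
    scheduled.append((STREAM_INTAKE_SERVICE_URI, json.dumps(event_data)))
```

Because the copy is made after the timestamp is assigned, each copied event keeps the timestamp of its original, while its meta.stream is overwritten with the copy target's name.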

Schedule
To schedule an instrumentation event eventDataSerialized to a destination streamIntakeServiceURI, the network controller must run these steps:
 * 1) Enqueue eventDataSerialized and streamIntakeServiceURI
 * 2) When the queue wakes, run the send algorithm on each item in the queue

Send
To send an instrumentation event eventDataSerialized to a destination streamIntakeServiceURI, the network controller must run these steps:
 * 1) Create an HTTP POST request with URL streamIntakeServiceURI and request body eventDataSerialized
 * 2) Issue the HTTP POST request asynchronously and do not wait for or handle response or timeout ("fire-and-forget")
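
A minimal sketch of the schedule and send algorithms, assuming a simple in-memory queue that is drained by an explicit wake call. NetworkController.wake, the injectable post callable, the Content-Type header, and the 5-second socket timeout are assumptions made for illustration; the standard does not specify them.

```python
import threading
import urllib.request


def http_post_fire_and_forget(url, body):
    """Issue an HTTP POST on a daemon thread and ignore the outcome."""
    def _post():
        request = urllib.request.Request(
            url,
            data=body.encode("utf-8"),
            headers={"Content-Type": "application/json"},  # assumed header
            method="POST",
        )
        try:
            urllib.request.urlopen(request, timeout=5).close()
        except Exception:
            pass  # fire-and-forget: responses, errors and timeouts are ignored

    threading.Thread(target=_post, daemon=True).start()


class NetworkController:
    def __init__(self, post=http_post_fire_and_forget):
        self.queue = []
        self.post = post  # injectable so the transport can be replaced in tests

    def schedule(self, stream_intake_service_uri, event_data_serialized):
        # Step 1: enqueue the serialized event with its destination.
        self.queue.append((stream_intake_service_uri, event_data_serialized))

    def wake(self):
        # Step 2: when the queue wakes, send every queued item.
        items, self.queue = self.queue, []
        for uri, body in items:
            self.send(uri, body)

    def send(self, stream_intake_service_uri, event_data_serialized):
        self.post(stream_intake_service_uri, event_data_serialized)
```

Injecting the transport keeps the queueing logic testable without network access; a platform integration would decide when wake is called.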

Stream controller
The stream controller is responsible for making the event stream configuration available for use in the library. The library uses the event stream configuration to drive various aspects of event submission.

Constructor
The constructor loads the fetched stream configuration into a read-only property. After the stream configuration has been fetched as fetchedStreamConfig (see Fetch), run the following steps:
 * 1) Initialize streamConfig to the empty object
 * 2) For each key streamName in fetchedStreamConfig, let StreamController.streamConfig[ streamName ] be fetchedStreamConfig[ streamName ].
   * Note that the creation of stream CC routes has been delegated to the, so we don't need to have extra work here constructing them.

getStream
To retrieve a stream configuration for stream streamName, run the following steps:
 * 1) If a key matching streamName exists in StreamController.streamConfig, and StreamController.streamConfig[ streamName ] has type , return StreamController.streamConfig[ streamName ].
   * TODO, how to verify it's a real stream config? Do it in submit then.
   * The value under the key may be an empty, but   or   values are not acceptable.
 * 2) Return
   * The stream streamName is not recognized.

getCopyTargets
To retrieve the copy targets for stream streamName, run the following steps:
 * 1) If a key matching streamName exists in StreamController.streamConfig, and StreamController.streamConfig[ streamName ] has type , and StreamController.streamConfig[ streamName ].copyTargets exists and has type array of string, return StreamController.streamConfig[ streamName ].copyTargets.
   * The value under the key may be an empty, but   or   values are not acceptable.
 * 2) Return
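
The stream controller algorithms can be sketched as below. This is an assumption-laden illustration: a Python dict stands in for a stream configuration entry, None stands in for the value returned when a stream is not recognized (the value is not stated above), and MappingProxyType is one way to approximate a read-only property.

```python
from types import MappingProxyType


class StreamController:
    def __init__(self, fetched_stream_config):
        # Copy every fetched entry, then expose the result as a read-only view.
        stream_config = {}
        for stream_name in fetched_stream_config:
            stream_config[stream_name] = fetched_stream_config[stream_name]
        self.stream_config = MappingProxyType(stream_config)

    def get_stream(self, stream_name):
        config = self.stream_config.get(stream_name)
        # An empty dict is acceptable; a missing or non-dict value is not.
        if isinstance(config, dict):
            return config
        return None  # assumed "not recognized" value

    def get_copy_targets(self, stream_name):
        config = self.get_stream(stream_name)
        if config is None:
            return None
        targets = config.get("copyTargets")
        # Copy targets must be an array of strings.
        if isinstance(targets, list) and all(isinstance(t, str) for t in targets):
            return targets
        return None
```

For example, with StreamController({"a": {"copyTargets": ["b"]}, "b": {}}), get_stream("b") returns the empty configuration, while get_copy_targets("b") returns None because no copy targets are defined.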

Association controller
The association controller has a PAGEVIEW_ID, initially, which represents the pageview token for the current page.

The association controller has a SESSION_ID, initially, which represents the session token for the current session.

The association controller has an ACTIVITY_TABLE, initially, which holds activity tokens for each stream.

The association controller has an ACTIVITY_COUNT, initially, which holds a monotonically increasing integer, incremented for each new activity started.

pageview_id
To retrieve the current pageview id, run the following steps:
 * 1) If PAGEVIEW_ID is
   * 1) Let newPageviewId be the result of running.
   * 2) Let PAGEVIEW_ID equal newPageviewId
 * 2) Return PAGEVIEW_ID

session_id
To retrieve the current session id, run the following steps:
 * 1) If SESSION_ID is , run the following steps
   * 1) Let sessionIdPersisted be the result of running get_persisted_value on the sessionIdStorageKey.
   * 2) If sessionIdPersisted is not , let SESSION_ID equal sessionIdPersisted, otherwise run the following steps
     * 1) Let sessionIdGenerated be the result of running.
     * 2) Run set_persisted_value on the sessionIdStorageKey and sessionIdGenerated.
     * 3) Let SESSION_ID equal sessionIdGenerated.
 * 2) Return SESSION_ID.

activity_id
To retrieve the current activity id for stream streamName and activity activityName, run the following steps:
 * 1) If ACTIVITY_COUNT or ACTIVITY_TABLE is , run the following steps:
   * 1) Let ACTIVITY_COUNT be the result of running get_persisted_value on the activityCountStorageKey
   * 2) Let ACTIVITY_TABLE be the result of running get_persisted_value on the activityTableStorageKey
   * 3) If ACTIVITY_COUNT or ACTIVITY_TABLE are , run the following steps:
     * 1) Let ACTIVITY_COUNT equal
     * 2) Let ACTIVITY_TABLE equal the empty object
     * 3) Run set_persisted_value on activityCountStorageKey and ACTIVITY_COUNT
     * 4) Run set_persisted_value on activityTableStorageKey and ACTIVITY_TABLE
 * 2) If streamName is ,  , or the empty string, return.
 * 3) If ACTIVITY_TABLE[ streamName ] is not set, run the following steps
   * 1) Let ACTIVITY_TABLE[ streamName ] equal ACTIVITY_COUNT + 1.
   * 2) Let ACTIVITY_COUNT equal ACTIVITY_COUNT + 1.
   * 3) Run set_persisted_value on activityCountStorageKey and ACTIVITY_COUNT
   * 4) Run set_persisted_value on activityTableStorageKey and ACTIVITY_TABLE
 * 4) Let currentCount equal ACTIVITY_TABLE[ streamName ].
 * 5) Return.

begin_new_session
To begin a new session, run the following steps:
 * 1) Let PAGEVIEW_ID equal
   * Pageviews are nested in sessions; a change of session necessitates a change of pageview.
   * Pageviews are not persisted, so they do not need to be removed from persistent storage.
 * 2) Let SESSION_ID equal
 * 3) Run del_persisted_value on sessionIdStorageKey.
 * 4) Let ACTIVITY_TABLE equal
 * 5) Let ACTIVITY_COUNT equal
 * 6) Run del_persisted_value on activityTableStorageKey.
 * 7) Run del_persisted_value on activityCountStorageKey.

begin_new_activity
To begin a new activity for stream streamName, run the following steps:
 * 1) Run activity_id.
   * This ensures ACTIVITY_TABLE and ACTIVITY_COUNT are loaded from the persistent store, or generated.
 * 2) If ACTIVITY_TABLE[ streamName ] is set, run the following steps:
   * 1) Unset ACTIVITY_TABLE[ streamName ]
     * I.e., delete or remove the key streamName and its value from ACTIVITY_TABLE.
   * 2) Run set_persisted_value on activityTableStorageKey and ACTIVITY_TABLE

in_sample
.

Library Integration Functions
The core algorithms make use of a number of platform-specific functions that are defined in the integration layer. These functions and their contracts are outlined below.

get_persisted_value
.

set_persisted_value
.

del_persisted_value
.

Fetch
Fetching of stream configuration is a platform-dependent process that may be tailored to the needs of the platform and its requirements. Stream configuration data may be injected as configuration (e.g. ResourceLoader), fetched via one or more HTTP requests to the (e.g. apps), or provisioned locally (e.g. during development or testing).

Stream configuration fetch can consist of any number of steps, and should not be confused with, which can be run once and only once per application runtime.