API reference¶

wiggelen¶

Wiggelen, working with wiggle tracks in Python.

The wiggle (WIG) format is for storing dense, continuous genomic data such as GC percent, probability scores, read depth, and transcriptome data.

exception wiggelen.ParseError[source]¶: Raised if a wiggle track cannot be parsed.

exception wiggelen.ReadError[source]¶: Raised if a wiggle track does not provide random access. Reading with random access is needed for using and creating an index.

wiggelen.walk(track=<open file '<stdin>', mode 'r'>, force_index=False)[source]¶

Walk over the track and yield (region, position, value) tuples.

The values are always of type int or float.

Parameters:	track (file) – Wiggle track. force_index (bool) – Force creating an index if it does not yet exist.
Returns:	Tuples of (region, position, value) per defined position.
Return type:	generator(str, int, _)

Example:

>>> for x in walk():
...     x
...
('chr18', 34344, 629.0)
('chr18', 34345, 649.0)
('chr18', 34446, 657.0)
('chrM',  308,   520.0)
('chrM',  309,   519.0)

wiggelen.zip_(*walkers)[source]¶

Walk over all tracks simultaneously and for each position yield the region, position and a list of values for each track, or None in case the track has no value on the position.

Note

This assumes the order of regions is compatible over all walkers. If you are unsure if this is the case for your input wiggle tracks, use the walk() function with the force_index keyword argument.

Parameters:	walkers (list(generator(str, int, _))) – List of generators yielding tuples of (region, position, value) per defined position.
Returns:	Tuples of (region, position, values) per defined position.
Return type:	generator(str, int, list(_))

Example:

>>> for x in zip_(walk(open('a.wig')), walk(open('b.wig'))):
...     x
...
('18', 7, [29.0, None])
('18', 8, [49.0, None])
('18', 9, [None, 87.0])
('MT', 1, [20.0, None])
('MT', 2, [36.0, 92.0])

wiggelen.fill(walker, regions=None, filler=None, only_edges=False)[source]¶

Fill in undefined positions with filler (or None).

Parameters:	walker (generator(str, int, _)) – Tuple of (region, position, value) per defined position. regions (dict(str, (int, int))) – Dictionary with regions as keys and (start, stop) tuples as values. If not None, fill positions from start to stop (both including) in these regions. If None, fill positions in all regions between their first and last defined positions. filler – Value to use for filling undefined positions. only_edges (bool) – Only fill the first and last of continuously undefined positions.
Returns:	Tuples of (region, position, value) per position where value is filler if it was not defined in the original walker.
Return type:	generator(str, int, _)

Example:

>>> for x in walk(open('a.wig')):
...     x
...
('MT', 3, 29.0)
('MT', 5, 49.0)
('MT', 8, 87.0)
('MT', 9, 20.0)
>>> for x in fill(walk(open('a.wig')):
...     x
...
('MT', 3, 29.0)
('MT', 4, None)
('MT', 5, 49.0)
('MT', 6, None)
('MT', 7, None)
('MT', 8, 87.0)
('MT', 9, 20.0)

The only_edges argument might seem a bit out of place here, but can be useful in combination with filler=0 when creating a line plot. Without any filling, non-zero lines may be plotted where there is actually no data.

Note

This might be a tiny bit memory-hungry on Python 2.x if there are very large gaps to fill since we use the range function to generate the positions. I don’t think it’s worth it to add version specific code paths for this.

wiggelen.write(walker, track=<open file '<stdout>', mode 'w'>, serializer=<type 'str'>, name=None, description=None)[source]¶

Write items from a walker to a wiggle track.

Parameters:

walker (generator(str, int, _)) – Tuples of (region, position, value) per defined position.
track (file) – Writable file handle.
serializer (function(_ -> str)) – Function making strings from values.
name (str) – Optional track name (displayed to the left of the track in the UCSC Genome Browser).
description (str) – Optional track description (displayed as center label in the UCSC Genome Browser).

Note

Values of None are discarded.

Example:

>>> write(walk(open('a.wig')), name='My example')
track type=wiggle_0 name="My example"
variableStep chrom=1
1 520.0
4 536.0
8 553.0
variableStep chrom=MT
1 568.0
2 598.0
6 616.0

wiggelen.merge¶

Merge any number of wiggle tracks in various ways.

The algorithm can be parameterized by a merge operation. Four of these operations are predefined in mergers:

Merger sum: Compute the sum of all values.

Merger mean: Compute the mean of all values (and use 0 for undefined values).

Merger count: Compute the number of defined values.

Merger minus: Subtract the second value from the first (and use 0 for undefined values). Only defined on exactly two values.

Merger min: Compute the minimum of all values (and use 0 for undefined values).

Merger max: Compute the maximum of all values (and use 0 for undefined values).

Merger div: Divide the second value by the first (and use 0 for undefined values). Only defined on exactly two values.

Merger intersect: Return the first value is the second value is defined and non-zero (and use 0 for undefined values). Only defined on exactly two values.

Merger ctz: Select the value closest to 0. (and use 0 if there is a mix of positive and negative values).

wiggelen.merge.merge(*walkers, **options)[source]¶

Merge wiggle tracks.

This assumes the walkers have their regions in the same order. You can force this by using indices. Example:

>>> from wiggelen import walk
>>> walkers = [walk(open(track), force_index=True)
...            for track in ('a.wig', 'b.wig', 'c.wig')]
>>> for x in merge(*walkers):
...     x
...
('18', 8, 849.0)
('18', 9, 987.0)
('MT', 1, 820.0)

Parameters:	walkers (list(generator(str, int, _))) – List of generators yielding tuples of (region, position, value) per defined position. merger (function(list(_) -> _)) – Merge operation (default: sum).
Returns:	Tuples of (region, position, merged value) per defined position in walkers.
Return type:	generator(str, int, _)

wiggelen.merge.mergers = {'count': <function <lambda> at 0x7f0bca3e9938>, 'div': <function <lambda> at 0x7f0bca2d2230>, 'intersect': <function <lambda> at 0x7f0bca2d22a8>, 'min': <function <lambda> at 0x7f0bca2d2140>, 'max': <function <lambda> at 0x7f0bca2d21b8>, 'sum': <function <lambda> at 0x7f0bca3e9a28>, 'minus': <function <lambda> at 0x7f0bca2d20c8>, 'ctz': <function _merger_ctz at 0x7f0bca2d2320>, 'mean': <function <lambda> at 0x7f0bca3e9758>}¶: Predefined mergers. See wiggelen.merge for their definition.

wiggelen.distance¶

Calculate the distance between two wiggle tracks using a metric designed for multisets.

This module can be used to assess similarity of next generation sequencing datasets. A multiset distance measure is used for pairwise comparison of genomic information as provided by wiggle tracks.

The algorithm can be parameterized by a pairwise distance metric. Four of these metrics are predefined in metrics:

Metric a: \(\frac{|x - y|}{(x + 1) (y + 1)}\)

Metric b: \(\frac{|x - y|}{x + y + 1}\)

Metric c: \(\frac{\text{max}(x, y) \, |x - y|}{(x^2 + 1) (y^2 + 1)}\)

Metric d: \(\frac{|x - y|}{\text{max}(x, y) + 1}\)

Note

These metrics are ill-defined on the interval (0, 1) so we scale all values if necessary.

wiggelen.distance.distance(*tracks, **options)[source]¶

Calculate the pairwise distances between wiggle tracks.

Parameters:	tracks – List of wiggle tracks. metric – Pairwise distance metric (default: a). threshold (float) – Threshold for noise filter (default: no noise filter)
Returns:	Pairwise distances between tracks as a mapping from coordinates in the distance matrix to their values.
Return type:	dict((int, int), float)

wiggelen.distance.matrix(size, reflexive=False, symmetric=False)[source]¶

Create all coordinates in a square matrix.

With the default False value for reflexive and symmetric, include only the coordinates below the diagonal.

Parameters:	size (int) – Width and height of the matrix. reflexive (bool) – Include coordinates on (x, x) diagonal. symmetric (bool) – Include coordinates (x, y) above the diagonal (where x < y).
Returns:	All coordinates in the matrix as tuples.
Return type:	list(int, int)

Examples:

>>> matrix(5)
[(1, 0),
 (2, 0), (2, 1),
 (3, 0), (3, 1), (3, 2),
 (4, 0), (4, 1), (4, 2), (4, 3)]
>>> matrix(5, reflexive=True)
[(0, 0),
 (1, 0), (1, 1),
 (2, 0), (2, 1), (2, 2),
 (3, 0), (3, 1), (3, 2), (3, 3),
 (4, 0), (4, 1), (4, 2), (4, 3), (4, 4)]
>>> matrix(5, symmetric=True)
[        (0, 1), (0, 2), (0, 3), (0, 4),
 (1, 0),         (1, 2), (1, 3), (1, 4),
 (2, 0), (2, 1),         (2, 3), (2, 4),
 (3, 0), (3, 1), (3, 2),         (3, 4),
 (4, 0), (4, 1), (4, 2), (4, 3)        ]

wiggelen.distance.metrics = {'a': <function <lambda> at 0x7f0bca3e9b18>, 'c': <function <lambda> at 0x7f0bca2d2500>, 'b': <function <lambda> at 0x7f0bca2d2410>, 'd': <function <lambda> at 0x7f0bca2d2938>}¶: Predefined pairwise distance metrics. See wiggelen.distance for their definition.

wiggelen.distance.normalize(*values)[source]¶

Normalize values relative to the smallest value.

Parameters:	values (list(float)) – List of values.
Returns:	Scale the values such that the minimum is 1.
Return type:	list(float)

wiggelen.transform¶

Various transformations on wiggle tracks.

wiggelen.transform.backward_divided_difference(walker, step=None, auto_step=False)[source]¶

Derivative calculated by the backward divided difference method.

Note

This transformation only works on walkers with numerical values.

Parameters:	walker (generator(str, int, float)) – Generator yielding tuples of (region, position, value) per defined position. step (int) – Restrict calculation to positions that are this far apart (no restriction if None). auto_step (bool) – If True and step=None, automatically set step to a value based on the first two positions in walker.
Returns:	Tuple of (region, position, derivative value) per defined position in walker for which the derivative value is defined.
Return type:	generator(str, int, float)

wiggelen.transform.central_divided_difference(walker, step=None)[source]¶

Derivative calculated by the central divided difference method.

Note

This transformation only works on walkers with numerical values.

Parameters:	walker (generator(str, int, float)) – Generator yielding tuples of (region, position, value) per defined position. step (int) – Restrict calculation to positions that are this far apart. If None, automatically set step to a value based on the first two positions in walker.
Returns:	Tuple of (region, position, derivative value) per defined position in walker for which the derivative value is defined.
Return type:	generator(str, int, float)

wiggelen.transform.forward_divided_difference(walker, step=None, auto_step=False)[source]¶

Derivative calculated by the forward divided difference method.

Note

This transformation only works on walkers with numerical values.

Parameters:	walker (generator(str, int, float)) – Generator yielding tuples of (region, position, value) per defined position. step (int) – Restrict calculation to positions that are this far apart (no restriction if None). auto_step (bool) – If True and step=None, automatically set step to a value based on the first two positions in walker.
Returns:	Tuple of (region, position, derivative value) per defined position in walker for which the derivative value is defined.
Return type:	generator(str, int, float)

wiggelen.intervals¶

Get covered intervals from wiggle tracks and write to BED format.

wiggelen.intervals.coverage(walker)[source]¶

Get intervals of consecutively defined positions from a walker.

Parameters:	walker (generator(str, int, _)) – Tuple of (region, position, value) per defined position.
Returns:	Tuples of (region, begin, end) per position where begin and end are one-based and inclusive.
Return type:	generator(str, int, int)

Example:

>>> for x in coverage(walk(open('a.wig'))):
...     x
...
('MT', 3, 3)
('MT', 5, 20)
('MT', 400, 420)

wiggelen.intervals.write(intervals, track=<open file '<stdout>', mode 'w'>, name=None, description=None)[source]¶

Write intervals to a bed track.

Parameters:	intervals (generator(str, int, int)) – Tuples of (region, begin, end) per interval. track (file) – Writable file handle. name (str) – Optional track name (displayed to the left of the track in the UCSC Genome Browser). description (str) – Optional track description (displayed as center label in the UCSC Genome Browser).

Example:

>>> write(coverage(walk(open('a.wig'))), name='My example')
track name="My example"
MT 2 3
MT 4 20
MT 399 420

wiggelen.plot¶

wiggelen.index¶

Index regions/chromosomes in wiggle tracks for random access.

Indexing a wiggle track results in a mapping of regions to summaries. The summaries are dictionaries including start and stop positions and some statistical metrics. A summary for the entire wiggle track is included as the special region _all.

This data can be written to a file next to the wiggle track file (in case this is a regular file). Example of the serialization we use:

region=_all,start=0,stop=12453,sum=4544353,count=63343
region=1,start=47,stop=3433,sum=4353,count=643
region=X,start=3433,stop=8743,sum=454,count=343
region=Y,start=8743,stop=10362,sum=7353,count=343
region=MT,start=10362,stop=12453,sum=353,count=143

Note that we do not impose a certain order on the lines in the index nor on the fields on a line.

Additional custom fields can be added to the index by providing custom field definitions. Such a definition is created with the Field constructor and the following arguments:

The name of the field.
A function casting a field value from string.
Initial value.
Aggregate function used as the function argument in a reduce- or fold-like operation to construct the field value. This function takes as inputs the accumulated field value, the current value and the current span, and returns a new accumulated field value.

As an example, the standard sum field could be defined as the following tuple:

Field('sum', float, 0, lambda acc, value, span: acc + value * span)

In practice, choose unique names for custom fields, not clashing with the standard fields such as sum.

wiggelen.index.CACHE_INDEX = True¶: Whether or not indices are cached in memory during execution.

class wiggelen.index.Field(name, caster, init, func)¶

Type for custom index field definitions.

caster¶: Alias for field number 1

func¶: Alias for field number 3

init¶: Alias for field number 2

name¶: Alias for field number 0

wiggelen.index.INDEX_SUFFIX = '.idx'¶: Suffix used for index files.

wiggelen.index.WRITE_INDEX = True¶: Whether or not indices are written to a file.

wiggelen.index.clear_cache()[source]¶: Clear the in-memory cache of index objects.

wiggelen.index.index(track=<open file '<stdin>', mode 'r'>, force=False, fields=None)[source]¶

Return index of region positions in track.

Parameters:	track (file) – Wiggle track. force (bool) – Force creating an index if it does not yet exist. fields (list) – List of custom index field definitions.
Returns:	Wiggle track index and index filename.
Return type:	dict(str, dict(str, _)), str

wiggelen.index.read_index(track=<open file '<stdin>', mode 'r'>, fields=None)[source]¶

Try to read the index from a file.

Parameters:	track (file) – Wiggle track the index belongs to. fields (list) – List of custom index field definitions.
Returns:	Wiggle track index, or None if the index could not be read.
Return type:	dict(str, dict(str, _))

wiggelen.index.write_index(idx, track=<open file '<stdout>', mode 'w'>)[source]¶

Try to write the index to a file and return its filename.

Parameters:	idx (dict(str, dict(str, _))) – Wiggle track index. track (file) – Wiggle track the index belongs to.
Returns:	Filename for the written index, or None if the index could not be written.
Return type:	str