\
We load a comma separated data file using the generic LoadTable function.
>>> from cogent import LoadTable
>>> table = LoadTable('stats.txt', sep=',')
>>> print table
====================================
Locus Region Ratio
------------------------------------
NP_003077 Con 2.5386
NP_004893 Con 121351.4264
NP_005079 Con 9516594.9789
NP_005500 NonCon 0.0000
NP_055852 NonCon 10933217.7090
------------------------------------
For really large files the automated conversion used by the standard read mechanism can be quite slow. If the data within a column is consistently of one type, set the LoadTable argument static_column_types=True. This causes the Table object to create a custom reader.
>>> table = LoadTable('stats.txt', static_column_types=True)
>>> print table
====================================
Locus Region Ratio
------------------------------------
NP_003077 Con 2.5386
NP_004893 Con 121351.4264
NP_005079 Con 9516594.9789
NP_005500 NonCon 0.0000
NP_055852 NonCon 10933217.7090
------------------------------------
We change the Ratio column to using scientific notation.
>>> table.setColumnFormat('Ratio', '%.1e')
>>> print table
==============================
Locus Region Ratio
------------------------------
NP_003077 Con 2.5e+00
NP_004893 Con 1.2e+05
NP_005079 Con 9.5e+06
NP_005500 NonCon 7.4e-08
NP_055852 NonCon 1.1e+07
------------------------------
This can be done on table loading,
>>> table = LoadTable('stats.txt', sep=',', digits=1, space=2)
>>> print table
=============================
Locus Region Ratio
-----------------------------
NP_003077 Con 2.5
NP_004893 Con 121351.4
NP_005079 Con 9516595.0
NP_005500 NonCon 0.0
NP_055852 NonCon 10933217.7
-----------------------------
or, for spacing at least, by modifying the attributes
>>> table.Space = ' '
>>> print table
=================================
Locus Region Ratio
---------------------------------
NP_003077 Con 2.5
NP_004893 Con 121351.4
NP_005079 Con 9516595.0
NP_005500 NonCon 0.0
NP_055852 NonCon 10933217.7
---------------------------------
The table Header is immutable. Changing column headings is done as follows.
>>> table = LoadTable('stats.txt', sep=',')
>>> print table.Header
['Locus', 'Region', 'Ratio']
>>> table = table.withNewHeader('Ratio', 'Stat')
>>> print table.Header
['Locus', 'Region', 'Stat']
This can be used to take a single, or multiple columns and generate a new column of values. Here we’ll take 2 columns and return True/False based on a condition.
>>> table = LoadTable('stats.txt', sep=',')
>>> table = table.withNewColumn('LargeCon',
... lambda (r,v): r == 'Con' and v>10.0,
... columns=['Region', 'Ratio'])
>>> print table
================================================
Locus Region Ratio LargeCon
------------------------------------------------
NP_003077 Con 2.5386 False
NP_004893 Con 121351.4264 True
NP_005079 Con 9516594.9789 True
NP_005500 NonCon 0.0000 False
NP_055852 NonCon 10933217.7090 False
------------------------------------------------
Can be done without specifying a new column. Here we simply use the same table data.
>>> table1 = LoadTable('stats.txt', sep=',')
>>> table2 = LoadTable('stats.txt', sep=',')
>>> table = table1.appended(None, table2)
>>> print table
====================================
Locus Region Ratio
------------------------------------
NP_003077 Con 2.5386
NP_004893 Con 121351.4264
NP_005079 Con 9516594.9789
NP_005500 NonCon 0.0000
NP_055852 NonCon 10933217.7090
NP_003077 Con 2.5386
NP_004893 Con 121351.4264
NP_005079 Con 9516594.9789
NP_005500 NonCon 0.0000
NP_055852 NonCon 10933217.7090
------------------------------------
or with a new column
>>> table1.Title = 'Data1'
>>> table2.Title = 'Data2'
>>> table = table1.appended('Data#', table2, title='')
>>> print table
=============================================
Data# Locus Region Ratio
---------------------------------------------
Data1 NP_003077 Con 2.5386
Data1 NP_004893 Con 121351.4264
Data1 NP_005079 Con 9516594.9789
Data1 NP_005500 NonCon 0.0000
Data1 NP_055852 NonCon 10933217.7090
Data2 NP_003077 Con 2.5386
Data2 NP_004893 Con 121351.4264
Data2 NP_005079 Con 9516594.9789
Data2 NP_005500 NonCon 0.0000
Data2 NP_055852 NonCon 10933217.7090
---------------------------------------------
Note
We assigned an empty string to title, otherwise the resulting table has the same Title attribute as that of table1.
>>> table = LoadTable('stats.txt', sep=',')
>>> table.summed('Ratio')
20571166.652847398
We define a strictly numerical table,
>>> all_numeric = LoadTable(header=['A', 'B', 'C'], rows=[range(3),
... range(3,6), range(6,9), range(9,12)])
>>> print all_numeric
=============
A B C
-------------
0 1 2
3 4 5
6 7 8
9 10 11
-------------
and sum all columns (default condition)
>>> all_numeric.summed()
[18, 22, 26]
and all rows
>>> all_numeric.summed(col_sum=False)
[3, 12, 21, 30]
We define a table with mixed data, like a distance matrix.
>>> mixed = LoadTable(header=['A', 'B', 'C'], rows=[['*',1,2], [3,'*', 5],
... [6,7,'*']])
>>> print mixed
===========
A B C
-----------
* 1 2
3 * 5
6 7 *
-----------
and sum all columns (default condition), ignoring non-numerical data
>>> mixed.summed(strict=False)
[9, 8, 7]
and all rows
>>> mixed.summed(col_sum=False, strict=False)
[3, 8, 13]
We can do this by providing a reference to an external function
>>> table = LoadTable('stats.txt', sep=',')
>>> sub_table = table.filtered(lambda x: x < 10.0, columns='Ratio')
>>> print sub_table
=============================
Locus Region Ratio
-----------------------------
NP_003077 Con 2.5386
NP_005500 NonCon 0.0000
-----------------------------
or using valid python syntax within a string, which is executed
>>> sub_table = table.filtered("Ratio < 10.0")
>>> print sub_table
=============================
Locus Region Ratio
-----------------------------
NP_003077 Con 2.5386
NP_005500 NonCon 0.0000
-----------------------------
You can also filter for values in multiple columns
>>> sub_table = table.filtered("Ratio < 10.0 and Region == 'NonCon'")
>>> print sub_table
=============================
Locus Region Ratio
-----------------------------
NP_005500 NonCon 0.0000
-----------------------------
We select only columns that have a sum > 20 from the all_numeric table constructed above.
>>> big_numeric = all_numeric.filteredByColumn(lambda x: sum(x)>20)
>>> print big_numeric
========
B C
--------
1 2
4 5
7 8
10 11
--------
>>> table = LoadTable('stats.txt', sep=',')
>>> print table.sorted(columns='Ratio')
====================================
Locus Region Ratio
------------------------------------
NP_005500 NonCon 0.0000
NP_003077 Con 2.5386
NP_004893 Con 121351.4264
NP_005079 Con 9516594.9789
NP_055852 NonCon 10933217.7090
------------------------------------
>>> print table.sorted(columns='Ratio', reverse='Ratio')
====================================
Locus Region Ratio
------------------------------------
NP_055852 NonCon 10933217.7090
NP_005079 Con 9516594.9789
NP_004893 Con 121351.4264
NP_003077 Con 2.5386
NP_005500 NonCon 0.0000
------------------------------------
>>> print table.sorted(columns=['Region', 'Ratio'], reverse='Ratio')
====================================
Locus Region Ratio
------------------------------------
NP_005079 Con 9516594.9789
NP_004893 Con 121351.4264
NP_003077 Con 2.5386
NP_055852 NonCon 10933217.7090
NP_005500 NonCon 0.0000
------------------------------------
>>> table = LoadTable('stats.txt', sep=',')
>>> raw = table.getRawData('Region')
>>> print raw
['Con', 'Con', 'Con', 'NonCon', 'NonCon']
>>> table = LoadTable('stats.txt', sep=',')
>>> raw = table.getRawData(['Locus', 'Region'])
>>> print raw
[['NP_003077', 'Con'], ['NP_004893', 'Con'], ...
>>> table = LoadTable('stats.txt', sep=',')
>>> for row in table:
... print row['Locus']
...
NP_003077
NP_004893
NP_005079
NP_005500
NP_055852
>>> table = LoadTable('stats.txt', sep=',')
>>> print table[:2, :'Region']
=========
Locus
---------
NP_003077
NP_004893
---------
>>> table = LoadTable('stats.txt', sep=',')
>>> print table[:2,: 1]
=========
Locus
---------
NP_003077
NP_004893
---------
>>> table = LoadTable('stats.txt', sep=',')
>>> assert table.getDistinctValues('Region') == set(['NonCon', 'Con'])
>>> table = LoadTable('stats.txt', sep=',')
>>> assert table.count("Region == 'NonCon' and Ratio > 1") == 1
SQL-like join operations requires tables have different Title attributes which are not None. We do a standard inner join here for a restricted subset. We must specify the columns that will be used for the join. Here we just use Locus but multiple columns can be used, and their names can be different between the tables. Note that the second table’s title becomes a part of the column names.
>>> rows = [['NP_004893', True], ['NP_005079', True],
... ['NP_005500', False], ['NP_055852', False]]
>>> region_type = LoadTable(header=['Locus', 'LargeCon'], rows=rows,
... title='RegionClass')
>>> stats_table = LoadTable('stats.txt', sep=',', title='Stats')
>>> new = stats_table.joined(region_type, columns_self='Locus')
>>> print new
============================================================
Locus Region Ratio RegionClass_LargeCon
------------------------------------------------------------
NP_004893 Con 121351.4264 True
NP_005079 Con 9516594.9789 True
NP_005500 NonCon 0.0000 False
NP_055852 NonCon 10933217.7090 False
------------------------------------------------------------
>>> table = LoadTable('stats.txt', sep=',')
>>> table.writeToFile('stats_tab.txt', sep='\t')
It is also possible to specify column alignment, table caption and other arguments.
>>> table = LoadTable('stats.txt', sep=',')
>>> print table.tostring(format='latex')
\begin{longtable}[htp!]{ r r r }
\hline
\bf{Locus} & \bf{Region} & \bf{Ratio} \\
\hline
\hline
NP_003077 & Con & 2.5386 \\
NP_004893 & Con & 121351.4264 \\
NP_005079 & Con & 9516594.9789 \\
NP_005500 & NonCon & 0.0000 \\
NP_055852 & NonCon & 10933217.7090 \\
\hline
\end{longtable}
This format allows display of annotation tracks on genome browsers.
>>> rows = [['1', 100, 101, 1.123], ['1', 101, 102, 1.123],
... ['1', 102, 103, 1.123], ['1', 103, 104, 1.123],
... ['1', 104, 105, 1.123], ['1', 105, 106, 1.123],
... ['1', 106, 107, 1.123], ['1', 107, 108, 1.123],
... ['1', 108, 109, 1], ['1', 109, 110, 1],
... ['1', 110, 111, 1], ['1', 111, 112, 1],
... ['1', 112, 113, 1], ['1', 113, 114, 1],
... ['1', 114, 115, 1], ['1', 115, 116, 1],
... ['1', 116, 117, 1], ['1', 117, 118, 1],
... ['1', 118, 119, 2], ['1', 119, 120, 2],
... ['1', 120, 121, 2], ['1', 150, 151, 2],
... ['1', 151, 152, 2], ['1', 152, 153, 2],
... ['1', 153, 154, 2], ['1', 154, 155, 2],
... ['1', 155, 156, 2], ['1', 156, 157, 2],
... ['1', 157, 158, 2], ['1', 158, 159, 2],
... ['1', 159, 160, 2], ['1', 160, 161, 2]]
...
>>> bgraph = LoadTable(header=['chrom', 'start', 'end', 'value'],
... rows=rows)
...
>>> print bgraph.tostring(format='bedgraph', name='test track',
... description='test of bedgraph', color=(255,0,0))
track type=bedGraph name="test track" description="test of bedgraph" color=255,0,0
1 100 108 1.12
1 108 118 1.0
1 118 161 2.0
The bedgraph formatter defaults to rounding values to 2 decimal places. You can adjust that precision using the ``digits`` argument.
>>> print bgraph.tostring(format='bedgraph', name='test track',
... description='test of bedgraph', color=(255,0,0), digits=0)
track type=bedGraph name="test track" description="test of bedgraph" color=255,0,0
1 100 118 1.0
1 118 161 2.0