
dexplo

A data analysis library comparable to pandas

Installation

Cython must be installed. Then build the extensions in place by running: python setup.py build_ext --use-cython -i

Main Goals

A minimal set of features
Be as explicit as possible
There should be one-- and preferably only one --obvious way to do it.

Data Structures

Only DataFrames
No Series

Only Scalar Data Types

All data types allow nulls

[x] bool - always 8 bits
[x] int
[x] float
[x] str - stored as a categorical
[x] datetime
[x] timedelta
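
As a rough illustration of two items in this list, the sketch below shows one way strings could be stored as a categorical (integer codes plus an array of unique categories) and one way a nullable bool could fit in 8 bits. This is not dexplo's internal layout: the function names and the -1 null sentinel are assumptions made for the example.

```python
import numpy as np

# Hypothetical sketch, not dexplo's actual storage format.

def encode_strings(values):
    """Encode a list of str/None as (codes, categories); code -1 marks a null."""
    present = [v for v in values if v is not None]
    if present:
        categories = np.unique(np.array(present, dtype=str))  # sorted unique strings
    else:
        categories = np.array([], dtype=str)
    lookup = {cat: i for i, cat in enumerate(categories.tolist())}
    codes = np.array([-1 if v is None else lookup[v] for v in values], dtype=np.int32)
    return codes, categories


def decode_strings(codes, categories):
    """Invert encode_strings, turning the -1 sentinel back into None."""
    return [None if c < 0 else str(categories[c]) for c in codes]


def encode_bools(values):
    """Hold a nullable bool in 8 bits: 0 = False, 1 = True, -1 = null (assumed sentinel)."""
    return np.array([-1 if v is None else int(v) for v in values], dtype=np.int8)


codes, categories = encode_strings(["b", "a", None, "b"])
flags = encode_bools([True, None, False])
```

Each distinct string is stored once, and operations such as comparison or grouping can work on the small integer codes instead of on Python string objects.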

Column Labels

No hierarchical index
Column names must be strings
Column names must be unique
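
The two column-name rules can be checked in a few lines. The helper below is a hypothetical sketch (the name validate_column_names and its error messages are not taken from dexplo); it also follows the type-hint and f-string conventions listed under Development.

```python
from typing import List, Sequence


def validate_column_names(columns: Sequence[object]) -> List[str]:
    """Return the names as a list, or raise if any is not a str or is repeated."""
    seen = set()
    for i, name in enumerate(columns):
        if not isinstance(name, str):
            raise TypeError(f"Column name at position {i} is {type(name).__name__}, not str")
        if name in seen:
            raise ValueError(f"Column name {name!r} is not unique")
        seen.add(name)
    return list(columns)
```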

Row Labels

No row labels for now
Only row numbers are displayed in the output

Subset Selection

Only one way to select data: the [ ] indexing operator
Subset selection will be explicit and will require both a row and a column selector
Rows will be selected only by integer location
Columns will be selected by either label or integer location. Since column names must be strings, this will not be ambiguous
Slice notation is also OK
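
The selection rules above can be sketched with a toy class. TinyFrame is a hypothetical stand-in, not dexplo's DataFrame, and treating column slices as positional is an assumption of this example. The point it illustrates is that a string always means a label and an integer always means a position, so the two never collide.

```python
import numpy as np


class TinyFrame:
    """Hypothetical stand-in for a DataFrame: a dict of equal-length NumPy columns."""

    def __init__(self, data):
        self.columns = list(data)
        self.data = {name: np.asarray(col) for name, col in data.items()}

    def _resolve_columns(self, cols):
        # A str is a label, an int is a position, a slice is positional (assumed).
        if isinstance(cols, str):
            return [cols]
        if isinstance(cols, int):
            return [self.columns[cols]]
        if isinstance(cols, slice):
            return self.columns[cols]
        return [c if isinstance(c, str) else self.columns[c] for c in cols]

    def __getitem__(self, key):
        # Selection is explicit: both a row and a column selector are required.
        if not isinstance(key, tuple) or len(key) != 2:
            raise TypeError("use frame[rows, cols]; both selectors are required")
        rows, cols = key
        if isinstance(rows, int):
            rows = [rows]  # keep the result two-dimensional
        names = self._resolve_columns(cols)
        return TinyFrame({name: self.data[name][rows] for name in names})


frame = TinyFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
subset = frame[0:2, ["a", 2]]   # first two rows; column "a" by label, "c" by position
single = frame[1, "b"]          # one row, one column, still a frame
```

Writing frame["a"] raises a TypeError in this sketch, which mirrors the rule that rows and columns must both be given.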

Development

Must use type hints
Must use Python 3.6+ (for f-strings)
Uses NumPy

Advantages over pandas

Easier to write idiomatically
String processing will be much faster
Nulls allowed in each data type
Nearly all operations will be faster

API

Attributes

[x] size
[x] shape
[x] values
[x] dtypes

Methods

Stats

Selection

[x] drop
[x] head
[x] isin
[x] rename
[x] sample
[x] select_dtypes
[x] tail
[x] where

Missing Data

[x] isna
[x] dropna
[x] fillna
[ ] interpolate
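
For readers unfamiliar with these operations, the sketch below shows what isna, fillna and dropna compute on a single float column, using plain NumPy with NaN as the null marker. It illustrates the concepts only. It does not use dexplo's API, and the variable names are made up for the example.

```python
import numpy as np

col = np.array([1.0, np.nan, 3.0, np.nan])

mask = np.isnan(col)               # isna: True where the value is missing
filled = np.where(mask, 0.0, col)  # fillna: replace missing values with 0.0
kept = col[~mask]                  # dropna: keep only the non-missing values
```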

Other

[x] append
[x] astype
[x] factorize
[x] groupby
[x] iterrows
[ ] join
[x] melt
[x] pivot
[x] replace
[x] rolling
[x] sort_values
[x] to_csv

Other (after 0.1 release)

[ ] cut
[ ] plot
[ ] profile

Functions

[x] read_csv
[ ] read_sql
[ ] concat

Group By - methods available after calling groupby

[x] agg
[x] all
[x] apply
[x] any
[x] corr
[x] count
[x] cov
[x] cumcount
[x] cummax
[x] cummin
[x] cumsum
[x] cumprod
[x] head
[x] first
[ ] fillna
[x] filter
[x] last
[x] max
[x] median
[x] min
[x] ngroups
[x] nunique
[x] prod
[ ] quantile
[ ] rank
[ ] rolling
[x] size
[x] sum
[x] tail
[x] var
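
One common way to implement grouped aggregations like those listed above is to factorize the key column into integer codes and then reduce the values per code. The sketch below does this for a grouped sum in plain NumPy. It is an assumption about the general technique, not dexplo's implementation, and group_sum is a hypothetical helper.

```python
import numpy as np


def group_sum(keys, values):
    """Sum `values` within each distinct key; groups come back in sorted order."""
    keys = np.asarray(keys)
    values = np.asarray(values, dtype=float)
    # Factorize: `groups` holds the unique keys, `codes` maps each row to its group.
    groups, codes = np.unique(keys, return_inverse=True)
    # Add each value into the slot of its group code.
    sums = np.bincount(codes, weights=values, minlength=len(groups))
    return groups, sums


groups, sums = group_sum(["x", "y", "x", "y", "x"], [1, 2, 3, 4, 5])
```

Because np.unique sorts its output, the groups here appear in sorted order, not in order of first appearance.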

str - df.str.<method>

dt - df.dt.<method>

td - df.td.<method>

[ ] ceil
[ ] components
[x] days
[ ] floor
[ ] freq
[x] microseconds
[x] milliseconds
[x] nanoseconds
[ ] round
[x] seconds
[ ] to_pytimedelta