formulas: An Excel formulas interpreter in Python.¶
release: | 0.3.0 |
---|---|
date: | 2019-04-24 13:30:00 |
repository: | |
pypi-repo: | |
docs: | |
wiki: | |
download: | |
donate: | |
keywords: | excel, formulas, interpreter, compiler, dispatch |
developers: |
|
license: |
What is formulas?¶
formulas implements an interpreter for Excel formulas, which parses and compile Excel formulas expressions.
Moreover, it compiles Excel workbooks to python and executes without using the Excel COM server. Hence, Excel is not needed.
Installation¶
To install it use (with root privileges):
$ pip install formulas
Or download the last git version and use (with root privileges):
$ python setup.py install
Install extras¶
Some additional functionality is enabled installing the following extras:
- excel: enables to compile Excel workbooks to python and execute using:
ExcelModel
. - plot: enables to plot the formula ast and the Excel model.
To install formulas and all extras, do:
$ pip install formulas[all]
What is formulas?¶
formulas implements an interpreter for Excel formulas, which parses and compile Excel formulas expressions.
Moreover, it compiles Excel workbooks to python and executes without using the Excel COM server. Hence, Excel is not needed.
Installation¶
To install it use (with root privileges):
$ pip install formulas
Or download the last git version and use (with root privileges):
$ python setup.py install
Install extras¶
Some additional functionality is enabled installing the following extras:
- excel: enables to compile Excel workbooks to python and execute using:
ExcelModel
. - plot: enables to plot the formula ast and the Excel model.
To install formulas and all extras, do:
$ pip install formulas[all]
Basic Examples¶
The following sections will show how to:
- parse a Excel formulas;
- load, compile, and execute a Excel workbook;
- extract a sub-model from a Excel workbook;
- add a custom function.
Parsing formula¶
An example how to parse and execute an Excel formula is the following:
>>> import formulas
>>> func = formulas.Parser().ast('=(1 + 1) + B3 / A2')[1].compile()
To visualize formula model and get the input order you can do the following:
>>> list(func.inputs) ['A2', 'B3'] >>> func.plot(view=False) # Set view=True to plot in the default browser. SiteMap([(=((1 + 1) + (B3 / A2)), SiteMap())])
Finally to execute the formula and plot the workflow:
>>> func(1, 5) Array(7.0, dtype=object) >>> func.plot(workflow=True, view=False) # Set view=True to plot in the default browser. SiteMap([(=((1 + 1) + (B3 / A2)), SiteMap())])
Excel workbook¶
An example how to load, calculate, and write an Excel workbook is the following:
>>> import formulas
>>> fpath = 'file.xlsx'
>>> xl_model = formulas.ExcelModel().loads(fpath).finish()
>>> xl_model.calculate()
Solution(...)
>>> xl_model.write()
{'EXCEL.XLSX': {Book: <openpyxl.workbook.workbook.Workbook ...>}}
Tip
If you have or could have circular references, add circular=True to finish method.
To plot the dependency graph that depict relationships between Excel cells:
>>> dsp = xl_model.dsp >>> dsp.plot(view=False) # Set view=True to plot in the default browser. SiteMap([(ExcelModel, SiteMap())])
To overwrite the default inputs that are defined by the excel file or to impose some value to a specific cell:
>>> xl_model.calculate(
... inputs={
... "'[EXCEL.XLSX]DATA'!A2": 3, # To overwrite the default value.
... "'[EXCEL.XLSX]DATA'!B3": 1 # To impose a value to B3 cell.
... },
... outputs=[
... "'[EXCEL.XLSX]DATA'!C2", "'[EXCEL.XLSX]DATA'!C4"
... ] # To define the outputs that you want to calculate.
... )
Solution([("'[EXCEL.XLSX]DATA'!A2", <Ranges>('[EXCEL.XLSX]DATA'!A2)=[[3]]),
("'[EXCEL.XLSX]DATA'!A3", <Ranges>('[EXCEL.XLSX]DATA'!A3)=[[6]]),
("'[EXCEL.XLSX]DATA'!B3", <Ranges>('[EXCEL.XLSX]DATA'!B3)=[[1]]),
("'[EXCEL.XLSX]DATA'!B2", <Ranges>('[EXCEL.XLSX]DATA'!B2)=[[9.0]]),
("'[EXCEL.XLSX]DATA'!C2", <Ranges>('[EXCEL.XLSX]DATA'!C2)=[[9.0]]),
("'[EXCEL.XLSX]DATA'!C4", <Ranges>('[EXCEL.XLSX]DATA'!C4)=[[1.0]])])
To build a single function out of an excel model with fixed inputs and outputs, you can use the compile method of the ExcelModel that returns a DispatchPipe. This is a function where the inputs and outputs are defined by the data node ids (i.e., cell references).
>>> func = xl_model.compile( ... inputs=[ ... "'[EXCEL.XLSX]DATA'!A2", # First argument of the function. ... "'[EXCEL.XLSX]DATA'!B3" # Second argument of the function. ... ], # To define function inputs. ... outputs=[ ... "'[EXCEL.XLSX]DATA'!C2", "'[EXCEL.XLSX]DATA'!C4" ... ] # To define function outputs. ... ) >>> func <schedula.utils.dsp.DispatchPipe object at ...> >>> [v.value[0, 0] for v in func(3, 1)] # To retrieve the data. [9.0, 1.0] >>> func.plot(view=False) # Set view=True to plot in the default browser. SiteMap([(ExcelModel, SiteMap())])
Custom functions¶
An example how to add a custom function to the formula parser is the following:
>>> import formulas
>>> FUNCTIONS = formulas.get_functions()
>>> FUNCTIONS['MYFUNC'] = lambda x, y: 1 + y + x
>>> func = formulas.Parser().ast('=MYFUNC(1, 2)')[1].compile()
>>> func()
4
Next moves¶
Things yet to do: implement the missing Excel formulas.
Contributing to formulas¶
If you want to contribute to formulas and make it better, your help is very welcome. The contribution should be sent by a pull request. Next sections will explain how to implement and submit a new excel function:
- clone the repository
- implement a new function/functionality
- open a pull request
Clone the repository¶
The first step to contribute to formulas is to clone the repository:
- Create a personal fork of the formulas repository on Github.
- Clone the fork on your local machine.
Your remote repo on Github is called
origin
. - Add
the original repository as a remote called
upstream
, to maintain updated your fork. - If you created your fork a while ago be sure to pull
upstream
changes into your local repository. - Create a new branch to work on! Branch from
dev
.
How to implement a new function¶
Before coding, study
the Excel function that you want to implement. If there is something similar
implemented in formulas, try to get inspired by the implemented code (I mean,
not reinvent the wheel) and to use numpy
. Follow the code style of the
project, including indentation. Add or change the documentation as needed.
Make sure that you have implemented the full function syntax, including the
array syntax.
Test cases are very important. This library uses a data-driven testing approach.
To implement a new function I recommend the test-driven development cycle. Hence, when you implement a new function,
you should write new test cases in test_cell/TestCell.test_output
suite to
execute in the cycle loop. When you think that the code is ready, add new raw
test in test/test_files/test.xlsx
(please follow the standard used for other
functions) and run the test_excel/TestExcelModel.test_excel_model
. This
requires more time but is needed to test the array syntax and to check if
the Excel documentation respects the reality.
When all test cases are ok (python setup.py test
), open a pull request.
Do do list:
- Study the excel function syntax and behaviour when used as array formula.
- Check if there is something similar implemented in formulas.
- Implement/fix your feature, comment your code.
- Write/adapt tests and run them!
Tip
Excel functions are categorized by their functionality. If you are
implementing a new functionality group, add a new module in
formula/function
and in formula.function.SUBMODULES
and a new
worksheet in test/test_files/test.xlsx
(please respect the format).
Note
A pull request without new test case will not be taken into consideration.
How to open a pull request¶
Well done! Your contribution is ready to be submitted:
- Squash your commits into a single commit with git’s interactive rebase. Create a new branch if necessary. Always write your commit messages in the present tense. Your commit message should describe what the commit, when applied, does to the code – not what you did to the code.
- Push your branch to
your fork on Github (i.e.,
git push origin dev
). - From your fork open a pull request in the correct branch.
Target the project’s
dev
branch! - Once the pull request is approved and merged you can pull the changes from
upstream
to your local repo and delete your extra branch(es).
Donate¶
If you want to support the formulas development please donate and add your excel function preferences. The selection of the functions to be implemented is done considering the cumulative donation amount per function collected by the campaign.
Note
The cumulative donation amount per function is calculated as the example:
Function | Donator 1 | Donator 2 | Donator 3 | TOT | Implementation order |
---|---|---|---|---|---|
150€ | 120€ | 50€ | |||
SUM | 50€ | 40€ | 25€ | 125€ | 1st |
SIN | 50€ | 25€ | 75€ | 3rd | |
TAN | 50€ | 40€ | 90€ | 2nd | |
COS | 40€ | 40€ | 4th |
API Reference¶
The core of the library is composed from the following modules:
It contains a comprehensive list of all modules and classes within formulas.
Modules:
parser |
It provides formula parser class. |
builder |
It provides AstBuilder class. |
errors |
Defines the formulas exception. |
tokens |
It provides tokens needed to parse the Excel formulas. |
functions |
It provides functions implementations to compile the Excel functions. |
ranges |
It provides Ranges class. |
cell |
It provides Cell class. |
excel |
It provides Excel model class. |
Changelog¶
v0.3.0 (2019-04-24)¶
Feat¶
Fix¶
- (tokens): Correct string nodes ids format adding “.
- (ranges): Correct behaviour union of ranges.
- (import): Enable PyCharm autocomplete.
- (import): Save imports.
- (test): Add repo path to system path.
- (parser): Parse empty args for functions.
- (functions) #30: Correct implementation of GCD and LCM.
- (ranges) #24: Enable full column and row reference.
- (excel): Correct bugs due to new openpyxl.
v0.2.0 (2018-12-11)¶
Fix¶
- (core): Add python 3.7 and drop python 3.4.
- (excel): Make ExcelModel dillable and pickable.
- (builder): Avoid FormulaError exception during formulas compilation.
- (excel): Correct bug when compiling excel with circular references.
v0.1.3 (2018-10-09)¶
Fix¶
- (functions) #18: Enable check_error in IF function just for the first argument.
- (functions) #18: Disable input_parser in IF function to return any type of values.
- (rtd): Define fpath from prj_dir for rtd.
- (rtd): Add missing requirements openpyxl for rtd.
- (setup): Patch to use sphinxcontrib.restbuilder in setup long_description.
Other¶
- Update documentation.
- Replace excel with Excel.
- Create PULL_REQUEST_TEMPLATE.md.
- Update issue templates.
- Update copyright.
- (doc): Update author mail.
v0.1.2 (2018-09-12)¶
Feat¶
- (functions) #14: Add ROW and COLUMN.
- (cell): Pass cell reference when compiling cell + new function struct with dict to add inputs like CELL.
Fix¶
- (ranges): Replace system max size with excel max row and col.
- (tokens): Correct number regex.
v0.1.1 (2018-09-11)¶
Feat¶
- (contrib): Add contribution instructions.
- (setup): Add additional project_urls.
- (setup): Update Development Status to 4 - Beta.
v0.0.10 (2018-06-05)¶
Feat¶
- (look): Simplify _get_type_id function.
Fix¶
- (functions): Correct ImportError for FUNCTIONS.
- (operations): Correct behaviour of the basic operations.
v0.0.9 (2018-05-28)¶
Feat¶
- (excel): Improve performances pre-calculating the range format.
- (core): Improve performances using DispatchPipe instead SubDispatchPipe when compiling formulas.
- (function): Improve performances setting errstate outside vectorization.
- (core): Improve performances of range2parts function (overall 50% faster).
v0.0.8 (2018-05-23)¶
Feat¶
- (functions): Add MATCH, LOOKUP, HLOOKUP, VLOOKUP functions.
- (excel): Add method to compile ExcelModel.
- (travis): Run coveralls in python 3.6.
- (functions): Add FIND,`LEFT`,`LEN`,`LOWER`,`MID`,`REPLACE`,`RIGHT`,`TRIM`, and`UPPER` functions.
- (functions): Add IRR function.
- (formulas): Custom reshape to Array class.
- (functions): Add ISO.CEILING, SQRTPI, TRUNC functions.
- (functions): Add ROUND, ROUNDDOWN, ROUNDUP, SEC, SECH, SIGN functions.
- (functions): Add DECIMAL, EVEN, MROUND, ODD, RAND, RANDBETWEEN functions.
- (functions): Add FACT and FACTDOUBLE functions.
- (functions): Add ARABIC and ROMAN functions.
- (functions): Parametrize function wrap_ufunc.
- (functions): Split function raise_errors adding get_error function.
- (ranges): Add custom default and error value for defining ranges Arrays.
- (functions): Add LOG10 function + fix LOG.
- (functions): Add CSC and CSCH functions.
- (functions): Add COT and COTH functions.
- (functions): Add FLOOR, FLOOR.MATH, and FLOOR.PRECISE functions.
- (test): Improve log message of test cell.
Fix¶
- (rtd): Update installation file for read the docs.
- (functions): Remove unused functions.
- (formulas): Avoid too broad exception.
- (functions.math): Drop scipy dependency for calculate factorial2.
- (functions.logic): Correct error behaviour of if and iferror functions + add BroadcastError.
- (functions.info): Correct behaviour of iserr function.
- (functions): Correct error behaviour of average function.
- (functions): Correct iserror and iserr returning a custom Array.
- (functions): Now xceiling function returns np.nan instead Error.errors[‘#NUM!’].
- (functions): Correct is_number function, now returns False when number is a bool.
- (test): Ensure same order of workbook comparisons.
- (functions): Correct behaviour of min max and int function.
- (ranges): Ensure to have a value with correct shape.
- (parser): Change order of parsing to avoid TRUE and FALSE parsed as ranges or errors as strings.
- (function):Remove unused kwargs n_out.
- (parser): Parse error string as formulas.
- (readme): Remove downloads_count because it is no longer available.
v0.0.7 (2017-07-20)¶
Fix¶
- (install): Force update setuptools>=36.0.1.
- (functions): Correct iserror iserr functions.
- (ranges): Replace ‘#N/A’ with ‘’ as empty value when assemble values.
- (functions) #4: Remove check in ufunc when inputs have different size.
- (functions) #4: Correct power, arctan2, and mod error results.
- (functions) #4: Simplify ufunc code.
- (test) #4: Check that all results are in the output.
- (functions) #4: Correct atan2 argument order.
- (range) #5: Avoid parsing function name as range when it is followed by (.
- (operator) #3: Replace strip with replace.
- (operator) #3: Correct valid operators like ^- or *+.
Other¶
- Made the ufunc wrapper work with multi input functions, e.g., power, mod, and atan2.
- Created a workbook comparison method in TestExcelModel.
- Added MIN and MAX to the test.xlsx.
- Cleaned up the ufunc wrapper and added min and max to the functions list.
- Relaxed equality in TestExcelModel and made some small fixes to functions.py.
- Added a wrapper for numpy ufuncs, mapped some Excel functions to ufuncs and provided tests.
v0.0.6 (2017-05-31)¶
Fix¶
- (plot): Update schedula to 0.1.12.
- (range): Sheet name without commas has this [^Wd][w.] format.