Differences

This shows you the differences between two versions of the page.

--- round_vs._format [2023/02/13 23:07] – raju
+++ print_hundredths [2023/02/14 23:22] – [align hundredths column with spaces] raju
@@ Line 1: / Line 1: @@
-===== round vs. format =====
+===== print hundredths =====
-==== write simple data ====
+Let's define hundredths as numbers with two digits after the decimal point, for example money amounts in dollars and cents.
-<code>
-$ ipython
+tags | pennies, dollar-cent amounts, print two digits after decimal
-Python 3.10.9 | packaged by conda-forge | (main, Jan 11 2023, 15:15:40) [MSC v.1916 64 bit (AMD64)]
-IPython 8.8.0 -- An enhanced Interactive Python. Type '?' for help.
+==== write single numbers ====
+There are two ways to do this: round() or a format expression. I prefer the format expression because it always gives the same number of digits after the decimal point.
+^ ^ round ^ format ^
+| output type | float | string |
+| ::: | <code>
 In [1]:
 a = 10.30467
 In [2]:
-'{:.2f}'.format(a)
+type(round(a,2))
 Out[2]:
-'10.30'
+float
 In [3]:
@@ Line 18: / Line 22: @@
 Out[3]:
 str
+</code> ||
+| number of digits after the decimal point | varies | always two |
+| ::: | <code>
 In [4]:
-round(a,2)
+a = 10.30467
-Out[4]:
-.3
 In [5]:
-type(round(a,2))
+round(a,2)
 Out[5]:
-float
+10.3
-</code>
+In [6]:
+'{:.2f}'.format(a)
+Out[6]:
+'10.30'
+</code> ||
+Tested with | Python 3.10.9, ipython 8.8.0
+tags | round vs. format
+==== write dataframe to csv files ====
+If you round the data and dump it into a csv file, the values do not align around the decimal point. The result is also difficult to align using command line tools.
-Conclusions:
+On the other hand, if the data is formatted using a format expression, the csv output still does not align, but it can be aligned using command line tools.
-  * The output of round is a floating point number. The output of format is a string
-  * To output dollars and pennies, format expression is better than rounding as it always gives the same number of digits after the decimal point.
-==== Write data into csv files ====
+For example, consider
-Create some sample data
 <code>
 $ ipython
 Python 3.10.9 | packaged by conda-forge | (main, Jan 11 2023, 15:15:40) [MSC v.1916 64 bit (AMD64)]
+Type 'copyright', 'credits' or 'license' for more information
 IPython 8.8.0 -- An enhanced Interactive Python. Type '?' for help.
 In [1]:
 import pandas as pd
-df = pd.DataFrame({'symbol': ['A', 'B', 'C', 'D'], 'price': [8.222, 7.007, 3.971, 9.801], 'change': [6.601, 7.241, -9.341, 48.001]})
+df = pd.DataFrame({
+  'symbol': ['A', 'B', 'C', 'D'],
+  'price': [8.222, 7.007, 3.971, 9.801],
+  'change': [6.601, 7.241, -9.341, 48.001]})
 df
 Out[1]:
@@ Line 53: / Line 71: @@
 </code>
-If you round and dump the data to a file, it does not align around the decimal point
+If it is rounded and dumped into a csv file
 <code>
 In [2]:
 df.round({'price':2, 'change': 2}).to_csv('x/foo1.csv', index=False, lineterminator='\n')
 </code>
+The result does not align
 <code>
 $ cat ~/x/foo1.csv
@@ Line 66: / Line 84: @@
 C,3.97,-9.34
 D,9.8,48.0
+</code>
+and can't easily be aligned using other command line tools
+<code>
 $ cat ~/x/foo1.csv | column -t -s, -R 2,3
 symbol  price  change
@@ Line 75: / Line 95: @@
 </code>
-But if we format the data, it can be aligned easily
+However, if a format expression is used
 <code>
 In [3]:
@@ Line 84: / Line 104: @@
 df2.to_csv('x/foo2.csv', index=False, lineterminator='\n')
 </code>
+the result still does not align
 <code>
 $ cat ~/x/foo2.csv
@@ Line 92: / Line 112: @@
 C,3.97,-9.34
 D,9.80,48.00
+</code>
+but can be aligned using command line tools
+<code>
 $ cat ~/x/foo2.csv | column -t -s, -R 2,3
 symbol  price  change
@@ Line 103: / Line 125: @@
 Ref:- https://stackoverflow.com/questions/20003290/output-different-precision-by-column-with-pandas-dataframe-to-csv - shows how to format different columns with different precision.
-tags | print two digits after decimal, float_format by column
+tags | round vs. format, float_format by column
+==== align hundredths column with spaces ====
+Use the following helper, which adds a to_fwf method to DataFrame
+<code>
+import pandas as pd
+from tabulate import tabulate
+def to_fwf(df, fname):
+    content = tabulate(df.values.tolist(), list(df.columns), tablefmt="plain")
+    with open(fname, "w") as FileObj:
+        FileObj.write(content)
+pd.DataFrame.to_fwf = to_fwf
+</code>
+For example, consider
+<code>
+$ ipython
+Python 3.10.9 | packaged by conda-forge | (main, Jan 11 2023, 15:15:40) [MSC v.1916 64 bit (AMD64)]
+Type 'copyright', 'credits' or 'license' for more information
+IPython 8.8.0 -- An enhanced Interactive Python. Type '?' for help.
+In [1]:
+import pandas as pd
+df = pd.DataFrame({
+  'symbol': ['A', 'B', 'C', 'D'],
+  'price': [8.222, 7.007, 3.971, 9.801],
+  'change': [6.601, 7.241, -9.341, 48.001]})
+df
+Out[1]:
+  symbol  price  change
+      A  8.222   6.601
+      B  7.007   7.241
+      C  3.971  -9.341
+      D  9.801  48.001
+In [2]:
+import pandas as pd
+from tabulate import tabulate
+def to_fwf(df, fname):
+    content = tabulate(df.values.tolist(), list(df.columns), tablefmt="plain")
+    with open(fname, "w") as FileObj:
+        FileObj.write(content)
+pd.DataFrame.to_fwf = to_fwf
+</code>
+round the data and dump it
+<code>
+In [3]:
+df.round({'price':2, 'change': 2}).to_fwf('x/foo3.txt')
+</code>
+the result is aligned and space separated
+<code>
+$ cat ~/x/foo3.txt
+symbol      price    change
+A            8.22      6.6
+B            7.01      7.24
+C            3.97     -9.34
+D            9.8      48
+</code>
+You can also do it using a format expression
+<code>
+In [4]:
+formats = {'price': '{:.2f}', 'change': '{:.2f}'}
+df2 = df.copy()
+for col, f in formats.items():
+    df2[col] = df2[col].apply(lambda x: f.format(x))
+df2.to_fwf('x/foo4.txt')
+</code>
+which gives the same result, because tabulate by default parses number-like strings back into numbers
+<code>
+$ cat ~/x/foo4.txt
+symbol      price    change
+A            8.22      6.6
+B            7.01      7.24
+C            3.97     -9.34
+D            9.8      48
+</code>
+See also:-
+  * https://stackoverflow.com/a/35974742 - initial version of the function is from here.
+  * To see it in action
+    * https://github.com/KamarajuKusumanchi/market_data_processor/blob/master/src/utils/DataFrameUtils.py - I generalized the original version to print index if needed.
+    * https://github.com/KamarajuKusumanchi/market_data_processor/blob/master/tests/src/utils/test_DataFrameUtils.py - test cases
+  * https://pypi.org/project/tabulate/
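
The round vs. format comparison and the alignment idea above can also be checked without pandas or tabulate. The following is only a minimal sketch using the standard library; the helper names format_hundredths and align_rows are hypothetical and do not come from the page or from any library. It formats each number with '{:.2f}' and right-aligns the columns, so every value keeps two digits after the decimal point and the decimal points line up.

```python
def format_hundredths(value):
    """Return value as a string with exactly two digits after the decimal point."""
    return '{:.2f}'.format(value)


def align_rows(header, rows):
    """Hypothetical helper: build fixed-width lines from a header and data rows.

    The first column is treated as text and left-aligned. All other columns
    are formatted as hundredths and right-aligned, which lines up the
    decimal points because every cell has two digits after the point.
    """
    # Format the numeric cells first, then size each column to its widest cell.
    table = [list(header)] + [
        [row[0]] + [format_hundredths(v) for v in row[1:]] for row in rows
    ]
    widths = [max(len(r[i]) for r in table) for i in range(len(header))]
    lines = []
    for r in table:
        cells = [r[0].ljust(widths[0])] + [
            c.rjust(w) for c, w in zip(r[1:], widths[1:])
        ]
        lines.append('  '.join(cells))
    return lines


# Same sample data as the DataFrame used on this page.
rows = [('A', 8.222, 6.601), ('B', 7.007, 7.241),
        ('C', 3.971, -9.341), ('D', 9.801, 48.001)]
lines = align_rows(('symbol', 'price', 'change'), rows)
print('\n'.join(lines))
```

Right-aligning only works here because the format expression fixes the number of digits after the decimal point; with round() the cells would have different lengths after the point and would not line up.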