
Goal

Learn how to read in text data, from a file either in the same folder or in a different folder.

Additional Information

Data input is often underestimated, yet it is something you will have to do very often. In my experience, the most flexible way to handle most data input is to use df = pd.read_csv from the Python package pandas, and then extract the numpy values with df.values.
For more information, have a look at


Read in a data-file in the same directory

  1. Save the data set 2022_antarctic_mass_loss.csv in a directory of your choice. This file contains the time, the mass of Antarctic ice, and the corresponding uncertainty. The data were collected by Wiese et al. (2019) and have been taken from EOSDIS Earthdata.
  2. Have a look at this file with a good text-editor.
  3. Start Jupyter (QtConsole or JupyterLab)
  4. Change to that directory with cd <data_dir>
  5. Generate the variable file_name by typing
    file_name = '2022
    and then pressing the Tab key. This should auto-complete the filename to
    file_name = '2022_antarctic_mass_loss.csv
    Close this string with ' to get
    file_name = '2022_antarctic_mass_loss.csv'
  6. Read in the data with np.loadtxt, taking into consideration that the first 31 lines are a header. Note: In IPython you can get help on a command by typing a ? at the end of the command. For example, you can get help on np.loadtxt by typing np.loadtxt? .
  7. Plot the second column as a function of the first column. (All common plot commands are in matplotlib.pyplot, which is commonly abbreviated as plt.)
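
The reading step above can be sketched as follows. This is only a minimal illustration, not the official solution: so that it runs anywhere, it first writes a small stand-in file with 31 header lines and three invented data rows (the file contents, the "HDR" header text, and the name N_HEADER are assumptions made for this sketch). With the real file you would skip that part and pass file_name = '2022_antarctic_mass_loss.csv' directly.

```python
import tempfile

import numpy as np

N_HEADER = 31  # number of header lines, as stated in step 6

# Build a stand-in file: 31 header lines, then three whitespace-separated
# columns (time, mass, uncertainty). The numbers are invented placeholders.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as fh:
    for i in range(N_HEADER):
        fh.write(f"HDR line {i + 1}\n")
    fh.write("2002.0 0.0 100.0\n")
    fh.write("2002.5 -10.0 90.0\n")
    fh.write("2003.0 -25.0 80.0\n")
    file_name = fh.name

# skiprows tells np.loadtxt how many leading lines to ignore
data = np.loadtxt(file_name, skiprows=N_HEADER)
print(data.shape)  # (3, 3) for the stand-in file

# Columns are selected with slicing: first column = time, second = mass
time = data[:, 0]
mass = data[:, 1]
```

With matplotlib.pyplot imported as plt, plt.plot(time, mass) then draws the second column as a function of the first.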

Read in a data-file in a different directory

  • Move with cd to a different folder.
  • Define the variable
    data_dir = <...>
  • and combine it with the file name to form the full path
      import os
      in_file = os.path.join(data_dir, file_name)
      data = np.loadtxt(in_file, ....)
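
As a small illustration of how os.path.join builds the full path: the folder used below is purely hypothetical, so replace it with wherever you saved the data. The function inserts the correct separator for your operating system, which is why it is preferable to gluing strings together by hand.

```python
import os

# Hypothetical data folder, chosen only for this illustration
data_dir = os.path.join("path", "to", "data")
file_name = "2022_antarctic_mass_loss.csv"

# Combine folder and file name into one path
in_file = os.path.join(data_dir, file_name)
print(in_file)

# Check that the file really exists before trying to read it
print(os.path.isfile(in_file))

# The absolute path shows where Python will actually look
print(os.path.abspath(in_file))
```

If os.path.isfile returns False, the most likely cause is a typo in data_dir or a current folder different from the one you expected.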
                          

Read in a pandas DataFrame

For data input, output, and data manipulation, pandas is the best tool. pandas has its roots in databases and, as a result, uses a different syntax than numpy. The most common data structure in pandas is the DataFrame.
  • Read in a DataFrame with the Antarctic ice mass, with
      import pandas as pd 
      df = pd.read_csv(in_file, skiprows=31, sep=r'\s+', header=None)
                          
  • Compare the data type of the new variable with that of the numpy array read in earlier, with
      type(df)
      type(data)
                            
  • Check if you have read in all data, with df.head() and df.tail().
  • Check if you have read in the correct number of columns, by typing df.columns
  • Extract a numpy array with the data values from the DataFrame, with
    data_values = df.values
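
The pandas steps above can be tried out with the following minimal sketch. It again uses a small stand-in file with invented numbers instead of the real data, and the column names are chosen here only for readability. It passes sep=r"\s+", which newer pandas versions recommend in place of delim_whitespace=True, and it uses df.to_numpy(), which returns the same array as df.values.

```python
import tempfile

import numpy as np
import pandas as pd

N_HEADER = 31  # number of header lines to skip

# Stand-in file with invented placeholder values (not the real data)
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as fh:
    fh.writelines(f"HDR line {i + 1}\n" for i in range(N_HEADER))
    fh.write("2002.0 0.0 100.0\n2002.5 -10.0 90.0\n2003.0 -25.0 80.0\n")
    in_file = fh.name

# Whitespace-separated columns, no column names in the file itself
df = pd.read_csv(in_file, skiprows=N_HEADER, sep=r"\s+", header=None)

# Without a header row the columns are simply numbered 0, 1, 2;
# the names below are an assumption made for this sketch
df.columns = ["time", "mass", "uncertainty"]

print(type(df))
print(df.head())

# Extract the values as a plain numpy array
data_values = df.to_numpy()
print(isinstance(data_values, np.ndarray), data_values.shape)
```

Named columns let you write df["mass"] instead of remembering that the mass is in the second column.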

Data input in a Python program

Write a running, documented program (a .py-file) that performs the steps described above, for pandas DataFrames. The only step that you really need to change is that you have to terminate a plot with plt.show().

Note: In programs you can NOT use cd to change directories, but have to use os.chdir( ... )!
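
One possible skeleton for such a program is shown below. It is a sketch under stated assumptions, not the reference solution: the function names, the column names, and the placeholder data_dir = "." are all choices made here, and the program only prints a message if the data file is not found.

```python
"""Read the Antarctic ice-mass data into a pandas DataFrame and plot it."""

import os

import matplotlib.pyplot as plt
import pandas as pd

N_HEADER = 31  # number of header lines stated in the exercise


def read_mass_data(in_file):
    """Return a DataFrame with the whitespace-separated data in in_file."""
    df = pd.read_csv(in_file, skiprows=N_HEADER, sep=r"\s+", header=None)
    df.columns = ["time", "mass", "uncertainty"]  # names chosen for this sketch
    return df


def main(data_dir, file_name):
    """Change to data_dir, read file_name, and plot mass against time."""
    os.chdir(data_dir)  # a program cannot use the IPython command "cd"
    df = read_mass_data(file_name)
    plt.plot(df["time"], df["mass"])
    plt.xlabel("Time")
    plt.ylabel("Mass")
    plt.show()  # in a program, this is what displays the figure


if __name__ == "__main__":
    data_dir = "."  # placeholder: replace with your data folder
    file_name = "2022_antarctic_mass_loss.csv"
    if os.path.isfile(os.path.join(data_dir, file_name)):
        main(data_dir, file_name)
    else:
        print(f"{file_name} not found in {os.path.abspath(data_dir)}")
```

Putting the work into functions and calling them under the __name__ check keeps the file usable both as a program and as a module that you can import from elsewhere.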

Solution