{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab 6: Pandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this lab, we'll work through some of the basics of using Pandas, using a few different tabular data sets. Ultimately, one need not do anything particularly fancy with DataFrames for them to be useful as data containers. But we would like to highlight a few extra abilities these objects have that illustrate situations where we have a strong reason to use pandas over another library. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 1: HII regions + Planetary Nebulae measurements in M81\n", "\n", "For our first data set, we're going to look at a file (`table2.dat`), which contains measurements of the flux and intensity of various ions' line emission from a set of known emitting objects (PNs and HII regions) in the M81 galaxy. \n", "\n", "The columns of this file are `name`, `ion`, `wl`, `flux`, and `I` (intensity). Two of the columns are string-valued (name and ion), and three are numerically valued (wl, flux, I). This mix of strings and floats tells us, before we even decide how to read in this file, that a standard `numpy` array is a poor fit, since it requires all values in an array to share the same `dtype`. \n", "\n", "### Problem 1.1 \n", "\n", "Using the `pd.read_csv()` function shown in the lecture, read this data file into a dataframe called `df`, and print it. \n", "```{hint}\n", "You can get a \"pretty\" visualization of a dataframe by simply typing its name into a jupyter cell -- as long as it's the last line of the cell, the dataframe will print more nicely than typing `print(df)`. 
This does not work outside of notebooks.\n", "```" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "%matplotlib inline \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Problem 1.2 \n", "\n", "Though it doesn't show up in the clean representation above, the strings in the name and ion columns have leading and trailing spaces that we don't want. \n", "\n", "Use a *list comprehension* to modify the dataframe such that each value in the name and ion columns is replaced with a `.strip()` version of itself." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Problem 1.3 \n", "\n", "Write a function `select_object` which takes in as an argument the name of an HII region or planetary nebula, and filters the dataframe for only the entries for that object using `df.loc[]`. Consider making the dataframe an optional argument that defaults to `df`, the dataframe we are \n", "working with.\n", "\n", "Have your function take in an optional argument `drop_empty=True` which additionally selects only those rows where the flux/intensity is **not** zero." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
| | name | ion | wl | flux | rms | I |
|---|---|---|---|---|---|---|
| 0 | PN3m | [OII] | 3727 | 373.9 | 58.6 | 517.3 |
| 8 | PN3m | HI | 4340 | 50.0 | 3.5 | 58.0 |
| 15 | PN3m | HI | 4861 | 100.0 | 4.5 | 100.0 |
| 16 | PN3m | [OIII] | 4959 | 35.4 | 3.8 | 34.4 |
| 17 | PN3m | [OIII] | 5007 | 104.2 | 5.2 | 99.9 |
| 19 | PN3m | [NII] | 5755 | 1.3 | 0.3 | 1.1 |
| 20 | PN3m | HeI | 5876 | 9.1 | 0.3 | 7.2 |
| 24 | PN3m | [NII] | 6548 | 59.8 | 3.5 | 42.2 |
| 25 | PN3m | HI | 6563 | 412.0 | 6.9 | 290.1 |
| 26 | PN3m | [NII] | 6584 | 142.5 | 4.5 | 100.0 |
| 28 | PN3m | [SII] | 6717 | 60.3 | 3.8 | 41.4 |
| 29 | PN3m | [SII] | 6731 | 44.3 | 3.4 | 30.4 |
\n", "
\n", "
| | id | x | y | ra | dec | faper_f160w | eaper_f160w | faper_f140w | eaper_f140w | f_f160w | ... | irac2_contam | irac3_contam | irac4_contam | contam_flag | f140w_flag | use_phot | near_star | nexp_f125w | nexp_f140w | nexp_f160w |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 11876.639 | 1632.890 | 53.093012 | -27.954546 | 55.142755 | 0.046190 | -99.000000 | -99.000000 | 152.454867 | ... | 0.000031 | 0.000187 | 0.001174 | 0 | 0 | 1 | 0 | 4.0 | 0.0 | 4.0 |
| 1 | 2 | 12056.715 | 1321.055 | 53.089613 | -27.959742 | 0.530063 | 0.077372 | -99.000000 | -99.000000 | 0.638394 | ... | -99.000000 | -99.000000 | -99.000000 | 0 | 0 | 0 | 0 | 2.0 | 0.0 | 1.0 |
| 2 | 3 | 11351.875 | 1327.244 | 53.102913 | -27.959642 | 0.467791 | 0.200590 | -99.000000 | -99.000000 | 0.714355 | ... | -99.000000 | -99.000000 | -99.000000 | 0 | 0 | 0 | 0 | 1.0 | 0.0 | 1.0 |
| 3 | 4 | 11415.681 | 1396.836 | 53.101709 | -27.958481 | 12.497384 | 0.086093 | -99.000000 | -99.000000 | 27.270285 | ... | 0.057395 | 0.206347 | 0.000656 | 0 | 0 | 1 | 0 | 2.0 | 0.0 | 2.0 |
| 4 | 5 | 11385.570 | 1384.729 | 53.102277 | -27.958683 | 1.101740 | 0.087183 | -99.000000 | -99.000000 | 1.412912 | ... | 2.027536 | 0.575527 | -2.653543 | 1 | 0 | 1 | 0 | 2.0 | 0.0 | 2.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 50502 | 50503 | 3207.811 | 18767.998 | 53.256225 | -27.668900 | 0.083831 | 0.017599 | 0.009053 | 0.083001 | 0.151608 | ... | -0.858407 | -99.000000 | -99.000000 | 0 | 0 | 1 | 0 | 24.0 | 4.0 | 26.0 |
| 50503 | 50504 | 3319.077 | 18889.404 | 53.254129 | -27.666879 | 0.030584 | 0.017599 | 0.079141 | 0.082396 | 0.033300 | ... | -1.151515 | -99.000000 | -99.000000 | 0 | 0 | 0 | 0 | 24.0 | 4.0 | 26.0 |
| 50504 | 50505 | 7634.091 | 18915.908 | 53.172928 | -27.666490 | 0.303036 | 0.024853 | -99.000000 | -99.000000 | 0.555012 | ... | 1.567742 | -0.129032 | -1.215094 | 1 | 0 | 1 | 0 | 6.0 | 0.0 | 6.0 |
| 50505 | 50506 | 8669.859 | 18840.100 | 53.153437 | -27.667759 | 0.416449 | 0.024596 | -99.000000 | -99.000000 | 0.526232 | ... | 1.182879 | 0.794521 | -0.668966 | 1 | 0 | 0 | 1 | 6.0 | 0.0 | 6.0 |
| 50506 | 50507 | 3041.903 | 18822.670 | 53.259346 | -27.667986 | 0.030183 | 0.017599 | 0.011191 | 0.084154 | 0.036352 | ... | -99.000000 | -99.000000 | -99.000000 | 0 | 0 | 0 | 0 | 24.0 | 4.0 | 26.0 |

50507 rows × 141 columns
\n", "\n", "
| id | z | lmass | lsfr | l153 | l155 | l161 | U-V | V-J |
|---|---|---|---|---|---|---|---|---|
| 44 | 0.68402 | 9.89 | -1.28 | 3.76739 | 14.43560 | 46.5213 | 1.458486 | 1.270543 |
| 92 | 0.69426 | 9.76 | -0.03 | 8.08497 | 21.07790 | 48.3969 | 1.040372 | 0.902476 |
| 178 | 0.94790 | 10.36 | -1.90 | 1.94721 | 8.60585 | 49.2509 | 1.613452 | 1.894051 |
| 194 | 0.65070 | 10.12 | 0.33 | 13.25220 | 42.32950 | 112.5340 | 1.260888 | 1.061602 |
| 236 | 0.53257 | 10.35 | -0.74 | 8.31773 | 42.76950 | 111.6020 | 1.777823 | 1.041345 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 49702 | 0.53180 | 9.97 | 0.36 | 41.66110 | 91.52100 | 200.9220 | 0.854475 | 0.853767 |
| 50000 | 0.83535 | 9.84 | 0.23 | 9.25872 | 21.86220 | 52.2972 | 0.932857 | 0.946961 |
| 50399 | 0.68057 | 10.46 | -10.14 | 6.08630 | 38.30250 | 119.6430 | 1.997184 | 1.236650 |
| 50494 | 0.71150 | 9.59 | -0.71 | 13.17360 | 25.57520 | 41.6413 | 0.720286 | 0.529263 |
| 50500 | 0.57982 | 9.79 | -1.38 | 10.99390 | 29.39910 | 67.9242 | 1.067956 | 0.909226 |

669 rows × 8 columns
\n", "