The Tendril Finder - WIP

The data science component of Skidmore extragalactic research

Detects structures in the universe, namely filaments and tendrils, using graph theory (and eventually machine learning).

Read about the goals/description of this project Here

Progress Update #1

Data Science: This week's main focus was preparing our data, along with the catalog, for phase 2 of this research project: developing a machine learning model to reduce the computational complexity of tendril detection.


Web Development: This week's main focus was dockerizing the application, and getting it deployed to a temporary server!

Preparing the Data

One of the major challenges of this research project so far has been the sheer volume and variety of the data. On Wednesday, during our research meeting with Professor Odekon, we went over the SDSS Optical Properties / SDSS Derived Properties spreadsheets from the ALFALFA SDSS survey, along with the spreadsheets Professor Odekon used in her 2018 research paper, which our work builds on.

Specifically, we reviewed the physical meaning of each column and determined which features would be suitable for our machine learning model. These are our results:

Potential Features

These galactic properties looked like promising candidates for feature engineering, so they are the ones we explored first.

logH1mass
Logarithm of the galaxy's H1 (neutral hydrogen) gas mass, with the mass measured in solar masses
NND3to10
Average nearest neighbor density. We expect tendrils to be located in underdense environments, and group/filament members to be in overdense environments.
H1Def25*
Hydrogen gas deficiency. Describes whether the galaxy has more or less H1 gas than expected for its size. Professor Odekon's 2018 research concluded that tendrils have an abnormally high H1 gas content for their mass and location, so this property will likely be our most important feature.
logMstarTaylor
Stellar mass of the galaxy
GMIshao
"Measure" of the color of the galaxy.
SFR22*
Star formation rate. Our hypothesis is that if tendrils have an abnormally large excess of H1 gas, they might also have a high rate of star formation.
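
As a rough sketch of how these candidate features might be pulled out of the catalog before modeling, the snippet below selects the six columns, drops galaxies with missing values, and standardizes each column. It assumes the catalog is loaded as a pandas DataFrame, and that the trailing asterisks on H1Def25 and SFR22 are annotations rather than part of the column names. The function name `build_feature_table` and the numbers in `toy_catalog` are made up for illustration; they are not survey data.

```python
import pandas as pd

# Column names from the list above. The trailing asterisks are dropped on the
# assumption that they are annotations, not part of the actual column names.
FEATURE_COLUMNS = [
    "logH1mass", "NND3to10", "H1Def25",
    "logMstarTaylor", "GMIshao", "SFR22",
]


def build_feature_table(catalog: pd.DataFrame) -> pd.DataFrame:
    """Return the candidate features, z-scored, with incomplete rows removed."""
    features = catalog[FEATURE_COLUMNS].dropna()
    # Standardize each column so features on different scales are comparable.
    return (features - features.mean()) / features.std(ddof=0)


# Placeholder values for illustration only; not real survey data.
toy_catalog = pd.DataFrame({
    "logH1mass": [9.1, 9.8, 8.7, None],
    "NND3to10": [0.2, 1.5, 0.4, 0.9],
    "H1Def25": [-0.3, 0.4, -0.1, 0.0],
    "logMstarTaylor": [9.0, 10.2, 8.5, 9.6],
    "GMIshao": [0.5, 1.1, 0.4, 0.8],
    "SFR22": [0.8, 0.1, 0.5, 0.3],
})
feature_table = build_feature_table(toy_catalog)
print(feature_table.round(2))
```

Dropping incomplete rows is the simplest possible choice here; whether to drop or impute missing values is a decision the sketch does not try to settle.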

Project Description

The universe is not a completely random distribution of matter. Instead, cosmologists have noticed patterns and structures that emerge from the seemingly disordered nature of extragalactic space. The term "cosmic web" is often used to describe the universe, and it paints a decent picture of its large-scale structure. Huge conglomerations of matter form galaxy clusters, which connect to each other through long strands of matter (the "strings" of the cosmic web). These supergalactic "highways" are referred to as filaments, and they are the largest known structures in the universe.

Bounded by these supergalactic walls are massive voids, where there is much less activity because little excess matter is available. While these voids seem uninteresting at first glance, some notable structures can be found in regions that appear devoid of galactic activity.

The term "tendril" refers to what is essentially a miniature filament, found within these regions of underdense space. Our research advisor, Professor Odekon, coined the term in a paper published in 2018. Interestingly, these structures correlate with higher hydrogen gas content, which may imply higher-than-normal star formation rates. Starting from this hypothesis, we hope to explore the nature of these celestial objects.

Phase I of our research revolves around strengthening the definition of a "tendril". Currently, tendrils are found through a series of graph theory techniques, applied to a graph representation of the universe. Galaxies and superclusters of galaxies are treated as nodes, with edges forming filaments between them. My first job as a researcher was rewriting the original 'tendril definer' code using contemporary Python libraries like NumPy, pandas, and NetworkX.
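
To make the graph picture concrete, here is a minimal sketch of turning galaxy positions into a NetworkX graph. It is not the project's 'tendril definer' code: the function name `build_galaxy_graph`, the choice of a complete graph weighted by straight-line distance, and the coordinates in `toy_positions` are all assumptions made for illustration.

```python
import itertools
import math

import networkx as nx


def build_galaxy_graph(positions):
    """Build a complete graph whose edge weights are pairwise distances.

    `positions` maps a galaxy identifier to an (x, y, z) tuple.
    """
    graph = nx.Graph()
    for name, xyz in positions.items():
        graph.add_node(name, pos=xyz)
    # Connect every pair of galaxies; the weight is their Euclidean separation.
    for (a, pos_a), (b, pos_b) in itertools.combinations(positions.items(), 2):
        graph.add_edge(a, b, weight=math.dist(pos_a, pos_b))
    return graph


# Hypothetical coordinates in arbitrary units; not real survey positions.
toy_positions = {
    "g1": (0.0, 0.0, 0.0),
    "g2": (1.0, 0.0, 0.0),
    "g3": (1.0, 2.0, 0.0),
    "g4": (6.0, 2.0, 0.0),
}
galaxy_graph = build_galaxy_graph(toy_positions)
print(galaxy_graph.number_of_nodes(), galaxy_graph.number_of_edges())
```

A complete graph grows quadratically with the number of galaxies, so a real catalog would probably need a sparser construction; the sketch only illustrates the node and edge representation.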

However, our definition of tendrils involves a few arbitrary choices, specifically in the parameters that govern the graph theory algorithms (mainly Kruskal's algorithm for a minimum/maximum spanning tree). We want these parameters to be optimized so that they are based on the properties of the tendrils themselves. For that, we are applying machine learning methods (the current state of the project).
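
The sketch below shows how one such parameter could enter the picture. It builds a minimum spanning tree with Kruskal's algorithm in NetworkX, then removes every tree edge longer than a cutoff, leaving separate groups of galaxies. The cutoff `max_edge_length` is a hypothetical stand-in for the kind of arbitrary parameter described above; the text does not name the actual parameters, and the function name, edge weights, and cutoff value are made up for illustration.

```python
import networkx as nx


def prune_spanning_tree(graph, max_edge_length):
    """Split a graph's minimum spanning tree at edges longer than a cutoff.

    Returns the resulting groups of nodes as a sorted list of sorted lists.
    """
    # Kruskal's algorithm: add edges in order of increasing weight, skipping
    # any edge that would close a cycle.
    mst = nx.minimum_spanning_tree(graph, weight="weight", algorithm="kruskal")
    long_edges = [
        (u, v) for u, v, w in mst.edges(data="weight") if w > max_edge_length
    ]
    mst.remove_edges_from(long_edges)
    return sorted(sorted(group) for group in nx.connected_components(mst))


# Hypothetical separations in arbitrary units; not real survey data.
toy_graph = nx.Graph()
toy_graph.add_weighted_edges_from([
    ("g1", "g2", 1.0), ("g2", "g3", 2.0), ("g1", "g3", 2.2),
    ("g3", "g4", 5.0), ("g2", "g4", 5.4), ("g1", "g4", 6.3),
])
groups = prune_spanning_tree(toy_graph, max_edge_length=3.0)
print(groups)
```

With a cutoff of 3.0 the toy tree splits into `g1`, `g2`, `g3` on one side and `g4` on its own. Changing the cutoff changes the grouping, which is why a hand-picked value is unsatisfying and a data-driven choice is attractive.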