
Five-College DataFest


Spring 2023: Won "Best in Show" (1st place) at the ASA Five College DataFest. Collaborated on a team of 6 (Clara Li, Nikki Lin, Quinn White, Rose Porta, Kushagra Srivastava) to analyze consumer data provided by a legal firm. We took a three-pronged approach, in which I contributed the text wrangling and NLP work.

This was the first inter-college team at the ASA Five-College DataFest (Smith College and UMass Amherst). We were able to combine the main teaching strengths of both institutions in our final product, making it an interdisciplinary win :)


Without giving away too much about the data, given its sensitive nature (we had to sign a contract): we received a relational database recording communications between consumers and a legal firm, along with other details of consumers' interactions with the firm. Our task was to analyze this data and determine how to equip the people employed there to better serve consumers' needs.


Link to official report on the event standings

Link to GitHub Repository

  • GLM.Rmd: all R code for the logistic regression model.
  • GLM.html: knitted output of GLM.Rmd.
  • textWrangling.ipynb: Python notebook with the text analysis.