Software Training
Navigating the digital landscape has become essential for social scientists, as data analysis, visualization, and management increasingly underpin research in this field. To support this shift, we’ve curated a list of software training resources tailored specifically to social scientists. Whether you're new to data analysis or looking to expand your skills in advanced techniques, these resources cover a range of tools and platforms that are vital for modern social science research. This collection is designed to help you efficiently manage and analyze data, enhance the rigor of your research, and ultimately, make more informed contributions to your field. Additional software training resources are available via the UChicago Library.
Find the full PDF guide here.
Python is a high-level, general-purpose programming language, supporting multiple programming paradigms, including structured, object-oriented and functional programming. Python can be used to build websites, create software applications, automate tasks, and perform data analysis, visualization, and machine learning.
- Essential Training (LinkedIn Learning)
- Length: 4 hours, 20 min
What You Learn: how to run Python code, explore Python data structures, understand control flow, and work with files
- Data Visualization
- Basic Data Visualization (LinkedIn Learning)
- Length: 2 hours, 20 min
What You Learn: how to use the NumPy and pandas packages for data processing and basic data visualization
- Advanced Data Visualization (DataCamp)
- Length: 3 hours, 50 min
What You Learn: capabilities of the Matplotlib package and advanced plotting techniques
Introduction to NumPy document
- Statistical Analyses (Coursera) - accessible through free audit version
- Length: 1 hour, 50 min
- What You Learn: various statistical tests; performing t/z tests, ANOVA, and regression analyses
- Machine Learning/Deep Learning
- Watch the following videos to learn about machine learning and its applications in Python:
- Web Scraping (extracting data from webpages)
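As a rough illustration of the skills these Python tutorials cover (data structures, control flow, and a basic statistical test), here is a minimal sketch that uses only the standard library. The survey scores are made-up illustration data and `welch_t` is a hypothetical helper name; in practice a statistics package would normally handle this, including the p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic for two independent samples."""
    var_a = variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = variance(sample_b)
    standard_error = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical survey scores for two groups (made-up illustration data).
scores = {
    "treatment": [4.0, 5.0, 6.0, 5.0, 5.0],
    "control": [3.0, 4.0, 3.0, 4.0, 3.5],
}

# Control flow: loop over the dictionary and summarize each group.
for group, values in scores.items():
    print(f"{group}: n={len(values)}, mean={mean(values):.2f}")

t_stat = welch_t(scores["treatment"], scores["control"])
print(f"Welch t statistic: {t_stat:.2f}")
```

The sketch computes only the test statistic; judging significance would also require the degrees of freedom and a reference distribution, which the statistical analysis course covers.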
MATLAB is a programming and numeric computing platform used to analyze data, develop algorithms, and create models. MATLAB’s Simulink package supports simulation and Model-Based Design, which are integral for multidisciplinary projects in control systems, computational finance, and other fields.
Recommended Tutorials:
- Essential Training (LinkedIn Learning)
- Length: 2 hours
What You Learn: familiarity with the MATLAB interface and syntax, basic plotting
- Data Visualization (Coursera)
- Length: 2 hours
What You Learn: how to load, prepare, and visualize data in MATLAB, perform basic computations, and communicate results
- Overview of MATLAB Commands document
- Overview of Simulink Package
- Length: 1 hour
- What You Learn: how to create basic Simulink models and run simulations
Structured Query Language (SQL) is a domain-specific language used to manage data, especially in a relational database management system. It is most useful for handling structured data.
Recommended Tutorials:
- Basic SQL Commands (LinkedIn Learning)
- Length: 1 hour, 50 minutes
What You Learn: how to query data from databases, aggregate data, transform and perform mathematical operations on data, and apply SQL commands to edit data.
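The operations listed above (querying, aggregating, and editing data) can be sketched with Python's built-in `sqlite3` module and an in-memory database. The `responses` table, its columns, and the values are hypothetical and exist only for this illustration; the tutorial itself may use a different database system.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical survey table (names and values are illustration only).
conn.execute(
    "CREATE TABLE responses ("
    "respondent_id INTEGER PRIMARY KEY, region TEXT, income REAL)"
)
conn.executemany(
    "INSERT INTO responses (region, income) VALUES (?, ?)",
    [("North", 40000.0), ("North", 60000.0), ("South", 50000.0)],
)

# Query and aggregate: respondent count and mean income per region.
rows = conn.execute(
    "SELECT region, COUNT(*) AS n, AVG(income) AS mean_income "
    "FROM responses GROUP BY region ORDER BY region"
).fetchall()
for region, n, mean_income in rows:
    print(region, n, mean_income)

# Edit data: apply a 2% adjustment to one region's incomes.
conn.execute("UPDATE responses SET income = income * 1.02 WHERE region = 'South'")
south_income = conn.execute(
    "SELECT income FROM responses WHERE region = 'South'"
).fetchone()[0]

conn.close()
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to avoid quoting problems when inserting data from Python.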
High-performance computing (HPC) is the practice of aggregating computing resources to achieve greater performance than a single workstation or server can deliver. It uses clusters of computers to solve advanced computing tasks.
Recommended Tutorials:
- Basics of Git (O’Reilly)
- Length: 1 hour, 20 min
- What You Learn: how to utilize Git repositories on various operating systems, understand Git’s internal mechanisms, and coordinate with teams on GitHub
- Basics of Bash Command (LinkedIn Learning)
- Length: 3 hours
- What You Learn: how to use essential Linux commands, create and manage Linux virtual machines and file systems, and use command integration
- Parallel Programming (LinkedIn Learning)
- Length: 2 hours, 10 min
- What You Learn: the core concepts and processes of parallel processing in Python (running commands across multiple processors or systems to improve performance)
- Intermediate/Advanced: Using PySpark (O’Reilly)
- Length: 2 hours, 10 min
- What You Learn: how to use the PySpark package to manage large-scale data and run HPC tasks
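The basic pattern behind the parallel programming tutorial, splitting work into independent pieces and mapping a function over them with a pool of workers, can be sketched with the standard library's `concurrent.futures`. This sketch uses a thread pool for simplicity, which is an assumption made for portability; for CPU-bound work, `ProcessPoolExecutor` offers the same `map` interface and runs the tasks in separate processes. The documents and the `count_words` helper are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(document):
    """Count whitespace-separated words in one document."""
    return len(document.split())


# Hypothetical documents (made-up illustration data).
documents = [
    "surveys measure attitudes",
    "interviews add context to survey results",
    "field notes",
]

# Each document is an independent task; pool.map returns results
# in the same order as the inputs.
with ThreadPoolExecutor(max_workers=2) as pool:
    counts = list(pool.map(count_words, documents))

total_words = sum(counts)
print(counts, total_words)
```

On a cluster, the same idea scales up: a scheduler or a framework such as PySpark distributes the independent tasks across many machines instead of a few local workers.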
Geographic Information Systems (GIS) are powerful tools used for mapping and analyzing spatial data, enabling users to visualize, question, and interpret data to understand relationships, patterns, and trends across geographies.
Recommended Tutorials:
- What is GIS? (LinkedIn Learning)
- Length: 1 hour
- What You Learn: fundamental concepts of GIS, where to find geospatial data, and how to integrate GIS into professional workflows
- Basics of ArcGIS Pro (LinkedIn Learning)
- Length: 3 hours, 50 minutes
- What You Learn: how to manage projects and data in ArcGIS, style and label maps, build 2D and 3D visualizations, and hone additional practical skills
- Basics of QGIS (LinkedIn Learning)
- Length: 3 hours
- What You Learn: how to create detailed maps with data in QGIS, apply advanced styling techniques, and incorporate Python-driven plugins to enhance functionality
- Spatial Regression Analysis with Python (YouTube)
- Length: 3 hours
- What You Learn: how to embed spatial and geographical context in regression models and utilize spatial feature engineering to incorporate geographic data into model features
- GIS with R (LinkedIn Learning)
- Length: 2 hours, 30 minutes
- What You Learn: how to incorporate mapping fundamentals and visualizations in R, handle different GIS data formats, and develop interactive, mobile-friendly maps using R packages
- Additional Resources:
- GeoDa Software Handbook (Free)
- Case Study Using R
- Esri Official ArcGIS Tutorials
- PySAL Handbook