Data Scientists
Develop and implement a set of techniques or analytics applications to transform raw data into meaningful information using data-oriented programming languages and visualization software. Apply data mining, data modeling, natural language processing, and machine learning to extract and analyze information from large structured and unstructured datasets. Visualize, interpret, and report data findings. May create dynamic data reports.
Salary Information
Data Scientists: $112,590 per year (median).
Starting salary (10th percentile).
Required experience: Varies by company
Data source: U.S. Bureau of Labor Statistics
Daily Tasks & Responsibilities
| Task Description | Category |
|---|---|
| Analyze data to inform operational decisions or activities. | Analysis |
| Analyze business or financial data. | Analysis |
| Present research results to others. | Analysis |
| Develop procedures to evaluate organizational activities. | Analysis |
| Analyze data to identify trends or relationships among variables. | Analysis |
| Analyze data to identify or resolve operational problems. | Analysis |
| Determine appropriate methods for data analysis. | General |
| Prepare data for analysis. | General |
| Apply mathematical principles or statistical approaches to solve problems in scientific or applied fields. | General |
| Prepare analytical reports. | General |
| Select resources needed to accomplish tasks. | General |
| Advise others on analytical techniques. | General |
| Write computer programming code. | General |
| Prepare graphics or other visual representations of information. | Communication |
| Update technical knowledge. | Maintenance |
| Develop scientific or mathematical models. | Development |
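As a rough illustration of two of the tasks above ("Prepare data for analysis" and "Analyze data to identify trends or relationships among variables"), here is a minimal Python sketch. The records, the field meanings (machine hours and units produced), and the helper names `prepare` and `pearson` are hypothetical and chosen only for this example. Real work would more likely use libraries from the technology list below, such as pandas or NumPy.

```python
from math import sqrt


def prepare(rows):
    """Drop records with missing values and convert both fields to float."""
    cleaned = []
    for hours, output in rows:
        if hours is None or output is None:
            continue  # skip incomplete records
        cleaned.append((float(hours), float(output)))
    return cleaned


def pearson(pairs):
    """Pearson correlation coefficient between the two columns of `pairs`."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


# Hypothetical records: (machine hours, units produced); one is incomplete.
raw = [(1, 2), (2, 4), (None, 5), (3, 6), (4, 8)]
data = prepare(raw)
r = pearson(data)
print(f"{len(data)} usable records, r = {r:.2f}")
```

A coefficient near 1 or -1 suggests a strong linear relationship between the two variables, while a value near 0 suggests little linear relationship. It does not by itself establish cause and effect.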
Technology Requirements
| Technology | Description |
|---|---|
| Amazon Web Services AWS software | Data base user interface and query software |
| Apache Hadoop | Data base management system software |
| Apache Spark | Business intelligence and data analysis software |
| Microsoft Power BI | Business intelligence and data analysis software |
| PyTorch | Data base user interface and query software |
| Structured query language SQL | Data base user interface and query software |
| Tableau | Business intelligence and data analysis software |
| Alteryx software | Business intelligence and data analysis software |
| Amazon Elastic Compute Cloud EC2 | Data base user interface and query software |
| Amazon Redshift | Data base user interface and query software |
| Apache Cassandra | Data base management system software |
| Apache Hive | Data base management system software |
| Elasticsearch | Data base management system software |
| Microsoft Access | Data base user interface and query software |
| Microsoft SQL Server | Data base user interface and query software |
| MongoDB | Data base management system software |
| NoSQL | Data base management system software |
| PostgreSQL | Object oriented data base management software |
| Teradata Database | Data base management system software |
| NumPy | Data base user interface and query software |
| pandas | Data base user interface and query software |
| Apache Pig | Data base management system software |
| BigQuery | Data base user interface and query software |
| Business intelligence software | Business intelligence and data analysis software |
| MapReduce big data software | Business intelligence and data analysis software |
| Neo4j | Data base user interface and query software |
| PySpark | Data base user interface and query software |
| Qlik Tech QlikView | Business intelligence and data analysis software |
| Reporting software | Data base reporting software |
| C | Development environment software |
| C++ | Object or component oriented development software |
| Docker | Application server software |
| Git | File versioning software |
| Microsoft Azure software | Development environment software |
| Microsoft Excel | Spreadsheet software |
| Oracle Java | Object or component oriented development software |
| Python | Object or component oriented development software |
| R | Object or component oriented development software |
| SAS | Analytical or scientific software |
| Scala | Object or component oriented development software |
| TensorFlow | Analytical or scientific software |
| The MathWorks MATLAB | Analytical or scientific software |
| Amazon Simple Storage Service S3 | Storage networking software |
| Apache Kafka | Development environment software |
| Atlassian Confluence | Project management software |
| Atlassian JIRA | Content workflow software |
| Bash | Operating system software |
| C# | Object or component oriented development software |
| GitHub | Application server software |
| Go | Development environment software |
| IBM SPSS Statistics | Analytical or scientific software |
| JavaScript | Web platform development software |
| JavaScript Object Notation JSON | Web platform development software |
| Jenkins CI | Enterprise application integration software |
| Kubernetes | Application server software |
| Linux | Operating system software |
| Microsoft Office software | Office suite software |
| Microsoft PowerPoint | Presentation software |
| Perl | Object or component oriented development software |
| Ruby | Development environment software |
| Shell script | Operating system software |
| Splunk Enterprise | Enterprise system management software |
| UNIX | Operating system software |
| Scikit-learn | Development environment software |
| Amazon Web Services AWS SageMaker | Cloud-based management software |
| Apache Airflow | Procedure management software |
| Apache MXNet | Industrial control software |
| Flask | Development environment software |
| Google Cloud software | Cloud-based management software |
| Google Looker Analytics | Analytical or scientific software |
| Julia | Development environment software |
| Jupyter software | Object or component oriented development software |
| Keras | Operating system software |
| Kubeflow | Analytical or scientific software |
| Management information systems MIS | Enterprise resource planning ERP software |
| Mathematical software | Analytical or scientific software |
| MLflow | Analytical or scientific software |
| OpenAI ChatGPT | Development environment software |
| RESTful API | Web platform development software |
| SciPy | Object or component oriented development software |
| Shiny | Object or component oriented development software |
| StataCorp Stata | Analytical or scientific software |
| Statistical software | Analytical or scientific software |
| XGBoost | Development environment software |
| spaCy | Object or component oriented development software |
| Geographic information system GIS systems | Geographic information system |
Relevant Certifications
Industry certifications that may be valuable for this career role:
| Certification | Issuing Organization | Practice Test |
|---|---|---|
| Professional Data Engineer | Google Inc. | Not Available |
| Professional Machine Learning Engineer | Google Inc. | Not Available |
| SAP Certified Application Associate - Data Integration with SAP Data Services 4.2 | SAP America, Inc. | Not Available |