According to Wikipedia, data science is defined as follows:
"Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from many structural and unstructured data. Data science is related to data mining, machine learning and big data."
Many people assume that Data Science just means training Machine Learning models, but it is actually an amalgamation of several fields. To become a data scientist, a person needs knowledge of statistics, cloud technologies, coding and databases. As technology keeps changing, knowledge of DevOps for Machine Learning (known as MLOps) and of AutoML is also becoming necessary.
Kaggle, the largest community of Data Scientists, conducted a survey, and based on that survey I am presenting a list of the tools and technologies widely used in different domains of Data Science. Feel free to add other popular tools you know of that are not mentioned in this list.
Machine Learning Frameworks
Machine Learning is one of the core technologies associated with Data Science. Python and R are the most widely used languages for ML. The most popular frameworks are Python-based, namely scikit-learn, TensorFlow and PyTorch.
- Scikit-learn
- TensorFlow
- Keras
- XGBoost
- PyTorch
- LightGBM
- Caret
- CatBoost
- Prophet
- Fast.ai
- Tidymodels
- H2O 3
- MXNet
- JAX
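To give a feel for what working with one of these frameworks looks like, here is a minimal sketch using scikit-learn. It assumes scikit-learn is installed, and the choice of the built-in Iris dataset, the logistic regression model and the 80/20 split are my own assumptions for illustration, not something taken from the survey.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in dataset (assumption: Iris is used purely for illustration).
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows for evaluation; the fixed seed makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit a simple classifier; max_iter is raised so the solver has room to converge.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Score the model on the held-out rows.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Most of the other Python frameworks in the list follow a similar fit-then-predict pattern, although the deep learning libraries usually need more setup code.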
Enterprise Machine Learning Tools
Most beginners don't know about the tools available for ML on the cloud. These are some of the best enterprise ML tools from software giants like Google, Amazon and Microsoft.
- Amazon SageMaker
- Google Cloud Vertex AI
- Azure Machine Learning Studio
- Google Cloud Vision AI
- Google Cloud Natural Language
- Azure Cognitive Services
- Amazon Rekognition
- Google Cloud Video AI
- Amazon Forecast
Business Intelligence Tools
Business Intelligence means analyzing company data, producing reports, and predicting sales and market trends. It is one of the most popular use cases of data science and deals mainly with Statistics and Data Visualization. Some of the most popular Business Intelligence tools used in the industry are given below.
- Tableau
- Microsoft Power BI
- Google Data Studio
- Qlik
- Amazon QuickSight
- Salesforce
- Looker
- Alteryx
- SAP Analytics Cloud
- TIBCO Spotfire
- Sisense
- Einstein Analytics
- Domo
Databases Used
Databases form an important part of Data Science, because without data there is no data science. The databases used by data scientists all over the world are as follows.
- MySQL
- PostgreSQL
- Microsoft SQL Server
- MongoDB
- SQLite
- Google Cloud BigQuery
- Oracle Database
- Amazon Redshift
- Microsoft Azure Data Lake Storage
- Amazon Athena
- Snowflake
- Amazon DynamoDB
- Microsoft Access
- IBM DB2
- Google Cloud Firestore
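As a small illustration of how a data scientist might query one of these databases, here is a sketch using SQLite through Python's built-in `sqlite3` module. The `sales` table, its columns and the sample rows are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical sample data: (region, amount). These rows are made up for illustration.
rows = [("north", 120.0), ("south", 80.0), ("north", 50.0), ("south", 20.0)]


def total_sales_by_region(records):
    """Load records into an in-memory SQLite table and sum the amounts per region."""
    conn = sqlite3.connect(":memory:")  # temporary database, nothing is written to disk
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        # Parameterized inserts avoid building SQL strings by hand.
        conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


totals = total_sales_by_region(rows)
print(totals)
```

The same SQL would work with little change on most of the relational databases in the list. Only the connection step differs, since each database needs its own driver.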
Automated Machine Learning (AutoML)
AutoML is one of the most promising technologies of the modern era and is growing rapidly. Some of the well-known AutoML tools are: