DEV Community

Kaira Kelvin.
Kaira Kelvin.

Posted on • Updated on

Excel for beginners

When cleaning data in Excel - (formatting and standardizing )- currency, (spaces use the Trim formulae) capital letters, dates wrong format,
looking for duplicates-
Standardizing-(uppercase, lowercase, proper)
**filter a column and edit the columns or data with errors.

  1. for instance if there are letters or acronyms in a worksheet (ctrl+H) replace the letters with their full meaning.**

In Excel, the IF function is a powerful and commonly used logical function. It allows you to perform different actions based on whether a specified condition evaluates to true or false. The basic syntax of the IF function is as follows:

excel
Adavantages of excel -

  1. spreadsheets can be really useful in the right context.

  2. you can see the data cleanly laid out in a table.

  3. its easy to determin what is and the format.

Disadvantages of excel.

  1. Its hard to reproduce state.
  2. theres no way to tell what steps have been taken on a dataset.
  3. Because of the plethora of functions it's nearly impossible to know them all.
  4. You can find yourself in analysis paralysis.
  5. spend a lot of time and effort trying to figure one thing out.
  6. it is very difficult to handle extremely large datasets.
  7. Spreadsheets have less flexibility for complicated analysis and presentation.

Copy code
=IF(logical_test, value_if_true, value_if_false)
logical_test: This is the condition you want to check. If this condition is true, the function returns the value specified in value_if_true; otherwise, it returns the value specified in value_if_false.
value_if_true: This is the value that the function returns if the logical_test is true.
value_if_false: This is the value that the function returns if the logical_test is false.
pivot tables -

this is how to create pivot tables
Adding charts to Excel from a PivotTable can be a useful way to visualize and analyze your data. Here are the steps to add a chart based on a PivotTable in Excel:

Create a PivotTable:

  1. Select the data range that you want to analyze.
  2. Go to the "Insert" tab on the ribbon.
  3. Click on "PivotTable" and choose the location where you want the PivotTable to be placed (e.g., a new worksheet).
  4. In the PivotTable Fields pane, drag and drop the fields into the Rows and Values areas to arrange and summarize your data.
  5. Create the PivotChart:

under view in excel u get- Workbookviews,show,zoom,window and marcos. under windows u get freeze panes - u can freeze panes,freeze top row ,freeze first column.
CTRL + F6 is used to switch windows.
edit a cell u press F2,

Click anywhere within the PivotTable

  1. Go to the "Insert" tab on the ribbon.
  2. Click on "PivotChart."
  3. Select the chart type you want to use (e.g., Column, Bar, Line, etc.) and click "OK."
  4. Customize the Chart:

Once the PivotChart is inserted, you can customize it further.
You can change the :

  • chart title
  • axis labels
  • legend and other elements by clicking on them and modifying the text.
  • To change the chart type, right-click on the chart and choose "Change Chart Type."

Update the PivotChart:

As you interact with the PivotTable (e.g., by changing filters or updating data), the PivotChart will automatically update to reflect those changes.
Remember that the specific steps might vary slightly depending on the version of Excel you're using, but the general process is similar. Creating a PivotChart linked to a Pivot Table is a powerful way to visualize and understand complex data sets in Excel.
The term pandas came from wesmickneey from joining two words panel and data. A data frame a feature available in the library and is defined as a two-dimensional,size mutable potentially heterogeneous tabular data structure with labeled axes (rows and column)
A data frame is a two-dimensional data structure that is data is aligned in a tabular fashion in rows and columns.

Image description
What I enjoy about using pandas and DataFrames is the flexibility of the built-in commands that are provided to you as a data analyst.

Creating our first DataFrame.

some useful commands to run in pandas,

  • pd.read_csv(‘import_filename.csv', header=1): Reads data from a CSV file directly into a pandas DataFrame

  • my_df.to_csv(‘export_filename'): Directly exports the DataFrame to a CSV file to your workstation

  • my_df.shape: Provides the number of rows and columns of your DataFrame. keep in mind .shape is used without parentheses. The correct usage is to access it as an attribute, not as a method.
    Remember the number of rows come first then followed by
    columns eg in this data from kaggle ("netflix_titles.csv") has
    8807 rows,12columns.

  • my_df.info(): Provides metadata about your DataFrame, including data types for each column.

  • my_df.describe(): Includes statistical details with a column that includes the count, mean, standard deviation (std), minimum, maximum, and percentiles (25th, 50th, and 75th) for any numeric column.

  • my_df.head(2): Displays the first two records from the DataFrame.

  • my_df.tail(2): Displays the last two records from the DataFrame

  • my_df.sort_index(1): Sorts by the labels along an axis—in this example, by the column label headers alphabetically from left to right.
    my_df.isnull(): Displays a list of all rows with a True/False indicator if any of the values by column are null.

when you want to known the column titles(headers) run this line of code
column_headers = data.columns
then print (column_headers)

more lines to keep note of

.value_counts()
This method is useful for understanding the distribution of values in a dataset, especially in categorical data, where you want to know how many times each category appears.It shows the counts of each unique value in the Series. The index represents the unique values, and the corresponding values represent the counts of each unique value in the original Series.

Advantages of seaborn.
named after Samuel norman seaborn.(sns)

  • Easy to use.
  • Works well with pandas data structures.
  • built on top of matplotlib.

A primary key is the unique value (typically an integer) used to
represent a single distinct record or tuple in each table.
A foreign key would be a field in one table that references the primary key from another.

  • To create a connection and use SQLite , we have to import a new library using the code: import sqlite,

  • next we need to assign a connection to a variable named conn and point to the location of the database file.

  • use the following function to communicate with the database:

        In[]:conn =sqlite3.connect('_____.db')
    

Pandas library includes a read_sql_query()function to make it easier to communicate with databases using SQL.
df__=pd.read_sql_query("SELECT*FROM databasename:",conn)
to view results in a DataFrame we just run the head() command against this dataframe using this code:
In[]:df
__.head()
A data warehouse which is commonly known as an enterprise Data Warehouse is a centralized location of data from multiple sources with the purpose of providing,reporting and analyzing data.

Top comments (0)