www.dataschool.io /how-to-ask-for-coding-help-online/

How to write a great Stack Overflow question (6 steps)

Kevin Markham 8-10 minutes 2021/8/23

August 23, 2021 · Python tutorial

In my 7 years of teaching data science, I've answered thousands of questions online. I know what makes a great question, because those are the questions that get my attention!

When you have a code question, Stack Overflow is an excellent place to ask. There are tons of skilled developers hungry for questions to answer so that they can earn points and build their reputation.

If you write a great question, it may be answered quickly! If you don't, it will probably be ignored.

So how do you write a great Stack Overflow question? Jake VanderPlas summarizes my thinking nicely:

Below, I'll demonstrate my step-by-step process for writing a great question which you can use any time you need to ask for coding help online!

Here's the initial question:

I'll use this pandas question as an example, which I received from one of my students:

I'm trying to tackle a dataset where each row represents a home transaction with a unique listing id. I know fillna is a column-based function that applies to all nans, but can we apply per row instead?

This dataset has town and architectural style columns, and I want to replace a nan value in architectural style with a groupby town of that row's town, and then apply the most common (highest value count) architectural style. I was going to use df.groupby('town')['architectural style'].value_counts().

I have a feeling I can do this with transform and a lambda function on the whole dataset, but how do I interact with the row, to get its town within the fillna function?

Conceptually, this is an excellent question, but it needs some work in order to be successful on Stack Overflow!

Here's my process for rewriting that question:

  1. Write a brief introduction
  2. Provide a self-contained code example
  3. Detail the expected results and why I expect those results
  4. Add any important notes
  5. Link to any relevant questions
  6. Write a title that summarizes the question

Below I'll rewrite my student's pandas question, and then I'll explain 9 lessons you can take away from this example!

Here's my rewritten question:

Title: How to fill missing values in a DataFrame with the most frequent value of each group?

I have a pandas DataFrame with two columns: toy and color. The color column includes missing values.

How do I fill the missing color values with the most frequent color for that particular toy?

Here's the code to create a sample dataset:

import pandas as pd
import numpy as np
df = pd.DataFrame({
    'toy':['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color':['red', 'blue', 'blue', np.nan, 'green', np.nan,
             'red', 'red', np.nan, 'blue', 'red', np.nan, 'green']
    })

Here's the sample dataset:

      toy  color
0     car    red
1     car   blue
2     car   blue
3     car    NaN
4   train  green
5   train    NaN
6   train    red
7   train    red
8   train    NaN
9    ball   blue
10   ball    red
11   ball    NaN
12  truck  green

Here's the desired result:

  • Replace the first NaN with blue, since that is the most frequent color for a car.
  • Replace the second and third NaNs with red, since that is the most frequent color for a train.
  • Replace the fourth NaN with either blue or red, since they are tied for the most frequent color for a ball.

Notes about the real dataset:

  • There are many different toy types (not just four).
  • There are no toy types that only have missing values for color, so the answer does not need to handle that case.

This question is related, but it doesn't answer my question of how to use the most frequent value to fill in missing values.

Here's what makes this a good question:

  1. I summarized the entire question at the top. If someone might know the answer, I want to give them the question right away in order to motivate them to keep reading. If the question was instead buried at the bottom, the reader might just give up and move on to another question.

  2. I wrote code that can be copied, pasted, and run. Coding questions can be hard to solve in your head, and so I want to make it as easy as possible for the reader to work on this problem on their own machine. I even included the imports so that they can run the code from a new environment.

  3. I printed out the dataset. Even though the reader may end up running my code on their machine, I don't want to require them to do so in order to start thinking through a solution. This is also important because they will need something to look at when I explain the expected results. (In case you're wondering, I generated this output by running the sample code in IPython.)

  4. I used short and simple object names. This makes it easier for the reader to read the code, interact with it on their own machine, and write out an answer.

  5. I added proper formatting. This makes it easier for the reader to understand your question quickly.

  6. I provided the expected results and why I expect those results. Detailing what you expect is important because it's unlikely that your question alone will be crystal clear to every reader. Detailing why you expect it is equally important because you don't want to receive solutions that arrive at the right result but for the wrong reason.

  7. I covered many different cases with the example dataset. My dataset included a toy with one missing value, a toy with two missing values, a toy for which there was a "tie" in terms of most frequent color, and a toy with no missing values. This is important for ensuring that the solution you receive will work for all of the cases present in your real dataset.

  8. I added any special notes that wouldn't be obvious from the example dataset. In this case, I wanted to ensure that the solution "scaled" to more than four toy types, and I didn't want the reader to worry about handling an edge case that doesn't exist in the real dataset. Thus, I'm trying to increase the likelihood that I get a useful answer as well as eliminate any questions that may have come up in the reader's mind.

  9. I linked to a related question. This demonstrates that you have already searched for an answer, and it gives you a chance to make the case for why your question is not a duplicate (in which case it may be closed by a moderator). It also gives the reader a "head start" by linking to a resource that might help them to come up with an answer to your question.

Here are some additional things you might want to include:

My guiding principle is to make it easy for the reader by telling them everything they need to know and nothing else. Here are a few things that I didn't include in my question, but may be worth including in some cases:

  • An error message, if your question is about how to fix a specific error.
  • A diagram, if you think that might help to clarify your question.
  • Which software versions you are using, if you think that might be relevant.
  • Code you tried that didn't work, if you think that might help to clarify your goal.

Note: Including code that didn't work is often recommended as a way to "prove" that you have put in enough effort solving your own problem, but I don't universally recommend this since it makes your question longer without necessarily adding useful information.

Here are a few more tips:

  • Ask questions that have a correct answer. The site is not designed for opinion-based questions.
  • Focus on a single issue in your question. The site is optimized for discrete questions with discrete answers, and so questions that contain multiple issues aren't as well-received.
  • Keep revising your question until it is as clear and concise as possible. You're asking people to help you for free, so writing a high-quality question is one way of honoring the time that they are giving you.
  • If you find the answer on your own, you should post both the question and the answer. This is a great way to share your knowledge with the community, and you may even receive an answer that is better than your own!

Good luck!

If you used this guide to help you write a great Stack Overflow question, feel free to share a link in the comments below and I'll take a look! 👇

P.S. I posted my pandas question on Stack Overflow and received a helpful answer within 5 minutes! 🙌 This is by no means guaranteed, but by writing a high-quality question, you will greatly increase the likelihood of a useful response!

::...
免责声明:
当前网页内容, 由 大妈 ZoomQuiet 使用工具: ScrapBook :: Firefox Extension 人工从互联网中收集并分享;
内容版权归原作者所有;
本人对内容的有效性/合法性不承担任何强制性责任.
若有不妥, 欢迎评注提醒:

或是邮件反馈可也:
askdama[AT]googlegroups.com


点击注册~> 获得 100$ 体验券: DigitalOcean Referral Badge

订阅 substack 体验古早写作:


关注公众号, 持续获得相关各种嗯哼:
zoomquiet


自怼圈/年度番新

DU22.4
关于 ~ DebugUself with DAMA ;-)
粤ICP备18025058号-1
公安备案号: 44049002000656 ...::